Supporting Data for "<b>Development and Validation of Clinically Implementable Immunoassays for Quantification of Plasma Biomarkers for Alzheimer’s Disease</b>"
<p dir="ltr">The supporting dataset accompanying the thesis <i>“Development and Validation of Clinically Implementable Immunoassays for Quantification of Plasma Biomarkers for Alzheimer’s Disease”</i> contains de-identified individual-level, biospecimen-level, and experiment-level data generated to develop, analytically validate, and clinically evaluate blood-based biomarkers for Alzheimer’s disease (AD). The project focuses on the phosphorylated tau species pTau181 and pTau217, alongside other plasma biomarkers (NfL, GFAP, Aβ42, Aβ40), and combines multi-cohort clinical data with measurements from research-grade platforms (SIMOA, MSD) and newly developed chemiluminescent immunoassays (CLIA). All data are stored in tabular formats and organised into five folders that mirror the structure of the thesis and the full workflow from biospecimen collection to analytical validation and clinical analysis.</p><p><br></p><p dir="ltr">The repository is structured into four main scientific components plus documentation:</p><p><br></p><ol><li><b>Biospecimen and sample metadata (01_Biospecimens_and_SampleMetadata/)</b> This folder documents the biospecimens from which all measurements are derived. Each row typically represents a unique sample (e.g. EDTA plasma tube, serum tube, or CSF aliquot), linked to a de-identified Participant ID. Variables include biospecimen type (plasma, serum, CSF), collection date (in relative or de-identified form), collection tube type and anticoagulant, processing time intervals (e.g. time from venepuncture to centrifugation and freezing, where available), storage conditions (freezer temperature, storage location, aliquot ID), number of freeze–thaw cycles, and sample quality indicators (e.g. haemolysis, clotting, or insufficient volume flags). 
This biospecimen-level metadata provides the pre-analytical context needed to interpret biomarker values, supports sensitivity or subgroup analyses based on sample handling, and links the clinical cohort tables to the assay readouts through stable sample IDs.</li><li><b>Clinical cohort datasets (02_ClinicalCohorts/, corresponding mainly to Chapter 3)</b> These files provide participant-level demographic and diagnostic information for all individuals included in the thesis. Each row represents a unique participant and includes de-identified Participant ID, recruitment cohort, age, sex, education (where available), clinical diagnosis (e.g. cognitively unimpaired [CU], mild cognitive impairment [MCI], Alzheimer’s disease dementia [AD], and other relevant categories where applicable) according to contemporary NIA–AA–style criteria, cognitive test scores (e.g. MMSE or equivalent), APOE ε4 carrier status (if available), and other key clinical covariates used in the analyses. Participant IDs correspond to those used in the biospecimen folder, enabling users to trace from an individual’s clinical profile to their biospecimen records and biomarker measurements without revealing any direct personal identifiers.</li><li><b>Biomarker concentration datasets (03_Biomarkers_PlasmaCSF/, primarily Chapters 4–5)</b> This folder contains quantitative measurements of plasma and, where available, CSF biomarkers obtained using multiple platforms. Tables are typically organised at the level of Participant ID × Sample ID × Platform. Key variables include the de-identified Participant ID, Sample ID (matching the biospecimen metadata), cohort, biological matrix (plasma or CSF), sample type and time point (e.g. baseline, follow-up where applicable), assay platform (SIMOA, MSD, CLIA, and selected ELISA development runs), analyte name, and concentration expressed in pg/mL (or stated units). 
The core analytes are pTau181 and pTau217, but the datasets also include NfL, GFAP, Aβ42, Aβ40 and other exploratory markers where measured. Additional columns provide quality control (QC) information, such as flags for values below the limit of detection or quantification (LoD/LoQ), plate-level or sample-level exclusions, and indicators of out-of-range or technically unreliable results. These biomarker tables form the basis for all diagnostic performance analyses (ROC curves, AUCs, sensitivity/specificity, cut-off derivation), cross-platform comparisons (e.g. MSD vs SIMOA vs CLIA), correlation analyses between biomarkers and clinical measures, and multi-biomarker profiling across cohorts and disease stages.</li><li><b>Analytical validation and assay development datasets (04_CLIA_ELISA_Validation/, covering Chapters 1–3 and parts of Chapter 4)</b> This folder contains experiment-level data documenting the development and analytical validation of the in-house CLIA and related ELISA assays for pTau181 and pTau217. Each table is organised at the level of plates, runs, concentration levels, or dilution conditions. Variables include calibration and standard curve data (signal vs concentration with curve-fitting parameters), sensitivity metrics (limit of blank [LoB], limit of detection [LoD], limit of quantification [LoQ]), within-run and between-run precision (coefficients of variation at low, medium, and high concentration levels), linearity and dilution recovery in plasma and CSF matrices (including calculated recoveries and R² values), and spike-and-recovery experiments where recombinant pTau species are added to pooled matrices. Where available, stability assessments (e.g. freeze–thaw tolerance, short-term room-temperature stability, and long-term storage stability) are also included. 
These validation tables allow independent reproduction and critical appraisal of the analytical performance metrics reported in the thesis and provide a detailed technical foundation for assessing clinical implementability.</li><li><b>Documentation and mapping (05_Documentation_and_Scripts/)</b> This folder contains the main README file, a detailed data dictionary/codebook describing all variables, units, coding schemes and QC flags, and, where appropriate, example analysis scripts or figure-generation templates. A separate mapping file links thesis figures and tables to the exact datasets and variables used to generate them. Together, these materials explain how the biospecimen, clinical, biomarker, and validation folders relate to one another and provide sufficient context for other researchers to understand, reproduce, and appropriately reuse the data.</li></ol><p><br></p><p><br></p><p dir="ltr">Overall, the dataset is fully de-identified and does not include any direct personal identifiers. By explicitly linking clinical information to biospecimen metadata and biomarker measurements, and by providing granular analytical validation data, the resource supports transparency and reproducibility of the thesis results and enables secondary methodological, biomarker, and assay-comparison studies in the field of blood-based biomarkers for Alzheimer’s disease.</p>
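<p><br></p><p dir="ltr">The linkage described above (Participant ID to Sample ID to biomarker measurement, with QC flags applied before analysis) could be sketched in Python roughly as follows. This is only an illustrative sketch: the column names (<code>participant_id</code>, <code>sample_id</code>, <code>concentration_pg_ml</code>, <code>below_loq</code>), the function <code>link_tables</code>, and all row values are hypothetical placeholders, not taken from the dataset. The data dictionary in the documentation folder remains the authority for the real variable names and flag coding.</p>

```python
from collections import defaultdict

# Toy rows standing in for the three table types. All column names and
# values are hypothetical placeholders, NOT taken from the dataset.
participants = [
    {"participant_id": "P001", "diagnosis": "CU"},
    {"participant_id": "P002", "diagnosis": "AD"},
]
samples = [
    {"sample_id": "S001", "participant_id": "P001", "matrix": "plasma"},
    {"sample_id": "S002", "participant_id": "P002", "matrix": "plasma"},
]
measurements = [
    {"sample_id": "S001", "platform": "SIMOA", "analyte": "pTau217",
     "concentration_pg_ml": 0.2, "below_loq": False},
    {"sample_id": "S002", "platform": "SIMOA", "analyte": "pTau217",
     "concentration_pg_ml": 1.1, "below_loq": False},
    {"sample_id": "S002", "platform": "CLIA", "analyte": "pTau217",
     "concentration_pg_ml": 0.0, "below_loq": True},
]


def link_tables(participants, samples, measurements, drop_below_loq=True):
    """Join measurement -> sample -> participant on the shared IDs."""
    participant_by_id = {p["participant_id"]: p for p in participants}
    sample_by_id = {s["sample_id"]: s for s in samples}
    linked = []
    for m in measurements:
        if drop_below_loq and m["below_loq"]:
            continue  # skip values flagged as below the LoQ
        sample = sample_by_id[m["sample_id"]]
        participant = participant_by_id[sample["participant_id"]]
        # Merge the three records into one flat analysis row.
        linked.append({**participant, **sample, **m})
    return linked


linked_rows = link_tables(participants, samples, measurements)

# Group concentrations by diagnosis for one analyte/platform pair.
by_diagnosis = defaultdict(list)
for row in linked_rows:
    if row["analyte"] == "pTau217" and row["platform"] == "SIMOA":
        by_diagnosis[row["diagnosis"]].append(row["concentration_pg_ml"])
```

<p dir="ltr">In this sketch a measurement whose Sample ID or Participant ID has no match raises a <code>KeyError</code> rather than being silently dropped, so broken links surface early. Whether flagged values should be excluded, imputed, or retained depends on the analysis and is left as a parameter here.</p>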