Biomarker Analysis and What It Means for Diagnosis

July 22, 2024

What Is a Biomarker, Precisely?

The term "biomarker" is used broadly enough in oncology that it has nearly lost its precision. A biomarker is technically any measurable biological characteristic that serves as an indicator of a normal biological process, a pathological process, or a response to a therapeutic intervention. In cancer diagnostics, the relevant biomarkers fall into three main categories: protein biomarkers measurable in blood or tissue, imaging biomarkers derived computationally from radiological and pathological images, and genomic biomarkers identified through molecular profiling of tumor tissue or circulating DNA.

Each category has distinct clinical utilities, distinct limitations, and distinct integration requirements. The case for multi-modal diagnostic platforms is essentially the case that no single biomarker category is sufficient for reliable early cancer detection, and that their combination produces a qualitatively different - not merely incrementally better - diagnostic capability.

Protein Biomarkers: The Oldest Tools and Their Real Limitations

Serum protein biomarkers - CA-125 for ovarian cancer, CEA for colorectal cancer, PSA for prostate cancer, AFP for hepatocellular carcinoma, CA 19-9 for pancreatic cancer - are the most widely used and most widely misunderstood class of cancer biomarkers. They are easy to measure, inexpensive, and available in any clinical laboratory. They are also significantly less informative than their widespread use implies.

PSA is the canonical example of a widely used biomarker with significant clinical limitations. Prostate-specific antigen is not prostate-cancer-specific: it rises with benign prostatic hyperplasia, prostatitis, recent ejaculation, and prostate biopsy. The positive predictive value of a PSA above 4.0 ng/mL for detecting prostate cancer is approximately 25-30% - meaning that roughly three out of four men with a PSA above that threshold who undergo biopsy will not have cancer. The U.S. Preventive Services Task Force has gone through multiple cycles of revising its PSA screening recommendations precisely because of this specificity problem, most recently (in 2018) recommending shared decision-making for men aged 55-69 rather than routine screening.

CA-125 for ovarian cancer has an even more challenging specificity profile when used in the general population. It is elevated in endometriosis, uterine fibroids, peritonitis, and other benign conditions. The PLCO Cancer Screening Trial found that CA-125 combined with transvaginal ultrasound did not reduce ovarian cancer mortality in a general-population screening setting and was associated with a significant increase in surgical complications from false-positive workups. In high-risk populations with BRCA1/2 mutations, the clinical calculus is different - but the lesson is that protein biomarkers require careful context to be useful rather than harmful.

Pegasi incorporates serum protein biomarker trends - specifically, serial measurements over time rather than single values - as one input to its multi-modal diagnostic model. A CA 19-9 that doubles over three months carries different clinical significance than a stable elevated value. The trend is often more informative than the absolute level, and that temporal dimension is something a multi-modal model can capture in a way that a single threshold-based alert cannot.
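To make "a CA 19-9 that doubles over three months" concrete, here is a minimal sketch of estimating a marker's doubling time from serial draws. It assumes roughly exponential change between measurements and fits a least-squares line to log2(value) over elapsed days. The function name is hypothetical, and this is an illustration of the idea rather than Pegasi's trend model.

```python
import math
from datetime import date


def doubling_time_days(dates: list[date], values: list[float]) -> float:
    """Estimate a marker's doubling time in days from serial measurements.

    Fits log2(value) against elapsed days by ordinary least squares; the slope is
    doublings per day, so its reciprocal is the doubling time. Assumes roughly
    exponential change between draws.
    """
    if len(dates) != len(values) or len(dates) < 2:
        raise ValueError("need at least two paired measurements")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive to take logarithms")
    xs = [(d - dates[0]).days for d in dates]  # elapsed days since the first draw
    ys = [math.log2(v) for v in values]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("measurements must span more than one date")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # doublings per day
    if slope <= 0:
        return math.inf  # stable or falling: no finite doubling time
    return 1.0 / slope


draws = [date(2024, 1, 1), date(2024, 2, 15), date(2024, 3, 31)]
print(round(doubling_time_days(draws, [40.0, 56.6, 80.0])))  # rising marker: about 90 days
print(doubling_time_days(draws, [45.0, 45.0, 45.0]))  # stable elevated marker: inf
```

Both series include values above a typical cutoff, but only the rising one yields a finite doubling time, which is the distinction a single-threshold alert cannot make.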

Imaging Biomarkers: From Qualitative to Quantitative

Radiology has historically been a qualitative discipline. Radiologists describe findings using language like "irregular margin," "heterogeneous enhancement," and "ground-glass opacity" - terms that encode clinical judgment but resist quantification and aggregation. This is changing, driven by the field of radiomics: the computational extraction of quantitative features from medical images that can serve as measurable, reproducible biomarkers.

Radiomic features extracted from CT images of pulmonary nodules include shape features such as sphericity, compactness, and surface-area-to-volume ratio, and texture features such as energy, correlation, and homogeneity - all calculated from the voxel-level intensity data of the segmented nodule in the DICOM series. A 2022 meta-analysis of radiomic studies in lung nodule characterization found that radiomic models outperformed conventional radiologist interpretation for malignancy prediction with an average AUC of 0.84 versus 0.76 for experienced thoracic radiologists reading the same cases. Radiomics does not replace radiologist judgment, but it adds a quantitative layer that is reproducible across readers and aggregatable across time points in a way that narrative reports are not.
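For readers who have not seen these features computed, the sketch below shows one shape feature (sphericity, from a segmented volume and surface area) and three texture features derived from a gray-level co-occurrence matrix (GLCM). It is a simplified 2D, single-offset illustration using textbook definitions; real radiomics toolkits work in 3D, aggregate across directions, and differ in conventions (some report energy as the square root of the value computed here). The function names are hypothetical and this is not Pegasi's feature set.

```python
import math

import numpy as np


def sphericity(volume: float, surface_area: float) -> float:
    """Shape feature: pi^(1/3) * (6V)^(2/3) / A, which equals 1.0 for a perfect sphere."""
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / surface_area


def glcm(image: np.ndarray, levels: int, dx: int = 1, dy: int = 0) -> np.ndarray:
    """Symmetric, normalized gray-level co-occurrence matrix for one non-negative offset."""
    counts = np.zeros((levels, levels), dtype=float)
    rows, cols = image.shape
    for r in range(rows - dy):
        for c in range(cols - dx):
            i, j = image[r, c], image[r + dy, c + dx]
            counts[i, j] += 1
            counts[j, i] += 1  # count each neighbor pair in both directions
    return counts / counts.sum()


def texture_features(p: np.ndarray) -> dict[str, float]:
    """Energy (angular second moment), homogeneity, and correlation from a normalized GLCM."""
    i, j = np.indices(p.shape)
    energy = float(np.sum(p ** 2))
    homogeneity = float(np.sum(p / (1.0 + (i - j) ** 2)))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    if sd_i == 0 or sd_j == 0:
        correlation = 1.0  # convention for a uniform region, where correlation is undefined
    else:
        correlation = float(np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j))
    return {"energy": energy, "homogeneity": homogeneity, "correlation": correlation}


# Toy 4x4 patch already quantized to 4 gray levels (real pipelines bin intensities first).
patch = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 2, 2, 2],
                  [2, 2, 3, 3]])
print(texture_features(glcm(patch, levels=4)))
r = 5.0
print(sphericity(4 / 3 * math.pi * r ** 3, 4 * math.pi * r ** 2))  # ~1.0 for a sphere
```

Because every step is a fixed formula applied to the pixel grid, two runs on the same segmentation give identical numbers, which is the reproducibility property the paragraph describes; the segmentation itself remains a source of variation.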

Pegasi's computer vision layer calculates a defined set of radiomic features from CT, PET/CT, and MRI DICOM files received through the FHIR imaging pipeline. These features feed into the multi-modal model as structured numerical inputs alongside genomic variants and clinical laboratory values. The radiologist's narrative report is also processed through our NLP pipeline to extract structured clinical impressions, but the radiomic features provide a quantitative biomarker layer that is independent of individual radiologist reporting style and available in real time without waiting for the formal radiology read.
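One common way to feed heterogeneous inputs to a single model is early fusion: flatten each modality into a fixed-order numeric vector. The sketch below assumes a hypothetical schema (three radiomic features, three example genes, two standardized labs with made-up normalization constants) and encodes missing measurements with an explicit indicator rather than a silent default. It illustrates the general pattern, not Pegasi's actual input schema.

```python
from dataclasses import dataclass, field

# Hypothetical feature schema; a fixed order keeps every patient's vector aligned.
RADIOMIC_KEYS = ("sphericity", "glcm_energy", "glcm_homogeneity")
VARIANT_GENES = ("KRAS", "TP53", "EGFR")
# Hypothetical (mean, sd) pairs for standardizing labs; not clinical reference values.
LAB_STATS = {"CEA": (2.5, 1.5), "CA19_9": (15.0, 10.0)}


@dataclass
class PatientInputs:
    radiomics: dict[str, float] = field(default_factory=dict)
    variants: set[str] = field(default_factory=set)
    labs: dict[str, float] = field(default_factory=dict)


def to_feature_vector(p: PatientInputs) -> list[float]:
    """Flatten one patient's multi-modal inputs into a fixed-length numeric vector."""
    vec: list[float] = []
    for key in RADIOMIC_KEYS:
        value = p.radiomics.get(key)
        # Encode "not measured" explicitly (value 0.0, missing flag 1.0).
        vec += [0.0, 1.0] if value is None else [float(value), 0.0]
    vec += [1.0 if gene in p.variants else 0.0 for gene in VARIANT_GENES]
    for name, (mean, sd) in LAB_STATS.items():
        value = p.labs.get(name)
        vec += [0.0, 1.0] if value is None else [(value - mean) / sd, 0.0]
    return vec


patient = PatientInputs(radiomics={"sphericity": 0.8}, variants={"KRAS"}, labs={"CEA": 5.5})
print(to_feature_vector(patient))
```

The missing-value flags matter in practice: a patient with no imaging yet should not look identical to a patient whose imaging features happen to be zero.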

Circulating Tumor DNA: The Most Promising New Biomarker Class

Circulating tumor DNA (ctDNA) - fragments of DNA shed by tumor cells into the bloodstream - has emerged as the most promising new biomarker class in cancer diagnostics over the past decade. The clinical utility of ctDNA spans detection, staging, treatment monitoring, and recurrence surveillance, and its potential to enable blood-based early cancer detection has attracted substantial investment from major diagnostic laboratories.

The biology is straightforward: tumor cells die at a higher rate than normal cells, and when they die, they release DNA fragments into circulation. These fragments can be detected in a blood sample and distinguished from normal cell-free DNA by their characteristic mutation patterns (which match the tumor's somatic mutation profile) and by aberrant methylation patterns (which reflect the epigenetic signature of the tumor cell type). Ultra-sensitive methods such as digital PCR and error-corrected next-generation sequencing can detect ctDNA at variant allele frequencies as low as 0.01% - roughly one tumor-derived fragment among 10,000 copies of the same genomic locus from normal cells.
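A detection limit of 0.01% also depends on how much DNA is in the tube: if the sample contains too few copies of the locus, there may be no mutant fragment to find. A minimal sketch, assuming the number of mutant fragments is Poisson-distributed with mean (genome equivalents × variant allele frequency) and ignoring extraction losses, assay efficiency, and sequencing error; the genome-equivalent counts below are illustrative, not assay specifications.

```python
import math


def detection_probability(genome_equivalents: float, vaf: float, min_molecules: int = 1) -> float:
    """Probability that a sample contains at least `min_molecules` mutant fragments.

    Models the number of mutant fragments as Poisson with mean genome_equivalents * vaf,
    ignoring extraction losses, assay efficiency, and sequencing error.
    """
    lam = genome_equivalents * vaf  # expected number of mutant fragments
    p_fewer = sum(math.exp(-lam) * lam ** k / math.factorial(k) for k in range(min_molecules))
    return 1.0 - p_fewer


for ge in (2_000, 10_000, 50_000):
    print(ge, round(detection_probability(ge, vaf=0.0001), 3))
```

Under these assumptions, 10,000 genome equivalents at 0.01% allele frequency give an expected count of one mutant fragment and only about a 63% chance of sampling any at all, which is why input quantity bounds sensitivity as much as the chemistry does.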

The challenge is clinical translation. Current FDA-approved ctDNA assays are indicated primarily for advanced cancer, where they identify actionable and resistance mutations to guide targeted therapy; ctDNA-based response monitoring and recurrence surveillance after curative-intent treatment are largely offered as laboratory-developed tests. For early detection in the general population, the positive predictive value challenges are significant: a ctDNA signal detected by a pan-cancer screening test like Galleri requires follow-up imaging to localize the tumor, and that localization step is not yet reliable enough to avoid unnecessary invasive procedures in a meaningful proportion of cases.
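The positive predictive value problem is largely a prevalence problem, and Bayes' theorem shows why. The sketch below uses hypothetical test characteristics (50% sensitivity, 99.5% specificity), chosen only to illustrate the arithmetic; they are not the published performance of Galleri or any other assay.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(disease | positive test) from Bayes' theorem."""
    for name, x in (("sensitivity", sensitivity), ("specificity", specificity), ("prevalence", prevalence)):
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"{name} must be in [0, 1]")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive results are expected with these inputs")
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, chosen only to show the effect of prevalence.
for prev in (0.01, 0.001):
    print(prev, round(positive_predictive_value(0.5, 0.995, prev), 3))
```

With identical test performance, PPV falls from about 50% at 1% prevalence to about 9% at 0.1% prevalence: in a low-prevalence population, most positive results are false positives even when specificity looks excellent.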

In the context of multi-modal biomarker integration, ctDNA's value is highest when it is one signal among several rather than a standalone screening test. A patient with a ctDNA signal consistent with colorectal origin, combined with an elevated CEA trend and a CT finding of a suspicious 1.2 cm ascending colon lesion, presents a much more interpretable clinical picture than a ctDNA signal alone. That integration is exactly what Pegasi's multi-modal architecture is designed to perform.
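One simple way to see why several modest signals can add up to an interpretable picture is to combine likelihood ratios in log-odds space. The sketch below is a naive-Bayes-style illustration: it assumes the signals are conditionally independent given disease status, which real biomarkers often violate (ctDNA level and CEA can both track tumor burden). The prior and likelihood ratios are hypothetical, and this is not Pegasi's fusion model.

```python
import math


def combine_likelihood_ratios(prior: float, likelihood_ratios: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Update a prior probability with several likelihood ratios in log-odds space.

    Assumes the signals are conditionally independent given disease status (a naive-Bayes
    simplification). Returns the posterior probability and each signal's log-odds contribution.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    contributions = {name: math.log(lr) for name, lr in likelihood_ratios.items()}
    log_odds = math.log(prior / (1.0 - prior)) + sum(contributions.values())
    return 1.0 / (1.0 + math.exp(-log_odds)), contributions


# Hypothetical prior and likelihood ratios, for illustration only.
prior = 0.01
signals = {"ctDNA_colorectal_origin": 10.0, "CEA_rising_trend": 3.0, "CT_suspicious_lesion": 5.0}
alone, _ = combine_likelihood_ratios(prior, {"ctDNA_colorectal_origin": 10.0})
combined, parts = combine_likelihood_ratios(prior, signals)
print(f"ctDNA alone: {alone:.3f}, all three signals: {combined:.3f}")
for name, weight in sorted(parts.items(), key=lambda kv: -kv[1]):
    print(f"  {name}: {weight:+.2f} log-odds")
```

With these made-up numbers, the ctDNA signal alone moves a 1% prior to about 9%, while the three concordant signals together reach about 60%. Returning per-signal contributions also makes it possible to show a clinician which inputs drove the result.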

The Emerging Science of Epigenetic Biomarkers

The next generation of cancer biomarkers extends beyond the DNA sequence itself to the epigenetic marks - primarily DNA methylation - that control gene expression. Cancer cells exhibit characteristic methylation pattern changes: hypermethylation of tumor suppressor gene promoters (silencing protective genes) and global hypomethylation of repetitive elements. These epigenetic signatures are both cancer-type-specific and cancer-stage-informative in ways that sequence-based genomic biomarkers often are not.
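Promoter hypermethylation is usually quantified per CpG site as a beta value: the methylated signal divided by the total signal, ranging from 0 (unmethylated) to 1 (fully methylated). The sketch below computes beta values and averages them across a promoter's CpGs. The offset of 100 follows the convention commonly used for array intensities (use 0 for bisulfite-sequencing read counts). The 0.3 cutoff and the example signals are hypothetical, not a validated clinical threshold.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Fraction methylated at one CpG: M / (M + U + offset).

    An offset of 100 is the convention commonly used for array intensities; for
    bisulfite sequencing read counts, offset=0 gives methylated reads / total reads.
    """
    m, u = max(methylated, 0.0), max(unmethylated, 0.0)
    denominator = m + u + offset
    if denominator == 0:
        raise ValueError("no signal at this CpG")
    return m / denominator


def promoter_hypermethylated(cpg_signals: list[tuple[float, float]], threshold: float = 0.3) -> tuple[float, bool]:
    """Average beta across a promoter's CpGs and compare against a hypothetical cutoff."""
    if not cpg_signals:
        raise ValueError("need at least one CpG measurement")
    betas = [beta_value(m, u) for m, u in cpg_signals]
    mean_beta = sum(betas) / len(betas)
    return mean_beta, mean_beta >= threshold


tumor_like = [(900, 100), (800, 200), (950, 50)]  # hypothetical (methylated, unmethylated) signals
normal_like = [(50, 950), (80, 920), (30, 970)]
print(promoter_hypermethylated(tumor_like))
print(promoter_hypermethylated(normal_like))
```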

The Cologuard colorectal cancer screening test, FDA-approved for average-risk adults aged 45 and older, combines DNA mutation detection and methylation biomarker analysis (specifically, NDRG4 and BMP3 promoter methylation) with a fecal hemoglobin immunoassay to achieve higher sensitivity than fecal immunochemical testing alone. The 2014 New England Journal of Medicine validation study reported 92.3% sensitivity for colorectal cancer compared with 73.8% for FIT (fecal immunochemical testing), at the cost of lower specificity (86.6% versus 94.9%).
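The sensitivity-specificity trade-off of combining tests can be shown with the simplest possible rules. The sketch below assumes the two component tests are conditionally independent given disease status and uses hypothetical performance figures. Cologuard itself combines its markers into a single composite score rather than using a simple either-positive rule, so this illustrates the general principle, not that test's algorithm.

```python
def combine_two_tests(sens_a: float, spec_a: float, sens_b: float, spec_b: float,
                      rule: str = "or") -> tuple[float, float]:
    """Sensitivity and specificity of a two-test panel, assuming conditional independence.

    rule="or": positive if either test is positive (gains sensitivity, loses specificity).
    rule="and": positive only if both are positive (the reverse trade-off).
    """
    if rule == "or":
        return 1 - (1 - sens_a) * (1 - sens_b), spec_a * spec_b
    if rule == "and":
        return sens_a * sens_b, 1 - (1 - spec_a) * (1 - spec_b)
    raise ValueError("rule must be 'or' or 'and'")


# Hypothetical component tests, not Cologuard's actual markers or performance.
print(combine_two_tests(0.70, 0.95, 0.60, 0.90, rule="or"))
print(combine_two_tests(0.70, 0.95, 0.60, 0.90, rule="and"))
```

The "or" panel reaches 88% sensitivity but drops to 85.5% specificity, the same direction of trade-off seen in the Cologuard versus FIT comparison.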

At Pegasi, we are currently in early-stage development of an epigenetic biomarker module that will add methylation pattern analysis to our multi-modal fusion framework. The technical challenge is that DNA methylation assays are not yet widely available as standard FHIR-accessible laboratory results - most epigenetic profiling still occurs in specialized research laboratories. As these assays become more accessible and as clinical ordering patterns evolve, incorporating epigenetic biomarkers into the Pegasi model is a natural and clinically meaningful extension. As we discuss in our article on how AI is changing early cancer detection, the biomarker frontier is expanding rapidly and the value of integrative platforms increases with each new data type.

What Clinicians Actually Need from Biomarker Reports

The literature on clinical biomarker interpretation is consistent on one point: single-value thresholds are too often used as binary pass-fail indicators when they should be interpreted as probability inputs into a broader clinical picture. A CA 19-9 of 38 U/mL (reference range: less than 37 U/mL) is not a positive pancreatic cancer test. It is a marginal elevation that should be trended, contextually interpreted, and combined with clinical history and imaging findings before driving any clinical decision.
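Laboratory medicine has a standard tool for deciding whether a change between two serial results is larger than measurement noise plus normal biological fluctuation: the reference change value (RCV), computed from the analytical and within-subject coefficients of variation. The sketch below uses the common symmetric form with hypothetical CVs of 5% and 10%; these are not published values for CA 19-9, and log-normal (asymmetric) RCV variants also exist.

```python
import math
from statistics import NormalDist


def reference_change_value(cv_analytical: float, cv_within_subject: float, confidence: float = 0.95) -> float:
    """Smallest percent change between two serial results unlikely to be noise alone.

    RCV = sqrt(2) * z * sqrt(CV_a^2 + CV_i^2), with CVs in percent and a two-sided z.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.sqrt(2) * z * math.sqrt(cv_analytical ** 2 + cv_within_subject ** 2)


def exceeds_rcv(previous: float, current: float, rcv_percent: float) -> tuple[float, bool]:
    """Percent change between two results and whether it exceeds the RCV."""
    pct_change = 100.0 * (current - previous) / previous
    return pct_change, abs(pct_change) > rcv_percent


# Hypothetical CVs, not published values for CA 19-9.
rcv = reference_change_value(cv_analytical=5.0, cv_within_subject=10.0)
print(f"RCV: {rcv:.1f}%")
print(exceeds_rcv(38.0, 45.0, rcv))  # an ~18% rise stays within expected variation here
```

Under these assumptions, a move from 38 to 45 U/mL is not distinguishable from noise, even though it pushes the value further above the reference limit. Only a larger or sustained rise would justify escalation.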

Pegasi's interface is designed to reinforce this interpretive discipline. Biomarker values are displayed as trend lines over time, not as single-point values with a green/red flag. The alert context includes the specific combination of biomarker signals that contributed to the alert threshold, and the confidence score communicates the uncertainty inherent in multi-input probabilistic models. The goal is to give clinicians more information, presented in a way that supports sound interpretation rather than shortcutting it. A diagnostic AI that trains clinicians to trust thresholds instead of thinking is ultimately making oncology worse. We are trying to build something that makes it better.
