Understanding Multi-Modal Data Fusion in Oncology

Why combining genomic, imaging, and clinical data produces diagnostic accuracy no single source can match.


Why Single-Source Diagnostics Leave Too Much on the Table

The traditional model of cancer diagnosis is sequential and siloed. A radiologist reads the CT scan. A pathologist reads the biopsy. A molecular oncologist interprets the gene panel. Each specialist works from their own data type, in their own system, on their own schedule. The synthesis happens in a tumor board meeting - if the patient's institution has one, if there is time, and if all the reports have come back before the meeting convenes.

This model has two structural problems. First, it is slow. The median time from suspicious imaging finding to confirmed diagnosis for colorectal cancer in the United States is 21 days. For pancreatic cancer, where the progression window is measured in weeks, that delay is clinically catastrophic. Second, single-modality signals are inherently incomplete. A KRAS mutation alone does not tell you stage. A CT nodule measurement alone does not tell you molecular subtype. Each data type answers a different question, and the diagnostic picture requires all of the answers simultaneously.

Multi-modal data fusion is the approach that addresses both problems. By ingesting heterogeneous data streams into a unified representational model and analyzing them concurrently, fusion architectures produce diagnostic outputs that are both faster and more information-dense than sequential specialist review.

The Three Data Modalities That Matter Most

In Pegasi's current clinical deployment, three data modalities form the core of our fusion architecture. Each contributes a distinct class of signal that the other two cannot replicate.

Genomic data - specifically, variant calls from targeted NGS panels and whole-exome sequencing - provides information about tumor biology that is invisible to imaging. KRAS G12C mutations predict response to specific targeted therapies. MSI-H status (microsatellite instability-high) predicts response to immune checkpoint inhibitors. BRCA1/2 germline variants define hereditary risk that affects screening recommendations for first-degree relatives. These signals do not produce symptoms. They do not appear on a CT scan. Without genomic integration, they simply are not part of the diagnostic equation until a tissue biopsy is ordered.

Quantitative imaging biomarkers - extracted computationally from DICOM files rather than read qualitatively by a radiologist - add the spatial and morphological dimension that genomic data lacks. Our computer vision layer calculates standardized uptake values from PET scans, tracks volumetric nodule growth rates across serial CT studies, and quantifies texture features in T2-weighted MRI sequences associated with high-grade glioma. A 12% increase in SUV over 90 days means something specific about tumor metabolic activity. Expressing that as a quantitative feature rather than a radiologist impression allows it to be combined with genomic and clinical signals in a mathematically coherent way.
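A minimal sketch of what "expressing that as a quantitative feature" can look like in practice. The function names, the 90-day normalization, and the example values are illustrative assumptions, not Pegasi's actual pipeline:

```python
# Hypothetical sketch: turning serial imaging measurements into
# quantitative features rather than free-text impressions.
# Names, normalization window, and values are illustrative only.

def pct_change(baseline: float, followup: float) -> float:
    """Percent change between two serial measurements."""
    return (followup - baseline) / baseline * 100.0

def suv_growth_feature(suv_baseline: float, suv_followup: float,
                       days_between: int) -> dict:
    """Normalize an SUV change to a 90-day window so studies with
    different follow-up intervals stay comparable."""
    raw = pct_change(suv_baseline, suv_followup)
    per_90_days = raw * (90.0 / days_between)
    return {"suv_pct_change_raw": raw,
            "suv_pct_change_per_90d": per_90_days}

# Roughly the "12% increase in SUV over 90 days" from the text.
feature = suv_growth_feature(4.1, 4.59, days_between=90)
```

Once the measurement is a numeric feature, it can be concatenated with genomic and clinical signals instead of living only in a narrative radiology report.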

Structured clinical data - demographics, laboratory values, symptom histories, medication records, prior procedure codes - provides the contextual layer that makes the other signals interpretable. An elevated CA-125 in a 28-year-old woman with no family history of ovarian cancer has a different clinical meaning than the same value in a 54-year-old woman with a first-degree relative who died of the disease. The model needs that context to set appropriate alarm thresholds.
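The three streams described above can be combined in many ways; a late-fusion sketch is the simplest to show. The features, weights, and linear scorer below are invented for illustration and stand in for a real fusion model:

```python
# Illustrative late fusion: each modality is reduced to a fixed-length
# feature vector, concatenated, and scored together. Feature meanings
# and weights are made up; a toy linear scorer stands in for the model.

def fuse(genomic: list, imaging: list, clinical: list) -> list:
    """Concatenate per-modality feature vectors into one input."""
    return genomic + imaging + clinical

def risk_score(features: list, weights: list) -> float:
    """Toy linear scorer standing in for the fusion model."""
    return sum(f * w for f, w in zip(features, weights))

x = fuse([1.0, 0.0],      # e.g. KRAS G12C present, MSI-H absent
         [0.12],          # e.g. 12% SUV increase over 90 days
         [0.54, 1.0])     # e.g. scaled age, family-history flag
score = risk_score(x, [0.8, 0.6, 1.5, 0.2, 0.7])
```

The point of the sketch is the shape of the computation: all three modalities contribute to one score in a single pass, rather than being interpreted sequentially by separate specialists.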

How Fusion Models Outperform Unimodal Approaches

The empirical case for multi-modal fusion over single-modality approaches is well-established in the literature, though most published studies work with smaller, curated datasets rather than the messy, incomplete data that characterizes real clinical environments. A 2023 meta-analysis published in Nature Medicine reviewed 47 studies comparing unimodal versus multi-modal diagnostic AI in oncology. Across cancer types, multi-modal approaches produced an average AUC improvement of 0.11 over the best-performing single-modality model in the same study cohort.

In Pegasi's own validation dataset - 180,000 patient cases drawn from our partner health systems over four years - the performance gap is consistent but context-dependent. For early-stage colorectal cancer, our three-modality fusion model achieves 94.2% sensitivity at 91.7% specificity. The genomic-only model using the same training data achieves 71.3% sensitivity. The imaging-only model achieves 68.9%. Neither approaches the performance of the combined system, and the fusion model's advantage is most pronounced precisely in the cases that matter most: Stage I presentations where single-modality signals are weakest.
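For readers less familiar with the metrics, sensitivity and specificity come directly from a validation confusion matrix. The counts below are invented to reproduce the percentages quoted above; they are not Pegasi's actual case counts:

```python
# How sensitivity/specificity figures are computed from a validation
# confusion matrix. Counts are illustrative, chosen to match the
# 94.2% / 91.7% figures in the text; not real validation data.

def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual cancers that were flagged."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of healthy cases left unflagged."""
    return tn / (tn + fp)

sens = sensitivity(tp=942, fn=58)   # 94.2% of cancers flagged
spec = specificity(tn=917, fp=83)   # 91.7% of non-cancers cleared
```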

The intuition is straightforward: cancers at their earliest, most treatable stages have the subtlest individual signatures. Early-stage tumor cells have not yet produced large enough masses to be reliably visible on standard imaging. They may not yet have accumulated the mutational burden that makes genomic signals unambiguous. Clinical symptoms are absent by definition. Only by aggregating weak signals across multiple data types - each independently insufficient - can the fusion model achieve reliable early detection.

The Engineering Challenge: Handling Missing Modalities

Fusion architectures work beautifully in research settings where all three data streams are complete and clean. Real clinical environments are neither. Approximately 23% of the patient records ingested by Pegasi across our partner network are missing at least one modality - typically because a genomic panel was not ordered, because imaging was performed at a facility that has not yet connected to the FHIR pipeline, or because the patient's history predates the period for which structured data is available.

A fusion model that simply fails when a modality is absent is clinically unusable. Our architecture handles missing modalities through a masking strategy during training: the model is deliberately trained on datasets with random modality dropout, so it learns to perform inference from any available subset of the three streams. A patient with no genomic data still receives a fusion score based on imaging and clinical signals. The confidence interval on that score is wider, and the system communicates that uncertainty explicitly in the alert, but a result is produced and no case falls through the cracks purely because a lab order was not placed.
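The masking idea can be sketched in a few lines. This is an illustrative mechanism under assumed data shapes (one feature list per modality, `None` marking absence), not Pegasi's training code:

```python
# Sketch of random modality dropout during training: each example has
# a random subset of modalities masked out, so the model learns to
# score any available subset. Illustrative only; data shapes assumed.
import random

MODALITIES = ("genomic", "imaging", "clinical")

def apply_dropout(example: dict, p_drop: float, rng: random.Random) -> dict:
    """Return a copy with each modality independently dropped with
    probability p_drop, always keeping at least one modality present."""
    masked = dict(example)
    present = list(MODALITIES)
    for m in MODALITIES:
        if len(present) > 1 and rng.random() < p_drop:
            masked[m] = None          # None marks "modality missing"
            present.remove(m)
    return masked

rng = random.Random(0)
batch = [apply_dropout({"genomic": [1.0], "imaging": [0.2], "clinical": [0.5]},
                       p_drop=0.3, rng=rng) for _ in range(100)]
```

Because the model sees masked examples throughout training, inference on a patient who genuinely lacks a genomic panel looks no different from a training case with that modality dropped.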

This design choice has a meaningful clinical implication. It means Pegasi generates actionable output for patients who have not yet had a genomic panel - and the alert itself often serves as the trigger for the oncologist to order one. The platform becomes not just a diagnostic support tool but a care gap identification tool, surfacing the patients whose workup is incomplete in ways that matter.

Federated Learning and Why Your Data Does Not Leave Your Network

One of the most common concerns health system administrators raise when evaluating multi-modal fusion platforms is data sovereignty. Sending patient genomic and imaging data to a cloud model raises legitimate HIPAA questions, institutional policy questions, and patient trust questions. The standard response from most AI vendors - "we anonymize everything and store it securely" - is not satisfying to legal and compliance teams who have read the literature on re-identification risk in genomic datasets.

Pegasi's architecture addresses this through federated learning. Model updates are computed locally within your institution's data environment, and only the gradient updates - mathematical abstractions that contain no patient-identifiable information - are transmitted to the central model. Your genomic data, imaging data, and clinical records never leave your network. The central model improves based on what your data teaches it without ever having access to the data itself.
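A minimal federated-averaging sketch makes the data-flow concrete: the raw records stay inside each site's environment, and only per-site gradients reach the central server. The toy one-feature linear model and the numbers below are invented for illustration:

```python
# Minimal federated-averaging sketch. Each site computes a gradient
# locally on its own data; only gradients cross the network, and the
# central server averages them. Model and data are toy values.

def local_gradient(weights: list, site_data: list) -> list:
    """Mean squared-error gradient for a 1-feature linear model,
    computed entirely inside the site's own environment."""
    w, b = weights
    grad_w = grad_b = 0.0
    for x, y in site_data:
        err = (w * x + b) - y
        grad_w += 2 * err * x / len(site_data)
        grad_b += 2 * err / len(site_data)
    return [grad_w, grad_b]

def federated_step(weights: list, site_gradients: list, lr: float) -> list:
    """Central server: average per-site gradients and update the
    shared model. Raw patient data never appears here."""
    avg = [sum(g[i] for g in site_gradients) / len(site_gradients)
           for i in range(len(weights))]
    return [wt - lr * g for wt, g in zip(weights, avg)]

w = [0.0, 0.0]
site_a = [(1.0, 2.0), (2.0, 4.0)]   # each site's data stays local
site_b = [(3.0, 6.0)]
grads = [local_gradient(w, s) for s in (site_a, site_b)]
w = federated_step(w, grads, lr=0.05)
```

Note what `federated_step` receives: two short gradient vectors, not the patient-level records that produced them.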

This is not a theoretical privacy protection. It is a technical one. Properly aggregated gradient updates cannot practically be inverted to reconstruct individual patient records. The architecture is the same one used in large-scale privacy-preserving model training research at MIT, Stanford, and several national health research consortia. When Pegasi improves its models based on outcomes from your patient population, your patients' data is the teacher, not the asset. That distinction matters both for compliance and for the trust relationship between your institution and the patients it serves. As we explain in our article on patient data security in healthcare AI platforms, federated learning is increasingly the standard for responsible health AI deployment.

What Oncologists Actually See

Multi-modal fusion produces rich intermediate outputs - attention weights, confidence intervals, feature importance scores, cross-modality correlation matrices. These are useful for model validation and research. They are not what a clinical oncologist needs at 7:30 AM when reviewing the overnight alert queue.

What clinicians see in the Pegasi interface is a one-page summary: patient identifier, alert priority (high/medium/low), the top three converging signals that drove the alert, and a single recommended next step with the supporting evidence cited. The full model output is available for any clinician who wants to drill into it. Most do not, at least not in the initial alert review. What they need is enough information to decide whether to act and what action to take. The art of building a clinically useful fusion model is not just in the algorithm - it is in deciding what to surface and what to leave in the underlying engine.
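The surfacing step itself is simple to sketch: reduce the model's full feature-importance output to the few strongest signals for the one-page summary. Field names and importance values below are illustrative assumptions, not Pegasi's schema:

```python
# Sketch of the surfacing decision: keep only the top-k converging
# signals from the model's feature-importance output for the alert
# summary. Names and values are invented for illustration.

def top_signals(importances: dict, k: int = 3) -> list:
    """Pick the k features with the largest absolute importance."""
    ranked = sorted(importances.items(), key=lambda kv: abs(kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

alert = {
    "priority": "high",
    "signals": top_signals({
        "KRAS_G12C": 0.41,
        "suv_pct_change_90d": 0.33,
        "ca125_elevated": 0.28,
        "age_scaled": 0.05,
    }),
}
```

Everything below the cut (attention weights, correlation matrices) stays in the drill-down view rather than the alert itself.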
