The diagnostic window is expanding - and machine learning is the reason why.
For most of modern medicine, cancer diagnosis has been reactive. A patient develops a persistent cough, notices unexplained weight loss, or feels a lump. They schedule an appointment. They wait. By the time a tissue biopsy confirms the diagnosis, many cancers have progressed to Stage III or Stage IV - the stages where five-year survival rates drop dramatically.
In colorectal cancer, the five-year survival rate for Stage I is 92%. For Stage IV, that number falls to 14%. The difference is not in treatment modality - it is almost entirely in detection timing. This arithmetic is brutal and well-documented. What has changed in the past five years is that we now have the computational infrastructure to act on it.
AI-driven early detection is not a speculative technology. It is in clinical deployment today, and the data from early adopters is beginning to reshape how oncologists think about the diagnostic window. The question is no longer whether machine learning can improve early detection. The question is how fast health systems can integrate it without disrupting the workflows that clinicians already depend on.
The term "AI" gets applied broadly enough that it has become nearly meaningless in healthcare marketing. When we describe Pegasi's approach to early cancer detection, we are specifically talking about supervised multi-modal classification models trained on labeled clinical outcome data - not chatbots, not general-purpose large language models, not rule-based alert systems dressed up in AI language.
Our core models ingest three data streams simultaneously: genomic variant calls from next-generation sequencing panels, quantitative imaging biomarkers extracted from DICOM files by our computer vision layer, and structured clinical data pulled via HL7 FHIR from your EHR. Each stream contributes signal that the others cannot provide. A genomic variant that looks incidental in isolation becomes actionable when combined with 0.4 cm of nodule growth and a carcinoembryonic antigen level that has trended upward over 18 months.
The model does not diagnose cancer. It surfaces a ranked alert: "This patient has a 78% probability of meeting Stage I colorectal cancer criteria based on these four converging signals. Recommended next step: colonoscopy." The clinical judgment stays with the oncologist. What changes is the completeness and timeliness of the information they receive before making that judgment.
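To make the shape of that pipeline concrete, here is a minimal sketch of late fusion across the three streams and the alert it surfaces. Everything in it - the feature names, the hand-set weights, the 0.70 threshold - is an illustrative assumption, not Pegasi's production model, which learns its parameters from labeled outcome data.

```python
import math
from dataclasses import dataclass

@dataclass
class PatientFeatures:
    """One illustrative feature per stream (names are hypothetical)."""
    pathogenic_variant_count: int  # genomic stream: NGS panel variant calls
    nodule_growth_cm: float        # imaging stream: CV-extracted biomarker
    cea_slope: float               # clinical stream: CEA trend, ng/mL per month

def risk_score(f: PatientFeatures) -> float:
    """Toy late-fusion score: a weighted sum over all three streams,
    squashed through a logistic. Real weights are learned, not hand-set."""
    z = (0.9 * f.pathogenic_variant_count
         + 2.5 * f.nodule_growth_cm
         + 1.8 * f.cea_slope
         - 1.0)  # bias term
    return 1.0 / (1.0 + math.exp(-z))

@dataclass
class Alert:
    probability: float
    signals: list[str]
    recommended_next_step: str

def maybe_alert(f: PatientFeatures, threshold: float = 0.70) -> Alert | None:
    """Surface a ranked alert only when the fused score clears the threshold."""
    p = risk_score(f)
    if p < threshold:
        return None
    signals = []
    if f.pathogenic_variant_count > 0:
        signals.append("pathogenic variant on NGS panel")
    if f.nodule_growth_cm >= 0.3:
        signals.append(f"{f.nodule_growth_cm} cm nodule growth")
    if f.cea_slope > 0:
        signals.append("rising CEA trend")
    return Alert(round(p, 2), signals, "colonoscopy")

# The variant alone would not clear the threshold; combined with the
# imaging and CEA signals, it does - that is the point of fusion.
print(maybe_alert(PatientFeatures(1, 0.4, 0.15)))
```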
Early detection algorithms have a well-known failure mode: false positives. Alert systems that flag too many cases create alert fatigue - the condition where clinicians start dismissing notifications because the signal-to-noise ratio is too low to justify the cognitive overhead of review. Multiple studies have linked alert fatigue to adverse patient outcomes. A detection tool that clinicians stop trusting is worse than no detection tool at all.
Pegasi's production deployment at Houston Methodist maintains a false positive rate of 8.3% at the alert level - meaning that for every 12 alerts the platform generates, 11 warrant clinical follow-through. We achieve this through a specificity calibration layer that adjusts alert thresholds dynamically based on the patient's complete clinical context, not just the signals that exceeded a static cutoff.
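The idea behind that calibration layer can be sketched in a few lines. The context fields and adjustments below are invented for illustration - in production such a layer would be learned rather than hand-tuned - but they show the difference between a static cutoff and one that moves with the patient's pre-test probability.

```python
from dataclasses import dataclass

@dataclass
class ClinicalContext:
    """Hypothetical context fields that shift the alerting threshold."""
    age: int
    family_history: bool        # first-degree relative with colorectal cancer
    prior_benign_workups: int   # earlier alerts resolved as benign

BASE_THRESHOLD = 0.70

def calibrated_threshold(ctx: ClinicalContext) -> float:
    """A static cutoff fires on every score excursion; a calibrated one
    demands more evidence from low-prior patients and less from high-prior."""
    t = BASE_THRESHOLD
    if ctx.family_history:
        t -= 0.10                         # higher prior risk: alert earlier
    if ctx.age < 45:
        t += 0.10                         # low-incidence group: raise the bar
    t += 0.03 * ctx.prior_benign_workups  # repeated benign findings raise it too
    return min(max(t, 0.50), 0.90)

def should_alert(score: float, ctx: ClinicalContext) -> bool:
    return score >= calibrated_threshold(ctx)

# The same 0.72 score alerts for a 58-year-old with family history
# (threshold 0.60) but not for a 40-year-old with two benign workups (0.86).
print(should_alert(0.72, ClinicalContext(58, True, 0)))   # True
print(should_alert(0.72, ClinicalContext(40, False, 2)))  # False
```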
Compare that 8.3% figure to the industry average for standalone rule-based EHR alerts, which typically run 40-60% false positive rates in oncology contexts. The difference is not marginal. At a health system seeing 2,000 oncology patients per month, the gap between 8% and 50% false positives translates to hundreds of unnecessary follow-up procedures and the physician time consumed reviewing them.
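The back-of-the-envelope version, assuming purely for illustration that alerts fire on about 5% of patients:

```python
patients_per_month = 2_000
assumed_alert_rate = 0.05  # illustrative assumption, not a published figure
alerts_per_month = patients_per_month * assumed_alert_rate  # 100 alerts

for fp_rate in (0.083, 0.50):
    false_alerts_per_year = alerts_per_month * fp_rate * 12
    print(f"{fp_rate:.1%} false positives -> "
          f"~{false_alerts_per_year:.0f} unnecessary follow-ups per year")
# 8.3%  -> ~100 per year
# 50.0% -> ~600 per year: a gap of roughly 500 procedures annually
```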
One of the most promising developments in early cancer detection is the maturation of circulating tumor DNA (ctDNA) assays - what the industry now commonly calls liquid biopsy. Unlike traditional tissue biopsy, ctDNA analysis requires only a standard blood draw and can detect tumor-derived genetic material months or even years before a lesion would appear on imaging.
GRAIL's Galleri test, currently in clinical validation across several large health systems, demonstrated the ability to detect signals across 50+ cancer types in asymptomatic individuals. The challenge is not sensitivity - it is specificity and the clinical pathway for follow-up when a signal is detected. A positive ctDNA result without a clear localization strategy leaves both patient and clinician without an obvious next step.
Pegasi's roadmap includes integration with ctDNA assay results as a fourth data stream, feeding into the same multi-modal classification framework alongside genomic panels, imaging, and clinical history. When a Galleri signal is detected, our platform will automatically cross-reference it against imaging history and clinical markers to generate a localization hypothesis before the follow-up appointment. We expect to have this capability in pilot at MD Anderson in late 2025.
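A sketch of what that cross-referencing step could look like. All of it - the signal fields, the marker mapping, the two-signal corroboration rule - is a hypothetical illustration of the planned capability, not shipped code.

```python
from dataclasses import dataclass

@dataclass
class CtDnaSignal:
    detected: bool
    predicted_origin: str | None  # tissue-of-origin call from the assay

@dataclass
class LocalizationHypothesis:
    site: str
    supporting_evidence: list[str]

def localize(signal: CtDnaSignal,
             imaging_history: dict[str, str],
             clinical_markers: dict[str, float]) -> LocalizationHypothesis | None:
    """Cross-reference a positive ctDNA signal against imaging history and
    clinical markers to propose where the follow-up workup should look first."""
    if not signal.detected or signal.predicted_origin is None:
        return None
    evidence = [f"assay tissue-of-origin call: {signal.predicted_origin}"]
    # Corroborating imaging at the predicted site
    if signal.predicted_origin in imaging_history:
        evidence.append(f"prior imaging: {imaging_history[signal.predicted_origin]}")
    # Site-associated marker trends (single illustrative mapping)
    if signal.predicted_origin == "colorectal" and clinical_markers.get("cea", 0.0) > 5.0:
        evidence.append(f"elevated CEA: {clinical_markers['cea']} ng/mL")
    # Require corroboration beyond the assay itself; otherwise route to review
    if len(evidence) >= 2:
        return LocalizationHypothesis(signal.predicted_origin, evidence)
    return None

print(localize(
    CtDnaSignal(True, "colorectal"),
    {"colorectal": "6 mm sigmoid polyp on 2023 CT colonography"},
    {"cea": 7.2},
))
```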
We have now completed full deployment at 12 health system partners, ranging from academic medical centers to regional community oncology practices. Three lessons from that experience are worth documenting here, because they contradict conventional assumptions about AI adoption in healthcare.
First, clinician trust is built through specificity of evidence, not breadth of marketing claims. The oncologists who became Pegasi's most active users were the ones who asked hard questions about our training data, our validation methodology, and our false positive rate - and received documented answers. Vague claims of confidence do not build trust with physicians who have spent careers interpreting statistical uncertainty in clinical trials.
Second, the hardest integration problem is not technical. Every major EHR has a FHIR API. The harder problem is organizational: agreeing on which alert types route to which care team members, how alerts appear in the clinical workflow without creating new documentation burdens, and who is responsible when an alert is overridden and the patient subsequently presents with advanced disease. These are governance questions, not engineering questions, and they take longer to resolve than any API connection.
Third, the sites that see the most clinical impact are the ones with the most complete historical data. A model trained on five years of your patient population's genomic and imaging data performs significantly better than one relying on population-level reference cohorts. This is why Pegasi's continuous retraining architecture is not just a technical feature - it is a strategic asset that compounds over time. As we discuss in our article on multi-modal data fusion in oncology, the quality of integrated data matters more than any single algorithmic innovation.
One of the most common objections to AI-assisted early detection is cost. Health systems are already operating on thin margins. Adding another platform, another integration, another line item on the IT budget - these are not trivial concerns.
The counter-argument is straightforward but needs to be made with specific numbers rather than general claims. Stage I colorectal cancer treatment costs an average of $22,000. Stage IV treatment averages $150,000 - and that is before accounting for extended hospital stays, ICU admissions, and palliative care. Detecting one case at Stage I rather than Stage IV saves the payer approximately $128,000 per patient. At Houston Methodist, Pegasi was associated with 22 Stage I confirmations in year one that were attributed to platform-initiated alerts. The implied cost avoidance at the lower bound is $2.8 million from a single deployment.
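The arithmetic behind that lower bound, using only the figures above:

```python
stage_1_cost = 22_000    # average Stage I colorectal treatment cost, USD
stage_4_cost = 150_000   # average Stage IV treatment cost, USD
savings_per_case = stage_4_cost - stage_1_cost  # $128,000 per patient

attributed_stage_1_cases = 22  # Houston Methodist, year one
print(f"${attributed_stage_1_cases * savings_per_case:,}")  # $2,816,000 ~ $2.8M
```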
These numbers are imprecise and we publish them with appropriate caveats about attribution methodology. But the order of magnitude is defensible, and it is the right frame for health system administrators who need to justify technology investment to finance committees.
The current generation of AI diagnostic tools operates largely as decision support - augmenting the oncologist's judgment rather than replacing any step of the clinical workflow. The next generation will begin to operate more autonomously in specific, well-bounded contexts: automated interval imaging scheduling based on risk stratification, real-time pathology pre-screening to prioritize slides for human review, and treatment response prediction at the time of diagnosis rather than after the first chemotherapy cycle.
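For one of those bounded contexts - interval imaging scheduling - the core logic is easy to state. The risk bands and intervals below are invented for illustration; they are not a clinical protocol.

```python
from datetime import date, timedelta

def next_imaging_date(risk_score: float, last_scan: date) -> date:
    """Map a model risk score to a surveillance interval.
    Bands and intervals are illustrative, not guideline values."""
    if risk_score >= 0.70:
        interval_months = 3   # high risk: short-interval follow-up
    elif risk_score >= 0.40:
        interval_months = 6
    else:
        interval_months = 12  # low risk: annual surveillance
    return last_scan + timedelta(days=interval_months * 30)

print(next_imaging_date(0.55, date(2025, 1, 15)))  # 2025-07-14
```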
None of this eliminates the oncologist. It changes what the oncologist spends their time doing - shifting from data assembly and pattern recognition across disparate systems toward interpretation, patient communication, and the judgment calls that genuinely require clinical expertise. That is the version of AI in oncology worth building toward: not replacing clinical intelligence, but directing it toward the problems that actually require it.
At Pegasi, that is the goal we measure ourselves against. Not the model metrics, though those matter. Not the integration count, though that matters too. The metric that keeps our team honest is how many Stage I diagnoses would not have happened without the platform. Everything else is infrastructure in service of that number.