Cancer Screening Programs: Data-Driven Approaches

Why standard screening guidelines underperform for high-risk populations - and how integrated data is changing the math.


The Tension Between Population Guidelines and Individual Risk

Cancer screening guidelines are designed for populations, not individuals. The U.S. Preventive Services Task Force recommends colorectal cancer screening beginning at age 45 for average-risk adults. This recommendation is based on population-level modeling that balances the benefit of early detection against the harms of false-positive results and invasive follow-up procedures across a large and heterogeneous population.

What this population-optimized approach necessarily misses is the substantial variation in individual cancer risk. A 45-year-old with no family history, no prior adenomas, and no inflammatory bowel disease has a genuinely different lifetime risk profile than a 45-year-old with Lynch syndrome, two first-degree relatives with colorectal cancer before age 60, and a history of adenomatous polyps on prior colonoscopy. The guidelines themselves are written for average-risk adults, but when family history and prior findings are not captured and acted on, both individuals end up on the same "average-risk, begin screening at 45" pathway. Only one of them is actually average-risk.

Data-driven risk stratification - using the full available clinical data profile for each individual to calculate a personalized risk score rather than applying population-average thresholds - is not a replacement for evidence-based screening guidelines. It is an enhancement that allows guidelines to be applied more intelligently by directing more intensive screening toward the individuals who will benefit most from it.

The Three Major Screening Programs and Their Data Gaps

Colorectal, lung, and breast cancer together account for approximately 40% of all new cancer diagnoses in the United States. All three have established, guideline-supported screening programs. All three have significant data integration gaps that limit their effectiveness.

Colorectal cancer screening achieves coverage rates of approximately 68% in the eligible population aged 45-75 - a meaningful improvement from 40% a decade ago, driven partly by the availability of non-invasive stool-based testing options like Cologuard and FIT. The data gap is not in screening uptake for the average-risk population; it is in identifying and appropriately escalating the high-risk population whose first-degree relatives carry Lynch syndrome mutations. Current practice relies on patient self-report of family history - a notoriously unreliable data source. Pegasi's platform cross-references EHR family history data against known Lynch syndrome carrier records in our network to identify patients with documented high-risk relatives who have not been referred for genetic counseling or accelerated colonoscopy schedules.

Lung cancer screening with low-dose CT (LDCT) is the most underutilized major cancer screening test in the US. Eligibility criteria (adults aged 50-80 with at least a 20 pack-year smoking history who currently smoke or have quit within the past 15 years) cover approximately 14.5 million Americans, but screening rates remain below 6% of eligible adults. The primary failure mode is identification: primary care physicians have no reliable mechanism for determining which of their patients meet the eligibility criteria short of manually reviewing each patient's smoking history. Pegasi automates this identification, scanning structured smoking history records and patient demographics to generate a list of screening-eligible patients at each partner institution, along with direct alerts to primary care physicians for patients due for annual LDCT who have not had one in the past 12 months.
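The eligibility check described above is simple enough to sketch in a few lines. The following is a minimal illustration, not Pegasi's implementation: the `SmokingRecord` fields and function names are hypothetical, "quit within the past 15 years" is read as 15 years or fewer, and "past 12 months" is approximated as 365 days.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class SmokingRecord:
    """Hypothetical structured smoking history; field names are illustrative."""
    age: int
    pack_years: float
    currently_smoking: bool
    years_since_quit: Optional[float] = None  # None if still smoking or unknown
    last_ldct: Optional[date] = None          # None if never screened


def meets_ldct_criteria(p: SmokingRecord) -> bool:
    """Age 50-80, at least 20 pack-years, smoking now or quit within 15 years."""
    if not 50 <= p.age <= 80:
        return False
    if p.pack_years < 20:
        return False
    if p.currently_smoking:
        return True
    # A missing quit date cannot confirm eligibility, so treat it as not eligible.
    return p.years_since_quit is not None and p.years_since_quit <= 15


def due_for_ldct(p: SmokingRecord, today: date) -> bool:
    """Eligible and no LDCT on record in the past 365 days."""
    if not meets_ldct_criteria(p):
        return False
    return p.last_ldct is None or (today - p.last_ldct).days > 365
```

Treating a missing quit date as "not eligible" is a conservative choice made for this sketch; a real program might instead route such records to manual review.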

Breast cancer screening with mammography has near-universal guideline coverage but significant stratification gaps. Average-risk women following standard guidelines (annual or biennial mammography from age 40 or 50 depending on the guideline) receive essentially no differentiation based on breast density, hormonal history, or genomic risk. Women with dense breast tissue on mammography - roughly 40% of women in the screening-eligible population - have a substantially higher cancer risk and lower mammography sensitivity than women with fatty breast composition. This density result appears in the mammography report and is communicated to patients, but it rarely triggers a systematic change in screening protocol at the population level. Pegasi flags dense-breast patients for high-risk discussion with their gynecologist and, where appropriate, initiates a referral for supplemental MRI screening.
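As a rough sketch of the flagging step, assume the mammography report exposes a structured BI-RADS density category ("a" through "d", where "c" is heterogeneously dense and "d" is extremely dense). The `MammogramReport` type, its fields, and the function names below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# BI-RADS density categories c and d are the ones described as "dense".
DENSE_CATEGORIES = {"c", "d"}


@dataclass
class MammogramReport:
    """Hypothetical structured extract of a mammography report."""
    patient_id: str
    birads_density: Optional[str]  # "a".."d", or None if not recorded
    supplemental_screening_discussed: bool = False


def needs_density_follow_up(report: MammogramReport) -> bool:
    """True for dense-breast reports with no documented follow-up discussion."""
    if report.birads_density is None:
        return False
    is_dense = report.birads_density.strip().lower() in DENSE_CATEGORIES
    return is_dense and not report.supplemental_screening_discussed


def flag_reports(reports: Iterable[MammogramReport]) -> List[str]:
    """Return patient IDs that should be queued for a high-risk discussion."""
    return [r.patient_id for r in reports if needs_density_follow_up(r)]
```

Whether a flagged patient is actually referred for supplemental MRI remains a clinical decision; the sketch only covers the identification step.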

Risk Stratification Models: What Works and What Does Not

Multiple clinical risk stratification models exist for major cancer types. The Tyrer-Cuzick model for breast cancer risk, the Harvard Cancer Risk Index for colorectal cancer, and the PLCOm2012 model for lung cancer are among the most validated. Each accepts structured clinical inputs - such as age, family history, prior screening results, and hormonal factors - and returns a risk estimate over a fixed horizon (for example, 6-year, 10-year, or lifetime) that can be used to guide screening intensity.

These models work well for the populations they were validated on and for the data inputs they were designed to use. Their limitations in routine clinical practice are predictable: the data required to populate the models is often not available in structured form, the models do not incorporate genomic data even when available, and no single risk model spans multiple cancer types in a way that allows integrated risk management for patients at elevated risk for more than one cancer.

Pegasi's risk stratification approach is model-agnostic. Rather than implementing a single validated risk model as a proprietary black box, our platform integrates published risk calculation methodologies and applies them to the structured data available in the EHR and connected laboratory systems. When a validated model is applicable (e.g., Tyrer-Cuzick for a breast cancer screening decision), it is run with available inputs and its output is incorporated into the multi-modal risk score alongside model-specific inputs. When a patient's risk profile includes factors outside the validated model's scope (e.g., a newly identified BRCA2 germline variant that substantially changes breast cancer risk beyond the Tyrer-Cuzick prediction), the platform flags the discrepancy and incorporates the genomic information into an updated risk estimate with appropriate clinical notes.
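One way to picture a model-agnostic design is a thin wrapper that runs any published model with whatever inputs are available and reports both missing inputs and risk factors the model does not cover. The sketch below is an assumption about how such a wrapper could look, not a description of Pegasi's code; `RiskModel`, `RiskAssessment`, `assess`, and the input names are all hypothetical, and `toy_model` is a placeholder rather than a clinical formula.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class RiskModel:
    """Wrapper around a published risk calculation (hypothetical interface)."""
    name: str
    required_inputs: Set[str]
    out_of_scope_factors: Set[str]  # factors this model does not account for
    compute: Callable[[Dict[str, object]], float]


@dataclass
class RiskAssessment:
    model_name: str
    estimate: Optional[float]  # None when required inputs are missing
    missing_inputs: List[str] = field(default_factory=list)
    discrepancies: List[str] = field(default_factory=list)


def assess(model: RiskModel, patient: Dict[str, object]) -> RiskAssessment:
    """Run the model if possible and flag factors outside its scope."""
    missing = sorted(model.required_inputs - set(patient))
    discrepancies = sorted(
        f for f in model.out_of_scope_factors if patient.get(f)
    )
    estimate = None
    if not missing:
        estimate = model.compute({k: patient[k] for k in model.required_inputs})
    return RiskAssessment(model.name, estimate, missing, discrepancies)


# Placeholder standing in for a real published model; NOT a clinical formula.
toy_model = RiskModel(
    name="toy-breast-model",
    required_inputs={"age", "first_degree_relatives_with_breast_cancer"},
    out_of_scope_factors={"brca2_pathogenic_variant"},
    compute=lambda inputs: 0.0,
)
```

Returning the discrepancy list alongside the estimate, rather than silently adjusting the number, keeps the validated model's output distinguishable from any later genomic adjustment.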

Closing the Care Gap: From Risk Identification to Screening Action

Identifying high-risk patients is necessary but not sufficient. The translational gap between risk stratification and actual screening action is where most data-driven screening programs lose the benefit they theoretically provide. A health system that generates a list of 500 patients due for lung cancer screening but lacks the workflow to ensure those patients receive outreach, schedule appointments, and complete the scan has done analytical work without clinical benefit.

Pegasi's screening module is designed as a care gap closure tool, not just a risk identification tool. For each patient identified as overdue for a guideline-indicated screening, the platform generates not just an alert for the treating physician but a structured care gap record that can be assigned to a care coordinator, tracked for resolution, and reported in aggregate for quality metrics programs. This connects Pegasi's diagnostic intelligence to the operational systems that health systems use to manage population health - care management platforms, quality reporting dashboards, and patient outreach systems.
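A structured care gap record of the kind described above might look like the following. This is an illustrative data model only: the `CareGap` fields, the `GapStatus` values, and the `closure_rate` helper are assumptions made for the example, not Pegasi's schema.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, Optional


class GapStatus(Enum):
    OPEN = "open"
    OUTREACH_SENT = "outreach_sent"
    SCHEDULED = "scheduled"
    COMPLETED = "completed"


@dataclass
class CareGap:
    """One overdue, guideline-indicated screening for one patient."""
    patient_id: str
    screening: str                      # e.g. "ldct", "mammography"
    status: GapStatus = GapStatus.OPEN
    coordinator: Optional[str] = None   # care coordinator who owns the gap

    def assign(self, coordinator: str) -> None:
        self.coordinator = coordinator

    def advance(self, new_status: GapStatus) -> None:
        self.status = new_status


def closure_rate(gaps: Iterable[CareGap]) -> Dict[str, float]:
    """Fraction of gaps completed, per screening type, for aggregate reporting."""
    gaps = list(gaps)  # allow generators; the records are scanned twice
    totals = Counter(g.screening for g in gaps)
    closed = Counter(
        g.screening for g in gaps if g.status is GapStatus.COMPLETED
    )
    return {s: closed[s] / totals[s] for s in totals}
```

Because each record carries an owner and a status, the same list can drive a coordinator's worklist and a quality dashboard without a separate handoff.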

The operational integration is essential because individual physician alerts, by themselves, are insufficient to move the needle on population-level screening rates. A primary care physician seeing 25-30 patients per day will not personally manage the scheduling and outreach logistics for 50 overdue screening patients. That work requires care coordination infrastructure, and it requires Pegasi's outputs to flow into that infrastructure automatically rather than requiring manual handoffs at every step.

Comparing AI-Enhanced Screening to Standard Protocol: What the Evidence Shows

Several published studies have examined AI-enhanced cancer screening versus standard guideline-based programs, primarily in mammography and colonoscopy. The evidence is positive but comes with important caveats about study design and generalizability.

In mammography, AI-assisted reading programs have shown consistent sensitivity improvements - typically 4-8 percentage points - over single-radiologist reads in European comparative studies, where AI is used as a second reader rather than as a replacement for radiologist interpretation. The US data is less mature given slower regulatory adoption of AI-assisted mammography reading, but the directional evidence is consistent with the European findings.

In colonoscopy, computer-aided detection (CADe) systems that flag potential polyps during the procedure in real time have shown adenoma detection rate improvements of 9-14 percentage points in multiple randomized controlled trials. The clinical significance of this improvement - whether earlier detection of small adenomas translates to reduced interval cancer rates - is still being evaluated in long-term follow-up studies, but the polyp detection benefit is well-established. Pegasi does not currently operate in the procedural AI space; our focus is on pre-procedural risk stratification that determines which patients should undergo colonoscopy in the first place. Post-procedural outcome integration - incorporating colonoscopy findings back into the patient's longitudinal risk model - is on our near-term roadmap.
