Whole slide imaging plus deep learning is producing pathology results that match or exceed expert reads - in seconds.
A single hematoxylin and eosin (H&E) stained whole slide image (WSI) of a tissue biopsy, scanned at 40x magnification, contains approximately 100,000 image tiles - each a 256x256 pixel region covering roughly 0.025 mm² of tissue. Reviewing a WSI at the level of individual cell morphology is fundamentally a large-scale image analysis problem. A trained pathologist can review a slide in 5-20 minutes by efficiently sampling the most informative regions. A convolutional neural network running on a modern GPU cluster can analyze every tile in parallel in under 30 seconds.
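The tile counts above follow directly from the geometry. A minimal sketch of the arithmetic, using an illustrative scan size (the dimensions and the 50% tissue fraction below are assumptions for the example, not properties of any particular scanner):

```python
import math

# Hypothetical scan dimensions in pixels - illustrative only
slide_w, slide_h = 80_000, 60_000
tile_size = 256

# Grid of non-overlapping 256x256 tiles covering the scan
cols = math.ceil(slide_w / tile_size)
rows = math.ceil(slide_h / tile_size)
total_tiles = cols * rows

# Pipelines typically discard background (glass) tiles before analysis;
# assume roughly half the grid contains tissue for this sketch
tissue_tiles = total_tiles // 2

print(total_tiles, tissue_tiles)
```

Real pipelines vary the tile size, overlap, and tissue-masking strategy, but the order of magnitude - tens of thousands of tiles per slide - is what makes exhaustive human review impractical and parallel GPU analysis attractive.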
The computational advantage is not primarily about speed, though speed matters in high-volume diagnostic labs with three-day turnaround time targets. The deeper advantage is in the ability to perform systematic, quantitative analysis of cellular features across the entire tissue specimen without the sampling bias that characterizes human visual inspection. A pathologist who focuses on the most morphologically abnormal regions of a slide may miss a spatially isolated but clinically significant finding in an area that does not attract attention during a time-pressured review. Computer vision systems are agnostic to the distribution of abnormal features across the slide - they process every tile with the same thoroughness.
Modern computational pathology models are trained to perform specific classification tasks on WSI data, not to "read" slides in the holistic way a pathologist does. Understanding what these models are and are not doing is essential for evaluating their clinical utility appropriately.
The foundational classification tasks include: tumor versus normal tissue detection (identifying regions containing cancer cells versus benign tissue); tumor grading (for cancers with established grading systems like the Gleason score for prostate cancer or the Nottingham grade for breast cancer); mitotic figure counting (the number of cells in active division per unit area, a key prognostic indicator); and tumor-infiltrating lymphocyte (TIL) quantification (measuring immune cell infiltration into and around the tumor, which predicts response to immunotherapy).
More advanced models perform molecularly relevant classification - predicting the presence of genomic alterations from histological features alone. This capability, sometimes called "computational morphogenomics" or "virtual genomics," is one of the most clinically significant recent advances in computational pathology. The underlying observation is that somatic mutations that change tumor biology also change the morphological appearance of the cells - in ways that are too subtle for human visual inspection to detect reliably but that neural networks trained on genomically characterized datasets can learn to identify. A model trained on WSIs from colorectal tumors with known microsatellite instability (MSI) status can predict MSI-high (MSI-H) versus microsatellite-stable (MSS) status from the histology image with accuracy approaching 80% - providing a rapid screening estimate at negligible marginal cost that can prioritize cases for confirmatory MSI testing.
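These models typically score individual tiles and then aggregate to a slide-level call. One common (though not the only) aggregation scheme is mean pooling of tile probabilities against a validation-chosen threshold - a minimal sketch, with made-up tile scores and a purely illustrative cutoff:

```python
# Tile-level MSI-H probabilities from a hypothetical tile classifier
tile_probs = [0.91, 0.85, 0.40, 0.78, 0.66, 0.95, 0.30, 0.88]

# Mean-pool tile scores into a single slide-level score
slide_score = sum(tile_probs) / len(tile_probs)

# The operating threshold is tuned on a validation set;
# 0.5 here is illustrative, not a recommended clinical cutoff
label = "MSI-H" if slide_score >= 0.5 else "MSS"

print(round(slide_score, 3), label)
```

Published systems use more sophisticated aggregation (attention-weighted pooling, multiple-instance learning), but the tile-score-then-aggregate shape is the same.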
The FDA has cleared several AI-powered pathology tools under the De Novo or 510(k) pathways, focused primarily on prostate cancer grading. Paige Prostate (FDA cleared 2021) and Ibex Galen Pro (FDA cleared 2022) both assist pathologists in detecting and grading prostate cancer on biopsy specimens, functioning as second-reader tools that flag suspicious regions for pathologist attention rather than generating autonomous diagnoses.
The FDA's approach to cleared pathology AI tools has been consistently framed around augmentation of pathologist performance, not replacement. Cleared tools are approved as aids to pathologist decision-making, with the pathologist retaining final diagnostic responsibility. This framework is appropriate given the current state of the evidence and the complexity of pathological diagnosis, which extends well beyond the specific classification tasks that validated AI models perform reliably.
Several additional pathology AI tools are in late-stage FDA review as of early 2024, including models for breast cancer subtype classification, lung cancer molecular subtype prediction, and colorectal cancer grading. The regulatory pathway for these tools is accelerating as the FDA develops more specific guidance for AI/ML-based diagnostic software, particularly in the Software as a Medical Device (SaMD) framework that governs AI-based diagnostic tools.
One of the most compelling arguments for AI in pathology is not that it is more accurate than expert pathologists in aggregate, but that it is more consistent. Interobserver variability - the degree to which two trained pathologists disagree on the same specimen - is a documented and clinically significant problem in anatomical pathology.
For prostate cancer Gleason grading, interobserver agreement rates between general surgical pathologists are in the 60-70% range for specific grade assignments. Even between urologic pathology subspecialists, agreement on distinguishing Gleason 3+4 from 4+3 disease - a distinction that substantially changes treatment recommendations - is imperfect. Multiple published studies have shown that AI grading systems achieve higher agreement with expert pathologist consensus than human pathologists achieve with each other, primarily because they apply the same learned criteria consistently regardless of reader fatigue, lighting conditions, or time pressure.
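Interobserver agreement is usually reported as Cohen's kappa rather than raw percent agreement, because kappa corrects for the agreement two readers would reach by chance alone. A minimal sketch with fabricated grade calls from two hypothetical readers:

```python
from collections import Counter

def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two raters grading the same cases."""
    n = len(reader_a)
    # Observed agreement: fraction of cases where the two readers match
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: sum over categories of the product of
    # each reader's marginal frequency for that category
    ca, cb = Counter(reader_a), Counter(reader_b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Fabricated grade-group calls for 10 biopsies (illustrative data only)
reader_a = [1, 2, 2, 3, 3, 4, 2, 1, 5, 3]
reader_b = [1, 2, 3, 3, 2, 4, 2, 1, 5, 4]

print(round(cohens_kappa(reader_a, reader_b), 3))
```

Here the readers agree on 7 of 10 cases (70% raw agreement), but after correcting for chance the kappa is noticeably lower - which is why published agreement figures can look worse than raw percentages suggest.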
For rare cancers and unusual morphological variants, the variability problem is even more pronounced. A general surgical pathologist in a community hospital who sees three cases of a rare sarcoma subtype per year has a substantially different performance profile on that entity than a specialist at a sarcoma referral center who sees thirty cases per year. AI models trained on large datasets from reference centers can bring reference-center-caliber pattern recognition to community settings - though the evidence base for this use case is less mature than for the major common cancers where large training datasets exist.
Pegasi integrates pathology data in two distinct ways. First, we ingest structured pathology reports - diagnosis codes, grade assignments, margin status, lymphovascular invasion findings - as structured FHIR DiagnosticReport resources, which feed directly into the multi-modal diagnostic model. This structured data is available from most modern pathology laboratory information systems (LIS) through HL7 FHIR interfaces or standard HL7 v2 messages, and it captures the key prognostic variables in a form that the model can use directly.
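For illustration, here is a sketch of pulling the status and coded diagnosis out of a minimal, heavily truncated FHIR DiagnosticReport payload. The payload, its codes, and the field choices are illustrative - real resources carry far more structure (specimen references, observation links, performer details):

```python
import json

# Hypothetical, truncated DiagnosticReport - codes shown for illustration only
payload = json.loads("""
{
  "resourceType": "DiagnosticReport",
  "status": "final",
  "conclusion": "Invasive ductal carcinoma, Nottingham grade 2, margins negative.",
  "conclusionCode": [
    {"coding": [{"system": "http://snomed.info/sct",
                 "code": "408643008",
                 "display": "Infiltrating duct carcinoma of breast"}]}
  ]
}
""")

# Basic sanity check before extracting fields
assert payload["resourceType"] == "DiagnosticReport"

diagnosis = payload["conclusionCode"][0]["coding"][0]["display"]
print(payload["status"], "|", diagnosis)
```

In practice a FHIR client library and schema validation would replace the raw dictionary access, but the point stands: the prognostic variables arrive as addressable fields rather than free text that must be mined from a narrative report.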
Second, for partner institutions with digital pathology infrastructure (whole slide scanners and an integrated digital pathology platform), Pegasi processes WSI thumbnails and AI-generated feature extracts to add quantitative histological inputs to the multi-modal model. This requires the health system to have made the capital investment in digital pathology hardware - not yet universal, but increasingly common in academic medical centers. The WSI processing pipeline adds computational features that are not captured in the narrative or coded pathology report: TIL density, tumor cellularity, mitotic index, and molecular subtype prediction for cases where the computational morphogenomics models are applicable.
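As a sketch of what such a feature extract might look like downstream, assume per-slide tile statistics from a hypothetical upstream segmentation model (all numbers below, including the per-tile area, are assumptions for the example):

```python
# Hypothetical per-slide statistics from an upstream tile-level model
tiles_total = 36_000       # tissue tiles analyzed
tiles_tumor = 14_400       # tiles classified as tumor
lymphocytes = 820_000      # lymphocytes counted across the slide
mitoses = 540              # mitotic figures counted
tile_area_mm2 = 0.004      # assumed area per tile; depends on scanner resolution

tissue_area_mm2 = tiles_total * tile_area_mm2

# Slide-level quantitative features fed to the multi-modal model
features = {
    "tumor_cellularity": tiles_tumor / tiles_total,        # fraction of tissue that is tumor
    "til_density_per_mm2": lymphocytes / tissue_area_mm2,  # lymphocytes per mm^2
    "mitotic_index_per_mm2": mitoses / tissue_area_mm2,    # mitoses per mm^2
}
print(features)
```

Note that clinical mitotic counts are conventionally reported per ten high-power fields rather than per mm²; the per-mm² form here is simply the area-normalized quantity a computational pipeline produces before any unit conversion.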
The integration of computational pathology features into a multi-modal diagnostic model is clinically meaningful specifically because these features carry prognostic information that is correlated with but not redundant to genomic biomarkers. A colorectal tumor with high TIL infiltration on H&E histology is likely but not certain to be MSI-H; the histological feature and the molecular test result are complementary, and having both in the same model improves predictive accuracy for treatment response outcomes. As we explain in our article on multi-modal data fusion, the diagnostic value is in the intersection of data types, not in any single source alone.
Computational pathology is most reliable for well-defined, high-prevalence classification tasks with large, well-annotated training datasets. It is least reliable for rare tumor types with limited training data, for morphological entities at the edge of recognized classification categories, and for the complex contextual judgments that characterize the most diagnostically challenging cases - the cases where a pathologist consults a subspecialty colleague, pulls reference texts, or refers the slide for expert review.
The appropriate model for current clinical deployment is human-AI collaboration: AI handles the systematic, high-volume, consistency-dependent aspects of slide analysis (tumor detection, routine grading, quantitative feature extraction) while the pathologist applies judgment to the synthesis and interpretation, to the unusual case, and to the final diagnostic conclusion. This is not a failure mode of AI pathology. It is the correct use case for the current state of the technology - one that reduces pathologist workload on routine cases, improves consistency on standardized tasks, and frees pathologist attention for the complex cases where expert human judgment is genuinely irreplaceable.