Foundation Model · Multimodal · Large Scale
Project Overview
This project aims to address a pressing challenge in trauma care — the vast amount of underutilized data stored across hospital systems. At Harborview Medical Center, a Level I trauma center, large volumes of imaging, clinical notes, and outcomes data are collected but remain fragmented and difficult to integrate for research or clinical use. Our objective is to explore how these heterogeneous data sources can be transformed into structured, privacy-preserving knowledge that supports clinicians in understanding injuries, tracking recovery, and making equitable treatment decisions. Rather than simply building another foundation model, the project investigates how large multimodal AI systems can be responsibly adapted to serve as domain specialists in traumatic brain and spinal injury.
Detailed Introduction
Motivation. Traumatic brain injury (TBI) and spinal cord injury (SCI) continue to impose major clinical and societal burdens, yet the datasets that could improve care remain largely unexploited due to privacy, accessibility, and scalability barriers. Through close collaboration with Harborview Medical Center, we discovered that many of these data are securely stored but not analyzed or connected across departments. This project seeks to tackle that gap by developing and evaluating technical strategies for adapting and fine-tuning large-scale foundation models to this domain—assessing how much data, what model configurations, and what governance processes are needed for a general-purpose foundation model (such as AWS Nova) to become a reliable, transparent, and clinically useful specialist model for TBI and SCI.
Our approach is research-driven and interdisciplinary: we work directly with clinicians from neurosurgery, orthopedics, and rehabilitation medicine to define meaningful downstream tasks while experimenting with scalable, privacy-preserving multimodal learning frameworks. The long-term goal is to establish a reproducible pathway for converting complex, unstructured clinical data into equitable and interpretable AI systems that augment, rather than replace, medical decision-making.
Evaluation, safety & fairness
Evaluation will include standard technical metrics (AUC, sensitivity, specificity, Dice score for segmentation), but must also include:
- Fairness audits across age, sex, race/ethnicity, payer status, and geography.
- Privacy leakage testing (can identifiers be recovered?) and red-team adversarial probes.
- Clinical utility studies with clinicians to measure interpretability, workflow fit, and downstream impact on decision-making.
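The core metrics above, including the per-subgroup breakdown that drives a fairness audit, can be sketched in plain Python. The function names and subgroup keys below are illustrative, not part of the project's actual evaluation harness:

```python
from collections import defaultdict

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def dice(mask_a, mask_b):
    """Dice overlap between two flattened binary segmentation masks."""
    inter = sum(a & b for a, b in zip(mask_a, mask_b))
    return 2 * inter / (sum(mask_a) + sum(mask_b))

def subgroup_audit(y_true, y_pred, groups):
    """Sensitivity/specificity per subgroup (e.g. age band, payer status)."""
    buckets = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, groups):
        buckets[g][0].append(t)
        buckets[g][1].append(p)
    return {g: sensitivity_specificity(t, p) for g, (t, p) in buckets.items()}
```

In practice a gap between subgroups in `subgroup_audit` output (e.g. lower sensitivity for one payer-status group) is what a fairness audit flags for review.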
Team & roles
Core investigators
- June Lee, MS — Project Lead; data engineering & model operations
- Benjamin Han, MS — Data Scientist; data engineering & model operations
- Mary Nguyen, MS — Full-stack engineering, data engineering
- Asouri Souri, DO — Neurology, data analysis
- Rupak Rajachar, PhD — Project Advisor
- Diana Wiseman, MD — Neurosurgery lead; clinical labeling & validation
- Jamie Ott, MD, and Heather Barnett, MD — Orthopedics clinical partners
- Christopher Lewis, MD — Rehabilitation medicine; outcomes & SDOH integration
- Vikash Gupta, PhD — Sr. Solutions Architect, AWS Healthcare
- Ujjwal Ratan, PhD — Machine Learning Team Leader, AWS Healthcare
Advisors & collaborators
- Clinical informatics & IRB governance
- Imaging informatics (Radiology IT, PACS/Visage integration)
- Privacy & legal (data use, HIPAA compliance)
- External auditing partners for fairness/privacy evaluation
Infrastructure & tooling
- Secure cloud environment (AWS recommended for Bedrock integration) with encryption at rest/in transit and VPC isolation.
- Data lake for SGT-protected artifacts + metadata catalog.
- Model training platform supporting multimodal transformers, distributed training, and experiment tracking (MLflow, Weights & Biases, or similar).
- API gateway for clinical integration with Epic and Visage (FHIR mappings, HL7/RIS connectors) — outputs should be pluggable clinical decision support artifacts rather than black-box predictions.
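One way to make outputs "pluggable" rather than black-box is to emit them as FHIR-style resources. A minimal sketch, assuming a simplified FHIR R4 Observation shape — the coding values are placeholders, not real LOINC/SNOMED codes, and a production mapping must follow the site's FHIR profiles:

```python
import json

def to_fhir_observation(patient_ref, score, label, model_version):
    """Wrap a model output as a minimal FHIR R4 Observation-style resource.

    Placeholder codings for illustration; real integration uses coded
    concepts from the institution's terminology services."""
    return {
        "resourceType": "Observation",
        "status": "preliminary",              # model output, pending clinician review
        "code": {"text": label},              # placeholder; real systems use coded concepts
        "subject": {"reference": patient_ref},
        "valueQuantity": {"value": round(score, 3), "unit": "probability"},
        "device": {"display": f"trauma-model/{model_version}"},  # provenance for audit
    }

obs = to_fhir_observation("Patient/123", 0.874, "TBI severity: high risk", "0.2.1")
print(json.dumps(obs, indent=2))
```

Marking the status `preliminary` and carrying model provenance in `device` keeps the artifact auditable and signals that a clinician must confirm it before it informs care.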
Timeline & milestones (example)
Months 0–3: Governance, IRB, data pipeline prototypes, SGT proof-of-concept, pilot fine-tuning on AWS Bedrock (text-only).
Months 4–9: Add imaging modality (CT/CT-angio), structured labeling for key injury classes, clinician-in-the-loop evaluation, initial external audit.
Months 10–18: Scale data, continual pretraining for domain encoder, multimodal fusion, prospective clinical pilot for a targeted use-case (e.g., injury categorization & discharge planning assistance).
Months 18+: Production integration, monitoring, regulatory preparations, multi-center validation and model updates.
Deliverables
- SGT-protected trauma dataset & metadata catalog (for internal, governed use).
- Multimodal trauma foundation encoder and fine‑tuned downstream models for categorization, segmentation, and prognosis.
- Clinician dashboard / FHIR API for integration with Epic and radiology systems.
- Evaluation reports (technical, fairness, privacy) and documentation for audits and regulators.
Risks & mitigation
- Data privacy risk: Mitigate via SGT, strict governance, limited access controls, and regular privacy audits.
- Model bias & fairness: Continuous audits, balanced data sampling, subgroup metrics, and clinician oversight.
- Clinical adoption: Early clinician involvement, human-in-the-loop design, interpretable outputs and clear limitations.
- Cost & compute: Use staged approach (Bedrock pilot → expand) to manage spend; use spot instances and efficient training recipes.
Connection to recent radiology scaling research
Recent work on radiology scaling laws shows that continual in-domain pretraining on institution-specific imaging corpora yields large performance gains (even modestly sized datasets can provide outsized improvements). This supports our strategy of beginning with Bedrock fine-tuning for speed and safety, then moving to larger-scale in-domain continual pretraining if required.
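Scaling relationships of this kind are commonly modeled as a power law, error ≈ a·N^(−b). A minimal sketch of estimating the exponent from (dataset size, error) pairs via log-log least squares — the data points here are synthetic, for illustration only:

```python
import math

def fit_power_law(sizes, errors):
    """Fit error = a * N**(-b) by ordinary least squares in log-log space.

    Returns (a, b); b > 0 means error falls as in-domain data grows."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return math.exp(intercept), -slope

# Synthetic points following error = 2.0 * N**(-0.3), for illustration only.
sizes = [1_000, 10_000, 100_000]
errors = [2.0 * n ** -0.3 for n in sizes]
a, b = fit_power_law(sizes, errors)
```

A fit like this on pilot fine-tuning runs can inform the go/no-go decision on investing in larger-scale continual pretraining.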
Contact & next steps
If you're interested in collaborating, contributing labeled data, or reviewing the governance plan, please contact the project leads:
- June Lee, MS — june0604@uw.edu
- Core team: Benjamin Han, Mary Nguyen, Asouri Souri, Rupak Rajachar — (internal UW contacts)