Research Projects

AI-4-AI bridges medicine, biomedical informatics, engineering, and social science to design and test innovative, real-world solutions for ambulatory and personal healthcare. We focus on using digital data and platforms to improve how care is delivered and experienced.

Project Themes:
- Building data resources: Creating large, privacy-compliant repositories of multimodal clinical data to support diverse research communities.
- Advancing privacy methods: Developing cutting-edge approaches to de-identify clinical video and audio so these resources can be shared safely across institutions.
- Fostering collaboration: Hosting multidisciplinary forums where clinicians, researchers, and technologists critically examine clinical encounters and health records to inspire improvements in workflows and care delivery.


Current Projects

Team: Sriharsha Mopidevi, Matthew Hill, Basam Alasaly, Rachel Wu

 The Observer Repository compiles multimodal data—including videos, audio, annotations, and surveys—collected from clinical visits across diverse healthcare settings, such as Penn Medicine departments and the Penn Medicine Clinical Simulation Center. By providing privacy-compliant access to previously hard-to-obtain clinical data, the repository supports both medical and non-medical researchers in advancing healthcare studies and innovation. 

Observer Dashboard

Paper

Team: Sriharsha Mopidevi

MedVidDeID is a six-stage, modular, and scalable pipeline designed to de-identify personal health information (PHI) in raw audio and video recordings of clinical visits. Using a combination of natural language processing (NLP), computer vision, and related algorithms, the de-identification process removes PHI from transcripts, audio, and video (including faces and/or explicit body parts). This ensures that the data can be securely accessed and used by researchers across institutions for analysis and research.
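As a rough illustration of the kind of processing involved in the video stage, the sketch below blurs detected faces frame by frame with OpenCV. The Haar-cascade detector and Gaussian blur are assumptions chosen for brevity, not MedVidDeID's actual components.

```python
import cv2

# Illustrative sketch only: blur detected faces in a clinical-visit recording.
# The detector, blur settings, and codec below are assumptions, not the
# pipeline's actual stages.
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def blur_faces(frame):
    """Return a copy of `frame` with every detected face Gaussian-blurred."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    out = frame.copy()
    for (x, y, w, h) in faces:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return out

def deidentify_video(in_path, out_path):
    """Read a recording, blur faces frame by frame, and re-encode it."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, size)
    ok, frame = cap.read()
    while ok:
        writer.write(blur_faces(frame))
        ok, frame = cap.read()
    cap.release()
    writer.release()
```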

Paper GitHub

Team: Sydney Pugh, Matthew Hill, Rachel Wu

Addressing the significant underdiagnosis of Alzheimer’s disease, the WATCH study aims to identify key diagnostic clues for assessing cognitive impairment risk. In collaboration with board-certified neurologists, this project harnesses large language models and multimodal systems to analyze patients' linguistic patterns, visual cues, and electronic health records (EHR). Ultimately, this project seeks to develop an AI-driven tool capable of real-time detection from clinical visits or natural speech, offering personalized follow-up recommendations based on assessment results. By enabling earlier diagnosis and intervention, the WATCH study has the potential to enhance patient outcomes and alleviate the burden of undetected cognitive decline.

Paper GitHub

Current Team: Sameer Bhatti 

Past Contributors: Kuk Jin Jang, Chimezie Maduno, Alexander Budko

The ACAM project focuses on enhancing clinical interactions by facilitating real-time agenda setting between clinicians and patients. Its goals are to track visit discussions and reduce the chance that topics raised during the visit go unaddressed or arise unexpectedly. By leveraging large language models (LLMs), ACAM aims to improve the efficiency and effectiveness of patient-provider interactions, ensuring that critical issues are addressed in every conversation.

Paper

Team: Matthew Hill, Basam Alasaly, Sameer Bhatti, Kevin Johnson

Ambient scribing technology transcribes clinical encounters in real time, aiming to streamline documentation tasks and alleviate provider workload. This set of projects aims to evaluate ambient scribes’ effect on documentation efficiency and patient-centered care by:

  • Examining how ambient scribes influence provider-patient communication through their interactions, time allocation during visits, electronic health record engagement, and satisfaction metrics.
  • Exploring methods to accurately extract medical concepts from clinical notes using natural language processing and large language model pipelines (see the sketch after this list).
  • Investigating content quality and detail in ambient-scribe-produced documentation, focusing on note bloat, standardized quality metrics, and clinical value.
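For the concept-extraction aim above, a toy version of the task might look like the sketch below, which matches a hand-built lexicon against a note. The lexicon, placeholder codes, and string matching are illustrative assumptions; the actual studies use NLP and LLM pipelines rather than simple lookup.

```python
import re

# Toy sketch of medical concept extraction from a clinical note.
# The lexicon entries and codes are placeholders for illustration only.
CONCEPT_LEXICON = {
    "hypertension": "CONCEPT_001",
    "type 2 diabetes": "CONCEPT_002",
    "metformin": "CONCEPT_003",
}

def extract_concepts(note_text):
    """Return (matched text, concept code, character span) for each lexicon hit."""
    hits = []
    for term, code in CONCEPT_LEXICON.items():
        for match in re.finditer(re.escape(term), note_text, flags=re.IGNORECASE):
            hits.append((match.group(0), code, match.span()))
    return hits

note = "Patient with Type 2 Diabetes and hypertension; continue metformin 500 mg daily."
print(extract_concepts(note))
```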

Paper

Current Team: Andrew Zolensky, Basam Alasaly, Sriharsha Mopidevi

Past Contributors: Kuk Jin Jang

This project develops a model that assigns speaker roles to utterances in a real-time, inpatient clinical setting to improve live captioning and documentation. By fine-tuning decision trees and large language models (LLMs), it explores the effectiveness of diarization—segmenting and grouping speech by speaker—in inferring speaker roles directly from semantic cues. By integrating transcription, segmentation, and role classification, the system enhances live captioning, real-time decision support, and automatic documentation. This work improves communication for hearing-impaired patients, streamlines clinical workflows, and advances intelligent healthcare assistance.
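A minimal sketch of the role-classification idea, assuming a tiny labelled transcript and scikit-learn; the real project fine-tunes decision trees and LLMs on clinical transcripts rather than a four-utterance TF-IDF example.

```python
# Illustrative sketch: classify who is speaking from utterance text alone.
# The toy data, TF-IDF features, and decision tree are assumptions for brevity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

utterances = [
    "Can you describe when the pain started?",        # clinician
    "It started about two weeks ago after I fell.",   # patient
    "I'm going to order an X-ray of that wrist.",     # clinician
    "Will the test hurt at all?",                      # patient
]
roles = ["clinician", "patient", "clinician", "patient"]

model = make_pipeline(TfidfVectorizer(), DecisionTreeClassifier(random_state=0))
model.fit(utterances, roles)

print(model.predict(["Any allergies to medications?"]))
```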

Paper

Team: Jean Park

This study analyzes multimodal integration in Video Question Answering (VidQA) datasets using a novel Modality Importance Score derived from Multi-modal Large Language Models (MLLMs). Our critical analysis of popular VidQA benchmarks reveals an unexpected prevalence of questions biased toward a single modality, lacking the complexity that would require genuine multimodal integration. This exposes limitations in current dataset designs, suggesting many existing questions may not effectively test models' ability to integrate cross-modal information. Our work guides the creation of more balanced datasets and improves assessment of models' multimodal reasoning capabilities, advancing the field of multimodal AI. 
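One simplified way to express the intuition behind a modality importance score is as the accuracy a model loses when a modality is withheld, sketched below. The paper's actual score is derived from MLLM behaviour and may be defined differently; this ablation-style formulation is an assumption for illustration.

```python
# Illustrative sketch: how much does withholding one modality (e.g. video)
# change an MLLM's question-answering accuracy? This is NOT the published
# Modality Importance Score, just a toy ablation-based stand-in.
def modality_importance(correct_full, correct_ablated):
    """Each argument is a list of 0/1 outcomes over the same questions."""
    acc_full = sum(correct_full) / len(correct_full)
    acc_ablated = sum(correct_ablated) / len(correct_ablated)
    return acc_full - acc_ablated

# Questions whose answers barely change without video score near zero,
# flagging them as effectively single-modality (text-biased).
print(modality_importance([1, 1, 0, 1], [1, 1, 0, 1]))  # 0.0  -> text-biased
print(modality_importance([1, 1, 1, 1], [0, 1, 0, 0]))  # 0.75 -> video matters
```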

Poster


Team: Amogh Ananda Rao, Patricia Chen, Sydney Pugh

PCP-Bot is a conversational AI tool designed to streamline pre-visit planning in primary care. By guiding patients through natural dialogue, it captures patient health histories and converts them into structured summaries for seamless EHR integration. In doing so, PCP-Bot can replace traditional questionnaires and enhance clinical workflows while maintaining an empathetic patient experience. Once developed, we plan to validate PCP-Bot in clinical settings, expand its use across different medical environments, and enhance its conversation strategies to improve provider-patient interactions. 
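A minimal sketch of the kind of structured summary PCP-Bot could hand off for EHR integration; the field names and example values are assumptions for illustration, not the tool's actual schema.

```python
# Illustrative sketch of a structured pre-visit summary produced from a
# patient dialogue. Field names and values are hypothetical.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class PreVisitSummary:
    chief_complaint: str
    history_of_present_illness: str
    medications: list[str] = field(default_factory=list)
    allergies: list[str] = field(default_factory=list)

summary = PreVisitSummary(
    chief_complaint="intermittent chest tightness",
    history_of_present_illness="Two weeks of exertional tightness, relieved by rest.",
    medications=["lisinopril 10 mg daily"],
    allergies=["penicillin"],
)

# Serialized summary, ready to attach to the visit record.
print(json.dumps(asdict(summary), indent=2))
```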


Past Projects

Team: Kuk Jang, Basam Alasaly

CLIPS uses crowdsourcing techniques to identify opportunities for improving ambulatory care with AI and advanced technology, and to surface broadly interesting observations from snippets of clinic visits. These data both provide opportunities for computational research and generate metadata about the relative roles of audio and video in identifying clinical insights.

Poster

Enhancing Computer-Patient-Provider Interaction Analysis through Neurosymbolic Reasoning and Large Language Models

Download 

Automating Responses to Patient Portal Messages Using Generative AI

Download 

Automating Sentiment and Emotion Recognition in Patient-Provider Interaction Transcripts for Improved Health Communication

Download

Silent Risks: Validating a Dementia-Prediction Tool on the MIMIC-IV Dataset

Download

Penn Engineering Undergraduate Senior Design Project

Current Team: Yanbo Feng, Kuk Jang

This project leverages body movement analysis from de-identified video datasets such as Dem@Care to predict dementia. Building on well-established research linking gait—traditionally focused on lower-body movement—to dementia, this work expands the scope to full-body motion. Using advanced computer vision methods to measure both motion and posture among patients, the project aims to enhance early detection and deepen our understanding of dementia-related motor changes.
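As an illustration of full-body motion measurement, the sketch below extracts a crude per-frame ankle-separation signal with MediaPipe Pose. The library choice and the feature itself are assumptions for illustration, not the project's actual computer-vision pipeline.

```python
# Illustrative sketch: estimate per-frame horizontal ankle separation from a
# de-identified video as a crude proxy for step width/length.
import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

def ankle_separation_series(video_path):
    """Return the left/right ankle horizontal separation for each frame."""
    series = []
    cap = cv2.VideoCapture(video_path)
    with mp_pose.Pose(static_image_mode=False) as pose:
        ok, frame = cap.read()
        while ok:
            results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.pose_landmarks:
                lm = results.pose_landmarks.landmark
                left = lm[mp_pose.PoseLandmark.LEFT_ANKLE.value]
                right = lm[mp_pose.PoseLandmark.RIGHT_ANKLE.value]
                series.append(abs(left.x - right.x))
            ok, frame = cap.read()
    cap.release()
    return series
```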

Report

Thank you to our funders