Research Projects

From medical video de-identification to predicting dementia from patient body movements, the AI-4-AI Lab's projects center on using artificial intelligence and machine learning to enhance patient-provider communication in the clinical setting. Check out our current projects below!

Current Projects

Team: Sriharsha Mopidevi, Matthew Hill, Basam Alasaly, Rachel Wu

The REDUCE observatory compiles multimodal data—including videos, audio, annotations, and surveys—collected from clinical visits across diverse healthcare settings, such as Penn Medicine departments, the Penn Medicine Clinical Simulation Center, and Geisinger. By providing privacy-compliant access to previously hard-to-obtain clinical data, the repository supports both medical and non-medical researchers in advancing healthcare studies and innovation.

Team: Sriharsha Mopidevi

MedVidDeID is a six-stage, modular, and scalable pipeline designed to de-identify personal health information (PHI) in raw audio and video recordings of clinical visits. Using a combination of natural language processing (NLP), computer vision, and related algorithms, the de-identification process removes PHI from transcripts, audio, and video (including faces and/or exposed body parts). This ensures that the data can be securely accessed and used by researchers across institutions for analysis and research.
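
For illustration only, the snippet below sketches what a single video-redaction stage of such a pipeline might look like: detecting faces with OpenCV's stock Haar-cascade model and blurring them frame by frame. It is a minimal stand-in, not MedVidDeID's actual implementation, and the file paths and blur settings are placeholders.

```python
import cv2

def blur_faces(in_path: str, out_path: str) -> None:
    """Blur detected faces in a video, frame by frame (illustrative sketch)."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, fw, fh) in detector.detectMultiScale(gray, 1.1, 5):
            roi = frame[y:y + fh, x:x + fw]
            # Heavy Gaussian blur renders the detected face unrecognizable.
            frame[y:y + fh, x:x + fw] = cv2.GaussianBlur(roi, (51, 51), 0)
        out.write(frame)

    cap.release()
    out.release()
```

In the full pipeline, analogous stages handle transcript redaction (NLP-based PHI detection) and audio scrubbing.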

Team: Kuk Jang, Basam Alasaly

The Book Club is a national, multidisciplinary focus group that reviews actual clinical videos and EHR artifacts using a goal-free methodology. Meeting monthly, the group observes visits from its members' varied perspectives, questioning the status quo and proposing alternatives that either integrate technology into existing workflows or change workflows by leveraging technology.

Team: Sameer Bhatti, Kuk Jin Jang, Chimezie Maduno, Alexander Budko

The ACAM project focuses on enhancing clinical interactions by facilitating real-time agenda setting between clinicians and patients. Its goals are to track the discussion and reduce the chance that topics raised during the visit go unaddressed or arrive unexpectedly. By leveraging large language models (LLMs), ACAM aims to improve the efficiency and effectiveness of patient-provider interactions, ensuring that critical issues are addressed in every conversation.
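
As a rough illustration of the agenda-tracking idea (not ACAM's actual LLM-based method), one could mark agenda items as addressed whenever an utterance in the visit is semantically close to them. The sketch below uses sentence embeddings; the model name, threshold, and example utterances are arbitrary choices.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative embedding model; ACAM's actual LLM-based pipeline is not shown here.
model = SentenceTransformer("all-MiniLM-L6-v2")

def track_agenda(agenda_items, transcript_turns, threshold=0.5):
    """Mark each agenda item as addressed if any visit utterance is semantically close."""
    item_emb = model.encode(agenda_items, convert_to_tensor=True)
    turn_emb = model.encode(transcript_turns, convert_to_tensor=True)
    sims = util.cos_sim(item_emb, turn_emb)  # shape: [agenda items x transcript turns]
    return {
        item: bool(sims[i].max() >= threshold)
        for i, item in enumerate(agenda_items)
    }

status = track_agenda(
    ["knee pain follow-up", "medication refill", "flu shot"],
    ["How has the knee been since we adjusted the brace?",
     "I'd also like to renew my blood pressure prescription."],
)
# Items still marked unaddressed (e.g., "flu shot") could be surfaced before the visit ends.
```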

Paper

Team: Matthew Hill, Basam Alasaly

Nuance Dragon Ambient eXperience (DAX) Copilot is an ambient scribing technology that transcribes clinical encounters in real time, aiming to streamline documentation tasks and alleviate provider workload. This set of projects aims to evaluate DAX's effect on documentation efficiency and patient-centered care by:

  • Examining how DAX influences provider-patient communication, including in-visit interactions, time allocation, electronic health record engagement, and satisfaction metrics.
  • Exploring methods to accurately extract medical concepts from clinical notes using natural language processing and large language model pipelines (a toy sketch of concept extraction follows this list).
  • Investigating content quality and detail in DAX-produced documentation, focusing on note bloat, standardized quality metrics, and clinical value.
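
As a toy illustration of concept extraction from notes (the actual evaluations use NLP and LLM pipelines, not this), a simple lexicon-matching pass over note text might look like the following; the lexicon and example note are made up.

```python
import re

# Made-up mini-lexicon mapping surface forms to canonical concepts.
LEXICON = {
    r"\bhypertension\b|\bhigh blood pressure\b": "Hypertension",
    r"\btype 2 diabetes\b|\bt2dm\b": "Type 2 diabetes mellitus",
    r"\bmetformin\b": "Metformin",
}

def extract_concepts(note: str) -> set[str]:
    """Return canonical concepts whose surface forms appear in the note."""
    found = set()
    for pattern, concept in LEXICON.items():
        if re.search(pattern, note, flags=re.IGNORECASE):
            found.add(concept)
    return found

note = "Patient with high blood pressure and T2DM, continues metformin 500 mg."
print(extract_concepts(note))
# {'Hypertension', 'Type 2 diabetes mellitus', 'Metformin'}
```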

Team: Jean Park

This study analyzes multimodal integration in Video Question Answering (VidQA) datasets using a novel Modality Importance Score derived from Multi-modal Large Language Models (MLLMs). Our critical analysis of popular VidQA benchmarks reveals an unexpected prevalence of questions biased toward a single modality, lacking the complexity that would require genuine multimodal integration. This exposes limitations in current dataset designs, suggesting many existing questions may not effectively test models' ability to integrate cross-modal information. Our work guides the creation of more balanced datasets and improves assessment of models' multimodal reasoning capabilities, advancing the field of multimodal AI. 
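
The score itself is defined in the paper; purely to illustrate the underlying intuition, a modality's importance for a question can be thought of as how much performance drops when that modality is withheld from the model, as in the hypothetical helper below (the numbers are toy values, not results).

```python
def modality_importance(acc_all_modalities: float, acc_modality_ablated: float) -> float:
    """Hypothetical illustration: performance drop when one modality is withheld.

    A value near zero suggests the questions can be answered without that
    modality (i.e., they are biased toward the remaining modalities); a large
    positive value suggests genuine reliance on it.
    """
    return acc_all_modalities - acc_modality_ablated

# Toy values only: if accuracy barely changes without video, the benchmark's
# questions are likely answerable from text/audio alone.
print(modality_importance(acc_all_modalities=0.82, acc_modality_ablated=0.80))
```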

Poster

Team: Yanbo Feng, Kuk Jang

This project leverages body movement analysis from de-identified video datasets such as Dem@Care to predict dementia. Building on well-established research linking gait—traditionally focused on lower-body movement—to dementia, this work expands the scope to full-body motion. Using advanced computer vision methods to measure both motion and posture among patients, the project aims to enhance early detection and deepen our understanding of dementia-related motor changes.
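
As a sketch of the kind of full-body features such an analysis might compute, assuming pose keypoints have already been extracted by an off-the-shelf estimator (the joint indices and feature choices below are illustrative, not the project's actual feature set):

```python
import numpy as np

def motion_features(keypoints: np.ndarray) -> dict:
    """Toy motion/posture features from a pose sequence of shape (frames, joints, 2).

    Joint indices assume a hypothetical skeleton layout (0 = nose, 11/12 = hips,
    15/16 = ankles); a real project would use its pose estimator's own layout.
    """
    hips = keypoints[:, [11, 12], :].mean(axis=1)        # mid-hip position per frame
    ankles = keypoints[:, [15, 16], :]                   # both ankles per frame
    step_width = np.abs(ankles[:, 0, 0] - ankles[:, 1, 0])
    hip_speed = np.linalg.norm(np.diff(hips, axis=0), axis=1)
    trunk_sway = keypoints[:, 0, 0] - hips[:, 0]         # nose vs. mid-hip, x-axis
    return {
        "mean_step_width": float(step_width.mean()),
        "mean_hip_speed": float(hip_speed.mean()),
        "trunk_sway_std": float(trunk_sway.std()),
    }

# Toy usage on random keypoints (300 frames, 17 joints, x/y coordinates).
print(motion_features(np.random.rand(300, 17, 2)))
```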

Report

Team: Sydney Pugh, Rachel Wu, Matthew Hill

Addressing the significant underdiagnosis of Alzheimer’s disease, the WATCH study aims to identify key diagnostic clues for assessing cognitive impairment risk. In collaboration with board-certified neurologists, this project harnesses large language models and multimodal systems to analyze patients' linguistic patterns, visual cues, and electronic health records (EHR). Ultimately, the project seeks to develop an AI-driven tool capable of real-time detection during clinical visits or natural speech, offering personalized follow-up recommendations based on assessment results. By enabling earlier diagnosis and intervention, the WATCH study has the potential to enhance patient outcomes and alleviate the burden of undetected cognitive decline.

Past Projects

Team: Kuk Jang, Basam Alasaly

CLIPS uses crowdsourcing techniques in which participants watch snippets of clinic visits to identify opportunities for improving ambulatory care with AI and advanced technology, as well as other broadly interesting observations. These data both create opportunities for computational research and generate metadata about the relative roles of audio and video in identifying clinical insights.

Poster

Team: Andrew Zolensky, Kuk Jin Jang, Basam Alasaly, Sriharsha Mopidevi

This project develops a model that assigns speaker roles to utterances in a real-time, inpatient clinical setting to improve live captioning and documentation. It explores the effectiveness of diarization—segmenting and grouping speech by speaker—while fine-tuning decision trees and large language models (LLMs) to infer speaker roles directly from semantic cues. By integrating transcription, segmentation, and role classification, the system enhances live captioning, real-time decision support, and automatic documentation. This work improves communication for hearing-impaired patients, streamlines clinical workflows, and advances intelligent healthcare assistance.
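
A toy version of the role-classification step, training a decision tree on lexical features of utterances (the real system is trained on annotated clinical transcripts and also explores LLMs), might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Made-up training utterances and roles; the real project uses annotated clinical transcripts.
utterances = [
    "Can you describe where the pain is?",
    "It started in my lower back last week.",
    "I'm going to order an X-ray and adjust your dosage.",
    "Will the new medication make me drowsy?",
]
roles = ["clinician", "patient", "clinician", "patient"]

# TF-IDF features over unigrams/bigrams feeding a shallow decision tree.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), DecisionTreeClassifier(max_depth=5))
clf.fit(utterances, roles)

# In deployment, each diarized segment would be classified as it is transcribed.
print(clf.predict(["Let's schedule a follow-up in two weeks."]))
```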

Enhancing Computer-Patient-Provider Interaction Analysis through Neurosymbolic Reasoning and Large Language Models

Download 

Automating Responses to Patient Portal Messages Using Generative AI

Download 

Automating Sentiment and Emotion Recognition in Patient-Provider Interaction Transcripts for Improved Health Communication

Download

Silent Risks: Validating a Dementia-Prediction Tool on the MIMIC-IV Dataset

Download

Thank you to our funders