Informatics
The Informatics division supports the substantial operational and data science needs of I3H and its collaborators by developing and maintaining the institute’s powerful computing platform. Using the highest security protocols and cutting-edge technologies, including the latest cloud-based services and machine-learning techniques, we collect, integrate, organize, analyze, and sort large-scale, complex, heterogeneous data, ensuring that high-quality, curated information reaches the right team at the right time.
As the volume of biomedical data rapidly increases, employing artificial intelligence (AI) to help interpret data and translate findings into clinical practice becomes increasingly necessary. As AI’s role in medicine evolves, I3H is evolving with it, vetting AI applications, expanding computational infrastructure, and innovating informatics approaches to amplify the efforts of individual labs and collaborations. Enhanced integration of AI into our current operations will further accelerate I3H’s translation of research into clinical implementation.
Our Services
The Informatics division provides a range of integrated services, such as:
- Scientific data management and integration
- Private data collaboration
- Data analytics pipeline infrastructure
- Automated report generation
- Data publishing
Charged with minimizing friction in scientific and operational data workflows, we strive to increase efficiency at every step.
Scientific Data Management and Integration
- Ability to ingest large volumes of complex data spanning organizations, platforms, and disciplines
- Management of large-scale scientific and clinical information, including multimodal immune profiling data obtained through fluorescence- or mass-spectrometry-based cytometry, serum and plasma proteomics, serology, and other types of single-cell omics
- Organization of datasets and metadata schemas
- Seamless querying of metadata and files
- Secure tracking of data use and access
Private Data Collaboration
Selective data sharing within labs, consortia, and institutes using industry-standard security mechanisms
Data Analytics Pipeline Infrastructure
- Creation of analytics pipelines across events within the data ecosystem, at scale
- Creation of custom-tailored pipelines to meet specific needs
- Mechanisms that allow users to register their own compute nodes
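As a rough illustration of the ideas in this list, the sketch below shows one way an event-driven pipeline registry could work: users register nodes, attach pipelines to event types, and each event is dispatched to the matching pipelines. It is a toy, in-memory example and not the platform's actual API. Every name in it (PipelineRegistry, register_node, on, dispatch, the "file_uploaded" event) is hypothetical, and reading "across events" as "triggered by events" is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Pipeline = Callable[[dict], dict]


@dataclass
class PipelineRegistry:
    """Toy in-memory registry. All names here are hypothetical."""

    nodes: List[str] = field(default_factory=list)
    handlers: Dict[str, List[Pipeline]] = field(default_factory=dict)

    def register_node(self, name: str) -> None:
        # Record a user-supplied node once, ignoring duplicates.
        if name not in self.nodes:
            self.nodes.append(name)

    def on(self, event_type: str, pipeline: Pipeline) -> None:
        # Attach a pipeline to an event type such as "file_uploaded".
        self.handlers.setdefault(event_type, []).append(pipeline)

    def dispatch(self, event_type: str, payload: dict) -> List[dict]:
        # Run every pipeline attached to the event, assigning nodes round-robin.
        if not self.nodes:
            raise RuntimeError("no node registered")
        results = []
        for i, pipeline in enumerate(self.handlers.get(event_type, [])):
            node = self.nodes[i % len(self.nodes)]
            results.append({"node": node, "result": pipeline(payload)})
        return results


if __name__ == "__main__":
    registry = PipelineRegistry()
    registry.register_node("lab-node-1")
    registry.on("file_uploaded", lambda p: {"n_rows": len(p["rows"])})
    print(registry.dispatch("file_uploaded", {"rows": [1, 2, 3]}))
```

In this sketch the pipelines run in-process and the node name is only a label; a real deployment would hand each job to the selected node for execution.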
Automated Report Generation
Ability to auto-generate reports, leveraging our pipeline infrastructure:
- Data reports
- Quality assurance reports
- Data insight reports
- Custom reports for your specific projects
Data Publishing
We assist in making data public in accordance with all required and recommended FAIR (findability, accessibility, interoperability, and reusability) principles of data sharing. Our underlying proprietary Pennsieve Data Management Platform, maintained by the University of Pennsylvania’s Wagenaar Lab, is an NIH-supported scientific data repository. Pennsieve is used by several NIH programs and initiatives, including SPARC, HEAL RE-JOIN, and HEAL PRECISION, and is listed on the NIH data sharing website.
Our data publishing capabilities include support for:
- Integrative data analysis: advanced univariate and multivariate statistical methods, network analysis, and clustering analysis
- Prediction models: machine learning, including deep learning techniques
- Collaboration: working closely with research and clinical project leaders and staff members to answer project-specific research questions
Technologies
Advanced Cloud-Based Platform
With infrastructure supported by the extensive resources of the Penn Institute for Biomedical Informatics, our Pennsieve Platform leverages the Amazon Web Services (AWS) Data Science Ecosystem — the most reliable, flexible, secure cloud-computing architecture available — to provide scalable, sustainable data storage, management, collaboration, integration, and analytics. The AWS technology also supports programmatic and web-based user access to all functionality and facilitates our development and use of machine learning for shared prediction models.
AI Assistance
Demonstrating the reliability and safety of AI methods is imperative for bridging the divide between theoretical or proof-of-concept methodologies and widespread clinical implementation. Penn’s newly created Center for AI-driven Translational Informatics (CATI), led by Dokyoon Kim, PhD, provides computational and organizational resources to advance translational AI research at Penn. CATI promotes collaboration, ensures scientific rigor, facilitates project success, and helps surmount the final hurdle: taking these AI-driven approaches into Penn Medicine clinics.
Security
Our data ecosystem employs the strongest security mechanisms available. Before you register with the platform, we require de-identification of all information intended for transfer. We can assist you with this aspect of data cleansing upon request.
All data is encrypted on disk and in transit, and the platform supports advanced mechanisms to grant access permissions selectively to users. In addition, all activity and data provenance are tracked to protect data integrity and to audit data use.
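For teams preparing data for transfer, the following minimal sketch shows one common pattern for the de-identification step described above: dropping direct identifiers and replacing the subject ID with a keyed hash. It is an illustration only, not the platform's procedure and not a complete or compliance-ready method. The field names, the DIRECT_IDENTIFIERS list, and both function names are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers. A real project would derive this
# from its own data dictionary and applicable regulations.
DIRECT_IDENTIFIERS = {"name", "mrn", "date_of_birth", "address", "phone", "email"}


def pseudonymize(subject_id: str, secret_key: bytes) -> str:
    """Return a stable pseudonym: a keyed SHA-256 hash, shortened to 16 hex chars."""
    digest = hmac.new(secret_key, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(record: dict, secret_key: bytes, id_field: str = "subject_id") -> dict:
    """Drop direct identifiers and swap the subject ID for a pseudonym."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != id_field
    }
    cleaned["subject_key"] = pseudonymize(str(record[id_field]), secret_key)
    return cleaned


if __name__ == "__main__":
    record = {"subject_id": "S001", "name": "Jane Doe", "cd4_count": 512}
    print(deidentify(record, secret_key=b"keep-this-key-off-the-platform"))
```

Using a keyed hash rather than a plain hash means the same subject maps to the same pseudonym across files, while anyone without the key cannot recompute the mapping from known IDs. The key should stay with the data owner.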
For More Information
General Informatics Questions
For general questions and to inquire about leveraging the I3H Informatics core to accelerate your project, please complete our contact form.
Pennsieve Platform
For more information about the Pennsieve Platform, please visit the Wagenaar Lab or contact Joost Wagenaar.
Integrative Analysis
For inquiries about integrative analysis, please visit the Kim Lab or contact Dokyoon Kim, Associate Director of Informatics.