PhD Curriculum - Biomedical Informatics Track

GCB students following the Biomedical Informatics (BMI) track are required to take 10 courses (10 credit units, CU), which include required courses, courses from a pool for specific requirements based on the student's background, and electives. These requirements provide a foundation of core knowledge across GCB students plus advanced training specifically along the axis of biomedical informatics.

  • 1 CU of Statistics Fundamentals
  • 1 CU of Fundamentals in Computational Biology and Algorithms
  • BMIN 5010: Introduction to Biomedical and Health Informatics
  • BMIN 5020: Database and Data Integration in Biomedical Research
  • BMIN 5200: Foundations of Artificial Intelligence in Health
  • BMIN 5210: Advanced Methods and Health Applications in Machine Learning
  • BMIN 5220: Natural Language Processing for Health
  • BMIN 6010: Learning Health Systems for Biomedical Informatics
  • 2 CUs of elective coursework

Example Schedule

Year 1
  • Fall: BMIN 5010, BMIN 5200, GCB 5330, Lab Rotation 1
  • Spring: BMIN 5020, BMIN 5220, Lab Rotations 2 and 3
  • Summer: Pre-dissertation Research

Year 2
  • Fall: BMIN 5210, GCB 5360, Pre-dissertation Research
  • Spring: BMIN 6010, Pre-dissertation Research
  • Summer: Candidacy Exam

Year 3 & beyond
  • Fall: Dissertation
  • Spring: Dissertation
  • Summer: Dissertation


Statistics Fundamentals

All GCB students are required to take a fundamental course in statistics. Typically, this will be a probability theory class we developed for our students (GCB 5330), but some students enter the program with the background to take more advanced statistical training. Courses in this pool that satisfy this requirement are:

  • GCB 5330: Statistics for Genomics and Biomedical Informatics
  • STAT 5100: Probability Theory
  • BSTA 6200: Probability I
  • Chair-approved course in advanced statistics (STAT 5000+) or advanced biostatistics (BSTA 6000+)

Fundamentals in Computational Biology and Algorithms

All GCB students are required to take a fundamental course in computational biology, algorithms, or programming. The course is chosen to fit each student's background, since matriculating students arrive with anywhere from very little to substantial algorithmic experience. Courses in this pool that satisfy this requirement are:

  • GCB 5360: Fundamentals of Computational Biology (not being offered Fall 2023)
  • BIOL 5535: Introduction to Computational Biology & Biological Modeling
  • BIOL 5860: Mathematical Modeling in Biology
  • CIS 5450: Big Data Analytics
  • CIS 5520: Advanced Programming
  • CIS 6770: Advanced Topics in Algorithms and Complexity

Additional alternatives may also satisfy this requirement, subject to Chair approval.

Lab Rotations (GCB 6990)

Because it is essential that candidates have a firm training in biology and experimental techniques, a crucial component of the GCB curriculum is research rotations in the laboratories of GCB-affiliated faculty. Students in this program are required to do three lab rotations as part of their training. The definition of a lab rotation is flexible and includes the possibility of rotations in a computer science lab (for example, the application of data mining techniques to biological information sources) or a course of directed reading and research in mathematics/statistics, but students should expect to spend at least 25 hours per week in their rotation lab. At least one rotation must be a wet-lab project, and one must be computational.

For PhD students, each rotation lasts 11 weeks, with the first rotation beginning towards the end of September, the second rotation beginning during the first week of January, and the third rotation beginning in late March and running until mid-June.

The dissertation laboratory is usually chosen from one (or more) of these rotation labs, although this is not required. To ensure breadth of the training experience, all laboratory assignments must be approved in advance by the GCB Chair or the Chair of the Advising Committee.

Pre-dissertation Research (GCB 8990)

Once the student has identified a thesis lab, generally during their first summer and no later than the end of their third semester, they begin graded lab work in their chosen dissertation laboratory. These lab projects serve as a foundation for the more formal dissertation research that follows the Candidacy Exam.

List of Electives

  • Any Statistics Fundamentals or Fundamentals in Computational Biology and Algorithms course
  • BIOM 5550: Regulation of the Genome
  • BMIN 5050: Precision Medicine and Health Policy
  • BMIN 5060: Standards and Clinical Terminology
  • BMIN 5070: Human Factors
  • BMIN 5090: Telehealth and Mobile Health
  • BMIN 5250: Introduction to Python Programming
  • CAMB 5500: Genetic Principles
  • CIS 5220: Deep Learning for Data Science
  • CIS 5370: Biomedical Image Analysis
  • EPID 7010: Introduction to Epidemiologic Research
  • GCB 5340: Experimental Genome Science
  • NURS 6510: Nursing Informatics
  • STAT 4310: Statistical Inference
  • STAT 5000: Applied Regression and Analysis of Variance
  • STAT 9270: Bayesian Statistics

This list is not exhaustive, and additional qualifying courses may be approved by the Advising Committee and GCB Chair. Students can visit the University Catalog to view available courses.

Descriptions of BMI Courses

BMIN 5010: Introduction to Biomedical and Health Informatics

This course is designed to provide a survey of the major topic areas in medical informatics, especially as they apply to clinical research. Through a series of lectures and demonstrations, students will learn about topics such as medical data standards, electronic health record systems, natural language processing, clinical research informatics, clinical decision support, imaging informatics, public health informatics, and consumer health informatics. It is recommended that students have basic familiarity with biomedical concepts.


BMIN 5020: Database and Data Integration in Biomedical Research

This course is intended to provide in-depth, practical exposure to the design, implementation, and use of databases in biomedical research, and to provide students with the skills needed to design and conduct a research project using primary and secondary data. Topics to be covered include: database architectures, data normalization, database implementation, client-server databases, concurrency, validation, Structured Query Language (SQL) programming, reporting, maintenance, and security. All examples will use problems or data from biomedical domains. MySQL will be used as the database platform for the course, although the principles apply generally to biomedical research and other relational databases.


BMIN 5200: Foundations of Artificial Intelligence in Health

As a subfield of computer science, artificial intelligence is often used interchangeably with the term ‘machine learning’, which itself is more accurately a subfield of AI dealing with the broader concept of inductive reasoning. However, a wealth of key prerequisite topics that focus on deductive reasoning align with the bulk of biomedical informatics applications being actively utilized today. These founding principles of AI and their intersection with biomedical informatics are the focus of this first course on artificial intelligence. The course is divided into modules that cover (1) introductory/background materials, (2) knowledge representation, (3) logic, (4) essentials of rule-based systems, (5) search, (6) information structure and inference, and (7) special topics. These topics offer a global foundation for branches of AI application and research in biomedical domains, including concepts that will later support a deeper understanding of inductive reasoning and machine learning. In a practical sense, this course focuses on how biomedical data can be organized, represented, interpreted, searched, and applied in order to derive knowledge, make decisions, and ultimately make predictions while avoiding bias. It is expected that all students will be somewhat familiar with basic biomedical concepts and terminology, and with statistics. Additionally, students must be familiar with basic computer programming concepts including data structures, control flow, and I/O. It is recommended, but not required, that students have taken Introduction to Biomedical and Health Informatics (BMIN 5010), Data Science for Biomedical Informatics (BMIN 5030), and a programming course (any language). No previous exposure to artificial intelligence is assumed.


BMIN 5210: Advanced Methods and Health Applications in Machine Learning

Machine learning studies how computers learn from data and has enormous potential to impact biomedical research and applications. This course will cover fundamental topics in machine learning with an application focus on biomedical informatics. Specifically, the course will cover: supervised learning methods such as linear regression, logistic regression, nearest neighbors, support vector machines, decision trees and random forests; unsupervised learning topics such as clustering and dimensionality reduction; neural networks and deep learning methods for supervised or unsupervised learning tasks, including Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Auto Encoder (AE), Generative Adversarial Network (GAN), and Graph Neural Network (GNN); and the applications of these machine learning techniques to various biomedical informatics problems via analyzing imaging, biomarker, electronic health record, clinical and/or other biomedical data. The precise topics may vary from year to year based on student interest and developments in the field. Students are required to have completed BMIN 5250 (Introduction to Python Programming) or have equivalent programming experience. It is recommended that students have basic knowledge of data analysis and biomedical research. Basic knowledge of machine learning, linear algebra, statistics and probability is preferred.


BMIN 5220: Natural Language Processing for Health

The growing volume of unstructured health-related data presents unparalleled challenges and opportunities for informaticians, clinicians, epidemiologists and other public health researchers who seek to mine the rich information "locked" within free text. Clinical records, social media, and published literature are all sources designed for human eyes, but not necessarily for automatic processing. In this class, we will survey the most recent natural language processing methods used for identifying and classifying information present in these sources. The class introduces health language processing, that is, the fundamental principles and methods of both natural language processing and machine learning and how they are currently applied in the biomedical domain. The class will focus on real problems in the context of health research where data are inherently biased, e.g., noisy, missing, or extremely imbalanced (where instances of interest are rare in the data). Methods for addressing these biases, such as text normalization, rule-based systems, machine learning (supervised, unsupervised, active learning), deep learning, and large language models, will be discussed. In-class lectures will most often be taught using Jupyter notebooks, with guest speakers presenting how an NLP/ML method was used to solve a driving biomedical use case. This course requires proficiency in Python programming, demonstrated through examination.


BMIN 6010: Learning Health Systems for Biomedical Informatics

This course is currently under development and will be offered beginning in Fall 2024.