Cohort Identification and Feasibility

In the early trial design stage, it can be useful to determine how many subjects within Penn Medicine are available for a trial. This information can be included in a grant proposal or used when planning a recruitment strategy, and an external sponsor may require it during the site selection process. Below is a quick snapshot of, and more detail about, some of the resources available to support this.

For information on access to PennOmics, PennSeek, Penn G&P, Clarity, or Cosmos, please contact the Penn Data Analytics Center (DAC).

For research requests in which identifiable data is needed, IRB approval must be supplied and access granted to the appropriate tools. For most researchers, Slicer Dicer or Reporting Workbench is the recommended tool for this activity. The alternative is to use the DAC, which will serve as a broker for the identifiable data.

Slicer Dicer access for research users will be coming in August 2020. Permissible use cases for the system are detailed below.

Slicer Dicer Use Cases in Research

Preparatory to Research:

The preparatory to research provision permits covered entities to use or disclose protected health information for purposes that are in preparation for research, such as to aid study recruitment or to determine whether there is a viable cohort of patients at a site. The provision allows a researcher within the covered entity to identify prospective research participants for purposes of seeking their authorization to use or disclose their protected health information as part of a research study. However, it does not allow a researcher outside the covered entity to identify patients who may be eligible for a study; this must be done under a full HIPAA waiver, with appropriate agreements and IRB approval in place. It also does not allow the Penn researcher to actually reach out to those patients to recruit or enroll them in a trial without full IRB approval.

Slicer Dicer Use Cases:

  • Feasibility/Cohort Identification: A researcher wants to determine whether there is a large enough pool of available patients to enroll in a research study, or needs to provide a sponsor or funder with an estimate of the number of evaluable patients he or she would be able to enroll. Slicer Dicer can be used as a self-service tool that searches all PennChart data to provide counts of eligible patients.
    • Anyone with PennChart access who has taken Slicer Dicer online training can perform this function.
    • Under this use case:
      • A researcher may not access identifiers.
      • Record-level data may be accessed, but the following will be excluded: HIV data, substance abuse information, and behavioral health encounter information.
  • Preparatory to Research: A researcher can access the PHI of potential research participants to prepare a recruitment strategy, but may not actively reach out to patients until the study has been approved.
    • Under this use case:
      • A researcher may not access name or SSN, but contact information can be viewed.
      • Record-level data may be accessed, but the following will be excluded: HIV data, substance abuse information, and behavioral health encounter information.
    • The query can be saved and rerun by the team so that patient contact information can be obtained once IRB approval is in place.
  • IRB Approved Protocol Use Cases: In all of these cases, Slicer Dicer may be used to conduct the research or for the purposes of research operations. A valid IRB number must be entered before accessing the data.
    • Public health research
    • Quality improvement research
    • Retrospective review or biospecimen studies with necessary identifiers when consent and HIPAA authorization are waived
    • Prospective recruitment for a clinical trial
      • Patients can be contacted through IRB-approved messaging via an IRB-approved application (MPM, phone call, etc.).

For the two cases that require IRB approval, the following must be done:

  • The IRB application must describe the use of Slicer Dicer and the identifiers to be obtained.
  • At least one member of the research team must have Slicer Dicer access and have completed training. Training and access will be confirmed by the IRB.
  • The IRB will confirm that the PHI meets the minimum necessary standard.
  • The following data cannot be accessed through the application: HIV status, substance abuse information, and behavioral health encounters.
  • Data will be downloaded onto a Penn Medicine device or stored in a secure location.

Research Cohort Exploration and Data Analytics Tools: TriNetX

Where does TriNetX data come from?

TriNetX data comes mainly from healthcare organizations (HCOs) around the globe, ranging from specialty clinics to large academic medical centers. HCOs start by providing data typically found in a structured format (e.g., Diagnoses, Procedures, Medications, Labs, and Vitals) in their electronic health record (EHR) system. From there, HCOs can opt into sharing additional data not typically found in their EHR, such as cancer registry data, genomics data, and data found in notes (extracted via natural language processing).

Availability of data can vary by institution or region. For example, nearly all US HCOs provide four or five of the main data types (Diagnoses, Procedures, Medications, Labs, and Vitals), but Procedures and Medications might not be as readily available to ingest from HCOs outside the US.

See this link for more details (requires a TriNetX login):

How does TriNetX map the data?

As part of onboarding an HCO, their data is mapped to a set of standard terminologies. Demographics data (e.g., race and ethnicity) are mapped to HL7 administrative standards. Diagnoses are represented by ICD-9-CM and ICD-10-CM. Procedures are represented by ICD-9-CM, ICD-10-PCS and CPT. Medications are mapped to RxNorm ingredients. Laboratory test results and vital signs are mapped to LOINC. Molecular genomics data conforms to HGNC for gene naming and HGVS for variant descriptions.

The TriNetX Master Terminology also includes lab roll-ups and derived facts. To make common labs easier to find and use, LOINC codes for the most frequent labs are rolled up to a clinically significant level. For example, the lab TNX:LAB:9029 Sodium [Moles/volume] in Serum, Plasma or Blood corresponds to 2947-0 Sodium [Moles/volume] in Blood and 2951-2 Sodium [Moles/volume] in Serum or Plasma.
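
One way to picture a roll-up is as a lookup from a single TriNetX term to the LOINC codes it covers. The sketch below is illustrative only: the dictionary `LAB_ROLLUPS` and the function `matches_rollup` are hypothetical names, not part of any TriNetX API, and only the sodium codes quoted above come from the text.

```python
# Illustrative only: a roll-up modeled as a mapping from one TriNetX lab term
# to the LOINC codes it groups. These names are hypothetical, not a TriNetX API.
LAB_ROLLUPS = {
    "TNX:LAB:9029": {  # Sodium [Moles/volume] in Serum, Plasma or Blood
        "2947-0",      # Sodium [Moles/volume] in Blood
        "2951-2",      # Sodium [Moles/volume] in Serum or Plasma
    },
}


def matches_rollup(rollup_code: str, loinc_code: str) -> bool:
    """Return True if a source LOINC code falls under the given roll-up term."""
    return loinc_code in LAB_ROLLUPS.get(rollup_code, set())
```

Under this model, a query on TNX:LAB:9029 would pick up results recorded under either sodium LOINC code, which is why the rolled-up term is usually the more convenient one to search on.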

Examples of derived and calculated facts include:

  • The Oncology Treatments hierarchy identifies patients who have received radiation, chemotherapy, targeted therapy, hormone therapy, and stem cell transplants.
  • Chemotherapy Lines of Treatment identifies patients who received anywhere from 1 to 5 lines of chemotherapy.
  • Glomerular Filtration Rate (GFR) is calculated from serum creatinine and other information according to the MDRD, CKD-EPI, and Schwartz formulas.
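
To show what a calculated fact such as GFR involves, here is a minimal sketch of one of the named formulas. It assumes the 2021 CKD-EPI creatinine equation; which revision of each formula TriNetX applies is not stated here, and the function name `egfr_ckd_epi_2021` is hypothetical.

```python
def egfr_ckd_epi_2021(serum_creatinine_mg_dl: float, age_years: float, female: bool) -> float:
    """Estimate GFR in mL/min/1.73 m^2 with the 2021 CKD-EPI creatinine equation.

    Assumption: this is one published revision of CKD-EPI, used for illustration;
    it is not confirmed to be the variant TriNetX uses.
    """
    if serum_creatinine_mg_dl <= 0:
        raise ValueError("serum creatinine must be positive")

    kappa = 0.7 if female else 0.9          # sex-specific creatinine threshold
    alpha = -0.241 if female else -0.302    # exponent applied below the threshold
    ratio = serum_creatinine_mg_dl / kappa

    egfr = (
        142.0
        * min(ratio, 1.0) ** alpha
        * max(ratio, 1.0) ** -1.200
        * 0.9938 ** age_years
    )
    return egfr * 1.012 if female else egfr
```

The inputs are serum creatinine in mg/dL, age in years, and sex, which matches the "serum creatinine and other information" described in the bullet above.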

Are the dates in TriNetX shifted?

A subset of the HCOs contributing data to TriNetX shift dates by between 1 and 365 days in either direction at the level of the patient record. HCOs implement this shift before sharing the data with TriNetX. Within a patient’s record, all dates are shifted by the same amount, so the intervals between events are preserved, but it’s important to take date shifting into account if your analysis examines seasonality.
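
The effect of a patient-level shift can be illustrated with a short sketch. This is not how any HCO actually implements the shift; the function `shift_patient_dates`, the example dates, and the offset are all hypothetical.

```python
from datetime import date, timedelta


def shift_patient_dates(event_dates: list[date], offset_days: int) -> list[date]:
    """Apply one patient-level offset to every date in a patient's record."""
    if not 1 <= abs(offset_days) <= 365:
        raise ValueError("offset must be 1 to 365 days in either direction")
    return [d + timedelta(days=offset_days) for d in event_dates]


# Hypothetical record: a diagnosis followed 30 days later by a lab result.
original = [date(2021, 12, 20), date(2022, 1, 19)]
shifted = shift_patient_dates(original, offset_days=-200)

# The interval between the two events survives the shift...
interval_preserved = (shifted[1] - shifted[0]) == (original[1] - original[0])

# ...but the calendar months, and therefore the apparent season, do not.
months_before = [d.month for d in original]  # December, January
months_after = [d.month for d in shifted]    # June, July
```

In this example a winter diagnosis appears to occur in summer after shifting, while time-to-event measures are unaffected. This is why analyses of seasonality need care on date-shifted networks, but duration-based analyses generally do not.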

What are the different networks available, and what are the advantages of each?

The TriNetX networks refer to different patient populations you can query. The network you’re querying is shown near the upper left corner when you’re in Query Builder. Networks have different functionalities that you should consider when building your analysis.

The following networks are available to Penn Medicine as an organization. If you do not see a network in the network dropdown in Query Builder, contact your admin to get access.

  • Penn Medicine: This network contains patients seen across the Penn Medicine system. Data is fed from Epic into OMOP, which feeds into TriNetX. Dates are not shifted on this network. A unique characteristic of this network is that it can be used for cohort discovery, including re-identifying patients for chart reviews or for recruitment into trials. If you need to re-identify patients, contact the Data Analytics Center (DAC) about the Export Patient IDs process. Please note that patient re-identification requires IRB approval.
  • Research: The Research network is an anonymized network of participating HCOs across the globe, although more than 90% of the patients were from the USA as of 2022. A key feature of this network is that researchers can download de-identified datasets, so all HCOs on the network have agreed to contribute their data to the network in exchange for access.
  • Linked: Linked lets researchers analyze more complete and more longitudinal patient data by integrating EHR data from HCOs with data from medical claims, pharmacy claims, and mortality databases. As with Research, HCOs must opt in to link their patients, so this network is smaller but continuously growing. All patients on Linked have data from the EHR and from at least one of the third-party sources TriNetX has licensed. As of March 2022, the third-party sources include a closed claims database and multiple sources of additional mortality data. Advantages of this network include the best longitudinal view of patients compared to other networks, no date shifting, the ability to query by continuous enrollment, and the ability to download datasets (which include cost data).
  • Regional Collaborative Networks: There are five regional collaborative networks for HCOs to anonymously pool their de-identified data. The networks are US, EMEA, LATAM, APAC, and Global. The Global Collaborative Network will be the largest network to query across TriNetX, and you can use the other networks if you’re interested in specific regions. Downloads are not allowed on these networks at this time.
  • Diamond: Diamond is a third-party dataset of open claims data across the US. This data is not linked to TriNetX HCOs. You may wish to use this network because its datasets contain cost data and 3-digit zip codes. However, Diamond is no longer being refreshed and only contains data up to April 2020.
  • COVID-19 Research Network: TriNetX created the COVID-19 Research Network in April 2020 so that HCOs across the globe could anonymously pool their data for in-platform analyses. At that time, Research was the only network that included HCOs beyond an organization's own, but not all HCOs could join Research because of its dataset download functionality. The regional collaborative networks now offer the same functionality and include more HCOs. It is recommended to use the regional collaborative networks instead of the COVID-19 Research Network, unless you have previously used the COVID-19 Research Network for analyses and want to query the same network.