The past three years have been some of the most exciting of my entire career. With the support of Penn Medicine leadership, PSOM users, and the many information systems employees in the School of Medicine, we have created a consolidated organization focused on the IT needs of nearly the entire school. The hard work of the PMACS team has enabled advances and investments in capabilities across all missions that are unmatched by most of our peer organizations. We have improved computing infrastructure, system reliability, recoverability, and security, and we have expanded the workforce. We have also won awards and achieved a higher level of integration with UPHS in support of Precision Medicine and other clinical research initiatives.
But as is always the case with information technology, our work is not finished. In fact, the accomplishments we have made to date may look easy when compared with the challenges ahead. Two of the major challenges relate to accessing the valuable information we have already collected. Two large categories of information are still too difficult for many to effectively use: clinical research registries and unstructured data.
Clinical research registries are created by research teams as a means to gather and link disparate information from multiple sources about the research subjects involved in trials or studies. Some examples include the tumor registry, the Penn Medicine bio-bank, the many Translational Centers of Excellence registries, etc. The data structures and ontologies used in these registries may not be based on industry standards or common terminologies. This results in many islands of potentially similar data that cannot be easily linked or consolidated. Almost all of the data is manually abstracted and entered into the registries by medically qualified staff. More efficient approaches that use the Epic EMR, automated data feeds and computer assisted data abstraction need to be broadly deployed.
Unstructured data is primarily textual data, although image, audio, and video data also fall into this category. While PennSeek has enabled much faster and easier access to millions of textual documents, reports, notes, etc. within the health system, there is still a need to go beyond the powerful search capabilities in PennSeek. To get the most value from unstructured text, we must programmatically parse it, or teach a computer to “read” it, and derive discrete data items from it. This capability is commonly known as Natural Language Processing (NLP) and is a very challenging field of information technology. If successful, the derived data items can then be stored in a database, each with an associated confidence factor. The data and associated confidence factors can then be queried using traditional methods in combination with other discrete data to provide additional value to users.
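As a rough illustration of the last step, the sketch below stores a few NLP-derived items with confidence factors and queries them together with ordinary discrete data. The table names, column names, sample values, and the 0.8 threshold are all assumptions made for this example; they do not describe any Penn Medicine database.

```python
import sqlite3

# Hypothetical schema: every table, column, and value here is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    birth_year INTEGER              -- ordinary discrete data
);
CREATE TABLE derived_item (
    patient_id INTEGER REFERENCES patient(patient_id),
    item_name  TEXT,                -- what the NLP step extracted
    item_value TEXT,
    confidence REAL                 -- 0.0 to 1.0, assigned by the NLP step
);
""")
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [(1, 1950), (2, 1982), (3, 1947)])
conn.executemany("INSERT INTO derived_item VALUES (?, ?, ?, ?)", [
    (1, "smoking_status", "former", 0.93),
    (2, "smoking_status", "former", 0.41),
    (3, "smoking_status", "never", 0.88),
])


def patients_with(conn, item_name, item_value, min_confidence, born_before):
    """Combine a derived item (filtered by confidence) with a discrete field."""
    rows = conn.execute(
        """
        SELECT p.patient_id
        FROM patient AS p
        JOIN derived_item AS d ON d.patient_id = p.patient_id
        WHERE d.item_name = ?
          AND d.item_value = ?
          AND d.confidence >= ?
          AND p.birth_year < ?
        ORDER BY p.patient_id
        """,
        (item_name, item_value, min_confidence, born_before),
    ).fetchall()
    return [row[0] for row in rows]


# Former smokers born before 1960, keeping only high-confidence extractions.
matches = patients_with(conn, "smoking_status", "former", 0.8, 1960)
```

Keeping the confidence factor as its own column lets each user choose how strict to be, rather than fixing a single cutoff when the text is processed.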
Both of these categories of work will require new approaches that bring together faculty and IS experts from across Penn Medicine and our peer institutions. Our new Institute for Biomedical Informatics (IBI) will bring the faculty experts together. Penn Medicine Information Systems will supply the IS experts. The challenge is daunting but the rewards are significant. Just the kind of opportunity Penn Medicine teams thrive on.
- CPU Hours – 873,971
- Disk (TB) – 1,215
- Archive (TB) – 161.2
- Total Number of Users – 237
- Total Number of users – 39
- Total Samples – 310,312
ACC Velos Statistics
- Total Studies: 2,668
- Total Subjects: ~70,000
- Total Active Accounts: 475
CSG (Customer Support Group) Tickets for July
- Total Tickets: 1,785
At the request of the Office of Clinical Research (OCR), a new application called the Clinical Research Staff Portal and Registry (CRSPR) is currently being redesigned. The key drivers behind the redesign are to:
- Provide clinical researchers with a central online location where registered users can manage their own profiles and search for and connect with other users.
- Allow users to request access to other key OCR applications.
- Enable OCR administrators to manage user requests, certification statuses, and access to applications, generate reports about the population, and send group emails based on a number of admin-selected criteria.
At this stage of development of the CRSPR application, users can register themselves in the system by providing details about their education, certifications, expertise and interests. OCR administrators can manage these users as they apply. In addition, OCR administrators have the ability to send an email (with attachments) to a customized subset of clinical research coordinators based on criteria that users themselves have specified. For example, an administrator may send email to everyone in the system who is interested in mentoring, has a specific certification and who has a BS degree. This email functionality also allows for an email to be sent to all registered CRCs, in addition to being able to email a subset of users.
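The criteria-based emailing described above can be pictured as a simple filter over user profiles. The sketch below is only an illustration: the `Profile` fields, the `select_recipients` function, and the sample users are hypothetical and are not taken from the CRSPR application itself.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical profile fields, loosely based on what users enter at registration.
    email: str
    degree: str
    certifications: set = field(default_factory=set)
    interests: set = field(default_factory=set)


def select_recipients(profiles, degree=None, certification=None, interest=None):
    """Return the emails of profiles that satisfy every criterion supplied.

    A criterion left as None is not applied, so calling the function with no
    criteria selects every registered user.
    """
    selected = []
    for profile in profiles:
        if degree is not None and profile.degree != degree:
            continue
        if certification is not None and certification not in profile.certifications:
            continue
        if interest is not None and interest not in profile.interests:
            continue
        selected.append(profile.email)
    return selected


# Made-up sample users for illustration.
users = [
    Profile("a@example.org", "BS", {"CCRC"}, {"mentoring"}),
    Profile("b@example.org", "BS", set(), {"mentoring"}),
    Profile("c@example.org", "MS", {"CCRC"}, {"oncology"}),
]

subset = select_recipients(users, degree="BS", certification="CCRC", interest="mentoring")
everyone = select_recipients(users)
```

Treating an omitted criterion as "no filter" lets the same function cover both cases in the text: a message to a customized subset and a message to all registered coordinators.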
A full-featured search function is being designed that will allow users to find one another based on the criteria they have selected in their profiles. This will enable users in the community to connect with other clinical research coordinators who have similar interests or expertise in a specific area.
Future development will include the users' ability to request access to other OCR applications, including some applications only recently determined to be crucial to OCR's mission, like Epic research and Velos.
The goal is for the Office of Clinical Research to begin user acceptance testing at the end of July. PMACS' goal is to provide the search capability by the end of August to enable OCR to launch the first phase of the application to the general population after Labor Day, 2015.
Penn Medicine leadership created PMACS for a number of reasons, including:
- The need to eliminate redundant information technology (IT) infrastructure, cost, and personnel time by using easily provisioned resources such as virtual servers and centralized storage, administered by a centralized team that leverages economies of scale.
- A desire to continuously maintain secure and compliant IT, which is a top priority of the University’s Board of Trustees, President Gutmann, and Penn Medicine leadership. This goal has been especially prominent in the last year because of breaches at several peer institutions.
- Increasing integration between Penn Medicine’s clinical and research workflows.
Most importantly, Penn Medicine leadership formed PMACS to allow researchers to focus on their science, whether basic or computational, rather than focus on IT administration. By providing a secure IT infrastructure at reasonable cost, PMACS helps lower the total cost of ownership (TCO) for the individual researcher.
One of the “essential” services PMACS provides is data backup. A secure and sound backup infrastructure protects researchers against data loss. When PMACS was formed, we inherited several disparate backup solutions. This infrastructure was used to back up 300 TB of data over 200 servers. With so many different solutions, the management effort was becoming excessive, and we needed a new system that streamlined our backup process. We evaluated solutions based on six criteria:
- Backed-up data must be stored off-site
- A recovery operation should be able to restore a single file
- Multiple retention policies are required due to the nature of the data
- Two methods of encryption should be available: in-flight and at-rest
- The system must allow for expansion into a disaster recovery plan
- The solution should be a single product
Our efforts resulted in the decision to purchase CommVault’s Simpana product. To meet the off-site storage criterion, we recently completed installation of the new infrastructure at the Pennovation Works facility on Grays Ferry Avenue, formerly known as South Bank. The backup hardware is connected to our main servers in the Walnut St. Data Center via redundant 8 Gb/s fiber optic and 40 Gb/s Ethernet connections. The system uses a disk-to-disk-to-tape methodology, which minimizes the impact on the production infrastructure. In summary, we now have a single data backup solution that protects against data loss more efficiently.
The onboarding process is a multi-step endeavor that seeks to understand a lab’s current state, workflows, sample attributes, assays, and other related needs in order to determine how best to translate those needs into a configuration within the LIMS. The entire process, from initial analysis meeting to full go-live, varies significantly in both effort and duration based on each lab’s individual needs and current state. Some studies have taken as little as two weeks to go live, while others have taken three to four months when significant legacy data was involved.
The onboarding process follows these high-level steps:
- Initial client meeting/high-level needs analysis
- Lab walk-through/process investigation
- Data migration analysis
- Initial workflows flowcharted/client review
- Workflow refinement/data migration refinement
- Project charter signoff
- LIMS configuration completed
- Data migration staged
- Client high-level process/data migration approval
- Formal user acceptance testing (UAT) in TEST environment
- Formal regression testing in TEST environment
- Client configuration moved to PROD environment
- Lab/Study live
In some cases, the workflows and sample processes may be similar to templates other labs have needed, in which case these template workflows can be re-used/adjusted for the new lab/study, saving considerable time. Regardless of how the implementation is completed, each lab/study has security configurations in place so that the samples and associated data are viewable only to the users each lab/study identifies as authorized for each study.
Research labs/studies that are interested in using the LabVantage system should contact Jason Hughes, director of enterprise research applications, at email@example.com or 215-573-7079.
What is REDCap?
REDCap, originally developed at Vanderbilt University, is a secure web application used for building and managing online surveys and databases. The system provides audit trails for tracking data manipulation and user activity, as well as automated export procedures for seamless data downloads to Excel, PDF, and common statistical packages (SPSS, SAS, Stata). Also included are a built-in project calendar, a scheduling module, ad hoc reporting tools, and advanced features such as branching logic, file uploading, and calculated fields. REDCap has been in use in the Perelman School of Medicine and Abramson Cancer Center (ACC) for several years thanks to its introduction by Brian Wells and Dr. Mark Weiner. It is now pervasive across the entire Perelman School of Medicine.
Current Usage of REDCap
At Penn, REDCap currently has a total of 3,561 active projects: 2,347 defined as research, 535 as operational support, 553 as quality improvement, and 107 as “other.” To date, the REDCap database has logged over 7 million events. There are currently 4,525 user accounts.
Governance of REDCap
The governance process is led by Dr. J. Richard Landis from the Center for Clinical Epidemiology and Biostatistics (CCEB). It includes a REDCap governance board that helps encourage use, training, and operations for this installation. The REDCap board reports its requests and actions to the Research Computing Advisory Board (RCAB), which is part of the overall Information Technology (IT) governance structure of Penn Medicine.
Current REDCap Updates
In the 2014 winter issue of this newsletter we introduced upcoming improvements to the REDCap infrastructure and application environment. These improvements are currently underway, with new REDCap features and functionality being made available to the Penn research community. We have recently completed a series of major upgrades to bring the Penn REDCap application up to the latest stable release: LTS version 6.5.3. In the near future, we will perform another minor upgrade to bring our production REDCap version up to STD version 6.6, the latest update available to the Penn REDCap research community.
As part of this series of updates, a formal set of operating procedures and processes has been developed to ensure that we can keep REDCap at its most recent version going forward, with the operations and support teams working together to support this product.
Support of REDCap
Currently, the Clinical Research Computing Unit (CRCU) within the CCEB provides user support, help desk activities, and management of the REDCap development environment. In conjunction with CRCU's activities, PMACS manages REDCap's technology infrastructure.
The International Classification of Diseases (ICD) is a standard coding system that provides unique codes for substantially different health diagnoses and procedures. The Centers for Medicare & Medicaid Services (CMS) has mandated the transition from ICD-9 codes to ICD-10 codes, with a transition date of October 1, 2015.
The transition to ICD-10 codes is expected to benefit activities including:
- Measuring the quality, safety & efficacy of care
- Designing payment systems & processing claims
- Conducting research, epidemiological studies & clinical trials
- Improving clinical, financial & administrative performance
Researchers using clinical data sets from Penn Data Store (PDS) and/or PennOmics should be aware that these systems will start storing ICD-10 codes from the remediated UPHS clinical systems beginning on 10/1/15.
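For researchers whose data sets span the transition, one practical consequence is that diagnosis codes may need to be interpreted by date. The sketch below assumes, purely for illustration, that records dated on or after October 1, 2015 carry ICD-10 codes and earlier records carry ICD-9 codes; how PDS or PennOmics actually labels code systems is not described here, and the function names and sample records are hypothetical.

```python
from datetime import date

# Transition date stated in the article.
ICD10_CUTOVER = date(2015, 10, 1)


def expected_code_system(service_date):
    """Assumption: the code system follows the record's date relative to the cutover."""
    return "ICD-10" if service_date >= ICD10_CUTOVER else "ICD-9"


def split_by_code_system(records):
    """Group (service_date, code) pairs by the code system expected for that date."""
    groups = {"ICD-9": [], "ICD-10": []}
    for service_date, code in records:
        groups[expected_code_system(service_date)].append(code)
    return groups


# Made-up sample records: the same diagnosis coded before and after the cutover.
records = [
    (date(2015, 9, 30), "250.00"),  # ICD-9 code for type 2 diabetes
    (date(2015, 10, 1), "E11.9"),   # ICD-10 code for type 2 diabetes
]
groups = split_by_code_system(records)
```

A date-based rule like this is only a first approximation; where a data set carries an explicit code-system field, that field should take precedence.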
For more information on the ICD-10 transition:
- Please visit
- Send your questions to
- ICD-10 Readiness with Dr. C. William Hanson, III on Knowledge Link
Clinical Documenter Readiness with Dr. Hanson
- ICD-10 General Awareness eLearning on Knowledge Link
Phishing is a form of social engineering that attempts to ‘con’ users into giving up sensitive information, e.g. personally identifiable or financial information. Communications that seem to be from popular social web sites, financial institutions, well-known companies, IT support staff, etc. are used to lure people into giving up this information which could then be used to impersonate the individual for financial gain or some other reason.
Phishing is typically carried out through e-mail spoofing or instant messaging and it characteristically directs users to enter information at a fake website that looks very much like the legitimate website.
Signs That Someone Has Gone Phishing
The message contains misspelled words or poor grammar.
- Cybercriminals may get pictures, logos, URLs, and other graphics to look authentic but they don’t seem to be able to do a very good job with editing their communications.
The message is asking for personally identifiable information, such as credit card numbers, passwords, PINs or Social Security Numbers.
- A legitimate organization (e.g., a bank, a credit card company, the IRS, or IT support) will never ask you for this type of information in an e-mail.
The message instructs you to “Click here”.
- Never click on a link in a suspicious message; it could be used to spread malicious software. The domain name may look like the real one but may have been subtly altered. If you have reason to believe that the message is real, call the organization or open a web browser and type in the organization's web address yourself, but don’t click on the link in the message.
The message contains “threats” or statements that create a sense of urgency. For example: “Your account will be locked until you reply” or “We have noticed activity on your account from an unusual location.”
- Cybercriminals often use threats to get what they want. The threat could make you let your guard down and cause you to respond quickly without thinking.
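To make the "altered domain name" warning above concrete, the sketch below checks whether a link's host really belongs to a domain you trust. It is a simplified illustration using Python's standard `urllib.parse` module; the function name and the example domains are hypothetical, and a check like this does not replace the advice to avoid clicking suspicious links.

```python
from urllib.parse import urlsplit


def host_matches(url, trusted_domain):
    """Return True only if the URL's host is the trusted domain or a subdomain of it."""
    host = (urlsplit(url).hostname or "").lower()
    trusted = trusted_domain.lower()
    return host == trusted or host.endswith("." + trusted)


# Example links (made up). Only the first one really points at example.com.
checks = {
    "https://www.example.com/login": host_matches("https://www.example.com/login", "example.com"),
    "https://example.com.evil.test/login": host_matches("https://example.com.evil.test/login", "example.com"),
    "https://examp1e.com/login": host_matches("https://examp1e.com/login", "example.com"),
    "http://example.com@evil.test/": host_matches("http://example.com@evil.test/", "example.com"),
}
```

The last three links show common tricks: the trusted name used as a prefix of another domain, a look-alike character, and the trusted name placed before an "@" so that the real host is something else.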
Remember, you can usually trust your instincts: if you think a message is suspicious, it probably is, so just delete it. If you continue to have problems with phishing, please contact the PMACS Service Desk or Information Security for assistance.