As HSSC expands its focus to include a more implementation-centered approach to improve patient care and population health (see “HSSC to develop new strategic plan”), the organization is making tremendous progress in establishing a complex and powerful biomedical informatics infrastructure that will be used by health researchers throughout the state. This infrastructure underpins HSSC’s efforts to improve health care through the use of data-driven, evidence-based research.
A central aspect of this infrastructure is the clinical data warehouse (CDW). The CDW comprises a range of components, each of which will play a crucial role in managing, organizing, and providing useful access to the patient data that is generated by HSSC’s collaborating hospitals.
Back in early 2010, HSSC’s hospitals signed a memorandum of understanding agreeing to send their data to the CDW, but the parties are still working out precisely how researchers and other health professionals will be allowed to make use of that data, which for now is simply being stored but not accessed. That’s where a data collaboration agreement (DCA) will come into play.
“This is where the rubber hits the road,” says HSSC President and CEO Jay Moskowitz. “It’s easy to send the data to the warehouse, but when we begin to unravel it, to use it for analytics, all these questions will come up regarding things such as intellectual property, liability, data sharing, and so on.”
HSSC Chief Medical Information Officer Iain Sanderson further explains the complicated issues involved in making use of patient data.
“On one hand, there are areas that are regulated by the government, laws such as the Health Insurance Portability and Accountability Act (HIPAA), the laws around informed consent, and so on,” says Sanderson. “That’s what governs who can disclose what for research purposes. But beyond that, there are also the rules that a collaborative like HSSC would apply to its own members to determine who should see data, and what they can do with it. How do we play nice with the data between each other?”
The memorandum of understanding that the hospitals agreed to in 2010 covers basic items such as sending data and what would happen in the case of a data breach. “That MOU has allowed us to build the system, but it does not allow us to do anything with it,” says Sanderson. “That’s where we are right now.”
DCA represents the “true meaning” of HSSC
A DCA would include three main components: (1) a formal research protocol that would govern the use of the CDW and other aspects of HSSC’s informatics infrastructure; (2) the rules that would govern researchers and others who access the data within the system (each user would sign a data use agreement); and (3) an arrangement by which HSSC can collect data for the CDW without specific consent from every patient.
The first part, the protocol, has already been approved by the institutional review board (a body that oversees research in order to protect the rights and well-being of human subjects) at each HSSC institution. Each institution has also given its approval for HSSC to collect data in the CDW without consent.
“It would be a major barrier to us if we had to ask every single patient whether or not they would give their consent to putting data into our system. We couldn’t possibly ask everybody,” says Sanderson.
Personal identifiers are removed from data in the CDW that is accessed by researchers, and Sanderson emphasizes that researchers using the CDW still have to follow all the rules of the research protocol and their institutional review board governing consent.
The part of the DCA that deals with rules governing the actual use of the system will be the “crown jewel” of the agreement, according to Sanderson.
“What are the rules for joining the club, what are the rules for leaving the club, what happens in the case of a breach, what are the liabilities associated with HSSC . . . who should have access to data? Should it be physicians? Should it be research nurses? What about medical students?” says Sanderson. “[That will all be] determined by the DCA.”
A draft agreement is currently being circulated among HSSC’s institutions for final edits and, ultimately, approval. Sanderson notes that hammering out the details of such an agreement is challenging, and for good reason.
“These hospitals are sharing one of their most precious assets,” he says. Even if they decide to quit the arrangement, they’ll have to leave their data in the system for a pre-determined number of years.
“It will be a very groundbreaking agreement,” he says, one which is central to “the true meaning of HSSC.”
The patient data in the CDW, which combines a person’s records from multiple hospitals in a single database and makes them accessible for research purposes, is “what makes our ability to add value to patient care so powerful . . . Without the DCA, we’re no better than just a single hospital and its data. The real vision of HSSC as a collaborative is enabled by this agreement.”
More on the Clinical Data Warehouse
The first component of the CDW is a message engine, which manages streams of health care data as they are sent to the CDW in real time from all of HSSC’s collaborating hospitals. For example, when a patient is admitted to a hospital, that event is sent as a message to the CDW, which then incorporates it as a piece of data, one of many thousands of messages generated by HSSC’s hospitals every hour.
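For readers who think in code, the flow described above can be pictured roughly as follows. This is only an illustrative sketch, not HSSC's actual message engine: the `AdmissionMessage` fields, the `drain` function, the hospital names, and the timestamps are all hypothetical, and a real feed would carry far more detail and run continuously.

```python
from dataclasses import dataclass
from datetime import datetime
from queue import Queue


@dataclass(frozen=True)
class AdmissionMessage:
    """Hypothetical message format, invented for this illustration."""
    hospital: str
    local_patient_id: str
    event: str
    sent_at: datetime


def drain(inbox: Queue, warehouse: list) -> int:
    """Move every waiting message into the warehouse; return how many moved."""
    moved = 0
    while not inbox.empty():
        warehouse.append(inbox.get())
        moved += 1
    return moved


# Two hospitals each report an admission (made-up example data).
inbox = Queue()
warehouse = []
inbox.put(AdmissionMessage("Hospital A", "A-1001", "admit", datetime(2012, 1, 5, 9, 30)))
inbox.put(AdmissionMessage("Hospital B", "B-2002", "admit", datetime(2012, 1, 5, 9, 31)))

stored = drain(inbox, warehouse)
print(stored, "messages stored")
```

The queue stands in for the incoming stream, and the list stands in for the warehouse; the point is simply that each hospital event arrives as its own small message and is filed as it comes in.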
The next component of the CDW is the Enterprise Master Patient Index (EMPI), which assigns a unique identifier to each patient who has data in the CDW, and ensures that each patient is accurately identified even if some of his or her personal information has changed since a previous hospital visit.
“Perhaps you got married, and your name changed,” says Sanderson. “But the system will know from the fact that you’re at the same address, or you have the same social security number, and that your first name is the same, that you are one and the same person, and not some other person.”
The EMPI uses sophisticated algorithms to correctly identify patients who could potentially be confused with others, such as twins or people who have the same name as a parent.
“Keeping a unique identifier for every individual in our system is essential if we’re going to have the best possible patient registry with the best possible data for research or patient care. We have to know who’s who,” says Sanderson.
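The matching idea Sanderson describes can be sketched in a few lines. This is a deliberately simplified illustration and not the EMPI's actual algorithm, which the article describes only as "sophisticated": the `Record` fields, the `same_person` rule, and the `PatientIndex` class are assumptions made for the example, and the Social Security numbers are obviously fake.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical registration record, invented for this illustration."""
    first_name: str
    last_name: str
    address: str
    ssn: Optional[str] = None


def same_person(a: Record, b: Record) -> bool:
    """Simplified rule: first names must match; then the SSN decides if both
    records have one, otherwise fall back to comparing addresses."""
    if a.first_name.lower() != b.first_name.lower():
        return False
    if a.ssn and b.ssn:
        return a.ssn == b.ssn
    return a.address.lower() == b.address.lower()


class PatientIndex:
    """Hands out one identifier per distinct person seen so far."""

    def __init__(self):
        self._known = []  # list of (patient_id, Record) pairs

    def identify(self, record: Record) -> int:
        for patient_id, known in self._known:
            if same_person(known, record):
                return patient_id
        patient_id = len(self._known) + 1
        self._known.append((patient_id, record))
        return patient_id


index = PatientIndex()
before = index.identify(Record("Ann", "Smith", "12 Oak St", "000-00-0001"))
after = index.identify(Record("Ann", "Jones", "12 Oak St", "000-00-0001"))  # married name
twin = index.identify(Record("Amy", "Smith", "12 Oak St", "000-00-0002"))   # same address

print(before == after, twin != before)
```

In this toy version, the patient who changed her last name keeps her identifier, while her twin at the same address gets a new one. A production index would weigh many more fields and handle typos and missing data.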
Once a piece of data has traveled through those first two components of the CDW, it gets loaded into a Data Trust, which is the part of the CDW where all the clinical data is stored. Data from the Data Trust can then be output to one or more “data marts,” which are specialized views or subsets of data in the Data Trust. One of those subsets is called i2b2, or Informatics for Integrating Biology and the Bedside. It’s a specialized data mart that supports research discovery and hypothesis-building. HSSC has adapted i2b2 from its home institution at Partners Healthcare in Boston, where it is the cornerstone of hundreds of millions of dollars of sponsored research per year.
Accessing the data through i2b2
If a researcher wants to access data on a group of patients to perform a study, she would use the i2b2 web application. Sanderson refers to the i2b2 application as a “front door” to the CDW that makes it easier for researchers to sort through the huge store of data contained in the CDW so that they can make meaningful use of it.
Specifically, a researcher uses the i2b2 technology to browse through the i2b2 “data mart,” searching, for example, for the number of diabetics under age 16 in the Upstate. The information in the i2b2 data mart has been scrubbed of any identifying information on individual patients, to protect their privacy.
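To make that example concrete, here is a rough sketch of a count query run against a scrubbed subset of data. It does not show how i2b2 itself is built or queried; the `TrustRecord` fields, the `to_data_mart` and `count_patients` functions, and the five sample patients are all made up for illustration, and real de-identification involves much more than dropping names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustRecord:
    """Hypothetical Data Trust row, invented for this illustration."""
    patient_id: int
    name: str
    age: int
    region: str
    diagnoses: frozenset


def to_data_mart(records):
    """Copy only non-identifying fields, leaving out names and patient IDs."""
    return [
        {"age": r.age, "region": r.region, "diagnoses": r.diagnoses}
        for r in records
    ]


def count_patients(mart, diagnosis, under_age, region):
    """Count rows with the diagnosis, younger than under_age, in the region."""
    return sum(
        1
        for row in mart
        if diagnosis in row["diagnoses"]
        and row["age"] < under_age
        and row["region"] == region
    )


# Made-up sample data.
trust = [
    TrustRecord(1, "Patient One", 12, "Upstate", frozenset({"diabetes"})),
    TrustRecord(2, "Patient Two", 15, "Upstate", frozenset({"asthma"})),
    TrustRecord(3, "Patient Three", 40, "Upstate", frozenset({"diabetes"})),
    TrustRecord(4, "Patient Four", 9, "Lowcountry", frozenset({"diabetes"})),
    TrustRecord(5, "Patient Five", 14, "Upstate", frozenset({"diabetes", "asthma"})),
]

mart = to_data_mart(trust)
count = count_patients(mart, "diabetes", under_age=16, region="Upstate")
print(count)  # 2 in this sample: patients one and five
```

The researcher sees only the count and the scrubbed rows; the names and identifiers stay behind in the Data Trust.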