Challenges in Clinical Data Management: Findings from the Tufts CSDD Impact Report

February 9, 2018

Derek Lawrence, Senior Clinical Data Manager, has 9 years of data management and analysis experience in the healthcare/pharmaceutical industry. Derek serves as Rho's Operational Service Leader in Clinical Data Management, an internal expert responsible for disseminating new technology, best practices, and processes.

The most recent Impact Report from the Tufts Center for the Study of Drug Development (CSDD) presents the results of a study of clinical data management practices and experiences at nearly 260 sponsor and CRO companies. At a high level, the findings include longer data management cycle times than those observed 10 years ago, delays in building clinical databases, an average of six applications supporting each clinical study, and a majority of companies reporting technical challenges in loading data into their primary electronic data capture (EDC) system.

These findings reflect the challenges those of us in clinical data management face given the current state of the clinical research industry and its shifting technology. EDC systems remain the primary method of data capture in clinical research, with 100% of sponsors and CROs reporting at least some usage, yet these systems struggle with the growing diversity of data sources. More and more clinical data are captured by new and novel applications (ePRO, wearable devices, etc.), and there is an increased capacity to work with imaging, genomic, and biomarker data. The resulting increases in data volume and velocity have created a disconnect with the EDC paradigm: data are often too large or too ill-formatted to import into the majority of EDC systems common to the industry. Loading data into these systems also imposes significant pre-study planning and technical support demands. With 77% of sponsors and CROs reporting barriers to effectively loading, cleaning, and using external data, this is an issue nearly everyone in clinical research confronts.

Related to these EDC integration issues are delays in database builds. While nearly half of build delays were attributed to protocol changes, just over 30% resulted from user acceptance testing (UAT) and database design functionality. Delays attributed to database design functionality were associated with a last-patient-last-visit (LPLV)-to-lock cycle time 39% longer than the overall average. While the Tufts study did not address this directly, it is reasonable to assume that difficulties with EDC system integration are a significant contributor to the reported database functionality issues: when loading data is already delayed, the standard data cleaning activities built into the EDC system, which must be completed before database lock, are almost certain to be delayed as well.

Clinical data management is clearly experiencing growing pains in adapting to a rapidly shifting landscape in which some of our current practices no longer play nicely with advances in technology and data source diversity. All of this raises the question: what can we do to change our processes to accommodate these advances? At Rho, we are confronting these challenges with a variety of approaches, beginning with resisting the impulse to automatically import all data from external vendors into our EDC systems. Configuring and updating EDC systems requires no small amount of effort from database builders, statistical programmers, and other functional areas, and updates made as part of a database migration risk negative impacts to existing clinical data. At the end of the day, importing data into an EDC system does not automatically improve data quality and, in some cases, actually hinders our ability to clean the data rapidly and efficiently. By developing standard processes for transforming and cleaning data outside the EDC system, we gain flexibility to adapt to shifts in incoming data structure or format and, by reducing the frequency of system updates, mitigate the risk of untoward impacts to the contents of the clinical database.
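To make the idea concrete, here is a minimal sketch of what one such external cleaning step might look like, written in Python with pandas. The file name, column names, and specific checks are hypothetical stand-ins for whatever a given vendor's data transfer agreement defines; the point is that standardization and basic validation happen before, and independent of, any EDC import.

```python
# Minimal sketch: standardizing and checking a vendor lab file outside the EDC.
# The file name, column names, and checks below are illustrative assumptions.
import pandas as pd

EXPECTED_COLUMNS = {"SUBJID", "VISIT", "LBTEST", "LBRESULT", "LBDATE"}

def load_vendor_labs(path: str) -> pd.DataFrame:
    """Read a vendor CSV and normalize it to a standard internal layout."""
    df = pd.read_csv(path)
    df.columns = [c.strip().upper() for c in df.columns]
    missing = EXPECTED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"Vendor file missing expected columns: {missing}")
    # Coerce types up front so downstream checks behave predictably.
    df["LBDATE"] = pd.to_datetime(df["LBDATE"], errors="coerce")
    df["LBRESULT"] = pd.to_numeric(df["LBRESULT"], errors="coerce")
    return df

def flag_issues(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows needing manual review, with a reason for each flag."""
    issues = [
        df[df["LBDATE"].isna()].assign(REASON="unparseable date"),
        df[df["LBRESULT"].isna()].assign(REASON="non-numeric result"),
        df[df.duplicated(subset=["SUBJID", "VISIT", "LBTEST"], keep=False)]
            .assign(REASON="duplicate subject/visit/test"),
    ]
    return pd.concat(issues, ignore_index=True)

labs = load_vendor_labs("vendor_labs.csv")
review_queue = flag_issues(labs)
print(review_queue[["SUBJID", "VISIT", "LBTEST", "REASON"]])
```

Because the checks live outside the EDC, a change in the vendor's file layout means updating this pipeline rather than migrating the clinical database.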

The primary motivation for loading external vendor data into the EDC system is to provide a standard method for performing data cleaning activities and cross-checks against the clinical data themselves. To support this outside the EDC, we are developing tools that aggregate data from a variety of sources and assemble them for data cleaning purposes. Much as the banking industry uses machine learning to distinguish 'normal' from 'abnormal' spending patterns and make real-time decisions to approve or decline purchases, comparable algorithms can identify univariate and multivariate clusters of anomalous clinical data for manual review. These continually learning algorithms will enable a focused review of potentially erroneous data without building out the traditional EDC infrastructure, saving time on data review and surfacing potential issues we would otherwise miss under the existing EDC model. As the landscape of data sources and formats continues to broaden, an approach rooted in system agnosticism and sound statistical methodology will ensure we can always deliver a high level of data quality.
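As a rough illustration of that kind of anomaly screen, consider the Python sketch below. The approach described above does not prescribe a particular algorithm; an isolation forest from scikit-learn is simply one common unsupervised choice, and the file and column names are purely hypothetical.

```python
# Illustrative sketch: flagging multivariate outliers in external data for
# manual review. An isolation forest is one common unsupervised choice; the
# algorithm, file name, and column names here are assumptions for the example.
import pandas as pd
from sklearn.ensemble import IsolationForest

def flag_anomalies(df: pd.DataFrame, features: list[str]) -> pd.DataFrame:
    """Score each record on the given features; return rows marked anomalous."""
    X = df[features].dropna()
    model = IsolationForest(contamination=0.02, random_state=0)  # flag ~2%
    labels = model.fit_predict(X)  # -1 = anomalous, 1 = normal
    return df.loc[X.index[labels == -1]]

# Screening several vitals jointly surfaces records whose values are
# individually plausible but jointly unusual (e.g., an adult height paired
# with a pediatric weight), which univariate range checks would miss.
vitals = pd.read_csv("vendor_vitals.csv")
to_review = flag_anomalies(vitals, ["HEIGHT", "WEIGHT", "SYSBP", "DIABP"])
print(f"{len(to_review)} records queued for manual review")
```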