Mining Metadata for Clinical Research Activities

July 26, 2017

Derek Lawrence, Senior Clinical Data Manager, has 9 years of data management and analysis experience in the health care/pharmaceutical industry. Derek serves as Rho’s Operational Service Leader in Clinical Data Management, an internal expert responsible for disseminating the application of new technology, best practices, and processes.

Metadata: An Underutilized Resource

As anyone involved in clinical database creation knows, considerable resources are devoted to the development and validation of electronic data capture (EDC) systems. Once these databases are live and clinical data begin coming in, various processes for setting up data cleaning programming, database quality review, and reporting are put into play. Unfortunately, most of these processes are manual: data managers, programmers, and biostatisticians must hold a series of specific conversations about the database’s setup, structure, and dynamic behavior, which in turn determine how programming tasks are approached and how biostatistics should best handle the data.

Effective use of the project’s metadata offers a way not only to decrease the time spent setting up these activities, but also to increase the accuracy of that setup. This metadata, or “data about data”, spans all elements of the clinical database, including:

  • CRF metadata
    • Labels, formats, response options, entry requirements, field-level checks, etc.
  • Form metadata
    • Source data verification (SDV), signature participation, orientation (standard vs. log), etc.
  • Event metadata
    • Visit windows, associated CRFs, repeatability, access requirements, etc.
  • Query metadata
    • Current status, dates, resolutions, marking groups, etc.

Establishing Usable Datasets

The first step in mining the metadata is to create machine-readable datasets from the source in question. In most commercially available EDC systems, the CRF and Event metadata of a project can be exported in a variety of formats (XML, Excel, etc.). To the nightly process that exports clinical data from our EDC studies and saves them to the Rho network, we added a post-processing step in which a macro reads the exported study metadata files and produces working datasets. From here, these elements of the clinical database are machine-readable and available for use. Other standard EDC reports provide additional sources for Form and Query metadata. These data can be extracted from the system either directly through an API (application programming interface) or through reports built with EDC system-specific tools, which can be scheduled and saved to the network automatically. The contents of these reports can also be converted to datasets for ease of use.
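As a rough illustration of that post-processing step, here is a minimal Python sketch rather than the macro described above. The XML layout, the element and attribute names, and the function fields_from_export are all hypothetical; real EDC exports are richer and vary by vendor, so the parsing would need to be adapted to the actual export format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified metadata export. This layout is an assumption
# made for illustration; it is not any specific EDC vendor's format.
SAMPLE_EXPORT = """\
<StudyMetadata>
  <Form name="VITALS" orientation="standard">
    <Field name="SYSBP" label="Systolic BP (mmHg)" format="3" required="yes"/>
    <Field name="VSDAT" label="Date of Assessment" format="date" required="yes"/>
  </Form>
  <Form name="CONMED" orientation="log">
    <Field name="CMTRT" label="Medication Name" format="$200" required="no"/>
  </Form>
</StudyMetadata>
"""


def fields_from_export(xml_text):
    """Flatten the export into one dictionary per CRF field."""
    root = ET.fromstring(xml_text)
    rows = []
    for form in root.iter("Form"):
        for field in form.iter("Field"):
            rows.append({
                "form": form.get("name"),
                "orientation": form.get("orientation"),
                "field": field.get("name"),
                "label": field.get("label"),
                "format": field.get("format"),
                # Convert the yes/no flag into a boolean for later use.
                "required": field.get("required") == "yes",
            })
    return rows


field_metadata = fields_from_export(SAMPLE_EXPORT)
for row in field_metadata:
    print(row["form"], row["field"], row["required"])
```

Each row carries its form-level attributes alongside the field-level ones, so downstream programs can filter or join on either without going back to the export file.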

A Wide Variety of Applications

From this point, we can automate a number of tasks that traditionally required manual review, specifications, and subject matter expertise to complete. From driving the database validation process, to creating system performance metrics, to programming and configuring statistical datachecks, the now-accessible metadata allows us to initiate a multitude of tasks more rapidly and accurately, with much of the manual component removed. We will cover some specific data monitoring and cleaning applications of study metadata in a series of future blog posts.
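To make the idea of metadata-driven datachecks concrete, here is a small Python sketch in which the checks are derived from field metadata instead of being written by hand for each field. Everything in it is an assumption for illustration: the metadata rows, the "min" and "max" range attributes, the sample records, and the function run_checks are hypothetical and do not represent Rho's actual programs.

```python
# Hypothetical field metadata, shaped like a dataset derived from an EDC export.
# The "min"/"max" range attributes are assumed for this example.
FIELD_METADATA = [
    {"form": "VITALS", "field": "SYSBP", "required": True, "min": 60, "max": 250},
    {"form": "VITALS", "field": "VSDAT", "required": True, "min": None, "max": None},
    {"form": "VITALS", "field": "VSCOM", "required": False, "min": None, "max": None},
]

# Hypothetical clinical records, one dictionary per submitted form.
RECORDS = [
    {"subject": "001", "form": "VITALS", "SYSBP": 120, "VSDAT": "2017-07-01", "VSCOM": ""},
    {"subject": "002", "form": "VITALS", "SYSBP": 300, "VSDAT": "", "VSCOM": ""},
]


def run_checks(records, metadata):
    """Apply required-value and range checks defined by the metadata."""
    findings = []
    for rec in records:
        for meta in metadata:
            if meta["form"] != rec["form"]:
                continue
            value = rec.get(meta["field"])
            if value is None or value == "":
                # Only required fields produce a finding when empty.
                if meta["required"]:
                    findings.append((rec["subject"], meta["field"], "missing required value"))
                continue
            if meta["min"] is not None and value < meta["min"]:
                findings.append((rec["subject"], meta["field"], "below expected range"))
            if meta["max"] is not None and value > meta["max"]:
                findings.append((rec["subject"], meta["field"], "above expected range"))
    return findings


findings = run_checks(RECORDS, FIELD_METADATA)
for subject, field, message in findings:
    print(subject, field, message)
```

Because the rules live in the metadata, adding a field to the CRF or changing an entry requirement changes the checks without any edits to the checking code.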