Rho Knows Clinical Research Services

Maintaining Trial Integrity During COVID-19: Some Statistical Rules of Thumb

Posted by Rob Woolson and Ben Vaughn on Tue, Apr 21, 2020 @ 09:30 AM

The COVID-19 pandemic is having a substantial impact on many ongoing clinical studies in all phases of product development. Numerous difficult decisions are being made and steps are actively being taken to ensure the safe execution, or future resumption, of ongoing studies. While patient safety is paramount and should drive all decisions related to study conduct, many of these decisions can impact the interpretability of estimates of efficacy at study conclusion. Changes that may seem innocuous on the surface can have a substantial impact on trial integrity, including the validity and reliability of results. Careful consideration, in consultation with a statistician, should be given to the impact that protocol changes, visit schedule amendments, collection methods, and incomplete or missing information will have on the final analysis and interpretation of results.

The FDA Guidance on Conduct of Clinical Trials of Medical Products during the COVID-19 Pandemic makes several thoughtful recommendations regarding methods to maintain the integrity of ongoing clinical studies through the COVID-19 pandemic. While the considerations raised are important to ongoing studies in all phases of clinical research, many of the issues raised take on added importance in the randomized phase 3 confirmatory trial setting. Changes to study design, assessment methods, and visit schedules, in addition to the possibility of higher rates of missing or incomplete information, may make it difficult to obtain an unbiased estimate of differences between treatment and comparator groups in these pivotal efficacy studies.

It is heartening to recognize that some of the study conduct and data-related issues we are presently confronting, including a few of the concepts discussed in the Guidance document, are not new to clinical research and are issues that investigators, protocol sponsors, and statisticians confront frequently, albeit under less difficult circumstances.

While a statistician should be consulted, we offer some statistical rules of thumb (some of which are covered directly in the Guidance document) for data collection and for missing or incomplete information in ongoing studies during the COVID-19 pandemic.

1. Keeping in mind that patient safety is paramount, efforts should be made to collect as much efficacy data as possible within the parameters of the current protocol. Though there will be exceptions, collecting data outside of a visit window or after treatment discontinuation is preferable to collecting no efficacy information whatsoever.

2. Changes to the protocol design may be needed to limit the amount of missing or incomplete efficacy information. However, some changes in study conduct may warrant changes to the planned primary analysis or additional sensitivity analyses.

3. It is important that the reasons for missing data, incomplete data, and patient discontinuations are captured directly, and in an easily identifiable manner, in the case report form. More specifically, this information should be collected in a manner that is readily accessible for analysis and at an appropriate resolution for the degree of missingness (e.g., instrument, visit, patient).

4. A previously unplanned analysis to assess the power of the study before continuing with enrollment may be appropriate. Mature studies that are close to planned enrollment may be sufficiently well powered to stop early.

5. In many cases, it is likely that incomplete or missing information resulting from COVID-19 conveniently falls into the category of ignorable missing data. The plan for handling missing data due to COVID-19 should be described in the SAP. Sensitivity analyses that explore the missing data space should be planned and documented in the SAP prior to database lock.

6. Documentation in the protocol and SAP is of critical importance. For blinded studies, all decisions and changes to planned data collection, assessment, and analysis should be finalized in advance of database unblinding.
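As an illustration of rule 4, the following is a minimal sketch of how power at current enrollment might be compared with planned power. It assumes a two-arm trial with a continuous endpoint analyzed with a two-sided z-test; the effect size, standard deviation, and enrollment numbers are hypothetical design assumptions, not values from any study, and such a check would normally rely on the protocol's design assumptions rather than unblinded interim estimates.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(n_per_arm, effect, sd, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means.

    effect and sd are design assumptions (hypothetical here), not
    estimates taken from unblinded interim data.
    """
    z = NormalDist()
    se = sd * sqrt(2.0 / n_per_arm)      # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    shift = abs(effect) / se             # standardized assumed effect
    # Probability of rejecting in either tail under the assumed effect
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical design: 100 subjects per arm planned, 90 per arm enrolled so far
planned = two_sample_power(n_per_arm=100, effect=0.4, sd=1.0)
current = two_sample_power(n_per_arm=90, effect=0.4, sd=1.0)
print(f"planned power: {planned:.3f}, power at current enrollment: {current:.3f}")
```

If the power at current enrollment is judged acceptable, that supports a discussion with the study statistician about stopping enrollment early; the decision itself and any change to the analysis plan would still need to be documented as described in rule 6.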

As described in the Guidance, amendments to key elements of efficacy data collection, assessment, and/or analysis should be discussed with the appropriate reviewing division. In consultation with a statistician, study sponsors should prepare now for regulatory interactions to discuss and gain agreement on any proposed changes.

Rob Woolson, MS, JD, Chief Strategist, Biostatistics & Standards for Regulatory Submissions, has 18 years of experience as an applied statistician. Mr. Woolson brings an extensive background of statistical and project leadership experience on US and ex-US regulatory submissions, having led the biostatistical and technical aspects of 12 CDISC-compliant marketing applications, having guided the creation of ISS/ISE statistical analysis plans; integrated analysis dataset design and production; integrated display design and production; and submission-related documentation development. He has conducted statistical analyses in all phases of drug development (Phase I through IV, NDAs, and BLAs) and has led SDTM/ADaM dataset conversion projects in multiple therapeutic areas. Rob works extensively as a consultant advising sponsors on integrated statistical analysis planning, integrated database design, regulatory data submission requirements, and CDISC standards application and implementation. He has authored responses to numerous FDA queries and has represented sponsors at numerous FDA face-to-face meetings, including Advisory Committee meetings. Mr. Woolson’s educational background includes a Bachelor’s degree in mathematics from Northwestern University, a Juris Doctor degree from DePaul University, and a Master’s degree in applied statistics from DePaul University.

Ben Vaughn, MS, RAC, Chief Strategist, Biostatistics & Protocol Design, has over twelve years of experience in clinical research. He has participated in over 25 regulatory submissions and is an expert on CDISC standards. His work has included serving as lead statistician to complete displays and datasets for ISS/ISEs and co-producing the ISS/ISE for multiple products, including six NDAs reviewed by DAAAP. Ben also co-produced the ISE for two opioid products and provided statistical consultation, display generation and submission work for four separate products for OA knee pain. He has authored responses to FDA queries regarding NDAs, PMAs, IDEs, and SPAs and has represented sponsors in FDA meetings. In the past three years, he has supported five sponsors at DAAAP FDA advisory committee meetings. Additionally, he has represented sponsors in FDA teleconferences and face-to-face meetings for both OA knee pain products and opioid products. His analytic experience includes cross-over studies, survival analysis, non-parametrics, and extensive work with linear and non-linear repeated measure models.

Rho Participates in Innovative Graduate Student Workshop for the 8th Consecutive Time

Posted by Brook White on Thu, Aug 09, 2018 @ 09:18 AM

Petra LeBeau, ScD (@LebeauPetra), is a Senior Biostatistician and Lead of the Bioinformatics Analytics Team at Rho. She has over 13 years of experience in providing statistical support in all areas of clinical trials and observational studies. Her experience includes 3+ years of working with genomic data sets (e.g. transcriptome and metagenome). Her current interest is in machine learning using clinical trial and high-dimensional data.

Agustin Calatroni, MS (@acalatr), is a Principal Statistical Scientist at Rho. His academic background includes a master’s degree in economics from the Université Paris 1 Panthéon-Sorbonne and a master’s degree in statistics from North Carolina State University. In the last 5 years, he has participated in a number of competitions to develop prediction models. He is particularly interested in the use of stacking models to combine several machine learning techniques into one predictive model in order to decrease the variance (bagging), bias (boosting) and improve the predictive accuracy.

At Rho, we are proud of our commitment to supporting education and fostering innovative problem-solving for the next generation of scientists, researchers, and statisticians. One way we enjoy promoting innovation is by participating in the annual Industrial Math/Stat Modeling Workshop for Graduate Students (IMSM) hosted by the National Science Foundation-supported Statistical and Applied Mathematical Sciences Institute (SAMSI). IMSM is a 10-day program to expose graduate students in mathematics, statistics, and computational science to challenging and exciting real-world projects arising in industrial and government laboratory research. The workshop is held in SAS Hall on the campus of North Carolina State University. This summer marked our 8th consecutive year as an IMSM Problem Presenter. We were joined by industry leaders from Sandia National Laboratories, MIT Lincoln Laboratories, US Army Corps of Engineers (USACE), US Environmental Protection Agency (EPA), and Savvysherpa.

SAMSI participants 2018: Agustin Calatroni (first from left), Petra LeBeau (first from right), and Emily Lei Kang (second from right) with students from the SAMSI program.

Rho was represented at the 2018 workshop by investigators Agustin Calatroni and Petra LeBeau, with the assistance of Dr. Emily Lei Kang from the University of Cincinnati. Rho’s problem for this year was Visualizing and Interpreting Machine Learning Models for Liver Disease Detection. 

Machine learning (ML) interpretability is a hot topic as many tools have become available over the last couple of years (including a variety of very user-friendly ones) that are able to create pretty accurate ML models, but the constructs that could help us explain and trust these black-box models are still under development. 

The success of ML algorithms in medicine and multi-omics studies over the last decade has come as no surprise to ML researchers. This can be largely attributed to their superior predictive accuracy and their ability to work on both large volume and high-dimensional datasets. The key notion behind their performance is self-improvement. That is, these algorithms make predictions and improve them over time by analyzing mistakes made in earlier predictions and avoiding these errors in future predictions. The difficulty with this “predict and learn” paradigm is that these algorithms suffer from diminished interpretability, usually due to the high number of nonlinear interactions within the resulting models. This is often referred to as the “black-box” nature of ML methods.
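The "predict and learn" idea described above can be illustrated with a toy sketch of boosting for squared error: each round fits a one-split rule (a "stump") to the mistakes left by the previous rounds and adds a damped version of it to the prediction. This is a simplified illustration on made-up data, not the GBM implementation used in the workshop; the function names, learning rate, and number of rounds are assumptions chosen for the example.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # keep at least one point on the right
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds=50, learning_rate=0.3):
    """Boost stumps on squared error; returns base value, stumps, and MSE per round."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps, errors = [], []
    for _ in range(rounds):
        # The mistakes of the current model become the target of the next stump
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= threshold else right_mean)
                 for x, p in zip(xs, preds)]
        errors.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return base, stumps, errors


# Made-up one-dimensional data with a nonlinear relationship
xs = [i / 10 for i in range(40)]
ys = [(x - 2) ** 2 for x in xs]
base, stumps, errors = boost(xs, ys)
print(f"MSE after round 1: {errors[0]:.4f}, after round {len(errors)}: {errors[-1]:.4f}")
```

Even in this toy version the final prediction is a sum of 50 small rules, which hints at why a model built from many such rules over several interacting predictors is hard to read directly.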

In cases where interpretability is crucial, for instance in studies of disease pathologies, ad-hoc methods leveraging the strong predictive nature of these methods have to be implemented. These methods are used as aides for ML users to answer questions like: ‘why did the algorithm make certain decisions?’, ‘what variables were the most important in predictions?’, and/or ‘is the model trustworthy?’ 

The IMSM students were challenged with studying the interpretability of a particular class of ML methods called gradient boosting machines (GBM) for predicting whether or not a subject had liver disease. Rho investigators provided a curated data set and pre-built the model for the students. To construct the model, the open-source Indian Liver Patient Dataset was used, which contains records of 583 liver patients from North East India (Dheeru and Karra Taniskidou, 2017). The dataset contains eleven variables: a response variable indicating disease status of the patient (416 with disease, 167 without) and ten clinical predictor variables (Age, Gender, Total Bilirubin, Direct Bilirubin, Alkaline Phosphatase, Alanine Aminotransferase, Aspartate Aminotransferase, Total Proteins, Albumin, Albumin and Globulin Ratio). The data was divided into 467 training and 116 test records for model building.

The scope of work for the students was not to improve or optimize the performance of the GBM model but to explain and visualize the method’s intrinsic latent behavior.

The IMSM students decided to break interpretability down into two areas: global, where the entire dataset is used for interpretation, and local, where a subset of the data is used for deriving an interpretive analysis of the model. The details of these methods will be further discussed in two additional blog posts.

Rho is honored to have the opportunity to work with exceptional students and faculty to apply state of the art mathematical and statistical techniques to solve real-world problems and advance our knowledge of human diseases.

You can visit the IMSM Workshop website to learn more about the program, including the problem Rho presented and the students’ solution.

With thanks to the IMSM students Adams Kusi Appiah (1), Sharang Chaudhry (2), Chi Chen (3), Simona Nallon (4), Upeksha Perera (5), Manisha Singh (6), Ruyu Tan (7) and advisor Dr. Emily Lei Kang from the University of Cincinnati.

(1) Department of Biostatistics, University of Nebraska Medical Center; (2) Department of Mathematical Sciences, University of Nevada, Las Vegas; (3) Department of Biostatistics, State University of New York at Buffalo; (4) Department of Statistics, California State University, East Bay; (5) Department of Mathematics and Statistics, Sam Houston State University; (6) Department of Information Science, University of Massachusetts; (7) Department of Applied Mathematics, University of Colorado at Boulder

Dheeru, D. and Karra Taniskidou, E. (2017). UCI machine learning repository.

Site Investigator vs. Sponsor SAE Causality: Are they different?

Posted by Brook White on Thu, Jun 21, 2018 @ 11:25 AM

Heather Kopetskie, MS, is a Senior Biostatistician at Rho. She has over 10 years of experience in statistical planning, analysis, and reporting for Phase 1, 2 and 3 clinical trials and observational studies. Her research experience includes over 8 years focusing on solid organ and cell transplantation through work on the Immune Tolerance Network (ITN) and Clinical Trials in Organ Transplantation (CTOT) project. In addition, Heather serves as Rho’s biostatistics operational service leader, an internal expert sharing biostatistical industry trends, best practices, processes and training.

Hyunsook Chin, MPH, is a Senior Biostatistician at Rho. She has over 10 years of experience in statistical design, analysis, and reporting for clinical trials and observational studies. Her therapeutic area experience includes: autoimmune diseases, oncology, nephrology, cardiovascular diseases, and ophthalmology. Specifically, her research experience has focused on solid organ transplantation for over 8 years on the CTOT projects. She also has several publications from research in nephrology and solid organ transplantation projects. She is currently working on several publications.

An Adverse Event (AE) is any unfavorable or unintended sign, symptom, or disease temporally associated with a study procedure or use of a drug, and does not imply any judgment about causality. An AE is considered Serious if in the view of either the investigator or sponsor, the outcome is any of the following: 

  • Death
  • Life-threatening event
  • Hospitalization (initial or prolonged)
  • Disability or permanent damage
  • Congenital anomaly/birth defect
  • Required intervention to prevent impairment or damage
  • Other important medical event

When a serious adverse event (SAE) occurs the site investigator immediately reports the event to the sponsor. Both the site investigator and the sponsor assess causality for every SAE. Causality is whether there is a reasonable possibility that the drug caused the event. The FDA believes the sponsor can better assess causality as they have access to SAE reports from multiple sites and studies along with a familiarity with the drug’s mechanism of action. When expedited SAE reports are delivered to the FDA the sponsor causality is reported instead of the site investigator’s.

Causality assessments may differ between the site investigator and sponsor. It is important to understand the difference in assessments to ensure proper reporting and conduct through a trial. For example, if stopping rules rely on causality, should the sponsor’s or site investigator’s causality assessment be used? Which causality assessment should be used for DSMB and CSR reports? To better understand how to handle these situations it’s important to understand the differences.

We reviewed over 1400 SAEs from 76 studies over the last 6 years. Each SAE had causality assessed against an average of 3.8 study interventions (e.g. study medication 1, study procedure 1, etc.) for a total of over 5300 causality assessments. Related causality included definitely, possibly, and probably related while Not Related included unlikely related and unrelated. At the SAE level an SAE was considered related if at least one study intervention was determined related.
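The roll-up rule described above (an SAE counts as related if at least one of its study interventions was assessed as related) can be sketched as follows. The record layout, SAE identifiers, and label spellings are hypothetical and only illustrate the logic; they are not the structure of the reviewed data.

```python
# Grouping of causality labels as described in the text
RELATED = {"definitely", "probably", "possibly"}
NOT_RELATED = {"unlikely", "unrelated"}


def is_related(assessment):
    """Map a single intervention-level causality label to related / not related."""
    label = assessment.strip().lower()
    if label in RELATED:
        return True
    if label in NOT_RELATED:
        return False
    raise ValueError(f"unknown causality label: {assessment!r}")


def sae_is_related(assessments):
    """An SAE is related if at least one intervention-level assessment is related."""
    return any(is_related(a) for a in assessments)


# Hypothetical SAEs: one label per study intervention, from each assessor
saes = {
    "SAE-001": {"site": ["unrelated", "unlikely", "unrelated"],
                "sponsor": ["unrelated", "possibly", "unrelated"]},
    "SAE-002": {"site": ["probably", "unrelated"],
                "sponsor": ["probably", "unrelated"]},
    "SAE-003": {"site": ["unrelated"],
                "sponsor": ["unlikely"]},
}


def percent_related(assessor):
    """Percentage of SAEs judged related by the given assessor."""
    flags = [sae_is_related(a[assessor]) for a in saes.values()]
    return 100.0 * sum(flags) / len(flags)


# SAEs where the site investigator and the sponsor disagree at the SAE level
discordant = [sae_id for sae_id, a in saes.items()
              if sae_is_related(a["site"]) != sae_is_related(a["sponsor"])]

print(f"site: {percent_related('site'):.0f}% related, "
      f"sponsor: {percent_related('sponsor'):.0f}% related, "
      f"discordant: {discordant}")
```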

Table 1: Causality Comparisons

| | Site Investigator | Sponsor |
|---|---|---|
| Study Interventions | | |
| Not Related | 89% | 81% |
| Related | 11% | 19% |
| SAEs | | |
| Not Related | 78% | 67% |
| Related | 22% | 33% |

Sponsors deemed more SAEs to be related to study interventions than site investigators. This relationship is maintained when looking at the breakdown of SAEs by severity with the sponsor determining a larger percentage of SAEs related to the study intervention. This also held for the majority of system organ classes reviewed. 

What actions can we take with this information when designing a trial?

  1. If any study stopping rules rely on causality the study team may want to consider using the sponsor causality to ensure all possible cases are captured. The biggest hurdle with this transition would be acquiring the sponsor causality in real time as it is not captured in the clinical database.
  2. For DSMB reports, if only the site investigator causality is reported, the relationship to SAEs may be underreported versus the information the FDA receives. Given that the sponsor more often assesses SAEs as related, this is important information that should be provided to the DSMB members when evaluating the safety of the study.
  3. For clinical study reports, both SAE and non-serious adverse events are reported. The study team should determine what information they want to include. The sponsor safety assessments are not included in the clinical database, but they are what the FDA receives during the conduct of the trial. Additionally, if the sponsor more often assesses SAEs as related, the report may underreport related SAEs if only the site investigator assessment is used in the report.

Note that these findings are based on studies Rho has supported and may not be consistent with findings from other trials/sponsors. Additionally, in some studies the site investigator may have changed the relationship of the SAE based on discussions with the sponsor and we do not have any information to quantify how often this occurs.

Heat Maps for Database Lock

Posted by Brook White on Tue, Aug 08, 2017 @ 11:50 AM

Kristen Mason, MS, is a Senior Biostatistician at Rho. She has over 4 years of experience providing statistical support for studies conducted under the Immune Tolerance Network (ITN) and Clinical Trials in Organ Transplantation (CTOT). She has a particular interest in data visualization, especially creating visualizations within SAS using the graph template language (GTL).

Heather Kopetskie, MS, is a Senior Biostatistician at Rho. She has over 10 years of experience in statistical planning, analysis, and reporting for Phase 1, 2 and 3 clinical trials and observational studies. Her research experience includes over 8 years focusing on solid organ and cell transplantation through work on the Immune Tolerance Network (ITN) and Clinical Trials in Organ Transplantation (CTOT) project. In addition, Heather serves as Rho’s biostatistics operational service leader, an internal expert sharing biostatistical industry trends, best practices, processes and training.

Preparing a database for lock can be a burdensome process. It requires coordinated effort from an entire clinical study team, including, but not limited to, the clinical data manager, study monitor, biostatistician, clinical project manager, principal investigator, and medical monitor. The team must work together to ensure the accuracy and reliability of the data, but with so many sites, subjects, visits, case report forms (CRFs), and data points it can be difficult to stay on top of the entire process. 

Using existing metadata (see Mining Metadata for Clinical Research Activities for more information on metadata), graphics can be created to visually represent the overall status of each requirement for database lock. This is possible using a graphic called a ‘heat map’ that displays the CRF metadata. The resulting graphic is shown below.

heat map showing CRF metadata for database lock

The graphic has one row per subject and one column for each CRF collected at each visit. This results in one ‘box’ per subject per visit per CRF. Each box is colored and/or annotated to indicate the current status of each CRF. 
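The layout described above (one row per subject, one column per CRF at each visit, one status per box) can be sketched as a small data structure. The following is a text-only illustration rather than the SAS graphic itself; the field names, status codes, symbols, and subject IDs are hypothetical.

```python
# Hypothetical CRF metadata: one record per subject, visit, and form
records = [
    {"subject": "1001", "visit": "Screening", "form": "PE", "status": "entered"},
    {"subject": "1001", "visit": "Screening", "form": "DM", "status": "verified"},
    {"subject": "1001", "visit": "Treatment", "form": "TRT", "status": "query_open"},
    {"subject": "1002", "visit": "Screening", "form": "PE", "status": "query_open"},
    {"subject": "1002", "visit": "Screening", "form": "DM", "status": "entered"},
]

# One symbol per status; a real heat map would use colors and annotations
SYMBOLS = {"not_started": ".", "entered": "E", "query_open": "Q",
           "verified": "V", "locked": "L"}


def build_grid(records):
    """Return the ordered (visit, form) columns and a subject -> column -> status map."""
    columns, grid = [], {}
    for rec in records:
        col = (rec["visit"], rec["form"])
        if col not in columns:
            columns.append(col)          # keep columns in order of first appearance
        grid.setdefault(rec["subject"], {})[col] = rec["status"]
    return columns, grid


def render(columns, grid):
    """One text row per subject, one character per (visit, form) box."""
    lines = []
    for subject in sorted(grid):
        cells = "".join(SYMBOLS[grid[subject].get(col, "not_started")]
                        for col in columns)
        lines.append(f"{subject} {cells}")
    return lines


columns, grid = build_grid(records)
for line in render(columns, grid):
    print(line)
```

A box with no metadata record is treated as "not started" here, which is an assumption; in practice the expected visit schedule would determine which boxes should exist for each subject.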

Broadly speaking, a quick glance at this graphic can show the clinical study team exactly how many CRFs have yet to be completed, where queries have not yet been closed, which CRFs have been source data verified, and whether or not an individual CRF has been locked.  Not to mention, all of this information can be identified for a specific subject at a specific visit for a specific CRF. 

Focusing on the details of our particular example, it is easy to see that no subject has yet initiated data entry for either Visit 4 or Visit 5. Additionally, three subjects have not started data entry for the Treatment Visit, ten for Visit 1, fifteen for Visit 2, and twenty-four for Visit 3. An open query remains for several subjects on the TRT form at the Treatment Visit, and for just subject 88528 on the PE form at the Screening Visit. A handful of forms have been source verified and no CRFs have been locked. Additionally, the graphic provides detail on the total number of subjects, visits, and CRFs for the study. This helps reveal specifics such as which visits are more burdensome with multiple CRFs and exactly how far along the subjects are in the study.

Historically, this information has been conveyed through pages and pages of multiple listings, which can take minutes if not hours to decipher. Having all of the information in a single snapshot can help determine what steps need to be taken to get to database lock quickly and accurately. 

Further instruction on how to implement this graphic within SAS will be available soon. 

FDA Guidance on Non-Inferiority Clinical Trials to Establish Effectiveness

Posted by Brook White on Thu, Apr 20, 2017 @ 11:42 AM

Heather Kopetskie, MS, is a Senior Biostatistician at Rho. She has over 10 years of experience in statistical planning, analysis, and reporting for Phase 1, 2 and 3 clinical trials and observational studies. Her research experience includes over 8 years focusing on solid organ and cell transplantation through work on the Immune Tolerance Network (ITN) and Clinical Trials in Organ Transplantation (CTOT) project. In addition, Heather serves as Rho’s biostatistics operational service leader, an internal expert sharing biostatistical industry trends, best practices, processes and training.

In November 2016, the FDA released final guidance  on Non-Inferiority Clinical Trials to Establish Effectiveness providing researchers guidance on when to use non-inferiority trials to demonstrate effectiveness along with how to choose the non-inferiority margin, test the non-inferiority hypothesis, and provide interpretable results. The guidance does not provide recommendations for how to evaluate the safety of a drug using a non-inferiority trial design. This article provides background on a non-inferiority trial design along with assumptions and advantages and disadvantages of the trial design.


A non-inferiority trial is used to demonstrate that a test drug is not clinically worse than an active treatment (active control) by more than a pre-specified margin (non-inferiority margin). There is no placebo arm in non-inferiority trials. A non-inferiority trial design is chosen when using a placebo arm would not be ethical because an available treatment provides an important benefit, especially for irreversible conditions (e.g. death). Without a placebo arm against which to compare either the test drug or the active control, it is important to determine that the active control had its expected effect in the non-inferiority trial. If the active control had no effect in the non-inferiority trial, it would not provide evidence that the test drug was effective.

The table below compares superiority with non-inferiority trials with respect to the objective and hypotheses. The effect of the test drug is ‘T’ and the effect of the active control is ‘C’. The difference tested during analyses is C – T.

| | Superiority Trial | Non-inferiority Trial |
|---|---|---|
| Objective | To determine if one intervention is superior to another | To determine if a test drug is not inferior to an active control intervention, by a preset margin |
| Null Hypothesis | No difference between the two interventions | The test drug (T) is inferior to the active control (C) by some margin (M) or more (C – T >= M) |
| Alternative Hypothesis | One intervention is superior to the other | The test drug (T) is inferior to the active control (C) by less than M (C – T < M) |

Selecting a non-inferiority margin is challenging but critical to a successful trial. The largest possible choice for the non-inferiority margin is the entire known effect of the active control compared to placebo, called M1. However, using M1 as the margin would only support a finding that the test drug has an effect greater than 0. More generally, the non-inferiority margin is set to some portion of M1, called M2, to preserve some effect of the control drug, based on clinical judgment. For example, if a superiority trial demonstrated that the active control was 15% better than placebo, a clinician may set the non-inferiority margin to be 9% (M1=15%, M2=9%). A test drug meeting this margin could be up to 9% worse than the active control, but would still be at least 6% better than placebo.
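The arithmetic behind the M1/M2 example can be checked with a short sketch. With M1 = 15 and M2 = 9 (percentage points), a test drug that meets the margin is less than 9 points worse than the active control and therefore keeps at least 15 - 9 = 6 points of benefit over placebo. The function name is hypothetical, and the calculation assumes the historical control effect M1 carries over to the current trial.

```python
def retained_effect(m1, m2):
    """Minimum benefit over placebo implied by meeting a margin m2, given control effect m1."""
    if not 0 < m2 <= m1:
        raise ValueError("the margin M2 must be positive and no larger than M1")
    return m1 - m2


m1, m2 = 15.0, 9.0                      # values from the example, in percentage points
retained = retained_effect(m1, m2)      # benefit over placebo that is preserved
fraction_preserved = retained / m1      # share of the control effect preserved

print(f"worst case vs. control: {m2:.0f} points worse")
print(f"worst case vs. placebo: {retained:.0f} points better "
      f"({fraction_preserved:.0%} of the control effect preserved)")
```

Setting M2 equal to M1 preserves nothing, which is why a margin equal to the full control effect can only show that the test drug has some effect greater than 0.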

Multiple results are possible in a non-inferiority trial as explained in the graphic below. The point estimate is indicated by the square and is the measure of C – T; the bars represent a 95% confidence interval; and ∆ is the non-inferiority margin.

non-inferiority drug trial, interpretation of results

  1. Point estimate favors test drug and both superiority and non-inferiority are demonstrated.
  2. Point estimate is 0 suggesting equal effect of the active control and the test drug. The upper bound of the 95% confidence interval is below the non-inferiority margin so non-inferiority is demonstrated.
  3. The point estimate favors the active control. The upper bound of the 95% confidence interval is less than the non-inferiority margin, demonstrating non-inferiority. However, the point estimate is above zero indicating that the test drug is not as good as the active control (C – T > 0), even while meeting the non-inferiority standard.
  4. Point estimate is 0 suggesting equal effect, but the upper bound of the 95% confidence interval is greater than the non-inferiority margin so non-inferiority is not demonstrated.
  5. Point estimate favors the active control and the entire confidence interval is above the non-inferiority margin so inferiority is demonstrated.
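The five scenarios above reduce to a comparison of the confidence interval for C – T against zero and against the margin. The following is a minimal sketch of that decision logic; the function name, return strings, and example intervals are hypothetical, and positive values of C – T are taken to favor the active control, as in the text.

```python
def interpret(lower, upper, margin):
    """Interpret a confidence interval for C - T against a non-inferiority margin."""
    if margin <= 0 or lower > upper:
        raise ValueError("expected margin > 0 and lower <= upper")
    if upper < 0:
        return "superiority and non-inferiority demonstrated"
    if upper < margin:
        return "non-inferiority demonstrated"
    if lower > margin:
        return "inferiority demonstrated"
    return "non-inferiority not demonstrated"


margin = 5.0  # hypothetical margin
scenarios = {
    1: (-6.0, -1.0),   # whole interval favors the test drug
    2: (-3.0, 3.0),    # centered on 0, upper bound below the margin
    3: (-1.0, 4.5),    # point estimate favors control, upper bound below the margin
    4: (-7.0, 7.0),    # centered on 0, upper bound above the margin
    5: (6.0, 10.0),    # whole interval above the margin
}
for number, (lower, upper) in scenarios.items():
    print(number, interpret(lower, upper, margin))
```

Scenarios 2 and 3 receive the same label because the decision depends only on the upper bound; the point estimate still matters for clinical interpretation, as item 3 notes.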

Non-inferiority Margin

The selection of the non-inferiority margin is critical in designing a non-inferiority trial and the majority of the FDA guidance focuses on this. The non-inferiority margin is selected by reviewing historical trials of the active control. The active control must be a well-established intervention with at least one superiority trial establishing benefit over placebo. If approval of the active control was based on a single study (not unusual in the setting of risk reduction of major events such as death, stroke, and heart attack), changes in practice should be evaluated. Using the lower bound of the 95% confidence interval provides a conservative estimate of the active control effect. If multiple historical trials exist one of the assumptions of the non-inferiority trial is consistency of the effect between the historical studies and the non-inferiority trial. Therefore, if consistency isn’t present between the historical studies this can lead to problems in estimating the active control effect. Inconsistency can also sometimes lead researchers away from performing a non-inferiority trial, especially if a historical trial did not demonstrate an effect. In situations with multiple historical trials, careful review of all study results and a robust meta-analysis are crucial to selecting an appropriate non-inferiority margin.
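As a sketch of the conservative estimate mentioned above, the lower bound of a two-sided 95% confidence interval for the historical control-versus-placebo effect can be computed from a point estimate and its standard error under a normal approximation. The function name and the numbers are hypothetical; with several historical trials, the estimate and standard error would typically come from a meta-analysis.

```python
from statistics import NormalDist


def conservative_m1(estimate, std_error, confidence=0.95):
    """Lower confidence bound of the historical control effect (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = estimate - z * std_error
    if lower <= 0:
        # The historical data do not rule out a zero control effect
        raise ValueError("lower bound is not above 0; no control effect to preserve")
    return lower


# Hypothetical historical result: control 15 points better than placebo, SE of 3
m1 = conservative_m1(estimate=15.0, std_error=3.0)
print(f"conservative M1: {m1:.2f}")
```

The error case mirrors the point made in the text: when a historical trial does not demonstrate an effect, there is little basis for a margin and a non-inferiority design may not be appropriate.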

Assay Sensitivity and Constancy Assumption

Assay sensitivity is essential to non-inferiority trials as it demonstrates that had the study included a placebo arm, the active control – placebo difference would have been at least M1. The guidance outlines three considerations when determining if a trial has assay sensitivity.

  1. Historical evidence of sensitivity to drug effects
  2. The similarity of the new non-inferiority trial to the historical trials (the constancy assumption)
  3. The quality of the new trial (ruling out defects that would tend to minimize differences between treatments)

The constancy assumption in #2 above is that the non-inferiority study is sufficiently similar to the past studies with respect to the following design features.

  • The characteristics of the patient population
  • Important concomitant medications
  • Definitions and ascertainment of study endpoints
  • Dose of active control
  • Entry criteria
  • Analytic approaches

The presence of constancy is important to evaluate. For example, if a disease definition has changed over time or the methodology used in the historical trial is outdated the constancy assumption may be violated and the use of a non-inferiority design may not be appropriate. If all the design features are similar except the patient characteristics the estimate of the size of the control effect can be adjusted if the effect size is known in the patient sub-groups.

Benefits of non-inferiority trials

  • A non-inferiority trial is useful when a placebo controlled trial is not appropriate.
  • A non-inferiority trial may also test for superiority without concern about inflating the Type I error rate, given careful planning of the order in which hypotheses are tested. The reverse is not true; a superiority trial cannot claim non-inferiority.

Disadvantages of non-inferiority trials

  • The trial must be able to demonstrate that assay sensitivity and the constancy assumption hold. This is especially difficult when medical practice has changed since the superiority trial (e.g. the active control is now always used with additional drugs).
  • When the active control is not well established or historical trials have shown inconsistent results, choosing a non-inferiority margin proves to be difficult.
  • If the treatment effect of the active control is small, the sample size required for a non-inferiority study may not be feasible.
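The last point can be made concrete with a standard normal-approximation formula for a non-inferiority comparison of two means: the required sample size per arm grows with the inverse square of the margin, so a small control effect (and hence a small margin) quickly becomes expensive. This is a rough sketch under stated assumptions (continuous endpoint, equal allocation, known common standard deviation, one-sided alpha of 0.025); the function name and input values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def ni_sample_size_per_arm(margin, sd, true_diff=0.0, alpha=0.025, power=0.9):
    """Approximate n per arm for a non-inferiority test of two means.

    true_diff is the assumed true value of C - T (0 means equally effective).
    """
    if margin <= true_diff:
        raise ValueError("the margin must exceed the assumed true difference")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # one-sided significance level
    z_beta = z.inv_cdf(power)
    n = 2 * (sd * (z_alpha + z_beta) / (margin - true_diff)) ** 2
    return ceil(n)


# Hypothetical endpoint with SD 10: halving the margin roughly quadruples n
for margin in (5.0, 2.5, 1.0):
    print(f"margin {margin}: {ni_sample_size_per_arm(margin, sd=10.0)} per arm")
```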

An Interactive Suite of Data Visualizations for Safety Monitoring

Posted by Brook White on Thu, Feb 23, 2017 @ 01:42 PM

This is the fourth in a series of posts introducing open source tools Rho is developing and sharing online. Click here to learn more about Rho's open source effort, here to read about our interactive data visualization library, Webcharts, and here to learn about SAS graphing tools we've developed.

Frequent and careful monitoring of patient safety is one of the most important concerns of any clinical trial. For the medical monitors and safety monitoring committees responsible for supervising patient well-being and ensuring product safety, this obligation requires continuous access to a variety of critical study data.

For trials with large participant enrollment, severe diseases, or complex treatments, study monitors may be tasked with reviewing thousands of data points and safety markers. Unfortunately, traditional reporting methods require monitors to comb through scores of static listings and summary tables. This method is inefficient and poses the risk that clinically relevant signals will be obscured by the sheer volume of data common in clinical trials.

To improve safety monitoring, we created a suite of interactive data monitoring tools we call the Safety Explorer. Although the Safety Explorer can be configured to include a variety of charts specific to each study, the standard set-up includes six charts (click the links to learn more):

  • Adverse Events Explorer - dynamically query adverse event (AE) data in real time to go from study population view to individual patient records
  • Adverse Events Timeline - view interactive timelines for each participant showing when AEs occurred in a trial
  • Test Results Histogram - explore interactive histograms showing distribution of labs, vital signs, and other safety measures with linked data tables
  • Test Results Outlier Explorer - track patient trajectories over time for lab measures, vital signs, and other safety endpoints in line charts
  • Test Results Over Time - explore population averages for labs, vital signs, and other safety endpoints in box or violin plots
  • Shift Plot - monitor changes in lab measures, vital signs, and other safety endpoints between study events in a dot plot

The Safety Explorer uses common CDISC data standards to quickly create consistent charts for any project. Within a given chart, users can use filters to dynamically sort, highlight, and drill down to data points of interest using controls familiar to anyone who has used a website.

Interactive Histogram with Linked Table

Explore the distribution of test results (click here for interactive version)

Graphical representations give reviewers a systematic snapshot of the data that helps tell its story. By adding interactive elements, reviewers can quickly examine the charts for patterns of interest and drill down to subject-level data instantly. This ability to quickly distinguish signal from noise gives monitors greater insight into their data and allows them to work much more efficiently.

It is common practice for us to create safety explorers for all full-service projects and studies where Rho provides medical monitoring. All of the charts described here are open source and free to use, so please let us know if you have any feedback or would like to contribute!

Interactive Box Plot Showing Results Over Time

Track changes in population test results through a study (click here for interactive version)

View "Visualizing Multivariate Data" Video

Ryan Bailey, MA is a Senior Clinical Researcher at Rho. He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

Using SAS to Create Novel Data Visualizations

Posted by Brook White on Tue, Feb 07, 2017 @ 12:59 PM

Ryan Bailey, MA is a Senior Clinical Researcher at Rho. He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

Shane Rosanbalm, MS, Senior Biostatistician, has over fifteen years of experience providing statistical support for clinical trials in all phases of drug development, from Phase I studies through NDA submissions. He has collaborated with researchers in several areas including neonatal sepsis, RA, oncology, chronic pain, hypertension, and Parkinson’s disease. He is the lead SAS developer on Rho’s Center for Applied Data Visualization, where he develops tools and publishes on best practices for visualizing and reporting data.

This is the third in a series of posts introducing open source tools Rho is developing and sharing online. Click here to learn more about Rho's open source effort.

In our last post, we introduced Webcharts, one of our many interactive web-based charting tools that uses D3. In addition to the many web-based tools that Rho has on GitHub, we also maintain a number of SAS®-based graphics repositories. In fact, our strong reputation for clinical biostatistics and expertise with SAS (and SAS graphing tools) long predated our development of web graphics.

A sampling of our SAS tools is provided below, but we invite you to visit GitHub and check out our full offering of SAS tools. You can use the "Find a repository..." search bar to search for "SAS". All of our SAS repositories begin with "sas-".


SAS Codebook

The SAS codebook macro is designed to provide a quick and concise summary of every variable in a SAS dataset. In addition to information about variable names, labels, types, formats, and statistics, the macro also produces a small graphic showing the distribution of values for each variable. This report is a convenient way to provide a snapshot of your data and quickly get to know a new dataset.

Violin Plot

The SAS violin plot macro is designed to allow for a quick assessment of how the distribution of a variable changes from one group to another. Think of it as a souped-up version of a box and whisker plot. In addition to seeing the median, quartiles, and min/max, you also get to see all of the individual data points as well as the density curves associated with the distributions.
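For readers who want to see the quantities a violin plot combines, here is a minimal Python sketch (not the SAS macro itself) that computes the box-and-whisker summary and a Gaussian kernel density estimate for each group. The bandwidth rule and the lab values are assumptions chosen for illustration. The macro's own density settings may differ.

```python
from statistics import NormalDist, quantiles, stdev


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max): the box-and-whisker part of the plot."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def kernel_density(values, at, bandwidth=None):
    """Gaussian kernel density estimate at the point `at` (the violin's outline)."""
    n = len(values)
    if bandwidth is None:
        # Normal-reference rule of thumb; an assumption, not the macro's setting.
        bandwidth = 1.06 * stdev(values) * n ** (-1 / 5)
    # Average of normal densities centred on each observation.
    return sum(NormalDist(mu=v, sigma=bandwidth).pdf(at) for v in values) / n


# Hypothetical lab values for two treatment groups
groups = {
    "placebo": [4.1, 4.8, 5.0, 5.2, 5.9, 6.3],
    "active": [5.5, 6.1, 6.4, 7.0, 7.2, 8.9],
}
for name, values in groups.items():
    print(name, five_number_summary(values), round(kernel_density(values, 6.0), 3))
```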

Sankey Bar Chart

The SAS Sankey bar chart macro is an enhancement of a traditional stacked bar chart. In addition to showing how many subjects are in each category over time, this graphic also shows you how subjects transition from one category to another over time.
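The two ingredients of such a chart, the bar heights at each time point and the flows between categories, can be tallied from subject-level data. The Python sketch below is only an illustration of that bookkeeping, not the macro's implementation. The subject IDs, visit names, and categories are hypothetical.

```python
from collections import Counter

# Hypothetical data: each subject's category at each visit.
subjects = {
    "S01": {"Week 0": "Severe", "Week 4": "Moderate"},
    "S02": {"Week 0": "Severe", "Week 4": "Severe"},
    "S03": {"Week 0": "Moderate", "Week 4": "Mild"},
    "S04": {"Week 0": "Moderate", "Week 4": "Moderate"},
}


def bar_heights(data, visit):
    """Number of subjects in each category at one visit (the stacked bar)."""
    return Counter(rec[visit] for rec in data.values() if visit in rec)


def flows(data, from_visit, to_visit):
    """Number of subjects moving between each pair of categories (the bands)."""
    return Counter(
        (rec[from_visit], rec[to_visit])
        for rec in data.values()
        if from_visit in rec and to_visit in rec
    )


print(bar_heights(subjects, "Week 0"))
print(bar_heights(subjects, "Week 4"))
print(flows(subjects, "Week 0", "Week 4"))
```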

Other SAS graphics tools include a Beeswarm Plot (a strip plot with non-random jittering) and the Axis Macro for automating the selection of axis ranges for continuous variables. We are adding new SAS repositories frequently. We invite you to try the tools, share your feedback, and contribute to the development of the tools.

Visit Rho's Center for Applied Data Visualization

Webcharts: A Reusable Tool for Building Online Data Visualizations

Posted by Brook White on Wed, Jan 18, 2017 @ 01:39 PM


This is the second in a series of posts introducing open source tools Rho is developing and sharing online. Click here to learn more about Rho's open source effort.

When Rho created a team dedicated to developing novel data visualization tools for clinical research, one of the group's challenges was to figure out how to scale our graphics to every trial, study, and project we work on. In particular, we were interested in providing interactive web-based graphics, which can run in a browser and allow for intuitive, real-time data exploration.

Our solution was to create Webcharts - a web-based charting library built on top of the popular Data-Driven Documents (D3) JavaScript library - to provide a simple way to create reusable, flexible, interactive charts.

Interactive Study Dashboard

Track key project metrics in a single view; built with Webcharts (click here for interactive version)

Webcharts allows users to compose a wide range of chart types, from basic charts (e.g., scatter plots, bar charts, line charts), to intermediate designs (e.g., histograms, linked tables, custom filters), to advanced displays (e.g., project dashboards, lab results trackers, outcomes explorers, and safety timelines). Webcharts' extensible and customizable charting library allows us to quickly produce standard charts while also crafting tailored data visualizations unique to each dataset, phase of study, and project.

This flexibility has allowed us to create hundreds of custom interactive charts, including several that have been featured alongside Rho's published work. The Immunologic Outcome Explorer (shown below) was adapted from Figure 3 in the New England Journal of Medicine article, Randomized Trial of Peanut Consumption in Infants at Risk for Peanut Allergy. The chart was originally created in response to reader correspondence, and was later updated to include follow-up data in conjunction with a second article, Effect of Avoidance on Peanut Allergy after Early Peanut Consumption. The interactive version allows the user to select from 10 outcomes on the y-axis. Selections for sex, ethnicity, study population, skin prick test stratum, and peanut specific IgE at 60 and 72 months of age can be interactively chosen to filter the data and display subgroups of interest. Figure options (e.g., summary lines, box and violin plots) can be selected under the Overlays heading to alter the properties of the figure.

Immunologic Outcome Explorer

Examine participant outcomes for the LEAP study (click here for interactive version)

Because Webcharts is designed for the web, the charts require no specialized software. If you have a web browser (e.g., Firefox, Chrome, Safari, Internet Explorer) and an Internet connection, you can see the charts. Likewise, navigating the charts is intuitive because we use controls familiar to anyone who has used a web browser (radio buttons, drop-down menus, sorting, filtering, mouse interactions). A manuscript describing the technical design of Webcharts was recently published in the Journal of Open Research Software.

The decision to build for general web use was intentional. We were not concerned with creating a proprietary charting system - of which there are many - but an extensible, open, generalizable tool that could be adapted to a variety of needs. For us, that means charts to aid in the conduct of clinical trials, but the tool is not limited to any particular field or industry. We also released Webcharts open source so that other users could contribute to the tools and help us refine them.

Because they are web-based, charts for individual studies and programs are easily implemented in RhoPORTAL, our secure collaboration and information delivery portal, which allows us to share the charts with study team members and sponsors while carefully limiting access to sensitive data.

Webcharts is freely available online on Rho's GitHub site. The site contains a wiki that describes the tool, an API, and interactive examples. We invite anyone to download and use Webcharts, give us feedback, and participate in its development.

Jeremy Wildfire, MS, Senior Biostatistician, has over ten years of experience providing statistical support for multicenter clinical trials and mechanistic studies related to asthma, allergy, and immunology.  He is the head of Rho’s Center for Applied Data Visualization, which develops innovative data visualization tools that support all phases of the biomedical research process. Mr. Wildfire also founded Rho’s Open Source Committee, which guides the open source release of dozens of Rho’s graphics tools for monitoring, exploring, and reporting data. 

Ryan Bailey, MA is a Senior Clinical Researcher at Rho. He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

Embracing Open Source as Good Science

Posted by Brook White on Wed, Nov 30, 2016 @ 09:37 AM

Ryan Bailey, MA is a Senior Clinical Researcher at Rho. He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

Sharing. It's one of the earliest lessons your parents try to teach you - don't hoard, take turns, be generous. Sharing is a great lesson for life. Sharing is also a driving force behind scientific progress and software development. Science and software rely on communal principles of transparency, knowledge exchange, reproducibility, and mutual benefit.

The practice of open sharing or open sourcing has advanced these fields in several ways:

We also feel strongly that the impetus for open sharing is reflected in Rho's core values - especially team culture, innovation, integrity, and quality. Given our values, and given our role in conducting science and creating software, we've been exploring ways that we can be more active in the so-called "sharing economy" when it comes to our work.

One of the ways we have been fulfilling this goal is to release our statistical and data visualization tools as freely-accessible, open source libraries on GitHub. GitHub is one of the world's largest open source platforms for virtual collaboration and code sharing. GitHub allows users to actively work on their code online, from anywhere, with the opportunity to share and collaborate with other users. As a result, we not only share our code for public use, we also invite feedback, improvements, and expansions of our tools for other uses.

We released our first open source tool - the openFDA Adverse Event Explorer - in June 2015. Now we have 26 team members working on 28 public projects, and that number has been growing rapidly. The libraries and tools we've been sharing have a variety of uses: monitor safety data, track project metrics, visualize data, summarize every data variable for a project, aid with analysis, optimize SAS tools, and explore population data.

Most repositories include examples and wikis that describe the tools and how they can be used. An example of one of these tools, the Population Explorer, is shown below.

Interactive Population Explorer

Access summary data on study population and subpopulations of interest in real time.

One of over 25 public projects on Rho's GitHub page - available at: https://github.com/RhoInc/PopulationExplorer

Over the next few months, we are going to highlight a few of our different open source tools here on the blog. We invite you to check back/subscribe to learn more about the tools we're making available to the public. We also encourage you to peruse the work for yourself on our GitHub page: https://github.com/RhoInc.

We are excited to be hosting public code and instructional wikis in a format that allows free access and virtual collaboration, and hope that an innovative platform like GitHub will give us a way to share our tools with the world and refine them with community feedback. As science and software increasingly embrace open source code, we are changing the way we develop tools and optimizing the way we do clinical research while staying true to our core purpose and values.

If you have any questions or want to learn more about one of our projects, email us at: graphics@rhoworld.com

Big Data: The New Bacon

Posted by Brook White on Wed, Nov 16, 2016 @ 04:10 PM

Dr. David Hall, Senior Research Scientist, is a bioinformatician with expertise in the development of algorithms, software tools, and data systems for the management and analysis of large biological data sets for biotechnology and biomedical research applications. He joined Rho in June 2014 and is currently overseeing capabilities development in the areas of bioinformatics and big biomedical data. He holds a B.S. in Computer Science from Wake Forest University and a Ph.D. in Genetics with an emphasis in Computational Biology from the University of Georgia.

Data is the new bacon, as the saying goes. And Big Data is all the rage as people in the business world realize that you can make a lot of money by finding patterns in data that allow you to target marketing to the most likely buyers. Big Data and a type of artificial intelligence called machine learning are closely connected. Machine learning involves teaching a computer to make predictions by training it to find and exploit patterns in Big Data. Whenever you see a computer make predictions (from predicting how much a home is worth, to predicting the best time to buy an airline ticket, to predicting which movies you will like), Big Data and machine learning are probably behind it.

However, Big Data and machine learning are nothing new to people in the sciences. We have been collecting big datasets and looking for patterns for decades. Most people in the biomedical sciences consider the Big Data era to have started in the early to mid-1990s as various genome sequencing projects ramped up. The Human Genome Project wrapped up in 2003, took more than 10 years, and cost somewhere north of $500 million. And that was to sequence just one genome. A few years later, the 1000 Genomes Project started, with the goal of characterizing genetic differences across 1000 diverse individuals so that we can predict who is susceptible to various diseases, among other things. This effort was partially successful, but we learned that 1000 genomes is not enough.

The cost to sequence a human genome has fallen to around $1,000, so the ambition and scale of big biomedical data have increased proportionately. Researchers in the UK are undertaking a project to sequence the genomes of 100K individuals. In the US, the Precision Medicine Initiative will sequence 1 million individuals. Combining this data with detailed clinical and health data will allow machine learning and other techniques to more accurately predict a wider range of disease susceptibilities and responses to treatments. Private companies are undertaking their own big genomic projects and are even sequencing the “microbiomes” of research participants to see what role good and bad microbes play in health.

Much as Moore’s law predicted a vast increase in computing power, the amount of biomedical data we can collect is on a similar trajectory. Genomics data combined with electronic medical records, data from wearables and mobile apps, and environmental data will one day shroud each individual in a data cloud. In the not too distant future, maybe medicine will involve feeding a patient’s data cloud to an artificial intelligence that has learned to make diagnoses and recommendations by looking through millions of other personal data clouds. It seems hard to conceive, but this is the trajectory of precision medicine. Technology has a way of sneaking up on us and the pace of change keeps getting faster. Note that the management and analysis of all of this data will be very hard. I’ll cover that in a future post.
