Rho Knows Clinical Research Services

Rho Participates in Innovative Graduate Student Workshop for the 8th Consecutive Time

Posted by Brook White on Thu, Aug 09, 2018 @ 09:18 AM

Petra LeBeau, ScD (@LebeauPetra), is a Senior Biostatistician and Lead of the Bioinformatics Analytics Team at Rho. She has over 13 years of experience in providing statistical support in all areas of clinical trials and observational studies. Her experience includes 3+ years of working with genomic data sets (e.g., transcriptome and metagenome). Her current interest is in machine learning using clinical trial and high-dimensional data.

Agustin Calatroni, MS (@acalatr), is a Principal Statistical Scientist at Rho. His academic background includes a master’s degree in economics from the Université Paris 1 Panthéon-Sorbonne and a master’s degree in statistics from North Carolina State University. In the last 5 years, he has participated in a number of competitions to develop prediction models. He is particularly interested in the use of stacking models to combine several machine learning techniques into one predictive model in order to decrease variance (bagging) and bias (boosting) and thereby improve predictive accuracy.

At Rho, we are proud of our commitment to supporting education and fostering innovative problem-solving for the next generation of scientists, researchers, and statisticians. One way we enjoy promoting innovation is by participating in the annual Industrial Math/Stat Modeling Workshop for Graduate Students (IMSM) hosted by the National Science Foundation-supported Statistical and Applied Mathematical Sciences Institute (SAMSI). IMSM is a 10-day program that exposes graduate students in mathematics, statistics, and computational science to challenging and exciting real-world projects arising in industrial and government laboratory research. The workshop is held in SAS Hall on the campus of North Carolina State University. This summer marked our 8th consecutive year as an IMSM Problem Presenter. We were joined by industry leaders from Sandia National Laboratories, MIT Lincoln Laboratory, the US Army Corps of Engineers (USACE), the US Environmental Protection Agency (EPA), and Savvysherpa.


SAMSI participants 2018: Agustin Calatroni (first from left), Petra LeBeau (first from right), and Emily Lei Kang (second from right) with students from the SAMSI program.

Rho was represented at the 2018 workshop by investigators Agustin Calatroni and Petra LeBeau, with the assistance of Dr. Emily Lei Kang from the University of Cincinnati. Rho’s problem for this year was Visualizing and Interpreting Machine Learning Models for Liver Disease Detection. 

Machine learning (ML) interpretability is a hot topic. Many tools have become available over the last couple of years (including a variety of very user-friendly ones) that can build highly accurate ML models, but the constructs that could help us explain and trust these black-box models are still under development.

The success of ML algorithms in medicine and multi-omics studies over the last decade has come as no surprise to ML researchers. This can be largely attributed to their superior predictive accuracy and their ability to work on both large volume and high-dimensional datasets. The key notion behind their performance is self-improvement. That is, these algorithms make predictions and improve them over time by analyzing mistakes made in earlier predictions and avoiding these errors in future predictions. The difficulty with this “predict and learn” paradigm is that these algorithms suffer from diminished interpretability, usually due to the high number of nonlinear interactions within the resulting models. This is often referred to as the “black-box” nature of ML methods.

In cases where interpretability is crucial, for instance in studies of disease pathologies, ad-hoc methods leveraging the strong predictive nature of these algorithms have to be implemented. These methods are used as aids for ML users to answer questions like: ‘why did the algorithm make certain decisions?’, ‘what variables were the most important in predictions?’, and/or ‘is the model trustworthy?’

The IMSM students were challenged with studying the interpretability of a particular class of ML methods called gradient boosting machines (GBM) applied to predicting whether a subject had liver disease. Rho investigators provided a curated data set and pre-built the model for the students. To construct the model, the open-source Indian Liver Patient Dataset was used, which contains records of 583 patients from North East India (Dheeru and Karra Taniskidou, 2017). The dataset contains eleven variables: a response variable indicating the disease status of the patient (416 with disease, 167 without) and ten clinical predictor variables (Age, Gender, Total Bilirubin, Direct Bilirubin, Alkaline Phosphatase, Alamine Aminotransferase, Aspartate Aminotransferase, Total Proteins, Albumin, and Albumin and Globulin Ratio). The data were divided into 467 training and 116 test records for model building.
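To make the setup concrete, here is a minimal sketch in Python of the kind of model the students were handed (our illustration, not their actual code). It assumes the ILPD file has been downloaded locally as ilpd.csv without a header row; the column abbreviations are our own shorthand.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Load the Indian Liver Patient Dataset; column names are our own shorthand
cols = ["Age", "Gender", "TB", "DB", "Alkphos", "Sgpt", "Sgot",
        "TP", "ALB", "AG_Ratio", "Disease"]
df = pd.read_csv("ilpd.csv", names=cols).dropna()
df["Gender"] = (df["Gender"] == "Female").astype(int)  # encode the one categorical predictor
X = df.drop(columns="Disease")
y = (df["Disease"] == 1).astype(int)  # 1 = liver disease in the UCI coding

# Hold out 116 records for testing, echoing the post's 467/116 split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=116, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05, max_depth=3)
gbm.fit(X_train, y_train)
print("Test AUC:", roc_auc_score(y_test, gbm.predict_proba(X_test)[:, 1]))
```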

The scope of work for the students was not to improve or optimize the performance of the GBM model but to explain and visualize the method’s intrinsic latent behavior.

The IMSM students decided to break interpretability down into two areas: global, where the entire dataset is used for interpretation, and local, where a subset of the data is used to derive an interpretive analysis of the model. The details of these methods will be discussed further in two additional blog posts.
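Continuing the sketch above, one common global tool is permutation importance (how much shuffling each predictor degrades test-set performance), while a crude local view can be had by perturbing one input for a single patient and watching the prediction move. Both are our illustrations, not the students' methods:

```python
import numpy as np
from sklearn.inspection import permutation_importance

# Global view: rank predictors by how much shuffling them hurts the model
imp = permutation_importance(gbm, X_test, y_test, n_repeats=30, random_state=0)
for name, mean in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:10s} {mean:+.4f}")

# Local view: for one subject, how does predicted risk respond to one lab value?
patient = X_test.iloc[[0]].copy()
for value in np.linspace(X["DB"].min(), X["DB"].max(), 5):
    patient["DB"] = value  # vary Direct Bilirubin, holding everything else fixed
    print(f"DB={value:6.2f} -> P(disease)={gbm.predict_proba(patient)[0, 1]:.3f}")
```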

Rho is honored to have the opportunity to work with exceptional students and faculty to apply state of the art mathematical and statistical techniques to solve real-world problems and advance our knowledge of human diseases.

You can visit the IMSM Workshop website to learn more about the program, including the problem Rho presented and the students’ solution.


With thanks to the IMSM students Adams Kusi Appiah (1), Sharang Chaudhry (2), Chi Chen (3), Simona Nallon (4), Upeksha Perera (5), Manisha Singh (6), and Ruyu Tan (7), and advisor Dr. Emily Lei Kang from the University of Cincinnati

(1) Department of Biostatistics, University of Nebraska Medical Center; (2) Department of Mathematical Sciences, University of Nevada, Las Vegas; (3) Department of Biostatistics, State University of New York at Buffalo; (4) Department of Statistics, California State University, East Bay; (5) Department of Mathematics and Statistics, Sam Houston State University; (6) Department of Information Science, University of Massachusetts; (7) Department of Applied Mathematics, University of Colorado at Boulder

References:
Dheeru, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Sciences.

What We Learned at PhUSE US Connect

Posted by Brook White on Tue, Jun 12, 2018 @ 09:40 AM

Ryan Bailey, MA, is a Senior Clinical Researcher at Rho. He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

Last week, PhUSE hosted its first ever US Connect conference in Raleigh, NC. Founded in Europe in 2004, the independent, non-profit Pharmaceutical Users Software Exchange has been a rapidly growing presence and influence in the field of clinical data science. While PhUSE routinely holds smaller events in the US, including their popular Computational Science Symposia and Single Day Events, this was the first time they had held a large multi-day conference with multiple work streams outside of Europe. The three-day event attracted over 580 data scientists, biostatisticians, statistical programmers, and IT professionals from across the US and around the world to focus on the theme of "Transformative Current and Emerging Best Practices."

After three days immersed in data science, we wanted to provide a round-up of some of the main themes of the conference and trends for our industry.

Emerging Technologies are already Redefining our Industry

It can be hard to distinguish hype from reality when it comes to emerging technologies like big data, artificial intelligence, machine learning, and blockchain. Those buzzwords made their way into many presentations throughout the conference, but there was more substance than I expected. It is clear that many players in our industry (FDA included) are actively exploring ways to scale up their capabilities to wrangle massive data sets, rely on machines to automate long-standing data processing, formatting, and cleaning processes, and use distributed database technologies like blockchain to keep data secure, private, and personalized. These technologies are not just reshaping other sectors like finance, retail, and transportation; they are well on their way to disrupting and radically changing aspects of clinical research.

The FDA is Leading the Way

Our industry has gotten a reputation for being slow to evolve, and we sometimes use the FDA as our scapegoat. Regulations take a long time to develop, formalize, and finalize, and we tend to be reluctant to move faster than regulations. However, for those that think the FDA is lagging behind in technological innovation and data science, US Connect was an eye opener. With 30 delegates at the conference and 16 presentations, the agency had a strong and highly visible presence.

Moreover, the presentations by the FDA were often the most innovative and forward-thinking. Agency presenters provided insight into how the offices of Computational Science and Biomedical Informatics are applying data science to aid in reviewing submissions for data integrity and quality, detecting data and analysis errors, and setting thresholds for technical rejection of study data. In one presentation, the FDA demonstrated its Real-time Application for Portable Interactive Devices (RAPID) to show how the agency is able to track key safety and outcomes data in real time amid the often chaotic and frantic environment of a viral outbreak. RAPID is an impressive feat of technical engineering, managing to acquire massive amounts of unstructured symptom data from multiple device types in real time, process them in the cloud, and perform powerful analytics for "rapid" decision making. It is the type of ambitious technically advanced project you expect to see coming out of Silicon Valley, not Silver Spring, MD.

It was clear that the FDA is striving to be at the forefront of bioinformatics and data science, and in turn, they are raising expectations for everyone else in the industry.

The Future of Development is "Multi-lingual"  

A common theme through all the tracks is the need to evolve beyond narrowly focused specialization in our jobs. Whereas 10-15 years ago, developing deep expertise in one functional area or one tool was a good way to distinguish yourself as a leader and bring key value to your organization, a similar approach may hinder your career in the evolving clinical research space. Instead, many presenters advocated that the data scientist of the future specialize in a few different tools and have broad domain knowledge. As keynote speaker Ian Khan put it, we need to find a way to be both specialists and generalists at the same time. Nowhere was this more prevalent than in discussions around which programming languages will dominate our industry in the years to come.

While SAS remains the go-to tool for stats programming and biostatistics, the general consensus is that knowing SAS alone will not be adequate in years to come. The prevailing languages getting the most attention for data science are R and Python. While we heard plenty of debate about which one will emerge as the more prominent, it was agreed that the ideal scenario would be to know at least one, R or Python, in addition to SAS.

We Need to Break Down Silos and Improve our Teams

On a similar note, many presenters advocated for rethinking our traditional siloed approach to functional teams. As one vice president of a major Pharma company put it, "we have too much separation in our work - the knowledge is here, but there's no crosstalk." Rather than passing deliverables between distinct departments with minimal communication, clinical data science requires taking a collaborative multi-functional approach. The problems we face can no longer be parsed out and solved in isolation. As a multi-discipline field, data science necessarily requires getting diverse stakeholders in the room and working on problems together.

As for how to achieve this collaboration, Dr. Michael Rappa delivered an excellent plenary session on how to operate highly productive data science teams based on his experience directing the Institute for Advanced Analytics at North Carolina State University. His advice bucks the traditional notion that you solve a problem by selecting the most experienced subject matter experts and putting them in a room together. Instead, he demonstrated how artfully crafted teams that value leadership skills and motivation over expertise alone can achieve incredibly sophisticated and innovative output.

Change Management is an Essential Need

Finally, multiple sessions addressed the growing need for change management skills. As the aforementioned emerging technologies force us to acquire new knowledge and skills and adapt to a changing landscape, employees will need help to deftly navigate change. When asked what skills are most important for managers to develop, a VP from a large drug manufacturer put it succinctly, "our leaders need to get really good at change management."

In summary, PhUSE US Connect is helping our industry look to the future, especially when it comes to clinical data science, but the future may be closer than we think. Data science is not merely an analytical discipline to be incorporated into our existing work; it is going to fundamentally alter how we operate and what we achieve in our trials. The question for industry is whether we're paying attention and pushing ourselves to evolve in step to meet those new demands.


“This drug might be harmful!  Why was it approved?”  What the news reports fail to tell us.

Posted by Brook White on Thu, Apr 19, 2018 @ 08:39 AM

Jack Modell, MD, Vice President and Senior Medical Officer, is a board-certified psychiatrist with 35 years of experience in clinical research and patient care, including 15 years of experience in clinical drug development. He has led successful development programs, is a key opinion leader in the neurosciences, has served on numerous advisory boards, and is nationally known for leading the first successful development of preventative pharmacotherapy for the depressive episodes of seasonal affective disorder.

David Shoemaker, PhD, Senior Vice President R&D, has extensive experience in the preparation and filing of all types of regulatory submissions, including primary responsibility for four BLAs and three NDAs. He has managed or contributed to more than two dozen NDAs, BLAs, and MAAs and has moderated dozens of regulatory authority meetings.

Once again, we see news of an approved medication* being linked to bad outcomes, even deaths, and the news media implores us to ask:  

“How could this happen?”
“Why was this drug approved?”
“Why didn’t the pharmaceutical company know this or tell us about it?”
“What’s wrong with the FDA that they didn’t catch this?”
“Why would a drug be developed and approved if it weren’t completely safe?”

And on the surface, these questions might seem reasonable.  Nobody, including the drug companies and FDA, wants a drug on the market that is unsafe, or for that matter, wants any patient not to fare well on it.  And to be very clear at the outset, in pharmaceutical development, there is no room for carelessness, dishonesty, intentionally failing to study or report suspected safety signals, exaggerating drug benefits, or putting profits above patients – and while there have been some very disturbing examples of these happening, none of this should ever be tolerated.  But we do not believe that the majority of reported safety concerns with medications are caused by any intentional misconduct or by regulators failing to do their jobs, or that a fair and balanced portrayal of a product’s risk-benefit is likely to come from media reports or public opinion alone.

While we are not in a position to speculate or comment upon the product mentioned in this article specifically, in most cases we know of where the media have reported on bad outcomes for patients taking a particular medication, the reported situations, while often true, have rarely been shown to have been the actual result of taking the medication; rather, they occurred in association with taking the medication.  There is, of course, a huge difference between these two, with the latter telling us little or nothing about whether the medication itself had anything to do with the bad outcome.  Nonetheless, the news reports, which include catchy headlines that disparage the medication (and manufacturer), almost always occur years in advance of any conclusive data on whether the medication actually causes the alleged problems; and in many cases, the carefully controlled studies that are required to determine whether the observed problems have anything directly to do with the medication eventually show that the medication either does not cause the initially reported outcomes, or might do so only very rarely.  Yet the damage has been done by the initial headlines:  patients who are benefiting from the medication stop it and get into trouble because their underlying illness becomes less well controlled, and others are afraid to start it, thus denying themselves potentially helpful – and sometimes lifesaving – therapy.  And ironically, when the carefully controlled and adequately powered studies finally do show that the medication was not, after all, causing the bad outcomes, these findings, if reported at all, rarely make the headlines. 

Medications do, of course, have real risks, some serious, and some of which might take many years to become manifest. But why take any risk? Who wants to take a medication that could be potentially harmful? If the pharmaceutical companies have safety as their first priority, why would they market something that they know carries risk or for which they have not yet fully assessed all possible risks? There’s an interesting parallel here that comes to mind. I recently heard an airline industry representative say that the airlines’ first priority is passenger safety. While the U.S. major airlines have had, for decades, a truly outstanding safety record, could safety really be their first priority? If passenger safety were indeed more important than anything else, no plane would ever leave the gate; no passengers would ever board. No boarding, no leaving, and no one could ever possibly get hurt. And in this scenario, no one ever flies anywhere, either. The airlines’ first priority has to be efficient transportation, though undoubtedly followed by safety as a very close second. Similarly, the pharmaceutical industry cannot put guaranteed safety above all else, or no medications would ever be marketed. No medications, and no one could ever get hurt. And in this scenario, no one ever gets treated for illnesses that, without medications, often harm or kill. In short, where we want benefit, we must accept risks, including those that may be unforeseeable, and balance these against the potential benefits.

OK then: so bad outcomes might happen anyway and are not necessarily caused by medication, worse outcomes can happen without the medications, and we must accept some risk. But isn’t it negligent of a pharmaceutical company to market a medication before they actually know all the risks, including the serious ones that might only happen rarely? Well, on average, a new medicine costs nearly three billion dollars and takes well over a decade to develop, and it is tested on up to a few thousand subjects. But if a serious adverse event did not occur in the 3000 subjects who participated in the clinical trials to develop the medicine, does this show us that the medicine is necessarily safe and unlikely to ever harm anybody? Unfortunately, it does not. As can be seen by the statistical rule of three**, this can only teach us that, with 95% confidence, the true rate of such an event is between zero and 1/1000. And while it may be comforting that a serious event is highly unlikely to occur in more than 1/1000 people who take the medication, if the true rate of this event is, let’s say, even 1/2000, there is still greater than a 90% chance that a serious adverse event will occur in at least one person among the first 5000 patients who take the medication! Such is the nature of very low frequency events over thousands of possible ways for them to become manifest.

So why not study the new medication in 10,000 subjects before approval, so that we can more effectively rule out the chances of even rarer serious events?  There is the issue of cost, yes; but more importantly, we would now be extending the time to approval for a new medicine by several additional years, during which time far more people are likely to suffer by not having a new and needed treatment than might ever be prevented from harm by detecting a few more very rare events.  There is a good argument to be made that hurting more people by delaying the availability of a generally safe medication to treat an unmet medical need in an effort to try to ensure what might not even be possible – that all potential safety risks are known before marketing – is actually the more negligent course of action.  It is partly on this basis that the FDA has mechanisms in place (among them, breakthrough therapy, accelerated approval, and priority review) to speed the availability of medications that treat serious diseases, especially when the medications are the first available treatment or if the medication has advantages over existing treatments.  When these designations allow for a medication to be marketed with a smaller number of subjects or clinical endpoints than would be required for medications receiving standard regulatory review, it is possible that some of these medications might have more unknown risks than had they been studied in thousands of patients.  In the end, however, whatever the risks – both known and unknown – if we as a society cannot accept them, then we need to stop the development and prescribing of medicines altogether.  

*Neither of the authors nor Rho was involved in the development of the referenced product.  This post is not a comment on this particular product or the referenced report, but rather a response to much of the media coverage of marketed drugs and biologics more broadly.

**In statistical analysis, the rule of three states that if a certain event did not occur in a sample with n subjects, the interval from 0 to 3/n is a 95% confidence interval for the rate of occurrences in the population.  https://en.wikipedia.org/wiki/Rule_of_three_(statistics)  

The probability that no event with this frequency will occur in 5000 people is (1 − 0.0005)^5000, or about 0.082.
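For readers who want to check the arithmetic, a few lines of Python (ours, not the authors’) reproduce both figures:

```python
# Rule of three: after 0 events in n subjects, the 95% upper bound on the rate is 3/n
n = 3000
print(3 / n)  # 0.001, i.e., 1/1000

# If the true rate is 1/2000, the chance of no events among 5000 patients...
p_none = (1 - 1 / 2000) ** 5000
print(p_none)      # ~0.082
print(1 - p_none)  # ~0.918, i.e., >90% chance of at least one event
```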


The Future, Today: Artificial Intelligence Applications for Clinical Research

Posted by Brook White on Tue, Feb 13, 2018 @ 08:37 AM

Petra LeBeau, ScD, is a Senior Biostatistician and Lead of the Bioinformatics Analytics Team at Rho. She has over 13 years of experience in providing statistical support for clinical trials and observational studies, from study design to reporting. Her experience includes 3+ years of working with genomic data sets (e.g., transcriptome and metagenome). Her current interest is in machine learning using clinical trial and high-dimensional data.

Agustin Calatroni, MS, is a Principal Statistical Scientist at Rho. His academic background includes a master’s degree in economics from the Université Paris 1 Panthéon-Sorbonne and a master’s degree in statistics from North Carolina State University. In the last 5 years, he has participated in a number of competitions to develop prediction models. He is particularly interested in the use of stacking models to combine several machine learning techniques into one predictive model in order to decrease variance (bagging) and bias (boosting) and thereby improve predictive accuracy.

Derek Lawrence, Senior Clinical Data Manager, has 9 years of data management and analysis experience in the health care/pharmaceutical industry. Derek serves as Rho’s Operational Service Leader in Clinical Data Management, an internal expert responsible for disseminating the application of new technology, best practices, and processes.

Artificial Intelligence (AI) may seem like rocket science, but most people use it every day without realizing it. Ride-sharing apps, airplane ticket purchasing aggregators, ATMs, recommendations for your next eBook or superstore purchase, and the photo library within your smartphone—all these common applications use machine learning algorithms to improve the user experience.

Machine learning (ML) algorithms make predictions and, in turn, learn from their own predictions, resulting in improved performance over time. ML has slowly been making its way into health research and the healthcare system, due in part to an exponential growth in data stemming from new developments in technology like genomics. Rho supports many studies with large datasets including the microbiome, proteome, metabolome, and transcriptome. The rapid growth of health-related data will continue, along with the development of new methodologies like systems biology (i.e., the computational and mathematical modeling of interactions within biological systems) that leverage these data. ML will continue to be a key enabler in these areas. The ever-increasing amounts of computational power, improvements in data storage devices, and falling computational costs have given clinical trial centers the opportunity to apply ML techniques to large and complex data in ways that would not have been possible a decade ago. In general, ML is divided into two main types of techniques: (1) supervised learning, in which a model is trained on known input and output data in order to predict future outputs, and (2) unsupervised learning, where instead of predicting outputs, the system tries to find naturally occurring patterns or groups within the data. Each type of ML has a large number of existing algorithms. Example supervised learning algorithms include random forest, boosted trees, neural networks, and deep neural networks, to name a few. Similarly, unsupervised learning has a plethora of algorithms.
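The distinction is easy to see in code. Here is a minimal, self-contained sketch (ours, on simulated data) of one algorithm from each family:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised: learn a mapping from inputs to known outputs, then predict new outputs
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("Held-out accuracy:", rf.score(X_test, y_test))

# Unsupervised: no outputs at all; look for natural groupings in the inputs
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Cluster sizes:", [(clusters == k).sum() for k in (0, 1)])
```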

Lately, it has become clear that in order to substantially increase the accuracy of a predictive model, we need to use an ensemble of models. The idea behind ensembles is that by combining a diverse set of models one is able to produce a stronger, higher-performing model, which in turn results in better predictions. By creating an ensemble of models, we maximize the accuracy, precision, and stability of our predictions. The power of the ensemble technique can be intuited with a real-world example: In the early 20th century, the famous English statistician Francis Galton (who created the statistical concept of correlation) attended a local fair. While there, he came across a contest that involved guessing the weight of an ox. He looked around and noticed a very diverse crowd; there were people like him who maybe had little knowledge about cattle, and there were farmers and butchers whose guesses would be considered those of experts. In general, the diverse audience ended up giving a wide variety of responses. He wondered what would happen if he took the average of all these responses, expert and non-expert alike. What he found was that the average of all the responses was much closer to the true weight of the ox than any individual guess alone. This phenomenon has been called the “wisdom of crowds.” Similarly, today’s best prediction models are often the result of an ensemble of various models which together provide a better overall prediction accuracy than any individual one would be capable of.
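In model form, the crowd's average is just an ensemble that averages the predicted probabilities of several diverse learners. A toy sketch (ours, on simulated data; not Rho's stacking setup):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=1)

# Three diverse "guessers"
members = [("lr", LogisticRegression(max_iter=1000)),
           ("rf", RandomForestClassifier(n_estimators=100, random_state=1)),
           ("gbm", GradientBoostingClassifier(random_state=1))]

# Soft voting averages the members' predicted probabilities -- the crowd's guess
ensemble = VotingClassifier(estimators=members, voting="soft")

for name, model in members + [("ensemble", ensemble)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:8s} {score:.3f}")
```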

Where data management is concerned, the current clinical research model is centered on electronic data capture (EDC) systems, in which a database is constructed that comprises the vast majority of the data for a particular study or trial. Getting all of the data into a single system involves a significant investment in the form of external data imports, redundant data entry, transcription from paper sources, transfers from electronic medical/health record systems (EMR/EHR), and the like. Additionally, the time and effort required to build, test, and validate complicated multivariate edit checks into the EDC system to help clean the data as they are entered is substantial, and these checks can only use data that already exist in the EDC system itself. As data source variety increases, along with surges in data volume and data velocity, this model becomes less and less effective at identifying anomalous data.

At Rho, we are investing in talent and technology that in the near future will use ML ensemble models in the curation and maintenance of clinical databases. Our current efforts to develop tools that aggregate data from a variety of sources will be a key enabler. Similar to the way the banking industry uses ML to identify ‘normal’ and ‘abnormal’ spending patterns and make real-time decisions to allow or decline purchases, ML algorithms can identify univariate and multivariate clusters of anomalous data for manual review. These continually learning algorithms will enable a focused review of potentially erroneous data without the development of the traditional EDC infrastructure, not only saving time on data reviews but also surfacing potential issues of which we would normally have been unaware.
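As a flavor of how such flagging might work (a hedged sketch on made-up vitals data, not Rho's production system), an isolation forest can score each record for multivariate strangeness and queue the oddest ones for manual review:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Simulate plausible vitals, then append one implausible (possibly mistyped) record
rng = np.random.default_rng(0)
vitals = pd.DataFrame({"sbp": rng.normal(120, 10, 1000),
                       "dbp": rng.normal(80, 8, 1000)})
vitals.loc[len(vitals)] = [220.0, 30.0]

# Score every record; predict() returns -1 for multivariate outliers
iso = IsolationForest(contamination=0.005, random_state=0).fit(vitals)
vitals["review"] = iso.predict(vitals) == -1
print(vitals[vitals["review"]])  # records queued for manual data review
```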


Challenges in Clinical Data Management: Findings from the Tufts CSDD Impact Report

Posted by Brook White on Fri, Feb 09, 2018 @ 12:24 PM

Derek Lawrence, Senior Clinical Data Manager, has 9 years of data management and analysis experience in the health care/pharmaceutical industry. Derek serves as Rho's Operational Service Leader in Clinical Data Management, an internal expert responsible for disseminating the application of new technology, best practices, and processes.

The most recent Impact Report from the Tufts Center for the Study of Drug Development presented the results of a study of nearly 260 sponsor and CRO companies examining clinical data management practices and experience. A high-level summary of the findings included longer data management cycle times than those observed 10 years ago, delays in building clinical databases, a reported average of six applications to support each clinical study, and a majority of companies reporting technical challenges when loading data into their primary electronic data capture (EDC) system.

These findings represent the challenges those of us in clinical data management are struggling with given the current state of the clinical research industry and technological changes. EDC systems are still the primary method of data capture in clinical research, with 100% of sponsors and CROs reporting at least some usage. These systems are experiencing difficulties in dealing with the increases in data source diversity. More and more clinical data are being captured by new and novel applications (ePRO, wearable devices, etc.), and there is an increased capacity to work with imaging, genomic, and biomarker data. The increases in data volume and data velocity have resulted in a disconnect with the EDC paradigm. Data are either too large or are ill-formatted for import into the majority of EDC systems common to the industry. In addition, there are significant pre-study planning and technical support demands when it comes to loading data into these systems. With 77% of sponsors and CROs reporting similar barriers to effective loading, cleaning, and use of external data, the issue is one with which nearly everyone in clinical research is confronted.

Related to the issues regarding EDC integration are delays in database build. While nearly half of the build delays were attributed to protocol changes, just over 30% resulted from user acceptance testing (UAT) and database design functionality. Delays attributed to database design functionality were associated with a last-patient-last-visit (LPLV)-to-lock cycle time that was 39% longer than the overall average. While the Tufts study did not address this directly, it would be no great stretch of the imagination to assume that the difficulties related to EDC system integration are a significant contributor to the reported database functionality issues. With delays already associated with loading data, standard data cleaning activities that are built into the EDC system and need to be performed before database lock would most certainly be delayed as well.

Clinical data management is clearly experiencing pains adapting to a rapidly-shifting landscape in which a portion of our current practices no longer play together nicely with advances in technology and data source diversity. All of this begs the question: “What can we do to change our processes in order to accommodate these advances?” At Rho, we are confronting these challenges with a variety of approaches, beginning with limiting the impulse to automatically import all data from external vendors into our EDC systems. Configuring and updating EDC systems requires no small amount of effort on the part of database builders, statistical programmers, and other functional areas. Potential negative impacts to existing clinical data are a possibility when these updates are made as part of a database migration. At the end of the day, importing data into an EDC system results in no automatic improvement to data quality and, in some cases, actually hinders our ability to rapidly and efficiently clean the data. By developing standard processes for transforming and cleaning data external to the EDC systems, we increase flexibility in adapting to shifts in incoming data structure or format and mitigate the risk of untoward impacts to the contents of the clinical database by decreasing the prevalence of system updates.

The primary motivation for loading data received from external vendors into the EDC system is to provide a standard method of performing data cleaning activities and cross-checks against the clinical data themselves. To support this, we are developing tools to aggregate that data from a variety of sources and assemble them for data cleaning purposes. Similar to the ways the banking industry uses machine learning to identify ‘normal’ and ‘abnormal’ spending patterns and make real-time decisions to allow or decline purchases, similar algorithms can identify univariate and multivariate clusters of anomalous data for manual review. These continually-learning algorithms will enable a focused review of potentially erroneous data without the development of the traditional EDC infrastructure. This will save time performing data reviews and also identify potential issues which we would normally miss had we relied on the existing EDC model. With the future state resulting in an ever-broadening landscape of data sources and formats, an approach rooted in system agnosticism and sound statistical methodology will ensure we are always able to provide high levels of data quality.

Highlights from TEDMED 2017

Posted by Brook White on Tue, Nov 07, 2017 @ 04:49 PM

Last week I had the opportunity to attend TEDMED 2017 in Palm Springs and want to share some highlights of the experience. This certainly isn’t a comprehensive summary, but rather highlights of some of the themes that were most interesting to me.

Understanding the Brain

In order to make significant progress on mental illness and neurological disorders, we need a better understanding of how the brain works. Several speakers shared progress and innovation in understanding the brain. Geneticist Steven McCarroll discussed Drop-seq, an innovative method for understanding which cell types have which molecules. Using this technology, his team has been looking at what genetic variations in individuals with schizophrenia may tell us about the underlying biology of the disease. Chee Yeun Chang and Yumanity Therapeutics are using yeast to better understand how improper protein folding relates to brain disease. Dan Sobek of Kernel discussed how electrical stimulation may be used to “tune” the brain, both as a method for addressing brain diseases and as a way of increasing performance in normal brains. Guo-Li Ming talked about creating organoids, which are essentially mini-organs created using stem cells. These organoids have been used to look at neural development and the Zika virus. Jill Goldstein discussed sex differences in brain development and how that relates to disparities in the prevalence of some mental illnesses between sexes. Collectively, it is amazing to see some of the progress that is being made on some very difficult diseases.

Delivering Healthcare on the Frontlines

Some of the most touching stories came from those on the frontlines of healthcare.  Dr. Farida shared her stories as the only OB-GYN left in Aleppo and what it meant to put herself and her family in danger to ensure women still had access to care.  Camilla Ventura is a Brazilian ophthalmologist who first connected ocular damage to Zika infection.  Dr. Soka Moses shared stories from the Ebola outbreak in Liberia and the challenges of delivering care with severe shortages of equipment, staff, and supplies.  Agnes Binagwaho returned to her home country of Rwanda following the genocide and told her story of rebuilding her country’s healthcare infrastructure.  Each of these stories was inspiring and a testament to humanity at its best. 

The Opioid Crisis

There were a number of talks as well as a discussion group focused on various aspects of the opioid crisis.  Perspectives were shared from law enforcement personnel, those working on harm reduction programs such as supervised injection sites, and treatment programs for addiction.  One of the most moving talks was given by Chera Kowalsky.  Chera is a librarian in the Kensington area of Philadelphia, an area that has been hit hard by the opioid crisis.  They’ve instituted an innovative program where librarians have been trained to deliver naloxone, and she shared her personal story of using naloxone to help save the life of one of the library’s visitors.  Despite the challenges posed by the crisis, it was uplifting to see the range of solutions being proposed as well as the commitment of those working on them.

The Hive

One of the most interesting aspects of TEDMED was the Hive. Each year, a selection of entrepreneurs and start-ups come to TEDMED to share their innovations in healthcare and medicine. These companies were available throughout the conference to talk with attendees. There was also a special session on day 2 where each entrepreneur had two minutes to share their vision with the audience.

Finally, perhaps the most valuable part of the experience was all of the people I had a chance to meet, each of whom is playing a unique role in the future of healthcare.

Webcharts: A Reusable Tool for Building Online Data Visualizations

Posted by Brook White on Wed, Jan 18, 2017 @ 01:39 PM

 

This is the second in a series of posts introducing open source tools Rho is developing and sharing online. Click here to learn more about Rho's open source effort.

When Rho created a team dedicated to developing novel data visualization tools for clinical research, one of the group's challenges was to figure out how to scale our graphics to every trial, study, and project we work on. In particular, we were interested in providing interactive web-based graphics, which can run in a browser and allow for intuitive, real-time data exploration.

Our solution was to create Webcharts - a web-based charting library built on top of the popular Data-Driven Documents (D3) JavaScript library - to provide a simple way to create reusable, flexible, interactive charts.

Interactive Study Dashboard


Track key project metrics in a single view; built with Webcharts

Webcharts allows users to compose a wide range of chart types, ranging from basic charts (e.g., scatter plots, bar charts, line charts), to intermediate designs (e.g., histograms, linked tables, custom filters), to advanced displays (e.g., project dashboards, lab results trackers, outcomes explorers, and safety timelines). Webcharts' extensible and customizable charting library allows us to quickly produce standard charts while also crafting tailored data visualizations unique to each dataset, phase of study, and project.

This flexibility has allowed us to create hundreds of custom interactive charts, including several that have been featured alongside Rho's published work. The Immunologic Outcome Explorer (shown below) was adapted from Figure 3 in the New England Journal of Medicine article, Randomized Trial of Peanut Consumption in Infants at Risk for Peanut Allergy. The chart was originally created in response to reader correspondence, and was later updated to include follow-up data in conjunction with a second article, Effect of Avoidance on Peanut Allergy after Early Peanut Consumption. The interactive version allows the user to select from 10 outcomes on the y-axis. Selections for sex, ethnicity, study population, skin prick test stratum, and peanut specific IgE at 60 and 72 months of age can be interactively chosen to filter the data and display subgroups of interest. Figure options (e.g., summary lines, box and violin plots) can be selected under the Overlays heading to alter the properties of the figure.

Immunologic Outcome Explorer



Examine participant outcomes for the LEAP study

Because Webcharts is designed for the web, the charts require no specialized software. If you have a web browser (e.g., Firefox, Chrome, Safari, Internet Explorer) and an Internet connection, you can see the charts. Likewise, navigating the charts is intuitive because we use controls familiar to anyone who has used a web browser (radio buttons, drop-down menus, sorting, filtering, mouse interactions). A manuscript describing the technical design of Webcharts was recently published in the Journal of Open Research Software.

The decision to build for general web use was intentional. We were not concerned with creating a proprietary charting system - of which there are many - but an extensible, open, generalizable tool that could be adapted to a variety of needs. For us, that means charts to aid in the conduct of clinical trials, but the tool is not limited to any particular field or industry. We also released Webcharts open source so that other users could contribute to the tools and help us refine them.

Because they are web-based, charts for individual studies and programs are easily implemented in RhoPORTAL, our secure collaboration and information delivery portal which allows us to share the charts with study team members and sponsors while carefully limiting access to sensitive data.

Webcharts is freely available online on Rho's GitHub site. The site contains a wiki that describes the tool, an API, and interactive examples. We invite anyone to download and use Webcharts, give us feedback, and participate in its development.

View "Visualizing Multivariate Data" Video

Jeremy Wildfire, MS, Senior Biostatistician, has over ten years of experience providing statistical support for multicenter clinical trials and mechanistic studies related to asthma, allergy, and immunology.  He is the head of Rho’s Center for Applied Data Visualization, which develops innovative data visualization tools that support all phases of the biomedical research process. Mr. Wildfire also founded Rho’s Open Source Committee, which guides the open source release of dozens of Rho’s graphics tools for monitoring, exploring, and reporting data. 

Ryan Bailey, MA, is a Senior Clinical Researcher at Rho. He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

The Rise of Electronic Clinical Outcome Assessments (eCOAs) in the Age of Patient Centricity

Posted by Brook White on Tue, Dec 06, 2016 @ 10:36 AM

Lauren Neighbours is a Research Scientist at Rho. She leads cross-functional project teams for clinical operations and regulatory submission programs and has over ten years of scientific writing and editing experience. Lauren has served as a project manager and lead author for multiple clinical studies across a range of therapeutic areas that use patient- and clinician-reported outcome assessments, and she worked with a company to develop a patient-reported outcome instrument evaluation package for a novel electronic clinical outcome assessment (eCOA).

Jeff Abolafia is Chief Strategist for Data Standards at Rho and has been involved in clinical research for over thirty years. He is responsible for setting strategic direction and overseeing data management, data standards, data governance, and data exchange for Rho’s federal and commercial divisions. In this role, Jeff is responsible for data collection systems, data management personnel, developing corporate data standards and governance, and developing systems to ensure that data flows efficiently from study start-up to submission or publication. Jeff has also developed systems for managing, organizing, and integrating both data and metadata for submission to the FDA and other regulatory authorities.

With the industry-wide push towards patient-centricity, electronic clinical outcome assessments (eCOAs) have become a more widely used strategy to streamline patient data collection, provide real-time access to data (for review and monitoring), enhance patient engagement, and improve the integrity and accuracy of clinical studies. eCOAs comprise a variety of electronically captured assessments, including patient-reported outcomes (PROs), clinician-reported and health-care professional assessments (ClinROs), observer-reported outcomes (ObsROs), and patient performance outcomes administered by health-care professionals (PerfOs). The main methods for collection of eCOA data include computers, smartphones, and tablets, as well as telephone systems. While many companies have chosen to partner with eCOA vendors to provide these electronic devices for use in a clinical study, other sponsors are exploring “bring your own device” (BYOD) strategies to save costs and start-up time. No matter what strategy is used to implement an eCOA for your clinical study, there are several factors to consider before embarking on this path.

Designing a Study with eCOAs

The decision to incorporate an eCOA into your clinical study design is multifaceted and includes considerations such as the therapeutic area, the type of data being collected, and the study design, but the choice can first be boiled down to two distinct concepts: 1) the need for clinical outcome data from an individual, and 2) the need for this data to be collected electronically. Thus, the benefits and challenges of eCOAs can be aligned with either or both of these concepts.

Regarding the first concept, the need for clinical outcome data should be driven by your study objectives and a cost-benefit analysis of the optimal data collection technique. Using eCOAs to collect data is undoubtedly more patient-centric than an objective measure such as body mass index (BMI), as calculated from weight and height measurements. The BMI calculation does not tell you anything about how the patient feels about their body image, or whether the use of a particular product impacts their feelings of self-worth. If the study objective is to understand the subjective impact of a product on the patient or health-care community, a well-designed eCOA can be a valuable tool to capture this information. These data can tell you specific information about your product and help inform the labeling language that will be included in the package insert of your marketed product. Additionally, FDA has encouraged the use of PROs to capture certain data endpoints, such as pain intensity, from a patient population who can respond themselves (see eCOA Regulatory Considerations below). Of course, it’s important to note that the inherent subjectivity of eCOAs does come with its own disadvantages. The data are subject to more bias than other objective measures, so it’s critical to take steps to reduce bias as much as possible. Examples of ways to reduce bias include single- or double-blind trial designs, wherein the patient or assessor is not aware of the assigned treatment, and building in a control arm (e.g., placebo or active comparator) to compare eCOA outcome data across treatment groups.

Another important concept is the process for identifying and implementing the electronic modality for eCOA data collection. Many studies still use paper methods to collect clinical outcome data, and there are cases when it may make more sense to achieve your study objectives through paper rather than electronic methods (e.g., Phase 1 studies with limited subjects). However, several types of clinical outcome data can be collected more efficiently, at lower cost, and at higher quality with electronic approaches (e.g., diary data or daily pain scores). From an efficiency standpoint, data can be entered directly into a device and integrated with the electronic data management system being used to maintain data collection for the duration of the study. This saves time (and cost) associated with site personnel printing, reviewing, interpreting, and/or transcribing data collected on paper into the electronic data management system, and it also requires less monitoring time to review and remediate data. Additionally, paper data is often “dirty” data, with missing or incorrectly recorded data in the paper version, followed by missing or incorrectly recorded data entered into the data management system. The eCOA allows for an almost instantaneous transfer of data that saves the upfront data entry time but also saves time and cost down the road, as it reduces the effort required to address queries associated with the eCOA data. Aside from efficiencies, eCOA methods allow for more effective patient compliance measures to be implemented in the study. The eCOA device can be configured to require daily or weekly data entry and real-time review by site personnel prior to the next scheduled clinic visit. Additionally, the eCOA system can send out alerts and reminders to patients (to ensure data is entered in a timely manner) and to health-care personnel (to ensure timely review and verification of data and subsequent follow-up with patients as needed). The downsides to electronic data collection methods tend to be associated with the costs and time to implement the system at the beginning of the study. It’s therefore essential to select an appropriate eCOA vendor early who will work with you to design, validate, and implement the clinical assessment specifically for your study.

eCOA Regulatory Considerations

In line with the industry push for patient-focused clinical studies, recent regulatory agency guidance has encouraged the use of eCOAs to evaluate clinical outcome data. The fifth authorization of the Prescription Drug User Fee Act (PDUFA V), which was enacted in 2012 as part of the Food and Drug Administration Safety and Innovation Act (FDASIA), included a commitment by the FDA to more systematically obtain patient input on certain diseases and their treatments. In so doing, PDUFA V supports the use of PRO endpoints to collect data directly from the patients who participate in clinical studies and as a way to actively engage patients in their treatment. The 2009 FDA guidance for industry on Patient-Reported Outcome Measures: Use in Medical Product Development to Support Labeling Claims further underscores this idea by stating “[the use] of a PRO instrument is advised when measuring a concept best known by the patient or best measured from the patient perspective.” The 2013 Guidance for Industry on Electronic Source Data in Clinical Investigations provides the Agency’s recommendations on “the capture, review, and retention of electronic source data” and is to be used in conjunction with the 2007 guidance on Computerized Systems Used in Clinical Investigations for all electronic data and systems used in FDA-regulated clinical studies, including eCOAs. To support these efforts, the FDA has developed an extensive Clinical Outcome Assessment Qualification Program, which is designed to review and assess the design, validity, and reliability of a COA for a particular use in a clinical study. Furthermore, the newly formed Clinical Outcome Assessment Compendium is a collated list of COAs that have been identified for particular uses in clinical studies. The COA Compendium is further evidence of FDA’s commitment to patient-centric product development, and it provides a helpful starting point for companies looking to integrate these assessments into their clinical development programs.

Before choosing an eCOA for your clinical development program, the following regulatory factors should be considered:

  • FDA holds COAs to the same regulatory and scientific standards as other measures used in clinical trials. Thus, it is advisable to refer to the Guidance for Industry on Patient-Reported Outcomes and the available information on the COA Assessment Qualification program and COA Compendium provided by the Agency when implementing eCOAs into your development program. If you plan to divert from currently available regulatory guidance, make sure to have a solid rationale and supporting documentation to substantiate your position.
  • The qualification of an eCOA often requires input from patients and/or health-care professionals to evaluate the effectiveness of the assessment. This input is necessary for the regulatory agency to determine whether the eCOA can accurately measure what it’s supposed to measure (validity) and to demonstrate it can measure the outcome dependably (reliability).
  • Data collected from qualified and validated eCOAs can be used to support product labeling claims. The key is to use an eCOA when it’s appropriate to do so and to make sure the eCOA supports your intended labeling claims because the instrument will be evaluated in relation to the intended use in the targeted patient population.
  • For cases where an instrument was developed for paper-based collection or an instrument is collected using multiple modes, it may be necessary to test for equivalence. This regulatory expectation is often required (especially for primary and secondary endpoints) to ensure that the electronic version of the instrument is still valid and that data collected with mixed modes are comparable.

A CRO Can Help with your eCOA Strategy

CROs partner with sponsor companies to develop and execute their product development strategies.  In some cases, this involves implementing clinical outcome measures into a development program and then facilitating the interactions between the company and regulatory authorities to ensure adequate qualification of the COA prior to marketing application submission.  Whether or not you choose to engage a CRO in your development plan, consider seeking outside consultation from the experts prior to establishing your eCOA strategy to give you and your company the best chance of success.  

CROs Can Help:

  • Determine endpoints where eCOA data is appropriate
  • Determine the cost/benefit of electronic vs paper data capture
  • Determine the best mode of electronic data capture
  • Recommend eCOA vendors when appropriate
  • Perform equivalence analysis
  • Facilitate discussions with regulatory authorities
  • Manage the entire process of eCOA implementation


Embracing Open Source as Good Science

Posted by Brook White on Wed, Nov 30, 2016 @ 09:37 AM

Ryan Bailey, MA is a Senior Clinical Researcher at Rho. He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

Sharing. It's one of the earliest lessons your parents try to teach you - don't hoard, take turns, be generous. Sharing is a great lesson for life. Sharing is also a driving force behind scientific progress and software development. Science and software rely on communal principles of transparency, knowledge exchange, reproducibility, and mutual benefit.

The practice of open sharing, or open sourcing, has advanced both fields considerably.

We also feel strongly that the impetus for open sharing is reflected in Rho's core values - especially team culture, innovation, integrity, and quality. Given our values, and given our role in conducting science and creating software, we've been exploring ways that we can be more active in the so-called "sharing economy" when it comes to our work.

One of the ways we have been fulfilling this goal is to release our statistical and data visualization tools as freely-accessible, open source libraries on GitHub. GitHub is one of the world's largest open source platforms for virtual collaboration and code sharing. GitHub allows users to actively work on their code online, from anywhere, with the opportunity to share and collaborate with other users. As a result, we not only share our code for public use, we also invite feedback, improvements, and expansions of our tools for other uses.

We released our first open source tool - the openFDA Adverse Event Explorer - in June 2015. Now we have 26 team members working on 28 public projects, and that number has been growing rapidly. The libraries and tools we've been sharing have a variety of uses: monitor safety data, track project metrics, visualize data, summarize every data variable for a project, aid with analysis, optimize SAS tools, and explore population data.

Most repositories include examples and wikis that describe the tools and how they can be used. An example of one of these tools, the Population Explorer, is shown below.

Interactive Population Explorer

[Screenshot: the interactive Population Explorer, which provides real-time summary data on a study population and subpopulations of interest. One of over 25 public projects on Rho's GitHub page, available at https://github.com/RhoInc/PopulationExplorer]

Over the next few months, we are going to highlight a few of our different open source tools here on the blog. We invite you to check back/subscribe to learn more about the tools we're making available to the public. We also encourage you to peruse the work for yourself on our GitHub page: https://github.com/RhoInc.

We are excited to be hosting public code and instructional wikis in a format that allows free access and virtual collaboration, and hope that an innovative platform like GitHub will give us a way to share our tools with the world and refine them with community feedback. As science and software increasingly embrace open source code, we are changing the way we develop tools and optimizing the way we do clinical research while staying true to our core purpose and values.

If you have any questions or want to learn more about one of our projects, email us at: graphics@rhoworld.com

Big Data: The New Bacon

Posted by Brook White on Wed, Nov 16, 2016 @ 04:10 PM

David Hall is a bioinformatician with expertise in the development of algorithms, software tools, and data systems for the management and analysis of large biological data sets for biotechnology and biomedical research applications. He joined Rho in June 2014 and currently oversees capabilities development in the areas of bioinformatics and big biomedical data. He holds a B.S. in Computer Science from Wake Forest University and a Ph.D. in Genetics with an emphasis in Computational Biology from the University of Georgia.

Data is the new bacon, as the saying goes. And Big Data is all the rage as people in the business world realize that you can make a lot of money by finding patterns in data that let you target marketing to the most likely buyers. Big Data and a type of artificial intelligence called machine learning are closely connected: machine learning involves teaching a computer to make predictions by training it to find and exploit patterns in Big Data. Whenever you see a computer make predictions, whether of how much a home is worth, the best time to buy an airline ticket, or which movies you will like, Big Data and machine learning are probably behind it.
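
To make the "find and exploit patterns" idea concrete, here is a minimal sketch in Python using scikit-learn. The home-price data is entirely synthetic and the model choice is an illustrative assumption, not a recommendation: the point is only that the model recovers a pattern it was never told about.

```python
# Minimal sketch of learning a pattern from data to make predictions,
# using a synthetic home-price example; features, coefficients, and
# model choice are all illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
square_feet = rng.uniform(800, 4000, n)
bedrooms = rng.integers(1, 6, n)
age_years = rng.uniform(0, 80, n)

# Hidden pattern the model must discover from examples alone
price = (50_000 + 120 * square_feet + 8_000 * bedrooms
         - 500 * age_years + rng.normal(0, 20_000, n))

X = np.column_stack([square_feet, bedrooms, age_years])
X_train, X_test, y_train, y_test = train_test_split(X, price, random_state=0)

model = GradientBoostingRegressor().fit(X_train, y_train)
print(f"R^2 on held-out homes: {model.score(X_test, y_test):.2f}")
```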

However, Big Data and machine learning are nothing new to people in the sciences; we have been collecting big datasets and looking for patterns for decades. Most people in the biomedical sciences date the start of the Big Data era to the early-to-mid 1990s, as various genome sequencing projects ramped up. The Human Genome Project wrapped up in 2003, took more than 10 years, and cost somewhere north of $500 million, and that was to sequence just one genome. A few years later came the 1000 Genomes Project, whose goal was to characterize genetic differences across 1,000 diverse individuals so that, among other things, we could predict who is susceptible to various diseases. The effort was partially successful, but we learned that 1,000 genomes is not enough.

The cost to sequence a human genome has since fallen to around $1,000, and the ambition and scale of big biomedical data have increased proportionately. Researchers in the UK are undertaking a project to sequence the genomes of 100,000 individuals. In the US, the Precision Medicine Initiative will sequence 1 million individuals. Combining these data with detailed clinical and health data will allow machine learning and other techniques to more accurately predict a wider range of disease susceptibilities and responses to treatment. Private companies are undertaking their own big genomic projects and are even sequencing the “microbiomes” of research participants to see what role good and bad microbes play in health. A toy illustration of why this scale matters follows below.
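
As a toy sketch of the scale problem, the example below simulates the classic genomics setting of many more variants than subjects (p >> n) and uses an L1-penalized logistic regression to pick out the few informative variants. The subject and variant counts, the simulated genotypes, and the penalty strength are all invented for illustration, not drawn from any of the projects above.

```python
# Toy sketch of the p >> n problem in genomics: many variant features,
# few subjects, with an L1 penalty as a crude stand-in for picking
# susceptibility variants out of the noise. All data are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_subjects, n_variants = 200, 5000

# 0/1/2 copies of the minor allele at each variant, purely simulated
X = rng.integers(0, 3, size=(n_subjects, n_variants)).astype(float)

true_effects = np.zeros(n_variants)
true_effects[:10] = 1.5  # only 10 variants truly influence disease risk

logits = X @ true_effects
probs = 1 / (1 + np.exp(-(logits - logits.mean())))
y = (rng.random(n_subjects) < probs).astype(int)

# L1 penalty shrinks most coefficients to exactly zero
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
clf.fit(X, y)
print(f"variants kept: {np.count_nonzero(clf.coef_)} of {n_variants}")
```

With only 200 subjects the selected set is noisy; more genomes mean more power to separate the 10 real signals from 4,990 decoys, which is the intuition behind sequencing 100,000 or 1 million individuals.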

Much as Moore’s law predicted the vast increase in computing power, the amount of biomedical data we can collect is on a similar trajectory. Genomic data combined with electronic medical records, data from wearables and mobile apps, and environmental data will one day shroud each individual in a data cloud. In the not-too-distant future, medicine may involve feeding a patient’s data cloud to an artificial intelligence that has learned to make diagnoses and recommendations by looking through millions of other personal data clouds. It seems hard to conceive, but this is the trajectory of precision medicine. Technology has a way of sneaking up on us, and the pace of change keeps getting faster. Note that managing and analyzing all of this data will be very hard; I’ll cover that in a future post.

View "Visualizing Multivariate Data" Video