Rho Participates in Innovative Graduate Student Workshop for the 8th Consecutive Time

Posted by Brook White on Thu, Aug 09, 2018 @ 09:18 AM

Petra LeBeau, ScD (@LebeauPetra), is a Senior Biostatistician and Lead of the Bioinformatics Analytics Team at Rho. She has over 13 years of experience providing statistical support in all areas of clinical trials and observational studies. Her experience includes 3+ years of working with genomic data sets (e.g., transcriptome and metagenome). Her current interest is in machine learning using clinical trial and high-dimensional data.

Agustin Calatroni, MS (@acalatr), is a Principal Statistical Scientist at Rho. His academic background includes a master’s degree in economics from the Université Paris 1 Panthéon-Sorbonne and a master’s degree in statistics from North Carolina State University. In the last 5 years, he has participated in a number of competitions to develop prediction models. He is particularly interested in the use of stacking models to combine several machine learning techniques into one predictive model in order to decrease variance (bagging) and bias (boosting) and improve predictive accuracy.

At Rho, we are proud of our commitment to supporting education and fostering innovative problem-solving for the next generation of scientists, researchers, and statisticians. One way we enjoy promoting innovation is by participating in the annual Industrial Math/Stat Modeling Workshop for Graduate Students (IMSM) hosted by the National Science Foundation-supported Statistical and Applied Mathematical Sciences Institute (SAMSI).  IMSM is a 10-day program to expose graduate students in mathematics, statistics, and computational science to challenging and exciting real-world projects arising in industrial and government laboratory research.  The workshop is held in SAS Hall on the campus of North Carolina State University. This summer marked our 8th consecutive year as an IMSM Problem Presenter.  We were joined by industry leaders from Sandia National Laboratories, MIT Lincoln Laboratory, US Army Corps of Engineers (USACE), US Environmental Protection Agency (EPA), and Savvysherpa.

SAMSI 2018 participants: Agustin Calatroni (first from left), Petra LeBeau (first from right), and Emily Lei Kang (second from right) with students from the SAMSI program.

Rho was represented at the 2018 workshop by investigators Agustin Calatroni and Petra LeBeau, with the assistance of Dr. Emily Lei Kang from the University of Cincinnati. Rho’s problem for this year was “Visualizing and Interpreting Machine Learning Models for Liver Disease Detection.”

Machine learning (ML) interpretability is a hot topic. Many tools have become available over the last couple of years (including a variety of very user-friendly ones) that can build highly accurate ML models, but the constructs that could help us explain and trust these black-box models are still under development.

The success of ML algorithms in medicine and multi-omics studies over the last decade has come as no surprise to ML researchers. This can be largely attributed to their superior predictive accuracy and their ability to work on both large volume and high-dimensional datasets. The key notion behind their performance is self-improvement. That is, these algorithms make predictions and improve them over time by analyzing mistakes made in earlier predictions and avoiding these errors in future predictions. The difficulty with this “predict and learn” paradigm is that these algorithms suffer from diminished interpretability, usually due to the high number of nonlinear interactions within the resulting models. This is often referred to as the “black-box” nature of ML methods.

In cases where interpretability is crucial, for instance in studies of disease pathologies, ad-hoc methods leveraging the strong predictive nature of these algorithms have to be implemented. These methods are used as aids that help ML users answer questions like: ‘why did the algorithm make certain decisions?’, ‘which variables were the most important in predictions?’, and ‘is the model trustworthy?’

The IMSM students were challenged with studying the interpretability of a particular class of ML methods called gradient boosting machines (GBM) applied to predicting whether or not a subject had liver disease. Rho investigators provided a curated data set and pre-built the model for the students. To construct the model, the open-source Indian Liver Patient Dataset was used, which contains records of 583 liver patients from North East India (Dheeru and Karra Taniskidou, 2017). The dataset contains eleven variables: a response variable indicating disease status of the patient (416 with disease, 167 without) and ten clinical predictor variables (Age, Gender, Total Bilirubin, Direct Bilirubin, Alkaline Phosphatase, Alamine Aminotransferase, Aspartate Aminotransferase, Total Proteins, Albumin, Albumin and Globulin Ratio). The data were divided into 467 training and 116 test records for model building.
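For readers who want to see what such a model looks like in code, below is a minimal sketch in Python with scikit-learn of fitting a gradient boosting classifier to this dataset. It is not the investigators' actual pre-built model; the CSV file name and column names are assumptions based on the publicly distributed copy of the data and may need to be adjusted.

```python
# A sketch of fitting a GBM to the Indian Liver Patient Dataset (not the workshop's actual model).
# The CSV path and column names are assumptions based on the public copy of the data.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

df = pd.read_csv("indian_liver_patient.csv")            # hypothetical local copy of the dataset
df["Gender"] = (df["Gender"] == "Female").astype(int)   # encode the one categorical predictor
df = df.dropna()                                         # a handful of records have missing values

X = df.drop(columns=["Dataset"])                         # "Dataset" holds disease status in this file
y = (df["Dataset"] == 1).astype(int)                     # 1 = liver disease, 2 = no liver disease

# Hold out 116 records for testing, mirroring the 467 training / 116 test split described above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=116, random_state=0, stratify=y)

gbm = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05, max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
print("test AUC:", roc_auc_score(y_test, gbm.predict_proba(X_test)[:, 1]))
```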

The scope of work for the students was not to improve or optimize the performance of the GBM model but to explain and visualize the method’s intrinsic latent behavior.

The IMSM students decided to break interpretability down into two areas: global, where the entire dataset is used for interpretation, and local, where a subset of the data is used for deriving an interpretive analysis of the model. The details of these methods will be discussed further in two additional blog posts, and a rough illustration of the distinction is sketched below.
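As a rough illustration of that global/local distinction (these are common model-agnostic tools, not necessarily the techniques the students chose), the sketch below reuses the hypothetical gbm, X_train, X_test, and y_test objects from the previous snippet: permutation importance summarizes the model over the entire held-out set, while perturbing one subject's features one at a time gives a local view of a single prediction.

```python
# Global vs. local interpretation, assuming `gbm`, `X_train`, `X_test`, `y_test` from the previous sketch.
from sklearn.inspection import permutation_importance

# Global: how much does held-out AUC drop when each predictor is shuffled across the whole test set?
imp = permutation_importance(gbm, X_test, y_test, scoring="roc_auc", n_repeats=20, random_state=0)
for name, drop in sorted(zip(X_test.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:30s} mean AUC drop: {drop:+.3f}")

# Local: for one subject, replace each feature with a typical (median) value and watch
# how that subject's predicted probability of liver disease changes.
subject = X_test.iloc[[0]]
baseline = gbm.predict_proba(subject)[0, 1]
for col in X_test.columns:
    perturbed = subject.copy()
    perturbed[col] = X_train[col].median()
    delta = gbm.predict_proba(perturbed)[0, 1] - baseline
    print(f"{col:30s} change in predicted risk: {delta:+.3f}")
```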

Rho is honored to have the opportunity to work with exceptional students and faculty to apply state of the art mathematical and statistical techniques to solve real-world problems and advance our knowledge of human diseases.

You can visit the IMSM Workshop website to learn more about the program, including the problem Rho presented and the students’ solution.


With thanks to the IMSM students Adams Kusi Appiah (1), Sharang Chaudhry (2), Chi Chen (3), Simona Nallon (4), Upeksha Perera (5), Manisha Singh (6), Ruyu Tan (7), and advisor Dr. Emily Lei Kang from the University of Cincinnati.

(1) Department of Biostatistics, University of Nebraska Medical Center; (2) Department of Mathematical Sciences, University of Nevada, Las Vegas; (3) Department of Biostatistics, State University of New York at Buffalo; (4) Department of Statistics, California State University, East Bay; (5) Department of Mathematics and Statistics, Sam Houston State University; (6) Department of Information Science, University of Massachusetts; (7) Department of Applied Mathematics, University of Colorado at Boulder

References:
Dheeru, D. and Karra Taniskidou, E. (2017). UCI machine learning repository.

Culture Fit Interviews: What Are They and Why Do We Do Them?

Posted by Brook White on Tue, Jul 17, 2018 @ 09:53 AM

If you’ve ever interviewed for a job at Rho, you know that one part of our process is a little different from what many other companies do.  Each prospective employee goes through a culture fit interview.  So, what is a culture fit interview? (And equally important, what isn’t a culture fit interview?)  Why do we think they are important?

What it is

The purpose of the culture fit interview is to make sure that each employee we bring on board shares and embodies the same values that we do.  You can read more about our core values here.  These interviews are conducted by a select set of senior leaders who have been with the company for quite a while.  The interviews do not assess skills or technical qualifications, and, generally speaking, won’t be performed by someone who shares the same expertise as you do.

We use the same bank of questions for all culture fit interviews whether you are applying for an entry level position straight out of college or a senior leadership position.  These questions ask for examples or stories from your past experience that assess qualities that we think are important—ability to work as part of a team, to think critically and creatively when solving problems, to communicate effectively, and  to demonstrate integrity.

What it isn’t

We recognize that one of the dangers of this type of screening is that it provides an opportunity to weed out candidates that aren’t “just like us.”  That is not what we are doing.  We value diversity of all kinds—demographic diversity, diversity of experience, and diversity of perspective.  We are not looking to create a homogeneous workplace where everyone thinks and acts the same.  

We are, however, looking to select candidates that can succeed and thrive in our workplace.  From experience, we’ve identified some of the attributes that can make otherwise similar candidates succeed or fail at Rho.  There are people who are highly skilled and who can be highly successful in other corporate climates who won’t do well here.  We owe it to them and the people who would work with them to try and identify them ahead of time.

In addition to the qualities listed above, there are aspects of our environment that can cause otherwise successful professionals to struggle here at Rho.  Rho has a very flat organizational structure that relies heavily on project teams’ ability to execute in a fairly independent way.  That allows a high degree of autonomy but also creates higher expectations for collaboration and communication.

Some people love this—they get a great deal of say in both what work they are doing and how they do it.  They don’t feel micromanaged and they enjoy close collaboration with their teams.  Some people don’t love it—some people prefer more firm direction and less fluid hierarchies.  If you need a lot of structure and close oversight from a supervisor to be successful, this may not be the best environment for you.  If you don’t like being part of a self-directing team and want a manager to negotiate your work priorities and interactions with other groups, this may not be the best environment for you.  There’s nothing wrong with that!  There are plenty of places that operate that way, but Rho is not one of them.

Why we do it

We believe our employees are our greatest asset.  Attracting and retaining the most talented employees is critical to our success, so we put a huge emphasis on selecting the right people to join us and maintaining a culture where talented people want to stay long-term.

A number of years ago, we went through a period of accelerated growth where we hired a large number of people very quickly.  Despite carefully vetting the technical capabilities of these individuals, a high percentage failed to succeed here.  We began to experience a lot of turnover—a new and unpleasant problem.  The culture and work environment began to drift from what had made us successful and what had made many of our long-term employees so excited about working here.  It took a lot of effort to correct that drift and stop the turnover, but we did it—and we don’t want to have to repeat that effort.  

We now view maintaining our culture as another key component to continued success.  Culture fit interviews are one way we do this.  It is a significant investment we are making—it takes a substantial amount of time to conduct these interviews and it means we sometimes can’t grow as quickly as we might otherwise.  It is also a step in the selection process that we take very seriously.  We never skip this step, and we don’t make an offer to a candidate unless the culture fit interviewer is satisfied.

How can you prepare?

Are you interested in working at Rho, but this part of the interview process makes you nervous?  Here’s some advice to help you prepare.  This isn’t supposed to be a “gotcha” process.  It is supposed to help us—and you—evaluate whether this is a working environment where you can be successful.

Start by reviewing our core values.  All of the questions we ask directly relate to these values.  Think about examples and stories from your past experiences that demonstrate your strengths in relationship to each of these values.  Think about some examples that show:

  • Times when you’ve gone above and beyond to help your team or a coworker succeed
  • Clever ways you’ve solved complicated problems
  • Situations where your integrity has been tested
  • Ways you’ve ensured the quality of your work

Don’t worry if you don’t have a lot of work experience to draw from.  We’ve had plenty of early career candidates who have answered our questions with examples from school projects, internships, volunteer experiences, and extracurricular activities.  

Interested in learning more about working at Rho?  Find out more about why Rho is a great place to work or meet some of the interesting people you could be working with.


Age Diversity in Clinical Trials: Addressing the Unmet Need

Posted by Brook White on Tue, Jul 10, 2018 @ 09:23 AM

Ryan Bailey, MA is a Senior Clinical Researcher at Rho.  He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

In a recent New York Times article, Paula Span raises the concern that elderly subjects are frequently omitted from clinical trials.  Consequently, physicians know very little about how a given treatment may affect their older patients.  Is a medication effective for the elderly?  Is it safe?  Without data, how is a physician to know?

Span’s article is timely and aligns well with similar industry trends toward increased patient centricity and trial diversity.  Yet, expanding trials to include older patients poses a challenge for research teams because it brings two tenets of quality research into conflict with one another – representative study populations and patient safety.  

The fundamental assumption of clinical trials research is that we can take data from a relatively small, representative selection of subjects and generalize the results to the larger patient population.  If our sample is too constrained or poorly selected, we hinder the broad applicability of our results.  This is not merely a statistical concern, but an ethical one.  Unfortunately, our industry has long struggled with underrepresentation of important demographic groups, especially women, racial and ethnic minorities, and the elderly. 

At the same time, researchers are keenly concerned about protecting subject safety in trials.  Good Clinical Practice is explicit on this point: 

2.3 The rights, safety, and well-being of the trial subjects are the most important considerations and should prevail over interests of science and society.

Such guidance has engendered broad reluctance to conduct trials in what we deem “vulnerable populations,” namely children, pregnant women, and the elderly.  The risk of doing more harm than good in these patient groups often leads us to play it safe and exclude these populations from trials.  Span, however, provides an astute counterpoint: expecting providers to prescribe a medication to a group of patients who were not included in the original research is equally irresponsible.

No case illuminates the challenging catch-22 we face like the awful thalidomide debacle of the 1950s-60s.  Thalidomide, which was widely regarded as safe, was prescribed off-label for pregnant women to treat morning sickness.  Tragically, the drug was later linked to severe birth defects and banned for expecting mothers.

On one hand, the physicians prescribing thalidomide did so based on limited knowledge of the drug’s safety in pregnant women.  Had a trial been conducted that demonstrated the risk to children, they would clearly have known not to prescribe it to expecting mothers.  Yet, the very risk of such dangerous complications is why such trials are not conducted in vulnerable populations in the first place.  Risks for the elderly are different than for pregnant women, but the principle of protecting sensitive populations is the same.

Span notes that even in studies that don’t have an explicit age cap, many protocols effectively bar elderly participants via strict exclusion criteria that prevent participation by people with disorders, disabilities, limited life expectancy, cognitive impairment, or those in nursing homes.  It must be stated, however, that the reason for such conditions is not to be obstinately exclusive but to reduce confounding variables and minimize risks to vulnerable patients.  In most cases, it would be patently unethical to conduct research on someone with cognitive impairment or in a nursing home where they may be unable to give adequate informed consent, or they may feel coerced to participate in order to continue receiving care.

So, how do we negotiate this apparent impasse?  Span offers a few general suggestions for increased inclusion, including restructuring studies and authorizing the FDA to require and incentivize the inclusion of older adults.  Changing the laws and enforcement can certainly drive change, but what can we do in the near term, short of legislative intervention?  

A few quick suggestions:

  1. Reconsider age limits and avoid an all-or-none mentality to enrolling geriatric subjects.  The mindset that older adults are, as a whole, too vulnerable to enroll is usually an overreach.  In most cases, age limits are imposed as a convenience for the study, not a necessity.  Instead, consider evaluating eligibility on a subject-by-subject basis, which will still allow exclusion of patients deemed too frail, risky, or comorbid for the trial.  
  2. Actively recruit older subjects. The lack of geriatric patients in our trials is a result of many years of both passively and actively excluding them, so effort is needed to reverse these trends.  Beyond recruitment for an individual trial, researchers and providers should seek to educate older adults about clinical research.  Many elderly patients may be research-naïve – unfamiliar with clinical trials and how to participate, or unaware of available trials in their area.  
  3. Learn from other efforts to recruit marginalized populations.  As we’ve shared previously, improving trial diversity starts with an effort to thoroughly understand your patient population and their needs, and reduce obstacles to their participation.  
  4. Engage patient advocacy groups that focus on elderly patients.  Ask how trials can be better designed to meet their needs and include them.  Partner with these groups to aid in information sharing and outreach.
  5. Learn what is already expected from agencies like the FDA and NIH when it comes to inclusivity. 
    1. Span alludes to a recent NIH policy revision (stemming from the 21st Century Cures Act) that will require new NIH grantees to have a plan for including children and older adults in their research.
    2. In 2012, the Food and Drug Administration Safety and Innovation Act (FDASIA) required the FDA to create an action plan to improve data quality and completeness for demographic subgroups (sex, age, race, and ethnicity) in applications for medical products. 
  6. Design studies to examine effectiveness (demonstrating that a treatment produces desired results in ‘real world’ circumstances) not just efficacy (demonstrating that a treatment produces desired results in ideal conditions).  This is probably the most labor intensive because it requires additional investment beyond the typical Phase III randomized controlled clinical trial.  Yet, it is becoming more common to explore effectiveness through pragmatic trials, Phase IV studies, and post-market surveillance.   

Site Investigator vs. Sponsor SAE Causality: Are they different?

Posted by Brook White on Thu, Jun 21, 2018 @ 11:25 AM

Heather Kopetskie, MS, is a Senior Biostatistician at Rho. She has over 10 years of experience in statistical planning, analysis, and reporting for Phase 1, 2 and 3 clinical trials and observational studies. Her research experience includes over 8 years focusing on solid organ and cell transplantation through work on the Immune Tolerance Network (ITN) and Clinical Trials in Organ Transplantation (CTOT) project.  In addition, Heather serves as Rho’s biostatistics operational service leader, an internal expert sharing biostatistical industry trends, best practices, processes and training.

Hyunsook Chin, MPH, is a Senior Biostatistician at Rho. She has over 10 years of experience in statistical design, analysis, and reporting for clinical trials and observational studies. Her therapeutic area experience includes: autoimmune diseases, oncology, nephrology, cardiovascular diseases, and ophthalmology. Specifically, her research experience has focused on solid organ transplantation for over 8 years on the CTOT projects. She also has several publications from research in nephrology and solid organ transplantation projects. She is currently working on several publications.

An Adverse Event (AE) is any unfavorable or unintended sign, symptom, or disease temporally associated with a study procedure or use of a drug, and does not imply any judgment about causality. An AE is considered Serious if in the view of either the investigator or sponsor, the outcome is any of the following: 

  • Death
  • Life-threatening event
  • Hospitalization (initial or prolonged)
  • Disability or permanent damage
  • Congenital anomaly/birth defect
  • Required intervention to prevent impairment or damage
  • Other important medical event

When a serious adverse event (SAE) occurs, the site investigator immediately reports the event to the sponsor. Both the site investigator and the sponsor assess causality for every SAE. Causality is whether there is a reasonable possibility that the drug caused the event. The FDA believes the sponsor can better assess causality because they have access to SAE reports from multiple sites and studies along with a familiarity with the drug’s mechanism of action. When expedited SAE reports are delivered to the FDA, the sponsor’s causality assessment is reported instead of the site investigator’s.

Causality assessments may differ between the site investigator and sponsor. It is important to understand the difference in assessments to ensure proper reporting and conduct throughout a trial. For example, if stopping rules rely on causality, should the sponsor’s or site investigator’s causality assessment be used? Which causality assessment should be used for DSMB and CSR reports? To better understand how to handle these situations, it’s important to understand the differences.

We reviewed over 1400 SAEs from 76 studies over the last 6 years. Each SAE had causality assessed against an average of 3.8 study interventions (e.g., study medication 1, study procedure 1, etc.) for a total of over 5300 causality assessments. Related included assessments of definitely, possibly, and probably related, while Not Related included unlikely related and unrelated. At the SAE level, an SAE was considered related if at least one study intervention was determined to be related, as illustrated in the sketch below.
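The roll-up from intervention-level assessments to an SAE-level Related/Not Related call can be expressed in a few lines. The sketch below uses made-up example data purely to illustrate the rule; it is not the analysis dataset described above.

```python
# Illustration of the SAE-level roll-up rule with made-up data (not the actual study data):
# an SAE is "Related" if any of its intervention-level assessments is definitely,
# probably, or possibly related.
import pandas as pd

assessments = pd.DataFrame({
    "sae_id":       [101, 101, 101, 102, 102, 103],
    "intervention": ["study drug", "study procedure", "background med",
                     "study drug", "study procedure", "study drug"],
    "causality":    ["unrelated", "possibly related", "unrelated",
                     "unlikely related", "unrelated", "probably related"],
})

related_terms = {"definitely related", "probably related", "possibly related"}
assessments["related"] = assessments["causality"].isin(related_terms)

# One row per SAE: Related if at least one intervention-level assessment was related
sae_level = assessments.groupby("sae_id")["related"].any()
print(sae_level.value_counts(normalize=True))   # proportion of SAEs Related vs. Not Related
```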

Table 1: Causality Comparisons

                         Site Investigator   Sponsor
  Study Interventions
    Not Related                 89%            81%
    Related                     11%            19%
  SAEs
    Not Related                 78%            67%
    Related                     22%            33%

Sponsors deemed more SAEs to be related to study interventions than site investigators did. This relationship is maintained when looking at the breakdown of SAEs by severity, with the sponsor determining a larger percentage of SAEs to be related to the study intervention. This also held for the majority of system organ classes reviewed.

What actions can we take with this information when designing a trial?

  1. If any study stopping rules rely on causality, the study team may want to consider using the sponsor causality to ensure all possible cases are captured. The biggest hurdle with this approach would be acquiring the sponsor causality in real time, as it is not captured in the clinical database.
  2. For DSMB reports, if only the site investigator causality is reported, the relatedness of SAEs may be under-reported relative to the information the FDA receives. Given that the sponsor more often assesses SAEs as related, this is important information that should be provided to the DSMB members when evaluating the safety of the study.
  3. For clinical study reports, both serious and non-serious adverse events are reported. The study team should determine what information they want to include. The sponsor safety assessments are not included in the clinical database, but they are what the FDA receives during the conduct of the trial. Additionally, because the sponsor more often assesses SAEs as related, the report may under-report related SAEs if only the site investigator assessment is used.
Note that these findings are based on studies Rho has supported and may not be consistent with findings from other trials/sponsors.  Additionally, in some studies the site investigator may have changed the relationship of the SAE based on discussions with the sponsor and we do not have any information to quantify how often this occurs.

What We Learned at PhUSE US Connect

Posted by Brook White on Tue, Jun 12, 2018 @ 09:40 AM

Ryan Bailey, MA is a Senior Clinical Researcher at Rho.  He has over 10 years of experience conducting multicenter asthma research studies, including the Inner City Asthma Consortium (ICAC) and the Community Healthcare for Asthma Management and Prevention of Symptoms (CHAMPS) project. Ryan also coordinates Rho’s Center for Applied Data Visualization, which develops novel data visualizations and statistical graphics for use in clinical trials.

Last week, PhUSE hosted its first ever US Connect conference in Raleigh, NC. Founded in Europe in 2004, the independent, non-profit Pharmaceutical Users Software Exchange has been a rapidly growing presence and influence in the field of clinical data science. While PhUSE routinely holds smaller events in the US, including their popular Computational Science Symposia and Single Day Events, this was the first time they had held a large multi-day conference with multiple work streams outside of Europe. The three-day event attracted over 580 data scientists, biostatisticians, statistical programmers, and IT professionals from across the US and around the world to focus on the theme of "Transformative Current and Emerging Best Practices."

After three days immersed in data science, we wanted to provide a round-up of some of the main themes of the conference and trends for our industry.

Emerging Technologies are already Redefining our Industry

It can be hard to distinguish hype from reality when it comes to emerging technologies like big data, artificial intelligence, machine learning, and blockchain.  Those buzzwords made their way into many presentations throughout the conference, but there was more substance than I expected.  It is clear that many players in our industry (FDA included) are actively exploring ways to scale up their capabilities to wrangle massive data sets, rely on machines to automate long-standing data processing, formatting, and cleaning processes, and use distributed database technologies like blockchain to keep data secure, private, and personalized.  These technologies are not just reshaping other sectors like finance, retail, and transportation; they are well on their way to disrupting and radically changing aspects of clinical research.

The FDA is Leading the Way

Our industry has gotten a reputation for being slow to evolve, and we sometimes use the FDA as our scapegoat. Regulations take a long time to develop, formalize, and finalize, and we tend to be reluctant to move faster than regulations. However, for those that think the FDA is lagging behind in technological innovation and data science, US Connect was an eye opener. With 30 delegates at the conference and 16 presentations, the agency had a strong and highly visible presence.

Moreover, the presentations by the FDA were often the most innovative and forward-thinking. Agency presenters provided insight into how the offices of Computational Science and Biomedical Informatics are applying data science to aid in reviewing submissions for data integrity and quality, detecting data and analysis errors, and setting thresholds for technical rejection of study data. In one presentation, the FDA demonstrated its Real-time Application for Portable Interactive Devices (RAPID) to show how the agency is able to track key safety and outcomes data in real time amid the often chaotic and frantic environment of a viral outbreak. RAPID is an impressive feat of technical engineering, managing to acquire massive amounts of unstructured symptom data from multiple device types in real time, process them in the cloud, and perform powerful analytics for "rapid" decision making. It is the type of ambitious technically advanced project you expect to see coming out of Silicon Valley, not Silver Spring, MD.

It was clear that the FDA is striving to be at the forefront of bioinformatics and data science, and in turn, they are raising expectations for everyone else in the industry.

The Future of Development is "Multi-lingual"  

A common theme through all the tracks is the need to evolve beyond narrowly focused specialization in our jobs. Whereas 10-15 years ago, developing deep expertise in one functional area or one tool was a good way to distinguish yourself as a leader and bring key value to your organization, a similar approach may hinder your career in the evolving clinical research space. Instead, many presenters advocated that the data scientist of the future specialize in a few different tools and have broad domain knowledge. As keynote speaker Ian Khan put it, we need to find a way to be both specialists and generalists at the same time. Nowhere was this more prevalent than in discussions around which programming languages will dominate our industry in the years to come.

While SAS remains the go-to tool for stats programming and biostatistics, the general consensus is that knowing SAS alone will not be adequate in years to come. The prevailing languages getting the most attention for data science are R and Python. While we heard plenty of debate about which one will emerge as the more prominent, it was agreed that the ideal scenario would be to know at least one, R or Python, in addition to SAS.

We Need to Break Down Silos and Improve our Teams

On a similar note, many presenters advocated for rethinking our traditional siloed approach to functional teams. As one vice president of a major Pharma company put it, "we have too much separation in our work - the knowledge is here, but there's no crosstalk." Rather than passing deliverables between distinct departments with minimal communication, clinical data science requires taking a collaborative multi-functional approach. The problems we face can no longer be parsed out and solved in isolation. As a multi-discipline field, data science necessarily requires getting diverse stakeholders in the room and working on problems together.

As for how to achieve this collaboration, Dr. Michael Rappa delivered an excellent plenary session on how to operate highly productive data science teams based on his experience directing the Institute for Advanced Analytics at North Carolina State University. His advice bucks the traditional notion that you solve a problem by selecting the most experienced subject matter experts and putting them in a room together. Instead, he demonstrated how artfully crafted teams that value leadership skills and motivation over expertise alone can achieve incredibly sophisticated and innovative output.

Change Management is an Essential Need

Finally, multiple sessions addressed the growing need for change management skills. As the aforementioned emerging technologies force us to acquire new knowledge and skills and adapt to a changing landscape, employees will need help to deftly navigate change. When asked what skills are most important for managers to develop, a VP from a large drug manufacturer put it succinctly, "our leaders need to get really good at change management."

In summary, PhUSE US Connect is helping our industry look to the future, especially when it comes to clinical data science, but the future may be closer than we think. Data science is not merely an analytical discipline to be incorporated into our existing work; it is going to fundamentally alter how we operate and what we achieve in our trials. The question for industry is whether we're paying attention and pushing ourselves to evolve in step to meet those new demands.


Cellular Therapy Studies: 7 Common Challenges

Posted by Brook White on Tue, May 15, 2018 @ 09:34 AM

Heather Kopetskie, MS, is a Senior Biostatistician at Rho. She has over 10 years of experience in statistical planning, analysis, and reporting for Phase 1, 2 and 3 clinical trials and observational studies. Her research experience includes over 8 years focusing on solid organ and cell transplantation through work on the Immune Tolerance Network (ITN) and Clinical Trials in Organ Transplantation (CTOT) project.  In addition, Heather serves as Rho’s biostatistics operational service leader, an internal expert sharing biostatistical industry trends, best practices, processes and training.

Kristen Mason, MS, is a Senior Biostatistician at Rho. She has over 4 years of experience providing statistical support for studies conducted under the Immune Tolerance Network (ITN) and Clinical Trials in Organ Transplantation (CTOT). She has a particular interest in data visualization, especially creating visualizations within SAS using the graph template language (GTL).

Cellular therapy is a form of treatment where patients are injected with cellular material. Different types of cells can be utilized, such as stem cells (e.g., mesenchymal stem cells) and cells from the immune system (e.g., regulatory T cells (Tregs)), from either the patient or a donor. In many cases, these cells have been reprogrammed to carry out a new function that will aid in the treatment of a disease or condition. Cellular therapy has become increasingly popular, largely because cells have the ability to carry out many complex functions that drugs cannot. When successful, cellular therapy can result in a more targeted and thus more effective treatment. More information on cellular therapy can be found here.

Rho is conducting several studies using cellular therapy to treat diseases such as systemic lupus erythematosus and pemphigus vulgaris and for various applications within organ transplantation.

Cellular therapy trials offer their own unique set of challenges. The following list presents some of these challenges encountered here at Rho.

  1. Cellular therapies require highly specialized laboratories to manufacture the investigational product, especially if the cells are being manipulated. Centralized manufacturers are commonly utilized, requiring logistical considerations if the trial has multiple study sites. These logistics may include proper packaging, temperature storage, shipping days, etc., all of which must be considered when shipping the product.
  2. It is critical to plan for and establish clear communication between the manufacturing lab, the study site, and the study team when working under time constraints. One common consideration is to ensure extracted cells will not arrive at the manufacturer on a Saturday or Sunday when lab personnel may not be available to immediately process cells. 
  3. Protocols usually require a minimum number of cells be available for infusion into the subject. The protocol must detail what steps to take when not enough viable cellular product is produced. Some questions to consider include: 
    • Is it possible to recollect cells for a second attempt? If so, does it work with the timing of the trial?
    • Are there leftover cells from the first attempt? 
  4. Potent drugs are sometimes paired with administration of the cellular product.  It is crucial to avoid administering these drugs unless a viable cellular product has been produced. Checks should be in place to ensure product is available before administering additional study drugs.
  5. Guidance exists limiting the amount of blood that can be collected over an 8-week period from a single subject. If the cellular product is manufactured from a blood donation, the amount of blood from any and all blood draws around the same time should be taken into consideration. If the blood donation occurs close to screening, when blood is often drawn for various baseline labs, pay close attention to the total amount, as exceeding the established limits can be easy.
  6. The subject accrual target for a study should be clearly outlined in the protocol. Is it X number of subjects that receive a minimum number of cells, X number of subjects that receive any cells, etc.?
  7. Cellular product may not be administered until several months into the study. Subjects may be evaluated for eligibility several times while waiting for the infusion, creating multiple time points at which a subject may become ineligible. This, along with the potential of insufficient cellular product, can result in an unexpected length of time to administer cellular product to the target number of subjects. As such, this is an important factor when determining the duration and budget for a cellular therapy study.
All in all, there are numerous opportunities for learning when using cellular therapies to treat disease. In many disease areas, this concept is still novel and study teams are facing new challenges with each study. Understanding these challenges early can help in the development of a robust protocol that addresses these same challenges before they ever become an issue.

“This drug might be harmful!  Why was it approved?”  What the news reports fail to tell us.

Posted by Brook White on Thu, Apr 19, 2018 @ 08:39 AM

Jack Modell, MD, Vice President and Senior Medical Officer, is a board-certified psychiatrist with 35 years of experience in clinical research and patient care, including 15 years’ experience in clinical drug development. He has led successful development programs and is a key opinion leader in the neurosciences, has served on numerous advisory boards, and is nationally known for leading the first successful development of preventative pharmacotherapy for the depressive episodes of seasonal affective disorder.

David Shoemaker, PhD, Senior Vice President R&D, has extensive experience in the preparation and filing of all types of regulatory submissions including primary responsibility for four BLAs and three NDAs.  He has managed or contributed to more than two dozen NDAs, BLAs, and MAAs and has moderated dozens of regulatory authority meetings.

Once again, we see news of an approved medication* being linked to bad outcomes, even deaths, and the news media implores us to ask:  

“How could this happen?”
“Why was this drug approved?”
“Why didn’t the pharmaceutical company know this or tell us about it?”
“What’s wrong with the FDA that they didn’t catch this?”
“Why would a drug be developed and approved if it weren’t completely safe?”

And on the surface, these questions might seem reasonable.  Nobody, including the drug companies and FDA, wants a drug on the market that is unsafe, or for that matter, wants any patient not to fare well on it.  And to be very clear at the outset, in pharmaceutical development, there is no room for carelessness, dishonesty, intentionally failing to study or report suspected safety signals, exaggerating drug benefits, or putting profits above patients – and while there have been some very disturbing examples of these happening, none of this should ever be tolerated.  But we do not believe that the majority of reported safety concerns with medications are caused by any intentional misconduct or by regulators failing to do their jobs, or that a fair and balanced portrayal of a product’s risk-benefit is likely to come from media reports or public opinion alone.

While we are not in a position to speculate or comment upon the product mentioned in this article specifically, in most cases we know of where the media have reported on bad outcomes for patients taking a particular medication, the reported situations, while often true, have rarely been shown to have been the actual result of taking the medication; rather, they occurred in association with taking the medication.  There is, of course, a huge difference between these two, with the latter telling us little or nothing about whether the medication itself had anything to do with the bad outcome.  Nonetheless, the news reports, which include catchy headlines that disparage the medication (and manufacturer), almost always occur years in advance of any conclusive data on whether the medication actually causes the alleged problems; and in many cases, the carefully controlled studies that are required to determine whether the observed problems have anything directly to do with the medication eventually show that the medication either does not cause the initially reported outcomes, or might do so only very rarely.  Yet the damage has been done by the initial headlines:  patients who are benefiting from the medication stop it and get into trouble because their underlying illness becomes less well controlled, and others are afraid to start it, thus denying themselves potentially helpful – and sometimes lifesaving – therapy.  And ironically, when the carefully controlled and adequately powered studies finally do show that the medication was not, after all, causing the bad outcomes, these findings, if reported at all, rarely make the headlines. 

Medications do, of course, have real risks, some serious, and some of which might take many years to become manifest.  But why take any risk?  Who wants to take a medication that could be potentially harmful?  If the pharmaceutical companies have safety as their first priority, why would they market something that they know carries risk or for which they have not yet fully assessed all possible risks?  There’s an interesting parallel here that comes to mind.  I recently heard an airline industry representative say that the airlines’ first priority is passenger safety.  While the U.S. major airlines have had, for decades, a truly outstanding safety record, could safety really be their first priority?  If passenger safety were indeed more important than anything else, no plane would ever leave the gate; no passengers would ever board.  No boarding, no leaving, and no one could ever possibly get hurt.  And in this scenario, no one ever flies anywhere, either.  The airlines’ first priority has to be efficient transportation, though undoubtedly followed by safety as a very close second.  Similarly, the pharmaceutical industry cannot put guaranteed safety above all else, or no medications would ever be marketed.  No medications and no one could ever get hurt.  And in this scenario, no one ever gets treated for illnesses that, without medications, often harm or kill.  In short, where we want benefit, we must accept risks, including those that may be unforeseeable, and balance these against the potential benefits.

OK then:  so bad outcomes might happen anyway and are not necessarily caused by medication, worse outcomes can happen without the medications, and we must accept some risk.  But isn’t it negligent of a pharmaceutical company to market a medication before they actually know all the risks, including the serious ones that might only happen rarely?  Well, on average, a new medicine costs nearly three billion dollars and takes well over a decade to develop, and it is tested on up to a few thousand subjects.  But if a serious adverse event did not occur in the 3000 subjects who participated in the clinical trials to develop the medicine, does this show us that the medicine is necessarily safe and unlikely to ever harm anybody?  Unfortunately, it does not.  As can be seen by the statistical rule of three**, this can only teach us that, with 95% confidence, the true rate of such an event is between zero and 1/1000.  And while it may be comforting that a serious event is highly unlikely to occur in more than 1/1000 people who take the medication, if the true rate of this event is, let’s say, even 1/2000, there is still greater than a 90% chance that a serious adverse event will occur in at least one person among the first 5000 patients who take the medication!  Such is the nature of very low frequency events over thousands of possible ways for them to become manifest.

So why not study the new medication in 10,000 subjects before approval, so that we can more effectively rule out the chances of even rarer serious events?  There is the issue of cost, yes; but more importantly, we would now be extending the time to approval for a new medicine by several additional years, during which time far more people are likely to suffer by not having a new and needed treatment than might ever be prevented from harm by detecting a few more very rare events.  There is a good argument to be made that hurting more people by delaying the availability of a generally safe medication to treat an unmet medical need in an effort to try to ensure what might not even be possible – that all potential safety risks are known before marketing – is actually the more negligent course of action.  It is partly on this basis that the FDA has mechanisms in place (among them, breakthrough therapy, accelerated approval, and priority review) to speed the availability of medications that treat serious diseases, especially when the medications are the first available treatment or if the medication has advantages over existing treatments.  When these designations allow for a medication to be marketed with a smaller number of subjects or clinical endpoints than would be required for medications receiving standard regulatory review, it is possible that some of these medications might have more unknown risks than had they been studied in thousands of patients.  In the end, however, whatever the risks – both known and unknown – if we as a society cannot accept them, then we need to stop the development and prescribing of medicines altogether.  

*Neither of the authors nor Rho was involved in the development of the referenced product.  This post is not a comment on this particular product or the referenced report, but rather a response to much of the media coverage of marketed drugs and biologics more broadly.

**In statistical analysis, the rule of three states that if a certain event did not occur in a sample with n subjects, the interval from 0 to 3/n is a 95% confidence interval for the rate of occurrences in the population.  https://en.wikipedia.org/wiki/Rule_of_three_(statistics)  

The probability that no event with this frequency will occur in 5000 people is (1 - 0.0005)^5000, or about 0.082.
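For readers who want to verify the arithmetic, here is a quick numerical check (our addition, not part of the original footnotes) of both the rule-of-three bound and the 5000-patient probability:

```python
# Numerical check of the two calculations above.
n_trial = 3000
print("rule-of-three 95% upper bound on event rate:", 3 / n_trial)   # 0.001, i.e., 1/1000

true_rate = 1 / 2000
n_patients = 5000
p_no_events = (1 - true_rate) ** n_patients
print("P(no events in 5000 patients):", round(p_no_events, 3))        # about 0.082
print("P(at least one event):", round(1 - p_no_events, 3))            # greater than 0.90
```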


505(b)(2) vs ANDA: How Complex Drugs Fit In

Posted by Brook White on Tue, Feb 20, 2018 @ 08:42 AM

Choosing the Appropriate New Drug Application and Corresponding Abbreviated Development Pathway

Samantha Hoopes, PhD, RAC is an Integrated Product Development Associate at Rho involved in clinical operations management and regulatory submissions.  Samantha has over 10 years of scientific writing and editing experience and has served as lead author on clinical and regulatory documents for product development programs for a variety of therapeutic areas.

Sheila Bello-Irizarry, PhD, RAC, Research Scientist, is actively involved in protocol development, orphan-drug designation applications, and regulatory submissions including INDs and NDAs/BLAs. Her therapeutic area experience includes infectious diseases, immunology, vaccines, lung biology, musculoskeletal, and antibody-mediated therapy.  She contributed to developing vaccine candidates against malaria and MRSA infections and to the understanding of inflammatory processes during lung fungal infections.

With the confirmation of a new Food and Drug Administration (FDA) Commissioner, Scott Gottlieb, M.D., in 2017, we have seen some changes in the regulatory environment with a new Drug Competition Action Plan and FDA draft guidances focused on introducing more competition into the drug market with the goal of increasing access to drugs that consumers need.  These guidance documents are meant to provide information on abbreviated approval pathways and provide clarity on the regulatory pathway for complex generic drugs in order to speed approval, allowing for more competition in the marketplace, which may impact pricing.

While it is important to understand how to navigate the complex generic drug approval pathway, it is first necessary to determine whether your drug product should be submitted as an abbreviated new drug application (ANDA) for approval as a generic or if it requires submission of a 505(b)(2) new drug application.  This particular issue is addressed in a new draft guidance, published 13 October 2017, “Determining Whether to Submit an ANDA or a 505(b)(2) Application.”  The draft guidance defines an ANDA as an application for a duplicate (same with respect to their active ingredient[s], dosage form, route of administration, strength, previously approved conditions of use, and labeling [with certain exceptions]) of a previously approved drug product that relies on FDA’s findings that the previously approved drug product, the reference listed drug, is safe and effective.  An ANDA may not be submitted if studies are necessary to establish the safety and effectiveness of the proposed product.  A 505(b)(2) application contains full reports of safety and effectiveness, but one or more of the investigations relied upon by the applicant for approval were not conducted by or for the applicant and for which the applicant has not obtained a right of reference or use from the person by or for whom the investigations were conducted  [Guidance for Industry:  Applications Covered by Section 505(b)(2)]. 

The draft guidance outlines regulatory considerations for ANDA and 505(b)(2) applications as described below.  FDA will generally refuse to file a 505(b)(2) application for a drug that is a duplicate of a listed drug and eligible for approval via an ANDA.  An applicant may submit a suitability petition (21 CFR 314.93) to the FDA requesting permission to submit an ANDA, known as a petitioned ANDA, for a generic drug product that differs from the reference listed drug (RLD) in its dosage form, route of administration, strength, or active ingredient (in a product with more than one active ingredient).  The FDA will not approve a suitability petition if it determines that the safety and effectiveness of the proposed changes from the RLD cannot be adequately evaluated without data from investigations that exceed what may be required for an ANDA, or if the petition is for a drug product for which a pharmaceutical equivalent has been approved in an NDA.  The FDA will not accept an ANDA for filing for a product that differs from the RLD until the suitability petition has been approved.

In some circumstances, an applicant may seek approval for multiple drug products containing the same active ingredient(s), known as bundling, when some of the products would qualify for approval under the 505(b)(2) pathway and some would qualify for approval under the ANDA pathway.  The FDA allows the applicant to submit one 505(b)(2) application for all such multiple drug products that are permitted to be bundled.  An example referenced in the draft guidance where bundling into one 505(b)(2) submission would be allowed is an applicant seeking approval of multiple strengths of a product, only some of which are listed in the Orange Book as reference listed drugs.

Several draft guidance documents have recently focused on complex generic drug products.  A draft guidance titled “Formal Meetings Between FDA and ANDA Applicants of Complex Products Under GDUFA” was issued in October 2017 to provide ANDA applicants information on preparing and submitting meeting requests and meeting materials for complex generic drug products.  Complex products are defined as 1) complex active ingredients, complex formulations, complex routes of delivery, or complex dosage forms, 2) complex drug-device combination products, or 3) other products where complexity or uncertainty concerning the approval pathway or possible alternative approach would benefit from early scientific engagement.  The guidance describes 3 types of meetings for complex products that may be submitted as an ANDA:  product development meetings, pre-submission meetings, and mid-review-cycle meetings. The draft guidance includes details on how it is determined whether meetings are granted and the review timeframe goals for FY2018 through FY2022.

A draft guidance, “ANDAs for Certain Highly Purified Synthetic Peptide Drug Products That Refer to Listed Drugs of rDNA Origin,” also issued in October 2017, focuses on helping applicants determine whether certain complex products (synthetic peptides) that refer to a previously approved peptide drug product of recombinant deoxyribonucleic acid (rDNA) origin should be submitted as an ANDA.  In the past, analytical methods have not been capable of adequately characterizing peptide products for submission in an ANDA; however, with advances in scientific technology, FDA now considers it possible to demonstrate that the active ingredient in a proposed generic synthetic peptide is the same as the active ingredient in the reference listed drug of rDNA origin.  While this guidance pertains to some specific synthetic peptides, Dr. Gottlieb addressed (FDA Voice, 02 October 2017) this general issue stating that “a further barrier to generic competition for certain complex drug products is the lack of established methods for showing the sameness of the active ingredient of a proposed generic drug to a brand-name drug for certain complex drugs” and “over the next year, FDA’s generic drug regulatory science program will work to identify gaps in the science and develop more tools, methods, and efficient alternatives to clinical endpoint testing, where feasible.”  These efforts are meant to encourage and facilitate complex generic drug development.  Additional guidance documents will continue to be released regarding specific types of complex drug products.

Additionally, a draft guidance released on 03 January 2018, “Good ANDA Submission Practices,” addresses common, recurring deficiencies seen in ANDAs that may lead to delays in approval.  Common deficiencies include not correctly addressing patent and exclusivity information for the RLD, not providing adequate and properly prepared clinical summary data tables for bioequivalence studies, and not submitting draft container and carton labels with an accurate representation of the formatting that will be used for the final printed labels.  In a statement from Dr. Gottlieb, “it currently takes on average about 4 cycles for an ANDA to reach approval – not necessarily because the product will not meet our standards, but sometimes because the application is missing the information necessary to demonstrate that it does” (Press Release, 03 January 2018).  This guidance, as well as a new manual of policies and procedures (MAPP:  Good Abbreviated New Drug Application Assessment Procedures), aims to help reduce the number of review cycles ANDAs undergo prior to approval.

These recently released draft guidance documents provide clarity on abbreviated approval pathways and highlight priorities of the FDA to increase competition in the marketplace with a focus on speeding generic approvals, including complex generic drug products.  


The Future, Today: Artificial Intelligence Applications for Clinical Research

Posted by Brook White on Tue, Feb 13, 2018 @ 08:37 AM

Petra LeBeau, ScD, is a Senior Biostatistician and Lead of the Bioinformatics Analytics Team at Rho. She has over 13 years of experience in providing statistical support for clinical trials and observational studies, from study design to reporting. Her experience includes 3+ years of working with genomic data sets (e.g. transcriptome and metagenome). Her current interest is in machine learning using clinical trial and high-dimensional data.

Agustin Calatroni, MS, is a Principal Statistical Scientist at Rho. His academic background includes a master’s degree in economics from the Université Paris 1 Panthéon-Sorbonne and a master’s degree in statistics from North Carolina State University. In the last 5 years, he has participated in a number of competitions to develop prediction models. He is particularly interested in the use of stacking models to combine several machine learning techniques into one predictive model in order to decrease the variance (bagging) and bias (boosting) and improve predictive accuracy.

Derek Lawrence, Senior Clinical Data Manager, has 9 years of data management and analysis experience in the health care/pharmaceutical industry. Derek serves as Rho’s Operational Service Leader in Clinical Data Management, an internal expert responsible for disseminating the application of new technology, best practices, and processes.

Artificial Intelligence (AI) may seem like rocket science, but most people use it every day without realizing it. Ride-sharing apps, airplane ticket purchasing aggregators, ATMs, recommendations for your next eBook or superstore purchase, and the photo library within your smartphone all use machine learning algorithms to improve the user experience.

Machine learning (ML) algorithms make predictions and, in turn, learn from their own predictions, resulting in improved performance over time. ML has slowly been making its way into health research and the healthcare system, due in part to an exponential growth in data stemming from new technologies like genomics. Rho supports many studies with large datasets, including the microbiome, proteome, metabolome, and transcriptome. The rapid growth of health-related data will continue, along with the development of new methodologies like systems biology (i.e., the computational and mathematical modeling of interactions within biological systems) that leverage these data. ML will continue to be a key enabler in these areas. Ever-increasing computational power, improvements in data storage devices, and falling computational costs have given clinical trial centers the opportunity to apply ML techniques to large and complex data in ways that would not have been possible a decade ago.

In general, ML is divided into two main types of techniques: (1) supervised learning, in which a model is trained on known input and output data in order to predict future outputs, and (2) unsupervised learning, in which, instead of predicting outputs, the system tries to find naturally occurring patterns or groups within the data. Each type of ML has a large number of existing algorithms. Example supervised learning algorithms include random forest, boosted trees, neural networks, and deep neural networks, to name a few. Similarly, unsupervised learning has a plethora of algorithms.
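
To make the distinction concrete, here is a minimal sketch in Python contrasting the two approaches on a synthetic dataset; the data, features, and parameters are illustrative only and not drawn from any Rho study.

```python
# Minimal sketch: supervised vs. unsupervised learning with scikit-learn.
# The synthetic dataset stands in for hypothetical clinical features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans

# 500 "subjects" with 10 numeric features and a binary outcome
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Supervised learning: train on known inputs and outputs, predict future outputs
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
print("Held-out accuracy:", rf.score(X_test, y_test))

# Unsupervised learning: no outcome labels, look for naturally occurring groups
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("Subjects per cluster:", np.bincount(clusters))
```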

Lately, it has become clear that in order to substantially increase the accuracy of a predictive model, we need to use an ensemble of models. The idea behind ensembles is that by combining a diverse set of models, one is able to produce a stronger, higher-performing model, which in turn results in better predictions. By creating an ensemble of models, we maximize the accuracy, precision, and stability of our predictions.

The power of the ensemble technique can be intuited with a real-world example. In the early 20th century, the famous English statistician Francis Galton (who created the statistical concept of correlation) attended a local fair. While there, he came across a contest that involved guessing the weight of an ox. He looked around and noticed a very diverse crowd: there were people like him who had little knowledge of cattle, and there were farmers and butchers whose guesses would be considered those of experts. The diverse audience ended up giving a wide variety of responses. He wondered what would happen if he took the average of all these responses, expert and non-expert alike. What he found was that the average of all the responses was much closer to the true weight of the ox than any individual guess. This phenomenon has been called the “wisdom of crowds.” Similarly, today’s best prediction models are often the result of an ensemble of various models, which together provide better overall prediction accuracy than any individual model would be capable of.
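
As one way to picture the "average of many guesses" idea in code, the minimal sketch below averages the predicted probabilities of three different classifiers via soft voting; the models, dataset, and parameters are illustrative choices, not a description of any particular production ensemble.

```python
# Minimal sketch: a simple model ensemble via soft voting in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# A diverse "crowd" of models, each with different strengths and weaknesses
members = [
    ("logit", LogisticRegression(max_iter=1000)),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=1)),
    ("boost", GradientBoostingClassifier(random_state=1)),
]

# Soft voting averages each model's predicted class probabilities
ensemble = VotingClassifier(estimators=members, voting="soft")

for name, model in members + [("ensemble", ensemble)]:
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:>8}: {score:.3f}")
```

In this kind of comparison, the ensemble's cross-validated accuracy is typically at least as good as the strongest individual member, mirroring the crowd outperforming any single guesser.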

Where data management is concerned, the current clinical research model is centered on electronic data capture (EDC) systems, in which a database is constructed that comprises the vast majority of the data for a particular study or trial. Getting all of the data into a single system involves a significant investment in the form of external data imports, redundant data entry, transcription from paper sources, transfers from electronic medical/health record systems (EMR/EHR), and the like. Additionally, the time and effort required to build, test, and validate complicated multivariate edit checks in the EDC system to help clean the data as they are entered is substantial, and such checks can only use data that already exist in the EDC system itself. As data source variety increases, along with surges in data volume and data velocity, this model becomes less and less effective at identifying anomalous data.
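
For readers unfamiliar with the term, a multivariate edit check is simply a rule that compares values across fields or records. A minimal, hypothetical example of that kind of logic, written here in pandas rather than in any EDC system's own check language, might look like this:

```python
# Minimal sketch of a cross-field (multivariate) edit check.
# Column names and values are hypothetical.
import pandas as pd

vitals = pd.DataFrame({
    "subject_id":   ["001", "002", "003"],
    "systolic_bp":  [120, 85, 140],
    "diastolic_bp": [80, 95, 90],
})

# Flag records where diastolic pressure is not below systolic pressure
flagged = vitals[vitals["diastolic_bp"] >= vitals["systolic_bp"]]
print(flagged)  # subject 002 would be queried for review
```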

At Rho, we are investing in talent and technology that in the near future will use ML ensemble models in the curation and maintenance of clinical databases. Our current efforts to develop tools that aggregate data from a variety of sources will be a key enabler. Similar to the way the banking industry uses ML to identify ‘normal’ and ‘abnormal’ spending patterns and make real-time decisions to allow or decline purchases, ML algorithms can identify univariate and multivariate clusters of anomalous data for manual review. These continually learning algorithms will enable a focused review of potentially erroneous data without the traditional EDC infrastructure, not only saving time on data reviews but also surfacing potential issues of which we would otherwise have been unaware.
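
As a sketch of what multivariate anomaly flagging could look like, the example below runs an isolation forest over synthetic lab values; the algorithm, column names, and thresholds are illustrative assumptions, not a description of Rho's production tooling.

```python
# Minimal sketch: flagging multivariate anomalies for manual review.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
labs = pd.DataFrame({
    "alt":       rng.normal(30, 8, 300),    # hypothetical lab analytes
    "ast":       rng.normal(28, 7, 300),
    "bilirubin": rng.normal(0.8, 0.2, 300),
})
labs.loc[0, ["alt", "ast"]] = [400.0, 5.0]  # inject one implausible record

# Fit on the data and mark the most isolated records (-1) for review
model = IsolationForest(contamination=0.01, random_state=0).fit(labs)
labs["review_flag"] = model.predict(labs)
print(labs[labs["review_flag"] == -1])
```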


Challenges in Clinical Data Management: Findings from the Tufts CSDD Impact Report

Posted by Brook White on Fri, Feb 09, 2018 @ 12:24 PM

Derek Lawrence, Senior Clinical Data Manager, has 9 years of data management and analysis experience in the health care/pharmaceutical industry.  Derek serves as Rho's Operational Service Leader in Clinical Data Management, an internal expert responsible for disseminating the application of new technology, best practices, and processes.

The most recent Impact Report from the Tufts Center for the Study of Drug Development presented the results of a study of nearly 260 sponsor and CRO companies examining clinical data management practices and experience. A high-level summary of the findings included longer data management cycle times than those observed 10 years ago, delays in building clinical databases, a reported average of six applications supporting each clinical study, and a majority of companies reporting technical challenges in loading data into their primary electronic data capture (EDC) system.

These findings represent the challenges those of us in clinical data management are struggling with given the current state of the clinical research industry and technological changes. EDC systems are still the primary method of data capture in clinical research, with 100% of sponsors and CROs reporting at least some usage. These systems are experiencing difficulties in dealing with the increases in data source diversity. More and more clinical data are being captured by new and novel applications (ePRO, wearable devices, etc.), and there is an increased capacity to work with imaging, genomic, and biomarker data. The increases in data volume and data velocity have resulted in a disconnect with the EDC paradigm. Data are either too large or too ill-formatted for import into the majority of EDC systems common to the industry. In addition, there are significant pre-study planning and technical support demands when it comes to loading data into these systems. With 77% of sponsors and CROs reporting similar barriers to effective loading, cleaning, and use of external data, the issue is one with which nearly everyone in clinical research is confronted.

Related to the issues regarding EDC integration are delays in database build. While nearly half of the build delays were attributed to protocol changes, just over 30% resulted from user acceptance testing (UAT) and database design functionality. Delays attributed to database design functionality were associated with a last-patient-last-visit (LPLV)-to-lock cycle time that was 39% longer than the overall average. While the Tufts study did not address this directly, it is not a stretch to assume that the difficulties related to EDC system integration are a significant contributor to the reported database functionality issues. With delays already associated with loading data, standard data cleaning activities that are built into the EDC system and must be performed before database lock would almost certainly be delayed as well.

Clinical data management is clearly experiencing growing pains in adapting to a rapidly shifting landscape in which a portion of our current practices no longer fits well with advances in technology and data source diversity. All of this raises the question: what can we do to change our processes in order to accommodate these advances? At Rho, we are confronting these challenges with a variety of approaches, beginning with limiting the impulse to automatically import all data from external vendors into our EDC systems. Configuring and updating EDC systems requires no small amount of effort on the part of database builders, statistical programmers, and other functional areas, and existing clinical data can be negatively affected when these updates are made as part of a database migration. At the end of the day, importing data into an EDC system results in no automatic improvement to data quality and, in some cases, actually hinders our ability to rapidly and efficiently clean the data. By developing standard processes for transforming and cleaning data outside the EDC systems, we increase flexibility in adapting to shifts in incoming data structure or format and mitigate the risk of untoward impacts to the contents of the clinical database by decreasing the frequency of system updates.

The primary motivation for loading data received from external vendors into the EDC system is to provide a standard method of performing data cleaning activities and cross-checks against the clinical data themselves. To support this, we are developing tools to aggregate data from a variety of sources and assemble them for data cleaning purposes. Much as the banking industry uses machine learning to identify ‘normal’ and ‘abnormal’ spending patterns and make real-time decisions to allow or decline purchases, similar algorithms can identify univariate and multivariate clusters of anomalous data for manual review. These continually learning algorithms will enable a focused review of potentially erroneous data without the traditional EDC infrastructure. This will save time on data reviews and also surface potential issues that we would have missed had we relied on the existing EDC model alone. With the future bringing an ever-broadening landscape of data sources and formats, an approach rooted in system agnosticism and sound statistical methodology will ensure we are always able to provide high levels of data quality.
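
The sketch below illustrates the kind of cross-source reconciliation such tools can perform outside the EDC system; the two data frames, their columns, and the specific checks are hypothetical stand-ins for an EDC visit export and an external central-lab transfer.

```python
# Minimal sketch: reconciling external vendor data with EDC data in pandas.
import pandas as pd

# Hypothetical EDC visit export and central-lab transfer
edc = pd.DataFrame({
    "subject_id": ["001", "001", "002"],
    "visit":      ["V1", "V2", "V1"],
    "visit_date": ["2018-01-05", "2018-02-02", "2018-01-09"],
})
lab = pd.DataFrame({
    "subject_id":      ["001", "001", "003"],
    "visit":           ["V1", "V2", "V1"],
    "collection_date": ["2018-01-05", "2018-02-03", "2018-01-10"],
})

# Align the two sources on subject and visit
merged = edc.merge(lab, on=["subject_id", "visit"], how="outer", indicator=True)

# Cross-check 1: records present in only one source
unmatched = merged[merged["_merge"] != "both"]

# Cross-check 2: lab collection date disagrees with the recorded visit date
both = merged[merged["_merge"] == "both"]
date_mismatch = both[both["collection_date"] != both["visit_date"]]

print(unmatched[["subject_id", "visit", "_merge"]])
print(date_mismatch[["subject_id", "visit", "visit_date", "collection_date"]])
```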