Information for this article was contributed by Kristen Snipes, a Project Director at Rho with extensive experience managing attention deficit hyperactivity disorder (ADHD) trials, including the recent successful completion of a phase III laboratory classroom study.
1. Selection of a clinically meaningful endpoint is critical.
I’ve found that at the beginning of a study a lot of time is spent deciding what rating scale will be used, and yet not nearly enough time is spent determining what the precise endpoint will be. Particularly in a study that will be used as part of a marketing application, defining a clinically meaningful primary endpoint can mean the difference between success and failure. Picking the best endpoint requires consulting with key opinion leaders, regulatory experts, statisticians, and a medical director experienced in clinical trials. Depending on the phase of development, a special protocol assessment with FDA may be advisable.
2. Use of online ADHD assessment tools can reduce stress on parents and sites while increasing data quality.
With an online assessment tool, parents can complete assessments on their own time and don’t have to worry about getting paper assessment forms returned to the site. Making things easier for parents prevents drop-outs, which can cause timeline delays and data issues. Sites benefit, too: they don’t have to follow up with parents to collect forms, scoring is completed automatically by the system (reducing variability in the data), and they don’t have to transcribe data from paper forms into an electronic data capture (EDC) system. Eliminating manual transcription removes the possibility of data entry errors, leading to higher quality data.
3. When conducting laboratory classroom studies, short visit windows may create scheduling headaches that must be carefully managed.
The laboratory classroom portion of the study typically is conducted on a Saturday. If you have a two-day visit window around the classroom day, the visit must occur between Thursday and Monday; in practice, that means Thursday, Friday, or Monday. Taking school and work schedules into account, this can create a stressful situation for parents. In some cases, a narrow visit window can’t be avoided. If so, you must stay on top of this issue to avoid both protocol deviations for visits outside the window and drop-outs because parents can’t make it to the visits.
4. Use of centralized, experienced sites will help you stay on schedule.
Ideally, you want to use sites with experience in ADHD trials and EDC that have a proven track record on both patient enrollment and data quality. Sites with prior experience on ADHD trials know what it takes to enroll subjects with this indication. They will have some ideas about what type and how much advertising is needed and how to incentivize parents to participate (e.g., providing snacks for after-school study visits). They also are better able to estimate how many patients they can enroll and how quickly. By contrast, the enrollment delays typical of new sites can be costly and will delay the overall timeline.
Training and consistency are key for laboratory classroom studies, which rely on raters to assess the frequency of behaviors. You need to ensure that sites rate in a similar manner to reduce variability by center. Holding a study-wide training may be more expensive, but it will be worth the money spent when your final analysis is complete.
It is also important to select sites with a proven track record of delivering high quality data. Experience with EDC is an important factor here. Poor data quality can increase costs by driving up the number of queries and the time spent on site resolving problems. Data quality also can delay timelines as additional effort may be necessary to clean the data prior to database lock.
5. Patient diaries aren’t all they are cracked up to be.
We see a lot of interest in using patient diaries to collect information for ADHD studies. Patient diaries allow collection of information at any time directly from patients. It sounds like a great idea; however, what you usually end up with is a lot of dirty data. It can be very difficult to draw meaningful conclusions from that data, but once you have collected it, you are obligated to do something to address it.
When collecting dosing information, keep in mind what you plan to use it for in your analysis. Is your goal strictly compliance? If so, can you gather this from your accountability records or number of tablets returned? Reducing the burden on patients and sites helps to ensure useful, quality data that can tell the story you want in your clinical study report (CSR).
If you know anything about Rho, you know that clinical research is our life. Strong science is the underpinning of everything we do here—from designing your clinical trial to integrating your data for submission to FDA…to evaluating our coffee machines. Yes, you heard us correctly. One of our statisticians used a bit of scientific experimentation to test whether or not the cream in a coffee machine was spoiled. Taking science too far? We think not.
Recently, we’ve been trying out some new coffee machine options. Following his use of one of the new coffee machines, Senior Biostatistician Henry “Tee” Bahnson posted the following in the employee announcement section of our internal website:
“After hearing reports that the cream [in the new coffee machine] was bad, I did a quick experiment to test some hypotheses. The results are attached (see image below) but the conclusions are that the coffee is probably fine to drink if you don't mind it being acidic. Also, the cream is not spoiled; the acidic coffee is what is causing it to curdle . . . so please don't throw out the perfectly good cream or acidic coffee.
- In the above factorial experiment you can see that the new coffee machine appears to be causing the cream to curdle
- The rows and columns represent the two factors in this experiment (type of coffee maker and type of cream)
- The first row is from the new coffee machine and has the curdled cream. The second row is from the coffee maker in the 3rd floor break room. The coffee curdles regardless of type of cream in the new coffee machine but not in the regular machine; therefore, the cream is probably fine and the new coffee machine is likely causing the cream to curdle. This is probably happening because the coffee is too acidic. A quick Google search turned up this explanation.
Disclaimer: The experiment was under powered, the p-value is not significant, but I still believe the results.”
In case you are wondering, Tee isn’t the only Rho employee predisposed to a scientific view of the world. Project Director Brett Gordon responded with:
“I basically conducted a similar experiment at [a local coffee and pastry shop] in Durham last year. Their house blend does not curdle cream, but their Ethiopian coffee does every time, regardless of how fresh the cream is… again, due to acidity of the coffee. So there’s independent confirmation of your hypothesis.”
So, yes, we admit it. We are passionate about science. It’s part of everything we do, and we plan to keep it that way.
The dictionary defines an entrepreneur as one who assumes risk. Yet, we’ve thrived on creating a business and culture that removes risks for customers and employees. An innovative approach to navigate through economic turbulence has fostered long-term stability and trust – from the customers we serve (more than 90 percent return for future business) and the employees who benefit from the company being profitable every year since our start almost 30 years ago.
Chief Executive Officers Laura Helms Reece and Russ Helms were tapped as regional finalists for the Ernst & Young Entrepreneur of the Year Award. The internationally recognized award program is one of the world’s most prestigious business awards for entrepreneurs. The award recognizes those who demonstrate excellence and extraordinary success based on innovation, financial performance of the business and personal commitment to their business and communities.
“We’re proud of our team’s commitment to our clients and excellent service, enabling us to create a company that fosters stability and planned growth,” Laura said. “This focus has enabled us to provide our clients with the experience, capacities, and capabilities of a larger organization – all without losing touch with our entrepreneurial passion.”
In an industry in which the goal is speed and efficiency to get products to market as rapidly as possible, consistent growth and stability is a winning formula that shapes us and that our customers can trust.
“We are focused on continued organic growth at a pace that guarantees maintaining our great culture, which is valued by our clients,” Russ said. “Our focus will never be driven by short-term results and quarterly earnings reports to investors.”
View press release
On April 27, a group of Rho employees along with nearly half a million others participated in the March of Dimes, March for Babies fundraising event. This event supports research investigating premature birth — the leading cause of newborn death and a major cause of many lifelong disabilities. Rho’s founders have a personal history with the March of Dimes. As a child, co-founder Mary Helms’s family received assistance from the March of Dimes while her father suffered from polio. It is important as a company to take time out of our daily routines to not only give back on a personal level, but to contribute to Rho’s primary purpose: research.
In 2012, Rho ranked 5th in fundraising out of all the Triangle March for Babies teams. We’re excited to continue the tradition of being a top team as we are well on our way to surpassing our fundraising goal this year.
In addition to the March of Dimes, March for Babies walk, Rho hosts an annual March of Dimes BBQ at our headquarters in Chapel Hill, featuring a delicious BBQ lunch from Smokey’s Shack, raffle prizes and a water balloon toss. This year, employees purchased raffle tickets for what some say were our best prizes yet, including restaurant gift cards, photography packages, a round of golf, massages and vacation packages, just to name a few. This event provides the bulk of our fundraising efforts and allows us to be such strong contributors year after year.
We conclude each year’s BBQ with our infamous water balloon toss. Traditionally, a few members of Rho’s Leadership Team volunteer as targets, including co-CEOs Russ Helms and Laura Helms Reece. Employees purchase balloons and take aim at their favorite team members, with occasional balloons making their way back towards the crowd of onlookers.
Check out our Facebook page for more pictures from this year’s March of Dimes, March for Babies event and BBQ lunch. You can also follow us on Twitter to stay up-to-date on our summer events.
We recently conducted a webinar “What Non-statisticians Need to Know about Statistics in Clinical Trials” featuring Rho Senior Biostatistician Erika Menius. In case you missed it, click here to register and watch the webinar on demand. We wanted to provide answers to some of the questions received during the webinar which we did not have time for during the Q&A session. We thought a number of these answers would be of interest to a larger group.
Is unblinded the same thing as open label?
Yes, open label and unblinded are considered synonymous.
What parameters should be considered when selecting single blind/double blind/triple blind?
Objectivity of the outcome measure is a primary consideration in determining whether and how to blind your study. If there is any subjectivity in the outcome, the person making the assessment should be blinded, and which parties are blinded (subjects, assessors, analysts) determines whether the study is single, double, or triple blind.
How does FDA view efficacy analysis using Intended to Treat (ITT) versus per protocol populations? Do they look down on one versus the other?
FDA prefers ITT, especially in phase III because it is more generalizable and can help limit bias.
What is the importance of power in analyzing results?
Power uses hypothetical results to make study design decisions (e.g., sample size), so it is only truly relevant during study planning. At the end of the study you are analyzing the actual results, and power is not a relevant factor in the analysis. However, in a failed study, you can use it to describe whether you had enough patients to detect a specific treatment difference.
Describe Type I and Type II error in lay terms.
Type I error: The results indicate that a treatment works when it doesn’t.
Type II error: The results indicate that a treatment doesn’t work when it does.
Can you look at the primary efficacy end point without an interim analysis planned in the study protocol while the study is still on-going?
This is not recommended because it can affect your control of Type I error. If it is not an adequate and well controlled trial you plan to use for a marketing application, it is of less concern, but you should make adjustments as if you planned an interim analysis. FDA is extremely skeptical of doing this during any Phase III trial.
Can you explain allowable error (α), power, and standard deviation?
Allowable error (α) is the same as Type I error. It is the probability that the results will say a treatment works when it doesn’t.
Power is the probability that a given study will detect a treatment effect, assuming the treatment truly works.
Standard deviation is a common measure of the variability of your data.
What is permuted block randomization?
Permuted block randomization helps ensure a balanced number of subjects across treatment arms by guaranteeing that within a small number of subjects, all treatment arms are covered equally.
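To make the idea concrete, here is a minimal sketch in Python (the two-arm design, block size of 4, and function names are illustrative assumptions, not something from the webinar):

```python
import random

def permuted_block_list(n_subjects, arms=("A", "B"), block_size=4, seed=None):
    """Build a randomization list from randomly permuted blocks.

    Each block contains every arm an equal number of times, so the
    allocation can never drift far out of balance.
    """
    assert block_size % len(arms) == 0, "block must hold each arm equally often"
    rng = random.Random(seed)
    block_template = list(arms) * (block_size // len(arms))  # e.g. A, A, B, B
    schedule = []
    while len(schedule) < n_subjects:
        block = block_template[:]
        rng.shuffle(block)          # permute the arms within this block
        schedule.extend(block)
    return schedule[:n_subjects]

schedule = permuted_block_list(12, seed=42)
# After every complete block of 4, arms A and B are exactly balanced.
print(schedule)
```

Because each block contains every arm equally often, the running imbalance between arms can never exceed half a block.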
What is hot deck imputation?
Hot deck imputation typically is not used in clinical trials. This is a form of imputation where each missing value is replaced with an observed value from a “similar subject.”
Can you use an interim analysis to look at accumulation of cases before you proceed to the next step?
When people say “interim analysis” they typically mean an unblinded look at your data. If you aren’t unblinding, you can look at your data at any time. If you are unblinding, you should have a plan in place and adjust your Type I error accordingly.
Can you explain futility analysis? Can these be done unplanned?
Futility analyses are done to quantify the probability that your study will be successful when your interim results don’t look good. Yes, these can be done unplanned, but you will need to reduce your Type I error (α) for the final analysis.
What is CV?
CV is the coefficient of variation. It is the standard deviation divided by the mean.
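As a quick worked example (the numbers below are made up for illustration):

```python
import statistics

values = [10, 12, 8, 14, 11]    # hypothetical measurements
mean = statistics.mean(values)  # 11.0
sd = statistics.stdev(values)   # sample standard deviation
cv = sd / mean                  # often reported as a percentage
print(f"CV = {cv:.1%}")         # prints CV = 20.3%
```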
Can you explain stratified randomization?
Stratified randomization is used when you want to ensure a balanced number of subjects within a specific factor, such as gender. Subjects are randomized within each stratum.
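A minimal illustrative sketch, assuming two arms and gender as the single stratification factor (all names here are hypothetical):

```python
import random

def stratified_assigner(arms=("A", "B"), block_size=4, seed=0):
    """Return a function that randomizes subjects within their stratum.

    Each stratum (e.g. gender) keeps its own permuted-block sequence,
    so treatment arms stay balanced inside every stratum.
    """
    rng = random.Random(seed)
    pending = {}  # stratum -> remaining assignments in the current block

    def assign(stratum):
        if not pending.get(stratum):
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)           # fresh permuted block for this stratum
            pending[stratum] = block
        return pending[stratum].pop()

    return assign

assign = stratified_assigner(seed=7)
arms_for_females = [assign("female") for _ in range(4)]
arms_for_males = [assign("male") for _ in range(4)]
# Within each stratum, one full block yields an exact 2:2 split of A and B.
```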
Do you use Bayesian analysis in clinical trials?
Yes. In clinical trials, Bayesian analyses are most often used in dose escalation studies (e.g., continual assessment method).
What is z(1-α) (from the sample size calculation)?
This is the value of a standardized normal curve associated with 1-α probability (also known as a critical value).
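For example, using Python’s standard library (the one-sided α = 0.05 below is chosen just for illustration):

```python
from statistics import NormalDist

alpha = 0.05                          # one-sided Type I error
z = NormalDist().inv_cdf(1 - alpha)   # critical value of the standard normal
print(round(z, 3))                    # prints 1.645
```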
What is the benefit of stratification?
In small studies (typically phase II), stratification reduces the effect of factors that can impact your outcome, making your measured treatment difference less subject to confounding.
What are the implications of the p-value?
The p-value is the probability of seeing results at least as favorable as those observed if the drug actually doesn’t work. This value will be small for a drug that truly works.
Why does the p-value change when you do an interim analysis?
When you look at your data more than once, you are more likely to see something you want to see, so you must account for that in your final conclusion by adjusting your p-value.
Is there an equation for how much an interim analysis would affect your Type I error?
Yes, there are a number of equations you could use, such as an α spending equation.
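As one illustrative example (one of several spending functions in use), the Lan-DeMets approximation to the O’Brien-Fleming boundary can be written as a cumulative spending function of the information fraction t; this sketch assumes a two-sided α of 0.05:

```python
from statistics import NormalDist

def obrien_fleming_spend(t, alpha=0.05):
    """Lan-DeMets approximation to the O'Brien-Fleming alpha-spending
    function: cumulative two-sided alpha spent by information fraction t."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))

# Very little alpha is spent at an early look; all of it by the final look.
for t in (0.25, 0.5, 0.75, 1.0):
    print(t, round(obrien_fleming_spend(t), 4))
```

The early looks spend almost none of the α, which is why O’Brien-Fleming-style boundaries are popular: the final analysis proceeds at nearly the full significance level.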
How does FDA see ad hoc analysis used in submission?
An ad hoc analysis can help support the primary analysis or help explain safety issues or biologic mechanisms, but it is unlikely FDA would grant approval based on an ad hoc analysis alone.
Can you review how hazard ratios impact size and length of a trial?
More extreme hazard ratios (either larger or smaller than 1) are associated with the need for smaller and shorter studies. Note: This is a general answer to a highly complex topic.
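As a rough illustration of that general answer, Schoenfeld’s approximation for the number of events needed in a 1:1 two-arm log-rank comparison makes the relationship visible (the specific hazard ratios below are arbitrary):

```python
from math import ceil, log
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Schoenfeld's approximation: events needed for a 1:1 two-arm
    log-rank test with two-sided alpha."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(4 * (z_a + z_b) ** 2 / log(hazard_ratio) ** 2)

# A more extreme hazard ratio needs far fewer events,
# hence a smaller and shorter trial.
print(required_events(0.7))   # modest effect
print(required_events(0.5))   # strong effect
```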
Is a different sample size calculation equation needed for small clinical trials, large clinical trials, or both?
The sample size calculation is always relevant, no matter the size of the trial. It quantifies the size of the treatment effect you can reliably detect for a given study size.
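For illustration, the standard formula for comparing two means shows how the detectable effect size drives n (the σ and Δ values below are arbitrary):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Standard two-sample formula for comparing means (two-sided alpha):
    n per group = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)

# Halving the detectable difference quadruples the required sample size.
print(n_per_group(delta=0.5, sd=1.0))    # prints 63
print(n_per_group(delta=0.25, sd=1.0))   # prints 252
```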
What is incomplete block study design?
This is a study design where you have blocks of subjects and multiple treatments, but not all treatments are appropriate for all blocks. For example, you might have three blocks—pediatric, adult, and geriatric—and two treatments (A and B) where the first treatment (A) isn’t appropriate for pediatric patients and treatment B isn’t appropriate for geriatric patients. In that case, you would have a block of pediatric patients on treatment B, a block of adult patients on treatment A, a block of adult patients on treatment B, and a block of geriatric patients on treatment A.
How is the calculation of n affected when comparing a diagnostic to a gold standard?
If your gold standard is not variable (or only slightly variable), it can reduce your sample size. It can also affect your study design because you may be doing an equivalence study instead of a superiority study. Note: This is a brief answer to a complex question.
Do you have a question from the webinar that isn't answered here? Submit it through the comments.
In recent years the National Institutes of Health (NIH) has undertaken several initiatives intended to advance neuroscience research by means of a multi-institute collaboration entitled “the Blueprint for Neuroscience Research.” Some of these initiatives involve standardization and sharing of cutting-edge technologies such as neuroimaging and genomics. Another of their initiatives, the NIH Toolbox, is a set of standardized neurobehavioral assessments that is useful to a broad research audience. This article summarizes the development of, and uses for, the NIH Toolbox, and provides considerations for its current and future utility in drug development.1
Many central nervous system (CNS) clinical trials, as well as those in other therapeutic areas, require neurological or behavioral assessments. Currently, investigators use a number of different instruments to assess the same construct, making it difficult to understand data across multiple studies. In an effort to resolve this problem, the NIH formed a coalition in 2004 to create a toolbox of neurological and behavioral assessments. The coalition included multiple divisions at NIH and more than 250 scientists at more than 80 institutions. The goals of the coalition were:
- To develop and validate a set of standardized, psychometrically sound tools for neurobehavioral constructs
- To be able to measure the same construct across the life span
- To create tools that are royalty-free, and as close to cost-free as possible (most assessments currently in use are proprietary and often costly)
- To create tools that are efficient to administer
- To provide measures that facilitate the pooling of data across many studies
In October of 2012, the coalition released the toolbox to the public. It includes:
- Four domain level batteries
- English and Spanish versions
- 34 supplemental instruments
- Training materials for administration of the assessments
- Public data from the studies conducted during the development of the tools
The batteries are fully normalized for ages 3-85, and are essentially free to use. Each domain takes 30 minutes to administer. From the toolbox website, you can access the domain level batteries, supplemental assessments, and training materials including videos showing the assessments being conducted.
The four domains covered by the assessments are cognition, motor, sensation, and emotion. The cognitive domain includes working memory (short term buffer), executive function (planning and organizing), episodic memory (acquisition and retrieval), language, processing speed, and attention. The motor domain includes standing balance, strength, dexterity, and speed/endurance. The sensation domain includes audition, olfaction, pain, taste, vestibular, and vision. The emotional domain includes psychological well-being, social relationships, stress and self-efficacy, and negative affect.
What will be the impact of the NIH toolbox on CNS clinical trials and the development of new treatments for CNS disorders? The assessments are not designed to capture pathology, and so far, they have been used primarily in healthy subjects. Therefore, the NIH Toolbox is unlike earlier initiatives such as the ECDEU assessment manual for psychopharmacology published in 1976, which had a profound effect on drug development through the late 1980s. However, because of its focus on function in healthy subjects, the NIH Toolbox may provide a more precise and quantitative concept of “normal” against which pathology can be measured. As such, it will inevitably have utility throughout clinical research and drug development.
It may take some time to gain broad acceptance in the CNS community. For now, it seems risky to use one of these assessments alone as a primary end point for a clinical trial, and if you are considering doing so, you should probably proactively pursue agreement from the FDA. Another consideration is how you would use the assessments in a trial. Most of the assessments are aimed at benchmarking a state of wellness. While that doesn’t provide a direct measurement of disease state, these tests may be valuable in showing clinical significance.
Have you used or considered using any of these assessments yet? If so, share your experiences in the comments section.
For more information, see the NIH Toolbox website www.nihtoolbox.org where the assessments and training videos can be found.
1Information for this article was contributed by Nancy Yovetich, Ph.D., Senior Research Scientist, and Herbert Harris, M.D., Ph.D., Medical Director at Rho, Inc. These contributors were not involved in the development or validation of the NIH Toolbox. One of the authors did attend the launch of the NIH Toolbox in Bethesda, MD. Both Dr. Yovetich and Dr. Harris have extensive backgrounds in clinical research and in psychology/psychiatry.
The following article was contributed by Steve Palmatier, Rho's service leader for Interactive Response Technology (IxR) system configuration and development.
Sometimes it's difficult to determine the best tool for a job, especially when technologies are developed in parallel to handle similar tasks. Take Interactive Response Technology (IxR) and Electronic Data Capture / Electronic Case Report Forms (EDC), for example. Both technologies provide a method for electronic entry of important data. Both can have data verification checks incorporated to minimize the potential for ambiguous or incorrect data entry. Both commonly incorporate user roles to limit access of individual users to functionality that is appropriate. So what are the differences that would provide insight on which technology to use when? Several areas of differentiation are outlined below.
Purpose of the System
EDC - In short, EDC systems’ primary purpose is to electronically collect and validate participant data for eventual use in statistical analyses. Collecting these data electronically makes them more quickly available to the study team than traditional paper CRFs, and therefore allows more informed and proactive decision making.
IxR – The goal of IxR in clinical trials is to perform specific tasks, such as randomization, study drug dispensation, study drug resupply requests, emergency unmasking, etc. It is not the goal of IxR in most cases to be the primary place where participant data are entered and stored, though some data are required to perform the aforementioned tasks.
User Interface
EDC – Due to the sheer volume of data to be captured, EDC systems nearly always use a computer-based interface that allows users to easily navigate between forms and between different areas on the same form. While swift entry of data into EDC systems is often desired so that study teams have accurate enrollment information, it is not usually operationally critical, so it is acceptable for a user to enter data in not-quite-real-time. Moreover, most clinical sites in developed countries can be expected to have computers, so a computerized interface is acceptable the vast majority of the time.
IxR – IxR has two main interfaces: web and voice (IWR and IVR respectively). Over the past 10 years or so, the prevalence of IVR systems has decreased significantly due to workstations, laptops, smartphones, and tablets becoming more widely available in the clinical setting. However, there are still some instances in which the phone interface is beneficial, such as when entry of data for randomization is highly time-sensitive (e.g., in neonatal trials where randomization must occur very shortly after birth), and when the IxR will be used for patient-reported outcomes or diary entry, since study subjects may not have access to a computer at home.
Static Versus Dynamic Data Entry
EDC - Most EDC systems are form-based, and most of the data entry fields on any particular web page are static. When a participant is enrolled in a trial, a set of forms is made available into which that participant’s data will be entered. Whether these forms are necessary or not becomes apparent later. For instance, if a participant withdraws consent early in the study, there may be many forms for visits later in the study that never have data associated with them. In many cases, the order in which data are entered is not controlled since different data will become available at different times, though sometimes additional forms are generated as they become necessary (e.g., SAE forms).
IxR - IxR systems generally create data entry pages dynamically. That is, the information and entry fields that appear on-screen or that are prompted over the phone are a result of previous selections and entries made by the user. This both minimizes data entry by the user and provides a gating mechanism that forces things to happen in the correct order. For instance, a user cannot skip to kit assignment prior to randomization, or randomization prior to entry of valid stratification data.
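To illustrate the gating idea, here is a toy Python sketch; the class, strata, and kit numbers are all invented for illustration and do not reflect any real IxR system:

```python
class IxrGatingError(Exception):
    pass

class SubjectWorkflow:
    """Toy sketch of IxR-style gating: each action only becomes
    available once its prerequisites have been completed, in order."""

    def __init__(self):
        self.stratum = None
        self.arm = None
        self.kit = None

    def enter_stratification(self, stratum):
        if stratum not in ("low", "high"):   # reject invalid strata up front
            raise IxrGatingError("invalid stratification value")
        self.stratum = stratum

    def randomize(self):
        if self.stratum is None:             # gate: must stratify first
            raise IxrGatingError("stratification required before randomization")
        self.arm = "A"                       # placeholder arm assignment

    def assign_kit(self):
        if self.arm is None:                 # gate: must randomize first
            raise IxrGatingError("randomization required before kit assignment")
        self.kit = "KIT-001"                 # placeholder kit number

wf = SubjectWorkflow()
try:
    wf.assign_kit()                          # skipping ahead is blocked
except IxrGatingError as e:
    print(e)
wf.enter_stratification("low")
wf.randomize()
wf.assign_kit()
print(wf.kit)
```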
User Modification of Previously Entered Data
EDC - EDC forms can usually be revisited multiple times because all of the data that are to be entered on a form may not be available at once (e.g., lab values). Often, entry of data that seems inaccurate or is in an incorrect format is accepted and stored but fires a query that must be resolved prior to database lock, and the user may return at a later time to correct or confirm the entry. This is consistent with the primary purpose of EDC, to store data for use in data analysis that will take place at a later date.
IxR - Unlike EDC forms, entry of data and completion of a function in IxR usually triggers an action that is based on the entered data, so it is uncommon for a user to be able to return to the system to make corrections of previously missing or incorrectly entered data without support intervention. Incorrect entry of stratification data prior to randomization has cascading impacts, so correcting the mistake often involves more than simply updating that one data point.
System Validation
EDC – Because there is an opportunity to correct mistakes between the initial entry and database lock, the risk of incorrect or incomplete data at the time of entry is usually not assessed to be at the highest level. Also, since the primary purpose of EDC is to store data rather than to perform actions, validation of the system can focus primarily on making sure that edit checks fire correctly and that the data are stored accurately.
IxR - Because IxR performs actions that impact the course of the study, IxR systems generally carry a higher risk than EDC systems. Not only is it important for validation efforts to ensure that the entered data are correct, but it is also important to validate the logic that is exercised in order to make the decisions and perform the actions that are based on those data – assigning the correct treatment kits, requesting resupply of investigational product when appropriate, enforcing cohort caps, etc. The result is that IxR systems (especially those that are highly configurable) generally require more extensive validation and a higher percentage of setup time allotted to validation activities.
In the next post in this series, we’ll use these distinctions to help determine the appropriate scope for IxR systems so that the technology can be used most advantageously.
The following article was contributed by our medical director, Herbert Harris, MD, PhD.
On February 7, the FDA issued a proposal designed to assist companies developing new treatments for patients in the early stages of Alzheimer’s disease, before the onset of noticeable (overt) dementia.
Although we have an enormous amount of information about the underlying molecular pathophysiology of Alzheimer’s disease, translating this knowledge into effective new treatments has been exceedingly difficult. Part of this difficulty arises from the slowly progressive nature of the disorder. We have known for many decades that the accumulation in the brain of a protein known as amyloid is a central part of this process. Abnormal accumulation of amyloid triggers many other biochemical processes that lead to the neuronal cell death and dysfunction that cause the cognitive deterioration characteristic of the disease.

This understanding has led to the development of many drugs that have the potential to prevent or oppose the abnormal accumulation of amyloid. However, these new drugs have typically been tested in patients in whom cognitive impairments are already fairly far advanced. Yet in recent years, advances in imaging technology and neuropathology have indicated that amyloid accumulation may begin years, or even decades, before the appearance of measurable cognitive deficits. Such findings imply that interventions targeting amyloid accumulation are unlikely to show significant clinical benefits if they are not used until cognitive deficits have manifested. Instead, medicines that target amyloid accumulation and other fundamental molecular processes should probably be introduced well in advance of the onset of cognitive changes in order to be optimally effective.

This understanding has led to a fundamental rethinking of the methods and strategies for drug development in Alzheimer’s disease. Recognizing these new challenges, the FDA has developed a draft guidance document for the development of drugs to treat early stages of Alzheimer’s disease. The guidance identifies a number of critical drug development issues and indicates potential solutions that could move the field forward.
In an accompanying press release, Russell Katz, M.D., Director of the Division of Neurology Products at the FDA’s Center for Drug Evaluation and Research noted: “The scientific community and the FDA believe that it is critical to identify and study patients with very early Alzheimer’s disease before there is too much irreversible injury to the brain. It is in this population that most researchers believe that new drugs have the best chance of providing meaningful benefit to patients.”
Perhaps the most problematic issue is that of identifying appropriate patient populations to study. Conventional clinical trials involving Alzheimer therapeutics typically enroll patients who meet criteria for a mild to moderate level of dementia as measured by various cognitive tests. Currently, there are a number of diagnostic entities that have been defined so as to capture patient populations at an early stage. These include Mild Cognitive Impairment (MCI) and prodromal Alzheimer’s disease. However, these diagnoses still depend on identification of some level of cognitive dysfunction. To identify patients at even earlier stages may require the use of genetic and other biomarkers. In developing their industry guidance, the FDA has acknowledged the potential importance of conducting trials in enriched populations defined by combinations of clinical findings and biomarkers. Unfortunately, to date, no biomarkers have been identified with sufficient predictive power. However, a great deal of progress is being made in this area.
The development of treatments for early stage Alzheimer’s disease may also require innovative outcome measures. Conventional studies of mild to moderate Alzheimer’s disease typically employ cognitive testing in combination with either a functional or global outcome measure as a co-primary endpoint. The FDA guidance acknowledges that early stage Alzheimer’s subjects may have little or no functional impairment, so in some cases the use of a co-primary outcome measure may be impractical. However, it notes that as patients progress to later stages, in which both functional and cognitive impairment begin to manifest, it may be appropriate to use composite scales that capture elements of both function and cognition. The Clinical Dementia Rating Scale–Sum of Boxes score, which has been validated in patients whose level of impairment does not meet the threshold of frank dementia, is given in the guidance as an example of such a scale. The draft guidance also raises the possibility that a treatment might obtain approval under the accelerated approval mechanism based on effects demonstrated on an isolated cognitive measure; in that scenario, a sponsor might be required to demonstrate sustained global effects as a post-marketing condition.
The draft guidance contains an extensive discussion of biomarkers as primary and secondary outcome measures. It notes that the use of a biomarker as a primary efficacy endpoint is a theoretical possibility under the accelerated approval mechanism, but that there is currently no biomarker with sufficient evidence to justify its use as a proxy for clinical benefit in Alzheimer’s disease. The draft guidance states: “until there is widespread evidence-based agreement in the research community that an effect on the particular biomarker is reasonably likely to predict clinical benefit, we will not be in a position to consider approval based on the use of a biomarker as a surrogate outcome measure in Alzheimer's disease (at any stage of illness).”
While many issues, such as the potential role of biomarkers, will have to await scientific development within the field, the industry guidance document represents an important step that will focus the energies of the research community and enable much-needed progress in Alzheimer’s research. The agency is currently seeking public comments on the draft guidance and is likely to begin finalizing the document next month. The FDA proposal is part of a U.S. Department of Health and Human Services initiative known as the National Plan to Address Alzheimer’s Disease, which calls for both the government and the private sector to intensify efforts to treat or prevent Alzheimer’s and related dementias and to improve care and services.
One of the key goals of phase II is to determine the optimal dose that you will use going into your phase III trials and ultimately will be used on your product label submitted for approval as part of the new drug application (NDA). An optimal dose is a dose that is high enough to demonstrate efficacy in the target population, yet low enough to minimize safety concerns and adverse events. There are a number of strategies to determine the optimal dose, but here we will look at the four most common dose finding study designs.
Cross-Over Design
In a cross-over design, subjects are randomized to a sequence of investigational product (IP) and placebo: they are given one or more doses of the IP and then switched to placebo, or they start on placebo and are then switched to doses of the IP. The value of cross-over studies is that they can determine the efficacy of a dose within a subject, because subjects act as their own controls. Cross-over designs only work when the drug is quickly eliminated from the body, however: you need to be able to give a subject the treatment, wait for it to clear, and then give the placebo. The design also requires a product intended for repeated use. A product that is meant to be given once, such as a drug to lower blood pressure during heart surgery, can’t be tested in a cross-over study because you won’t do the surgery again just to give the placebo.
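To see why acting as one's own control is so powerful, here is a minimal paired analysis of cross-over data. All scores, subject counts, and function names are invented for illustration; a real analysis would also model period and sequence effects.

```python
from statistics import mean, stdev

def within_subject_effect(on_drug, on_placebo):
    # Each subject is their own control, so the treatment effect
    # is summarized from per-subject differences (drug - placebo),
    # removing between-subject variability from the comparison.
    diffs = [d - p for d, p in zip(on_drug, on_placebo)]
    return mean(diffs), stdev(diffs)

# Hypothetical symptom scores (lower = better) for five subjects,
# one score per subject in each treatment period.
drug_scores    = [4.0, 5.5, 3.0, 6.0, 4.5]
placebo_scores = [6.0, 7.0, 5.5, 7.5, 6.0]

effect, spread = within_subject_effect(drug_scores, placebo_scores)
print(f"mean within-subject change: {effect:.2f} (SD {spread:.2f})")
```

Because the spread of the per-subject differences is typically much smaller than the spread of raw scores across subjects, a cross-over study can detect the same effect with fewer subjects than a parallel design.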
Dose Titration
In a dose titration study, you titrate to the maximum tolerated dose within each subject: each subject starts at a low dose and receives incrementally higher doses until the maximum tolerated dose is reached. Dose titration studies work well for treatments of chronic conditions where a drug will be used for a long period of time and where you are likely to see significant differences in how individual subjects respond. Chronic hypertension medications are a good example: there is a lot of variability in how individual patients respond to hypertension products, and by titrating the dose you can give a lower dose to those who respond to it.
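The per-subject titration logic can be sketched as a simple loop. The dose levels and the tolerability check below are hypothetical placeholders; in a real study, tolerability would be a clinical assessment made between dose steps.

```python
def titrate(dose_levels, tolerates):
    # Step one subject up through increasing doses and return the
    # highest dose they tolerated (that subject's maximum tolerated
    # dose); returns None if even the starting dose is not tolerated.
    highest_tolerated = None
    for dose in dose_levels:
        if not tolerates(dose):
            break
        highest_tolerated = dose
    return highest_tolerated

# Hypothetical subject who tolerates doses up to 20 mg.
mtd = titrate([5, 10, 20, 40], lambda dose_mg: dose_mg <= 20)
print(f"maximum tolerated dose: {mtd} mg")
```

Note that the loop stops at the first non-tolerated dose rather than scanning all doses, mirroring the fact that a subject would not be escalated past a dose they failed to tolerate.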
Parallel Dose Comparison
Parallel dose comparison studies are the classical dose finding design and remain one of the most common. In a parallel dose comparison study, several potential doses are selected and subjects are randomized to receive one of those doses for the entire study. At the end of the study, you can compare how each treatment group performed against the control group. Because all treatment groups, including the higher-dose groups, are dosed at the same time, this design is best suited to situations where you have a good idea of the safety profile before the study starts.
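The end-of-study comparison in a parallel design can be sketched as contrasting each arm's mean outcome with the control arm's. The arm names and scores below are invented; a real analysis would add a formal statistical test with multiplicity adjustment across arms.

```python
from statistics import mean

def arm_effects(groups, control="placebo"):
    # Parallel design: each subject contributes to exactly one arm,
    # so the dose effect is estimated between groups by subtracting
    # the control arm's mean from each dose arm's mean.
    control_mean = mean(groups[control])
    return {arm: mean(values) - control_mean
            for arm, values in groups.items() if arm != control}

# Hypothetical symptom scores (lower = better), one list per arm.
data = {
    "placebo":   [10, 12, 11],
    "low dose":  [9, 10, 8],
    "high dose": [6, 7, 5],
}
print(arm_effects(data))  # per-arm difference vs. the placebo mean
```

Plotting these per-arm differences against dose gives the dose-response picture that drives the phase III dose selection.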
Dose Escalation
If you are unsure of your safety profile and want to expose subjects to lower doses first, consider a dose escalation study. In this design, you start with one group of subjects (often referred to as a cohort) and give them a low dose, then observe the group for some period of time. If no safety issues are noted, you enroll a new cohort and give it a higher dose. This process repeats until you either reach the maximum tolerated dose or reach the highest dose you plan to consider. This design increases patient safety because you start by exposing a small number of subjects to the lowest dose possible, mitigating risk both by limiting the initial number of subjects and by limiting each subject’s exposure to study drug. You can also add control subjects to each cohort if you want to look at efficacy measures against an appropriate comparison group.
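The cohort-by-cohort decision rule can be sketched as follows. This is a simplified stand-in for formal escalation rules such as the 3+3 design, and the dose levels and observed event counts are hypothetical.

```python
def escalate(dose_levels, dlt_count, max_dlts=1):
    # Dose the next cohort only if the current cohort stayed at or
    # below the allowed number of dose-limiting toxicities (DLTs).
    # Returns the highest dose level whose cohort cleared the rule.
    highest_cleared = None
    for dose in dose_levels:
        if dlt_count(dose) > max_dlts:
            break
        highest_cleared = dose
    return highest_cleared

# Hypothetical observed DLTs per cohort of 3 subjects at each dose.
observed = {10: 0, 20: 1, 40: 2}
top_dose = escalate([10, 20, 40], observed.get)
print(f"highest cleared dose: {top_dose} mg")
```

The key safety property is visible in the loop: no cohort is ever dosed at a level whose predecessor failed the toxicity rule, so exposure to risky doses is bounded by design.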
There are other types of study designs and many variations on each of these study designs that may be useful in determining the optimal dose before heading into your phase III clinical trials. Interested in learning more? Check out this video where Dr. Karen Kesler talks about whether an adaptive design is right for your study.
Dr. Karen Kesler, Senior Statistical Scientist and Dr. Andrea Mospan, Program Manager contributed to this article. Check out the video below where Dr. Kesler discusses the basics of adaptive design.
You will often hear the phrase “learn and confirm” related to clinical trials. Phase II clinical trials are where you “learn” about your treatment and phase III clinical trials are where you “confirm” what you know for regulatory agencies. One important part of the learning that takes place in phase II is looking at various efficacy outcomes to determine which primary end point you will use for phase III and what specific label claims you will be able to make following an approved NDA.
One type of outcome measures some component of the biological mechanism the treatment is targeting; an example would be measuring hemoglobin levels for a treatment of anemia. The advantage of this type of outcome is that it is clear and objective. It may not be an option when the mechanism of action is not well understood or easily measured, as is the case for many psychiatric drugs.
Measuring some aspect of the physical manifestation of the disease can also serve as an outcome. For example, cystic fibrosis impacts lung function, so to demonstrate efficacy of a cystic fibrosis treatment you can use spirometry to assess lung function. This is especially useful when your therapy doesn’t attack the disease state itself but ameliorates the symptoms of the disease. It is also useful when the disease progresses slowly but the symptoms have a much earlier onset.
You can also look at qualitative aspects of subject improvement. Sometimes these are qualitative assessments by the physician or patient, like rating pain on a scale of 1 to 10. They can also be more objective measurements, like a decrease in the number of hospitalizations during a period of time. These can be important endpoints because they offer an opportunity to clearly demonstrate the impact on a patient. Since these types of outcomes are more subjective, it is important to provide as much structure as possible to the assessment to limit the potential for bias in the results.
Often, the outcome of most interest is a direct measure of disease progression, as when you measure death or cancer progression in an oncology study. This sometimes overlaps with the physical manifestation or symptom endpoints, but is more focused on the direct consequences of the disease instead of the early symptoms of progression or opportunistic events.
A few other considerations when deciding which outcomes to measure in your phase II trials:
- Collect any measurement you might consider using as an endpoint in your phase III trials. You don’t want any surprises in phase III when mistakes are much more expensive because of the scale of the trial.
- If there is a “typical” endpoint for the indication you are studying, you should collect it, even if you don’t plan to use it as your primary endpoint. This makes comparing your results to prior results in other studies easier to do.
- The types of outcomes described above are all about demonstrating efficacy. Safety is important in all stages of development, including phase II, so consider if there are specific safety endpoints that you should also be measuring.
Each investigational product and indication is unique, and it would be impossible to provide one set of rules or guidelines for picking endpoints, but hopefully you find the information provided here useful.