
Rho Knows Clinical Research Services

4 Types of Dose Finding Studies Used in Phase II Clinical Trials

Posted by Brook White on Mon, Mar 11, 2013 @ 12:07 PM

One of the key goals of phase II is to determine the optimal dose that you will use going into your phase III trials and that will ultimately appear on the product label submitted for approval as part of the new drug application (NDA). The optimal dose is the dose that is high enough to demonstrate efficacy in the target population, yet low enough to minimize safety concerns and adverse events. There are a number of strategies for determining the optimal dose, but here we will look at the four most common dose finding study designs.

Parallel Dose Comparison

Parallel dose comparison studies are the classical dose finding design and are still one of the most common. In a parallel dose comparison study, several potential doses are selected and subjects are randomized to receive one of the doses or placebo for the entire study. At the end of the study, you can compare each treatment group to the control group and examine both safety and efficacy. Because all treatment groups, including the higher dose cohorts, are dosed at the same time, this study design is best suited for situations where you have a good idea about the safety profile before the study starts. The design is also the basis for some adaptive studies (such as adaptive randomizations or pruning designs) that can reduce the number of subjects exposed to unsafe or ineffective doses.
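To make the end-of-study comparison concrete, here is a minimal Python sketch (not any particular trial's actual analysis) that compares each dose arm to placebo on a simulated continuous efficacy endpoint. The arm labels, sample sizes, and effect sizes are hypothetical, and a real analysis would also adjust for multiple comparisons, for example with Dunnett's test.

```python
# Sketch: end-of-study analysis of a parallel dose comparison study.
# Each dose arm is compared to placebo on a continuous endpoint.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_arm = 30
# Simulated endpoint values (higher = better) for placebo and three hypothetical doses.
arms = {
    "placebo": rng.normal(0.0, 1.0, n_per_arm),
    "10 mg":   rng.normal(0.2, 1.0, n_per_arm),
    "20 mg":   rng.normal(0.5, 1.0, n_per_arm),
    "40 mg":   rng.normal(0.7, 1.0, n_per_arm),
}

for dose in ("10 mg", "20 mg", "40 mg"):
    t, p = stats.ttest_ind(arms[dose], arms["placebo"])
    print(f"{dose} vs placebo: t = {t:.2f}, p = {p:.3f}")
# In practice you would adjust for multiple comparisons and weigh the
# efficacy results against each dose's safety profile.
```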

Cross-over

In a cross-over design, subjects are randomized to a sequence of investigational product (IP) and placebo. Either they are given a dose of the IP and then switched to placebo, or they start on placebo and then are switched to the IP. The difference between a subject's response on placebo and on IP is the result of interest, and by exposing different groups of subjects to different doses, you can pick the optimal dose. The value of cross-over studies is that they can determine the efficacy of a dose within a subject, because subjects act as their own controls. This reduces variability and can therefore reduce the number of subjects you need to study. However, cross-over designs only work when the drug is quickly eliminated from the body. You need to be able to give a subject the treatment, wait for it to clear, and then give the second treatment in the sequence. The design also requires a product that can be used multiple times. For example, a product that is intended to be given once, such as a drug to lower blood pressure during heart surgery, can't be tested in a cross-over study because you won't do the surgery again just to give the second treatment in the sequence.
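As a concrete illustration of the within-subject comparison, here is a minimal Python sketch using assumed, hypothetical data: each simulated subject has a response on placebo and on IP, and the paired difference is tested. Period and sequence effects, which a real cross-over analysis would model, are ignored here.

```python
# Sketch: within-subject comparison in a simple cross-over study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_subjects = 24
subject_effect = rng.normal(0.0, 2.0, n_subjects)  # between-subject variability
resp_placebo = subject_effect + rng.normal(0.0, 1.0, n_subjects)
resp_ip = subject_effect + 0.8 + rng.normal(0.0, 1.0, n_subjects)  # assumed 0.8 benefit

t, p = stats.ttest_rel(resp_ip, resp_placebo)
print(f"within-subject IP vs placebo: mean diff = {np.mean(resp_ip - resp_placebo):.2f}, "
      f"t = {t:.2f}, p = {p:.3f}")
# Because each subject acts as their own control, the subject effect cancels
# out of the paired difference, which is why cross-over designs need fewer subjects.
```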

Dose Titration

In a dose titration study, you titrate to the maximum tolerated dose within a subject. This means that each subject starts at a low dose and receives incrementally higher doses until the maximum dose is reached. In some studies, such as cancer chemotherapy studies, this maximum is determined by the onset of side effects and is called the maximum tolerated dose (MTD). In other studies where the product is less toxic, it may depend on blood levels of the IP or a metabolite, or on a maximum dose determined from preclinical studies. Dose titration studies work well for treatments of chronic conditions where a drug will be used for a long period of time and where the dose is likely to be tailored to the subject's weight or response. This design is also good for situations where you are likely to see significant differences in how individual subjects react. Chronic hypertension medications are a good example of products where dose titration is useful: there is a lot of variability in how individual patients respond to hypertension products, and by titrating the dose, you can give a lower dose to those who respond to it.
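Here is a small, purely illustrative Python sketch of the within-subject titration idea: each simulated subject steps up a hypothetical dose schedule until a side effect appears or a target blood level is reached, so different subjects settle on different maintenance doses. The dose steps, target level, and response model are all assumptions for illustration.

```python
# Sketch: within-subject dose titration with a hypothetical schedule and PK model.
import numpy as np

rng = np.random.default_rng(2)
dose_steps = [10, 20, 40, 80]   # mg, hypothetical titration schedule
target_level = 50.0             # hypothetical target blood concentration

def titrate_subject(sensitivity):
    """Return the dose a single subject settles on."""
    for dose in dose_steps:
        level = sensitivity * dose + rng.normal(0, 3)     # crude response model
        side_effect = rng.random() < 0.05 * (dose / 10)   # risk grows with dose
        if side_effect:
            # Back off one step (or stay at the lowest dose).
            return dose_steps[max(0, dose_steps.index(dose) - 1)]
        if level >= target_level:
            return dose                                   # target reached
    return dose_steps[-1]                                 # top of the schedule

# Subjects vary in how strongly they respond, so they settle on different doses.
final_doses = [titrate_subject(s) for s in rng.uniform(0.5, 2.0, 10)]
print("per-subject maintenance doses:", final_doses)
```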

Dose Escalation

If you are unsure of your safety profile and want to expose subjects to lower doses first, consider a dose escalation study. In this type of study, you start with one group of subjects (often referred to as a cohort) and give them a low dose. You observe this group for a period of time and, if no safety issues are noted, you enroll a new group of subjects and give them a higher dose. This process is repeated until you either reach the maximum tolerated dose or reach the highest dose you plan to consider. This design increases patient safety because you can start by exposing a small number of subjects to the lowest dose possible. You mitigate risk both by limiting the initial number of subjects and by limiting each subject's exposure to study drug. You can also add control subjects to each cohort if you want to look at efficacy measures with an appropriate comparison group.
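The cohort-by-cohort logic can be sketched in a few lines of Python. Everything here is hypothetical: the planned dose levels, the cohort size, the true toxicity rates used to simulate events, and the simple "at most one toxicity" escalation rule.

```python
# Sketch: cohort-based dose escalation with a hypothetical stopping rule.
import numpy as np

rng = np.random.default_rng(1)
doses = [5, 10, 20, 40, 80]                      # mg, hypothetical planned dose levels
true_tox_rate = [0.01, 0.03, 0.08, 0.20, 0.45]   # assumed, unknown in practice
cohort_size = 4
max_toxicities_allowed = 1                       # escalate only if at most 1 toxicity seen

highest_cleared = None
for dose, tox_rate in zip(doses, true_tox_rate):
    toxicities = rng.binomial(cohort_size, tox_rate)
    print(f"dose {dose} mg: {toxicities}/{cohort_size} toxicities")
    if toxicities > max_toxicities_allowed:
        print(f"stopping: {dose} mg exceeds the toxicity rule")
        break
    highest_cleared = dose
print(f"highest dose cleared: {highest_cleared} mg")
```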

There are other study designs, and many variations on each of these, that may be useful in determining the optimal dose before heading into your phase III clinical trials. Interested in learning more? Check out this video where Dr. Karen Kesler talks about whether an adaptive design is right for your study.

View "Is Adaptive Design Right for You?" Video

Dr. Karen Kesler, Senior Statistical Scientist, and Dr. Andrea Mospan, Program Manager, contributed to this article. Check out the video below where Dr. Kesler discusses the basics of adaptive design.

Adaptive Design Series: Some Not So New Adaptive Design Resources

Posted by Brook White on Tue, Nov 20, 2012 @ 11:19 AM

Note: This article is one of a series about adaptive design that comes from a blog written by Dr. Karen Kesler from 2010 to 2011. That blog is no longer active, but it contained some great information, so we wanted to re-post it here.

Although we are used to thinking of adaptive designs as new, some of them have actually been around for a while. The PhRMA white paper series (DIJ 2006 40 (4)) includes group sequential designs and adaptive randomizations. Statisticians have been working on these methods for decades. Take adaptive randomizations—the seminal Pocock and Simon paper was written in 1975 (Biometrics 31:103-115), and there are scores of excellent articles in this area examining various methods and their impact. Group sequential methods have also been studied extensively. If you want a couple of great books in this area, I recommend Jennison and Turnbull’s Group Sequential Methods with Applications to Clinical Trials and Whitehead’s The Design and Analysis of Sequential Clinical Trials. Both are approachable and will get you well down the road to understanding this fascinating area.

My point is that we shouldn’t consider every adaptive design as something new—for good or bad. It’s reassuring that we can use a method like group sequential analyses and know what the properties are. Maybe it takes the excitement of “going where no one has gone before” out of using these designs, but there’s less danger, too.


Adaptive Design Series: My Friend Frane

Posted by Brook White on Thu, Nov 08, 2012 @ 09:50 AM
Share:

Note: This article is one of a series about adaptive design that comes from a blog written by Dr. Karen Kesler from 2010 to 2011. That blog is no longer active, but it contained some great information, so we wanted to re-post it here.

“Many randomized studies in small patient populations and studies in early research (such as Phase I and Phase II trials) have small to moderate numbers of patients. In such studies the use of simple randomization or blocking on only one or two factors can easily result in imbalance between treatment groups with respect to one or more potentially prognostic variables. Baseline adaptive randomization methods (such as biased coin methods) can be used to virtually guarantee balance between treatment groups with respect to several covariates.”

This quote comes from the abstract of a paper by James Frane, and I couldn’t resist quoting it because he explains the issue so succinctly. The method he describes is an incredibly intuitive biased coin adaptive randomization that can accommodate multiple balancing factors—including continuous measures. After you randomize a few subjects, you start looking at the subject characteristics you’ve identified as important to balance.

Let’s take a simple example with age as the balancing factor and two treatment groups to randomize to. You calculate two p-values based on a t-test comparing the distribution of age between your two treatment groups. The first p-value is calculated as if the new subject were placed in the first treatment group, and the second p-value as if the subject were placed in the second treatment group. Now, you set your “biased coin” probability of randomization to treatment 1 to p1/(p1+p2). In our example, let’s assume that putting our subject in treatment group 1 results in a p-value comparing the age distributions of 0.15, and that putting the subject in treatment group 2 gives us a p-value of 0.52. We then randomize to treatment 1 with probability 0.15/(0.15+0.52)=0.22 and to treatment 2 with probability 0.78.
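For readers who like to see the mechanics, here is a minimal Python sketch of this single-covariate, two-arm case. The ages and the helper function name are made up for illustration; Frane's paper covers the general multi-factor case.

```python
# Sketch of Frane-style biased coin randomization for one continuous covariate (age)
# and two arms: the randomization probability favors the placement that leaves the
# arms better balanced (the larger t-test p-value).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def frane_prob_arm1(ages_arm1, ages_arm2, new_age):
    """Probability of assigning the new subject to arm 1."""
    _, p1 = stats.ttest_ind(ages_arm1 + [new_age], ages_arm2)  # subject placed in arm 1
    _, p2 = stats.ttest_ind(ages_arm1, ages_arm2 + [new_age])  # subject placed in arm 2
    return p1 / (p1 + p2)

ages_arm1 = [34, 51, 47, 62]   # already-randomized subjects, arm 1 (hypothetical)
ages_arm2 = [29, 33, 38, 41]   # already-randomized subjects, arm 2 (hypothetical)
new_age = 58

p_arm1 = frane_prob_arm1(ages_arm1, ages_arm2, new_age)
assignment = 1 if rng.random() < p_arm1 else 2
print(f"P(arm 1) = {p_arm1:.2f}; subject assigned to arm {assignment}")
```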

I must confess to having a moment of backwards thinking because I’m so conditioned to thinking that a low p-value indicates “good” that I couldn’t understand why the higher probability went to the higher p-value. But in this case, we want balance, not difference, so bigger p-values are better.

Frane goes on to explain how this method can be applied for multiple balancing factors. Then he runs the method through a randomization test to show that this doesn’t affect your analysis results at the end of the study.

I love this article—it’s clear, concise and covers all the bases. I definitely recommend checking it out.

Frane, J. W. (1998). “A Method of Biased Coin Randomization, Its Implementation, and Its Validation.” Drug Information Journal, 32, 423-432.


Adaptive Design Series: Pruning Designs 101

Posted by Brook White on Tue, Sep 18, 2012 @ 11:26 AM

Note: This article is one of a series about adaptive design that comes from a blog written by Dr. Karen Kesler from 2010 to 2011. That blog is no longer active, but it contained some great information, so we wanted to re-post it here.

Pruning designs are a variation of group sequential studies but without the strict Type I error control. They are ideal for Phase II dose finding when you have a large number of doses or regimens (>3) to investigate. The general idea is that you start with a range of active dose/regimen arms and one placebo (or control) arm, run several interim analyses and “prune” the less effective doses at each interim, leaving the most promising one or two active arms (and placebo) for the final analysis. Pruning is accomplished by not randomizing any more subjects to the pruned arm.

Why would we want to use this crazy design? Compared to a traditional Phase II study with no interims and a smaller number of doses, say 2 or 3, we get information on a wider range of doses and we don’t expose as many subjects to less effective doses. It also provides a hedge against choosing the wrong place on the dose response curve by covering a larger area.

The key to these designs is the futility boundaries. The efficacy bounds are important too, but since we’re in early stages, we’re probably not including enough subjects to be powered to see typical efficacy differences. We also want to look at enough subjects to get information on big safety issues, so stopping early for efficacy is not really the main goal. Futility, however, is the main goal. In order for this design to be truly effective, you need to prune dose arms early and often.

To build the boundaries, I usually start with Whitehead triangular boundaries for the comparison of two groups. Like these:

[Figure: Whitehead triangular boundaries for a two-group comparison, with the efficacy boundary U above and the futility boundary L below]

The x-axis is information, the y-axis is the test statistic, the upper boundary (U) is for efficacy and the lower boundary (L) is for futility. At each interim, you calculate the test statistics comparing each active arm to the placebo arm, and plot the information vs. the test statistic. If your point is below the futility boundary, it’s futile and you prune it. If your point is above the efficacy boundary, you stop for efficacy, and if it’s between the two boundaries, you keep going. That’s the basic premise, but of course, the devil is in the details. For example, these Whitehead boundaries are for continual reassessment (i.e. after each subject achieves the outcome of interest) and we’re certainly not interested in doing that. Also, the amount of information at a specific interim could be different for the arms, which leads to a scatter across the x-axis. But the biggest problem is that the futility bound is usually not in the right place to prune arms as often as we want it to.
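To make the interim decision rule concrete, here is a small Python sketch that classifies each active arm at an interim as "prune," "continue," or "stop for efficacy" using straight-line boundaries on the information-versus-test-statistic plane. The intercepts, slopes, information level, and test statistics are all hypothetical stand-ins for simulation-tuned Whitehead-style bounds.

```python
# Sketch: interim decision for one active arm against placebo, using
# hypothetical straight-line efficacy (U) and futility (L) boundaries.
def interim_decision(info, z, eff_intercept=2.0, eff_slope=0.15,
                     fut_intercept=-2.0, fut_slope=0.45):
    """Classify one active arm at one interim analysis."""
    upper = eff_intercept + eff_slope * info   # efficacy boundary U
    lower = fut_intercept + fut_slope * info   # futility boundary L (steeper, so they converge)
    if z >= upper:
        return "stop for efficacy"
    if z <= lower:
        return "prune (futile)"
    return "continue"

# Example interim: three active arms at the same (hypothetical) information level.
for arm, z in [("10 mg", -0.4), ("20 mg", 1.1), ("40 mg", 3.6)]:
    print(arm, "->", interim_decision(info=4.0, z=z))
```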

In order to balance the risks and benefits of pruning while allowing the best doses to continue, we need to adjust the Whitehead boundaries. I have to admit, it surprised me that the Whitehead bounds provide such a great starting point. I figured we’d have to adjust the heck out of them, but usually a little bit of tweaking does the trick. (We’ve done some investigation into the optimal shape, but I’ll save that for another post.) For now, we’ll just adjust the intercepts of the two bounds and basically move them up or down. The real question is: what are we optimizing these boundaries for? You may have noticed that we’ve moved pretty far afield from the Neyman-Pearson hypothesis testing framework we usually live and play in.

There are a few options here, but my long-time mentor, Ron Helms, came up with some intuitive and practical metrics, and we’ve found them to work pretty darn well. The concept is to think about what you want to happen with your study under different scenarios and use those as your metrics. Specifically, if none of the doses are efficacious (the null case), you want to prune a lot of arms very early and by the end conclude that none of the doses are efficacious, so your metric is how often you do not choose a dose in this case. On the other hand, if some doses work, you want to pick the one that works best, so your metric in this case is the percentage of time you pick the “best” dose (according to your assumed efficacy levels) as the “winner” (it either crosses the efficacy boundary or has the best efficacy at the end). The last metric we use tries to maximize the pruning—without it, we wouldn’t prune arms and we’d lose the efficiency that we want. This one is trickier to define, but we usually measure how often we have 2 active arms after the last interim analysis. We can also use total sample size or how often ineffective doses are pruned at each interim. Then you run simulated studies and adjust the boundaries up or down (they can move independently of each other) until you hit the right balance of pruning vs. picking the right dose. Another great feature of these metrics is that non-statisticians understand them, so it’s easy to get the rest of your research team to help pick the boundaries. (And if you’ve ever tried to explain the statistical version of power to non-statisticians, you’ll appreciate this aspect.)
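Here is a deliberately crude Python sketch of that simulation loop: a toy pruning trial with flat boundaries and normal endpoints is simulated many times under a null scenario and an effective scenario, and the three metrics above are tabulated. Every number (boundaries, cohort sizes, effect sizes) is an assumption chosen only to show the bookkeeping, not a recommendation.

```python
# Sketch: simulate a toy pruning trial and summarize it with the metrics from the post.
import numpy as np

rng = np.random.default_rng(3)

def simulate_trial(true_effects, n_per_interim=20, n_interims=3,
                   eff_bound=2.8, fut_bound=0.0):
    """Return (winning arm index or None, number of active arms after the last interim)."""
    active = list(range(len(true_effects)))
    placebo = np.empty(0)
    data = [np.empty(0) for _ in true_effects]
    for _ in range(n_interims):
        placebo = np.concatenate([placebo, rng.normal(0.0, 1.0, n_per_interim)])
        for i in list(active):
            data[i] = np.concatenate([data[i], rng.normal(true_effects[i], 1.0, n_per_interim)])
            z = (data[i].mean() - placebo.mean()) / np.sqrt(1.0 / len(data[i]) + 1.0 / len(placebo))
            if z >= eff_bound:
                return i, len(active)        # stop the study early for efficacy
            if z <= fut_bound:
                active.remove(i)             # prune this arm
    if not active:
        return None, 0
    best = max(active, key=lambda i: data[i].mean())   # final analysis: best surviving arm
    return best, len(active)

def metrics(true_effects, best_arm=-1, n_sim=500):
    # best_arm=-1 means there is no true best dose (the null case).
    results = [simulate_trial(true_effects) for _ in range(n_sim)]
    winners = [w for w, _ in results]
    alive = [a for _, a in results]
    return {
        "no dose chosen": np.mean([w is None for w in winners]),
        "best dose chosen": np.mean([w == best_arm for w in winners]),
        "<=2 active arms after last interim": np.mean([a <= 2 for a in alive]),
    }

print("null case:     ", metrics([0.0, 0.0, 0.0, 0.0]))
print("effective case:", metrics([0.1, 0.3, 0.5, 0.6], best_arm=3))
```

In a real design exercise you would wrap this in a loop over candidate boundary intercepts and pick the pair that best balances the metrics.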

I’d love to hear other people’s ideas for metrics—I’m not convinced we’ve hit on the perfect combination yet. (And it’s okay to take me to task over the lack of statistical rigor—it’s always good to discuss that.)


Adaptive Design Series: Futility - A Big Reason We Are Here

Posted by Brook White on Fri, Aug 31, 2012 @ 12:03 PM

Note: This article is one of a series about adaptive design that comes from a blog written by Dr. Karen Kesler from 2010 to 2011. That blog is no longer active, but it contained some great information, so we wanted to re-post it here.

When we started doing pruning studies, I was frustrated by the futility boundaries found in group sequential designs. They’re generally the opposite of the efficacy bounds, which means you have to prove your compound is significantly worse than your control before you cross the boundary, for heaven’s sake! I know nobody wants to talk about the fact that their drug or device or biologic doesn’t work, but as statisticians, it’s our duty to remind them that reality doesn’t play nice. Especially now, when money is tight, we need to put our resources toward therapies that work—which means killing doses or therapies that don’t work as quickly as possible.

The implication I see is that spending a bit more effort than is fashionable right now in Phase II makes sense. It’s easier to justify looking at a wider range of doses in Phase II if you can eliminate them early in a study using futility boundaries. (See “Why Pruning Designs are my favorite adaptive designs” for more detail on that.) Putting a little more power into a “go/no-go” study could save you a lot of headaches by preventing Phase III trials that have no hope for success.

This all raises the question of what we should use for futility, whether in pruning doses or in killing development programs. I believe the answer lies in simulations. Every company and clinical area has a tangled mess of expectations and risk-benefit limits. Some companies are willing to go to Phase III with less information than others. Some companies have more compounds in the pipeline and are therefore willing to kill one to put the resources into a more promising one. Making a one-size-fits-all framework for futility would be impossible. By simulating what will happen in a few key scenarios (e.g., “compound doesn’t work at all,” “compound works as expected,” etc.), you can quantify the risks and benefits for the decision makers in an approachable manner. (Certainly more approachable than power/sample size calculations!) I know it’s a hard argument to make. “Gosh, why don’t we consider options that will kill this therapy early?” just doesn’t go over well in the board room, but if you consider the bigger picture, futility can save you big.
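As a tiny example of what such scenario simulations can look like, here is a Python sketch that estimates how often a simple go/no-go rule says "go" when the compound doesn't work at all versus when it works as expected. The sample size, effect size, and p-value threshold are hypothetical.

```python
# Sketch: probability of a "go" decision under two assumed scenarios.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

def prob_go(true_effect, n_per_arm=40, p_threshold=0.10, n_sim=2000):
    gos = 0
    for _ in range(n_sim):
        drug = rng.normal(true_effect, 1.0, n_per_arm)
        placebo = rng.normal(0.0, 1.0, n_per_arm)
        _, p = stats.ttest_ind(drug, placebo)
        gos += (p < p_threshold) and (drug.mean() > placebo.mean())
    return gos / n_sim

print("compound doesn't work at all:  P(go) =", prob_go(0.0))
print("compound works as expected:    P(go) =", prob_go(0.5))
```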


Adaptive Design Series: Why Pruning Designs Are My Favorite

Posted by Brook White on Thu, Aug 23, 2012 @ 09:39 AM

Note: This article is one of a series about adaptive design that comes from a blog written by Dr. Karen Kesler from 2010 to 2011. That blog is no longer active, but it contained some great information, so we wanted to re-post it here.

I think that pruning designs get a bad rap. Sure, you can’t control Type I error if you’re looking at the data frequently and getting rid of treatment groups in the middle of the study, but there’s a place for this type of design in finding your best dose. Before I get too far ahead of myself, recall that in a pruning study, you start with a bunch of doses or regimens of your active compound and one arm of “placebo”. The placebo could be an active drug, but the goal here is to show superiority, not non-inferiority. You then set up boundaries—both efficacy and futility—for use in multiple interim analyses, with the goal of eliminating (“pruning”) the less effective or less safe doses early in the study. At each of the interim analyses, you calculate the test statistic for comparing each active arm to the placebo arm and see how it compares to your boundaries. Test statistic below the futility boundary? Prune that arm! Test statistic still in the middle of your boundaries? Keep randomizing subjects to that arm. Test statistic above the efficacy boundary? Think about stopping your study. Theoretically, at the end of the study, you’re down to a placebo arm and 1 or 2 active arms—the ones you now want to take to your confirmatory Phase III trial. If you start with 4-6 active dose/regimens and at least 2 interim analyses, you can see that the number of hypothesis tests really starts stacking up. Suppose you have 5 active arms and 3 interim analyses—that’s potentially 15 hypothesis tests before you even get to your final analysis! Of course, if you really have 15 tests, the design isn’t working right. You should be eliminating 1-2 arms in each interim analysis.

If it is working correctly, you would have a little bit of information on the less effective doses and as much information on your best dose as you would have from a traditional Phase II study. You’re also a bit more protected from guessing wrong on your dose-response curve. If it’s a little off, one of the doses you expected to prune could be the big winner. If it’s way off, you’re going to see that earlier than if you had waited to the end of the study to look at your data. Plus, one of the things I keep hearing from the FDA is that we don’t spend enough time investigating doses. This study design can help make looking at a wider range of doses or regimens more palatable to your teammates monitoring the program budget.

It’s not a panacea, however. It is definitely not “adequate and well controlled,” so you can’t use it as a pivotal study. I know that a lot of people planning development programs would like to be able to use their Phase II studies as both dose-finding and confirmatory, but as they say, you can’t have your cake and eat it, too. I’ve just gotten to the point where I expect the study to go in a completely unexpected direction, since that seems to happen more often than not, in which case I feel reassured by this design: I may not have the best assumptions going into the study, but I have more room for screwing up. That’s why it’s my favorite.


Adaptive Design Series: Why is doing dose escalation studies so hard?

Posted by Brook White on Wed, Jul 11, 2012 @ 10:42 AM

Note: This article is one of a series about adaptive design that comes from a blog written by Dr. Karen Kesler from 2010 to 2011. That blog is no longer active, but it contained some great information, so we wanted to re-post it here.

On the surface, dose escalation studies are some of the most intuitive studies around—heck, everybody thinks they can run a traditional 3+3 design: “Just dose three people and if nobody has a toxicity, increase the dose for the next three—if two of the three have a toxicity, stop the study and declare a winner.” Yes, I’m exaggerating, but my point is that even for an adaptive design everybody understands, I’ve been really struggling with dose escalation designs in recent years, and I don’t think I’m being dense about it. The problem I’m finding is in the basic principle of defining a “toxicity”. If you work in oncology, you can stop reading now; you have this down to a science—chemotherapeutic agents, by their very mechanism of action, cause specific types of “bad things” to happen (e.g., neutropenia, infections, etc.), so toxicities are predictable and easy to define. Moving out of that well-defined realm, however, toxicities can be hard to define. Since the toxicity needs to relate directly to the therapy under consideration, if the mechanism of action or the consequences are not known (not an unusual situation in drug development, btw), how do you define it? We can fall back on outcomes that are typical for the clinical area, but how do we know that the drug is causing the toxicity and that it wouldn’t happen anyway? With only three or four subjects in each dosing cohort, one or two events can have a huge impact on the conduct of the study and therefore on the determination of the “best” dose.
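For reference, here is a Python sketch of one common formulation of the 3+3 rules the post caricatures (escalate on 0/3 dose-limiting toxicities, expand to six on 1/3, stop on two or more). The dose levels and the true toxicity rates used to simulate DLTs are hypothetical, and the whole exercise presupposes exactly what the post questions: a workable definition of "toxicity."

```python
# Sketch: one common formulation of the 3+3 dose escalation rules.
import numpy as np

rng = np.random.default_rng(5)
doses = [1, 2, 4, 8, 16]                        # hypothetical dose levels
true_dlt_rate = [0.02, 0.05, 0.15, 0.35, 0.60]  # assumed, unknown in practice

mtd = None
for dose, rate in zip(doses, true_dlt_rate):
    dlts = rng.binomial(3, rate)        # first cohort of 3
    if dlts == 1:
        dlts += rng.binomial(3, rate)   # expand to 6 at the same dose
        if dlts <= 1:
            mtd = dose                  # 1/6 DLTs: dose tolerated, escalate
            continue
    elif dlts == 0:
        mtd = dose                      # 0/3 DLTs: dose tolerated, escalate
        continue
    break                               # 2+ DLTs: stop; MTD is the previous dose

print(f"estimated MTD: {mtd}")
```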

Perhaps an illustration is in order. Take my favorite clinical area, sickle cell disease. Say we’re trying to treat patients who are having an acute vaso-occlusive crisis (VOC). (Lots of blood cells sickling and sticking together, causing a huge amount of pain and doing all sorts of organ damage.) We have a good idea of whether our new therapy works because people get out of the emergency department or hospital faster. But how do we define a toxicity? Researchers have made huge strides in understanding the mechanism of a VOC over the years, but I can assure you that we only see the tip of the iceberg. On top of that, there are no other compounds that actually treat this situation (patients only get palliative pain therapy), so we don’t have any experience with seeing how other compounds work. Bottom line: we’re flying blind here in terms of mechanism. To define our toxicity, we could choose some typical adverse events that occur in these patients, like acute chest syndrome. But if the compound doesn’t affect those events, we’re building an entire study on the poor foundation of an irrelevant endpoint. We could go general and choose any bad event of a sufficient magnitude—the “any AE of Grade III or IV” option. But that puts our study at the mercy of random (or not so random, given how sick these patients are) bad adverse events.

My challenge to you—tell us how you’ve dealt with this situation. You don’t have to give any trade secrets away, just describe the clinical area, what the expected effects of the compound were (if any) and how you chose a definition of “toxicity”. Maybe we’ll all learn something.

View Adaptive Design 101 Video

Adaptive Design Series: A Lesson in the Interpretation of Results

Posted by Brook White on Tue, Jun 19, 2012 @ 09:16 AM

Note: This article is one of a series about adaptive design that comes from a blog written by Dr. Karen Kesler from 2010 to 2011. That blog is no longer active, but it contained some great information, so we wanted to re-post it here.

I had this fascinating discussion with one of my colleagues this week about the Whitehead boundaries and the interpretation of crossing the futility boundary. I had always treated the area below the futility boundary as the null hypothesis acceptance region and glossed over the fact that that region is actually two regions. The lowest portion is where you conclude that your treatment of interest is actually significantly worse than your control (the solid portion of the line), and the upper region is where you conclude that there’s no significant difference (the dashed portion of the line).

[Figure: Whitehead efficacy and futility boundaries, from The Design and Analysis of Sequential Clinical Trials, with the region below the futility boundary split into a solid portion (treatment significantly worse) and a dashed portion (no significant difference)]

In the case he was examining, the investigators had crossed the boundary at the dashed line portion and rightfully stopped the study due to futility. (Note that the picture is from Whitehead’s book The Design and Analysis of Sequential Clinical Trials and does not reflect the results of the study in question.) The investigators concluded that the trial demonstrated that the experimental method provided no benefit over the standard method. Other people claimed that their conclusion was flawed because they stopped early and therefore had little power to demonstrate a lack of difference between the methods. I feel like both sides have a valid argument, but that they’re comparing apples to oranges.

There’s a big difference between stopping a study because you can’t show efficacy and showing conclusively that two treatments are equivalent. I can understand the frustration of the research community at not having a conclusive answer about the new method, but if the new method is more risky, expensive or labor intensive, there may not be interest in using it if you can’t show it’s significantly better. It would have been irresponsible for the study investigators to continue the study once they realized that they couldn’t possibly achieve their goal of showing the new method to be better.

Besides learning more about a method I thought I had down pat, this also reminded me of the subtleties of working in clinical trials and how we need to be very careful of our interpretation of results.



Ethics and Adaptive Design

Posted by Brook White on Fri, Jun 01, 2012 @ 09:07 AM

Note: This article is one of a series about adaptive design that comes from a blog written by Dr. Karen Kesler from 2010 to 2011. That blog is no longer active, but it contained some great information, so we wanted to re-post it here.

It seems like our culture periodically goes through a phase where it’s cool to discuss things only in terms of money and not touch the much more slippery issue of what’s good for people. Even the pharmaceutical industry, which is built on making people healthier and thereby improving their lives, follows this notion. Luckily, statisticians never rank very high on anybody’s “coolness” scale, so we can discuss the issues we face in designing studies or analyzing data in terms of “what’s the right thing to do” to our heart’s content. And it’s a good thing, because we’re always running into problems like how many people we need in a study to get a solid answer without exposing more subjects than necessary to a potentially harmful therapy.

Adaptive designs can help us with this moral dilemma, which is a really good reason to consider them. The whole concept of an adaptive design in a Phase II clinical trial means that instead of exposing a hundred or so subjects to a new compound and waiting to see what happens, we stop at 25 or 50 subjects, look at what’s going on, and make appropriate changes as needed. If we’ve really got an unsafe or completely ineffective compound, maybe we decide to stop it, and we’ve saved a bunch of people from getting those nasty side effects or ensured they got something that will help them. Specific designs illustrate how much of an impact the adaptive aspect can have; take pruning designs, for example. By eliminating unsafe or ineffective doses throughout the trial, we end up with fewer subjects taking those “bad” or “less effective” doses. And by starting with a wider range of doses than a classical design would use, we increase the probability that we’ll find that golden “safe and effective” dose that will benefit patients. Seamless Phase II/III studies are another kind of adaptive design, and an example of a broader issue: getting to market faster means that the population at large gains access to a new, effective therapy sooner. If we can use the seamless design to eliminate wasted, bureaucratic time, that’s certainly the right thing to do.

We’re lucky, really: in our industry, a lot of the cost of developing a new therapy is driven by per-subject costs, so when we’re being cost effective, we’re also being patient effective. And if a new compound works, getting it approved earlier means that more people can reap the health benefits, and the sponsor has more time on patent to make a profit. Saving money by developing faster and using subjects more effectively not only makes the stockholders happy, it makes the ethicists happy, too.
