“What Non-statisticians Need to Know about Statistics in Clinical Trials” Follow Up

May 14, 2013

We recently hosted the webinar “What Non-statisticians Need to Know about Statistics in Clinical Trials,” featuring Rho Senior Biostatistician Erika Menius. Below are answers to some of the questions we received but did not have time to address during the Q&A session. We thought a number of these answers would interest a wider audience.

Is unblinded the same thing as open label?

Yes, open label and unblinded are considered synonymous.

What parameters should be considered when selecting single blind/double blind/triple blind?

Objectivity of the outcome measure is a primary consideration in determining whether and how to blind your study. If there is any subjectivity in the outcome, the person making the assessment should be blinded, and who that person is (for example, the subject, the investigator, or an independent assessor) determines the level of blinding.

How does FDA view efficacy analysis using Intent-to-Treat (ITT) versus per-protocol populations? Do they look down on one versus the other?

FDA prefers ITT, especially in Phase III, because it is more generalizable and can help limit bias.

What is the importance of power in analyzing results?

Power uses hypothetical results to make study design decisions (e.g., sample size), so it is only truly relevant during study planning. At the end of the study you are analyzing the actual results, and power is not a relevant factor in that analysis. However, in a failed study, you can use it to describe whether you had enough patients to detect a specific treatment difference.

Describe Type I and Type II error in lay terms.

Type I error: The results indicate that a treatment works when it doesn’t.
Type II error: The results indicate that a treatment doesn’t work when it does.

Can you look at the primary efficacy end point without an interim analysis planned in the study protocol while the study is still on-going?

This is not recommended because it can affect your control of Type I error. If the study is not an adequate and well-controlled trial that you plan to use for a marketing application, this is of less concern, but you should still make adjustments as if you had planned an interim analysis. FDA is extremely skeptical of unplanned looks during any Phase III trial.

Can you explain allowable error (α), power, and standard deviation?

Allowable error (α) is the same as the Type I error rate. It is the probability that the results will say a treatment works when it doesn’t.

Power is the probability that a given study will show your treatment works, assuming it really does work. It equals 1 minus the Type II error rate.

Standard deviation is a common measure of the variability of your data.
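To see how α, power, and standard deviation fit together, here is a minimal sketch of a power calculation. It assumes a two-arm trial comparing means with a two-sided z-test (a normal approximation), equal arm sizes, and a common standard deviation. The function name and the example numbers are illustrative only and are not from the webinar.

```python
from math import sqrt
from statistics import NormalDist

def power_two_arm(delta, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two means.

    delta      assumed true difference between arm means (hypothetical input)
    sd         assumed common standard deviation in each arm
    n_per_arm  number of subjects in each arm
    """
    z = NormalDist()
    se = sd * sqrt(2 / n_per_arm)            # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)        # two-sided critical value
    # Probability of crossing the critical value in the direction of the
    # true effect; the tiny chance of crossing in the wrong direction is ignored.
    return 1 - z.cdf(z_crit - abs(delta) / se)

# Illustrative numbers: a 5-point difference, SD of 10, 64 subjects per arm.
print(round(power_two_arm(delta=5, sd=10, n_per_arm=64), 2))
```

With these made-up inputs the power comes out at roughly 80%. Increasing the standard deviation or shrinking the assumed difference lowers it, while adding subjects raises it.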

What is permuted block randomization?

Permuted block randomization helps ensure a balanced number of subjects across treatment arms by guaranteeing that within a small number of subjects, all treatment arms are covered equally.
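As a rough illustration, a permuted block schedule can be generated as follows. This is a sketch only: the arm labels, the block size of 4, and the function name are assumptions, and a production randomization system would add safeguards such as concealment and audit trails.

```python
import random

def permuted_block_schedule(n_subjects, arms=("A", "B"), block_size=4, seed=None):
    """Return a list of treatment assignments built from shuffled blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    schedule = []
    while len(schedule) < n_subjects:
        block = list(arms) * per_arm   # every arm appears equally often in a block
        rng.shuffle(block)             # random order within the block
        schedule.extend(block)
    return schedule[:n_subjects]       # the final block may be cut short

print(permuted_block_schedule(12, seed=2013))
```

After every complete block of 4 subjects, the two arms have exactly the same number of subjects, so the arms can never drift far out of balance.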

What is hot deck imputation?

Hot deck imputation is a form of imputation where each missing value is replaced with an observed value from a “similar subject.” It typically is not used in clinical trials.
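For readers curious about the mechanics, here is a minimal sketch of one simple variant: a missing value is filled with a randomly chosen observed value from subjects who share a matching characteristic. The definition of “similar” (here, same sex), the field names, and the data are all hypothetical.

```python
import random

def hot_deck_impute(records, value_key, match_key, seed=None):
    """Fill missing values (None) with a random donor value from the same group."""
    rng = random.Random(seed)
    # Collect observed values ("donors") for each group of similar subjects.
    donors = {}
    for r in records:
        if r[value_key] is not None:
            donors.setdefault(r[match_key], []).append(r[value_key])
    completed = []
    for r in records:
        filled = dict(r)                       # copy so the input is not modified
        if filled[value_key] is None:
            pool = donors.get(filled[match_key])
            if pool:                           # stays missing if no similar donor exists
                filled[value_key] = rng.choice(pool)
        completed.append(filled)
    return completed

# Made-up data for illustration only.
subjects = [
    {"id": 1, "sex": "F", "score": 10},
    {"id": 2, "sex": "F", "score": None},
    {"id": 3, "sex": "M", "score": 20},
    {"id": 4, "sex": "M", "score": None},
    {"id": 5, "sex": "F", "score": 12},
]
print(hot_deck_impute(subjects, value_key="score", match_key="sex", seed=0))
```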

Can you use an interim analysis to look at accumulation of cases before you proceed to the next step?

When people say “interim analysis” they typically mean an unblinded look at your data. If you aren’t unblinding, you can look at your data at any time. If you are unblinding, you should have a plan in place and adjust your Type I error accordingly.

Can you explain futility analysis? Can these be done unplanned?

Futility analyses are done to quantify the probability that your study will be successful when your interim results don’t look good. Yes, these can be done unplanned, but you will need to reduce your Type I error (α) for the final analysis.

What is CV?

CV is the coefficient of variation. It is the standard deviation divided by the mean.
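A quick sketch of the calculation, using the sample standard deviation and made-up values:

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

# Illustrative data: mean is 10 and sample SD is 2, so the CV is 0.2 (20%).
print(coefficient_of_variation([8, 10, 12]))
```

Because it is a ratio, the CV is unitless, which makes it handy for comparing variability across measures on different scales. It is only meaningful when the mean is not close to zero.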

Can you explain stratified randomization?

Stratified randomization is used when you want to ensure a balanced number of subjects within a specific factor, such as gender. Subjects are randomized within the stratum.
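One common way to do this is to keep a separate permuted block schedule for each stratum. The sketch below assumes that approach. The class name, arm labels, and block size are illustrative and do not describe any particular system.

```python
import random

class StratifiedRandomizer:
    """Assign treatments using a separate shuffled block per stratum."""

    def __init__(self, arms=("A", "B"), block_size=4, seed=None):
        if block_size % len(arms) != 0:
            raise ValueError("block_size must be a multiple of the number of arms")
        self.arms = tuple(arms)
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending = {}   # stratum -> assignments left in its current block

    def assign(self, stratum):
        queue = self.pending.setdefault(stratum, [])
        if not queue:       # start a new shuffled block for this stratum
            block = list(self.arms) * (self.block_size // len(self.arms))
            self.rng.shuffle(block)
            queue.extend(block)
        return queue.pop()

randomizer = StratifiedRandomizer(seed=7)
for sex in ["F", "M", "F", "F", "M", "F", "M", "M"]:   # hypothetical enrollment order
    print(sex, randomizer.assign(sex))
```

Each stratum fills its own blocks, so after four females and four males are enrolled, each sex has two subjects on each arm regardless of the order in which they arrived.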

Do you use Bayesian analysis in clinical trials?

Yes. In clinical trials, Bayesian analyses are most often used in dose escalation studies (e.g., the continual reassessment method).

What is z_(1-α) (from the sample size calculation)?

This is the value on the standard normal curve below which a proportion 1-α of the distribution falls (also known as a critical value).
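If you want to look these values up yourself, Python's standard library can compute them. This is a small sketch, and the helper name is ours:

```python
from statistics import NormalDist

def z_quantile(p):
    """Value below which a standard normal variable falls with probability p."""
    return NormalDist().inv_cdf(p)

alpha = 0.05
print(round(z_quantile(1 - alpha), 3))       # one-sided critical value: 1.645
print(round(z_quantile(1 - alpha / 2), 3))   # two-sided critical value: 1.96
```

Two-sided tests split α between the two tails, which is why the familiar 1.96 corresponds to 1-α/2 rather than 1-α.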

What is the benefit of stratification?

In small studies (typically phase II), stratification reduces the effect of factors that can impact your outcome, making measuring your treatment difference less subject to confounding.

What are the implications of the p-value?

The p-value is the probability of seeing results at least as extreme as those observed in your study if the drug actually doesn’t work. This value will tend to be small for a drug that truly works.

Why does the p-value change when you do an interim analysis?

When you look at your data more than once, you are more likely to see something you want to see, so you must account for that in your final conclusion by adjusting your p-value.
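A small simulation can make this concrete. The sketch below assumes a drug with no effect, one unadjusted look halfway through plus the final look, and a two-sided test at α = 0.05 each time. The setup and names are illustrative only.

```python
import random
from math import sqrt
from statistics import NormalDist

def simulate_false_positive_rates(n_trials=20000, alpha=0.05, seed=1):
    """Return (rate testing once at the end, rate testing at interim or end)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    final_only = either_look = 0
    for _ in range(n_trials):
        z_first = rng.gauss(0, 1)     # z-statistic from the first half of the data
        z_second = rng.gauss(0, 1)    # z-statistic from the second half alone
        z_final = (z_first + z_second) / sqrt(2)   # z-statistic using all the data
        reject_interim = abs(z_first) > z_crit
        reject_final = abs(z_final) > z_crit
        final_only += reject_final
        either_look += reject_interim or reject_final
    return final_only / n_trials, either_look / n_trials

once, twice = simulate_false_positive_rates()
print(f"one look: {once:.3f}, two unadjusted looks: {twice:.3f}")
```

Testing only at the end gives a false positive rate near the intended 5%, while declaring success at either look pushes it noticeably higher. That extra chance of a false positive is what the adjustment is meant to remove.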

Is there an equation for how much an interim analysis would affect your Type I error?

Yes, there are a number of equations you could use, such as an α spending function.
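As an illustration, here are two widely cited spending functions of the Lan-DeMets type, one resembling O'Brien-Fleming boundaries and one resembling Pocock boundaries. This is a sketch rather than a recommendation for any particular trial. The function names are ours, and t is the fraction of the planned information (e.g., subjects or events) available at the look.

```python
from math import e, log, sqrt
from statistics import NormalDist

_Z = NormalDist()

def obf_type_spending(t, alpha=0.05):
    """Cumulative two-sided alpha spent by information fraction t (O'Brien-Fleming type)."""
    if not 0 < t <= 1:
        raise ValueError("t must be in (0, 1]")
    return 2 * (1 - _Z.cdf(_Z.inv_cdf(1 - alpha / 2) / sqrt(t)))

def pocock_type_spending(t, alpha=0.05):
    """Cumulative alpha spent by information fraction t (Pocock type)."""
    if not 0 < t <= 1:
        raise ValueError("t must be in (0, 1]")
    return alpha * log(1 + (e - 1) * t)

for t in (0.25, 0.5, 0.75, 1.0):
    print(t, round(obf_type_spending(t), 4), round(pocock_type_spending(t), 4))
```

Both functions spend the full α by the end of the study (t = 1). The O'Brien-Fleming type spends very little early and saves most of it for the final analysis, while the Pocock type spends it more evenly. Turning the cumulative α spent into actual stopping boundaries takes additional numerical work that specialized software handles.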

How does FDA see ad hoc analysis used in submission?

An ad hoc analysis can help support the primary analysis or help explain safety issues or biologic mechanisms, but it is unlikely FDA would grant approval based on an ad hoc analysis alone.

Can you review how hazard ratios impact size and length of a trial?

Hazard ratios farther from 1 (in either direction) are associated with the need for smaller and shorter studies. Note: This is a general answer to a highly complex topic.
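One way to see the relationship is Schoenfeld's approximation for the number of events a log-rank test needs. The webinar did not present this formula, so treat it as a rough sketch. It assumes 1:1 randomization, a two-sided test, and proportional hazards, and it gives the number of events rather than the number of subjects.

```python
from math import ceil, log
from statistics import NormalDist

def required_events(hazard_ratio, alpha=0.05, power=0.80):
    """Approximate total events needed to detect a hazard ratio (1:1 allocation)."""
    if hazard_ratio <= 0 or hazard_ratio == 1:
        raise ValueError("hazard_ratio must be positive and different from 1")
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Schoenfeld's formula with equal allocation: 4 * (z_a + z_b)^2 / (ln HR)^2
    return ceil(4 * z_sum ** 2 / log(hazard_ratio) ** 2)

for hr in (0.5, 0.7, 0.8, 0.9):   # illustrative hazard ratios
    print(hr, required_events(hr))
```

A hazard ratio of 0.8 needs roughly ten times as many events as a hazard ratio of 0.5. Because events take time to accrue, needing more events generally means enrolling more subjects, following them longer, or both.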

Is a different sample size calculation equation needed for small clinical trials, large clinical trials, or both?

A sample size calculation is always relevant, no matter the size of the trial. It quantifies how large a treatment effect you can expect to detect with your study size.
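To illustrate, the sketch below turns the calculation around: given a study size, it estimates the smallest difference in means you could expect to detect. It assumes two equal arms, a two-sided z-test (normal approximation), and a common standard deviation. The function name and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def detectable_difference(n_per_arm, sd, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_sum * sd * sqrt(2 / n_per_arm)

for n in (16, 64, 256):   # illustrative arm sizes, assuming an SD of 10
    print(n, round(detectable_difference(n, sd=10), 1))
```

Quadrupling the number of subjects per arm only halves the detectable difference, which is why small trials can reliably detect only large effects.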

What is incomplete block study design?

This is a study design where you have blocks of subjects and multiple treatments, but not all treatments are appropriate for all blocks. For example, you might have three blocks—pediatric, adult, and geriatric—and two treatments (A and B) where the first treatment (A) isn’t appropriate for pediatric patients and treatment B isn’t appropriate for geriatric patients. In that case, you would have a block of pediatric patients on treatment B, a block of adult patients on treatment A, a block of adult patients on treatment B, and a block of geriatric patients on treatment A.

How is the calculation of n affected when comparing a diagnostic to a gold standard?

If your gold standard is not variable (or only slightly variable), it can reduce your sample size. It can also affect your study design because you may be doing an equivalence study instead of a superiority study. Note: This is a brief answer to a complex question.