Thank you to everyone who attended our recent webinar on clinical research statistics for non-statisticians. During the webinar, we weren't able to get to all of the questions. Below, Senior Biostatistician Jennifer Marcello has answered the remainder of the questions.
If you didn't have an opportunity to attend the webinar, it is now available on demand.
If a survival data set is analyzed using different statistical approaches, how different will the curves be? For example, Kaplan-Meier versus Poisson models?
It depends on the data. The Kaplan-Meier method is non-parametric, which means it does not assume an underlying distribution for the survival data. Poisson models are sometimes used as a parametric (i.e., assuming an underlying distribution) alternative to a Cox proportional hazards model, which is semi-parametric. Under certain conditions, these two models will give similar estimates.
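To make the parametric versus non-parametric distinction concrete, here is a minimal Python sketch that computes a Kaplan-Meier curve alongside the survival curve implied by a constant-hazard (exponential) model. An intercept-only Poisson model with a log(person-time) offset estimates exactly this constant event rate (total events divided by total follow-up time). The data and function names are hypothetical, and a real analysis would use a validated statistical package.

```python
import math

def kaplan_meier(times, events):
    """Return (time, survival) pairs at each distinct event time.

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) that share this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= removed  # everyone observed at time t leaves the risk set
    return curve

def exponential_survival(times, events, t):
    """Survival at time t under a constant hazard (events / person-time)."""
    rate = sum(events) / sum(times)
    return math.exp(-rate * t)

# Hypothetical data: follow-up times and event indicators (0 = censored).
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
for t, s_km in kaplan_meier(times, events):
    print(t, round(s_km, 3), round(exponential_survival(times, events, t), 3))
```

With these made-up data the two curves agree closely early on (0.60 versus 0.59 at time 5) but diverge later (0.20 versus 0.39 at time 9), because the exponential model forces the hazard to be constant while Kaplan-Meier follows the observed data.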
What exactly is a sensitivity analysis? Does it have to be pre-specified?
A sensitivity analysis in a clinical trial assesses how robust the results are: it estimates how much the results differ between trial populations or sub-populations, or under different assumptions. It does not need to be pre-specified and can be used to evaluate the impact that unexpected trial conditions have on the primary outcome. However, it can be pre-specified if, during trial planning, you suspect it may be useful. You can also add it to a statistical analysis plan conditional on certain trial outcomes. For example, the plan might state that if more than 10% of subjects drop out early, a sensitivity analysis will be performed to compare the results of the primary analysis in the ITT population to the results in the population of subjects who completed the trial. If 10% or fewer subjects drop out early, the sensitivity analysis would not be performed, and a note stating this would be added to the CSR.
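A conditional rule like the 10% dropout example can be written down unambiguously. The sketch below is a hypothetical illustration (the function names and the `completed` field are assumptions, not part of any standard); it uses exact fractions so that a dropout rate of exactly 10% is reliably treated as "10% or fewer".

```python
from fractions import Fraction

def sensitivity_triggered(n_randomized, n_early_dropouts, threshold=Fraction(1, 10)):
    """True if the early-dropout fraction is strictly greater than the threshold."""
    return Fraction(n_early_dropouts, n_randomized) > threshold

def completer_population(subjects):
    """Subjects who completed the trial (hypothetical 'completed' flag)."""
    return [s for s in subjects if s["completed"]]

print(sensitivity_triggered(200, 24))  # True  (12% dropped out)
print(sensitivity_triggered(200, 20))  # False (exactly 10%)

subjects = [{"id": "001", "completed": True}, {"id": "002", "completed": False}]
print([s["id"] for s in completer_population(subjects)])  # ['001']
```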
Is unblinded the same thing as open label?
No. Open label refers to a study where blinding of treatment assignments never occurs. Unblinded refers to a state in a blinded study after the treatment assignments have been revealed for one or more subjects.
What parameters should be considered when choosing between single-blind, double-blind, and triple-blind designs?
First, consider what is feasible given the treatment. If you are investigating an injection and the comparator is a sham injection, it may not be possible to run a double-blind study because the physician giving the injection would know which treatment was given. Second, consider the indication and whether a particular form of blinding is standard in trials for this indication. Third, consider any recommendations made by the FDA.
Is a study considered triple-masked if a reading center receives and evaluates all the data used in the study's analyses?
If the reading center is masked, it would be considered part of the investigative team, so the study would fall under double masking.
When discussing powering, you mentioned using the mean. When is it appropriate to use the median as a measure of central tendency?
Power is of particular concern when you are estimating the sample size for a hypothesis test, so you choose the mean or the median based on the hypothesis you want to test and the best statistical method for testing it. A mean difference is generally more common because that is what popular statistical methods (like t-tests) use. For descriptive statistics, we usually present both the mean and the median for continuous data. If there are a few influential outliers in the data, the median will capture the 'center' of the data better than the mean.
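A quick illustration of the outlier point, using Python's standard `statistics` module on a handful of made-up VAS scores in which one subject reports a much higher value than the rest:

```python
from statistics import mean, median

# Hypothetical VAS pain scores (mm); the last subject is an outlier.
scores = [2, 3, 3, 4, 5, 40]
print(mean(scores))    # 9.5
print(median(scores))  # 3.5
```

The single outlier pulls the mean well above every other observation, while the median stays near where most of the data lie.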
Since the VAS is a 'scale', isn't it ordinal data rather than continuous data?
The VAS is considered continuous data because it can take any value between 0 and 100 mm. Ordinal data consist of discrete values on an arbitrary numerical scale, where the exact numerical quantities are not meaningful; only the ranking of the values is meaningful. For the VAS, we treat the measurement on the line as an approximation of pain level, so the exact values are meaningful.
For the statistical analysis methods you listed, how do they differ in terms of why you choose one for a trial versus another?
I assume you are asking how the statistical methods I mentioned as examples (t-test, ANOVA, chi-square tests) differ and how to choose the appropriate one for your trial. The short answer is that it depends on the type of data you are collecting (e.g., continuous, categorical, time to event), how many groups you are comparing, how many time points you have (a single time point or many), and whether you need to adjust for any covariates or stratification factors. The study statistician can help you decide which test will best answer your research questions.
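The factors listed above can be read as a decision table. The following is a deliberately oversimplified, hypothetical rule of thumb that covers only the tests named in this answer; anything involving repeated time points, covariate adjustment, or time-to-event data is routed back to the study statistician rather than guessed at.

```python
def suggest_test(outcome_type, n_groups, n_timepoints=1, adjust_for_covariates=False):
    """Hypothetical rule of thumb covering only t-test, ANOVA, and chi-square."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_timepoints > 1 or adjust_for_covariates:
        # Repeated measures or covariates usually call for a model-based method.
        return "consult the study statistician"
    if outcome_type == "continuous":
        return "t-test" if n_groups == 2 else "ANOVA"
    if outcome_type == "categorical":
        return "chi-square test"
    # Time-to-event and other data types need other methods.
    return "consult the study statistician"

print(suggest_test("continuous", 2))   # t-test
print(suggest_test("continuous", 3))   # ANOVA
print(suggest_test("categorical", 2))  # chi-square test
```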
How is the power of a trial determined, and why is 80% power chosen over 90% power, or vice versa?
Power is set by the clinical team at the start of the trial. Phase III randomized controlled trials with less than 80% power are generally considered underpowered to detect a difference between treatment groups. The choice between 80% and 90% power may depend on your ability to recruit subjects to your trial. Higher power requires more subjects, and you may not have the resources to run the trial in more subjects; likewise, if you are working in a disease population where subjects are difficult to recruit, you may need to plan for fewer subjects. You may also consider how many trials you are planning in your program and how much risk you are willing to take that any one of them fails due to lack of power.
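To see how the choice of power drives the number of subjects, here is a minimal sketch of the standard normal-approximation sample size formula for a two-sided, two-sample comparison of means, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ², where power = 1 − β. The difference (δ = 5) and standard deviation (σ = 10) are hypothetical, and a t-distribution-based calculation would give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group, two-sided two-sample test of means."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the type I error rate
    z_beta = z.inv_cdf(power)           # power = 1 - beta (type II error rate)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)                 # round up to whole subjects

print(n_per_group(5, 10, power=0.80))  # 63
print(n_per_group(5, 10, power=0.90))  # 85
```

Moving from 80% to 90% power increases the sample size by roughly a third in this example, which is the resource trade-off described above.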
Is there any way to assess the power of a trial once it has been published?
Yes, you can use the study results to calculate power based on the actual clinically significant difference, sample size, and standard deviation(s) observed in the trial.
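A minimal sketch of that calculation, assuming a two-sided two-sample comparison of means and using the normal approximation. The inputs below (difference of 5, standard deviation of 10, 63 subjects per group) are hypothetical placeholders for the values you would take from a publication.

```python
import math
from statistics import NormalDist

def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample test of means (normal approximation)."""
    z = NormalDist()
    se = sigma * math.sqrt(2 / n_per_group)  # standard error of the difference in means
    # The probability of rejecting in the opposite tail is negligible and ignored here.
    return z.cdf(abs(delta) / se - z.inv_cdf(1 - alpha / 2))

print(round(power_two_sample(5, 10, 63), 2))  # 0.8
```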
How is cost typically determined from sample size?
Many of the costs associated with clinical trials depend on how many subjects are enrolled (e.g., how many sites you need to pay to run the trial, or how many blood samples you need to send to a central laboratory for analysis), so the more subjects you have, the more you can expect your trial to cost.
When is it appropriate to use mITT?
You can use a modified ITT (mITT) population either in addition to or instead of an ITT population. A modified ITT may be more appropriate if it is impractical to analyze data from subjects who lack certain data (e.g., a randomized subject who never received the drug and never completed any on-treatment assessments will not have data for a change-from-baseline treatment response). ITT is the ideal population for clinical trials, but sometimes, for practical purposes, a modified ITT population may be used.
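The difference between the two populations comes down to the inclusion rule. Below is a hypothetical example in which the mITT is defined as randomized subjects who received at least one dose and have at least one on-treatment assessment; actual mITT definitions vary by trial and must be stated in the statistical analysis plan, and the field names are illustrative only.

```python
# Hypothetical subject records; field names are illustrative only.
subjects = [
    {"id": "001", "randomized": True, "dosed": True,  "post_baseline": True},
    {"id": "002", "randomized": True, "dosed": False, "post_baseline": False},
    {"id": "003", "randomized": True, "dosed": True,  "post_baseline": False},
    {"id": "004", "randomized": True, "dosed": True,  "post_baseline": True},
]

# ITT: everyone who was randomized.
itt = [s["id"] for s in subjects if s["randomized"]]

# Example mITT: randomized, dosed at least once, and has at least one
# on-treatment assessment (so a change from baseline can be computed).
mitt = [s["id"] for s in subjects
        if s["randomized"] and s["dosed"] and s["post_baseline"]]

print(itt)   # ['001', '002', '003', '004']
print(mitt)  # ['001', '004']
```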
What is the best approach to dealing with data that are missing at random?
If the data are truly missing at random, you can still get an unbiased estimate of treatment differences using appropriate statistical methods, such as a mixed model. If you have a large amount of missing data, you may also want to use an imputation method such as multiple imputation.
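Multiple imputation analyzes several imputed copies of the data set and then combines the results. The sketch below shows only that final pooling step, using Rubin's rules: the pooled estimate is the average of the per-imputation estimates, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance. The imputation step itself is omitted, and the numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation (sample) variance
    total = within + (1 + 1 / m) * between
    return q_bar, total

# Hypothetical treatment-difference estimates from m = 3 imputed data sets.
est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(est, round(var, 4))  # 2.0 1.8333
```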
Do we usually use a type II error calculation when carrying out clinical trials?
Type II error (beta) is your false negative rate: it is the probability of concluding that your treatment did not work when in fact it did. Power is equal to 1 - beta, so the type II error rate is a value you pre-specify to set the power assumed in your sample size calculations.
What are the key factors in determining whether positive results from a prospectively defined subgroup analysis are valid (and may represent a successful trial outcome), especially if it differs from the results based on the overall patient population?
For a phase III trial intended for regulatory agencies, success is defined as statistical significance for your primary endpoint of interest in the population defined in the trial. For proof-of-concept studies, you can be more flexible with your definition of a successful trial and consider your trial a success (or partial success) if results are positive only in a subgroup of your population. If you think the subgroup represents a target population for your new drug, it is best to run an entire trial in this group of patients if you want to show efficacy for approval or labeling purposes.
What are the pros and cons of conducting after-the-fact corrections for type I error (i.e., the Hochberg test) versus powering the analyses to show a significant comparison from the outset? Are they equally robust?
Corrections to the type I error rate for multiple testing should ideally be pre-specified, because otherwise you risk not having enough power to detect the differences you are looking for. Pre-specified fixed-sequence methods, in which hypotheses are tested in a pre-determined order, are preferred for multiple testing in the clinical trial setting over stepwise methods (like Hochberg), in which the order of testing depends on the observed p-values. This is because regulatory agencies strongly prefer pre-specified statistical methods to data-driven ones, and because fixed-sequence methods provide better control of the overall type I error rate across all comparisons.
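To show how the two approaches behave differently on the same results, here is a minimal Python sketch of a fixed-sequence procedure (each hypothesis tested at the full alpha in a pre-specified order, stopping at the first non-significant result) and the Hochberg step-up procedure. The p-values are hypothetical.

```python
def fixed_sequence(pvals, alpha=0.05):
    """Test hypotheses in the given (pre-specified) order; stop at the first failure."""
    reject = [False] * len(pvals)
    for i, p in enumerate(pvals):
        if p > alpha:
            break  # all later hypotheses are not tested
        reject[i] = True
    return reject

def hochberg(pvals, alpha=0.05):
    """Hochberg step-up procedure; the testing order is driven by the observed p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    reject = [False] * m
    for k in range(m, 0, -1):  # start from the largest p-value
        if pvals[order[k - 1]] <= alpha / (m - k + 1):
            for j in order[:k]:
                reject[j] = True
            break
    return reject

pvals = [0.01, 0.06, 0.02]  # hypothetical, in the pre-specified testing order
print(fixed_sequence(pvals))  # [True, False, False]
print(hochberg(pvals))        # [True, False, True]
```

Here the fixed-sequence procedure stops at the second hypothesis, so the third is never claimed even though its p-value is small; Hochberg rejects it because it re-orders the hypotheses by their observed p-values. That trade-off (a pre-committed order versus data-driven ordering) is the one regulators weigh.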
Given that early phase studies typically involve a smaller sample size than later phase trials, how can you ensure that sample sizes in early studies are adequate for later phase development?
If you are interested in looking at efficacy in an early phase trial, you can estimate the sample size for your endpoint of interest much as you would for a larger trial. You may use lower power and/or larger clinically significant differences than you would for a phase III trial to keep the sample size (and thus the cost) lower. There is a greater risk that you would not see a statistically significant difference, but you would likely still have enough information to observe trends in treatment differences.
If the p-value for the primary endpoint is not significant but the secondary endpoints are statistically significant, can you still say that the secondary endpoints are (nominally) significant, or mention any significance?
Yes, you can still report the results of the secondary endpoints along with the specifics of how you did (or did not) control the type I error rate. Whether the hypothesis testing for secondary endpoints was pre-planned or the result of post-hoc exploratory analysis should also be clearly stated (this holds for both clinical study reports and any public publication or presentation of the results). However, the FDA standard for showing efficacy of a new drug is statistically significant results for the primary endpoint in at least two randomized controlled trials, so having significant secondary endpoint results is generally not enough to define your trial as successful from a regulatory perspective.
Please feel free to submit additional questions in the comments below. You can see all of our on-demand webinars here.