Module 3 Assessment

Hypothesis Testing Laboratory Quiz

📋 Quiz Information

Test your understanding of hypothesis testing concepts, statistical inference, and practical applications in pharmaceutical formulation development.

Questions: 15 Multiple Choice
Time Limit: 45 minutes
Passing Score: 80% (12/15)
Attempts: Unlimited

Question 1 Hypothesis Construction
A pharmaceutical company is developing a new tablet formulation and wants to test if the average dissolution time is significantly different from the current standard of 30 minutes. Which of the following represents the correct null and alternative hypotheses?

✅ Correct Answer: B

Step-by-Step Reasoning:
  1. Understanding the Question: We want to test if the new formulation is "significantly different" from 30 minutes, which indicates a two-tailed test.
  2. Null Hypothesis (H₀): Always states "no difference" or "no effect" - the dissolution time equals the standard (μ = 30 minutes).
  3. Alternative Hypothesis (H₁): States what we're trying to prove - the dissolution time is different from the standard (μ ≠ 30 minutes).
  4. Population vs. Sample: Hypotheses are about population parameters (μ), not sample statistics (x̄).
Why Others Are Wrong:
  • A: Reverses null and alternative hypotheses
  • C: These would be for one-tailed tests, not "different from"
  • D: Uses sample mean (x̄) instead of population mean (μ)
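Excel Application: Excel has no built-in one-sample t-test. Compute t manually and get the two-tailed p-value with =T.DIST.2T(ABS(t), n-1). T.TEST needs two arrays, so =T.TEST(array1, array2, 2, 1) works only if array2 is a helper column of the same length filled with 30; "2" indicates a two-tailed test and "1" a paired test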
Question 2 Type I & II Errors
In quality control testing of pharmaceutical batches, what does a Type II error represent?

🏭 Manufacturing Context:

A pharmaceutical company tests each batch to determine if it meets specification limits before release.

✅ Correct Answer: B

Step-by-Step Reasoning:
  1. Understanding Type II Error (β): Failing to reject a false null hypothesis
  2. In Pharmaceutical Context: H₀ = "Batch meets specifications"
  3. Type II Error Occurs When: We fail to reject H₀ (conclude batch is good) when it's actually false (batch is bad)
  4. Real-World Impact: Bad product reaches consumers (consumer's risk)
  5. Power Relationship: Power = 1 - β (probability of correctly detecting a bad batch)
Memory Aid:

Type I Error (α): Producer's risk - rejecting good batch
Type II Error (β): Consumer's risk - accepting bad batch

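Power Calculation (normal approximation, one-sided test): Power = 1 - NORM.S.DIST(critical_value - effect_size/SE, TRUE), where critical_value is the z critical value (1.645 for α = 0.05)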
Question 3 One-Sample t-Test
A sample of 16 tablets has a mean weight of 502.3 mg with a standard deviation of 3.2 mg. The target weight is 500 mg. Calculate the t-statistic for testing if the sample mean differs significantly from the target.

📊 Given Data:

n = 16, x̄ = 502.3 mg, s = 3.2 mg, μ₀ = 500 mg

✅ Correct Answer: A

Step-by-Step Calculation:
t = (x̄ - μ₀) / (s / √n)
  1. Step 1: Calculate the standard error (SE)
    SE = s / √n = 3.2 / √16 = 3.2 / 4 = 0.8 mg
  2. Step 2: Calculate the difference from target
    x̄ - μ₀ = 502.3 - 500 = 2.3 mg
  3. Step 3: Calculate t-statistic
    t = 2.3 / 0.8 = 2.875
  4. Step 4: Degrees of freedom = n - 1 = 16 - 1 = 15
Interpretation:

With t = 2.875 and df = 15, the result is significant at α = 0.05: |t| exceeds the two-tailed critical value of about 2.131, so the mean tablet weight differs significantly from the target.

Excel Formula (t-statistic): =(AVERAGE(range)-500)/(STDEV(range)/SQRT(COUNT(range)))
Two-tailed p-value: =T.DIST.2T(ABS(t), COUNT(range)-1). T.TEST compares two arrays, so it cannot take the constant 500 directly.
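
The four calculation steps above can be reproduced from the summary statistics alone. The following is a minimal Python sketch, not part of the original quiz; the function name `one_sample_t` is illustrative, and the critical value 2.131 is taken from the interpretation above rather than computed.

```python
import math

def one_sample_t(sample_mean, target, sample_sd, n):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    standard_error = sample_sd / math.sqrt(n)   # Step 1: SE = s / sqrt(n)
    difference = sample_mean - target           # Step 2: x-bar minus mu0
    t_stat = difference / standard_error        # Step 3: t = difference / SE
    return t_stat, n - 1                        # Step 4: df = n - 1

t_stat, df = one_sample_t(502.3, 500.0, 3.2, 16)

# Two-tailed critical value for df = 15, alpha = 0.05, as quoted in the text.
T_CRITICAL = 2.131
print(f"t = {t_stat:.3f}, df = {df}, significant: {abs(t_stat) > T_CRITICAL}")
```
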
Question 4 Two-Sample t-Test
Two different manufacturing processes produce tablets. Process A (n=12) has mean dissolution time of 28.5 minutes (SD=2.1), and Process B (n=15) has mean dissolution time of 31.2 minutes (SD=2.8). Which statistical test is most appropriate?

✅ Correct Answer: C

Step-by-Step Decision Process:
  1. Identify the Design: Two independent groups (different processes)
  2. Check Sample Sizes: n₁ = 12, n₂ = 15 (unequal sample sizes)
  3. Check Variance Equality:
    s₁² = (2.1)² = 4.41
    s₂² = (2.8)² = 7.84
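    Ratio = 7.84/4.41 = 1.78 (the sample variances differ noticeably)
  4. Rule of Thumb: When the variances may differ, and especially when the sample sizes are also unequal, use Welch's test; it loses little power even if the variances turn out to be equal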
  5. Test Choice: Independent samples t-test with unequal variances
Why Others Are Wrong:
  • A: No pairing between observations
  • B: Variances appear unequal and sample sizes differ
  • D: We have two groups, not one
Excel Formula: =T.TEST(array1, array2, 2, 3)
Where: "2" = two-tailed, "3" = unequal variances
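
For readers who want to see what the unequal-variance test computes, here is a short Python sketch of Welch's t statistic and the Welch-Satterthwaite degrees of freedom, using only the summary statistics from the question. The function name `welch_t` is an illustrative choice, and the p-value lookup is left out because it needs a t-distribution table or statistics package.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1   # squared standard error contribution of group 1
    v2 = sd2 ** 2 / n2   # squared standard error contribution of group 2
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Process A: n = 12, mean 28.5, SD 2.1; Process B: n = 15, mean 31.2, SD 2.8
t_stat, df = welch_t(28.5, 2.1, 12, 31.2, 2.8, 15)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The degrees of freedom come out fractional (a little under 25 here), which is normal for Welch's test.
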
Question 5 ANOVA
A pharmaceutical company tests the dissolution rate of tablets from four different batches. The ANOVA results show F = 4.23 with p = 0.018. What can we conclude?

📈 ANOVA Results:

F-statistic = 4.23, p-value = 0.018, α = 0.05

✅ Correct Answer: B

Step-by-Step ANOVA Interpretation:
  1. ANOVA Hypotheses:
    H₀: μ₁ = μ₂ = μ₃ = μ₄ (all means are equal)
    H₁: At least one mean differs
  2. Decision Rule: Reject H₀ if p-value < α
  3. Compare p-value to α: 0.018 < 0.05
  4. Conclusion: Reject H₀, meaning at least one batch differs
  5. Next Step: Post-hoc tests needed to identify which batches differ
Important Note:

ANOVA only tells us that differences exist somewhere among the groups, not specifically which groups differ. Post-hoc tests (Tukey's HSD, Bonferroni) are needed for pairwise comparisons.

Excel: Data Analysis → Anova: Single Factor
p-value from F: =F.DIST.RT(F, df_between, df_within). Note that =F.TEST(array1, array2) compares the variances of two groups; it is not an ANOVA F-test
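
To show where the F-statistic comes from, the sketch below computes a one-way ANOVA F from raw data in Python. The quiz gives only F and p, not the measurements, so the four batches here are made-up numbers chosen for easy arithmetic; the function name `one_way_anova_f` is also illustrative.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical dissolution values for four batches (illustration only).
batches = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0], [4.0, 5.0, 6.0]]
f_stat, df_between, df_within = one_way_anova_f(batches)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```
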
Question 6 Non-Parametric Tests
When would you choose the Mann-Whitney U test over an independent samples t-test for comparing dissolution times between two formulations?

✅ Correct Answer: C

When to Use Non-Parametric Tests:
  1. Violation of Normality: Data is severely skewed, has outliers, or fails normality tests
  2. Ordinal Data: Data represents ranks or ordered categories
  3. Small Sample Sizes: When n < 30 and normality cannot be assumed
  4. Robust Alternative: Mann-Whitney is less sensitive to outliers than t-test
Mann-Whitney U Test Advantages:
  • No normality assumption required
  • Robust to outliers
  • Works with any distribution shape
  • Only requires ordinal data
Disadvantages:
  • Less powerful than t-test when assumptions are met
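  • Compares rank distributions rather than means; it is a test of medians only when both groups have the same distribution shape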
Excel Implementation: No direct function, but can be calculated using RANK functions
Decision Tree: Check normality → If violated → Use Mann-Whitney
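
As a rough illustration of the rank-based calculation, the Python sketch below computes the Mann-Whitney U statistics by direct pairwise counting, which gives the same result as the rank-sum formula. The dissolution times are hypothetical (note the outlier of 40.2), and the function name `mann_whitney_u` is an illustrative choice; a p-value would still need a U table or statistics package.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): U_a counts pairs where a > b, with ties counted as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical dissolution times in minutes (illustration only).
formulation_a = [24.1, 26.3, 25.0, 40.2]
formulation_b = [27.5, 29.1, 28.4, 26.3]
u_a, u_b = mann_whitney_u(formulation_a, formulation_b)
print(f"U_a = {u_a}, U_b = {u_b}")
```

Because only the ordering of values matters, the outlier contributes no more to U than any other value larger than every observation in the second group.
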
Question 7 Power Analysis
A study has 80% power to detect a difference of 5 mg in tablet weight. What does this mean?

✅ Correct Answer: B

Understanding Statistical Power:
  1. Definition: Power = 1 - β (probability of correctly rejecting false H₀)
  2. In This Context: If tablets truly differ by 5 mg, we'll detect it 80% of the time
  3. Type II Error: β = 1 - Power = 1 - 0.80 = 0.20 (20% chance of missing the difference)
  4. Practical Meaning: Out of 100 studies where true difference = 5 mg, 80 would find significance
Factors Affecting Power:
  • Effect Size: Larger differences → Higher power
  • Sample Size: Larger n → Higher power
  • Significance Level: Higher α → Higher power
  • Variability: Lower SD → Higher power
Sample Size for 80% Power:
n = 2 × (Z_α/2 + Z_β)² × σ² / Δ² (tablets per group)
Where Δ = 5 mg (difference to detect) and σ = standard deviation of tablet weight
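
The four factors listed above can be checked numerically. This is a sketch using a normal approximation for a two-sided, two-sample comparison; the quiz does not give σ or n for this study, so σ = 6 mg and n = 20 per group are assumed values for illustration, and `approx_power` is a hypothetical helper name.

```python
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (normal approximation)."""
    z = NormalDist()
    se = sigma * (2.0 / n_per_group) ** 0.5   # SE of the difference in means
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    # Ignores the very small chance of rejecting in the wrong direction.
    return z.cdf(delta / se - z_crit)

# Assumed baseline: delta = 5 mg, sigma = 6 mg, n = 20 per group.
base = approx_power(5.0, 6.0, 20)
bigger_effect = approx_power(7.0, 6.0, 20)
bigger_sample = approx_power(5.0, 6.0, 30)
higher_alpha = approx_power(5.0, 6.0, 20, alpha=0.10)
lower_sd = approx_power(5.0, 4.0, 20)

for label, value in [("baseline", base), ("larger effect", bigger_effect),
                     ("larger n", bigger_sample), ("higher alpha", higher_alpha),
                     ("lower SD", lower_sd)]:
    print(f"{label:>13}: power = {value:.2f}")
```
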
Question 8 Confidence Intervals
A 95% confidence interval for the mean dissolution time is calculated as (27.3, 32.7) minutes. Which interpretation is correct?

✅ Correct Answer: C

Correct Confidence Interval Interpretation:
  1. Key Concept: CI is about the process, not the specific interval
  2. Correct Thinking: "If we repeat this sampling procedure many times..."
  3. 95% Refers To: The long-run frequency of intervals containing μ
  4. This Specific Interval: Either contains μ or it doesn't (probability is 0 or 1)
Common Misconceptions:
  • A: The parameter μ is fixed, not random
  • B: This describes sampling distribution, not CI
  • D: Sample mean is already observed, not probabilistic
Formula Used:
CI = x̄ ± t(α/2, df) × (s/√n)
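Excel Formula (margin of error): =CONFIDENCE.T(0.05, stdev, n)
Complete CI: lower limit =AVERAGE(range)-CONFIDENCE.T(0.05, STDEV(range), COUNT(range)); upper limit =AVERAGE(range)+CONFIDENCE.T(0.05, STDEV(range), COUNT(range))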
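
The "long-run frequency" interpretation can be checked by simulation. The Python sketch below repeatedly draws samples from a population with a known mean and counts how often the 95% interval contains it. For simplicity it uses a z-interval with known σ rather than the t-interval in the formula above, and the population values (μ = 30, σ = 4, n = 20) are assumptions chosen for illustration.

```python
import random
from statistics import NormalDist

def coverage_rate(true_mean, sigma, n, trials, seed=0):
    """Fraction of simulated 95% z-intervals that contain the true mean."""
    rng = random.Random(seed)
    margin = NormalDist().inv_cdf(0.975) * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # Each individual interval either contains the true mean or it does not.
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(true_mean=30.0, sigma=4.0, n=20, trials=2000)
print(f"Intervals containing the true mean: {rate:.1%}")
```

The printed rate should land close to 95%, which is a statement about the procedure across many samples, not about any single interval such as (27.3, 32.7).
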
Question 9 Assumption Testing
Before conducting a t-test on dissolution data, which assumptions should be verified?

✅ Correct Answer: C

t-Test Assumptions (Complete List):
  1. Normality: Data should be approximately normally distributed
    • Test: Shapiro-Wilk, Q-Q plots
    • Robust for n > 30 (Central Limit Theorem)
  2. Independence: Observations should be independent
    • Check study design
    • No autocorrelation in time series data
  3. Equal Variances: For two-sample t-test (homoscedasticity)
    • Test: Levene's test, F-test
    • Alternative: Welch's test if violated
What If Assumptions Are Violated?
  • Non-normality: Use non-parametric tests (Mann-Whitney, Wilcoxon)
  • Unequal variances: Use Welch's t-test
  • Non-independence: Use a method that models the dependence (paired test, repeated-measures or mixed model)
Normality Check: Create histogram, Q-Q plot
Variance Check: =F.TEST(array1, array2) for equal variances
Question 10 Sample Size Calculation
To detect a 3 mg difference in tablet weight with 80% power and α = 0.05, and assuming σ = 4 mg, approximately how many tablets per group are needed for a two-sample t-test?

📊 Given Parameters:

Effect size (Δ) = 3 mg, Power = 80%, α = 0.05, σ = 4 mg

✅ Correct Answer: C

Step-by-Step Sample Size Calculation:
n = 2 × (Z_α/2 + Z_β)² × σ² / Δ²
  1. Step 1: Find critical values
    Z_α/2 = Z_0.025 = 1.96 (for α = 0.05, two-tailed)
    Z_β = Z_0.20 = 0.84 (for Power = 80%, β = 0.20)
  2. Step 2: Calculate (Z_α/2 + Z_β)²
    (1.96 + 0.84)² = (2.80)² = 7.84
  3. Step 3: Apply the formula
    n = 2 × 7.84 × (4)² / (3)²
    n = 2 × 7.84 × 16 / 9
    n = 250.88 / 9 = 27.9 ≈ 28
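  4. Step 4: Round up and add a safety margin
    The formula gives a minimum of 28 per group. Padding by about 25% gives n ≈ 35 per group (allowing for lost or rejected samples and a larger-than-assumed σ)
Interpretation:

With 28 tablets per group, the study has about 80% power to detect a 3 mg difference in weight between formulations; with 35 per group, power is above the 80% target.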

Excel Calculation:
=2*POWER((NORM.S.INV(0.975)+NORM.S.INV(0.8)),2)*POWER(4,2)/POWER(3,2)
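
The same calculation as the Excel formula can be written in Python with the standard library. This is a sketch of the formula as given above; `n_per_group` is an illustrative name, and the result uses unrounded z values, so it differs slightly in the later decimals from the hand calculation with 1.96 and 0.84.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group: n = 2 (z_alpha/2 + z_beta)^2 sigma^2 / delta^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)                # about 0.84 for 80% power
    return 2.0 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2

n_exact = n_per_group(delta=3.0, sigma=4.0)
n_required = math.ceil(n_exact)   # always round up to a whole tablet
print(f"n = {n_exact:.1f}, rounded up to {n_required} per group")
```
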
Question 11 Chi-Square Test
A quality control study examines the relationship between manufacturing shift (Day, Evening, Night) and defect rates (Pass, Fail). Which statistical test is most appropriate?

📋 Data Structure:

Shift      Pass   Fail
Day        185    15
Evening    167    33
Night      148    52

✅ Correct Answer: B

Step-by-Step Test Selection:
  1. Identify Data Types:
    Shift: Nominal categorical (3 categories)
    Quality: Nominal categorical (2 categories)
  2. Research Question: Is there an association between shift and defect rate?
  3. Data Structure: Contingency table (3×2)
  4. Appropriate Test: Chi-square test of independence
Chi-Square Test Process:
  1. H₀: Shift and quality are independent
  2. H₁: Shift and quality are associated
  3. Calculate Expected Frequencies: E = (Row Total × Column Total) / Grand Total
  4. Chi-Square Statistic: χ² = Σ(O - E)²/E
  5. Compare to Critical Value: df = (rows-1)(cols-1) = (3-1)(2-1) = 2
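Excel Function: =CHISQ.TEST(observed_range, expected_range) returns the p-value
Note: The expected frequencies must be calculated in a separate range first; the Analysis ToolPak does not include a chi-square tool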
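
The chi-square process above can be worked through on the shift data (Day 185 pass / 15 fail, Evening 167 / 33, Night 148 / 52). The Python sketch below is one way to do it; the function name is illustrative, and the closed-form p-value used here is valid only because df = 2.

```python
import math

# Observed counts per shift: (Pass, Fail)
observed = {"Day": (185, 15), "Evening": (167, 33), "Night": (148, 52)}

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    chi2 = 0.0
    for row in rows:
        row_total = sum(row)
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total   # E = row x column / grand
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)
# For df = 2 only, the chi-square upper tail probability is exactly exp(-x / 2).
p_value = math.exp(-chi2 / 2.0)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.2e}")
```

The statistic is far above the df = 2 critical value of 5.991 at α = 0.05, so the data indicate an association between shift and defect rate.
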
Question 12 Effect Size
Two formulations have mean dissolution times of 25.3 minutes (SD=3.1) and 28.7 minutes (SD=3.4). The pooled standard deviation is 3.25. What is Cohen's d?

✅ Correct Answer: B

Cohen's d Calculation:
d = (x̄₁ - x̄₂) / s_pooled
  1. Step 1: Calculate mean difference
    |x̄₁ - x̄₂| = |25.3 - 28.7| = 3.4 minutes
  2. Step 2: Use given pooled SD
    s_pooled = 3.25 minutes
  3. Step 3: Calculate Cohen's d
    d = 3.4 / 3.25 = 1.05
  4. Step 4: Interpret effect size
    d = 1.05 → Large effect (d ≥ 0.8)
Cohen's d Interpretation Guidelines:
  • Small effect: d = 0.2
  • Medium effect: d = 0.5
  • Large effect: d = 0.8
Practical Meaning:

A Cohen's d of 1.05 indicates the formulations differ by more than one standard deviation, a large effect by Cohen's guidelines. Whether it is clinically meaningful depends on the dissolution specification.

Excel Calculation:
=ABS(AVERAGE(range1)-AVERAGE(range2))/pooled_SD
Pooled SD: =SQRT(((n1-1)*VAR(range1)+(n2-1)*VAR(range2))/(n1+n2-2))
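
A small Python sketch of the same calculation, using the means and pooled SD from the question. The function names are illustrative, and the "negligible" label for d below 0.2 is an added convention, not part of the guidelines listed above.

```python
def cohens_d(mean1, mean2, pooled_sd):
    """Absolute standardized mean difference."""
    return abs(mean1 - mean2) / pooled_sd

def effect_label(d):
    """Classify d using the conventional 0.2 / 0.5 / 0.8 thresholds."""
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"   # label assumed for illustration

d = cohens_d(25.3, 28.7, 3.25)
label = effect_label(d)
print(f"d = {d:.2f} ({label} effect)")
```
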
Question 13 Wilcoxon Test
A before-after study measures pain scores on a 1-10 scale for 12 patients before and after a new pain medication. The data is heavily skewed. Which test should be used?

✅ Correct Answer: B

Test Selection Logic:
  1. Study Design: Before-after (paired/dependent observations)
  2. Data Type: Ordinal (pain scale 1-10)
  3. Distribution: Heavily skewed (violates normality)
  4. Sample Size: n = 12 (relatively small)
  5. Conclusion: Use non-parametric test for paired data
Wilcoxon Signed-Rank Test:
  • Purpose: Compare paired observations when normality is violated
  • Process: Ranks the absolute differences (dropping zero differences), then reattaches the signs
  • Assumption: Differences are symmetric around median
  • Output: Tests whether median difference = 0
Why Others Are Wrong:
  • A: Requires normality (violated)
  • C: For independent groups, not paired
  • D: For independent groups and requires normality
Manual Approach: Calculate differences → Rank absolute differences → Apply signs → Sum positive ranks
Statistical Software: Use R, SPSS, or online calculators
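
The manual approach can be sketched in Python as follows. The pain scores are hypothetical (six patients rather than twelve, to keep the arithmetic easy to follow), zero differences are dropped, and tied absolute differences share their average rank. The function name is illustrative, and the significance decision would still need a Wilcoxon table or statistics package.

```python
def wilcoxon_signed_rank(before, after):
    """Return (W_plus, W_minus): sums of ranks for positive and negative differences."""
    diffs = [a - b for b, a in zip(before, after) if a != b]   # drop zero differences
    ordered = sorted(diffs, key=abs)                           # rank by absolute value
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0   # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical pain scores (illustration only).
before = [8, 7, 6, 9, 5, 7]
after = [5, 6, 6, 4, 7, 5]
w_plus, w_minus = wilcoxon_signed_rank(before, after)
print(f"W+ = {w_plus}, W- = {w_minus}")
```
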
Question 14 Multiple Comparisons
After finding a significant ANOVA result comparing 5 different tablet formulations, you want to determine which specific pairs differ. Why is it important to use a post-hoc test rather than multiple t-tests?

✅ Correct Answer: B

Multiple Comparisons Problem:
  1. Number of Comparisons: With 5 groups, there are C(5,2) = 10 possible pairwise comparisons
  2. Individual α: Each t-test uses α = 0.05
  3. Family-wise Error Rate: Probability of at least one Type I error
    FWER ≈ 1 - (1 - 0.05)¹⁰ = 1 - 0.599 = 0.401 (40%!)
  4. Problem: Very high chance of finding "significant" differences by chance alone
Post-Hoc Test Solutions:
  • Tukey's HSD: Controls FWER for all pairwise comparisons
  • Bonferroni: Divides α by number of comparisons (α/k)
  • Dunnett's: For comparing all groups to a control
  • Scheffe's: Most conservative, for any contrast
Practical Impact:

Without correction, you might incorrectly conclude formulations differ when they don't, leading to unnecessary reformulation work and costs.

Bonferroni Adjustment: Use α/k where k = number of comparisons
Example: For 10 comparisons, use α = 0.05/10 = 0.005
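
The numbers above are easy to reproduce. This Python sketch computes the number of pairwise comparisons, the approximate family-wise error rate, and the Bonferroni-adjusted α; like the formula in the text, it treats the comparisons as independent, which is an approximation. The function name is illustrative.

```python
from math import comb

def familywise_error_rate(n_groups, alpha=0.05):
    """Return (comparisons, approximate FWER, Bonferroni alpha) for all pairwise tests."""
    k = comb(n_groups, 2)                 # number of pairwise comparisons
    fwer = 1.0 - (1.0 - alpha) ** k       # P(at least one Type I error), assuming independence
    return k, fwer, alpha / k

k, fwer, bonferroni_alpha = familywise_error_rate(5)
print(f"{k} comparisons, FWER = {fwer:.3f}, Bonferroni alpha = {bonferroni_alpha:.3f}")
```
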
Question 15 Practical Application
You're tasked with comparing the bioavailability of a generic drug to the brand name. The FDA requires demonstrating bioequivalence using 90% confidence intervals. If the 90% CI for the ratio of AUCs is (0.88, 1.09), what can you conclude?

📋 Regulatory Context:

FDA bioequivalence criteria: 90% CI for AUC ratio must fall within (0.80, 1.25)

✅ Correct Answer: B

FDA Bioequivalence Assessment:
  1. Regulatory Requirement: 90% CI for ratio must fall entirely within (0.80, 1.25)
  2. Observed CI: (0.88, 1.09)
  3. Check Lower Bound: 0.88 > 0.80 ✓
  4. Check Upper Bound: 1.09 < 1.25 ✓
  5. Conclusion: Entire CI falls within acceptance range → Bioequivalent
Why This Approach Works:
  • Two One-Sided Tests (TOST): Tests both "not too low" and "not too high"
  • 90% CI Equivalence: 90% CI corresponds to two one-sided 5% tests
  • Regulatory Acceptance: Approved method by FDA/EMA
Clinical Interpretation:

The generic drug's absorption (AUC) is between 88% and 109% of the brand name, well within the acceptable range for therapeutic equivalence.

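Ratio Calculation: In practice AUC values are log-transformed first, and the ratio is the geometric mean ratio: =EXP(AVERAGE(ln_test)-AVERAGE(ln_reference))
90% CI: Build the 90% CI for the mean difference of the log-transformed AUCs (for paired data, margin = CONFIDENCE.T(0.10, SD_of_log_differences, n)), then apply EXP to both limits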
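
The acceptance check itself is a simple pair of comparisons. A minimal Python sketch, with the 0.80 and 1.25 limits taken from the regulatory context above and a hypothetical function name:

```python
def within_be_limits(ci_lower, ci_upper, lower_limit=0.80, upper_limit=1.25):
    """True if the whole 90% CI for the AUC ratio lies inside the acceptance range."""
    return ci_lower >= lower_limit and ci_upper <= upper_limit

result = within_be_limits(0.88, 1.09)
print("Bioequivalent" if result else "Not shown to be bioequivalent")
```

Both limits must pass: an interval such as (0.78, 1.09) fails even though most of it lies inside the range.
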