Module 3 Quiz: Hypothesis Testing Laboratory

Question 1 Hypothesis Construction

A pharmaceutical company is developing a new tablet formulation and wants to test if the average dissolution time is significantly different from the current standard of 30 minutes. Which of the following represents the correct null and alternative hypotheses?

A) H₀: μ ≠ 30 minutes; H₁: μ = 30 minutes
B) H₀: μ = 30 minutes; H₁: μ ≠ 30 minutes
C) H₀: μ > 30 minutes; H₁: μ < 30 minutes
D) H₀: x̄ = 30 minutes; H₁: x̄ ≠ 30 minutes

✅ Correct Answer: B

Step-by-Step Reasoning:

Understanding the Question: We want to test if the new formulation is "significantly different" from 30 minutes, which indicates a two-tailed test.
Null Hypothesis (H₀): Always states "no difference" or "no effect" - the dissolution time equals the standard (μ = 30 minutes).
Alternative Hypothesis (H₁): States what we're trying to prove - the dissolution time is different from the standard (μ ≠ 30 minutes).
Population vs. Sample: Hypotheses are about population parameters (μ), not sample statistics (x̄).

Why Others Are Wrong:

A: Reverses null and alternative hypotheses
C: Directional hypotheses belong to one-tailed tests, not "different from"; in addition, H₀ must contain the equality
D: Uses sample mean (x̄) instead of population mean (μ)

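Excel Application: Excel has no built-in one-sample t-test (T.TEST compares two arrays, not an array and a constant). Compute t manually with =(AVERAGE(range)-30)/(STDEV.S(range)/SQRT(COUNT(range))), then get the two-tailed p-value with =T.DIST.2T(ABS(t), COUNT(range)-1)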

Question 2 Type I & II Errors

In quality control testing of pharmaceutical batches, what does a Type II error represent?

🏭 Manufacturing Context:

A pharmaceutical company tests each batch to determine if it meets specification limits before release.

A) Rejecting a good batch that actually meets specifications (false positive)
B) Accepting a bad batch that doesn't meet specifications (false negative)
C) Correctly rejecting a bad batch
D) Correctly accepting a good batch

✅ Correct Answer: B

Step-by-Step Reasoning:

Understanding Type II Error (β): Failing to reject a false null hypothesis
In Pharmaceutical Context: H₀ = "Batch meets specifications"
Type II Error Occurs When: We fail to reject H₀ (conclude batch is good) when it's actually false (batch is bad)
Real-World Impact: Bad product reaches consumers (consumer's risk)
Power Relationship: Power = 1 - β (probability of correctly detecting a bad batch)

Memory Aid:

Type I Error (α): Producer's risk - rejecting good batch
Type II Error (β): Consumer's risk - accepting bad batch

Power Calculation (normal approximation, one-sided test): Power = 1 - NORM.S.DIST(critical_value - effect_size/SE, TRUE)

Question 3 One-Sample t-Test

A sample of 16 tablets has a mean weight of 502.3 mg with a standard deviation of 3.2 mg. The target weight is 500 mg. Calculate the t-statistic for testing if the sample mean differs significantly from the target.

📊 Given Data:

n = 16, x̄ = 502.3 mg, s = 3.2 mg, μ₀ = 500 mg

A) t = 2.875
B) t = 0.719
C) t = 1.150
D) t = 3.200

✅ Correct Answer: A

Step-by-Step Calculation:

t = (x̄ - μ₀) / (s / √n)

Step 1: Calculate the standard error (SE)
SE = s / √n = 3.2 / √16 = 3.2 / 4 = 0.8 mg
Step 2: Calculate the difference from target
x̄ - μ₀ = 502.3 - 500 = 2.3 mg
Step 3: Calculate t-statistic
t = 2.3 / 0.8 = 2.875
Step 4: Degrees of freedom = n - 1 = 16 - 1 = 15

Interpretation:

With t = 2.875 and df = 15, this would be significant at α = 0.05 (critical value ≈ 2.131), suggesting the tablet weight differs significantly from the target.

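Excel Formula (t-statistic): =(AVERAGE(range)-500)/(STDEV.S(range)/SQRT(COUNT(range)))
Two-tailed p-value: =T.DIST.2T(ABS(t), COUNT(range)-1)
Note: T.TEST compares two arrays, so it cannot test a single sample against a fixed target.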
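
As a quick check of the arithmetic above, here is a minimal Python sketch that computes the t-statistic from the summary statistics given in the question. The function name one_sample_t is illustrative only and is not taken from any library.

```python
import math

def one_sample_t(sample_mean, target, sample_sd, n):
    """Return (t, degrees of freedom) for a one-sample t-test from summary statistics."""
    standard_error = sample_sd / math.sqrt(n)      # SE = s / sqrt(n)
    t = (sample_mean - target) / standard_error    # t = (x-bar - mu0) / SE
    return t, n - 1

# Values from Question 3: n = 16, x-bar = 502.3 mg, s = 3.2 mg, mu0 = 500 mg
t_stat, df = one_sample_t(502.3, 500.0, 3.2, 16)
print(round(t_stat, 3), df)
```

The resulting t is then compared with the two-tailed critical value for df = 15 (about 2.131 at α = 0.05).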

Question 4 Two-Sample t-Test

Two different manufacturing processes produce tablets. Process A (n=12) has mean dissolution time of 28.5 minutes (SD=2.1), and Process B (n=15) has mean dissolution time of 31.2 minutes (SD=2.8). Which statistical test is most appropriate?

A) Paired t-test
B) Independent samples t-test with equal variances
C) Independent samples t-test with unequal variances (Welch's test)
D) One-sample t-test

✅ Correct Answer: C

Step-by-Step Decision Process:

Identify the Design: Two independent groups (different processes)
Check Sample Sizes: n₁ = 12, n₂ = 15 (unequal sample sizes)
Check Variance Equality:
s₁² = (2.1)² = 4.41
s₂² = (2.8)² = 7.84
Ratio = 7.84/4.41 = 1.78 (suggests unequal variances)
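Rule of Thumb: If variance ratio > 1.5 or sample sizes are unequal, use Welch's test. Welch's test is also a safe default because it loses little power when the variances are in fact equal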
Test Choice: Independent samples t-test with unequal variances

Why Others Are Wrong:

A: No pairing between observations
B: Variances appear unequal and sample sizes differ
D: We have two groups, not one

Excel Formula: =T.TEST(array1, array2, 2, 3)
Where: "2" = two-tailed, "3" = unequal variances
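
For readers who want to see what the unequal-variance test computes, the sketch below evaluates Welch's t-statistic and the Welch-Satterthwaite degrees of freedom from the summary statistics in the question. The function name welch_t is illustrative, and the p-value lookup is left out because it needs a t-distribution table or a statistics package.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t-statistic and approximate degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1          # variance of the mean, group 1
    v2 = sd2 ** 2 / n2          # variance of the mean, group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Process A: n = 12, mean 28.5, SD 2.1; Process B: n = 15, mean 31.2, SD 2.8
t_stat, df = welch_t(28.5, 2.1, 12, 31.2, 2.8, 15)
print(round(t_stat, 2), round(df, 1))
```

Note that the degrees of freedom are generally not an integer and are smaller than n₁ + n₂ - 2, which is how the test pays for not assuming equal variances.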

Question 5 ANOVA

A pharmaceutical company tests the dissolution rate of tablets from four different batches. The ANOVA results show F = 4.23 with p = 0.018. What can we conclude?

📈 ANOVA Results:

F-statistic = 4.23, p-value = 0.018, α = 0.05

A) All batch means are significantly different from each other
B) At least one batch mean differs significantly from the others
C) All batch means are equal
D) The F-statistic is not significant

✅ Correct Answer: B

Step-by-Step ANOVA Interpretation:

ANOVA Hypotheses:
H₀: μ₁ = μ₂ = μ₃ = μ₄ (all means are equal)
H₁: At least one mean differs
Decision Rule: Reject H₀ if p-value < α
Compare p-value to α: 0.018 < 0.05
Conclusion: Reject H₀, meaning at least one batch differs
Next Step: Post-hoc tests needed to identify which batches differ

Important Note:

ANOVA only tells us that differences exist somewhere among the groups, not specifically which groups differ. Post-hoc tests (Tukey's HSD, Bonferroni) are needed for pairwise comparisons.

Excel: Data Analysis → ANOVA: Single Factor
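Note: =F.TEST(array1, array2) compares the variances of two groups and is not an ANOVA. To get the p-value for an ANOVA F-statistic, use =F.DIST.RT(F, df_between, df_within)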
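
To make the F-statistic less of a black box, the following sketch computes it for a one-way ANOVA by splitting variation into between-group and within-group parts. The four batches of dissolution times are made-up illustration data, not values from the question, and one_way_anova_f is a hypothetical helper name.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical dissolution times (minutes) for four batches
batches = [[26, 27, 28], [27, 28, 29], [30, 31, 32], [29, 30, 31]]
f_stat, df_between, df_within = one_way_anova_f(batches)
print(f_stat, df_between, df_within)
```

A large F means the batch means vary more than the within-batch noise would explain; it still does not say which batches differ.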

Question 6 Non-Parametric Tests

When would you choose the Mann-Whitney U test over an independent samples t-test for comparing dissolution times between two formulations?

A) When sample sizes are equal
B) When data is normally distributed
C) When data is severely skewed or contains outliers
D) When variances are equal

✅ Correct Answer: C

When to Use Non-Parametric Tests:

Violation of Normality: Data is severely skewed, has outliers, or fails normality tests
Ordinal Data: Data represents ranks or ordered categories
Small Sample Sizes: When n < 30 and normality cannot be assumed
Robust Alternative: Mann-Whitney is less sensitive to outliers than t-test

Mann-Whitney U Test Advantages:

No normality assumption required
Robust to outliers
Works with any distribution shape
Only requires ordinal data

Disadvantages:

Less powerful than t-test when assumptions are met
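Compares whole distributions rather than means (it can be read as a test of medians only when the two distributions have the same shape)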

Excel Implementation: No direct function, but can be calculated using RANK functions
Decision Tree: Check normality → If violated → Use Mann-Whitney
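
The rank-based idea behind the Mann-Whitney U test can be sketched by counting, over all pairs, how often a value from one group exceeds a value from the other. The data below are invented for illustration, and mann_whitney_u is a hypothetical helper that returns only the U statistics, not a p-value.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): pairwise 'wins' for each sample, ties counted as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical dissolution times (minutes); formulation B contains one extreme outlier
formulation_a = [24.1, 25.3, 26.0, 27.2]
formulation_b = [26.5, 28.0, 29.4, 41.7]
u_with_outlier = mann_whitney_u(formulation_a, formulation_b)

# Replacing the outlier with a moderate value leaves U unchanged,
# because only the ordering of the values matters
u_without_outlier = mann_whitney_u(formulation_a, [26.5, 28.0, 29.4, 30.0])
print(u_with_outlier, u_without_outlier)
```

This is why the test is robust to outliers: the size of the extreme value never enters the calculation, only its rank.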

Question 7 Power Analysis

A study has 80% power to detect a difference of 5 mg in tablet weight. What does this mean?

A) There's an 80% chance the null hypothesis is true
B) There's an 80% chance of detecting a 5 mg difference if it truly exists
C) The significance level is 80%
D) There's a 20% chance of making a Type I error

✅ Correct Answer: B

Understanding Statistical Power:

Definition: Power = 1 - β (probability of correctly rejecting false H₀)
In This Context: If tablets truly differ by 5 mg, we'll detect it 80% of the time
Type II Error: β = 1 - Power = 1 - 0.80 = 0.20 (20% chance of missing the difference)
Practical Meaning: Out of 100 studies where true difference = 5 mg, about 80 would find significance on average

Factors Affecting Power:

Effect Size: Larger differences → Higher power
Sample Size: Larger n → Higher power
Significance Level: Higher α → Higher power
Variability: Lower SD → Higher power

Sample Size for 80% Power:
n = 2 × (Z_α/2 + Z_β)² × σ² / Δ²
Where Δ = 5 mg (effect size)
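
The four factors listed above can be explored numerically with a normal-approximation power function for a two-sided, two-sample comparison. This is a sketch under stated assumptions: σ = 8 mg is a hypothetical value (the question does not give one), the function name approx_power is illustrative, and the negligible probability of rejecting in the wrong direction is ignored.

```python
import math
from statistics import NormalDist

def approx_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test with equal group sizes."""
    z = NormalDist()
    se = sigma * math.sqrt(2 / n_per_group)   # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)         # two-sided critical value
    return z.cdf(delta / se - z_crit)

base = approx_power(delta=5, sigma=8, n_per_group=40)
print(round(base, 3))
print(approx_power(7, 8, 40) > base)               # larger effect size
print(approx_power(5, 8, 60) > base)               # larger sample size
print(approx_power(5, 8, 40, alpha=0.10) > base)   # higher alpha
print(approx_power(5, 6, 40) > base)               # lower variability
```

Each comparison changes one input at a time, mirroring the four bullet points.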

Question 8 Confidence Intervals

A 95% confidence interval for the mean dissolution time is calculated as (27.3, 32.7) minutes. Which interpretation is correct?

A) There's a 95% probability that the true population mean lies between 27.3 and 32.7 minutes
B) 95% of sample means will fall between 27.3 and 32.7 minutes
C) If we repeated the sampling 100 times, about 95 of the resulting confidence intervals would contain the true population mean
D) The sample mean is 95% likely to be between 27.3 and 32.7 minutes

✅ Correct Answer: C

Correct Confidence Interval Interpretation:

Key Concept: CI is about the process, not the specific interval
Correct Thinking: "If we repeat this sampling procedure many times..."
95% Refers To: The long-run frequency of intervals containing μ
This Specific Interval: Either contains μ or it doesn't (probability is 0 or 1)
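
The "long-run frequency" reading can be illustrated with a small simulation: draw many samples from a population whose mean is known, build an interval from each, and count how many intervals contain that mean. The sketch below makes simplifying assumptions that are not in the question: a normal population with μ = 30 and a known σ = 5, so a z-interval is used instead of the t-interval.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(n_intervals=2000, n=16, mu=30.0, sigma=5.0, level=0.95, seed=1):
    """Fraction of simulated z-intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # Each individual interval either contains mu or it does not
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / n_intervals

observed_coverage = coverage()
print(observed_coverage)
```

The printed fraction should land close to 0.95, which is a statement about the procedure across repetitions rather than about any single interval.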

Common Misconceptions:

A: The parameter μ is fixed, not random
B: This describes sampling distribution, not CI
D: Sample mean is already observed, not probabilistic

Formula Used:

CI = x̄ ± t(α/2, df) × (s/√n)

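Excel Formula (margin of error): =CONFIDENCE.T(0.05, stdev, n)
Lower bound: =AVERAGE(range) - CONFIDENCE.T(0.05, STDEV.S(range), COUNT(range))
Upper bound: =AVERAGE(range) + CONFIDENCE.T(0.05, STDEV.S(range), COUNT(range))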

Question 9 Assumption Testing

Before conducting a t-test on dissolution data, which assumptions should be verified?

A) Normality only
B) Independence and equal variances only
C) Normality, independence, and equal variances (for two-sample)
D) Large sample size only

✅ Correct Answer: C

t-Test Assumptions (Complete List):

Normality: Data should be approximately normally distributed
- Test: Shapiro-Wilk, Q-Q plots
- Robust for n > 30 (Central Limit Theorem)
Independence: Observations should be independent
- Check study design
- No autocorrelation in time series data
Equal Variances: For two-sample t-test (homoscedasticity)
- Test: Levene's test, F-test
- Alternative: Welch's test if violated

What If Assumptions Are Violated?

Non-normality: Use non-parametric tests (Mann-Whitney, Wilcoxon)
Unequal variances: Use Welch's t-test
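Non-independence: Use a method that models the dependence (for example, a paired test for matched observations or a mixed-effects model for repeated measures)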

Normality Check: Create histogram, Q-Q plot
Variance Check: =F.TEST(array1, array2) for equal variances

Question 10 Sample Size Calculation

To detect a 3 mg difference in tablet weight with 80% power and α = 0.05, and assuming σ = 4 mg, approximately how many tablets per group are needed for a two-sample t-test?

📊 Given Parameters:

Effect size (Δ) = 3 mg, Power = 80%, α = 0.05, σ = 4 mg

A) n = 12 per group
B) n = 23 per group
C) n = 35 per group
D) n = 47 per group

✅ Correct Answer: C

Step-by-Step Sample Size Calculation:

n = 2 × (Z_α/2 + Z_β)² × σ² / Δ²

Step 1: Find critical values
Z_α/2 = Z_0.025 = 1.96 (for α = 0.05, two-tailed)
Z_β = Z_0.20 = 0.84 (for Power = 80%, β = 0.20)
Step 2: Calculate (Z_α/2 + Z_β)²
(1.96 + 0.84)² = (2.80)² = 7.84
Step 3: Apply the formula
n = 2 × 7.84 × (4)² / (3)²
n = 2 × 7.84 × 16 / 9
n = 250.88 / 9 = 27.9 ≈ 28
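Step 4: Round up and compare with the answer options
n = 28 per group is the minimum from this formula. Options A (12) and B (23) fall short of it, so C (35) is the smallest listed choice that achieves at least 80% power.

Interpretation:

With 35 tablets per group, the study exceeds the calculated minimum of 28 and therefore has more than 80% power to detect a 3 mg difference in weight between formulations.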

Excel Calculation:
=2*POWER((NORM.S.INV(0.975)+NORM.S.INV(0.8)),2)*POWER(4,2)/POWER(3,2)
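
The same calculation can be reproduced in Python with the standard library, which avoids rounding the z-values by hand. The function name n_per_group is illustrative; the formula is the normal-approximation formula shown above, so a t-based calculation would give a slightly larger n.

```python
import math
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided two-sample comparison (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # about 0.84 for 80% power
    n_raw = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return n_raw, math.ceil(n_raw)       # always round up to a whole tablet

# Question 10: delta = 3 mg, sigma = 4 mg, alpha = 0.05, power = 80%
n_raw, n_required = n_per_group(delta=3, sigma=4)
print(round(n_raw, 1), n_required)
```

Any group size at or above n_required meets the power target; smaller sizes do not.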

Question 11 Chi-Square Test

A quality control study examines the relationship between manufacturing shift (Day, Evening, Night) and defect rates (Pass, Fail). Which statistical test is most appropriate?

📋 Data Structure:

Shift	Pass	Fail
Day	185	15
Evening	167	33
Night	148	52

A) One-way ANOVA
B) Chi-square test of independence
C) Two-sample t-test
D) Paired t-test

✅ Correct Answer: B

Step-by-Step Test Selection:

Identify Data Types:
Shift: Nominal categorical (3 categories)
Quality: Nominal categorical (2 categories)
Research Question: Is there an association between shift and defect rate?
Data Structure: Contingency table (3×2)
Appropriate Test: Chi-square test of independence

Chi-Square Test Process:

H₀: Shift and quality are independent
H₁: Shift and quality are associated
Calculate Expected Frequencies: E = (Row Total × Column Total) / Grand Total
Chi-Square Statistic: χ² = Σ(O - E)²/E
Compare to Critical Value: df = (rows-1)(cols-1) = (3-1)(2-1) = 2

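Excel Function: =CHISQ.TEST(observed_range, expected_range) returns the p-value
Note: Build expected_range yourself from the row and column totals; the Data Analysis ToolPak does not include a chi-square tool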
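
The process above can be carried out on the shift table from the question. The sketch below computes the expected counts and the chi-square statistic; chi_square_independence is an illustrative name. Because df = 2 here, the upper-tail probability has the closed form exp(-χ²/2), so no statistics package is needed for the p-value in this particular case.

```python
import math

# Observed counts from the question: (Pass, Fail) per shift
observed = {"Day": (185, 15), "Evening": (167, 33), "Night": (148, 52)}

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip(rows, row_totals):
        for obs, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total   # E = row x column / grand
            chi2 += (obs - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_totals) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)
# Closed form valid only for df = 2
p_value = math.exp(-chi2 / 2) if df == 2 else None
print(round(chi2, 2), df, p_value)
```

With these counts the statistic is far above the df = 2 critical value of 5.991 at α = 0.05, so the data are not consistent with shift and quality being independent.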

Question 12 Effect Size

Two formulations have mean dissolution times of 25.3 minutes (SD=3.1) and 28.7 minutes (SD=3.4). The pooled standard deviation is 3.25. What is Cohen's d?

A) d = 0.52 (medium effect)
B) d = 1.05 (large effect)
C) d = 0.28 (small effect)
D) d = 1.47 (very large effect)

✅ Correct Answer: B

Cohen's d Calculation:

d = |x̄₁ - x̄₂| / s_pooled

Step 1: Calculate mean difference
|x̄₁ - x̄₂| = |25.3 - 28.7| = 3.4 minutes
Step 2: Use given pooled SD
s_pooled = 3.25 minutes
Step 3: Calculate Cohen's d
d = 3.4 / 3.25 = 1.05
Step 4: Interpret effect size
d = 1.05 → Large effect (d ≥ 0.8)

Cohen's d Interpretation Guidelines:

Small effect: d = 0.2
Medium effect: d = 0.5
Large effect: d = 0.8

Practical Meaning:

A Cohen's d of 1.05 indicates the formulations differ by more than one standard deviation, a large effect by Cohen's benchmarks. Whether it is clinically meaningful depends on the dissolution specification.

Excel Calculation:
=ABS(AVERAGE(range1)-AVERAGE(range2))/pooled_SD
Pooled SD: =SQRT(((n1-1)*VAR(range1)+(n2-1)*VAR(range2))/(n1+n2-2))
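
A short Python sketch of both formulas follows. The question supplies the pooled SD directly but not the group sizes, so the pooled-SD check assumes a hypothetical 20 tablets per formulation; the function names are illustrative.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))

def cohens_d(mean1, mean2, sd_pooled):
    """Cohen's d as an absolute standardized mean difference."""
    return abs(mean1 - mean2) / sd_pooled

# Values from Question 12
d = cohens_d(25.3, 28.7, 3.25)

# Assumption: equal group sizes of 20 (not stated in the question)
sp = pooled_sd(3.1, 20, 3.4, 20)
print(round(d, 2), round(sp, 2))
```

With equal group sizes the pooled SD comes out near the 3.25 stated in the question, which suggests that value was derived the same way.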

Question 13 Wilcoxon Test

A before-after study measures pain scores on a 1-10 scale for 12 patients before and after a new pain medication. The data is heavily skewed. Which test should be used?

A) Paired t-test
B) Wilcoxon signed-rank test
C) Mann-Whitney U test
D) Independent samples t-test

✅ Correct Answer: B

Test Selection Logic:

Study Design: Before-after (paired/dependent observations)
Data Type: Ordinal (pain scale 1-10)
Distribution: Heavily skewed (violates normality)
Sample Size: n = 12 (relatively small)
Conclusion: Use non-parametric test for paired data

Wilcoxon Signed-Rank Test:

Purpose: Compare paired observations when normality is violated
Process: Ranks differences (ignoring sign), then applies signs
Assumption: Differences are symmetric around median
Output: Tests whether median difference = 0

Why Others Are Wrong:

A: Requires normality (violated)
C: For independent groups, not paired
D: For independent groups and requires normality

Manual Approach: Calculate differences → Rank absolute differences → Apply signs → Sum positive ranks
Statistical Software: Use R, SPSS, or online calculators
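
The manual approach can be sketched in a few lines of Python. The pain scores below are invented for illustration, signed_rank_sums is a hypothetical helper name, and the sketch stops at the rank sums: turning them into a p-value requires a signed-rank table or statistical software.

```python
def signed_rank_sums(before, after):
    """Return (W_plus, W_minus) for paired data, using average ranks for ties."""
    # Step 1: differences, dropping zeros as the standard procedure does
    diffs = [a - b for b, a in zip(before, after) if a != b]
    # Step 2: order by absolute difference
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1          # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    # Steps 3 and 4: attach the signs and sum the ranks on each side
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical pain scores for five patients
before = [8, 7, 6, 9, 5]
after = [5, 6, 6, 7, 6]
w_plus, w_minus = signed_rank_sums(before, after)
print(w_plus, w_minus)
```

A useful sanity check is that the two sums always add up to m(m + 1)/2, where m is the number of non-zero differences.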

Question 14 Multiple Comparisons

After finding a significant ANOVA result comparing 5 different tablet formulations, you want to determine which specific pairs differ. Why is it important to use a post-hoc test rather than multiple t-tests?

A) Post-hoc tests are more powerful
B) To control the family-wise error rate and avoid inflated Type I error
C) T-tests cannot be used after ANOVA
D) Post-hoc tests are faster to calculate

✅ Correct Answer: B

Multiple Comparisons Problem:

Number of Comparisons: With 5 groups, there are C(5,2) = 10 possible pairwise comparisons
Individual α: Each t-test uses α = 0.05
Family-wise Error Rate: Probability of at least one Type I error
FWER ≈ 1 - (1 - 0.05)¹⁰ = 1 - 0.599 = 0.401 (about 40%, assuming the comparisons are independent)
Problem: Very high chance of finding "significant" differences by chance alone

Post-Hoc Test Solutions:

Tukey's HSD: Controls FWER for all pairwise comparisons
Bonferroni: Divides α by number of comparisons (α/k)
Dunnett's: For comparing all groups to a control
Scheffé's: Most conservative, for any contrast

Practical Impact:

Without correction, you might incorrectly conclude formulations differ when they don't, leading to unnecessary reformulation work and costs.

Bonferroni Adjustment: Use α/k where k = number of comparisons
Example: For 10 comparisons, use α = 0.05/10 = 0.005
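
The numbers in this question can be reproduced with a short sketch. It assumes the pairwise tests are independent, which is only an approximation for comparisons that share groups, and the function name is illustrative.

```python
import math

def familywise_error(n_groups, alpha=0.05):
    """Return (number of pairwise comparisons, approximate FWER, Bonferroni alpha)."""
    k = math.comb(n_groups, 2)            # number of distinct pairs
    fwer = 1 - (1 - alpha) ** k           # chance of at least one false positive
    bonferroni_alpha = alpha / k          # per-comparison threshold after correction
    return k, fwer, bonferroni_alpha

k, fwer, bonferroni_alpha = familywise_error(5)
print(k, round(fwer, 3), bonferroni_alpha)
```

Trying other group counts shows how quickly the problem grows: the number of pairs rises roughly with the square of the number of groups.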

Question 15 Practical Application

You're tasked with comparing the bioavailability of a generic drug to the brand name. The FDA requires demonstrating bioequivalence using 90% confidence intervals. If the 90% CI for the ratio of AUCs is (0.88, 1.09), what can you conclude?

📋 Regulatory Context:

FDA bioequivalence criteria: 90% CI for AUC ratio must fall within (0.80, 1.25)

A) The drugs are bioequivalent because the CI includes 1.0
B) The drugs are bioequivalent because the entire CI falls within (0.80, 1.25)
C) The drugs are not bioequivalent because the CI is too wide
D) More data is needed to make a conclusion

✅ Correct Answer: B

FDA Bioequivalence Assessment:

Regulatory Requirement: 90% CI for ratio must fall entirely within (0.80, 1.25)
Observed CI: (0.88, 1.09)
Check Lower Bound: 0.88 > 0.80 ✓
Check Upper Bound: 1.09 < 1.25 ✓
Conclusion: Entire CI falls within acceptance range → Bioequivalent
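
The decision rule above is simple enough to express as a small function, which also makes clear why option A is wrong: an interval can include 1.0 and still fail. The function name and the inclusive treatment of the limits are assumptions made for this sketch.

```python
def is_bioequivalent(ci_lower, ci_upper, low=0.80, high=1.25):
    """True if the whole confidence interval lies inside the acceptance range."""
    # Assumption: limits are treated as inclusive
    return low <= ci_lower and ci_upper <= high

print(is_bioequivalent(0.88, 1.09))   # interval from the question
print(is_bioequivalent(0.75, 1.20))   # contains 1.0 but the lower bound is too low
print(is_bioequivalent(0.95, 1.30))   # contains 1.0 but the upper bound is too high
```

Both bounds must pass, which is the two-one-sided-tests logic in code form.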

Why This Approach Works:

Two One-Sided Tests (TOST): Tests both "not too low" and "not too high"
90% CI Equivalence: 90% CI corresponds to two one-sided 5% tests
Regulatory Acceptance: Approved method by FDA/EMA

Clinical Interpretation:

The generic drug's absorption (AUC) is between 88% and 109% of the brand name, well within the acceptable range for therapeutic equivalence.

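Ratio Calculation: =AVERAGE(test_drug)/AVERAGE(reference_drug) gives a rough ratio; bioequivalence analyses typically log-transform AUC and report the ratio of geometric means
90% CI: =CONFIDENCE.T(0.10, SD, n) gives the margin of error at the 90% level; when working on the log scale, apply EXP to each bound to convert back to a ratio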
