Confidence Intervals

Overview

A confidence interval (CI) provides a range of values that is likely to contain the true parameter of interest. The level of confidence, denoted as $1 - \alpha$, represents the proportion of confidence intervals that will contain the true parameter if we were to repeat the study multiple times.

For example, if we have a 95% CI, this implies:

$$1 - \alpha = 0.95$$

Consequently, we have:

$$\alpha = 0.05$$

Finding the Critical Value

When seeking the value of $z$ such that the cumulative probability satisfies:

$$P(Z \le z) = 0.05$$

we are interested in the left tail of the standard normal distribution.

The corresponding $z$-value for this cumulative probability is approximately:

$$z \approx -1.645$$

This value indicates that 5% of the data lies to the left of $z = -1.645$.
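
As an illustrative check (a minimal sketch, assuming SciPy is available), this critical value can be obtained from the inverse CDF of the standard normal distribution:

```python
from scipy.stats import norm

# Critical value for a left-tail cumulative probability of 0.05
z_left = norm.ppf(0.05)
print(round(z_left, 3))            # -1.645

# Sanity check: 5% of the standard normal lies to the left of this value
print(round(norm.cdf(z_left), 3))  # 0.05
```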


One-Sided Confidence Intervals

| $\alpha$ | $z_{\alpha}$ | CI Level |
|----------|--------------|----------|
| 10%      | 1.28         | 90%      |
| 5%       | 1.645        | 95%      |
| 1%       | 2.33         | 99%      |

Two-Sided Confidence Intervals

| $\alpha$ | $z_{\alpha/2}$ | CI Level |
|----------|----------------|----------|
| 10%      | 1.64           | 90%      |
| 5%       | 1.96           | 95%      |
| 1%       | 2.58           | 99%      |
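
A quick sketch (assuming SciPy) showing how the one-sided and two-sided critical values in the tables above can be reproduced:

```python
from scipy.stats import norm

# Reproduce the critical values from the one-sided and two-sided tables
for conf in (0.90, 0.95, 0.99):
    alpha = 1 - conf
    z_one = norm.ppf(1 - alpha)       # one-sided: ~1.28, 1.645, 2.33
    z_two = norm.ppf(1 - alpha / 2)   # two-sided: ~1.64, 1.96, 2.58
    print(f"{conf:.0%} CI: one-sided z = {z_one:.3f}, two-sided z = {z_two:.3f}")
```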

Confidence Intervals for Proportions

Let $X \sim \text{Bin}(n, p)$, where:

  • $np \ge 5$ and $n(1 - p) \ge 5$ (ensures sufficient sample size)

The sample proportion is calculated as $\hat{p} = \dfrac{X}{n}$, where $X$ is the number of successes.

For a 95% confidence interval:

  • Adjusted sample size: $\tilde{n} = n + 4$

  • Adjusted proportion: $\tilde{p} = \dfrac{X + 2}{\tilde{n}}$

  • The confidence interval is then given by:

$$\tilde{p} \pm z_{\alpha/2} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}}$$
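
A minimal sketch (assuming SciPy, and assuming the adjusted "plus-four" quantities described above) of computing this interval for a hypothetical sample:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical data: X successes out of n trials
X, n = 40, 100
conf = 0.95
z = norm.ppf(1 - (1 - conf) / 2)   # ~1.96

# Plus-four adjustment: add 2 successes and 2 failures
n_adj = n + 4
p_adj = (X + 2) / n_adj

moe = z * sqrt(p_adj * (1 - p_adj) / n_adj)
print(f"Adjusted 95% CI for p: ({p_adj - moe:.3f}, {p_adj + moe:.3f})")
```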


Small Sample Confidence Interval for Population Mean

When the sample size is small ($n \le 30$) and the population is normally distributed, the $t$-distribution is used to calculate the CI for the population mean:

$$\bar{x} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$$

Where:

  • $\sigma$ is the population standard deviation.
  • $s$ is the standard deviation of the sample.
  • $n$ is the size of the sample.
  • $n - 1$ represents the degrees of freedom.

Warning: If a small sample is drawn and the population standard deviation $\sigma$ is known, use the Z-score instead of $t$:

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$
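
An illustrative sketch (assuming SciPy and NumPy; the sample values are made up) of the small-sample $t$-based interval:

```python
import numpy as np
from scipy.stats import t

# Hypothetical small sample (n <= 30) drawn from a normal population
sample = np.array([9.8, 10.2, 10.4, 9.6, 10.1, 9.9, 10.3, 10.0])
n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)                     # sample standard deviation

alpha = 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 1)    # t critical value, n-1 degrees of freedom

moe = t_crit * s / np.sqrt(n)
print(f"95% CI for the mean: ({x_bar - moe:.3f}, {x_bar + moe:.3f})")
```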

Confidence Interval Formulas

  1. General CI formula: $\bar{x} \pm z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$

  2. Lower CI limit: $\bar{x} - z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$

  3. Upper CI limit: $\bar{x} + z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}$
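
As a hypothetical worked example (all values assumed for illustration), take $\bar{x} = 50$, $\sigma = 10$, $n = 25$ and a 95% confidence level ($z_{\alpha/2} = 1.96$):

$$\text{MoE} = 1.96 \cdot \frac{10}{\sqrt{25}} = 3.92, \qquad \text{Lower limit} = 50 - 3.92 = 46.08, \qquad \text{Upper limit} = 50 + 3.92 = 53.92$$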


Confidence Interval for the Difference Between Two Means

Consider two samples of sizes $n_1$ and $n_2$, where:

$\bar{X}_1$ and $\bar{X}_2$ are distributed as $N\!\left(\mu_1, \dfrac{\sigma_1^2}{n_1}\right)$ and $N\!\left(\mu_2, \dfrac{\sigma_2^2}{n_2}\right)$, respectively.

Distribution of the Difference

The difference $\bar{X}_1 - \bar{X}_2$ is distributed as:

$$\bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

where:

$$\sigma_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

Confidence Interval Level

The confidence interval for the difference between the two means is given by:

$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
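
A minimal sketch (assuming SciPy; the summary statistics are made up, and the population standard deviations are treated as known) of this interval:

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical summary statistics for two independent samples
x1_bar, sigma1, n1 = 105.0, 12.0, 40
x2_bar, sigma2, n2 = 100.0, 10.0, 35

alpha = 0.05
z = norm.ppf(1 - alpha / 2)

diff = x1_bar - x2_bar
se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)   # standard error of the difference
moe = z * se
print(f"95% CI for mu1 - mu2: ({diff - moe:.2f}, {diff + moe:.2f})")
```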


Confidence Interval for Paired Data (n ≤ 30)

The confidence interval for the mean difference is given by:

$$\bar{d} \pm t_{\alpha/2,\, n-1} \cdot \frac{s_d}{\sqrt{n}}$$

where $\bar{d}$ is the mean of the paired differences and $s_d$ is their sample standard deviation.
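
A short sketch (assuming SciPy and NumPy; the before/after values are made up) of a paired-data interval:

```python
import numpy as np
from scipy.stats import t

# Hypothetical paired measurements (e.g. before/after on the same subjects)
before = np.array([12.1, 11.8, 13.0, 12.4, 11.9, 12.7])
after  = np.array([11.6, 11.9, 12.2, 12.0, 11.5, 12.1])

d = after - before
n = len(d)
d_bar, s_d = d.mean(), d.std(ddof=1)

alpha = 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 1)
moe = t_crit * s_d / np.sqrt(n)
print(f"95% CI for the mean difference: ({d_bar - moe:.3f}, {d_bar + moe:.3f})")
```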


Factors Affecting Margin of Error (MoE)

The margin of error is calculated as:

$$\text{MoE} = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

To Reduce MoE

  • Increase the sample size $n$
  • Decrease the standard deviation $\sigma$
  • Increase $\alpha$ (i.e., lower the confidence level, which decreases $z_{\alpha/2}$)
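
An illustrative sketch (assuming SciPy; $\sigma$ is an assumed value) showing how the margin of error shrinks as $n$ grows and as the confidence level drops:

```python
from math import sqrt
from scipy.stats import norm

sigma = 15.0  # assumed population standard deviation

for conf in (0.99, 0.95, 0.90):
    z = norm.ppf(1 - (1 - conf) / 2)
    for n in (25, 100, 400):
        moe = z * sigma / sqrt(n)
        print(f"conf = {conf:.0%}, n = {n:>3}: MoE = {moe:.2f}")
```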

Hypothesis Testing

Introduction

Hypothesis testing is a statistical method used to make decisions based on data analysis. It involves formulating two competing hypotheses: the null hypothesis $H_0$ and the alternative hypothesis $H_1$. The aim is to assess the strength of the evidence against the null hypothesis using sample data.

Hypotheses

  • Null Hypothesis ($H_0$): This hypothesis states that there is no effect or no difference, serving as the default assumption. For example, it could assert that the means of two populations are equal.
  • Alternative Hypothesis ($H_1$): This hypothesis represents what we aim to prove, suggesting that there is an effect or a difference. For instance, it might claim that the means of two populations are not equal.

Critical Values

Critical values are thresholds that define the boundaries of the acceptance and rejection regions for the null hypothesis. In a standard normal distribution:

  • The critical value $+z_{\alpha/2}$ corresponds to the upper threshold.
  • The critical value $-z_{\alpha/2}$ corresponds to the lower threshold.

Acceptance and Rejection Regions

  • Acceptance Region: This is the range of values for which we do not reject the null hypothesis, bounded by the critical values.
  • Rejection Region: This is the range of values for which we reject the null hypothesis. If the test statistic falls within this region, it indicates that the observed data is inconsistent with .

Decision Rule

The decision rule is based on comparing the p-value with a predetermined significance level $\alpha$, typically set at $\alpha = 0.05$:

  • If the p-value is less than $\alpha$:
    • Reject $H_0$: There is sufficient evidence to support the alternative hypothesis.
  • If the p-value is greater than or equal to $\alpha$:
    • Fail to reject $H_0$: There is not enough evidence to reject the null hypothesis.

p-value Calculation

The p-value quantifies the probability of obtaining a test statistic as extreme as, or more extreme than, the observed value under the assumption that $H_0$ is true. For a two-tailed test it is calculated as:

$$p = P(Z \le -|z_{\text{obs}}|) + P(Z \ge |z_{\text{obs}}|) = 2\,P(Z \ge |z_{\text{obs}}|)$$

That is, the p-value is the sum of the probabilities in both tails of the distribution for a two-tailed test, or the probability in the single relevant tail for a one-tailed test.
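
A minimal sketch (assuming SciPy; the observed test statistic is made up) of computing a two-tailed p-value and applying the decision rule:

```python
from scipy.stats import norm

z_obs = 2.1   # hypothetical observed test statistic
alpha = 0.05

# Two-tailed p-value: probability in both tails beyond |z_obs|
p_value = 2 * (1 - norm.cdf(abs(z_obs)))
print(f"p-value = {p_value:.4f}")

if p_value < alpha:
    print("Reject H0: the result is statistically significant.")
else:
    print("Fail to reject H0: not enough evidence against H0.")
```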


Statistical Concepts: p-Value, Statistical Significance, Interpolation, and Extrapolation

p-Value and Statistical Significance

The p-value is a critical concept in hypothesis testing. It quantifies the probability of observing a test statistic at least as extreme as the one obtained, assuming the null hypothesis is true. A smaller p-value indicates stronger evidence against the null hypothesis.

Statistical Significance: A result is considered statistically significant if the p-value is less than or equal to a predetermined significance level $\alpha$ (commonly set at 0.05). In such cases, researchers reject the null hypothesis, suggesting that the observed effect is unlikely to be due to random chance.

Interpretation of p-Values

  • $p \le 0.01$: Strong evidence against $H_0$, indicating a significant effect.
  • $0.01 < p \le 0.05$: Moderate evidence against $H_0$; results are significant but require cautious interpretation.
  • $p > 0.05$: Insufficient evidence to reject $H_0$; the results may be attributed to chance.

Interpolation

Interpolation is the method of estimating unknown values that fall within the range of known data points. It is widely used in statistics, mathematics, and various fields to fill gaps in data. For instance, if we have temperature readings at 10 AM and 12 PM, we can interpolate to estimate the temperature at 11 AM.
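
A tiny sketch (assuming NumPy; the readings are made up) of linearly interpolating the 11 AM temperature from the 10 AM and 12 PM readings:

```python
import numpy as np

# Known readings: hours (24h clock) and temperatures in °C (hypothetical)
hours = np.array([10, 12])
temps = np.array([18.0, 22.0])

# Estimate the temperature at 11 AM, which lies inside the known range
temp_11am = np.interp(11, hours, temps)
print(temp_11am)  # 20.0
```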

Extrapolation

Extrapolation is the process of estimating unknown values outside the range of known data points. While it can provide insights into trends and future behavior, it carries greater risk than interpolation, as it assumes that the established relationship continues beyond the observed data.
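
For contrast, a sketch (same assumed readings) of extrapolating beyond the observed range with a linear fit; the caution above applies, since the linear trend may not hold past 12 PM:

```python
import numpy as np

hours = np.array([10, 12])
temps = np.array([18.0, 22.0])

# Fit a straight line and evaluate it at 2 PM, outside the observed range
slope, intercept = np.polyfit(hours, temps, deg=1)
temp_2pm = slope * 14 + intercept
print(temp_2pm)  # 26.0 -- only valid if the linear trend continues
```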