Confidence Intervals
Overview
A confidence interval (CI) provides a range of values that is likely to contain the true parameter of interest. The level of confidence, denoted as $(1−α)$, represents the proportion of confidence intervals that will contain the true parameter if we were to repeat the study multiple times.
For example, if we have a $(1-\alpha) = 90\%$ CI, this implies $\alpha = 10\%$.
Consequently, for a two-sided interval, each tail contains $\alpha/2 = 5\%$ of the probability.
Finding the Critical Value
When seeking the value of $z$ such that the cumulative probability satisfies $P(Z < z) = 0.05$, we are interested in the left tail of the standard normal distribution.
The corresponding $z$-value for this cumulative probability is approximately $z \approx -1.645$. This value indicates that 5% of the data lies to the left of $-1.645$.
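This critical value can be checked numerically; a minimal sketch using SciPy's standard normal distribution (the library choice is an assumption):

```python
# Find z with P(Z < z) = 0.05 via the inverse CDF (percent-point function).
from scipy.stats import norm

alpha = 0.05
z_left = norm.ppf(alpha)           # left-tail critical value
print(round(z_left, 3))            # -1.645

# Sanity check: 5% of the mass lies to the left of this value.
print(round(norm.cdf(z_left), 2))  # 0.05
```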
One-Sided Confidence Intervals
| $\alpha$ | $Z_{\alpha}$ | CI Level |
|----------|--------------|----------|
| 10%      | 1.28         | 90%      |
| 5%       | 1.645        | 95%      |
| 1%       | 2.33         | 99%      |
Two-Sided Confidence Intervals
| $\alpha$ | $Z_{\alpha/2}$ | CI Level |
|----------|----------------|----------|
| 10%      | 1.645          | 90%      |
| 5%       | 1.96           | 95%      |
| 1%       | 2.58           | 99%      |
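The table values above can be reproduced from the inverse normal CDF; a quick sketch, assuming SciPy is available:

```python
# Reproduce the one- and two-sided critical values from the tables.
from scipy.stats import norm

for alpha in (0.10, 0.05, 0.01):
    one_sided = norm.ppf(1 - alpha)      # Z_alpha
    two_sided = norm.ppf(1 - alpha / 2)  # Z_{alpha/2}
    print(alpha, round(one_sided, 3), round(two_sided, 3))
```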
Confidence Intervals for Proportions
Let $X \sim B(n, p)$, where:
- $np > 10$ (ensures sufficient sample size)
- $n(1-p) > 10$
The sample proportion is calculated as $\hat{p} = \frac{x}{n}$, where $x$ is the number of successes.
For a $(1-\alpha)$ confidence interval:

Adjusted sample size: $\tilde{n} = n + 4$

Adjusted proportion: $\tilde{p} = \frac{x+2}{\tilde{n}} = \frac{x+2}{n+4}$

The confidence interval is then given by: $p = \tilde{p} \pm Z_{\alpha/2}\sqrt{\frac{\tilde{p}(1-\tilde{p})}{\tilde{n}}}$
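A worked sketch of the adjusted interval, with hypothetical counts `x` and `n` and a 95% level:

```python
# Adjusted ("plus four") CI for a proportion, per the formulas above.
import math
from scipy.stats import norm

x, n = 40, 100             # successes, trials (illustrative data)
alpha = 0.05

n_tilde = n + 4            # adjusted sample size  n~
p_tilde = (x + 2) / n_tilde  # adjusted proportion p~
z = norm.ppf(1 - alpha / 2)
moe = z * math.sqrt(p_tilde * (1 - p_tilde) / n_tilde)

lower, upper = p_tilde - moe, p_tilde + moe
print(round(lower, 3), round(upper, 3))
```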
Small Sample Confidence Interval for Population Mean
When the sample size $n \le 30$ and the population is normally distributed, the $t$-distribution is used to calculate the CI for the population mean:
$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
Where:
- $\bar{x}$ is the sample mean.
- $s$ is the standard deviation of the sample.
- $\sigma$ is the population standard deviation.
- $n$ is the size of the sample.
- $n-1$ represents the degrees of freedom.

Warning: If a small sample is drawn and the population standard deviation $\sigma$ is known, use the Z-score instead of $t_{n-1}$:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Confidence Interval Formulas

General CI formula: $CI_{\mu} = \bar{x} \pm t_{n-1,\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$

Lower CI limit: $\text{Lower CI} = \bar{x} - t_{n-1,\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$

Upper CI limit: $\text{Upper CI} = \bar{x} + t_{n-1,\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$
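These formulas can be sketched in code; the sample values below are illustrative, and SciPy supplies the $t$ critical value:

```python
# Small-sample t-based CI for the mean, per the formulas above.
import math
from scipy.stats import t

sample = [4.2, 5.1, 4.8, 5.5, 4.9, 5.0, 4.6]  # hypothetical data, n <= 30
n = len(sample)
xbar = sum(sample) / n
# Sample standard deviation (n - 1 in the denominator).
s = math.sqrt(sum((v - xbar) ** 2 for v in sample) / (n - 1))

alpha = 0.05
t_crit = t.ppf(1 - alpha / 2, df=n - 1)       # t_{n-1, alpha/2}
moe = t_crit * s / math.sqrt(n)

print(round(xbar - moe, 3), round(xbar + moe, 3))
```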
Confidence Interval for the Difference Between Two Means
Consider two samples $x_1, x_2, \ldots, x_{n_X}$ and $y_1, y_2, \ldots, y_{n_Y}$, where $X$ and $Y$ are distributed as $N(\mu_X, \sigma_X^2)$ and $N(\mu_Y, \sigma_Y^2)$, respectively.
Distribution of the Difference
The difference $X - Y$ is distributed as: $X - Y \sim N(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$
where:
- $\mu_{X-Y} = \mu_X - \mu_Y$
- $\sigma_{X-Y}^2 = \sigma_X^2 + \sigma_Y^2$ (for independent variables, the variances add; the standard deviations do not)
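A quick simulation illustrating that the variances add; the parameters are illustrative:

```python
# Empirical check: for independent X and Y, Var(X - Y) = Var(X) + Var(Y).
import numpy as np

rng = np.random.default_rng(0)
mu_x, sigma_x = 10.0, 2.0
mu_y, sigma_y = 4.0, 3.0

x = rng.normal(mu_x, sigma_x, size=200_000)
y = rng.normal(mu_y, sigma_y, size=200_000)
d = x - y

print(round(d.mean(), 2))  # close to mu_x - mu_y = 6
print(round(d.var(), 1))   # close to sigma_x**2 + sigma_y**2 = 13
```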
Confidence Interval Level
The confidence interval for the difference between the two means is given by:
$\bar{X} - \bar{Y} \pm Z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}$

Confidence Interval for Paired Data (n ≤ 30)
The confidence interval for the mean difference $\mu_D$ is given by:
$\mu_D = \bar{D} \pm t_{n-1,\alpha/2} \cdot \frac{S_D}{\sqrt{n}}$

Factors Affecting Margin of Error (MoE)
The margin of error is calculated as:
$MoE = Z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
To Reduce MoE:
- Decrease $(1-\alpha)$ (a lower confidence level gives a smaller $Z_{\alpha/2}$)
- Decrease $\sigma$
- Increase $n$
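A short sketch of how the margin of error responds to $n$, holding $\sigma$ and $\alpha$ fixed (the values are illustrative):

```python
# MoE = Z_{alpha/2} * sigma / sqrt(n) for increasing sample sizes.
import math
from scipy.stats import norm

sigma, alpha = 10.0, 0.05
z = norm.ppf(1 - alpha / 2)

for n in (25, 100, 400):
    moe = z * sigma / math.sqrt(n)
    print(n, round(moe, 2))
```

Because $n$ enters through a square root, quadrupling the sample size only halves the margin of error.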
Hypothesis Testing
Introduction
Hypothesis testing is a statistical method used to make decisions based on data analysis. It involves formulating two competing hypotheses: the null hypothesis $H_{0}$ and the alternative hypothesis $H_{1}$. The aim is to assess the strength of the evidence against the null hypothesis using sample data.
Hypotheses

Null Hypothesis $H_{0}$: This hypothesis states that there is no effect or no difference, serving as the default assumption. For example, it could assert that the means of two populations are equal.

Alternative Hypothesis $H_{1}$: This hypothesis represents what we aim to prove, suggesting that there is an effect or a difference. For instance, it might claim that the means of two populations are not equal.
Critical Values
Critical values are thresholds that define the boundaries of the acceptance and rejection regions for the null hypothesis. In a standard normal distribution:
 The critical value $Z_{c}$ corresponds to the upper threshold.
 The critical value $−Z_{c}$ corresponds to the lower threshold.
Acceptance and Rejection Regions

Acceptance Region: This is the range of values for which we do not reject the null hypothesis, bounded by the critical values.

Rejection Region: This is the range of values for which we reject the null hypothesis. If the test statistic falls within this region, it indicates that the observed data is inconsistent with $H_{0}$.
Decision Rule
The decision rule is based on comparing the p-value with a predetermined significance level $\alpha$, typically set at $0.05$:

If the p-value is less than $\alpha$:
- Reject $H_0$: there is sufficient evidence to support the alternative hypothesis.

If the p-value is greater than or equal to $\alpha$:
- Fail to reject $H_0$: there is not enough evidence to reject the null hypothesis.
p-value Calculation
The p-value quantifies the probability of obtaining a test statistic as extreme as, or more extreme than, the observed value under the assumption that $H_0$ is true. For a two-tailed test it is calculated as:
$p\text{-value} = P(\text{right tail}) + P(\text{left tail})$
That is, the p-value sums the probabilities in both tails of the distribution; for a one-tailed test, only the single relevant tail is used.
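As a sketch, the two-tailed p-value for a hypothetical observed $z$ statistic can be computed from the normal CDF:

```python
# p-value from an observed z statistic, per the definition above.
from scipy.stats import norm

z_obs = 2.1                            # hypothetical test statistic
p_right = 1 - norm.cdf(abs(z_obs))     # upper-tail area
p_two_tailed = 2 * p_right             # both tails, by symmetry
p_one_tailed = p_right                 # single relevant tail

print(round(p_two_tailed, 4))
print(round(p_one_tailed, 4))
```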
Statistical Concepts: p-Value, Statistical Significance, Interpolation, and Extrapolation
p-Value and Statistical Significance
The p-value is a critical concept in hypothesis testing. It quantifies the probability of observing a test statistic at least as extreme as the one obtained, assuming the null hypothesis $H_0$ is true. A smaller p-value indicates stronger evidence against the null hypothesis.
Statistical Significance: A result is considered statistically significant if the p-value is less than or equal to a predetermined significance level $\alpha$ (commonly set at 0.05). In such cases, researchers reject the null hypothesis, suggesting that the observed effect is unlikely to be due to random chance.
Interpretation of p-Values:
- $p < 0.01$: strong evidence against $H_0$, indicating a significant effect.
- $0.01 \le p < 0.05$: moderate evidence against $H_0$; results are significant but require cautious interpretation.
- $p \ge 0.05$: insufficient evidence to reject $H_0$; the results may be attributed to chance.
Interpolation
Interpolation is the method of estimating unknown values that fall within the range of known data points. It is widely used in statistics, mathematics, and various fields to fill gaps in data. For instance, if we have temperature readings at 10 AM and 12 PM, we can interpolate to estimate the temperature at 11 AM.
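The temperature example can be sketched with NumPy's linear interpolation; the readings are hypothetical:

```python
# Estimate the 11 AM temperature from readings at 10 AM and 12 PM.
import numpy as np

hours = [10, 12]        # 10 AM and 12 PM
temps = [18.0, 24.0]    # observed temperatures (illustrative)

t_11am = np.interp(11, hours, temps)
print(t_11am)           # midpoint of the two readings: 21.0
```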
Extrapolation
Extrapolation is the process of estimating unknown values outside the range of known data points. While it can provide insights into trends and future behavior, it carries greater risk than interpolation, as it assumes that the established relationship continues beyond the observed data.