Estimation and Inference

Quantitative Methods

Central Limit Theorem and Inference

Learning Outcome Statement:

Explain the central limit theorem and its importance for the distribution and standard error of the sample mean.

Summary:

The Central Limit Theorem (CLT) states that the distribution of sample means approximates a normal distribution as the sample size becomes large, regardless of the population's distribution, provided it has a finite variance. This theorem is crucial for constructing confidence intervals and hypothesis testing. The standard error of the sample mean, which measures the precision of the sample mean as an estimator of the population mean, decreases as the sample size increases.

Key Concepts:

Central Limit Theorem

The Central Limit Theorem asserts that the sampling distribution of the sample mean will approximate a normal distribution with a mean equal to the population mean and a variance equal to the population variance divided by the sample size, as the sample size becomes large.

Standard Error of the Sample Mean

The standard error of the sample mean is the standard deviation of its sampling distribution, indicating the variability of the sample mean from the population mean. It is calculated using the population standard deviation (if known) or the sample standard deviation (if the population standard deviation is unknown).

Formulas:

Standard Error of the Sample Mean (known population standard deviation)

\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}

This formula calculates the standard error of the sample mean when the population standard deviation is known.

Variables:
\sigma:
population standard deviation
n:
sample size
Units: units of the data
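As a quick numeric check of the formula, the following sketch (with made-up values for \sigma and n) shows how the standard error shrinks with the square root of the sample size:

```python
import math

# Assumed values, for illustration only
sigma = 20.0   # population standard deviation
n = 100        # sample size

# Standard error of the sample mean with a known population std. dev.
se_known = sigma / math.sqrt(n)
print(se_known)  # 2.0

# Quadrupling the sample size only halves the standard error
se_larger_n = sigma / math.sqrt(4 * n)
print(se_larger_n)  # 1.0
```

Note the square-root relationship: precision improves with larger samples, but at a diminishing rate.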

Standard Error of the Sample Mean (unknown population standard deviation)

s_{\bar{X}} = \frac{s}{\sqrt{n}}

This formula is used to estimate the standard error of the sample mean when the population standard deviation is unknown, using the sample standard deviation instead.

Variables:
s:
sample standard deviation
n:
sample size
Units: units of the data

Sample Variance

s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}

This formula calculates the sample variance, which is used to estimate the population variance when it is unknown.

Variables:
X_i:
ith data point
\bar{X}:
sample mean
n:
sample size
Units: squared units of the data
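The sample variance and the estimated standard error can be tied together in a short sketch. The returns data below are hypothetical, chosen only to illustrate the calculations; Python's `statistics` module already applies the n - 1 divisor:

```python
import statistics

# Hypothetical sample of monthly returns (%), for illustration only
sample = [1.2, -0.5, 0.8, 2.1, -1.0, 0.4, 1.5, 0.9]
n = len(sample)

x_bar = statistics.mean(sample)      # sample mean
s2 = statistics.variance(sample)     # sample variance (divides by n - 1)
s = statistics.stdev(sample)         # sample standard deviation
se = s / n ** 0.5                    # estimated standard error of the mean

# The library result matches the n - 1 formula written out explicitly
manual_s2 = sum((x - x_bar) ** 2 for x in sample) / (n - 1)
```

Using n - 1 rather than n makes s^2 an unbiased estimator of the population variance.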

Bootstrapping and Empirical Sampling Distributions

Learning Outcome Statement:

Describe the use of resampling (bootstrap, jackknife) to estimate the sampling distribution of a statistic.

Summary:

The learning outcome focuses on understanding and applying resampling techniques, specifically bootstrap and jackknife, to estimate the sampling distribution of statistics. Bootstrap involves drawing repeated samples from the original data with replacement to estimate parameters like the standard error of the sample mean. Jackknife, on the other hand, systematically leaves out one observation at a time to reduce bias and estimate parameters. These methods are particularly useful when traditional assumptions for analytical methods do not hold or are difficult to apply.

Key Concepts:

Bootstrap

Bootstrap is a resampling method where samples are drawn with replacement from the original dataset to create a large number of 'resamples'. This method is used to estimate the sampling distribution of a statistic and is useful for calculating measures like the standard error of the sample mean or constructing confidence intervals.

Jackknife

Jackknife is another resampling technique where each sample is created by omitting one observation from the dataset. This method is often used to reduce the bias of estimators and can be used to estimate the standard error and confidence intervals of estimators.
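A minimal jackknife sketch, using a made-up sample: the statistic (here, the mean) is recomputed n times, each time omitting one observation, and the standard error is estimated from the dispersion of those leave-one-out values. For the sample mean, this estimate coincides with s/\sqrt{n}:

```python
import statistics

# Hypothetical sample, for illustration only
sample = [4.0, 7.0, 13.0, 16.0, 10.0]
n = len(sample)

# Jackknife: recompute the mean n times, leaving out one observation each time
leave_one_out_means = [
    statistics.mean(sample[:i] + sample[i + 1:]) for i in range(n)
]

# Jackknife estimate of the standard error of the sample mean
jk_mean = statistics.mean(leave_one_out_means)
se_jackknife = ((n - 1) / n * sum((m - jk_mean) ** 2
                                  for m in leave_one_out_means)) ** 0.5
```

The (n - 1)/n scaling compensates for the fact that leave-one-out means vary much less than the underlying observations.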

Standard Error of the Sample Mean

The standard error of the sample mean measures the dispersion of the sample means around the population mean. It is crucial for constructing confidence intervals and hypothesis testing.

Formulas:

Standard Error of the Sample Mean (Bootstrap)

s_{\bar{X}} = \sqrt{\frac{1}{B-1} \sum_{b=1}^{B} (\hat{\theta}_b - \bar{\theta})^2}

This formula calculates the standard error of the sample mean using bootstrap samples. It uses the variance of the resample means around the overall mean of these resamples.

Variables:
s_{\bar{X}}:
estimate of the standard error of the sample mean
B:
number of bootstrap samples
\hat{\theta}_b:
mean of a bootstrap sample
\bar{\theta}:
mean across all bootstrap sample means
Units: units of the data
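The bootstrap formula can be sketched directly with the standard library. The sample and the resample count B are made up for illustration; `random.choices` draws with replacement, which is exactly what the bootstrap requires:

```python
import random
import statistics

random.seed(42)  # fixed seed so this sketch is reproducible

# Hypothetical original sample, for illustration only
sample = [1.2, -0.5, 0.8, 2.1, -1.0, 0.4, 1.5, 0.9]
B = 1000  # number of bootstrap resamples

# Draw B resamples of the same size, with replacement; record each mean
boot_means = [
    statistics.mean(random.choices(sample, k=len(sample))) for _ in range(B)
]

# Standard error estimate: dispersion of the resample means (divisor B - 1)
theta_bar = statistics.mean(boot_means)
se_bootstrap = (sum((m - theta_bar) ** 2 for m in boot_means) / (B - 1)) ** 0.5
```

With a well-behaved sample like this one, the bootstrap estimate should land close to the analytical s/\sqrt{n}; the method earns its keep when no such closed-form expression is available.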

Sampling Methods

Learning Outcome Statement:

Compare and contrast simple random, stratified random, cluster, convenience, and judgmental sampling and their implications for sampling error in an investment problem.

Summary:

This LOS explores different sampling methods including simple random sampling, stratified random sampling, cluster sampling, and non-probability sampling methods like convenience and judgmental sampling. It discusses the implications of these methods on sampling error, which is crucial for making accurate inferences about a population from a sample. The content also delves into the practical applications of these sampling methods in various investment scenarios, highlighting their advantages, limitations, and suitability depending on the nature of the population and the specific requirements of the study.

Key Concepts:

Simple Random Sampling

A sampling method where each member of a population has an equal chance of being selected. This method is best used when the population is homogeneous.

Stratified Random Sampling

Involves dividing the population into strata and then drawing random samples from each stratum. This method is useful when the population is heterogeneous and the strata are homogeneous.

Cluster Sampling

The population is divided into clusters, and a random sample of these clusters is taken to represent the whole population. It is cost-effective for large populations but might offer less accuracy compared to other methods.

Non-Probability Sampling

Includes methods like convenience sampling (selection based on ease of access) and judgmental sampling (selection based on a researcher's judgment), which do not provide all the members of the population an equal chance of being selected, potentially leading to sampling bias.
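The contrast between simple random and stratified random sampling can be made concrete with a small sketch. The population of labeled stocks and the sector split are invented for illustration; stratified sampling allocates draws to each stratum in proportion to its size, which guarantees the sample mirrors the population's sector mix:

```python
import random

random.seed(7)  # fixed seed so this sketch is reproducible

# Hypothetical population: 100 stocks, 60 tech and 40 utility
population = [("stock%02d" % i, "tech" if i < 60 else "utility")
              for i in range(100)]

# Simple random sampling: every member has an equal chance of selection,
# so the sector mix of the sample can drift from the population's
simple_sample = random.sample(population, 10)

# Stratified random sampling: group by sector, then sample each stratum
# in proportion to its share of the population
strata = {}
for name, sector in population:
    strata.setdefault(sector, []).append((name, sector))

stratified_sample = []
for sector, members in strata.items():
    k = round(10 * len(members) / len(population))  # proportional allocation
    stratified_sample.extend(random.sample(members, k))
```

Here the stratified sample always contains exactly 6 tech and 4 utility stocks, while the simple random sample's mix varies from draw to draw.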

Sampling Error

The error that arises from using a sample to estimate characteristics of a population: the difference between the observed value of a statistic and the true value that the statistic is intended to estimate.

Formulas:

Sample Mean

\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i

Calculates the average of all sampled values, used as an estimate of the population mean.

Variables:
X_i:
the value of the ith observation in the sample
n:
the total number of observations in the sample
Units: units of the data points