Tests of Independence Using Contingency Table Data
Learning Outcome Statement:
explain tests of independence based on contingency table data
Summary:
This LOS covers the methodology to test the independence of classifications in categorical data using contingency tables. It involves calculating expected frequencies under the assumption of independence and comparing them to observed frequencies using the chi-square test statistic. The chi-square test helps determine if there is a significant association between the classifications.
Key Concepts:
Contingency Table
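A contingency table, or two-way table, displays the joint frequency distribution of two categorical variables. The categories of one variable form the rows, the categories of the other form the columns, and each cell counts the observations that fall into that combination. Laying the counts out this way helps in assessing the relationship between the variables.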
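As a minimal sketch, a contingency table can be built by counting how many observations fall into each (row category, column category) pair. The function name `contingency_table` and the fund-style data below are hypothetical and serve only as an illustration.

```python
from collections import Counter


def contingency_table(pairs):
    """Cross-tabulate (row_category, column_category) pairs into a table of counts.

    Returns the sorted row labels, the sorted column labels, and a 2-D list
    where table[i][j] is the number of observations in row i and column j.
    """
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    counts = Counter(pairs)  # combinations never observed count as 0
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical observations: (fund style, performance tier)
observations = [
    ("growth", "top"), ("growth", "bottom"), ("growth", "top"),
    ("value", "top"), ("value", "bottom"), ("value", "top"),
]
rows, cols, table = contingency_table(observations)
print(rows)   # ['growth', 'value']
print(cols)   # ['bottom', 'top']
print(table)  # [[1, 2], [1, 2]]
```

Every observation lands in exactly one cell, so the cell counts always sum to the number of observations.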
Chi-Square Test of Independence
This nonparametric test assesses whether observed frequencies in a contingency table differ significantly from expected frequencies, which are calculated assuming no association between the variables.
Expected Frequency
Expected frequencies are calculated under the hypothesis of independence. They represent the expected counts in each cell of the contingency table if the row and column variables are independent.
Degrees of Freedom
In a chi-square test, degrees of freedom are calculated as (number of rows - 1) * (number of columns - 1). This value is crucial in determining the critical value from the chi-square distribution.
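A short sketch of the degrees-of-freedom rule, assuming the table is stored as a nested list with one inner list per row; the function name `degrees_of_freedom` and the counts are hypothetical.

```python
def degrees_of_freedom(table):
    """Degrees of freedom for a chi-square test of independence: (rows - 1) * (columns - 1)."""
    n_rows = len(table)
    n_cols = len(table[0])
    return (n_rows - 1) * (n_cols - 1)


# A hypothetical 3 x 4 table of counts
table = [
    [5, 8, 2, 7],
    [6, 3, 9, 4],
    [7, 5, 6, 8],
]
print(degrees_of_freedom(table))  # 6
```

Note that the count depends only on the table's shape, not on the frequencies inside it.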
Critical Value and Decision Rule
The critical value is the point on the chi-square distribution (with the degrees of freedom above) that leaves an area equal to the chosen level of significance in the right tail. The test is one-tailed: reject the null hypothesis of independence if the chi-square statistic exceeds the critical value; otherwise, fail to reject it.
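The sketch below finds the right-tail critical value numerically with only the Python standard library, then applies the decision rule. It assumes integer degrees of freedom and computes the chi-square upper-tail probability from the closed-form recurrence for the regularized upper incomplete gamma function, inverting it by bisection. In practice a statistics library (for example SciPy's `scipy.stats.chi2.ppf`) would normally supply the critical value. The function names here are hypothetical.

```python
import math


def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variable with integer df >= 1.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function and y = x / 2, starting from
    Q(1, y) = exp(-y) for even df or Q(1/2, y) = erfc(sqrt(y)) for odd df.
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def chi2_critical_value(alpha, df):
    """Right-tail critical value: the x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_upper_tail(hi, df) > alpha:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_upper_tail(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def reject_independence(chi2_stat, df, alpha=0.05):
    """Decision rule: reject H0 (independence) if the statistic exceeds the critical value."""
    return chi2_stat > chi2_critical_value(alpha, df)


print(round(chi2_critical_value(0.05, 1), 3))  # 3.841
print(round(chi2_critical_value(0.05, 4), 3))  # 9.488
print(reject_independence(12.0, df=4))         # True
print(reject_independence(0.79, df=1))         # False
```

Bisection is used here only because it is simple and converges reliably for a tail probability that decreases monotonically in x. It stands in for a library routine and is not a prescribed method.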
Formulas:
Chi-Square Test Statistic
This formula calculates the chi-square statistic by summing the squared differences between observed and expected frequencies, scaled by the expected frequencies.
$$\chi^2 = \sum_{\text{all } m \text{ cells}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Variables:
- $O_{ij}$: Observed frequency in cell (i, j)
- $E_{ij}$: Expected frequency in cell (i, j)
- $m$: Total number of cells in the contingency table
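A minimal sketch of the statistic in Python, assuming the observed and expected tables are given as nested lists of the same shape. The function name `chi_square_statistic` and the example counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O_ij - E_ij)**2 / E_ij over every cell of two tables of the same shape."""
    if len(observed) != len(expected) or any(
        len(o_row) != len(e_row) for o_row, e_row in zip(observed, expected)
    ):
        raise ValueError("observed and expected tables must have the same shape")
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


# Hypothetical 2 x 2 table; the expected counts follow from its row totals (30, 70),
# column totals (40, 60) and overall total (100) under independence.
observed = [[10, 20], [30, 40]]
expected = [[12.0, 18.0], [28.0, 42.0]]
print(round(chi_square_statistic(observed, expected), 4))  # 0.7937
```

Each cell's squared deviation is divided by its own expected count, so a given gap between observed and expected weighs more in a cell where few observations were expected.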
Expected Frequency Calculation
This formula calculates the expected frequency for each cell of the contingency table under the assumption of independence between the row and column classifications.
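$$E_{ij} = \frac{(\text{Total for row } i) \times (\text{Total for column } j)}{\text{Overall total}}$$
Variables:
- $i$: Row index
- $j$: Column index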
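A minimal sketch that applies the expected-frequency formula to every cell at once, assuming the observed counts are stored as a nested list with one inner list per row. The function name `expected_frequencies` and the counts are hypothetical.

```python
def expected_frequencies(observed):
    """E_ij = (row i total * column j total) / overall total, for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


observed = [[10, 20], [30, 40]]  # hypothetical counts
print(expected_frequencies(observed))  # [[12.0, 18.0], [28.0, 42.0]]
```

Because each row of the result spreads that row's total across the columns in proportion to the column totals, the expected table keeps the same row and column totals as the observed table.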