Lab #6 Example

An Example:
An R x C Table with the Chi-Square Test of Independence

Does religious preference vary by region of the country? Regional differences in religious preference are found in many parts of the world such as India, the former Yugoslavia, and Ireland. Is religious preference independent of geographical region in the United States?

In this example, the variables relig and region4 (from the gss 93 subset file) are used as table variables to form a table with five rows and four columns. The five religions are Protestant, Catholic, Jewish, None, and Other; the regions are Northeast, Midwest, South, and West. This table structure is called a general R x C table, with no ordering across the categories of its variables. Numbers code the categories of both table variables, but SPSS uses their respective labels in the output. The Pearson chi-square statistic is requested to test the independence of table rows and columns; that is, to test the premise that religious preference and region are independent of each other. Sometimes this task is expressed as testing equality of proportions across rows (or columns).

To produce this output, from the menus choose:
         Analyze
            Descriptive Statistics
               Crosstabs...
         Click Reset to restore the dialog box defaults, and then select:
            · Row(s): relig
            · Column(s): region4
         Click Statistics... and then:
            · Put a check beside "Chi-square"

Case Processing Summary. This panel describes the number of cases used in each table you request. The total number of cases in the gss 93 subset file is 1500, and for 756 of these cases, the values of both relig and region4 are valid. One or both values are missing for the remaining 744 cases. Thus, only about half of the sample (50.4%) is used. With so many values missing, you should be concerned that the results might be biased. For example, people from certain groups may feel uncomfortable about stating their religion and skip the question. Using the Frequencies procedure to check the number of missing values, you find that relig has fairly complete data but that region4 has many missing values.

Religious Preference by Region Crosstabulation. In this sample of 756 people, 480 are Protestant, 15 are Jewish, 15 are Other, and so on. These counts are totals of the cell frequencies in their respective rows. The counts along the bottom are totals for each column. The row and column totals are known as marginals because they summarize the counts within each table variable independently of the other variable. The cell counts in the body of the table result from crosstabulating the two table variables. For example, the upper left cell shows 54 Protestants who live in the Northeast, the next cell in that row shows 140 who live in the Midwest, and so on. These counts are the observed number for each cell.
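The crosstabulation and its marginals can be sketched in a few lines of Python. This is only an illustration: the records below are hypothetical, not drawn from the gss 93 subset file, and the variable names are assumptions chosen to mirror the text. Cases with a missing value on either variable are dropped first, as in the Case Processing Summary.

```python
from collections import Counter

# Hypothetical (relig, region4) records; None marks a missing value.
records = [
    ("Protestant", "South"),
    ("Catholic", "Northeast"),
    ("Protestant", None),      # region missing: case is excluded
    ("Jewish", "West"),
    (None, "Midwest"),         # religion missing: case is excluded
    ("Protestant", "South"),
    ("Catholic", "Midwest"),
]

# Keep only cases where both table variables are valid.
valid = [(r, c) for r, c in records if r is not None and c is not None]

cell_counts = Counter(valid)                 # observed count for each cell
row_totals = Counter(r for r, _ in valid)    # row marginals
col_totals = Counter(c for _, c in valid)    # column marginals

percent_valid = 100 * len(valid) / len(records)
print(cell_counts[("Protestant", "South")], row_totals["Catholic"], round(percent_valid, 1))
```

Each marginal is just the sum of the cell counts in its row or column, so the row totals and the column totals both add up to the number of valid cases.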

Chi-Square Tests. The null hypothesis for the Pearson chi-square test is that the row and column variables are independent of each other. By definition, two table variables are independent if the probability that a case falls in a specific cell is the product of its marginal probabilities. Using the probability that a subject is Protestant (480/756) and the probability that a subject lives in the Northeast (136/756), the probability for a case falling in the upper left cell is

(480 x 136)/(756^2) = 0.114

This probability is used to estimate the number of cases expected (under the hypothesis of independence) in each cell. The expected count is then compared with the observed count. To compute the expected number of cases, multiply the probability by the total sample size. The result equals the row total multiplied by the column total, divided by the total sample size: (480 x 136)/756, or 86.3 cases expected for this cell.
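The arithmetic for the upper left cell can be checked with a short Python sketch. The three totals come from the text; the variable names are illustrative.

```python
row_total = 480   # Protestants (row marginal)
col_total = 136   # Northeast residents (column marginal)
n = 756           # valid cases

# Under independence, the cell probability is the product of the marginals.
p_cell = (row_total / n) * (col_total / n)

# Expected count = cell probability times the total sample size,
# which simplifies to row total * column total / n.
expected = p_cell * n

print(round(p_cell, 3), round(expected, 1))   # 0.114 86.3
```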

The difference between the observed count of 54 and the expected count of 86.3 is large. Does this gap support the variables' independence? For an overall test of independence, the Pearson chi-square statistic repeats this process of comparing the observed number of cases with the number expected for each cell. After subtracting the expected count from the actual observed count for each cell, SPSS constructs the statistic by squaring the difference and dividing the result by the expected count. Thus, for the Pearson chi-square statistic, these quantities are summed across all cells:

chi-square = sum over all cells of (observed - expected)^2 / expected
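The summation described above can be sketched in Python as follows. The function names are hypothetical, and the 2 x 2 table is made-up data used only to exercise the code; it is not the religion by region table.

```python
def expected_counts(observed):
    """Expected counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Sum (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts (rows x columns).
table = [[10, 20], [30, 40]]
statistic = pearson_chi_square(table)
print(round(statistic, 4))
```

A table whose rows are exactly proportional, such as [[10, 20], [20, 40]], gives a statistic of 0 because every observed count equals its expected count.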

When the resulting chi-square statistic is large, the null hypothesis of independence is rejected. To define large, the sample statistic is compared to a critical point on the theoretical chi-square distribution that depends on the number of rows and columns in the table. This information is labeled df, for degrees of freedom. For an R x C table, the degrees of freedom are the number of rows minus 1, multiplied by the number of columns minus 1, or (r - 1)(c - 1).

For this table, df = (5 - 1)(4 - 1), or 12.

The computed chi-square statistic for this table is 109.1 and has an associated probability (p-value) or significance level of less than 0.0005 (the probability is not 0). Conventionally, if this probability is small enough (less than 0.05 or 0.01), the hypothesis of independence is rejected. Using these numbers alone, you could report that there is an association between religious preference and region.
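To see that the probability is tiny but not 0, the upper-tail probability can be computed directly. The sketch below is not how SPSS computes it; it relies on a closed form for the chi-square distribution that holds only when df is even (which is the case here, df = 12), and the function name is hypothetical.

```python
import math


def chi_square_p_value_even_df(statistic, df):
    """Upper-tail probability of a chi-square variable with even df.

    Uses P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!,
    which is valid only for positive even df.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = statistic / 2
    term = 1.0    # the i = 0 term of the sum
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i      # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


rows, cols = 5, 4
df = (rows - 1) * (cols - 1)                        # 12
p_value = chi_square_p_value_even_df(109.1, df)     # statistic from the text
print(df, p_value < 0.0005, p_value > 0)
```

For tables with odd df, a statistics library would be needed instead of this shortcut.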

However, if certain assumptions are not met, this probability can be distorted or misleading. Many researchers use the guideline that no cell should have an expected value less than 1.0 and that no more than 20% of the cells should have expected values less than 5 (for 2 x 2 tables, some say that no cell should have an expected value less than 5).

SPSS reports the minimum expected count, the number of cells with expected count < 5, and the % of cells with expected count < 5. In this table, the minimum expected count is 2.7, and eight cells (40%) have expected counts < 5. Clearly, the guideline is violated.
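The same three summary figures, and the guideline check, can be sketched in Python. The function name and the expected counts below are hypothetical; they are not the expected counts from this table.

```python
def check_expected_counts(expected):
    """Summarize expected counts against the common rule of thumb:
    no cell below 1, and no more than 20% of cells below 5."""
    cells = [e for row in expected for e in row]
    minimum = min(cells)
    n_small = sum(1 for e in cells if e < 5)
    pct_small = 100 * n_small / len(cells)
    guideline_met = minimum >= 1 and pct_small <= 20
    return minimum, n_small, pct_small, guideline_met


# Hypothetical expected counts for a 2 x 3 table.
expected = [
    [30.0, 12.0, 8.0],
    [6.0, 2.4, 1.6],
]
minimum, n_small, pct_small, guideline_met = check_expected_counts(expected)
print(minimum, n_small, round(pct_small, 1), guideline_met)
```

Here two of the six cells fall below 5, so the 20% limit is exceeded even though no cell is below 1.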

What should you do? Can you see a way to make the table less sparse? A total of 15 people are in the Other category. Because it is probably a mixture of religions, you can justify deleting it. The Jewish category also has very few subjects. If you delete it, however, you should be careful to indicate that any conclusions are restricted to the Protestant, Catholic, and None groups.