UCSC LAB 2
Crosstabulation with Non-Interval Variables
PURPOSE
- To learn how to perform a crosstabulation and practice formulating hypotheses.
- To appreciate how crosstabulation allows us to make comparisons relevant to our hypotheses.
- To introduce the logic of comparison.
MAIN POINTS
Crosstabulation
- Crosstabulation brings together the indicators for two variables and displays the relationship between them in a single table. Each column in the crosstab corresponds to a category of the independent variable, and each row corresponds to a category of the dependent variable. Hence the dependent variable goes on the left, and the independent variable goes on the top.
- Each cell represents a unique combination of categories from each of the variables. For example, in the table below, the cell “G” represents all the respondents who selected Category I for the independent variable and Category III for the dependent variable.
- The percentage in each cell is calculated by dividing the number of respondents in the cell by the total number of respondents for the column. Note: the cell-percentage values will be affected by whether or not we treat some categories of our indicators as missing values. Pay attention to the percentages in each cell rather than the number (n) of respondents in each cell.
- To interpret a crosstab, compare the column-percentages across each row to see whether they differ. For instance, in the table below, compare the percentage values for cells A, B, and C, then compare D, E, and F, and finally compare G, H, and I. If the column-percentages within A-B-C, D-E-F, or G-H-I differ substantially from one another, you may have found a relationship.
- Crosstabulation does not work effectively if either variable has a great many value categories.
| | | INDEPENDENT VARIABLE | | |
| | | Category I | Category II | Category III |
| DEPENDENT VARIABLE | Category I | A | B | C |
| | Category II | D | E | F |
| | Category III | G | H | I |
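The note above about missing values can be checked with a little arithmetic. The sketch below is in Python rather than SPSS, and its counts are hypothetical, chosen only for illustration. It computes the percentages for a single column twice: once keeping a "don't know" category, and once treating it as missing.

```python
def column_percentages(counts):
    """Return each category's share of the column total, in percent (1 decimal)."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}

# Hypothetical counts for ONE column of a crosstab (not from any real survey).
column = {"yes": 50, "no": 30, "don't know": 20}

# Keeping "don't know" as a valid category: the base is all 100 respondents.
with_dk = column_percentages(column)

# Treating "don't know" as missing: the base shrinks to the 80 yes/no answers.
valid_only = {k: v for k, v in column.items() if k != "don't know"}
without_dk = column_percentages(valid_only)

print(with_dk)     # {'yes': 50.0, 'no': 30.0, "don't know": 20.0}
print(without_dk)  # {'yes': 62.5, 'no': 37.5}
```

The same 50 "yes" answers read as 50.0% or 62.5% depending on the base, which is why the missing-values decision should be made before interpreting the table.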
INSTRUCTIONS:
Crosstabulating Variables
- Select the October 2016 California statewide survey and data set.
- Enter the codebook for the dataset you have chosen.
- Hypothesize a relationship between two variables in the dataset.
- For example, you might think that attitudes toward inequality vary by partisanship.
- In order to avoid corrupting your data, lock your data set prior to beginning your analyses.
- To make certain there is some variation on the variables, use SPSS to perform a frequency analysis for each variable.
- In the Analyze menu of SPSS, select Descriptive Statistics and then Crosstabs. Place your dependent variable in the rows box and your independent variable in the columns box. Click on the “Cells” button and select column percentages.
- Consider whether recoding your variables would be desirable and do so as necessary.
- Click on the “Paste” button. Select the syntax and run it.
- Determine whether there is a relationship between the variables based on the column-percentages in the crosstab.
- Repeat the analysis until you find a set of variables with a relationship.
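To see what the frequency and crosstab steps above compute, here is a minimal sketch in Python (not SPSS). The eight respondent records and the variable names are made up for illustration and are not drawn from the PPIC data.

```python
from collections import Counter

# Hypothetical respondent records as (independent, dependent) pairs.
records = [
    ("Democrat", "agree"), ("Democrat", "agree"), ("Democrat", "agree"),
    ("Democrat", "disagree"),
    ("Republican", "agree"), ("Republican", "disagree"),
    ("Republican", "disagree"), ("Republican", "disagree"),
]

# Frequency analysis: confirm that each variable actually varies.
iv_freq = Counter(iv for iv, _ in records)
dv_freq = Counter(dv for _, dv in records)

# Crosstab: count each (column, row) combination, then divide by the column total.
cell_counts = Counter(records)
column_pct = {
    (iv, dv): round(100 * n / iv_freq[iv], 1)
    for (iv, dv), n in cell_counts.items()
}

for (iv, dv), pct in sorted(column_pct.items()):
    print(f"{iv:<10} {dv:<8} {pct:5.1f}%")
```

Each percentage uses the total for its own category of the independent variable as the base, which is what selecting column percentages does in the Crosstabs dialog.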
EXAMPLE
- Dataset:
- Statewide Survey October 2016
- Y Variable
- Marijuana Initiative
- Indicator for Y
- Q21. “Proposition 64 is called the ‘Marijuana Legalization Initiative Statute’ … If the election were held today, would you vote yes or no on Proposition 64?”
- Possible Explanation (X)
- Gender
- Indicator for X
- Gender
- Arrow Diagrams:
- X → Y
- Gender → Voting Intention on Marijuana Initiative
- Syntax:
*Preparing the DV*.
missing values q21 (8,9).
*Running the Crosstabulation*.
crosstabs /tables=q21 BY gender /cells=column count.
- Output:
Crosstabulation of Initiative Vote Intention by Gender
Q21. Proposition 64 is called the ‘Marijuana Legalization Initiative Statute.’ If the election were held today, would you vote yes or no on Proposition 64? * Gender Crosstabulation
| Q21 | | Gender: Male | Gender: Female | Total |
| yes | Count | 406 | 306 | 712 |
| | % within Gender | 62.1% | 48.3% | 55.3% |
| no | Count | 248 | 327 | 575 |
| | % within Gender | 37.9% | 51.7% | 44.7% |
| Total | Count | 654 | 633 | 1287 |
| | % within Gender | 100.0% | 100.0% | 100.0% |
Source: PPIC October 2016
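As a check on how the “% within Gender” figures arise, the short Python sketch below (not SPSS) recomputes them from the cell counts reported in the output. Only the counts come from the table; the variable names are illustrative.

```python
# Cell counts copied from the SPSS output (rows: vote, columns: gender).
counts = {
    "Male":   {"yes": 406, "no": 248},
    "Female": {"yes": 306, "no": 327},
}

column_pct = {}
for gender, votes in counts.items():
    n = sum(votes.values())  # column total: 654 for Male, 633 for Female
    column_pct[gender] = {vote: round(100 * c / n, 1) for vote, c in votes.items()}

print(column_pct)
# {'Male': {'yes': 62.1, 'no': 37.9}, 'Female': {'yes': 48.3, 'no': 51.7}}
```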
Edited Version of Table:
Intended Vote on Marijuana Proposition by Gender
| | | Gender: Male | Gender: Female |
| Q21. Vote yes or no on Proposition 64? | yes | 62.1% | 48.3% |
| | no | 37.9% | 51.7% |
| Total (n) | | 654 | 633 |
Interpretation of Crosstab:
- The edited version of the table is easier to absorb.
- The number in each cell is a column-percentage. At the bottom of each column is the number of cases on which the column percentages are based. The column percentages are key in interpreting your findings.
- Comparing the column-percentages for the cells across each row of the table, we can see that there are differences between the gender groups.
- It is often most useful to look at the top and bottom rows before looking at any middle rows.
- In particular, looking across the top row, men are more likely than women to favor the initiative (62.1% versus 48.3%). Looking across the bottom row, women are more likely than men to oppose it (51.7% versus 37.9%).
- Overall, there is a clear gender difference in vote intentions.
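One way to put a number on the gender difference is the percentage-point gap between the columns in each row. The Python sketch below (not SPSS) uses the rounded column-percentages from the edited table. The male-minus-female direction is an arbitrary choice made for illustration.

```python
# Column percentages from the edited table (percent voting yes / no on Prop 64).
pct = {
    "Male":   {"yes": 62.1, "no": 37.9},
    "Female": {"yes": 48.3, "no": 51.7},
}

# Percentage-point gap (Male minus Female) in each row.
gap = {
    vote: round(pct["Male"][vote] - pct["Female"][vote], 1)
    for vote in ("yes", "no")
}
print(gap)  # {'yes': 13.8, 'no': -13.8}
```

With only two rows, the gap in the bottom row is the mirror image of the gap in the top row.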
QUESTIONS FOR REFLECTION
Based on the column-percentages in your crosstab, did you discover a relevant relationship? If so, was it evident in only one row of the table or in all rows?
Try another crosstabulation with another independent variable such as language of interview.
DISCUSSION
When you find a cell whose column-percentage differs substantially from the other cells in its row, other rows in the table usually show a difference as well. For example, if you find a difference in the column-percentages for cells A-B-C, then there is probably also a difference among D-E-F or G-H-I. This happens because the column-percentages in each column must sum to 100%, so a higher percentage in one cell means a lower percentage in at least one other cell of that column.
| | | INDEPENDENT VARIABLE | | |
| | | Category I | Category II | Category III |
| DEPENDENT VARIABLE | Category I | A | B | C |
| | Category II | D | E | F |
| | Category III | G | H | I |
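The point that a difference in one row usually comes with differences in other rows can be illustrated with a small Python sketch (not SPSS). The percentages are hypothetical and stand in for the first two columns of the table above: cells A, D, G and cells B, E, H.

```python
# Hypothetical column percentages for a three-row crosstab (not real data).
# Each list holds rows I, II, III of one column and sums to 100.
col_I = [50, 30, 20]    # cells A, D, G
col_II = [35, 30, 35]   # cells B, E, H

# Row-by-row difference between the two columns, in percentage points.
diffs = [a - b for a, b in zip(col_I, col_II)]

print(diffs)       # [15, 0, -15]
print(sum(diffs))  # 0
```

Because both columns sum to 100, the row differences always sum to zero: a 15-point gap in the top row has to be offset somewhere else in the table, here in the bottom row.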
Syntax for Regions Using any PPIC Data
*Regional Recodes.
*Regions used in PPIC reports.
Recode county (4,6,9,10,11,15,16,20,24,31,34,39,45,50,51,52,54,57,58 = 1) (1,7,21,28,38,41,43,48,49 = 2) (19 = 3) (33,36 = 4) (30,37 = 5) (else = 6) into region.
Value labels region 1 'Central Valley' 2 'SF Bay Area' 3 'LA' 4 'Inland Empire' 5 'OrangeSD' 6 'other'.
*Coastal Recodes as used in PPIC reports.
Recode county (8,12,23,49,1,7,21,28,38,41,43,48,49,44,27,40 = 1) (42,56,3,30,37 = 2) (else = 3) into coastal.
Value labels coastal 1 'NorthCent Coast' 2 'South Coast' 3 'Inland'.
*crosstab below uses June 2023 data.
crosstabs tables = q34 by region coastal /cells = column count.