Lab 4
Pol242 LAB MANUAL
PURPOSE
- To identify the meaning of descriptive terms for each type of variable: nominal, ordinal, and interval.
- To become acquainted with recoding.
MAIN POINTS
- Types of Variables
- Nominal: The categories of the variable have no inherent rank or order. The categories are nevertheless mutually exclusive and exhaustive. Examples are partisan preference or gender.
- Ordinal: The categories of the variable are ordered or ranked, from less to more or more to less, but there is not an equivalent distance between them. E.g. How much should be done to reduce the gap between the rich and poor in Canada? Much more, somewhat more, the same as now, somewhat less or much less?
- Interval: The categories of the variable are ordered and have a uniform distance between them. E.g., income.
- An interval variable can be transformed into an ordinal variable by recoding it. For example, we could divide income into mutually exclusive groups such as $0-9,999, $10,000-19,999, etc.
- Descriptive Statistics
- Mean: Computed by adding all the values and dividing this sum by the number of cases.
- Standard deviation: Expresses the degree of variation within a variable as the typical deviation of cases from the mean. It is computed as the square root of the variance.
- Variance: The squared value of the standard deviation. Hence the standard deviation is the square root of the variance.
- Median: The value of the middle case, i.e., the one with the same number of cases above and below it.
- Mode: The most frequent value.
- Skew: Measures the asymmetry of the distribution. Negative values indicate a longer tail toward low values; positive values indicate a longer tail toward high values.
- Kurtosis: Measures the peakedness of the distribution relative to a normal distribution, which SPSS scores as 0.
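In standard notation, the mean, variance and standard deviation described above can be written as follows. The symbols are assumptions introduced here for illustration: x_i is the value for case i, n is the number of valid cases, x-bar is the mean, and s is the standard deviation. The n - 1 divisor is the sample formula, which is consistent with the figures in the example output later in this lab.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2
\qquad
s = \sqrt{s^2}
```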
INSTRUCTIONS
- Enter the codebook for the data set on which you would like to work.
- Identify three indicators: one should be nominal, the second ordinal, and the third, if possible, measured at the interval level.
- Using SPSS syntax, run a frequency analysis for each of the three indicators.
- Based on the output of the trial run, identify which values should be declared as Missing Values and decide whether and how best to Recode the data. Until missing values are declared and the appropriate recodes are made, any summary measures may be misleading.
- Edit the syntax for missing values and recode as needed. Labs 1 & 2 include a number of examples. It is essential to re-label the recoded values as the old labels will not be automatically changed.
- Finally, where relevant, identify the MEANING of the summary measures for each type of variable.
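The steps above can be sketched in syntax as follows, using the income grouping mentioned under Main Points. This is only an illustration: the variable names income and incgrp, the missing-value code -99, and the cut-points are hypothetical and do not come from any codebook, so substitute the names and codes from your own data set.

```spss
* Hypothetical example, income is assumed to be recorded in whole dollars.
* Step 1, trial run to see which codes actually occur.
frequencies variables = income.

* Step 2, declare the code that carries no substantive meaning (assumed here to be -99).
missing values income (-99).

* Step 3, recode the interval variable into ordered groups in a new variable.
* Values not listed, including -99, become system-missing in incgrp.
recode income (0 thru 9999 = 1) (10000 thru 19999 = 2) (20000 thru highest = 3) into incgrp.

* Step 4, label the new values because the old labels are not carried over.
value labels incgrp 1 'under $10,000' 2 '$10,000 to $19,999' 3 '$20,000 or more'.

* Step 5, rerun the frequencies with summary measures suited to an ordinal variable.
frequencies variables = incgrp /statistics = mode median range.
```

Recoding into a new variable rather than overwriting the original keeps the interval-level version available for later analysis.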
EXAMPLES
Variable: Attitudes regarding inequality
Example #1
- Dataset: CES 2011
- Indicator Type: Nominal
- Indicator: MBS11_b3
Please circle the number that BEST reflects your opinion. (Please circle ONE answer only)
The government should:
1. See to it that everyone has a decent living.
2. Leave people to get ahead on their own.
8. Not sure.
- Syntax:
recode mbs11_b3 (1=1) (2=0) into goveqch.
value labels goveqch 1 'decent living' 0 'leave alone'.
fre var goveqch /statistics = mode median mean stddev variance skew kurtosis.
Note that the “not sure” category is rendered as missing by the recode.
- Output:
goveqch
| Value | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| leave alone | 260 | 6.0 | 18.8 | 18.8 |
| decent living | 1121 | 26.0 | 81.2 | 100.0 |
| Valid total | 1381 | 32.1 | 100.0 | |
| Missing (System) | 2927 | 67.9 | | |
| Total | 4308 | 100.0 | | |

Statistics
| Statistic | goveqch |
|---|---|
| N valid | 1381 |
| N missing | 2927 |
| Mean | .8117 |
| Median | 1.0000 |
| Mode | 1.00 |
| Std. Deviation | .39107 |
| Variance | .153 |
| Skewness | -1.597 |
| Kurtosis | .550 |
- Note that the recode does several things. First, it makes support for government action receive the high score. Second, by recoding the variable as a dichotomy, it permits the mean score to indicate the proportion of respondents favouring action. Third, it creates a new variable name. Fourth, it indirectly renders the “not sure” category as missing by not including it in the new variable.
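As a check on the point about the mean of a dichotomy, the mean in the Statistics table can be reproduced from the frequency table, since each 'decent living' case counts as 1 and each 'leave alone' case counts as 0. The result matches the valid percent of 81.2 for 'decent living'.

```latex
\bar{x} = \frac{(1121 \times 1) + (260 \times 0)}{1381} = \frac{1121}{1381} \approx 0.8117
```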
Example #2
- Dataset: CES 2011
- Indicator Type: Ordinal
- Indicator: PES11_41
- How much should be done to reduce the gap between the rich and poor in Canada? Much more, somewhat more, the same as now, somewhat less or much less?
- Syntax:
missing values pes11_41 (8,9).
recode pes11_41 (1=1) (2=.75) (3=.5) (4=.25) (5=0) into undogap.
value labels undogap 0 'muchless' .25 'someless' .5 'asnow' .75 'somemore' 1 'muchmore'.
fre var undogap /statistics = mode median mean stddev variance skew kurtosis.
- Output:
undogap
| Value | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| muchless | 63 | 1.5 | 2.0 | 2.0 |
| someless | 81 | 1.9 | 2.5 | 4.5 |
| asnow | 638 | 14.8 | 19.8 | 24.2 |
| somemore | 1252 | 29.1 | 38.8 | 63.0 |
| muchmore | 1193 | 27.7 | 37.0 | 100.0 |
| Valid total | 3227 | 74.9 | 100.0 | |
| Missing (System) | 1081 | 25.1 | | |
| Total | 4308 | 100.0 | | |

Statistics
| Statistic | undogap |
|---|---|
| N valid | 3227 |
| N missing | 1081 |
| Mean | .7658 |
| Median | .7500 |
| Mode | .75 |
| Std. Deviation | .22910 |
| Variance | .052 |
| Skewness | -.930 |
| Kurtosis | .850 |
Note that the recode makes the high score indicate support for the government doing much more, which is useful since we understand the indicator to be measuring support for action against inequality.
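The recode in this example is a linear rescaling: each new value equals (5 - old value) / 4, so 1 becomes 1, 3 becomes .5 and 5 becomes 0. As a sketch of an alternative, the same result could be obtained with a compute statement. The name undogap2 is hypothetical, chosen only so that undogap is not overwritten, and the sketch assumes that codes 8 and 9 have already been declared missing as in the syntax above.

```spss
* Assumes missing values pes11_41 (8,9) has already been run.
* undogap2 is a hypothetical name used only for this comparison.
compute undogap2 = (5 - pes11_41) / 4.
value labels undogap2 0 'muchless' .25 'someless' .5 'asnow' .75 'somemore' 1 'muchmore'.
* The two variables should show identical frequencies.
fre var undogap undogap2 /statistics = mode median.
```

Cases declared missing on pes11_41 come out as system-missing in undogap2, just as they do with the recode.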
Example #3
- Dataset: CES 2011
- Indicator Type: Interval
- Indicator: MBS11_k2
Please place yourself on a scale of 0 to 10, where 0 means you strongly believe that the government SHOULD ACT to reduce differences in income and wealth, and 10 means that you strongly believe that the government SHOULD NOT ACT to reduce differences in income and wealth.
0 ‘Government should act’ thru 10 ‘Government should not act’.
- Syntax:
missing values mbs11_k2 (-99).
compute govact = (((mbs11_k2 * -1) + 10) / 10).
value labels govact 0 'not act' 1 'gov act'.
fre var govact /statistics = mode median mean stddev variance skew kurtosis.
- Output:
govact
| Value | Frequency | Percent | Valid Percent | Cumulative Percent |
|---|---|---|---|---|
| not act | 65 | 1.5 | 4.6 | 4.6 |
| .10 | 37 | .9 | 2.6 | 7.2 |
| .20 | 93 | 2.2 | 6.6 | 13.8 |
| .30 | 108 | 2.5 | 7.6 | 21.4 |
| .40 | 111 | 2.6 | 7.9 | 29.3 |
| .50 | 250 | 5.8 | 17.7 | 47.0 |
| .60 | 191 | 4.4 | 13.5 | 60.5 |
| .70 | 211 | 4.9 | 14.9 | 75.4 |
| .80 | 147 | 3.4 | 10.4 | 85.8 |
| .90 | 84 | 1.9 | 5.9 | 91.7 |
| gov act | 117 | 2.7 | 8.3 | 100.0 |
| Valid total | 1414 | 32.8 | 100.0 | |
| Missing (System) | 2894 | 67.2 | | |
| Total | 4308 | 100.0 | | |

Statistics
| Statistic | govact |
|---|---|
| N valid | 1414 |
| N missing | 2894 |
| Mean | .5634 |
| Median | .6000 |
| Mode | .50 |
| Std. Deviation | .26142 |
| Variance | .068 |
| Skewness | -.272 |
| Kurtosis | -.519 |
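The compute statement in this example both reverses the original 0 to 10 scale and rescales it to run from 0 to 1, so that a high score indicates support for government action. Written out, with x standing for the original value of mbs11_k2 (a symbol introduced here for illustration):

```latex
\text{govact} = \frac{(-1 \cdot x) + 10}{10} = \frac{10 - x}{10},
\qquad
x = 0 \mapsto 1, \quad x = 5 \mapsto 0.5, \quad x = 10 \mapsto 0
```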
QUESTIONS FOR REFLECTION
- Why aren’t all of the descriptive statistics appropriate to describe all three variables?
- Can we ever learn something from measures appropriate for another level of data?
- Can graphics help us better understand our data?
DISCUSSION
- Not all summary measures are appropriate for every variable.
- With nominal variables, the mode is the only truly useful descriptive statistic. Because the categories have no inherent order, the range only shows which codes are in use, so it says little about dispersion. For dichotomous variables coded 0 and 1 (dummy variables), the mean is useful because it indicates the proportion of cases coded 1.
- With ordinal variables, the mode, median and range are all useful.
- With interval/ratio variables, the mean, median, range and standard deviation are useful. The mode can be used (as in our example), but often it is not very useful with interval data.
- Skew and kurtosis can be particularly helpful with interval level data.
- Graphics can often provide a better picture of our results. The relevant syntax is:
GRAPH /BAR(SIMPLE)=PCT BY undogap.
GRAPH /BAR(SIMPLE)=PCT BY govact.
GRAPH /BAR(SIMPLE)=PCT BY goveqch.
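For the interval-level variable, a histogram is a possible alternative to the bar chart. The sketch below is a suggestion rather than part of the lab's required syntax; it overlays a normal curve, which may make it easier to see the slight negative skew and the flatter-than-normal shape suggested by the skewness and kurtosis figures for govact.

```spss
* Histogram of the interval-level variable with a normal curve overlaid.
graph /histogram(normal) = govact.
```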