What we see is a plot of the number of years of education by the occupational prestige score for persons in the data set who have a job.
We can edit our graph to make it easier to interpret. First, double-click anywhere in the graph. This will cause the graph to open in its own window. Then, double-click on the X-axis. A dialog box will open. In the Range section of the box, change the Minimum to 0. In the Major and Minor Divisions sections, change the Increments to 2. Then, click OK.
Now, on the Menu Bar, click on “Chart,” then “Options.” In the Fit Line section, click in the box next to Total. Then, click on the Fit Options button, and click in the box next to “Display R-square in legend.” Click Continue, then OK.
The first table just shows the variables that have been included in the analysis. The second table, “Model Summary,” shows the R-square statistic, which is .270.
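As a rough illustration of what the R-square of .270 means, the short Python sketch below converts it to a percentage of explained variance. It also recovers the size of the correlation coefficient, relying on the fact that with a single independent variable R-square is simply the square of the correlation. The variable names are made up for this example; only the .270 comes from the output.

```python
import math

# R-square as reported in the "Model Summary" table.
r_square = 0.270

# Share of the variation in the dependent variable accounted for by the model.
percent_explained = r_square * 100

# With one independent variable, R-square is the squared correlation,
# so its square root gives the size (not the sign) of the correlation.
correlation_size = math.sqrt(r_square)

print(f"Variance explained: {percent_explained:.1f}%")
print(f"Size of the correlation: {correlation_size:.2f}")
```

The sign of the correlation is not recoverable from R-square alone; it has to be read from the direction of the slope.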
The third table, ANOVA, gives the information about the model as a whole. ANOVA is discussed briefly in chapter 6. The final table, Coefficients, gives results of the regression analysis that are not available using only correlation techniques. Look at the “Unstandardized Coefficients” column. Two statistics are reported: B, which is the regression coefficient, and the standard error. Notice that there are two statistics reported under B: one labeled as (Constant), the other labeled as EDUC. The statistic labeled as EDUC is the regression coefficient, which is the slope of the line that you saw on the scatterplot (note that in scholarly reports, it is conventional to refer to the regression coefficient using the lower case, b). The one labeled as (Constant) is not actually a regression coefficient, but is the Y-intercept (SPSS reports it in this column for convenience only).
Y = a + bX
Y refers to the value of the dependent variable for a given case, a is the Y-intercept (the point where the line crosses the Y-axis, listed as Constant on your output), b is the slope of the line which describes the relationship between the independent and dependent variables (B for EDUC), and X is the value of the independent variable for a given case.
We know that the linear relationship between X and Y (EDUC and PRESTG80) is not perfect. The correlation coefficient was not 1 (or –1), and the scatterplot showed plenty of cases that did not fall directly on the line. Thus, it is clear to us that knowing someone’s education will not tell us without fail what their occupational prestige is, and furthermore, we are only analyzing a sample of cases and not the whole population to which we want to generalize our findings. It is clear that there is some error built into our findings (the reason that the Fit Line is usually called the “Best Fit Line”). For these reasons, it is conventional to write the formula for the line as
Y = a + bX + e, where e refers to error.
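To make the pieces of this formula concrete, here is a minimal Python sketch that fits a line by ordinary least squares and then computes e for each case. The eight data points are invented for illustration and are not GSS data, and the function name fit_line is hypothetical. This is not how SPSS is implemented; it only shows the same arithmetic on a small scale.

```python
# Hypothetical toy data (NOT from the GSS): years of education and prestige scores.
educ = [8, 10, 12, 12, 14, 16, 16, 18]
prestige = [30, 32, 41, 38, 47, 50, 55, 57]


def fit_line(x, y):
    """Return (a, b): the Y-intercept and slope of the least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    # Intercept: the line always passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


a, b = fit_line(educ, prestige)

# e for each case: the actual Y minus the Y that the line predicts.
errors = [yi - (a + b * xi) for xi, yi in zip(educ, prestige)]

print(f"a (Constant) = {a:.3f}, b (slope) = {b:.3f}")
for xi, yi, ei in zip(educ, prestige, errors):
    print(f"X = {xi:2d}  Y = {yi:2d}  e = {ei:+.2f}")
```

Every case satisfies Y = a + bX + e exactly, and the errors of a least-squares line sum to zero, which is one sense in which the line is the "best fit."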
What can we do with this formula? One thing we can do is predict the value of the dependent variable for particular values of the independent variable, using just a little arithmetic. All we have to do is plug the values from our output into the formula for a line (for our purposes, we will ignore the “e”):
Y = 9.84 + 2.565X
9.84, the Y-intercept (or Constant), is interpreted as the predicted occupational prestige score (our dependent, or Y variable) for a person with zero years of education (our independent, or X variable); it is the point where the line crosses the Y-axis. 2.565 is the slope of the line. That is, if you refer back to the scatterplot, start at any point on the regression line, move one unit to the right on the X-axis, and then move 2.565 units upward, you will intersect with the regression line again. (It is possible to have a negative coefficient. In that case, to intersect with the line, you would move one unit to the right, and then B units downward.)
What occupational prestige score would our results predict for a person who completed high school, but no higher education? All we have to do is enter 12 (as in twelve years of education) into our equation:
Y = 9.84 + 2.565(12)
Y = 9.84 + 30.78
Y = 40.62
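The same arithmetic can be checked with a few lines of Python. This is only a sketch: the function name predict_prestige is made up for this example, while the two coefficients are the ones read from the Coefficients table.

```python
# Values read from the "Unstandardized Coefficients" column of the output.
A = 9.84   # (Constant), the Y-intercept
B = 2.565  # B for EDUC, the slope


def predict_prestige(years_of_education):
    """Predicted prestige score from Y = a + bX, ignoring the error term."""
    return A + B * years_of_education


# A high school graduate with no higher education: X = 12.
print(round(predict_prestige(12), 2))  # 40.62
```

Changing the 12 to any other number of years gives the predicted score for that level of education.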
We find that having 12 years of education is associated with an occupational prestige score of 40.62. But what of the error? We know that not every high school graduate has this exact prestige score. We acknowledge this when we discuss results by stating that on average, those with 12 years of education will have occupations with prestige scores of 40.62. This language points out to our readers that it is likely that some of those respondents scored higher and some lower, but that 40.62 represents a central point. In sum, the error tells us about the distance between actual values of Y (the answers that the GSS survey respondents gave) and predicted values of Y (the ones you calculate based on the GSS respondent’s information in the “X” variable). Thus, the error is the difference between the predicted value of Y for a given case, written Ŷ, and the actual value of Y for that case (Ŷ − Y).
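A short Python sketch can show the error for a single case. The respondent below is hypothetical: the actual prestige score of 45 is invented for illustration and does not come from the GSS file. The sketch subtracts the predicted value from the actual value, which matches the e in Y = a + bX + e; subtracting in the other order only flips the sign.

```python
# Coefficients read from the regression output.
A = 9.84   # (Constant)
B = 2.565  # B for EDUC

# Hypothetical respondent (invented values, not a real GSS case).
years_of_education = 12
actual_prestige = 45

# Predicted Y for this case, from the fitted line.
predicted_prestige = A + B * years_of_education

# Error for this case: how far the actual score sits from the line.
error = actual_prestige - predicted_prestige

print(f"Predicted: {predicted_prestige:.2f}")
print(f"Actual:    {actual_prestige}")
print(f"Error:     {error:+.2f}")
```

A positive error means the respondent scored above the regression line, and a negative error means the respondent scored below it.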
More generally, though, when we discuss regression results, we rarely compute predicted scores for particular values of the independent variable. Instead, in scholarly reports, we usually point out the general process at work. In our case, we would say that “each additional year of education is associated with a 2.565-point increase on the occupational prestige scale.” Note that we refer to “an additional year of education” because our independent variable was measured as years of school completed. Thus, the “unit” of measurement is years. We say there was a 2.565 increase in prestige with a unit increase in education, because that is the distance we have to move along the Y-axis, which represents occupational prestige, to intersect with the regression line again after moving one unit to the right on the X-axis.
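To see why the slope can be read as the change per additional year, the Python sketch below compares predicted scores one year apart at several starting points. The function name and the particular years chosen are arbitrary choices for this illustration; the coefficients are the ones from the output.

```python
# Coefficients read from the regression output.
A = 9.84   # (Constant)
B = 2.565  # B for EDUC


def predict_prestige(years_of_education):
    """Predicted prestige score from Y = a + bX."""
    return A + B * years_of_education


# Compare each level of education with one additional year.
differences = []
for years in (8, 12, 16):
    gain = predict_prestige(years + 1) - predict_prestige(years)
    differences.append(gain)
    print(f"{years} -> {years + 1} years: predicted prestige rises by {gain:.3f}")
```

The gain is the same wherever you start on the X-axis, which is why a single number, the slope, summarizes the relationship.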