Lab 6 (Stata)

Lab Goals & Instructions

Today we are using a new dataset. See the lab's script file for an explanation of the variables we will be using.

Research Question: What characteristics of campus climate are associated with student satisfaction?

Goals

  • Use component plus residuals plots to evaluate linearity in multivariate regressions.
  • Add polynomial terms to your regression to address nonlinearity.
  • Turn a continuous variable into a categorical variable to address nonlinearity.
  • Add interaction terms to your regression and evaluate them with margins plots.

Instructions

  1. Download the data and the script file from the lab files below.
  2. Run through the script file and reference the explanations on this page if you get stuck.
  3. No challenge activity!

Components Plus Residuals Plot

This week we're returning to the question of nonlinearity in a multivariate regression. First we're going to discuss a new plot for detecting nonlinearity in regressions with more than one independent variable: the component plus residuals plot.

Sometimes we want to examine the relationship between one independent variable and the outcome variable, accounting for all other independent variables in the model. The component plus residuals plot does this by plotting the regression residuals plus the component contributed by the variable of interest (its coefficient times its values) against that variable, which strips out the parts of the fit that come from the other independent variables.

Let's run through an example.

STEP 1: First, run the regression:

    regress satisfaction climate_gen climate_dei instcom ///
        fairtreat female ib3.race_5
      Source |       SS           df       MS      Number of obs   =     1,416
-------------+----------------------------------   F(9, 1406)      =    134.18
       Model |  692.700417         9   76.966713   Prob > F        =    0.0000
    Residual |  806.483199     1,406  .573601137   R-squared       =    0.4621
-------------+----------------------------------   Adj R-squared   =    0.4586
       Total |  1499.18362     1,415  1.05949372   Root MSE        =    .75736

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .4549723   .0402608    11.30   0.000     .3759946      .53395
  climate_dei |   .0914773    .039135     2.34   0.020     .0147081    .1682466
      instcom |    .291927   .0325804     8.96   0.000     .2280156    .3558384
    fairtreat |   .1642801   .0384167     4.28   0.000     .0889198    .2396404
       female |  -.0647484   .0419125    -1.54   0.123    -.1469663    .0174694
              |
       race_5 |
       White  |   .3724681   .0683034     5.45   0.000     .2384805    .5064557
        AAPI  |   .3445667   .0754328     4.57   0.000     .1965937    .4925396
Hispanic/L~o  |   .2711118   .0763919     3.55   0.000     .1212575    .4209662
       Other  |   .2489596   .0872887     2.85   0.004     .0777294    .4201897
              |
        _cons |  -.2139648   .1334109    -1.60   0.109    -.4756706    .0477411
-------------------------------------------------------------------------------

STEP 2: Run the cprplot command specifying the independent variable you want to examine.

Basic Command:

    cprplot climate_dei, lowess

Command with clearer line colors:
I changed the regression line to be dashed and the lowess line to be red. This makes the lines and patterns easier to distinguish.

    cprplot climate_dei, rlopts(lpattern(dash)) ///
        lowess lsopts(lcolor(red))

INTERPRETATION:
If the independent variable being examined and the outcome variable have a linear relationship, then the lowess line will be relatively straight and line up with the regression line. If there is a pattern to the scatter plot or clear curves in the lowess line, that is evidence of nonlinearity that needs to be addressed.
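If it helps to see what cprplot is actually plotting, here is a hedged, by-hand version for climate_dei (run after the STEP 1 regression above; the new variable names ehat and cpr_dei are just illustrative):

    * By-hand version of the component plus residuals plot for climate_dei
    predict ehat, residuals                            // residuals from the full model
    gen cpr_dei = ehat + _b[climate_dei]*climate_dei   // add climate_dei's component back in
    scatter cpr_dei climate_dei || lowess cpr_dei climate_dei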

Now we'll move on to addressing nonlinearity when we find it.

Approach 1: Polynomials

One way we can account for nonlinearity in a linear regression is through polynomials. This method relies on the basic idea that \(x^2\) and \(x^3\) have predetermined shapes when plotted (the shapes are described below). By including a polynomial term we can account for some curved relationships while keeping the model itself linear in its coefficients.
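If you want to see those shapes for yourself, a quick throwaway sketch (not part of the lab script) uses Stata's function plots:

    * Reference shapes for the polynomial terms discussed below
    twoway function y = x^2, range(-10 10) title("y = x^2")
    twoway function y = x^3, range(-10 10) title("y = x^3")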

Squared Polynomial

Here's what \(y = x^2\) looks like when plotted over the range -10 to 10. It's U-shaped and can be flipped upside down depending on the sign of the coefficient.

This occurs when an effect appears in the middle of our range or when the effect diminishes at the beginning or end of our range. Let's look at an example:

STEP 1: Evaluate non-linearity and possible squared relationship

    scatter satisfaction instcom || lowess satisfaction instcom

This is flipped and less exaggerated, but it's still an upside-down U-shape.

STEP 2: Generate a squared variable for the key variable

    gen instcom_sq = instcom * instcom

STEP 3: Run regression with the squared expression to check significance
NOTE: You must always put both the original and the squared variables in the model! Otherwise, you aren't telling Stata to model both the initial (linear) change and the squared change to the line.

    regress satisfaction climate_gen climate_dei fairtreat female ///
        ib3.race_5 instcom instcom_sq
      Source |       SS           df       MS      Number of obs   =     1,416
-------------+----------------------------------   F(10, 1405)     =    122.56
       Model |  698.460566        10  69.8460566   Prob > F        =    0.0000
    Residual |   800.72305     1,405  .569909644   R-squared       =    0.4659
-------------+----------------------------------   Adj R-squared   =    0.4621
       Total |  1499.18362     1,415  1.05949372   Root MSE        =    .75492

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .4335635   .0406921    10.65   0.000     .3537397    .5133874
  climate_dei |   .0965462   .0390414     2.47   0.014     .0199605     .173132
    fairtreat |   .1597253   .0383197     4.17   0.000     .0845553    .2348954
       female |  -.0721038   .0418415    -1.72   0.085    -.1541823    .0099747
              |
       race_5 |
       White  |   .3629737   .0681488     5.33   0.000     .2292894     .496658
        AAPI  |   .3254566   .0754296     4.31   0.000     .1774898    .4734233
Hispanic/L~o  |   .2634913   .0761834     3.46   0.001     .1140459    .4129368
       Other  |   .2499186   .0870079     2.87   0.004     .0792393     .420598
              |
      instcom |    .741871   .1452069     5.11   0.000     .4570254    1.026717
   instcom_sq |  -.0718686   .0226061    -3.18   0.002    -.1162139   -.0275233
        _cons |  -.7750299   .2209744    -3.51   0.000    -1.208505   -.3415547
-------------------------------------------------------------------------------
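As a quick sanity check on the shape (optional; this just uses the coefficients above), the fitted quadratic turns over where the slope hits zero, at instcom = -b(instcom) / (2 x b(instcom_sq)):

    * Turning point of the fitted quadratic: about .742 / (2 * .072), i.e., roughly 5.2,
    * so satisfaction rises with instcom and only flattens out near the top of the scale
    display -_b[instcom] / (2 * _b[instcom_sq])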

STEP 4: Generate margins graph if significant
NOTE: We use ## to interact variables in a model. When you interact a variable with itself, it acts as a squared term. This is called 'factor notation' and we must use it instead of the squared variable we created in order to get margins.

    regress satisfaction climate_gen climate_dei fairtreat female ///
        ib3.race_5 c.instcom##c.instcom
    margins, at(instcom = (0(1)5))
    marginsplot, noci

Cubed Polynomial

Here's what \(y = x^3\) looks like when plotted over the range -10 to 10. It's slightly S-shaped.

This occurs when the effect is perhaps less impactful in the middle of the range. Let's go through an example. The steps are the same as before, so we're going to skip the step of generating a new variable.

STEP 1: Evaluate non-linearity and possible cubic relationship

    scatter satisfaction fairtreat || lowess satisfaction fairtreat

You can see the slight, characteristic S-shape in the data.

STEP 2: Run regression with a cubic term using the interaction operator ( ## )
NOTE: We interact the variable "fairtreat" with itself twice to make a cubed term. Again, we need to do this in order to generate margins. If you find the regression output harder to read with factor notation, you can manually create new squared and cubed variables.
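If you do go the manual route, a hedged sketch might look like the following (the variable names are illustrative; note that margins will not recognize these hand-made terms as functions of fairtreat, which is exactly why the factor-notation version below is the one we actually use):

    * Manual alternative: create the squared and cubed terms yourself
    gen fairtreat_sq = fairtreat^2
    gen fairtreat_cu = fairtreat^3
    regress satisfaction climate_gen climate_dei instcom female ///
        ib3.race_5 fairtreat fairtreat_sq fairtreat_cu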

    regress satisfaction climate_gen climate_dei instcom female ///
        ib3.race_5 c.fairtreat##c.fairtreat##c.fairtreat
      Source |       SS           df       MS      Number of obs   =     1,416
-------------+----------------------------------   F(11, 1404)     =    110.48
       Model |  695.586279        11  63.2351163   Prob > F        =    0.0000
    Residual |  803.597337     1,404  .572362776   R-squared       =    0.4640
-------------+----------------------------------   Adj R-squared   =    0.4598
       Total |  1499.18362     1,415  1.05949372   Root MSE        =    .75655

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .4434539   .0405782    10.93   0.000     .3638534    .5230544
  climate_dei |   .0917208   .0391412     2.34   0.019     .0149392    .1685024
      instcom |   .2880597   .0325941     8.84   0.000     .2241213     .351998
       female |  -.0688182   .0419067    -1.64   0.101    -.1510246    .0133882
              |
       race_5 |
       White  |   .3710566   .0684485     5.42   0.000     .2367842     .505329
        AAPI  |   .3416881   .0761053     4.49   0.000     .1923958    .4909803
Hispanic/L~o  |   .2759983   .0765125     3.61   0.000     .1259071    .4260894
       Other  |    .254486   .0872918     2.92   0.004     .0832496    .4257224
              |
    fairtreat |  -1.405129   .7489441    -1.88   0.061    -2.874299    .0640413
              |
  c.fairtreat#|
  c.fairtreat |    .493122   .2257115     2.18   0.029      .050354      .93589
              |
  c.fairtreat#|
  c.fairtreat#|
  c.fairtreat |  -.0479178   .0215204    -2.23   0.026    -.0901333   -.0057022
              |
        _cons |   1.324045   .8063997     1.64   0.101    -.2578329    2.905923
-------------------------------------------------------------------------------

Margins plot:

    margins, at(fairtreat = (1(1)5))
    marginsplot

Approach 2: Creating a Categorical Variable

A second way we can account for nonlinearity in a linear regression is by transforming our continuous variable into categories. Age is a very common variable to see treated as categorical in models. Ordered categories can capture some aspects of nonlinearity, but they may not be as precise as squared or cubed terms.

Let's run through an example:

STEP 1: Evaluate what categories I want to create
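The output below appears to come from summarize with the detail option; the command that produces it would look something like this:

    summarize climate_gen, detail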

                  Composite: General climate
-------------------------------------------------------------
      Percentiles      Smallest
 1%     1.571429              1
 5%     2.285714              1
10%     2.714286              1       Obs               1,797
25%     3.142857       1.142857       Sum of wgt.       1,797

50%     3.714286                      Mean           3.607732
                        Largest       Std. dev.      .7253975
75%     4.142857              5
90%     4.571429              5       Variance       .5262015
95%     4.714286              5       Skewness       -.500229
99%            5              5       Kurtosis       3.205013

It looks pretty evenly spread across the range, so I'm going to create five categories.

STEP 2: Create the Category

    gen climategen_cat = .
    replace climategen_cat = 1 if climate_gen >= 1 & climate_gen < 2
    replace climategen_cat = 2 if climate_gen >= 2 & climate_gen < 3
    replace climategen_cat = 3 if climate_gen >= 3 & climate_gen < 4
    replace climategen_cat = 4 if climate_gen >= 4 & climate_gen < 5
    replace climategen_cat = 5 if climate_gen >= 5 & climate_gen < .   // the "< ." keeps missing values missing
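As an aside, egen's cut() function can build the same five categories in fewer lines (a hedged sketch; climategen_cat2 is just an illustrative name, and the result matches the replace statements above for values from 1 through 5):

    * cut() bins climate_gen at the listed breakpoints; icodes returns integer codes 0-4
    egen climategen_cat2 = cut(climate_gen), at(1,2,3,4,5,6) icodes
    replace climategen_cat2 = climategen_cat2 + 1   // shift the codes to 1-5
    tab climategen_cat climategen_cat2              // check that the two versions agree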

STEP 3: Run regression with indicator

    regress satisfaction climate_dei instcom fairtreat female ib3.race_5 ///
        i.climategen_cat
       Source |       SS           df       MS      Number of obs   =     1,416
--------------+----------------------------------   F(12, 1403)     =     96.90
        Model |  679.419147        12  56.6182622   Prob > F        =    0.0000
     Residual |  819.764469     1,403  .584293991   R-squared       =    0.4532
--------------+----------------------------------   Adj R-squared   =    0.4485
        Total |  1499.18362     1,415  1.05949372   Root MSE        =    .76439

--------------------------------------------------------------------------------
  satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
   climate_dei |   .1449277   .0390442     3.71   0.000     .0683363    .2215191
       instcom |   .2859731   .0331552     8.63   0.000      .220934    .3510122
     fairtreat |   .1982309   .0381992     5.19   0.000     .1232972    .2731647
        female |   -.060501   .0423081    -1.43   0.153    -.1434949    .0224929
               |
        race_5 |
        White  |   .3467763   .0691179     5.02   0.000     .2111907    .4823618
         AAPI  |   .3305859   .0764519     4.32   0.000     .1806136    .4805582
 Hispanic/L~o  |   .2366182   .0770686     3.07   0.002      .085436    .3878003
        Other  |   .2477479   .0882183     2.81   0.005      .074694    .4208018
               |
 climategen_~t |
            2  |    .589944   .1564542     3.77   0.000     .2830347    .8968534
            3  |    1.04883   .1580689     6.64   0.000     .7387531    1.358907
            4  |   1.297643   .1671818     7.76   0.000     .9696895    1.625596
            5  |   1.501673   .2290519     6.56   0.000     1.052352    1.950994
               |
         _cons |   .0892198   .1855177     0.48   0.631    -.2747021    .4531418
--------------------------------------------------------------------------------

STEP 4: Double-Check linearity with margins

    margins climategen_cat
    marginsplot, noci

Interactions

We have finally arrived at interactions. It is finally time for 'margins' to TRULY shine. Wrapping your head around interactions might be difficult at first, but here is the simple interpretation for ALL interactions:

The effect of 'var1' on 'var2' varies by 'var3'

OR

The association of 'var1' and 'var2' significantly
differs for each value of 'var3'

Interactions are wonderful because they work for any combination of variable types. The key thing to be aware of is how you display and interpret them. Let's see some options.
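In equation form, every interaction model we fit below looks like this (generic names matching the wording above): \(\text{var2} = \beta_0 + \beta_1\,\text{var1} + \beta_2\,\text{var3} + \beta_3\,(\text{var1} \times \text{var3}) + \dots\). The effect of var1 on var2 is then \(\beta_1 + \beta_3\,\text{var3}\): it depends on the value of var3, which is exactly what "varies by var3" means.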

Continuous variable x continuous variable

The first thing we are going to look at is the interaction between two continuous variables. Let's run a simple regression interacting climate_dei & instcom. The question I'm asking here then is: Does the effect of people's overall sense of DEI climate on their satisfaction differ based on a person's perception of institutional commitment to DEI?

First we run the regression with the interaction term:

    regress satisfaction climate_gen undergrad female ib3.race_5 ///
        c.climate_dei##c.instcom
      Source |       SS           df       MS      Number of obs   =     1,428
-------------+----------------------------------   F(10, 1417)     =    116.36
       Model |  689.376078        10  68.9376078   Prob > F        =    0.0000
    Residual |  839.539188     1,417  .592476491   R-squared       =    0.4509
-------------+----------------------------------   Adj R-squared   =    0.4470
       Total |  1528.91527     1,427  1.07141925   Root MSE        =    .76972

-------------------------------------------------------------------------------
 satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
--------------+----------------------------------------------------------------
  climate_gen |   .5038651   .0380054    13.26   0.000     .4293123     .578418
    undergrad |  -.0216865   .0429315    -0.51   0.614    -.1059026    .0625296
       female |  -.0738629   .0425678    -1.74   0.083    -.1573655    .0096397
              |
       race_5 |
       White  |   .4226195   .0679349     6.22   0.000     .2893557    .5558834
        AAPI  |   .3526495   .0766823     4.60   0.000     .2022265    .5030726
Hispanic/L~o  |   .3064282    .076708     3.99   0.000     .1559547    .4569016
       Other  |    .303079   .0877973     3.45   0.001     .1308523    .4753057
              |
  climate_dei |   .4501156   .0965718     4.66   0.000     .2606766    .6395546
      instcom |   .6256633     .09919     6.31   0.000     .4310882    .8202385
              |
           c. |
  climate_dei#|
    c.instcom |  -.0978223   .0272883    -3.58   0.000    -.1513522   -.0442924
              |
        _cons |  -.9367096   .2943789    -3.18   0.001    -1.514175   -.3592444
-------------------------------------------------------------------------------

Then we look at the margins plot. Because I'm mostly interested in what the graph looks like, I've added quietly to the front of the margins command. This tells Stata to run the margins command in the background without displaying the results in the console or in your log.

    quietly margins, at(climate_dei=(1(1)5) instcom=(1(1)5))
    marginsplot

When creating a margins plot with a continuous x continuous interaction:

  • You need to specify at(min(interval)max) to tell Stata which predicted values to calculate for the plot.
  • Because both variables are continuous and you want Stata to calculate a prediction for each combination of the two, you have to put both in the same at() option so Stata knows to cross them.

Interpretation:

  1. The association between rating of DEI climate and satisfaction is MODERATED by perception of the institution's commitment to DEI.
  2. The association between rating of DEI climate and satisfaction varies based on perception of the institution's commitment to DEI.
  3. For students with a low perception of the institution's commitment to DEI, higher DEI climate ratings are associated with a significant increase in satisfaction. As perception of the institution's commitment to DEI increases, the effect of DEI climate on satisfaction dampens (the slope gets less steep); see the quick check below.
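A quick, hedged check of that last point using the coefficients from the regression above (the slope of climate_dei is its main effect plus the interaction term times the value of instcom):

    * Slope of climate_dei at low vs. high perceived institutional commitment
    display _b[climate_dei] + 1*_b[c.climate_dei#c.instcom]   // about .35 when instcom = 1
    display _b[climate_dei] + 5*_b[c.climate_dei#c.instcom]   // about -.04 (essentially flat) when instcom = 5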

Sometimes you may decide that the relationship is difficult to interpret in this direction, or simply doesn't make sense that way. In situations like that, you might want to swap which variable is your key 'x' and which is your 'moderator'. Essentially, you are switching which variable sits on the x-axis and which variable defines the separate lines.

One way to do this is to switch which variable comes first in the at() option:

    quietly margins, at(instcom=(1(1)5) climate_dei=(1(1)5))
    marginsplot

The other way is to tell marginsplot which variable to 'plot' (that is, which variable to present as the moderator on the graph):

    quietly margins, at(climate_dei=(1(1)5) instcom=(1(1)5))
    marginsplot, plot(climate_dei)

And both graphs come out the same.

Updated Interpretation:
Because we switched which variable is the moderator, our interpretation of the relationship changes.

  1. The association between perception of institutional commitment to DEI and satisfaction is MODERATED by the rating of DEI climate.
  2. The association between perception of institutional commitment to DEI and satisfaction varies based on rating of DEI climate.
  3. For students who rate the DEI climate lower, increased perception of institutional commitment to DEI is associated with higher satisfaction. For more positive ratings of DEI climate, the positive effect of perception of institutional commitment to DEI on satisfaction is dampened.

One last thing you can change is the number of lines that appear on the graph.
Approach 1: change the intervals

    quietly margins, at(instcom=(1(1)5) climate_dei=(1(2)5))
    marginsplot

Approach 2: specify the values that should be predicted

    quietly margins, at(instcom=(1(1)5) climate_dei=(1 3 5))
    marginsplot

Continuous variable x dummy variable

Once you get a handle on the continuous x continuous case, the continuous x dummy interaction is extremely straightforward.

First run the regression.

    regress satisfaction climate_gen instcom ib3.race_5 i.female##c.climate_dei
       Source |       SS           df       MS      Number of obs   =     1,428
--------------+----------------------------------   F(9, 1418)      =    128.65
        Model |    687.2481         9     76.3609   Prob > F        =    0.0000
     Residual |  841.667166     1,418  .593559356   R-squared       =    0.4495
--------------+----------------------------------   Adj R-squared   =    0.4460
        Total |  1528.91527     1,427  1.07141925   Root MSE        =    .77043

--------------------------------------------------------------------------------
  satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
   climate_gen |   .5208932   .0368994    14.12   0.000     .4485099    .5932765
       instcom |   .2856222   .0326096     8.76   0.000     .2216539    .3495904
               |
        race_5 |
        White  |   .4265764   .0679328     6.28   0.000     .2933168     .559836
         AAPI  |   .3730469    .076419     4.88   0.000     .2231404    .5229533
 Hispanic/L~o  |   .3190265   .0767491     4.16   0.000     .1684726    .4695805
        Other  |   .3101714   .0877853     3.53   0.000     .1379685    .4823743
               |
        female |
       Female  |   -.650237   .1951897    -3.33   0.001    -1.033129   -.2673455
   climate_dei |   .0498149   .0471972     1.06   0.291    -.0427689    .1423987
               |
        female#|
 c.climate_dei |
       Female  |   .1588592   .0519094     3.06   0.002     .0570318    .2606866
               |
         _cons |    .355042   .1677356     2.12   0.034     .0260054    .6840787
--------------------------------------------------------------------------------

Then look at the margins plot:

    quietly margins female, at(climate_dei=(1(1)5))
    marginsplot

Interpretation:

  1. The association between rating of DEI climate and satisfaction is MODERATED by gender.

  2. The association between rating of DEI climate and satisfaction varies based on a student's gender identity.

  3. The positive association between rating of DEI climate and satisfaction is stronger for female students than for male students (see the quick check below).
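A quick, hedged check of that last point using the coefficients from the regression above (the slope for male students is the climate_dei main effect; the slope for female students adds the interaction term; this assumes female is coded 0/1 with 1 = Female):

    * Slope of climate_dei for men vs. women
    display _b[climate_dei]                               // about .05 for male students
    display _b[climate_dei] + _b[1.female#c.climate_dei]  // about .05 + .16 = .21 for female students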

Continuous variable x Categorical variable

Categorical variables often feel the most confusing for interactions.

Let's say I'm interested in how climate_dei is moderated by race. Let's look at the regression results:

    regress satisfaction climate_gen instcom female i.race_5##c.climate_dei
       Source |       SS           df       MS      Number of obs   =     1,428
--------------+----------------------------------   F(12, 1415)     =     98.65
        Model |  696.431535        12  58.0359612   Prob > F        =    0.0000
     Residual |  832.483731     1,415  .588327725   R-squared       =    0.4555
--------------+----------------------------------   Adj R-squared   =    0.4509
        Total |  1528.91527     1,427  1.07141925   Root MSE        =    .76703

--------------------------------------------------------------------------------
  satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
   climate_gen |   .5163975   .0370079    13.95   0.000     .4438013    .5889937
       instcom |   .2820706   .0325448     8.67   0.000     .2182293    .3459118
        female |  -.0669802   .0421579    -1.59   0.112    -.1496789    .0157186
               |
        race_5 |
         AAPI  |   .5523891   .3057571     1.81   0.071    -.0473968    1.152175
        Black  |  -1.043364   .2810924    -3.71   0.000    -1.594766   -.4919609
 Hispanic/L~o  |  -.9812627   .2657842    -3.69   0.000    -1.502636   -.4598894
        Other  |  -.2390071   .3184738    -0.75   0.453    -.8637387    .3857245
               |
   climate_dei |   .0902564   .0489236     1.84   0.065    -.0057142     .186227
               |
        race_5#|
 c.climate_dei |
         AAPI  |  -.1571012   .0788798    -1.99   0.047     -.311835   -.0023673
        Black  |   .1855982   .0830573     2.23   0.026     .0226696    .3485268
 Hispanic/L~o  |   .2401252   .0712755     3.37   0.001     .1003082    .3799421
        Other  |   .0307093   .0874273     0.35   0.725    -.1407918    .2022105
               |
         _cons |   .6527546   .1785275     3.66   0.000     .3025476    1.002962
--------------------------------------------------------------------------------

And then the margins plot:

    quietly margins race_5, at(climate_dei=(1(1)5))
    marginsplot, noci

When creating a margins plot with a continuous x categorical interaction:

  • Put the variable that you think is the moderator before the comma in the margins command so that it is plotted as the separate lines on the graph. In this case we're interested in how the effect differs by race.

Interpretation:

  1. What we see then is how the effect of DEI climate rating on satisfaction varies by racial identity.

Let's say, though, that you're only interested in comparing how the relationship between DEI climate and satisfaction differs across a few specific groups. You might want to specify which racial groups to plot.

    quietly margins, at(climate_dei=(1(1)5) race_5=(2 3 4))
    marginsplot

Categorical variable x dummy variable

We'll now look at the interaction between a categorical variable and a dummy variable.

First the regression:

    regress satisfaction climate_gen climate_dei instcom undergrad ///
        i.race_5##i.female
       Source |       SS           df       MS      Number of obs   =     1,428
--------------+----------------------------------   F(13, 1414)     =     89.58
        Model |  690.509811        13  53.1161393   Prob > F        =    0.0000
     Residual |  838.405456     1,414  .592931722   R-squared       =    0.4516
--------------+----------------------------------   Adj R-squared   =    0.4466
        Total |  1528.91527     1,427  1.07141925   Root MSE        =    .77002

--------------------------------------------------------------------------------
  satisfaction | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
---------------+----------------------------------------------------------------
   climate_gen |   .5146673   .0378538    13.60   0.000     .4404117     .588923
   climate_dei |   .1383065   .0389207     3.55   0.000      .061958     .214655
       instcom |   .2899488   .0328652     8.82   0.000      .225479    .3544186
     undergrad |  -.0102032   .0430071    -0.24   0.813    -.0945678    .0741613
               |
        race_5 |
         AAPI  |  -.1500545   .0789974    -1.90   0.058    -.3050192    .0049102
        Black  |  -.2799878    .106818    -2.62   0.009    -.4895265   -.0704491
 Hispanic/L~o  |  -.1299022   .0869353    -1.49   0.135    -.3004383    .0406339
        Other  |   .0592291   .1093019     0.54   0.588    -.1551821    .2736403
               |
        female |
       Female  |  -.0515413   .0643228    -0.80   0.423    -.1777197    .0746371
               |
 race_5#female |
  AAPI#Female  |     .20961   .1132215     1.85   0.064    -.0124902    .4317103
 Black#Female  |  -.2356153   .1338683    -1.76   0.079    -.4982172    .0269866
 Hispanic/L~o #|
       Female  |   .0289424   .1185801     0.24   0.807    -.2036695    .2615543
 Other#Female  |  -.3206363   .1473326    -2.18   0.030    -.6096503   -.0316223
               |
         _cons |    .449525    .134609     3.34   0.001     .1854701    .7135798
--------------------------------------------------------------------------------

And then the margins plot:

    quietly margins female, at(race_5=(1(1)5))
    marginsplot

The first thing to notice is how ENTIRELY unhelpful this graph is because of how much is happening at once. The way to read it is to break it down:

  1. FOCUS ON THE TWO DOTS IN EACH COLUMN TO SEE GENDER DIFFERENCES WITHIN EACH RACIAL GROUP. We can see the difference between female and male satisfaction for each racial group. We can see, for example, that there is a major difference in satisfaction by gender for Black students and for students whose identity was grouped into 'Other'. Interestingly, the confidence intervals tell us that while the 'Other' category's difference is statistically significant, we can't be sure for Black students given the overlap.
  2. FOCUS ON THE LINES TO SEE RACIAL DIFFERENCES WITHIN EACH GENDER CATEGORY. We can see the differences between racial groups for each gender. We can see, for example, that Black female students have lower satisfaction than all other female students, and that gap is statistically significant compared with every group except women in the 'Other' category.

What if we wanted to see these differences more clearly?

APPROACH 1: Change the type of graph we see

    marginsplot, recast(bar) by(female)

The 'recast' option allows you to use a different type of graph. The 'by' option creates a separate graph for each value of the specified variable.

APPROACH 2: Create margins that show the coefficient differences

    quietly margins, dydx(female) at(race_5=(1(1)5))
    marginsplot, recast(bar)

The 'dydx' option calculates the marginal effect of the specified variable. This shows how much higher or lower satisfaction is for women compared to men within each racial group. The units of 'dydx' here are units of the outcome (satisfaction).
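For intuition, each of those dydx values is just the female main effect plus the relevant race x female interaction term. A hedged check from the regression above (assuming female is coded 0/1 with 1 = Female, and that Black is level 3 of race_5, as the earlier ib3.race_5 models suggest):

    * Female-male satisfaction gap for the base group (White) and for Black students
    display _b[1.female]                          // about -.05 for the base race category
    display _b[1.female] + _b[3.race_5#1.female]  // about -.05 + -.24 = -.29 for Black students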

    quietly margins female, dydx(race_5)
    marginsplot, recast(bar) by(female)

Here we see how much higher or lower satisfaction is for each racial group compared to White students of the same gender. Here, we care about whether or not the confidence interval crosses zero. If it does, the difference is likely not statistically significant.

There is no challenge activity in today's lab. Interactions can be challenging to wrap your mind around, but the better you understand what an interaction looks like on a graph, the better you will grasp interactions in general.