Thank you for the comments and tutorial that went above and beyond what was expected. Both proved to be incredibly useful, not just in model building, but also in helping me better communicate and test my research questions. I have reworked all the models using your code examples, incorporating much from your overall approach.
In doing so, I have attempted to address all the concerns/suggestions from the action letter. From that letter, you noted two particular challenges:
To deal with these problems, you offered two primary suggestions:
I have given both suggestions serious thought and have appreciated the opportunity to grapple with them. I have learned a lot and am better for it. In what follows, I would like to walk you through my solutions to these challenges with some of the actual code used to generate the final results. Although, for expository purposes, not all code is displayed, the complete model solutions can be found as .html files (LP-G_analysis_latency.html, LP-G_analysis_deviation.html, and LP-G_analysis_confidence.html) on my GitHub site for this project: https://github.com/nickduran/politicalConspiracies
As a reminder, the conspiracy responses could have been endorsed (“belief”) or not endorsed (“nonbelief”). However, this distinction could not be entered straightforwardly as a single factor, so separate analyses were run on split datasets: one involving “nonbelief” conspiracies and the other “belief” conspiracies. Each subset entails a unique set of research questions.
I start with the analyses and research questions guiding the nonbelief subset.
The goal is to create a statistical model that will allow a three-way interaction between the main factors in the research design:
This single model would simultaneously test the following five core research questions:
Again, to get at the above, the trickiest parts are how the “resp_type” variable is coded and how the planned contrasts are set up. Another potential problem is, of course, the “rank deficiency” warning in the LMER models. As I attempt to explain below, I think that keeping with an “Obama conspiracies” vs. “Bush conspiracies” distinction - at least for the nonbelief subset - is the most effective way of testing these research questions. Also, although “rank deficiency” warnings in LMER can in some cases indicate problems, I try to show that this is not a critical issue for the current dataset.
#### Read Data ####
fromprep = read.delim("/Users/nduran/Dropbox (ASU)/POLIBELIEFS/analysis/JESP/REDO/3_Ranalysis/Mouse-Traj-2GIT.csv", header = TRUE,sep=',')
fromprep = tbl_df(fromprep)
fromprep <- within( fromprep, {
group.id <- subjnum ## subject number
item.id <- questionType ## item code
dependent.variable <- value ## action dynamic response variables
## PLEASE NOTE: Giving the main factors new intuitive labels that will be used from here on out!
political.identity <- poliID2
statement.direct <- explicit
BO.GB.GK.GC <- resp_type
} )
c_scale <- function(x) {
scale(x, scale = FALSE) }
fromprep <- within( fromprep, age.c <- c_scale(fromprep$age) )
Preferred Data Coding
fromprep <- within( fromprep, {
condition <- ifelse( BO.GB.GK.GC=="Gen Conspiracy", "Gen Conspiracy",
ifelse( BO.GB.GK.GC=="Right-wing", "Obama",
ifelse( BO.GB.GK.GC=="Left-wing", "Bush",
ifelse( BO.GB.GK.GC=="Gen Knowledge", "Gen Knowledge", NA )) ) )
} )
This preferred data coding creates a condition variable that better highlights the fact that “right-wing” items are just conspiracies about Barack Obama and the “left-wing” items are just conspiracies about George W. Bush. I think this allows a clear distinction for addressing the research questions.
For example, for RQ1, what we really want to know is whether there are differences in how people respond to Obama relative to Bush political conspiracies depending on their political identity. Based on the table below, this is equivalent to asking whether the difference in latency for Democrats answering Bush vs. Obama items (472.9817 - 474.9751 = -1.99 ms) differs from the corresponding difference for Republicans (446.9819 - 491.5314 = -44.55 ms). In the upcoming demonstration of the omnibus model, this is equivalent to testing the coefficient corresponding to the political.identity x Bush vs. Obama items contrast.
political.identity | condition | mean | count |
---|---|---|---|
Democrat | Bush | 472.9817 | 2072 |
Democrat | Obama | 474.9751 | 2531 |
Republican | Bush | 446.9819 | 993 |
Republican | Obama | 491.5314 | 589 |
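As a sketch (this exact code is not in the analysis scripts), a cell-means table like the one above can be produced with dplyr. The variable names follow the recoding earlier in this letter; restricting to latency trials in the nonbelief subset (endorse != "C") is my assumption about how the means were computed:

```r
## Sketch only: cell means and counts for the Bush/Obama items.
## Assumes the recoded variables from above; the subsetting to latency
## trials in the nonbelief subset is an assumption.
library(dplyr)

subset(fromprep, endorse != "C" & variable == "latency" &
         condition %in% c("Bush", "Obama")) %>%
  group_by(political.identity, condition) %>%
  summarise(mean = mean(dependent.variable, na.rm = TRUE),
            count = n())
```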
Alternative Data Coding Suggested by AE
fromprep <- within( fromprep, {
condition2 <- ifelse( BO.GB.GK.GC=="Gen Conspiracy", "Gen Conspiracy",
ifelse( BO.GB.GK.GC=="Right-wing" & political.identity=="Republican", "Party Concordant",
ifelse( BO.GB.GK.GC=="Left-wing" & political.identity=="Democrat", "Party Concordant",
ifelse( BO.GB.GK.GC=="Right-wing" & political.identity=="Democrat", "Party Discordant",
ifelse( BO.GB.GK.GC=="Left-wing" & political.identity=="Republican", "Party Discordant",
ifelse( BO.GB.GK.GC=="Gen Knowledge", "Gen Knowledge", NA )))) ) )
} )
I really like this distinction as an alternative recoding, but I could not figure out how to get it to work for answering the research questions as originally laid out. I tried various iterations, but because the concordant and discordant levels each include responses from both Republicans and Democrats, the interaction with “political.identity” becomes impossible to estimate. That is, I don’t think I can ask whether party-concordant and party-discordant items are answered differently relative to each other - as a function of political identity - because political party is now nested within each concordant/discordant level.
political.identity | condition2 | mean | count |
---|---|---|---|
Democrat | Party Concordant | 472.9817 | 2072 |
Democrat | Party Discordant | 474.9751 | 2531 |
Republican | Party Concordant | 491.5314 | 589 |
Republican | Party Discordant | 446.9819 | 993 |
The tables below just underscore the point that each Party Concordant/Discordant level includes both Bush and Obama items, which changes the values being compared when interacting with other factors.
condition | mean | count |
---|---|---|
Bush | 464.5582 | 3065 |
Obama | 478.1006 | 3120 |
condition2 | mean | count |
---|---|---|
Party Concordant | 477.0876 | 2661 |
Party Discordant | 467.0871 | 3524 |
However, I do want to note that I do use this party concordant/discordant distinction in the “Belief” subset analyses. This distinction works nicely with the research questions being asked there. There is no interaction term with political identity in any of the models, only a “simple effects” style analysis looking within Republicans and Democrats, as separate models.
I can also add a statement to the manuscript, or perhaps to supplementary materials that include your code, highlighting this concordant/discordant framing: although it falls outside the scope of the immediate questions, it could offer a way forward for new analyses or experiments.
Next, in order to test all five research questions within a single omnibus model, what is particularly tricky (as you note) is how to contrast code the “condition” variable. Again, in light of the specific nature of the research questions, the following contrast code scheme does the job. I explain in greater detail in the following section.
Preferred Contrast Coding
nonbelief.data <- within( subset( fromprep, endorse!="C" ), {
political.identity.b <- ifelse( political.identity=="Democrat", -1/2,
ifelse( political.identity=="Republican", 1/2, NA ) )
statement.direct.w <- ifelse( statement.direct=="Reject as False", 1/2,
ifelse( statement.direct=="Accept as True", -1/2, NA ) )
ObamaVsKnowledge <- ifelse( condition=="Bush", 0,
ifelse( condition=="Obama", 1/2,
ifelse( condition=="Gen Conspiracy", 0,
ifelse( condition=="Gen Knowledge", -1/2, NA ))))
BushVsKnowledge <- ifelse( condition=="Bush", 1/2,
ifelse( condition=="Obama", 0,
ifelse( condition=="Gen Conspiracy", 0,
ifelse( condition=="Gen Knowledge", -1/2, NA ))))
GeneralVsKnowledge <- ifelse( condition=="Bush", 0,
ifelse( condition=="Obama", 0,
ifelse( condition=="Gen Conspiracy", 1/2,
ifelse( condition=="Gen Knowledge", -1/2, NA ))))
} )
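One quick sanity check on this coding (a sketch I would add, using the variable names defined just above) is to list the unique contrast values assigned to each condition level, confirming that each contrast pits the intended level against “Gen Knowledge”:

```r
## Sketch: confirm each condition level received the intended contrast codes.
## Each row should show one condition with its three contrast values.
library(dplyr)

nonbelief.data %>%
  distinct(condition, ObamaVsKnowledge, BushVsKnowledge, GeneralVsKnowledge)
```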
Starting with the omnibus, I run the complete model. For expository purposes, I print out the warning message that is generated and just the fixed effects structure.
omnibus.nonbeliever.model <- lmer( dependent.variable ~ (1|group.id) + (1|item.id) + age.c + political.identity.b * statement.direct.w * (ObamaVsKnowledge + BushVsKnowledge + GeneralVsKnowledge), data=nonbelief.data, REML=FALSE, na.action = na.exclude, subset=variable=="latency" )
## fixed-effect model matrix is rank deficient so dropping 2 columns / coefficients
## Estimate
## (Intercept) 469.761002
## age.c 3.390742
## political.identity.b -7.489793
## statement.direct.w -13.418422
## ObamaVsKnowledge 43.735805
## BushVsKnowledge -6.970058
## GeneralVsKnowledge 7.431752
## political.identity.b:statement.direct.w -18.623772
## political.identity.b:ObamaVsKnowledge 51.861862
## political.identity.b:BushVsKnowledge -70.369420
## political.identity.b:GeneralVsKnowledge 37.382983
## statement.direct.w:ObamaVsKnowledge -1.953055
## statement.direct.w:BushVsKnowledge -66.573438
## political.identity.b:statement.direct.w:ObamaVsKnowledge -73.126469
## political.identity.b:statement.direct.w:BushVsKnowledge 49.619711
The above is meant to draw attention to two major issues. The first is that the model still produces a rank-deficiency warning; the second is that the coefficients printed in the fixed-effects structure seemingly do not allow all five research questions to be tested (e.g., there is no explicit Bush vs. Obama contrast). In what follows, I will attempt to justify:
AE: “Moreover, you have to model an estimable model and not allow the model to arbitrarily drop some of your contrasts. By allowing the final model to remain rank deficient, some contrasts from the resp_type condition variable are dropped. That means that all four levels of the resp_type variable are not specified mathematically, so you can’t leave it like that.”
AE: “Moreover, the rank deficiency isn’t purely a random response pattern thing, but rather it is due to the way that your “explicit” and “endorse” variables are coded. For the General Conspiracy prompts, the explicit variable’s “Accept as True” and “Reject as False” options map on 1:1 with the endorse variable’s “C” or “NC” responses, respectively. As a result, there are never any “Reject as False” for the endorsed general conspiracies and never any “Accept as True” for the non-endorsed conspiracies.”
It is true that the model still shows a “rank-deficiency” warning stating that two coefficients have been dropped. These two coefficients would have corresponded to the tests:
And as you note, this is not a random dropping of contrasts. Rather, the experimental design does not allow these contrasts to be tested: the “condition” variable is not expressed in complete combination with the “statement.direct” variable. The problem is built into the sampled data, as there are no “Gen Conspiracy, Accept as True” items (as can be seen below):
## Source: local data frame [14 x 4]
## Groups: condition, political.identity [?]
##
## condition political.identity statement.direct count
## <chr> <fctr> <fctr> <int>
## 1 Bush Democrat Accept as True 966
## 2 Bush Democrat Reject as False 1106
## 3 Bush Republican Accept as True 474
## 4 Bush Republican Reject as False 519
## 5 Gen Conspiracy Democrat Reject as False 2146
## 6 Gen Conspiracy Republican Reject as False 839
## 7 Gen Knowledge Democrat Accept as True 4311
## 8 Gen Knowledge Democrat Reject as False 4453
## 9 Gen Knowledge Republican Accept as True 1626
## 10 Gen Knowledge Republican Reject as False 1756
## 11 Obama Democrat Accept as True 1282
## 12 Obama Democrat Reject as False 1249
## 13 Obama Republican Accept as True 315
## 14 Obama Republican Reject as False 274
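For completeness, here is a hedged sketch of how the count table above can be generated (the variable names follow the recoding steps earlier; whether the counts are restricted to latency trials is my assumption):

```r
## Sketch: cell counts by condition, political identity, and statement
## direction; the empty "Gen Conspiracy / Accept as True" cell never appears.
library(dplyr)

nonbelief.data %>%
  filter(variable == "latency") %>%
  group_by(condition, political.identity, statement.direct) %>%
  summarise(count = n())
```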
According to online posts from Ben Bolker (one of the authors of lme4) and others, the warning message should not necessarily be interpreted as an error. The warning can indicate problems - e.g., an inadequate amount of data, too much redundancy in the data, comparing variables on radically different scales, or not properly centering/scaling variables with large values - but in the current case, the warning is due to the linear-combination issue introduced by the lack of data for the “Gen Conspiracy, Accept as True” items.
What LMER does in this situation is simply ignore the interactions that would have involved these items. In this way, LMER deals only with a full-rank subspace (in which everything is mathematically specified). Ignoring these interactions does not affect the other contrasts that are tested.
With some further digging, I also verified that the “rank deficiency” warning is due to the linear-combination issue involving the nonexistent “Gen Conspiracy, Accept as True” items. I ran the “findLinearCombos” function from the caret package, which performs the same QR decomposition that LMER uses to identify the contrasts that will be ignored.
Here I specify the fixed-effects structure that is being submitted to the omnibus model. These are the coefficients assuming that all levels can be linearly combined:
fix.formula <- dependent.variable ~ age.c + political.identity.b * statement.direct.w * (ObamaVsKnowledge + BushVsKnowledge + GeneralVsKnowledge)
fix.matrix <- model.matrix (fix.formula, subset(nonbelief.data,variable=="latency"))
names(data.frame(fix.matrix))
## [1] "X.Intercept."
## [2] "age.c"
## [3] "political.identity.b"
## [4] "statement.direct.w"
## [5] "ObamaVsKnowledge"
## [6] "BushVsKnowledge"
## [7] "GeneralVsKnowledge"
## [8] "political.identity.b.statement.direct.w"
## [9] "political.identity.b.ObamaVsKnowledge"
## [10] "political.identity.b.BushVsKnowledge"
## [11] "political.identity.b.GeneralVsKnowledge"
## [12] "statement.direct.w.ObamaVsKnowledge"
## [13] "statement.direct.w.BushVsKnowledge"
## [14] "statement.direct.w.GeneralVsKnowledge"
## [15] "political.identity.b.statement.direct.w.ObamaVsKnowledge"
## [16] "political.identity.b.statement.direct.w.BushVsKnowledge"
## [17] "political.identity.b.statement.direct.w.GeneralVsKnowledge"
Next, I test which of the comparisons will be removed by LMER because they cannot be linearly combined. These are displayed below and correspond to the missing coefficients in the omnibus.
names(data.frame(fix.matrix))[findLinearCombos(fix.matrix)$remove]
## [1] "statement.direct.w.GeneralVsKnowledge"
## [2] "political.identity.b.statement.direct.w.GeneralVsKnowledge"
Not having these two contrasts would be problematic only if my research questions involved them in some way. It does not appear problematic mathematically in terms of how the other tests are generated.
If the above claim is true that LMER merely ignores the interactions in the omnibus that would have involved “Gen Conspiracy, Accept as True” items (which don’t exist), without affecting how the other coefficients are computed, then, when I run a model that explicitly removes these interactions, it should be identical to the omnibus model results reported above.
Indeed, for the model below, which manually specifies all estimable terms, the results are identical to those of the model where the “rank deficiency” warning was generated. Also, no “rank deficiency” warning is produced for this manually specified model.
omnibus.nonbeliever.TEST <- lmer( dependent.variable ~ (1|group.id) + (1|item.id) + age.c +
political.identity.b +
statement.direct.w +
ObamaVsKnowledge +
BushVsKnowledge +
GeneralVsKnowledge +
political.identity.b:statement.direct.w +
political.identity.b:ObamaVsKnowledge +
political.identity.b:BushVsKnowledge +
political.identity.b:GeneralVsKnowledge +
statement.direct.w:ObamaVsKnowledge +
statement.direct.w:BushVsKnowledge +
political.identity.b:statement.direct.w:ObamaVsKnowledge +
political.identity.b:statement.direct.w:BushVsKnowledge,
data=nonbelief.data, REML=FALSE, na.action = na.exclude, subset=variable=="latency" )
df1 = data.frame(summary(omnibus.nonbeliever.TEST)$coefficients)
df1[1]
## Estimate
## (Intercept) 469.761002
## age.c 3.390742
## political.identity.b -7.489793
## statement.direct.w -13.418422
## ObamaVsKnowledge 43.735805
## BushVsKnowledge -6.970058
## GeneralVsKnowledge 7.431752
## political.identity.b:statement.direct.w -18.623772
## political.identity.b:ObamaVsKnowledge 51.861862
## political.identity.b:BushVsKnowledge -70.369420
## political.identity.b:GeneralVsKnowledge 37.382983
## statement.direct.w:ObamaVsKnowledge -1.953055
## statement.direct.w:BushVsKnowledge -66.573438
## political.identity.b:statement.direct.w:ObamaVsKnowledge -73.126469
## political.identity.b:statement.direct.w:BushVsKnowledge 49.619711
To interpret the coefficients from the omnibus, we of course first have to show that the overall interactions involving the “condition” variable are statistically significant. This was omitted from the original manuscript but is now included in the final code (please see the aforementioned .html files for the complete analysis).
Assuming that the interactions are significant (which they are in this case for latency), we can now address all five research questions in the one omnibus model. But to do so, I have to use “glht” from the “multcomp” R package. To explain:
Here again are the fixed-effect terms from the omnibus for the latency DV, with the rows numbered:
## [1] "(Intercept)"
## [2] "age.c"
## [3] "political.identity.b"
## [4] "statement.direct.w"
## [5] "ObamaVsKnowledge"
## [6] "BushVsKnowledge"
## [7] "GeneralVsKnowledge"
## [8] "political.identity.b:statement.direct.w"
## [9] "political.identity.b:ObamaVsKnowledge"
## [10] "political.identity.b:BushVsKnowledge"
## [11] "political.identity.b:GeneralVsKnowledge"
## [12] "statement.direct.w:ObamaVsKnowledge"
## [13] "statement.direct.w:BushVsKnowledge"
## [14] "political.identity.b:statement.direct.w:ObamaVsKnowledge"
## [15] "political.identity.b:statement.direct.w:BushVsKnowledge"
Each of these coefficients can be directly tested by glht, but as you mention in your sample code, this is redundant since a t statistic is already generated for each. Where glht is handy, however, is in testing two of the coefficients against each other. This can be done so long as they share the same baseline in the original contrast scheme. For example, estimates [9] vs. [10] can be tested because both share “general knowledge” as the baseline (see coding scheme above); the only difference is the “Obama” vs. “Bush” distinction. By doing this test, we are in fact testing RQ1: “Are there differences in how people respond to Obama (right-wing items) relative to Bush (left-wing items) political conspiracies depending on their political identity? (two-way interaction).”
planCont = rbind(
# note that "Bush" items become the new baseline
"RQ1.2way" = c(0, 0, 0, 0, 0, 0, 0, 0, 1, -1, 0, 0, 0, 0, 0))
Moreover, by testing [14] vs. [15], we can test the follow-up to RQ1: “Does this depend on whether the conspiracies are stated directly or indirectly? (three-way interaction)”
planCont = rbind(
# note that "Bush" items become the new baseline
"RQ1.3way" = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, -1))
This can be continued for all five of the research questions:
Hopefully the above shows how all research questions can be tested in one omnibus model, using all the data, with the help of helper functions like glht. I believe this is the most comprehensive and parsimonious way of doing this sort of analysis.
As mentioned earlier, the “Belief” subset introduces challenges in terms of data/response distributions not evident in the “Nonbelief” subset. More importantly, the “Belief” subset also engenders new research questions based on testing whether participants show an accuracy bias when endorsing party-concordant conspiratorial information. How responders answered party-discordant items is of little interest in light of this bias. Also, the empty cells are most pronounced for endorsed party-discordant responses, as few participants engaged in this behavior.
The above leads to these specific research questions:
To answer these questions, I am only looking “within party.” That is, the interaction with political.identity is not necessary. This makes the recoding of responses into party-concordant vs. party-discordant the most intuitive, because I am only interested in the party-concordant endorsement of political conspiracies and not a distinction between right-wing and left-wing as a function of political party. It also allows use of the suggested “simple effects” coding involving Republicans and Democrats. Below is the proposed sequence of steps for testing each research question:
Step 1: Create coding scheme separating responses into a clear “Party Concordant” level.
fromprep2 <- within( fromprep, {
condition.belief <- ifelse( resp_type=="Gen Conspiracy", "Gen Conspiracy",
ifelse( resp_type=="Right-wing" & poliID2=="Republican", "Party Concordant",
ifelse( resp_type=="Left-wing" & poliID2=="Democrat", "Party Concordant",
ifelse( resp_type=="Right-wing" & poliID2=="Democrat", "Party Discordant",
ifelse( resp_type=="Left-wing" & poliID2=="Republican", "Party Discordant",
ifelse( resp_type=="Gen Knowledge", "Gen Knowledge", NA )))) ) )
} )
Step 2: Create contrasts to do “simple effects” testing within each political party. Also create the one critical contrast (which will be allowed to interact with statement.direct.w in the statistical model).
belief.data <- within( subset( fromprep2, endorse!="NC" ), {
## creating new contrast coding involving political party to do "within party" analysis
republican <- ifelse( poliID2=="Republican", 0, ifelse( poliID2=="Democrat", 1, NA ) )
democrat <- ifelse( poliID2=="Democrat", 0, ifelse( poliID2=="Republican", 1, NA ) )
statement.direct.w <- ifelse( statement.direct=="Reject as False", 1/2,
ifelse( statement.direct=="Accept as True", -1/2, NA ) )
noGCPD.PartyVsKnowledge <- ifelse( condition.belief=="Party Concordant", 1/2,
ifelse( condition.belief=="Party Discordant", NA,
ifelse( condition.belief=="Gen Conspiracy", NA,
ifelse( condition.belief=="Gen Knowledge", -1/2, NA ))))
} )
Republicans
republican.omnibus.believer.model <- lmer( dependent.variable ~ (1|group.id) + (1|item.id) + age.c + statement.direct.w * republican * (noGCPD.PartyVsKnowledge), data=belief.data, REML=FALSE, na.action = na.exclude, subset=variable=="latency" )
For RQ1, I can directly test “noGCPD.PartyVsKnowledge” and “statement.direct.w:noGCPD.PartyVsKnowledge”
## Estimate
## (Intercept) 456.121668
## age.c 3.157937
## statement.direct.w 26.099987
## republican 25.205933
## noGCPD.PartyVsKnowledge 34.478090
## statement.direct.w:republican 29.111752
## statement.direct.w:noGCPD.PartyVsKnowledge 15.550880
## republican:noGCPD.PartyVsKnowledge 15.934709
## statement.direct.w:republican:noGCPD.PartyVsKnowledge 45.893537
Democrats
democrat.omnibus.believer.model <- lmer( dependent.variable ~ (1|group.id) + (1|item.id) + age.c + statement.direct.w * democrat * (noGCPD.PartyVsKnowledge), data=belief.data, REML=FALSE, na.action = na.exclude, subset=variable=="latency" )
For RQ2, again, I can directly test “noGCPD.PartyVsKnowledge” and “statement.direct.w:noGCPD.PartyVsKnowledge”
## Estimate
## (Intercept) 481.327601
## age.c 3.157937
## statement.direct.w 55.211739
## democrat -25.205933
## noGCPD.PartyVsKnowledge 50.412798
## statement.direct.w:democrat -29.111752
## statement.direct.w:noGCPD.PartyVsKnowledge 61.444417
## democrat:noGCPD.PartyVsKnowledge -15.934708
## statement.direct.w:democrat:noGCPD.PartyVsKnowledge -45.893537
The above also addresses this comment:
AE: “One trick that I think is true — but you should think harder about it, since you are the actual experts on this topic — is that you need to recode the “left-wing/right-wing” conspiracies into categories of “Party Concordant” and “Party Discordant” conspiracies. This minimizes some of the empty cells, especially for the “belief” subset. That is, for the belief subset, only “party concordant” conspiracies are ever endorsed as true or rejected as false, which means “Left-wing” for Democrats and “Right-Wing” for Republicans. If you leave it as “right-wing”/“left-wing,” then you cannot examine the interaction of explicit and resp_type in the belief subset.”