I've been thinking a great deal lately about how best to study moral judgment. Let me say a few things myself, but I hope others will be able to chime in and share ideas, especially since I'm in the process of designing some studies.
What's best to measure?
So far it seems most researchers focus on measuring something other than purely "evaluative judgments," such as whether someone did something good or bad. This seems right to me, since evaluative judgments don't necessarily constitute judgments about whether an action is right or wrong, which is the paradigmatic moral verdict. Moreover, judgments of right and wrong just seem to be the main target of most people doing research on moral judgment. For example, we want to know whether people think killing one to save five is morally wrong, not (merely) whether they think it involves doing some good. One might well answer those two questions differently.
So I think the focus has rightly been instead on what we might call "deontic judgments," such as whether an act is right/wrong, permissible/impermissible, etc. However, this then leads to issues about how to measure such judgments.
How best to measure it?
Some researchers present participants with a forced, dichotomous choice: they ask subjects to answer "Yes" or "No" to whether what the protagonist did was, say, permissible. (John Mikhail's work is a key example.) This has the advantage of quite straightforwardly yielding results that measure deontic judgments, since it clearly commits each participant to expressing one verdict or the other.
One potential problem with this, however, is that the forced-choice situation provides no option for participants to register uncertainty. (In fact, I suspect this has caused some trouble with some of this work. Maybe more on this some other time.) Another issue is that this yields a "nominal" or "categorical" variable, which prevents a more fine-grained look at the data and makes certain statistical tests inappropriate.
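To make the statistical point concrete, here's a minimal sketch (in Python, with entirely made-up response data, not from any actual study): dichotomous Yes/No counts support a chi-square test of proportions, whereas interval-scale ratings would also support tests on means, such as a t-test.

```python
# Sketch only: hypothetical data illustrating which tests each
# response format supports. None of these numbers are real results.
from scipy.stats import chi2_contingency, ttest_ind

# Forced-choice data: counts of "permissible"/"impermissible" verdicts
# for two hypothetical vignettes. Only a test of proportions applies.
counts = [[85, 15],   # Vignette A: 85 "Yes", 15 "No"
          [10, 90]]   # Vignette B: 10 "Yes", 90 "No"
chi2, p, dof, expected = chi2_contingency(counts)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")

# Scale data: hypothetical 1-7 ratings for the same two vignettes.
# Treating these as (roughly) interval data also licenses a t-test on means.
ratings_a = [6, 7, 5, 6, 7, 6, 5, 7]
ratings_b = [2, 1, 3, 2, 2, 1, 3, 2]
t, p2 = ttest_ind(ratings_a, ratings_b)
print(f"t = {t:.2f}, p = {p2:.4f}")
```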
Another approach is to present subjects with scales. Some researchers use something like:
0 = perfectly OK ... 9 = extremely wrong
(Compare Haidt and his various collaborators.)
One issue with this approach is that it's unclear, to me at least, whether wrongness really comes in degrees. Tom Hurka has recently argued that it does, at least in a certain sense (in a blog post over at PEA Soup). I have my doubts, which I've recently, albeit briefly, expressed in a paper. I worry that the concepts of right and wrong (as opposed to, e.g., good and bad) come in degrees only in a loose sense, as when we might say: "Murder is extremely illegal." In the comments section of his post, Hurka acknowledges that we might have to add "more seriously" to "wrong" to express a concept that comes in degrees. I wonder what others think, but I at least want to flag that this is an issue.
One could switch to a purely evaluative scale, such as:
1 = Very Good ... 7 = Very Bad
But, as I suggested before, I doubt this will shed much light on judgments about what is right and wrong, which I suspect are the primary targets of most researchers in this area.
Another Tack?
This all makes me wonder about the merits of taking a different approach. In other areas of experimental philosophy, researchers have tended to stick closer to the Likert-style scales, which are framed in terms of degree of agreement or disagreement, such as:
1 = strongly disagree, 2 = mildly disagree ... 7 = strongly agree
(Compare Liao et al. 2012, "Putting the Trolley in Order")
Participants are then presented with statements and asked for their degree of agreement or disagreement with each (e.g., "It was morally wrong of Sam to steal the necklace").
A similar approach has been pursued recently, albeit briefly, by Aaron Zimmerman in his forthcoming commentary on Mikhail's book in which he reports some new results. Zimmerman (along with John Caravello) presented participants with a scale that measures something like confidence in a claim (which is close to degree of agreement):
Should Jamie throw the switch?
(definitely yes) 1 - 2 - 3 - 4 - 5 - 6 (definitely no)
Agreement or confidence has the advantage of more clearly coming in degrees, so one needn't take a stance on whether deontic properties themselves are gradable.
Some worry that the usual midpoint of "Neither agree nor disagree" in Likert scales precludes equal intervals, since it seems to be a separate category. But some have taken to using "in between" for the midpoint, which seems a reasonable approach. And Zimmerman opted for no midpoint at all.
(I think there's another advantage of focusing on something like agreement, but that involves a further topic. I might take that up another time.)
These are just some thoughts I figured I'd make public. I'm curious to hear what others think about how best to study moral judgment. In particular, are there significant costs to measuring degree of agreement with moral claims rather than judgments about (alleged) degrees of morality?
Hi Josh,
(1)
Why not stagger questions:
(a) If you were forced to choose would you say this is morally permissible or not?
(b) How confident would you be in your answer here (1 - 10)?
You could start with a sample to prime the subjects to make use of the framework.
(2) I think you might do ok sticking with permissible as I am not seeing how Hurka's stuff will ground any claims about that being gradable (if it is wrong at all, it is not permissible, right?). Similarly, I am not sure his stuff on wrongness will extend to obligation in a straightforward way.
(3) Last, I noticed one of the quoted questions was "Should Jamie..." I think people should be careful to distinguish should and ought claims from obligation/permission claims.
Posted by: Brad Cokelet | 05/27/2013 at 12:09 AM
Or you could use a scale with thicker terms coupled with the epistemic question:
(1) This act is...[scale]
1 morally horrific/evil
2 morally impermissible
3 morally indecent/regrettable
4 morally indifferent/permissible
5 morally commendable
6 morally heroic/saintly
(2) How confident are you in your answer?
1-10
Posted by: Brad Cokelet | 05/27/2013 at 12:22 AM
Good points, Brad. And thanks for the feedback!
On (1): I like your initial idea, and I believe some have pursued it, although perhaps not in moral judgment research.
I'm not sure about the proposal in your second comment, though. Your 1-6 scale doesn't clearly have equal intervals between steps, in which case it isn't a "scale" (interval) variable. It would have to be treated as nominal/categorical, which would yield much less useful data with so many categories.
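One partial workaround worth noting here (again with made-up numbers, just as an illustration): even if the steps aren't equidistant, responses could be treated as ordinal rather than nominal, which still permits rank-based tests.

```python
# Sketch only: hypothetical responses on Brad's 1-6 category scale for
# two vignettes. If we can't assume equal intervals, a rank-based test
# (Mann-Whitney U) is still available, unlike a t-test on means.
from scipy.stats import mannwhitneyu

vignette_a = [4, 5, 4, 3, 5, 4, 6, 4]  # mostly "indifferent" to "commendable"
vignette_b = [2, 3, 2, 1, 3, 2, 2, 3]  # mostly "impermissible" to "indecent"

u, p = mannwhitneyu(vignette_a, vignette_b, alternative="two-sided")
print(f"U = {u:.1f}, p = {p:.4f}")
```

This only helps with the ordering worry, of course; it doesn't settle whether the six categories really form a single ordered dimension.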
On (2): I see, you're thinking Hurka is only talking about wrongness/rightness and that this won't transfer over to permissibility/impermissibility anyhow. I guess I was just thinking that Hurka's discussion of wrongness/rightness led me to think about whether deontic categories do come in degrees. So it's not Hurka's theory in particular that worries me (especially given how he qualifies it), but rather just the general issue his post raises.
On (3): Yes, this is somewhat dissatisfying about Zimmerman's study. Still, I think he's okay as far as targeting Mikhail. Mikhail's theory presumably predicts that people will say the agent shouldn't flip the switch in Loop, yet they did.
(Maybe Mikhail needn't predict that people will say the agent should flip the switch in Man in Front, since "permissible" doesn't necessarily entail "should." But the problem Zimmerman is raising for Mikhail is primarily that he didn't replicate the low permissibility judgments in Loop. Zimmerman's results for his version of Man in Front pretty closely match Mikhail's.)
Posted by: Josh May | 05/27/2013 at 12:29 AM
Hi Josh,
Great question! I've experimented with a lot of different ways to probe for cognitive evaluations. I like the "definitely yes" --> "definitely no" approach, which one might think of as a "first-order" scale for collecting responses, which asks people to judge X, as opposed to the typical "second-order" or "meta" scale one sees with Likerts that ask people to report on their view of X.
One might suspect – as I once did – that the difference between the 1st- and 2nd-order scale could matter a great deal. However, in my experience, results observed with the 1st-order scale never differed significantly from what I observed with analogous 2nd-order scales. (I've not written anything on this and I'm going from memory here, so take it for what it's worth.)
In the end, I tend to favor the rather unoriginal view that when we think we've discovered something interesting about people's opinions on a certain matter, probing in different ways to see if it holds up is worthwhile (though not necessarily obligatory). If it does, great. If it doesn't, that's important information too.
By the way, I think 'neutral' and 'unclear' are perfectly respectable midpoints that don't move one off of the spectrum, and which are more natural than the 'neither/nor' or 'in between' anchors.
Posted by: John Turri | 05/27/2013 at 06:40 PM
My first reaction was like John Turri's. I wonder whether there is any large difference between the various ways of asking people for moral judgments. Should the differences be filed away with question order and font size, or do they really get at distinct judgments people tend to make?
If you could find that two sets of questions, e.g. judging confidence that P is wrong vs. judging the wrongness of P, were poorly correlated or could come apart, that would be very interesting. It would also be mildly interesting to know that they do not come apart, as "certainty of it being wrong" and "how wrong it is" might be psychologically equivalent in these questionnaires.
Posted by: Taylor | 05/29/2013 at 12:11 AM
Thanks for the comments, John and Taylor! You both make excellent points.
Taylor, I like your idea that one could empirically test some of these issues. I've been thinking about that myself. But I'm not entirely sure about your proposed test. If responses on the two scales turned out *not* to be correlated, then that does seem to indicate that participants treat them differently.
However, if the two were highly correlated, matters might be less clear. That result seems compatible with a number of hypotheses, such as:
(1) People are plotting their response to the morality question based on how they'd answer the confidence question. (I float this idea in the paper of mine I link to in the post.)
(2) The reverse: people are treating the claims about confidence as about degrees of morality. (I've heard someone propose this in conversation, not about moral judgments but about knowledge attributions.)
(3) Participants treat the two as asking different questions but just happen to respond similarly. (But perhaps this hypothesis could be discarded as too implausible if the correlation holds up consistently.)
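For what it's worth, the basic correlational check Taylor suggests could be sketched like this (in Python, with invented per-vignette means, purely to show the shape of the analysis):

```python
# Sketch only: hypothetical mean ratings per vignette on two scales,
# "how wrong is it?" (1-7) and "how confident are you it's wrong?" (1-7).
# None of these numbers come from an actual study.
from scipy.stats import pearsonr

wrongness  = [6.5, 2.1, 5.8, 3.3, 6.9, 1.5]
confidence = [6.2, 2.5, 5.5, 3.8, 6.7, 1.9]

r, p = pearsonr(wrongness, confidence)
print(f"r = {r:.2f}, p = {p:.4f}")
```

The point in the main text stands, though: a high r here would be compatible with all three hypotheses above, so dissociations (items where the two scales come apart) would be the more diagnostic result.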
Anyway, as I say, it seems quite appropriate to tackle some of these issues experimentally, which is also in the spirit of John's idea that it's good to probe further once we find an initial result on whatever scale was used.
Posted by: Josh May | 05/29/2013 at 01:11 AM
Thanks Josh.
I was struggling with my questionnaire. You saved my life with this post.
Sincerely yours
Hosein
Posted by: Hosein | 05/30/2013 at 03:26 AM
Hi Josh,
Just a follow-up on the degrees issue. Degrees of wrongness can be motivated by thinking of various actions and comparing their apparent degree of wrongness. Some violations of moral obligation, for example, just do seem worse than others. Now this suggests that some obligations are more *important* than others and that violations of obligations can vary when it comes to their *moral seriousness*. But I do not see how this line of thought will support the idea that obligations or permissions themselves come in degrees. The idea that one obligation violation is worse than another gives no support to the idea that one can be more or less obligated or more or less permitted. So I think you would need to provide a new argument to motivate the latter idea (and I am not seeing one right now).
Posted by: Brad Cokelet | 05/30/2013 at 02:45 PM
I see what you say about my previous categories, but I wonder whether people are running experiments with deontic and non-deontic categories and comparing the results? (real question!)
For example one could have this, in addition to the version that asks about permissibility:
(a) I lean towards thinking this action is:
1: Morally Evil
5: Morally indifferent
10: Morally Admirable
(b) How confident are you in your judgment?
Posted by: Brad Cokelet | 05/30/2013 at 02:49 PM
Josh, I agree that these issues about small variations in the wording of questions are important. How else to tell whether two questions are in fact asking different things in empirical studies?
As for your 3 hypotheses, I have two other possible explanations, which are not mutually exclusive.
(4) Simple correlation: The more wrong some type of action is, the more overriding concerns are required to make it permissible in a concrete situation, and vice versa, and people have learned this. Murder is super wrong and so rarely permissible, and we can be super confident that any particular act of murder is wrong because exceptions are unlikely. Here I worry I am just ignorant of the relevant literature, but it does not seem impossible. Is there anything that is extremely wrong like murdering babies but in fact is almost always the right thing to do?
(5) Same mechanism: The same cognitive systems produce both judgments. There is a distinction in the neuroeconomics literature on discounting between mechanisms underlying evaluations of certainty and subjective value, which might be of use. (The discounting literature focuses on temporal discounting, but some also consider probabilistic discounting.) If moral certainty and moral wrongness use the same system, it would be interesting that people tend to process both questions as valuation questions. How exactly to test such questions is unclear, but it might be possible to link the test to those already done on probabilistic discounting and valuation in the non-moral domain. Moral certainty : rightness :: reward certainty : subjective value.
Posted by: Taylor Murphy | 05/30/2013 at 08:02 PM
Brad, about degrees:
I think you're exactly right. It sounds like you read me as defending the idea that rightness/wrongness is gradable, but on the contrary I meant to be expressing serious doubts about it. But I was trying to take the opposing idea seriously because people like Hurka (and others in conversation) seem to. Maybe you're right, though, that there isn't even a way to motivate it! I suppose one could point to the fact that participants in the relevant studies worked with a scale just fine. But then of course the other worries kick in.
Posted by: Josh May | 05/30/2013 at 11:16 PM