Cash rules everything?

[5th post in a series; start with the first post here]

This week, I’ve talked about the controversy over whether incentives reduce intrinsic motivation, leading to less task engagement than if the incentive had never been offered. Yesterday, I covered the research Indranil Goswami conducted with me, which tests this idea in repeated task choices and finds that the post-incentive reduction is brief and small (relative to the during-incentive increase).

So, what does this mean for the original theories? Remember, paying people was supposed to feel controlling and reduce autonomy and therefore supplant people’s own intrinsic reasons for doing a task.


Maybe the theory is mostly right, just wrong about the degree and duration of the impact it has.  Maybe intrinsic motivation is reduced, but bounces back after giving people a little time to forget about the payment.  Or maybe giving people the opportunity to make a few choices on their own without any external influence resets their intrinsic motivation.

But maybe the theory is just wrong about the effects of payments.  Perhaps people are trying to manage the eternal tradeoff between effort and leisure over time.  If they want a mix of both effort and leisure, then when they have a good reason to invest more in effort for a while, they do.  But then, when the incentive is gone, it’s an opportunity to balance it back out by taking a break and engaging in more leisure.
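To make this balancing account concrete, here is a toy sketch in Python. The decision rule and every number in it are my own illustration (not a model from our paper); it simply shows how a preference for a balanced mix of effort and leisure can produce extra work while paid, a brief dip right afterwards, and then a return to baseline.

```python
# Toy effort-leisure balancer: probability of choosing the math task on each
# trial, for someone who drifts back toward a target mix of work and leisure.
# All parameter values here are illustrative assumptions, not estimates.

def balancing_probs(target=0.5, pay_boost=0.35, trials=30, paid=range(8, 18)):
    probs = []
    for t in range(trials):
        share = sum(probs) / len(probs) if probs else target  # running math share
        p = target + (pay_boost if t in paid else 0.0)        # bump while incentivized
        p -= 0.5 * (share - target)                           # rebalancing pressure
        probs.append(min(max(p, 0.0), 1.0))
    return probs

for t, p in enumerate(balancing_probs(), start=1):
    print(f"trial {t:2d}: p(math) = {p:.2f}")
# The bump while paid raises the running share of math choices, so the first
# trials after the incentive dip below the 0.5 baseline, then recover.
```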

For research purposes, it’s helpful that the Deci, Koestner and Ryan paper is quite clear about the theoretical process. That clarity can be used to derive predictions about how motivation should change depending on the context, according to their theory. Indranil designed several studies to test between these accounts.

In one study, he gave people different kinds of breaks after the incentive ended. He found that giving people a brief break eliminated the initial post-incentive decline, but only if the break did not involve making difficult choices.  So, it doesn’t seem like giving people the opportunity to make their own choices took away the sting of controlling incentives, making the negative effects brief.  Instead, it’s that giving people a little leisure made them willing to dive back into the math problems.

But how about a more direct test?  The intrinsic motivation theory predicts that the negative effects of incentives should be more pronounced, the more intrinsically motivating the task is.  If I really enjoy watching videos, and you destroy my love of video-watching for its own sake with an incentive, my behavior afterwards when there is no incentive should be really different.  On the other hand, if the post-incentive decline is because I was working hard during the incentive and need a break, then I should be less interested in changing tasks after being paid to do an easy and fun task.

To test this, Indranil took his experimental setup and varied which task was incentivized, paying some people for every video they watched in Round 2, and others for every math problem they solved.  After being paid to do math, when the payment ended, people wanted a break and initially watched more videos for the first few choices.  But after being paid to watch videos, when the payment ended, there was no difference in their choices, contrary to the intrinsic motivation theory.

Perhaps the most direct test came from simply varying the amount of the payments in another study: either 1 cent, 5 cents or 50 cents per correctly solved math problem in Round 2.  According to the intrinsic motivation theory, the larger the incentive, the more controlling it is, and the more damage is done to intrinsic motivation.  However, if people are balancing effort and leisure, then the better they are paid, the less need they may feel to rebalance afterwards. As the chart below shows, paying people a high amount (50 cents) led not only to no immediate post-incentive decline but, summing across all of Round 3, to people doing more of the math task without any additional payment.  Again, the opposite of what the intrinsic motivation account would predict.

[Chart: Round 3 math-task choices by incentive amount]

So, what does it all mean?  We need more research to figure out when incentives will have no post-incentive effect, a negative effect, or a positive effect.  But at a minimum, our findings strongly suggest that simply offering a temporary incentive does not necessarily harm intrinsic motivation. Instead, it seems that when an incentive gets people to work harder than they would have otherwise, they just want to take a break afterwards.


The dynamic effects of incentives on task engagement.

[4th post in a series; start with the first post here]

This week, I’ve laid out a puzzle: motivational theories and research suggest that incentives reduce intrinsic motivation, so that task engagement is lower when a temporary incentive ends than if the incentive had never been offered in the first place.  This implies that offering people temporary incentives for performance will backfire.  But tests of incentives in real-world settings have all found either no long-term effects or positive long-term effects.

Indranil Goswami tackled this puzzle in his dissertation, which has just been published in the January issue of JEP: General.  Prior studies generally measured people’s behavior only right after the incentive ended.  Indranil designed a new test to see how motivation to engage in a task compared before, during and after an incentive, and how it changed over time.

In his studies, people were given repeated choices between solving a 30-second math problem and watching a 30-second video. There were three rounds (a schematic code sketch follows the list):

Round 1: Participants made eight choices between math problems and videos.

Round 2: For half the people, an incentive was offered for the math task (5 cents for every time they chose the math problem and solved it correctly). Participants were told that the incentive would only apply in the current round. For the other half of the people, no incentive was mentioned. All participants made ten choices.

Round 3: The key test. All participants made another twelve choices between math problems and videos, with no incentives.
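To make the structure concrete, here is a minimal schematic of the design in Python. The choice counts and the 5-cent payoff come from the description above; the function and the example decision rule are my own sketch, not the authors’ code.

```python
# Schematic of the three-round design (choice counts and payoff from the post).
ROUNDS = [
    {"round": 1, "choices": 8,  "math_paid": False},  # baseline
    {"round": 2, "choices": 10, "math_paid": True},   # incentive (treatment group only)
    {"round": 3, "choices": 12, "math_paid": False},  # key test: incentive removed
]
PAY_PER_CORRECT_MATH = 0.05  # dollars, Round 2, treatment group only

def run_participant(in_treatment, choose):
    """Record one participant's math-vs-video choices across all rounds.

    `choose` is any decision rule: (round_number, trial, paid) -> "math" or "video".
    """
    history = []
    for spec in ROUNDS:
        paid = spec["math_paid"] and in_treatment
        for trial in range(spec["choices"]):
            history.append((spec["round"], choose(spec["round"], trial, paid)))
    return history

# Example: an agent that always picks math when paid, and otherwise alternates.
demo = run_participant(True, lambda r, t, paid: "math" if paid or t % 2 == 0 else "video")
```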

People who were given an incentive in Round 2 will probably do more of the math tasks while the incentive is available, compared to those who weren’t paid to do math. But what will happen in Round 3, when the incentive has ended and is no longer available?

According to the psychological theories we’ve been discussing, people who were never paid still have their intrinsic motivation intact, and will keep doing the math to the degree they find it interesting. But for people who were paid to do math during Round 2, the math task is now different — it either no longer provides autonomy, or they will have inferred that it’s an uninteresting task — and they won’t want to do it anymore in Round 3, now that they’re not being paid.  So, the results would be predicted to look like this:

PREDICTED RESULTS:

[Chart: predicted rate of choosing the math task across trials, by condition]

Going back to the Deci, Koestner and Ryan paper, Indranil’s experiments correspond to a tangible expected reward that is performance-contingent, which resulted in a negative effect on subsequent performance (d = -.28) in the 32 such studies they reviewed. If we were to ask Alfie Kohn, he would presumably endorse the Control (no payment) condition — sure, the payment gives us a short-term increase in the math task during Round 2, but at what cost to long-term motivation?

Economists would tend to disagree, however. Incentives should increase the number of math tasks completed while people are being paid, but why would they have a negative effect afterwards?  Once the incentive ends, people should go back to doing as many math tasks as they enjoy, as if the incentive had never happened. Or, if the incentive actually helped them improve at the task by getting them to practice more in Round 2, then maybe they would continue to do a bit more than they had been doing before, as suggested by Fryer’s studies, which I discussed in the last post.

Indranil conducted a series of experiments, varying multiple factors.  Across the experiments, nearly 1,100 participants completed the versions described above.  This chart summarizes what he found:

ACTUAL RESULTS:

[Chart: actual rate of choosing the math task across trials, by condition]

In Round 1, the two groups were the same.  Then, in Round 2, the people who could earn money for solving the math problems did a lot more of them.  That makes sense.  The key question is what happens next, when the incentive ends.

In Trial 19, when the incentive ended, the people who had previously been paid to do math were suddenly a lot less likely to choose the math task.  They wanted to watch a video, not only more than they had before, but also more than the people who had never been paid an incentive in the first place. So it looks like intrinsic motivation was reduced — but only for a while.  The effect was still present, but weaker, in Trial 20, and the difference was effectively eliminated by Trial 21.

So, after a minute and a half, a mere three choices later, the story had changed.  Whether the person had been paid an incentive or not no longer mattered for their willingness to do math rather than watch videos. After a few more choices, the pattern actually fully reversed, and the people who had been paid before were now doing more math problems, for free.

In a sense, both sets of findings from prior research were vindicated.  The immediate negative post-incentive effects on behavior that had been found in previous lab experiments were found here too.  On the other hand, the lack of an overall negative post-incentive effect observed in the field studies was replicated here as well.

What does it all mean? As for policy implications, the results are quite inconsistent with the dire warnings about incentives.  Providing a temporary incentive can yield a boost in behavior while people are being paid, with only a small and brief decline afterwards. Maybe incentives work pretty well after all.


The mystery of the missing long-term harm from incentives.

[3rd post in a series; start with the first post here]

I’ve been talking about research on incentives, and how a temporary incentive might undermine intrinsic motivation. In the last post, we heard from educational policy advocate Alfie Kohn on his prediction that paying kids to do schoolwork would backfire, particularly when the incentive ended.

Perhaps the most comprehensive experiments to test these ideas in a real-world setting were done by Roland Fryer, as reported in his 2011 paper. He describes his research on education in the video below (fascinating throughout, but the student incentives discussion runs from around 17:00-24:00):

Overall, his results suggest little effect, positive or negative, of paying schoolchildren for their performance (e.g., for getting high test scores). However, he also conducted two large-scale randomized in-school trials that instead tested rewards for students’ efforts: the underlying behaviors that could foster success in school. In Dallas, second-graders randomly assigned to the treatment condition were paid $2 for each book they read and passed a quiz on.  In Washington D.C., sixth through eighth graders in the treatment condition were paid for other educational inputs, including attendance, school behavior and handing in homework.

Paying kids to read books yielded a significant improvement in the Dallas students’ reading comprehension, a marginal improvement in their language scores, and a positive but non-significant effect on vocabulary scores. In Washington D.C., the incentives yielded a marginal improvement in reading and a non-significant improvement in math.

The key question is what happened when the incentive program ended.  The psychological theories we’ve been discussing would predict that the kids would be worse off: having lost their own motivation to engage in the behaviors, and no longer having the external tangible incentive, they would be less motivated, harming outcomes.  Instead, Fryer found that the positive effects were reduced by half and were no longer statistically significant after the reward ended.

To put it another way, the benefits do seem to fade when the incentive ends, but there is no evidence that the outcomes are worse than if the incentive had not been offered in the first place. This is not an isolated finding.  Three other studies with older students (high school: Jackson, 2010; college: Angrist et al., 2009, and Garbarino & Slonim, 2005) actually find (some) positive effects of education incentives that persisted significantly after the incentive ended.

In his dissertation, Indranil Goswami reviewed 18 field studies across a variety of domains (including education, smoking cessation, weight loss, gym attendance, medical adherence and work productivity).  These studies all measured people’s total behavior in a period after the incentive had ended and found either no long-term effect or a modest positive effect. Not a single study found that people had worse outcomes when an incentive was offered and then ended, than if the incentive had never happened.

So where was the long-term harm that the predicted loss of intrinsic motivation from the incentive would have caused?

The evils of incentives.

[2nd post in a series; start with the first post here]

Yesterday, I talked about research on incentives, and how a temporary incentive might undermine intrinsic motivation. This view has had a major impact on policy regarding incentives, particularly in relation to children and education.

Alfie Kohn has published several books on the topic, including “Punished by Rewards: The Trouble with Gold Stars, Incentive Plans, A’s, Praise, and Other Bribes.” Here he is explaining to Oprah and TV viewers everywhere why rewarding kids is a very bad idea.  He presents the main idea in the first three minutes:

[Video: Alfie Kohn discussing rewards with Oprah]

In the interview, he says:

“One of the findings in psychology that has been shown over and over again — the more you reward people for doing something, the more they tend to lose interest in whatever they had to do to get the reward.”

He then goes on to talk specifically about grades as problematic incentives.  In his book, he goes even further, saying that verbal praise is coercive and should be avoided because it contains an implied threat to withhold praise in the future.

But looking back at Deci, Koestner and Ryan’s meta-analysis, which I discussed yesterday, verbal rewards had no negative effect on children’s subsequent motivation, and even tangible rewards had no post-reward effect when the reward was unexpected. So, this seems to be over-selling the impact of temporary rewards that had been found in the literature.

The show does an experiment of its own that yields a somewhat implausibly strong effect. Kohn then characterizes this result first in terms of an inference-based theory of intrinsic motivation, before circling back to the control vs. autonomy account.

“If the kid figures they have to bribe me to do this, then it must be something I wouldn’t want to do, so the very act of offering a reward for a behavior signals to somebody that this is not something interesting.”

What he’s talking about here is the “overjustification hypothesis” of Lepper, Greene and Nisbett (1973). However, most of the experiments in which this has been tested were with young children (for example, pre-schoolers in the article above).  There’s something a bit odd to me about the idea that the teenagers on the show would not be able to judge how interesting the task was on their own, without making those kinds of more remote inferences.

A schoolteacher comes on and talks about a rewards program (“Earning by Learning”) her school uses to motivate reading, which she thinks works. Kohn is skeptical and raises three issues: whether the kids are choosing easier books just to get the rewards, whether they comprehend what they’re reading, and (the issue we’ve been discussing) whether motivation will persist when the program is over.

Nevertheless, what we’re left with is the admonition that in the long run, rewards not only don’t work, but will harm motivation. As Oprah says, “You have to change the way you think about parenting!” But notice how far we are now from the original studies. Next, we’ll look at some more direct tests of this idea.

What happens to motivation when incentives end?

If you’re temporarily paid to do something, would that change your motivation or interest in doing the same thing when you’re not paid to do it anymore? Indranil Goswami investigated this long-standing question in his dissertation research with me, which I’ll get to soon.  But first, some background.

Psychologists and economists have long debated the effectiveness of incentives.  From the viewpoint of economics, it’s simple, almost definitional.  Economics is fundamentally about how incentives shape human behavior. Much of the empirical research in economics is about how incentives — overt, hidden, and even perverse — influence and explain people’s behavior. While this view can be summarized simply as “incentives work!”, identifying what the incentives are can be tricky and open to debate, and the definition of what constitutes an incentive has been broadening.  Andreoni, for example, introduced the idea of a “warm glow” (or good feeling) that a person may get from donating to others as an incentive that can explain altruistic behavior.

Psychologists tend to think in terms of internal mental processes and motivators, and have historically been skeptical of external incentives, seeing such incentives, particularly monetary incentives, as impure and interfering with people’s true or “intrinsic” motivation.

Which brings us to one of my favorite papers, a comprehensive review by Deci, Koestner and Ryan (1999) of how incentives affect intrinsic motivation. They looked for experiments that tested how motivated people are to do a task without compensation after they had been temporarily paid to do it. In the paper, they painstakingly gather up research findings, including unpublished studies in order to deal with publication bias, categorize the differences in experimental methods, and then summarize the average findings (using meta-analysis).  Their main results are summarized below (from p. 647 of the paper):

[Figure: summary of meta-analytic effect sizes from Deci, Koestner and Ryan (1999), p. 647]

What they’re looking at here is the “free choice paradigm”, where people in a lab study are paid to do an activity (such as drawing), and then are put in a situation where they could do more of the activity or do some other activity, with no compensation for any of the options. Their decision whether or not to do more of the activity is compared with the same decision among people who were never paid for the activity in the first place (a control group).

Based on 101 such studies, it looks bad for incentives (d = -.24, at the top of the graph). People paid to do the activity then do less of it when the payment is no longer available than if they had never been paid in the first place.  From a classical economics perspective, this may appear weird: if you like drawing, then you should draw, whether you were previously paid to draw or not. To many psychologists, the reason seems clear: paying people changed how they viewed the activity, undercutting the intrinsic motivation that made it fun in the first place.
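For readers who want the metric unpacked: Cohen’s d is just the difference between two group means divided by their pooled standard deviation. A minimal sketch, using made-up numbers rather than anything from the meta-analysis:

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference: (mean_a - mean_b) / pooled SD."""
    na, nb = len(group_a), len(group_b)
    ma, mb = statistics.mean(group_a), statistics.mean(group_b)
    va, vb = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (ma - mb) / pooled_sd

# Hypothetical free-choice minutes spent on the activity after payment ends:
previously_paid = [4, 5, 3, 6, 4, 5]
never_paid = [5, 6, 4, 7, 5, 6]
print(round(cohens_d(previously_paid, never_paid), 2))  # negative: paid group did less
```

On this scale, d = -.24 means the previously paid groups ended up about a quarter of a standard deviation below the never-paid controls.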

As the figure illustrates, the experiments vary a lot, and so do the results. Verbal rewards (i.e., praise) have positive effects on subsequent motivation (the opposite finding), at least for college students. The negative effects are driven by tangible rewards (e.g., money) in situations where people are paid conditional on something: trying the activity, completing the activity, or achieving a particular level of performance in the activity.

What does this mean?  The proposed theory centers on feelings of autonomy: people do things in part to feel good about having done them on their own. When someone else comes in and provides a conditional reward, it eliminates the ability of the activity to provide that autonomy benefit.  And here’s the key: this is assumed to be a long-term change in how the activity is perceived and experienced.  As a result, there’s a risk to using incentives.  As they warn in the paper:

“if people use tangible [i.e., monetary] rewards, it is necessary that they be extremely careful… about the intrinsic motivation and task persistence of the people they are rewarding” (p. 656)

I’ll look more closely at this implication this week, including our new data.

The Costs of Misestimating Time

Indranil Goswami and I have a new working paper up.  We were interested in the consequences of misestimating time, and looked at contract choices.  We have some participants serve as workers, doing a task like solving jigsaw puzzles.  The workers are either paid a flat fee or paid for the time they spend on the task. We have other participants serve as managers, making choices between hiring a worker with a fixed-fee contract or a per-time contract.

We find that the “managers” generally prefer the fixed-fee contracts, even though the per-time contracts are actually more profitable. Prior research has also found a “flat-rate bias” in contexts such as gym memberships and service contracts. Years ago, before I knew about any of this research, I stumbled across it in the results of a conjoint marketing research study I worked on for a telecommunications company. This puzzled all of us at the marketing research firm, accustomed as we were to thinking of consumers as simply hunting for the best deal.

One explanation is that flat rate deals provide a kind of insurance — even if they are more expensive on average, they eliminate a costly worst-case scenario. A more recently proposed explanation is that people don’t like having to feel like each additional bit of consumption is costing them. When you’re texting your friends, you want to just enjoy texting your friends, not do a cost-benefit analysis of whether each text is actually worth the cost.

In our context, the culprit turns out to be different: misestimation of how long the workers will take. We find that the managers choose the flat fee mainly when their own time estimates suggest that the flat fee would be a better deal. It doesn’t seem to be about insuring against the worst-case scenario of an expensive, slow worker: when given a choice between a certain amount and a gamble constructed to be equivalent to their contract choice, managers are much less interested in the certain option.

The best evidence that it’s about misestimating workers’ time comes from time limits.  We give the workers either a short time limit or a long time limit to complete the task. The contracts are set up in such a way that the per-time contract is a better deal than the flat fee in both cases, but the advantage of the per-time contract is even stronger when the time limit is longer.

So, based on the incentives alone, our “managers” should be less likely to choose the flat fee contract under the long time limit. But instead, more of our managers choose the flat fee contract under the long time limit. Why? Because under the long time limit, they also overestimate how long the workers will take, to a greater degree than under the short time limit.  This turns out to be a very robust finding, observed with different kinds of tasks and among participants with management experience.
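To see how misestimation flips the choice, here is a stylized numerical example. All of the dollar amounts and times are mine, purely for illustration; the actual contract parameters in our studies differ.

```python
# Stylized contract comparison: a per-time contract that is objectively
# cheaper can look more expensive to a manager who overestimates work time.

FLAT_FEE = 3.00        # dollars, paid regardless of how long the worker takes
RATE_PER_MIN = 0.25    # dollars per minute on the per-time contract

def per_time_cost(minutes):
    return RATE_PER_MIN * minutes

true_minutes = 8.0         # what workers actually take, on average
estimated_minutes = 14.0   # a manager's inflated estimate under a long time limit

print(per_time_cost(true_minutes))       # 2.0 -> per-time is objectively cheaper
print(per_time_cost(estimated_minutes))  # 3.5 -> but looks pricier than the $3 flat fee
# A manager who relies on the inflated estimate picks the flat fee and
# overpays, precisely when the per-time contract is the better deal.
```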

I think this may hint at something broader about the eternal battle between “carrot” and “stick” philosophies. Managers often have strong views about whether they should be creating a hospitable environment in which workers can unleash their creativity and productivity, or a tightly controlled environment that prevents overspending and inefficiency, recognizing that the two are at least somewhat incompatible.  Those views may often be based far more on personal philosophy than on careful calibration to the optimal strategy in a given setting.


Teaching resource using open data

I previously mentioned Christopher Madan’s article on open data in teaching.  He points to the Open Stats Lab at Trinity University, which I hadn’t heard about before. They create statistics exercises, suitable for an undergrad statistics or psychological research methods course, based on open data from papers published in Psychological Science. Each exercise includes the original paper, a dataset, and a brief (1-2 page) description of the analysis to be done.  They have multiple examples for each topic, including significance testing, correlation, regression and ANOVA.

It’s very nicely done, and the contexts of the analyses (i.e. the research questions from the original papers) are interesting, which I think helps make it engaging for students. There’s flexibility in the exercises for the students to think about which variables to use, and to formulate their own interpretation. My one quibble is that the writeups seem aimed at the student, and I think it might be useful to have a separate document with a few pointers (e.g., a “teaching note”) for instructors. In particular, if I were using one of the correlation exercises (particularly the example on Ebola and voting), I would want to have a thorough discussion in class of what we can and cannot conclude about causation (including how the original paper tried to address causation) and how to think about alternative explanations for the observed correlation.

Psychological Science, the source of the data, has a voluntary data disclosure policy, offering “badges” to published articles that make their data public.  This is a relatively gentle “nudge” towards open data, but it seems to be having an effect: nearly a quarter of papers published after the policy was instituted made their data public, compared to 3% before. If the Open Stats Lab’s efforts (and others like it) are successful, they provide authors a nice additional incentive to make their data public. After all, who wouldn’t want their research to become a “textbook example” used in statistics classes?


The value of open data for teaching

Christopher Madan has a nice article on the usefulness of open data (i.e., making public the underlying data for a research article) for developing teaching materials. In business schools, MBA teaching relies heavily on cases, particularly Harvard Business Review cases. Since I started teaching, I’ve been puzzled that almost none of the cases provide data that can be statistically analyzed.  Of the few that do, as far as I know, the data are simulated, or at least don’t claim to be the actual data. This seems like an odd way to teach students about the increasingly analytics-based practice of making business decisions.

For the past few months, I’ve been developing an MBA course on experimental methods (think online A/B testing, test markets, in-store stocking experiments, direct-mail tests, advertising and communications experiments, etc.). After not finding suitable cases, we started writing cases based on published field (not lab) experiments.  The catch was that I wanted field experiments with publicly available data, so that the students could go through the data analysis process themselves, using real data.

I’ve found some very nice examples to develop into cases, but I was surprised at how difficult it was.  Even in research journals that require (or at least strongly encourage) making data public, the data for many papers were restricted or completely unavailable because they were proprietary. Often this stems from conditions companies impose before they will share data with researchers. I’m sensitive to companies’ concerns about how their data might be used if made public, of course. But the benefits of open data (and the value of papers that make their data public) go far beyond just checking up on the authors’ analyses.


A textbook case: Definitely true. Maybe.

Yesterday, I discussed a recent paper which looked at whether introductory psychology textbooks promote or correct “myths” about psychology.

A few years back, I came across an interesting book published in 2009 specifically about correcting myths in psychology: 50 Great Myths of Popular Psychology, by Lilienfeld et al. They discuss misconceptions about how much of our brains we use, hypnosis, polygraphs, positive attitudes and disease, self-esteem, criminal profiling, expert judgment and so on. While some researchers are sure to disagree with their characterizations, at a minimum their critiques suggest substantial caution in accepting the claims as facts.

At the end of the book, in a postscript, they list 13 findings that they characterize as “Difficult to Believe, But True” (pp. 248-250).

The irony?  One of the findings they list as true is implicit egotism, the theory that people are more likely to select options in major life choices (including professions, locations and spouses) that have names similar to their own (i.e., “Dennis the Dentist”). However, Uri Simonsohn’s 2011 paper, “Spurious? Name similarity effect (implicit egotism) in marriage, job and moving decisions,” found much of this evidence to be replicable, but accounted for by other explanations.

Another on their list of true findings? That holding a warm object makes people feel warmer toward others, a finding which other researchers recently have not been able to replicate.

This is not to say that implicit egotism or social-warmth priming are now known to be false. Perhaps subsequent research will revise the currently negative prognoses of these effects.  But it is telling, I think, that even in a book about skepticism towards psychological theories, at least two of thirteen findings were oversold as being known to be true.

There has been a lot of discussion about how to change the publication process to try to make individual papers’ findings more reliable. This is definitely important.  But perhaps it’s just as important that we lower our expectations about what any one paper will achieve.  In most cases, we simply can’t adopt the conclusions of a paper until they have been subjected to critical debate and replication attempts, direct and conceptual, especially by those skeptical of the theory. The less enthusiastically a field supports such debate, the longer it will take until we can reliably consider a finding well-established. Until then, every finding is definitely true. But only maybe.

A textbook case.

A recent paper surveys coverage of famous conclusions and examples in psychology that are the topic of active debate, or that are outright incorrect.

Perhaps the most famous case they discuss is the Kitty Genovese bystander effect example. This 1964 murder case was once considered a classic example of the bystander effect, people in groups not taking action because they assume someone else will or should take action. However, the truth turns out to be more complicated, with fewer witnesses than assumed, neglected calls to the police and some questionable journalism.

The paper documents inaccurate coverage of some of these topics in introductory textbooks, particularly media violence, stereotype threat and the bystander effect example. They discuss these inaccuracies in terms of the desires to support favored views (e.g., an ideological bias) or a preference for simplicity and conclusiveness which would present psychological progress in a positive light.

“Aside from this, textbooks had difficulty covering controversial areas of research carefully, often not noting scholarly debate or divergent evidence where it existed. ... The errors on these issues were universally in the direction of presenting controversial research or scientific urban legends as more consistent or factual than they are.”

Much of social science is inherently noisy, and even the most reliable findings are usually multiply determined. The impatient response to that is to downplay what we don’t know, sweep complexity and uncertainty under the rug, and prematurely declare hypotheses to be established theories and established theories to be scientific laws.

Perhaps it would make for unsatisfying textbooks if we paid more attention to what we don’t know and discussed the controversies in the literature. But one of the benefits that students could get from studying psychology is an accurate understanding of human behavior as complex and difficult to predict.