If you want to see where a little bit of your $833 billion stimulus went, head south from St. Louis on Interstate 44 until you reach the Mark Twain National Forest. On March 13, 2009, less than a month after President Barack Obama signed the American Recovery and Reinvestment Act (ARRA) into law, the federal government awarded $462,912.30 to a Spokane, Washington, construction firm called CXT Incorporated to build and install 22 “precast concrete toilets” in the forest. 

These bunker-style commodes did not add to the number of bathrooms in the forest; they replaced existing toilets that didn’t meet Forest Service condition standards or accessibility requirements. And they were not just isolated outhouses. New Mexico got $2.8 million to spend on new toilets in its national parks. Another $42 million went to upgrading toilets and other sanitation facilities in Alaska.

The stimulus wasn’t sold as a plan to build bathrooms. “We’ll put people back to work rebuilding our crumbling roads and bridges, modernizing schools that are failing our children, and building wind farms and solar panels, fuel-efficient cars and the alternative energy technologies that can free us from our dependence on foreign oil and keep our economy competitive in the years ahead,” President-elect Obama said in a November 2008 address. The stimulus, Obama vowed, would “put people back to work and get our economy moving again,” creating between 2 million and 2.5 million jobs. Instead, the economy followed the money right down the drain. 

What went wrong? Plenty. The stimulus was rushed to passage based on economic assumptions that remain hotly contested. Its implementation was marred by politics, logistics, and red tape. And the aid it directed toward the country’s least well off may have undermined the very recovery it was designed to hasten. This is what happens when politicians insist that something big must be done, even if they’re not sure what that something should be.

The Rush to Stimulus

The march to the stimulus began on the 2008 presidential campaign trail. “Instead of doing nothing for out-of-work Americans,” Obama said in April 2008, “we need a second stimulus that extends unemployment insurance and helps communities that have been hit hard by this recession.” Obama framed his call for stimulus as a follow-up to the $152 billion tax rebate George W. Bush signed into law in February 2008. That plan cut most Americans a $600 check. 

Candidate Obama called for something more proactive: Washington-directed, socially conscious spending on education, alternative energy, and transit projects would replace the usual Republican prescription of tax breaks only, allowing government to “grow the middle class by investing in millions of new green jobs and rebuilding our crumbling infrastructure.” It was more a grab bag of longstanding liberal wish list items than a focused spending injection tied to a specific economic theory. 

Soon after Obama won the presidential election in November 2008, his advisers spent a day walking him through the ugly economic realities of the recession. One of the presentations came from Christina Romer, soon to be the head of the president’s Council of Economic Advisers. As Michael Grabell reports in his book Money Well Spent?: The Truth Behind the Trillion Dollar Stimulus, the Biggest Economic Recovery Plan in History, which this article draws upon substantially, Romer warned the president-elect of a chilling possibility: that America’s economy would plateau but struggle through a decade of weak growth, much like Japan. It was a warning that would prove unintentionally prophetic. 

Obama indicated he was willing to be flexible regarding the details of a stimulus plan, but he made one thing clear. “What is not negotiable,” he said, “is the need for immediate action.” Romer took the lead on designing the plan. 

Her recommendations had goals similar to those of Bush’s tax rebates: Boost output by injecting money into the economy to stimulate consumer demand, and therefore jobs and growth. The hope was to create a “multiplier effect,” in which each dollar of stimulus creates more than one dollar of economic activity through a virtuous feedback loop. 

But in addition to having the government rather than consumers spend most of the money, Romer’s plan differed strongly from Bush’s in one key respect: scale. It was several times larger than any stimulus proposed in 2008 by any prominent politician. On the campaign trail, Obama’s Democratic rival Hillary Clinton had proposed a $30 billion stimulus. Then-House Speaker Nancy Pelosi (D-Calif.) put together a $150 billion proposal. Some economists pegged the necessary amount closer to $300 billion, an estimate of how much it would take to get the economy back to its full potential, a figure referred to as the “output gap.” 

But Romer estimated that the amount needed to fill the gap was well north of $1 trillion. She pushed for a multiyear plan that would include a combination of infrastructure spending, increased funding for transfers such as Medicaid and unemployment benefits, bailout money for budget-hammered state governments, and tax breaks that would trickle out paycheck by paycheck over several years. 

Political considerations eventually knocked the price tag down to about $787 billion (a figure later revised upward to $833 billion), but it was still the largest and most ambitious recovery plan in the history of the world, putting the country’s already extended finances much more deeply in the red. Yet not only did the president’s economists not know if it would work; they might never be able to judge whether it had. 

That didn’t stop them from predicting success. In January 2009, Romer and Jared Bernstein, who would go on to be Vice President Joseph Biden’s top economic adviser, projected that without the stimulus unemployment would hit 9 percent and stay there for nearly a year, but that with a recovery plan unemployment would peak at 8 percent and drop below 7 percent within a year. Of the new jobs created, 90 percent would be in the private sector. Those projections were quickly revealed as fantastically optimistic: The unemployment rate would climb to 10 percent in October 2009 and hover near that level for another year, while millions of people simply stopped looking for work. 

Stimulus supporters still deem ARRA a success in forestalling another depression. But you can’t claim success unless you can measure it. And when it comes to massive economic interventions like the stimulus, that’s exceedingly difficult to do. 

Multiplier Madness

The key metric in assessing whether the stimulus was a success is the multiplier. Most would consider a multiplier of 2.0, meaning $1 of stimulus generates $2 in economic activity, to be a utilitarian triumph. A multiplier of 0.5, wherein $1 of stimulus leads to only 50 cents in economic activity, would be a bum deal. 

But multipliers are very difficult to gauge. Romer and Bernstein’s memo confessed to “substantial uncertainty” about both their choice of multipliers and (in a footnote) the number of jobs created by increased GDP. While researchers can easily determine what happens after governments make big changes to their spending patterns, pinning down cause and effect is trickier than it might seem. How much of the economy’s performance can be chalked up to its pre-stimulus trajectory, which turned out to be far worse in late 2008 and 2009 than economists believed at the time?

What you really want to know is what would have happened if there had been no government purchases at all. But macroeconomists cannot conduct the sort of controlled experiments that their counterparts in the world of microeconomics do all the time. 

“Ideally,” says Valerie Ramey, a University of California at San Diego economist who has surveyed numerous multiplier studies as well as performed her own research, “the International Monetary Fund would be allowed to go out and conduct a randomized experiment across all the countries, randomly raising government spending in some countries and randomly decreasing it in others. If you did that and watched it for several years, then you could use very simple statistics to try to figure out exactly the answers we’re looking for. Of course the IMF is not allowed to do that.” 

Fiscal policy researchers have devised a number of workarounds. Some have looked at how state economies perform when they get extra infusions of federal money. But it’s hard to extrapolate national information from state numbers. “The statewide multiplier is only loosely linked to the aggregate,” says Ramey. “And the aggregate, economy-wide effects are really what we want to know for stimulus.”

Stimulus packages typically come in response to recessions, but economists need to isolate the effects of government purchases from the effects of economic slumps. So some look for changes in spending levels that are not direct responses to bad economies—for instance, big military build-ups. 

This approach works, Ramey says, only “if you don’t worry about anticipation.” After all, people don’t merely react to the fact that the government has spent money. They also react to the expectation that the government is going to spend. “Individuals understand that even if the government is deficit spending now, it’s going to have to raise taxes later,” she says. As a result, people feel poorer and therefore act differently. 

Several years ago, Ramey attempted to measure the effect of anticipation. But how do you figure out what people expect from their government? “You can’t believe government documents,” she says, “so what I did was look at what business people were forecasting.” Ramey read every issue of Business Week starting from 1939, as well as multiple newspapers and other popular sources of information. The result? 

As soon as spending news is made public, Ramey discovered, consumption declines, and so do real wages, something that some prior attempts to measure the multiplier had missed. Essentially, the other measurements had skipped the pregame—and missed the economic reactions that started before the spending took place. 

After including the anticipation variable in her calculations, Ramey found that the multiplier for government purchases was somewhere between 0.6 and 1.1, meaning that at most each dollar of government purchases produced an extra dime of economic activity, while the worst-case scenario meant losing 40 cents on the dollar. (Elsewhere she has estimated the possible high end at 1.2.) Ramey says she suspects the multiplier might be higher if financed “purely with deficits.” But she also notes that “trying to estimate that effect precisely is very difficult.”
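The dime-and-40-cents arithmetic is easy to check. What follows is only a sketch, and the function name is invented for illustration: it treats the multiplier as total economic activity per dollar of government purchases, so the extra activity beyond the government's own dollar is the multiplier minus one. Decimal values are used so the cents come out exactly.

```python
from decimal import Decimal


def extra_activity_per_dollar(multiplier: str) -> Decimal:
    """Activity generated beyond the one dollar the government itself spent.

    Illustrative helper, not taken from Ramey's work: a negative result
    means each dollar of purchases yields less than a dollar of activity.
    """
    return Decimal(multiplier) - Decimal("1")


# Low and high ends of the range quoted above, plus the 2.0 "triumph" case.
for m in ("0.6", "1.1", "2.0"):
    print(f"multiplier {m}: {extra_activity_per_dollar(m):+} dollars per dollar spent")
```

At 0.6 the result is a loss of 40 cents on the dollar; at 1.1 it is a gain of a dime.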

That hasn’t prevented Ramey from drawing some conclusions, which she summed up in a 2012 presentation: The government purchases multiplier is “probably” less than one, she said. And while government purchasing increases overall employment, it does so by increasing government employment—not hiring in the private sector. Ramey’s work has been convincing enough that in 2011 the Congressional Budget Office—the nonpartisan government scorekeeper that provides policy cost estimates for Congress—reworked its estimates of ARRA’s potential effects, reducing the lower end of its estimates.

Other economists have arrived at different multipliers. Surveying the literature, Ramey found that most estimates range between 0.8 and 1.5, and that the data could support conclusions ranging from 0.5 to 2.0. She also found that the variation across studies was nearly as large as the variation within studies. The wide range only underscores how difficult it is to determine a single stimulus multiplier with any degree of confidence.

It also suggests the limits of even a relatively high multiplier. Two dollars of output for one dollar of government purchases may sound like a pretty good deal. But even at double your money—and it is your money—federally funded stimulus still isn’t likely to pay for itself.

Why? Because the government has two basic sources of revenue: taxes and borrowing. Paying for a $1 trillion stimulus through tax receipts would require more than $2 trillion in economic activity, because the government does not collect 50 cents on every dollar of GDP. And since virtually all of the $1 trillion is effectively borrowed, the cost of debt service makes the math even less likely to work out.
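A rough sketch makes the break-even logic concrete. It assumes, purely for illustration, that the government recovers a fixed share of any added economic activity as tax revenue, and it ignores debt service entirely. The 20 percent share is a hypothetical placeholder, not a figure from this article, and the function names are invented.

```python
def activity_needed_to_repay(cost: float, tax_share: float) -> float:
    """Extra economic activity needed for new tax receipts to equal the cost."""
    if not 0 < tax_share <= 1:
        raise ValueError("tax_share must be greater than 0 and at most 1")
    return cost / tax_share


def breakeven_multiplier(tax_share: float) -> float:
    """Multiplier at which a stimulus would pay for itself through taxes alone."""
    return 1 / tax_share


STIMULUS = 1_000_000_000_000   # the $1 trillion example from the text
ASSUMED_TAX_SHARE = 0.20       # hypothetical share of GDP collected as taxes

# If the government did collect 50 cents of every dollar, $2 trillion would do.
print(activity_needed_to_repay(STIMULUS, 0.50))
# At the lower, assumed share, far more activity is needed.
print(activity_needed_to_repay(STIMULUS, ASSUMED_TAX_SHARE))
print(breakeven_multiplier(ASSUMED_TAX_SHARE))
```

Under that assumed share, the break-even multiplier is well above the 2.0 at the top of the range Ramey surveyed, and adding interest costs would push it higher still.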

There was much discussion of multipliers in the run-up to ARRA, and the subject has provided fodder for squabbles among economists ever since. But once Congress passed the stimulus package, there was a new policy question: Exactly what should the administration spend $833 billion on? 

Making It Work or Make Work?

Concrete toilets were only the beginning. A 2010 report from Sen. Tom Coburn (R-Okla.) offered a tour of stimulus spending absurdities. Lowlights included $760,000 for interactive dance software, $1.9 million for international ant research, $550,000 to replace windows at a Forest Service visitor center that was closed, $16 million to help airplane manufacturer Boeing clean up an old environmental mess it had made, $700,000 for behavioral research into how monkeys respond to inequity, and $194,000 to study voter perceptions…of the stimulus package. 

Coburn’s report demonstrated that many stimulus projects were easy to ridicule. But there was a deeper problem: ARRA was sold as a way to create millions of jobs right away. Yet it turned out to be surprisingly hard to find projects that were planned and ready to go—and that would actually hire the unemployed. 

In 2011 Garett Jones, an economist at George Mason University’s Mercatus Center, and his colleague Daniel Rothschild collected questionnaires from more than 1,300 businesses that had received stimulus funding. The idea was to find out how the businesses actually used the money they got. Did they hire? Who did they hire? What sort of projects did they work on? 

Jones and Rothschild found that the stimulus did spur some hiring. But there’s a big difference between hiring and hiring the unemployed. A business can add 10 people to its staff but directly reduce unemployment by only two if eight of the hires already had jobs. As it turns out, nearly half the new employees reported in the survey as paid with stimulus funds were previously employed. 
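The distinction can be put in a few lines of arithmetic. This is a simplified sketch with an invented function name; it counts only the direct effect and ignores whether the jobs that poached workers left behind were later refilled.

```python
def direct_unemployment_reduction(total_hires: int, previously_employed: int) -> int:
    """Hires who came off the unemployment rolls, as opposed to from other jobs."""
    if not 0 <= previously_employed <= total_hires:
        raise ValueError("previously_employed must be between 0 and total_hires")
    return total_hires - previously_employed


# The example from the text: 10 new hires, 8 of whom already had jobs.
print(direct_unemployment_reduction(10, 8))
```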

Low-skilled and unskilled workers were among the hardest hit by the recession. But little effort was made to target those workers for stimulus-funded work. Indeed, many of the projects that made the cut required highly educated employees. In Money Well Spent?, Grabell reports that projects funded by the stimulus included computerized health records, electricity grid upgrades, education databases, climate change studies, environmental cleanup, the purchase of solar panels and lithium ion batteries, anti-smoking campaigns, carbon capture and storage facilities, and prescription drug research. 

“The government targeted its stimulus at sectors of the economy where it was hard to find good workers,” Jones tells me. “The reason why is that it was hiring for highly specialized parts of the economy. That’s why there was less job creation than you would have expected.”

Even some stimulus success stories come out looking worse when judged by their effect on unemployment. In addition to the written survey, Jones and Rothschild conducted detailed interviews with 85 businesses that received stimulus funds. The owner of one construction engineering firm said that if not for ARRA, his firm would have closed. Instead, it was thriving, with 20 new employees. But only six of the new hires were previously unemployed. The rest had been hired away from other firms. 

In such games of employment musical chairs, stimulus might simply increase wages for those who already have jobs. Where unemployment is already low, government spending will just bid up the price of labor. “That’s a broader lesson for stimulus nowadays,” says Jones. “A lot of the things that the government would like to do involve hiring in sectors of the economy where it’s really hard to find good help on short notice. There just aren’t that many unemployed environmental engineers around. The best person for the job probably already has one.” 

Just as difficult as finding good workers was navigating the law’s bureaucratic requirements. A “Buy American” provision included in the stimulus legislation at the behest of unions and steel workers made projects more expensive and harder to get off the ground. Grabell quotes the owner of a Texas pipefitting company who said in a U.S. Chamber of Commerce report that the provision was “a paperwork nightmare…causing huge delays…and stalling otherwise viable projects.” 

The stimulus overseers faced a tradeoff: They could try to spend money fast, or they could try to spend it well. But it was difficult to manage both. That was a political problem for the White House, because it had promised to be vigilant in fighting ARRA waste. “The president and I can’t stop you from doing some things, but I’ll show up in your city and say, ‘This is a stupid idea,’ ” declared Vice President Biden, the administration’s designated stimulus watchdog, at a March 2009 conference. Yet Obama administration officials had also insisted that the public would be able to see the benefits of the stimulus immediately. And if they didn’t spend the money, there would be no benefits to see. 

So what did they spend it on? Jones and Rothschild offer a hint. The pair interviewed one contractor with two decades of experience laying tile in government buildings. He made plans to install standard blocks of four-inch white tiles—the same tiles he usually installed, the same tiles found in other parts of the same office complex, and the exact materials called for in the architectural plans. Then he got updated specs. The large white tiles were out. Tiny, colored tiles that needed to be laid in an unusually intricate pattern were in. Did it matter that the smaller tiles would cost the government 50 percent more than the larger white tiles? Not at all. In fact, the higher cost may have been the point. The tile layer told Rothschild’s interview team that “the only reason he could see for using the smaller tiles was to move the money out the door on the ARRA schedule.”

Was this sort of thing a worthwhile contribution to the health of a nosediving economy? It’s a question the administration never truly answered—because it couldn’t. 

Numbers Games 

“According to the non-partisan Congressional Budget Office,” says Recovery.gov, the Obama administration’s stimulus website, “the Recovery Act supported as many as 3.5 million jobs across the country.” As the stimulus ran its course over roughly three years, the capital’s top newspapers kept printing similar, supportive-sounding figures from the budget office. “CBO Says Stimulus May Have Added 3.3 Million Jobs,” a Washington Post headline trumpeted in 2010. “CBO: Stimulus Added Up to 3.3 million Jobs,” declared a Politico headline in 2011. Senate Democrats touted the estimates as proof of ARRA’s success. So did the vice president.

When the stimulus passed, the White House promised more than just results; it promised accountability. “If the verdict on this effort is that we’ve wasted the money, we built things that were unnecessary, or we’ve done things that are legal but make no sense, then, folks, don’t look for any help from the federal government for a long while,” Biden told local government officials at a conference on stimulus spending in March 2009. The administration set up Recovery.gov to help collect data, track stimulus spending, and see how many jobs it created.

ARRA also tasked the CBO with issuing quarterly reports estimating both the size of the boost the stimulus had given to the economy and the number of jobs it had created or saved. When the CBO put together its estimates, it ran into the same trouble that every economist attempting to measure the impact of government spending eventually faces: the lack of a counterfactual. There was no way around it. In a November 2009 report, the agency flatly declared that “it is impossible to determine how many of the reported jobs would have existed in the absence of the stimulus package.”

Impossible, yes, but the CBO was still required by statute to produce a progress report every quarter. So rather than attempt to measure the law’s output (the actual number of real-world jobs created or saved), the CBO measured inputs: how much money was spent and on what. Once the CBO knew how much money had been spent, it combined that number with an estimate of the multiplier. Because it relied on a range of multipliers that went relatively high, up to 2.5 for federal purchases, the results were quarterly reiterations of estimates the agency made before the law was even passed. (The White House Council of Economic Advisers also made estimates using a similar technique.)
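In outline, the input-based approach looks something like the sketch below. This is not the CBO's actual model. The function name, the $100 billion spending figure, and the 1.0 low-end multiplier are all hypothetical placeholders; only the 2.5 high end comes from the account above.

```python
def estimated_output_boost(spending: float,
                           low_multiplier: float,
                           high_multiplier: float) -> tuple[float, float]:
    """Range of estimated economic boost, computed from inputs alone.

    Nothing observed after the spending (actual jobs, actual GDP) enters
    the calculation, so the result cannot confirm or refute the forecast.
    """
    return spending * low_multiplier, spending * high_multiplier


SPENT_SO_FAR = 100e9    # hypothetical dollars spent to date
LOW_MULTIPLIER = 1.0    # hypothetical low end, for illustration only
HIGH_MULTIPLIER = 2.5   # high end cited for federal purchases

low, high = estimated_output_boost(SPENT_SO_FAR, LOW_MULTIPLIER, HIGH_MULTIPLIER)
print(f"estimated boost: ${low / 1e9:.0f} billion to ${high / 1e9:.0f} billion")
```

Whatever happened in the real economy, the same spending and the same assumed multipliers return the same range.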

The CBO estimated that the stimulus created or saved up to 3.6 million jobs. But CBO Director Douglas Elmendorf has also noted that if the real-world results were different (if the law created 5 million jobs, or if it created none at all), the agency wouldn’t know. At a March 2010 presentation, Elmendorf characterized the CBO’s follow-up reports as “repeating the same exercises we did rather than an independent check.” At the same event, Elmendorf was asked, “If the stimulus bill did not do what it was originally forecast to do, then that would not have been detected by the subsequent analysis?” His response: “That’s right. That’s right.”

The Wrong Incentives

So what did the stimulus actually do? To a great extent, the answer to that question is a matter of faith. Those who believe that the multiplier is high think that the stimulus created a significant amount of economic activity and jobs to go with it. Those who believe the multiplier is low generally conclude the opposite. One thing is clear, however: The economy’s performance continues to be far worse than the White House’s worst-case projections for what might happen if there had been no stimulus at all. Beyond that, anyone who claims to know with certainty how many jobs the stimulus did or didn’t create is just bluffing. 

Yet while it’s difficult to determine ARRA’s effect on the overall economy, it is possible to examine the economic incentives it created. That’s exactly what University of Chicago economist Casey Mulligan has tried to do. 

Mulligan’s work starts with a simple insight: Not all of the spending authorized by ARRA was stimulus, at least not as economists usually define it. Some of the program’s $833 billion went toward visible public works projects like paving runways and building toilets. But about a third of it went toward what economists call “transfer spending,” which includes aid programs for the unemployed and other low-income individuals. Among other things, the stimulus extended an existing program to provide federal unemployment benefits to those who had exhausted state benefit programs. It added a temporary bonus of $25 per week to federal unemployment benefits, suspended taxation on the first $2,400 of unemployment benefits in a given year, allocated $87 billion to increase the federal share of Medicaid benefits, and provided another $25 billion to subsidize health coverage for the newly unemployed. 

Such programs provided an economic cushion to millions of out-of-work people at the height of the recession. They also made it less painful to be unemployed—which is to say, they made it less costly. “When you make something less painful,” says Mulligan, “people are going to do less to avoid it.” 

What Mulligan found is that the 2009 stimulus created big incentives for people to not work. He estimates that between 2 million and 3 million people “had as much disposable income while unemployed as they would have by accepting a job that paid 80 to 100 percent” of what they were making in their previous job. Before the stimulus passed, that would have been true of fewer than 1 million people. 

It’s clear that the incentives changed. But did people respond by behaving differently? Mulligan thinks at least some did. “The effect of incentives on unemployment is something we” (we being economists) “have studied for a long time,” he says. “And the effect is clear. When people are paid more for not working, they work less.” Even, he says, during a recession.

Individuals receiving unemployment benefits return to work less quickly than those who aren’t getting benefits, Mulligan says. He also points to his own research showing that, even since 2007, unmarried people stay out of work much longer than married people. The reason, he argues, has to do with “the fact that there’s a bunch of safety net programs that married people don’t qualify for. They’re getting less help from the government when they lose their jobs. And so they’re quicker to get back to work.” 

Getting people back to work was, of course, the whole point of the stimulus. In January 2009, just days before his first inaugural, Obama put pressure on federal lawmakers. “I urge Congress to move as quickly as possible on behalf of the American people,” he said. “For every day we wait or point fingers or drag our feet, more Americans will lose their jobs. More families will lose their savings. More dreams will be deferred and denied. And our nation will sink deeper into a crisis that, at some point, we may not be able to reverse.” 

But four years later, the unemployment rate sits at 7.9 percent. The work force participation rate is lower than at any time since 1981. In the fourth quarter of 2012, the economy grew by just 0.1 percent. Overall growth in 2012 was just 1.5 percent. The CBO now predicts slow growth in the coming year and an economy that will remain below its potential until 2017. Sure, the Mark Twain National Forest has new toilets, and there are repaved runways and roads aplenty. But the looming worry is that we may have hit just the sort of decade-long weak recovery that the Obama administration was seeking to avoid. Perhaps it would have been better to do nothing at all.