Why the Polling for Clinton Versus Sanders in New York is Mostly Garbage

Say a wedding ring, a regularly used set of keys, or some other item of more-than-marginal value has found its way to the bottom of the kitchen garbage. There really isn’t much choice but to plug your nose as best you can, plunge your hand into the gross, and sift around for the items that matter. As in Michigan, the polls showing Hillary Clinton with a huge 10-20 point lead over Bernie Sanders are sloppy, slimy, and smell like a month’s worth of leftovers finally rubbished.

Previously, I’ve published a scorecard showing just how ridiculously bad most pollsters and forecasters have been this cycle, especially outside the South. The polling in New York, as is so often the case, is prone to 1) radically underestimating likely voter turnout among 18-44-year-olds, especially those in the 18-29-year-old demographic, 2) pretty clearly misreading the electorate outside major urban centers (in this case New York City and its suburbs), and 3) sometimes making rather glaring errors that almost no one of any influence is willing to call out.

All nine pollsters active in New York on the Democratic side have gone badly wrong in one or more of these three areas. Rather than sorting through all of the ugliness in front of you, let’s drag out representative examples from each area.

Bad Age Splits
I’ve previously discussed the infamous Emerson poll from several weeks ago showing Sanders down 48 points in New York. Things are somewhat better in more recent polling, but still clearly wide of the mark. My pollster scorecard suggests that in-state or in-region university pollsters have fared much better than commercial outfits. Siena College has a good rating from FiveThirtyEight for previous cycles and has released one of the more Sanders-friendly polls. It shows him down just ten points.

But what are the age splits for the Siena poll?

15% 18-34-year-olds
31% 35-54-year-olds
50% 55+

The problem is this: those numbers are far lower for millennials than the 18-34-year-old vote was in New York in the 2008 Obama-versus-Clinton primary. In virtually every contest to date in 2016, the share of young people voting in Sanders versus Clinton has increased measurably and dramatically over 2008. (That means, by definition, that the share of older voters has decreased.) In 2008, 18-34-year-olds were 22-23% of the Democratic electorate in New York, versus just 15% in Siena’s offering. While 18-44-year-olds make up only around 30-31% of the Siena poll, they made up 37% of the electorate in Clinton versus Obama 2008.

By comparison, even in other closed primaries or places where Sanders has done quite badly, young people have turned out in droves. The 18-29-year-old share of the vote in Florida rose from just 9% in 2008 to 15% in 2016, more than 30% higher than where Siena pegs it for New York. In Ohio, where the biggest schools were on Spring Break for the March 15 election, 18-29-year-olds still made up 15% of the vote. The Siena numbers suggest they’ll be around 10% in New York. Re-weighting Siena’s topline for age splits that look like Ohio’s makes for a race too close to call, with a small Clinton edge. If the age splits look more like Wisconsin’s or Michigan’s, Clinton is in real danger of a significant loss. My final pick will likely split the difference between the Ohio and Michigan numbers, similar to Illinois.
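For readers who want to see the mechanics, here is a minimal Python sketch of what re-weighting a topline by age means: each age group keeps its candidate margin, and only the groups’ shares of the electorate change. The 15/31/50 shares are Siena’s as cited above (they sum to 96%, so the sketch normalizes them). The per-group Clinton-minus-Sanders margins are hypothetical placeholders, not Siena’s actual crosstabs, and the younger mix is also an assumption: it simply raises the 18-34 share to 23%, near the 2008 New York figure, at the expense of the 55+ group.

```python
# Hypothetical illustration of re-weighting a poll topline by age group.
# Only the "siena_shares" values come from the article; the per-group
# margins and the "younger_shares" mix are HYPOTHETICAL placeholders.

def topline_margin(shares: dict[str, float], margins: dict[str, float]) -> float:
    """Weighted average of group margins, with shares normalized to sum to 1."""
    total = sum(shares.values())
    return sum(shares[g] / total * margins[g] for g in shares)

# Clinton-minus-Sanders margin within each age group, in points (hypothetical).
group_margins = {"18-34": -40.0, "35-54": 5.0, "55+": 30.0}

# Siena's reported age mix (sums to 96, normalized inside topline_margin).
siena_shares = {"18-34": 15, "35-54": 31, "55+": 50}

# Hypothetical younger electorate: 18-34 near the 2008 level, taken from 55+.
younger_shares = {"18-34": 23, "35-54": 31, "55+": 42}

print(round(topline_margin(siena_shares, group_margins), 1))    # 11.0
print(round(topline_margin(younger_shares, group_margins), 1))  # 5.2
```

With these made-up margins, moving eight points of the electorate from 55+ to 18-34 cuts Clinton’s lead roughly in half; the real effect depends on Siena’s actual group margins.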

Misreading Voting Intentions Outside Major Urban Centers

Then there are the geographical issues. My friend Stuart Parker pointed out early on that pollsters were all over the place this cycle because they seem unable to read voter intentions outside major urban centers, both in the South and outside it. In the South, this meant badly misjudging how black voters in rural areas and small to mid-sized cities intended to mark their ballots. Outside the South, this has meant, for instance, badly estimating how people outside the Detroit-Flint corridor would vote in Michigan or how people outside Milwaukee would vote in Wisconsin. My Facebook Primary (Adjusted) model has nailed the places where other polling has been badly wrong. Marquette University said the Madison area would go for Sanders by just 10 points. The counties in the Madison media market almost all went for Sanders by 20-25 points or more. Marquette suggested Clinton would eke out a narrow two-point win in the Green Bay area. She lost it by an average of fifteen points, and Marquette’s overall pick for Wisconsin was wrong by double digits even though it nailed what was happening in and around Milwaukee.

Once again, my model shows Upstate New York going for Sanders by a wide margin (I will be pegging his share of the vote north and west of Rockland and Westchester counties somewhere between 57 and 60%). Other than Emerson, which came closest to getting something right for a change in its second-most-recent poll, every other polling firm showing its geographical splits has suggested that the race Upstate is a virtual tie or even that Clinton has a modest lead. Maybe this will be the time my model is way off base. If so, Sanders is in real trouble.

Glaring Errors 

So far, the pollster with the most consistently reasonable age splits is CBS/YouGov. CBS’s practice is to take the age splits from 2008 and use them exactly to weight its polling results. This means its splits still differ from where exit polling shows the age splits actually land, but often by far less than other pollsters’ do. So CBS has 15% 18-29-year-olds for New York, 22% 30-44-year-olds, etcetera. But in its polling from two Sundays ago, CBS made what seems to be a glaring error, caught by Bugei Nyaosi. Contrary to the results of every contest to date and to every other polling firm in New York, CBS suggested that Clinton had a lead of more than 30 points among 30-44-year-olds. That was a stronger preference for Clinton, if such a thing is even possible, than among the 65+ crowd CBS interviewed for the same poll. Did CBS fix this problem in its poll released yesterday?

Nope.

They interviewed 60% of the same people and, once again, are suggesting that New Yorkers in the thirty-to-forty-four age bracket will go for Clinton over Sanders even more heavily than retirees. In both instances, CBS also had Sanders down just 10 points. Once again, if you fix that glaring error by, say, forecasting that Sanders will win about the same share of the 30-44-year-old vote as he has in previous contests, or a little less, the race is suddenly a dead heat.
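The same arithmetic shows how much one suspicious cell can move a topline. In this hypothetical Python sketch, every share and margin is a placeholder loosely shaped like the CBS/YouGov description above: Clinton up about 10 overall and up more than 30 among 30-44-year-olds, more than among the 65+ group. Only the 30-44 margin is then swapped for a Sanders-leaning value, which is itself an assumption standing in for “about what he has won in previous contests.”

```python
# Hypothetical sketch: how much one suspicious subgroup can move a topline.
# All shares and margins are HYPOTHETICAL placeholders, not CBS/YouGov data.

def topline_margin(shares: dict[str, float], margins: dict[str, float]) -> float:
    """Weighted average of group margins, with shares normalized to sum to 1."""
    total = sum(shares.values())
    return sum(shares[g] / total * margins[g] for g in shares)

# Share of the electorate by age group, in percent (hypothetical, sums to 100).
shares = {"18-29": 15, "30-44": 22, "45-64": 38, "65+": 25}

# Clinton-minus-Sanders margin by group as a poll might report it (hypothetical):
# note 30-44 is more pro-Clinton than 65+, the anomaly described above.
reported = {"18-29": -30.0, "30-44": 31.0, "45-64": 5.0, "65+": 25.0}

# Replace only the 30-44 cell with a Sanders-leaning margin (hypothetical).
fixed = {**reported, "30-44": -12.0}

print(round(topline_margin(shares, reported), 1))  # 10.5
print(round(topline_margin(shares, fixed), 1))     # 1.0
```

Because 30-44-year-olds are roughly a fifth of the electorate in this sketch, a 43-point swing in that single cell moves the topline by about 9.5 points.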

Fixed CBS

What Then?

I think the race is actually extremely close. I arrive at that conclusion by a number of different routes but will not publicly sort through the trash from the other five or six polling firms active in New York on the Dem side. There are hidden gems in them, particularly when various subgroups are averaged across polls. In my final forecast tomorrow morning, which hopefully won’t be a similarly fetid mess, I’ll publish what I take to be a reasonably likely model of what exit polling will look like when the dust settles in the Empire State.

Doug Johnson Hatlem writes on polling, elections data, and politics. For questions, comments, or to inquire about syndicating this weekly column for the 2020 cycle in your outlet, he can be contacted on Twitter @djjohnso (DMs open) or at djjohnso@yahoo.com (subject line #10at10 Election Column).