New York Primary: Why is Exit Poll Data Adjusted to Match Final Voting Results?

Tuesday night at 9pm, when polls closed in New York, CNN projected a 52-48 Clinton win based on exit polling data. Very similar numbers from ABC at the same time showed voters saying by a 52-47 margin that Clinton was the more inspiring candidate, a number you’d think would closely reflect how people voted. Clinton won the final reported tally by 16%, and by late night and early this morning, the exit polling data available at CNN and elsewhere much more closely matched a mid-double-digit margin for Clinton. Some of the turnarounds in specific demographics were rather remarkable, especially since just 24 respondents were added to the relevant sample.

Earlier, Clinton led Sanders by 14% (57-43%) with Latina and Latino voters, as I reported in my exit poll live blog. This was consistent with my 56-44% projection, based primarily on the average of half a dozen polls from the week and a half before New York voted. The final exit poll, however, shows Clinton doubling her lead to a 28% win with Hispanic voters. Early reports suggested Sanders was winning 69-31% with voters under 45. Final exit polling shows him winning by just 10%, 55-45%, and has him losing the 30-39-year-old demographic by 4%. Sanders has not lost 30-39-year-olds anywhere outside the South, not even in Ohio, where he won them by 18% but lost the overall vote by 14%.

From a statistical standpoint, this intrigued me.

Swings that radical, if the initial sampling is accurate in terms of size and randomness, are unusual but not out of the realm of possibility. My expectation was that those numbers would become more favorable to Sanders, not less. Sanders won rural upstate 58-42. Polls in rural upstate opened at noon, six hours later than polls in Buffalo and downstate, and just a handful of hours before initial exit polling was released. In Wisconsin, where I watched exit polling shifts carefully, younger voters voted later and stretched results further in Sanders’ favor.

I wanted to see what the initial sample size was. What I found was rather shocking.

As late as between 9 and 10pm Eastern, the exit polls were still reporting similar numbers. John Aravosis wanted to prove that White Bernie-Bros are a real and measurable phenomenon. He took screenshots of CNN exit polling and posted them to his blog at 9:40pm last night. As of then, Clinton was leading Bernie Sanders with Latinos 59-41% with a sample size of 1367, a swing of just 4% from the 14% lead reported earlier. This made sense statistically, even if it wasn’t more favorable to Sanders as I expected.

Source: http://americablog.com/2016/04/sanders-exit-polls-new-york-hillary.html

But what happened next gets really weird. At that point, according not only to Aravosis’ blog but also to numbers I reported on my liveblog, CBS numbers still available as of this writing, and various Twitter users, Sanders was winning voters aged 18-44 (41% of the electorate) by a margin of 61-39% and was losing voters 45 and over by the same 61-39% margin. These numbers are consistent with a 4 to 5 point Clinton win.
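The "4 to 5 point" figure can be checked by weighting each age group's margin by its share of the sample. The snippet below is a minimal sketch of that arithmetic, not Edison's method; the function name `overall_margin` is made up for illustration, and the 59% share for voters 45 and over is simply assumed to be the remainder after the 41% aged 18-44.

```python
def overall_margin(groups):
    """Clinton's overall lead in points.

    `groups` is a list of (share_of_sample, clinton_pct, sanders_pct).
    Shares are normalized so they need not sum to exactly 1.
    """
    total_share = sum(share for share, _, _ in groups)
    weighted = sum(share * (clinton - sanders) for share, clinton, sanders in groups)
    return weighted / total_share


groups = [
    (0.41, 39, 61),  # voters 18-44: Sanders 61-39
    (0.59, 61, 39),  # voters 45 and over (assumed remainder): Clinton 61-39
]

margin = overall_margin(groups)
print(f"Implied Clinton lead: {margin:.1f} points")
```

Under those assumptions the implied lead comes out just under 4 points, in line with the early topline.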

Here’s the deal, though. The sample size grew in the last two renditions of the exit polling by just 24 respondents, first from 1367 to 1383 when I took several screenshots for my liveblog just after 11pm Eastern, and then to 1391 as of Wednesday morning. Over the same period, Clinton’s lead with Latinos grew by 10%, from 18% to 28%. Her lead also grew by 10% among those 45 and over, and Sanders’ lead shrank by 12% with those under 45. In exit poll version (2), Sanders led with white people (59% of the vote) by 9%, in exit poll (3) by just 2%, and now with exit poll (4) it is tied.

This would be possible and reasonable with a very large growth in sample size, but, as you might imagine, it is mathematically impossible in this instance without serious data fiddling. With the same sample grown by just 1.8%, Sanders’ lead dropped by 12% overall, by 9% with men, by 12% with young voters, and by 9% with white voters. Meanwhile, Clinton’s lead with Latinx voters grew from 18% to 28%, and her lead with black voters grew by 2%.
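To see why 24 extra respondents cannot account for shifts of that size, here is a rough bound in Python. It assumes every respondent counts equally (no reweighting of earlier responses) and takes the most extreme case, in which all 24 newcomers fall into a single group and all back Clinton. The helper `max_lead_drop` is hypothetical, and the group sizes are estimated from the 41% and 59% shares quoted above rather than taken from raw counts.

```python
def max_lead_drop(group_size, lead_pts, added):
    """Largest possible fall (in points) in a candidate's lead within one group
    if `added` new respondents all join that group and all back the opponent.

    Assumes each respondent carries equal weight.
    """
    leader = group_size * (100 + lead_pts) / 200  # respondents backing the leader
    trailer = group_size - leader                 # respondents backing the opponent
    new_lead = 100 * (leader - (trailer + added)) / (group_size + added)
    return lead_pts - new_lead


EARLY_SAMPLE = 1367
ADDED = 24

# Voters under 45: about 41% of the sample, Sanders ahead 61-39 (22 points).
young_drop = max_lead_drop(0.41 * EARLY_SAMPLE, 22, ADDED)

# White voters: about 59% of the sample, Sanders ahead by 9 points.
white_drop = max_lead_drop(0.59 * EARLY_SAMPLE, 9, ADDED)

print(f"Max drop, under 45: {young_drop:.1f} points (reported: 12)")
print(f"Max drop, white voters: {white_drop:.1f} points (reported: 9)")
```

Even in that extreme case the bound is roughly 5 points for voters under 45 and roughly 3 points for white voters, well short of the reported 12 and 9. The same 24 people also cannot all be used up in more than one disjoint group at once.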

Apparently, the last 24 respondents to exit polls yesterday were all Latina or black female Clinton voters over 44, and each was also allowed to count more than double while replacing more than one male Sanders voter under 45.

To put this plainly: the numbers add up to 341 voters aged 18-44 for Sanders out of 1367 total respondents as of the 9pm exit polls, version (2), which said it was a close race. By the next morning, the maximum number of Sanders voters aged 18-44 in the same data had dropped to just 313. Edison Research removed twenty-eight young white male Sanders respondents and has given no public explanation for doing so. The initial overall exit poll, +4 or +5 Clinton, was outside the margin of error for the final result, Clinton +16 with 99.6% reporting.
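The 341 and 313 figures can be reproduced from the published percentages. The sketch below assumes the 18-44 share stayed at 41% in the final version, which is an assumption and not a figure stated above, and rounds down to whole respondents. It also includes a textbook 95% margin of error for a simple random sample; actual exit polls use clustered precinct samples, so their real margin of error is wider than this. Function names are illustrative only.

```python
import math


def sanders_under_45(total, under45_share, sanders_pct):
    """Implied number of Sanders respondents aged 18-44."""
    return total * under45_share * sanders_pct / 100


def margin_moe_95(n):
    """Approximate 95% margin of error, in points, on the lead in a
    two-candidate race, assuming a simple random sample of size n."""
    se_share = math.sqrt(0.25 / n)     # worst-case standard error of one share
    return 100 * 1.96 * 2 * se_share   # lead = 2p - 1, so its error doubles


early = math.floor(sanders_under_45(1367, 0.41, 61))  # version (2): Sanders 61-39
final = math.floor(sanders_under_45(1391, 0.41, 55))  # final: Sanders 55-45
moe = margin_moe_95(1367)

print(early, final, early - final)
print(f"Approximate margin of error on the lead: {moe:.1f} points")
```

With those inputs the counts are 341 and 313, a difference of 28. The simple-random-sample margin of error on the lead is a little over 5 points, against a gap of roughly 12 points between the early topline and the final result.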

I have attempted to contact Edison Research for a response. Yesterday afternoon, I was patched through to the voice mail of Joe Lenski, co-founder and Executive Vice-President of Edison. He has not responded and other calls and emails have also gone unanswered. I will update this piece if anyone from Edison responds.

Doug Johnson Hatlem writes on polling, elections data, and politics. For questions, comments, or to inquire about syndicating this weekly column for the 2020 cycle in your outlet, he can be contacted on Twitter @djjohnso (DMs open) or at djjohnso@yahoo.com (subject line #10at10 Election Column).