And as it was in the beginning, so shall it be in the end
That bullshit is bullshit, it just goes by different names
— The Jam
You may not recognize names like Amy Cuddy, Kristina Durante, or Brian Wansink, but if you listen to NPR, watch TED talks, or read popular online news sites or local and national outlets such as the New York Times, you have probably stumbled across their work. They are among a growing number of academics who have produced one or more exciting, novel, too-amazing-to-be-true research studies that caught the attention of the media and spread so widely through American culture that we may have internalized their findings as fact. Yet their work has since been debunked: critics have shown it to be unscientific, and other researchers have been unable to reproduce it. It is all part of what has been dubbed the “replication crisis” in science. Because replication is one of the basic tenets of science, failure to reproduce the results of a study (especially after several attempts) indicates a lack of support for the original findings. How does this happen time and time again, and what does it say about science and the news media?
Scientific research is far from infallible. While the right-wing assault on science stems from an invalid, self-serving financial motive, in which money-making trumps all truth or reason, that attack has, in turn, rendered scientific endeavors sacrosanct to much of the left. But neither ideological side speaks honestly or accurately about the complex and nuanced nature of scientific research in practice. While the researchers named above all work in the social (“soft”) sciences, where the current reproducibility crisis runs amok, blatant errors, misrepresentations, and deceptions occur far too frequently in every field that purports to use the scientific method, with untold consequences for society.
Illustration of Bad Science
What if I told you that scientists have proven that the rising of the sun makes you drink orange juice? You would instinctively know that’s not true and would probably question what the heck is going on in “science.” But academics who claim to follow the scientific method and rely on statistical inference make erroneous assertions just like that far too often and far too publicly, and an uncritical news media promotes their false claims.
Of course, drinking orange juice and breakfast time have long been correlated in American households, from around the 1930s onward, though the link may be weaker today. This was largely due to the overproduction of oranges, the fear of vitamin C deficiency, and marketing by orange juice manufacturers. But as most people know, correlation does not equal causation, and a correlation may not mean much of anything at all if you are not clear about the other possible contributing, confounding, or explanatory factors involved in the relationship, such as the three I mentioned. While the error in this example seems obvious, similar errors that render research conclusions null and void are all too common among high-profile studies.
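The orange juice example can be made concrete with a small simulation. The sketch below is a toy model under loudly stated assumptions: a hypothetical “breakfast time” variable drives both morning sunlight and orange juice drinking, and neither of those two affects the other. The raw correlation between sunlight and orange juice comes out strong, yet it largely vanishes once breakfast time is statistically controlled for (here, a partial correlation computed from simple regression residuals). All variable names and effect sizes are invented for illustration.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residualize(ys, zs):
    """Remove the part of ys explained by a straight-line (OLS) fit on zs."""
    n = len(ys)
    my, mz = sum(ys) / n, sum(zs) / n
    slope = sum((y - my) * (z - mz) for y, z in zip(ys, zs)) / sum((z - mz) ** 2 for z in zs)
    return [y - (my + slope * (z - mz)) for y, z in zip(ys, zs)]

rng = random.Random(42)
n = 10_000
# Hypothetical confounder: how close each observation is to breakfast time.
breakfast = [rng.gauss(0, 1) for _ in range(n)]
# Sunlight and orange-juice drinking each depend on breakfast time plus
# independent noise; in this toy world neither one affects the other.
sunlight = [b + rng.gauss(0, 1) for b in breakfast]
orange_juice = [b + rng.gauss(0, 1) for b in breakfast]

raw_r = pearson(sunlight, orange_juice)
partial_r = pearson(residualize(sunlight, breakfast), residualize(orange_juice, breakfast))
print(f"raw correlation:           {raw_r:.2f}")      # roughly 0.5
print(f"controlling for breakfast: {partial_r:.2f}")  # roughly 0
```

Controlling for the confounder only works, of course, when the researcher has thought to measure it; a factor nobody recorded cannot be residualized away.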
Case 1 – Amy Cuddy
Amy Cuddy’s famous study on how an assertive “power pose” could elevate testosterone levels and increase a person’s confidence and risk-taking was published in Psychological Science, one of the most prestigious journals in its field. Then a professor at Harvard Business School, Cuddy went on to give the second most popular TED talk ever, sign a book deal, and travel around the world commanding huge fees on the lecture circuit based on the general theme of her study. Meanwhile, skeptical researchers Joe Simmons and Uri Simonsohn questioned the veracity of her claims, and Eva Ranehill and colleagues failed to replicate the study’s results. One of Cuddy’s co-authors, Dana Carney, has since withdrawn her support of the study, saying “I do not believe the effects are real.” But Cuddy, who has voluntarily left her academic position, still stands by her work.
In truth, the power pose study is not only a replication failure but also a failure of peer review. No particularly specialized expertise is needed to see some of its problems. One glance at the methods section reveals a sample size of 42, hardly large enough to give the study much statistical power. In addition, as in many studies, narrow, subjective proxies were used to support a much more general, supposedly objective, finding. Here, risk-taking was measured by participants’ willingness to perform a certain gambling task, yet one’s interest in gambling is not necessarily proportional to one’s interest in other risky activities. Further, participants’ levels of confidence were self-reported on a scale of 1-5. Self-reporting is always error-prone, because your “2” may not be equivalent to my “2.” And yet all of these subjective measurements are treated as concrete, quantifiable data. Finally, the study assumed no cultural differences; displays of power or confidence might not be viewed as positively elsewhere as they are assumed to be in American culture.
You can see how the reliability of the study deteriorates under scrutiny. But no study is perfect. One of the biggest problems with this study, and with many similar ones, is not just how unreliable the results are but that they are treated as generalizable to everyone everywhere. Had Cuddy presented the results as provisional and contingent upon certain assumptions and circumstances, her research might have been more defensible; instead, she presented her shoddy science as universal, immutable fact. This practice appears to be widespread.
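Why is a sample of 42 “hardly sufficient”? A rough power calculation helps. The sketch below assumes, for illustration only, that the 42 participants were split into two equal groups of 21 and that the true effect is a conventional “medium” standardized difference of d = 0.5; that effect size is an assumption, not a figure from the study. It uses the normal approximation to a two-sided, two-sample test, which slightly overstates the power of the t-test actually used in such studies.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test for standardized effect d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the true effect is d.
    shift = d * (n_per_group / 2) ** 0.5
    # Probability the statistic lands beyond either critical value.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

def n_for_power(d, target=0.80, alpha=0.05):
    """Smallest per-group sample size whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n

print(f"power with 21 per group: {approx_power(0.5, 21):.2f}")
print(f"per-group n for 80% power: {n_for_power(0.5)}")
```

Under these assumptions the study would detect a real medium-sized effect only about 37 percent of the time, and roughly 63 participants per group would be needed to reach the customary 80 percent. Low power also means that the “significant” results such studies do report tend to overestimate the true effect.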
Case 2 – Kristina Durante
Kristina Durante has received public attention for her research on the correlation (which she mistakes for causation) between women’s ovulation cycles and other social phenomena. Perhaps her biggest claim to fame is the negative attention she garnered after CNN reported on her study proclaiming that women’s voting preferences correspond to their ovulation cycles. The backlash from readers focused more on the sexism of the concept than on the validity of the research, but plenty of critics took issue with the poor research methods, the false assumptions, the misplaced correlations, the lack of control for confounding variables, the conclusions not supported by the data, and on and on.
I can point to two immediate flaws in her work. First, she assumed voters would vote only for Romney or Obama rather than for another candidate or no one at all. Second, she could not possibly have assessed how fertility or ovulation affects voting choices unless she had followed the same women throughout their monthly cycles and determined that their voting preferences changed during ovulation. Those are only two of the myriad errors in the paper; yet, like Cuddy’s, it was published in Psychological Science, and the journal stands by the work.
Durante feebly attempted to defend her work as well, but her arguments did not hold water with much of the scientific community. Yet she continues to conduct the same poor-quality research on ovulation at her new university, Rutgers. Her website states that “her work integrates knowledge from biology with diverse areas of psychology and marketing,” but it is not clear where her knowledge of biology comes from, because she holds no degrees in any natural science.
Case 3 – Brian Wansink
The most recent academic under fire is Brian Wansink, who achieved moderate fame over the past several decades as a food researcher, calling himself “the Sherlock Holmes of food.” His Food & Brand Lab at Cornell’s business school examines consumer behavior with regard to diet. His work sparked a $22 million program in public schools called Smarter Lunchrooms, and he was appointed to the USDA Center for Nutrition Policy and Promotion under George W. Bush. His claims include that having a fruit bowl in your kitchen corresponds to lower body weight, that sitting by a window in a restaurant is correlated with healthier food intake, and that cereal box characters’ eyes are angled toward children to appeal to them in the grocery aisle. He wrote two popular diet books, and the press has continuously, endlessly eaten up his studies. Now the press is covering his inaccurate and error-prone work.
The controversy for Wansink started with a (now-deleted) blog post about a highly productive graduate student, in which he outlined questionable research procedures and designs, producing an outpouring of confusion and shock within the academic community. Among the practices described were several suspect, unscientific ones. One, known as p-hacking, takes its name from the statistical probability value (p-value). In statistical hypothesis testing, most fields treat a p-value below 0.05 as “significant,” which roughly means that a result at least as extreme would be unlikely (occurring less than 5 percent of the time) if random chance alone were at work. But in practice, a p-value can be manipulated: by measuring many outcomes, running many analyses, dropping inconvenient data, or collecting more data until the threshold is crossed, and then reporting only what “worked,” a researcher can make random noise look like a significant signal. The other practice, HARKing, stands for Hypothesizing After the Results are Known: a hypothesis formed after looking at the data is presented as if it had been specified in advance, so a pattern that turned up by chance masquerades as a confirmed prediction. Either way, the results are suspect.
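How much can p-hacking inflate false positives? The sketch below simulates one stylized form of it: every “study” compares two groups drawn from the same distribution (so there is no real effect), measures several outcomes, and reports success if any outcome crosses p < 0.05. The group size, number of outcomes, and the simple known-SD z-test are assumptions chosen for clarity, not a model of any particular lab’s practices.

```python
import random
from statistics import NormalDist, fmean

def z_test_p(a, b):
    """Two-sided p-value for a difference in means, assuming both groups have known SD 1."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(n_outcomes, n_per_group=20, n_studies=2000, alpha=0.05, seed=0):
    """Share of no-effect studies that report at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # No real effect: both groups come from the same distribution on every outcome.
        p_values = [
            z_test_p([rng.gauss(0, 1) for _ in range(n_per_group)],
                     [rng.gauss(0, 1) for _ in range(n_per_group)])
            for _ in range(n_outcomes)
        ]
        if min(p_values) < alpha:  # report whichever outcome "worked"
            hits += 1
    return hits / n_studies

for k in (1, 5, 10):
    simulated = false_positive_rate(k)
    analytic = 1 - (1 - 0.05) ** k  # assumes the outcomes are independent
    print(f"{k:>2} outcomes: simulated {simulated:.2f}, analytic {analytic:.2f}")
```

With a single pre-specified outcome, about 5 percent of null studies come out “significant,” as the threshold promises. With ten outcomes to choose from, roughly 40 percent do, even though nothing real is going on. HARKing compounds the problem by then writing up whichever outcome “worked” as the hypothesis all along.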
Wansink’s blog post prompted a number of scholars to look more deeply at his prodigious body of work, and what they found was troubling. First, Jordan Anaya, an independent scholar, along with Nicholas Brown and Tim van der Zee, two graduate students in the Netherlands, analyzed the four publications described in the post and found nearly 150 inconsistencies in the reported data. They went on to examine more of Wansink’s studies, joined by research scientist James Heathers of Northeastern University. University of Liverpool professor Eric Robinson also took Wansink to task for the bad science behind the widely adopted Smarter Lunchrooms program.
Using statistical and data analysis techniques and computer models, these investigators demonstrated that in over 50 of Wansink’s publications, the reported numbers simply did not add up or could not validly have been obtained; in short, some values had to be wrong or were not real. They also found extensive self-plagiarism (the re-use of one’s own written material in more than one publication), which constitutes ethical misconduct. But Wansink’s work also suffers from many of the same issues seen in Cuddy’s and Durante’s: no consideration of cultural and socioeconomic differences, lack of control variables, erroneous assumptions, conflation of correlation with causation, and embellished claims, many of which could be spotted without sophisticated quantitative skills.
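What does it mean for reported numbers to “not add up”? One well-known check of this general kind is the GRIM test (granularity-related inconsistency of means), developed by Nicholas Brown and James Heathers. The sketch below is a simplified, hypothetical implementation of the idea, not the critics’ actual code: if n participants each give a whole-number answer (say, on a 1-7 scale), their total must be an integer, so only certain means are possible, and a reported mean that no integer total can produce must be an error. The check only has teeth when responses are integers and n is small relative to the reported precision (below 100 for two decimal places).

```python
import math

def grim_consistent(reported_mean, n, decimals=2):
    """Simplified GRIM-style check: can n integer responses yield this mean once rounded?"""
    # Allow half a unit in the last reported digit, plus a hair for float error.
    tolerance = 0.5 * 10 ** (-decimals) + 1e-9
    approx_total = reported_mean * n
    # The achievable means closest to the report come from the two nearest integer totals.
    for total in (math.floor(approx_total), math.ceil(approx_total)):
        if abs(total / n - reported_mean) <= tolerance:
            return True
    return False

print(grim_consistent(5.18, 28))  # 145 / 28 = 5.1786, which rounds to 5.18
print(grim_consistent(5.19, 28))  # no integer total over 28 people rounds to 5.19
```

Each inconsistency on its own might be a typo; dozens of them across a body of work, as critics reported for Wansink’s papers, are much harder to explain innocently.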
So why weren’t Wansink’s research issues caught sooner? In fact, several people had raised alarm bells, but few seemed to take note. The people behind The Joy of Cooking, the popular 87-year-old American cookbook, found fault with one of Wansink’s earlier studies concerning their recipes and were delighted when researchers confirmed their suspicions about his sloppy work. Still, it took a careless blog post and a few intrepid, unpaid critical reviewers to expose years of unsubstantiated scientific claims. The exposure succeeded partly because these evaluators relied on quantitative rather than qualitative critiques, and quantitative critiques carry more weight in the current scientific culture.
As it stands, their voluntary work has drawn attention to the research problems in Wansink’s lab and has resulted in six retractions and 15 corrections of his publications. Email messages obtained by Stephanie Lee of BuzzFeed indicate that Wansink knew his methods were weak but continued to play with his data to obtain desired, often preordained results, and used his in-house public relations machinery to spread his messages. Meanwhile, Wansink continues to publish and to disseminate his more recent results to the media and through speaking engagements while under academic investigation by Cornell.
The research irregularities noted above are not outright fraud, as in the cases of Michael LaCour, Diederik Stapel, or Marc Hauser. If fraud were the main problem, it might be easier to attribute the scientific problems to a few bad apples. Instead, these examples illustrate the less acute but more insidious troubles within science that could rot it to its core if left unaddressed.
Unscientific research masquerading as science seems rampant in the social sciences, and in psychology in particular, where discussion of the replication crisis is most prevalent. But physician and statistician John Ioannidis has argued that even in the biomedical sciences, most published research findings are false. Findings in all human-related sciences appear to be problematic, which is not surprising: human biology, psychology, behavior, social interactions, and ecological connections overlap in the real world and are difficult, if not impossible, to study completely with the reductionist tools of the scientific method. Non-human phenomena may also be easier to study because cultural, socioeconomic, and political factors are not in the mix.
The fact that science struggles with extreme complexity, and that the scientific method has limitations, obviously does not mean that all science is wrong or useless. Science performs beautifully, elegantly, in certain areas of study; in others it cannot fulfill its promise, either because it is the wrong tool for the inquiry or because it is used inappropriately. Given that much of the public holds science in such high regard, the scientific community should strive to live up to its ideals. But currently, science is not meeting the high standards it promotes.
Problem – The Research Process
Problems with scientific rigor start with the research process. As the cases above show, too much research lacks control groups, fails to account for outside variables, and has no hypotheses (or has hypotheses that are unsupported by existing knowledge, lack a theoretical foundation, or are implausible). Exploratory research may need no hypotheses, but then it should be identified as exploratory, it should not be subjected to hypothesis testing as if it were confirmatory, and its results should be reported only as conditional and subject to further testing.
Many studies use experimental proxies to stand in for the variables they seek to examine. For example, a researcher might say that taking a cookie indicates that a subject is prone to eating sweets. But what if I don’t care for that type of cookie, or I am allergic to something in it? Maybe I do like sweets, but that particular cookie is a bad indicator. Similarly, lab rats are used as proxies for humans in toxicological, pharmacological, and other studies, but rats are sometimes good proxies and sometimes poor ones, depending on the effect measured, and sometimes it depends on the specific strain of rat for a given variable.
Another issue is the transformation of subjective data into seemingly objective, quantifiable data. Surveys do this all the time. They ask you to rank your preference on a scale of 1-5. Or they offer three or four answers from which you are forced to choose, even though none of them fits, and your correct choice should be “other,” followed by a qualitative answer that cannot be entered into statistical analysis. When we quantify things that are largely qualitative or cannot really be quantified at all (e.g., love), we leave room for error in the scientific record, and we need to be clear that our results reflect this level of uncertainty.
Faulty assumptions, biased values, and subjective definitions of terms can also contribute to flawed research. Much agricultural science values high food yields over the quality of the food produced. Many studies assume that the U.S. is a functioning democracy or define it as one, whereas other researchers find that not to be the case. Some people would simply define industrial livestock production as animal agriculture, while others would call it animal cruelty. And a whole host of researchers state in the introductions of their publications that various technologies have unquestionably enhanced, expanded, helped, or benefited human lives, but fail to provide any citation or evidence for that assumption. I have encountered such unsupported suppositions presented as fact so many times when reading research about human health and the environment that my husband coined a name for them: the First Paragraph Fabrication. Yet these subjective assumptions can lay the foundation for what become supposedly objective results.
Additionally, quantitative data and analyses are the most common tools in scientific research, but they can fail to support clear conclusions. Quantitative analysis can even lead to erroneous conclusions when it is not paired with qualitative data or analyses that put the numbers in context. In my own research, I examined media coverage of the Deepwater Horizon oil spill and quantified how many citizens were interviewed and allowed to speak about their experience. But only through qualitative analysis did I see that the citizens’ comments appeared to be limited to subjects such as economics and livelihood, rather than science or health.
Then there are statistics. What I know about statistics is that I still have much more to learn in order to perform meaningful research that accounts for the complexities inherent in most studies. I also know that many researchers know less than I do and use and report statistics incorrectly. In my graduate regression analysis class, I found out, to the great dismay of my professor, that I was one of only about three people who had taken calculus. Because of advances in computing over the past several decades, researchers no longer need to fully understand the mathematics; they can simply input numbers into a program and obtain a p-value. Moreover, even though a result either crosses a pre-specified significance threshold or does not, researchers sometimes still report p < 0.01 as “more highly significant” than p < 0.05.
Finally, there are exaggerated and unsupported claims. Cuddy, Durante, and Wansink all draw conclusions that do not necessarily follow from their data. Some research would be less problematic if it were more truthful: if researchers reported interim conclusions, clearly laid out the deficiencies and limitations of their studies, and refrained from turning correlational relationships into causal ones. But that kind of honesty and clarity does not tend to make a scientific splash in the media and popular culture.
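Part of the reason a single p < 0.01 is not evidence of a “stronger” finding than p < 0.05 is that p-values bounce around a great deal from one run of an experiment to the next. The sketch below repeats the identical study 1,000 times with the same real effect; the effect size (0.5 standard deviations), group size (30), and known-SD z-test are all assumptions chosen for illustration.

```python
import random
from statistics import NormalDist, fmean

def one_study_p(rng, effect=0.5, n_per_group=30):
    """Run one simulated two-group study with a real effect; return its two-sided p-value."""
    control = [rng.gauss(0, 1) for _ in range(n_per_group)]
    treated = [rng.gauss(effect, 1) for _ in range(n_per_group)]
    se = (2 / n_per_group) ** 0.5  # assumes a known SD of 1 in both groups
    z = (fmean(treated) - fmean(control)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

rng = random.Random(7)
# The same experiment, same true effect, repeated 1,000 times.
p_values = sorted(one_study_p(rng) for _ in range(1000))
share_below_01 = sum(p < 0.01 for p in p_values) / len(p_values)
share_above_05 = sum(p >= 0.05 for p in p_values) / len(p_values)
print(f"p < 0.01 in {share_below_01:.0%} of identical studies")
print(f"p >= 0.05 in {share_above_05:.0%} of identical studies")
```

Under these assumptions, roughly a quarter of the identical studies land below 0.01 and about half fail to reach 0.05 at all. Which bucket a given paper falls into is largely luck, so ranking results by how small their p-values are says little about how real or large an effect is.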
Problem – Publish or Perish
To get a job as a tenured professor, you must publish research in scholarly journals incessantly. Academic search committees do not usually read candidates’ publications, nor do they consider the quality of the research or its benefit to the public good, but the number of publications is a crucial factor in hiring and tenure decisions. Quantity is valued over quality, so the incentives in academia are not to design extremely rigorous studies, pore over carefully obtained data, and act with extreme care in analyzing results or drawing conclusions. The incentives are to publish as much as possible, as fast as possible: empty productivity.
In graduate school I encountered an ambitious professor who advised his students to attempt to publish every paper they had ever written. This person made a name for himself in his field, but his work is questionable.
When I conducted pilot studies surveying undergraduate students, my adviser recommended against publishing them, because such studies had become frowned upon for their unreliability and lack of rigor. Moreover, they were merely exploratory; no conclusions could be drawn. By contrast, the ambitious professor would have urged me to publish. Even though they lack rigor, studies of students are still published often and seep into media reports. The ambitious professor’s advice would have done more for my academic career prospects (had I been interested), even though it was less ethical.
With competition fierce, academics are enticed to produce sexy, cool, headline-grabbing results, not necessarily the truth. Not only is employment itself on the line, but grants and funding can depend on these superficial goals. The emphasis on fame rather than truth corrupts science. Much like the rest of our consumer capitalist culture, it values style over substance. As Brian Nosek, professor and director of the Center for Open Science, notes, “the real problem is that the incentives for publishable results can be at odds with the incentives for accurate results.”
Many of the research problems described above could be avoided with more time and a focus on quality rather than quantity, but that is not how academia currently works. Consequently, instead of slowly producing fewer but more rigorous, meaningful, high-quality studies, academia produces too many studies too quickly, lowering their overall quality. Those studies then fail to serve as a public good. Indeed, the pinnacle of such a perverse capitalist incentive structure is in China, where scientists receive cash rewards for publication; the more venerable the journal, the higher the pay. The implications for the corruptibility of science under such circumstances could not be clearer. But with the academic publishing industry posting astonishing profits, the system is unlikely to change under the current free market.
Problem – Peer Review
Peer review is considered the gold standard for vetting scientific research, but many scientists recognize that it is not. Nevertheless, much of the media and the public view a peer-reviewed paper as incontrovertible. Like the rest of the academic system, peer review has broken down, or it may never have been as robust as people like to think. After all, the same researchers who produce problematic studies act as peer reviewers. But the problems lie deeper.
First, there is outright fraud. With the emergence of more and more venues for academic work, some journals pretend to conduct peer review when they do not. Others charge authors fees to publish their work, with little concern for peer review at all. There are also cases of researchers or journal editors fabricating peer reviews. However, much like blatant fraud in research itself, these instances are less common than the more subtle factors at play.
Conscious and unconscious biases exist in the peer review process, as does manipulation. Reviewers often defer to the reputations of high-profile researchers and respected institutions, favoring their papers without reservation. Sometimes researchers purposely cite previous publications from the journal to which they submit work, not because those citations are relevant to their study, but because they increase the journal’s impact factor (the mark of prestige). Authors do this to curry favor and increase the likelihood of publication, and journal editors occasionally ask for it themselves.
Then there is the emphasis on sensationalism in scientific journals. They are not interested in careful, nuanced studies, and forget about replication studies. They want novel, exciting results. Nobel Prize winner Randy Schekman says that journals “curate their brands” like any other corporate product. He and others argue that this leads to unsound science that does not benefit society as science is intended to.
There are numerous other issues with the peer review process. People might be afraid to scrutinize research because they fear their research may be equally scrutinized and found to be flawed. Conflicts of interest may exist between the reviewers and the authors. Reviewers who openly, rather than anonymously, criticize the manuscript of a colleague might fear reprisal for criticizing the work of someone who could, in turn, be tasked with overseeing the reviewer’s future grant proposals or publications. On the other hand, reviewers can and do enter into quid pro quo arrangements whereby they ease one another’s work through the review process without regard to its merit.
Sometimes peer reviewers are not equipped for the job. Many reviewers simply do not know statistical methods well enough to assess them. Other times, reviewers are overworked, which can affect their ability to catch errors. Because their peer review labor is unpaid, they may even unconsciously give it lower priority, leaving it ripe for mistakes and oversights.
I am sure any researcher can list endless examples of papers containing obvious errors that nevertheless passed peer review. I recall one in which the statistical test outlined in the methods section differed from the test described in the results. The point is that peer review is far from perfect, and it is deeply flawed by the inherent pressures of the current academic research and publishing system.
Problem – News media
The problems with media reports on scientific issues stem from the same market-driven demands that constrain the academic research process and, frankly, most processes in modern industrial societies. As the New York Times has admitted, the paper is in fact ideologically driven: it believes in capitalism. Just as Edward Herman and Noam Chomsky theorized in Manufacturing Consent, the pro-capitalist ideology of mainstream corporate news media can influence not only which scientific issues they choose to cover (known as agenda-setting), but also how they frame and cover those issues.
As a result, news publications and journalists have almost the same incentives as academic journals and their authors: flashy headlines, cool stories, clickbait. Journalists themselves are commonly rewarded for quantity over quality, and for high visibility and readership. Consequently, the research they cover tends to be easy-to-understand studies with trendy results. Audiences seem to like results that are intuitive and confirm their beliefs or, conversely, that are improbable and leave them awestruck. Americans tend to be enticed by ease and convenience. They like solutions to problems that seem magical, particularly if those solutions can be characterized as science. They want to believe. As one student commented after much of Wansink’s research was shown to be unsound, “Despite all the news about Brian Wansink’s research, I still believe/follow some of his advice.” That is why Cuddy’s, Durante’s, and Wansink’s work was so appealing to the press and to the public.
Shoddy research that slips through peer review could also be better vetted by science journalists. One way to do this is to have more journalists with actual scientific expertise cover scientific research and perform investigative journalism on, rather than public relations for, science. In this way, the press could fulfill its role as the fourth estate and act as an auditor and translator of scientific information, rather than as a stenographer for science.
Although some complicated scientific research requires a critical eye with a background in a specific subject, many social science studies, such as the ones recounted here, are fairly easy to follow and need no specialized expertise to find the flaws. I therefore suspect that the journalists covering the faulty studies described above (and so many more) do not parse the research publications but simply rely on press releases from universities or researchers, acting as publicists rather than journalists. As a corporate enterprise, a news service does not necessarily have a motive to produce truth.
In the case of Wansink, the same news media that first uncritically touted his research have now turned around to criticize his work, but only because others brought the problems to their attention. For them and their business model, it’s a win-win, but it is not at all a win for the public.
Because scientific research affects all of our lives, and because most people obtain their scientific information from the press, either directly or indirectly (via social media), the press plays a crucial role in evaluating the science it covers and in ensuring that low-quality research does not get reported as scientific “fact.” By doing so, it would also avoid adding fuel to the current fire of (sometimes valid) proclamations of “fake news.”
Where This Leaves Science
If a critical mass of scientists become untrustworthy, a tipping point is possible in which the scientific enterprise itself becomes inherently corrupt and public trust is lost, risking a new dark age with devastating consequences to humanity.
— Mark A. Edwards and Siddhartha Roy
Even with all of the difficulties described above, a great deal of good science is still being done, and there are still ethical, honest scientists doing the best work they possibly can. The problem is that they are too often overlooked, because meaningful, vigilant, nuanced work is not rewarded in the current system as much as fashionable, thrilling, if flawed, work. And the cycle can be vicious, with shoddy work gaining more attention, more rewards, and more funding. Because of this competitive and corruptible academic playing field, some of the best potential scientists either do not enter graduate school, drop out before graduating, or avoid academic research altogether for lack of interest in the game.
Scientific research is vitally important and should be shared with the public; after all, most people agree that its goal should be to enhance the public good. But we need to be wary of researchers who seem to be marketing themselves and their own products. Some of these popular science researchers are not scientists at all but merely influence peddlers. There are roles galore for them in our current consumer culture, but promoting their unscientific research under the guise of science undermines the trustworthiness of all of science.
Some scientists are afraid of the current trend toward open science and toward pre-publication and post-publication peer review, in which scholars from all over the globe publicly scrutinize scientific research outside the traditional system. They fear we would all find that bad practices are too common. That mindset is anything but scientific, since science is supposed to welcome correction (in theory, if not in practice). Independent public review might be even better than traditional peer review in many respects. Reviewers might be less biased because they do not necessarily have to worry about career retaliation. Open review also allows constructive criticism not from just a handful of reviewers but from an unlimited number. Some scientists even say that people outside their specialized field who know enough about the subject can offer some of the best critiques, because they view the work from a different but valid perspective. These sorts of changes are all good for science.
Some people also view the push for more open science and more critical analysis of research as a witch hunt. The difference is that witches do not actually exist. Poor research does. If so-called scientists are not practicing real scientific research, and more importantly, if they will not learn from their mistakes and change accordingly, then they should not be participating in science at all. Otherwise, they receive all of the adulation and reverence due to science without the integrity their research is presumed to have.
Science, like markets, is supposedly self-correcting, yet neither is. Markets need regulations and regulators to keep them from enriching only the most corrupt, those with the least honesty and greatest ambition. So does science. The structures that provide checks and balances in science have been crumbling under the weight of the market system in which it operates. The question is, will the current movement for better science be able to withstand the forces against it? Right now, some scientists are characterizing those who uncover and expose bad research as “data thugs” and “academic terrorists.” I would characterize them as people of integrity, even heroes. As Thomas Kuhn suggested, scientific paradigm shifts do not come easily, and there will always be resistance to change, especially from the establishment.
Given that scientific results affect citizens and social policies, it is vital that they be reliable. But the careerist, market-driven, capitalist incentive structure in the system does not foster truth and reliability as much as it fosters entertainment and novelty. Scientific inquiry should seek truths, and in application, these truths should benefit all life on Earth. Forget the ideologues. Real people, and scientists themselves, are questioning the sanctity of science for real reasons, as they should. Dismissing the troubles in scientific research as simply the result of conflicts of interest from ties with industry merely targets the lowest-hanging fruit.
The people who want to believe in magic, fairy dust, and the powers of positive thinking or prayer (both debunked) will always find ways to reject science for the wrong reasons and accept the unscientific research they like. Unfortunately, some people think science is magical. It’s not; it takes a lot of hard work to get it right. Those of us who are awestruck by the power, wonder, and uncertainty of science need to be much, much more careful about what we publicly proclaim under the auspices of science and academic research. Science needs to adhere to the higher standards it presumes to uphold; otherwise, it is nothing more than business.