English as the currency of science’s practice

K. VijayRaghavan, the secretary of India’s Department of Biotechnology, has written a good piece in Hindustan Times about how India must shed its “intellectual colonialism” to excel at science and tech – particularly by shedding its obsession with the English language. This, as you might notice, parallels a post I wrote recently about how English plays an overbearing role in our lives, and particularly in the lives of scientists, because it remains a language many Indians don’t need to get through their days. Having worked closely with the government in drafting and implementing many policies related to the conduct and funding of scientific research in the country, VijayRaghavan is able to take a more fine-grained look at what needs changing and whether that’s possible. Most hearteningly, he says it is – if only we have the will to change. As he writes:

Currently, the bulk of our college education in science and technology is notionally in English whereas the bulk of our high-school education is in the local language. Science courses in college are thus accessible largely to the urban population and even when this happens, education is effectively neither of quality in English nor communicated as translations of quality in the classroom. Starting with the Kendriya Vidyalayas and the Navodaya Vidyalayas as test-arenas, we can ensure the training of teachers so that students in high-school are simultaneously taught in both their native language and in English. This already happens informally, but it needs formalisation. The student should be free to take exams in either language or indeed use a free-flowing mix. This approach should be steadily ramped up and used in all our best educational institutions in college and then scaled to be used more widely. Public and private colleges, in STEM subjects for example, can lead and make bi-lingual professional education attractive and economically viable.

Apart from helping students become more knowledgeable about the world through a language of their choice (for the execution of which many logistical barriers spring to mind, not the least of which is finding teachers), it’s also important to fund academic journals that allow these students to express their research in their language of choice. Without this component, they will be forced to fall back on English, which is bound to be counterproductive to the whole enterprise. This form of change will require material resources as well as a shift in perspective that could be harder to attain. Additionally, as VijayRaghavan mentions, there also need to be good-quality translation services so that research in one language can be expressed in another, and cross-disciplinary and/or cross-linguistic tie-ups are not hampered.

Featured image credit: skeeze/pixabay.


The language and bullshitness of ‘a nearly unreadable paper’

Earlier today, the Retraction Watch mailing list highlighted a strange paper written by a V.M. Das disputing the widely accepted fact that our body clocks are regulated by molecular mechanisms at the gene level. The paper is utter bullshit. Sample its breathless title: ‘Nobel Prize Physiology 2017 (for their discoveries of molecular mechanisms controlling the circadian rhythm) is On Fiction as There Is No Molecular Mechanisms of Biological Clock Controlling the Circadian Rhythm. Circadian Rhythm Is Triggered and Controlled By Divine Mechanism (CCP – Time Mindness (TM) Real Biological Clock) in Life Sciences’.

The use of language here is interesting. Retraction Watch called the paper ‘unreadable’ in the headline of its post because that’s obviously a standout feature of this paper. I’m not sure why Retraction Watch is highlighting nonsense papers on its pages – watched by thousands every day for intriguing retraction reports informed by the reporting of its staff – but I’m going to assume its editors want to help all their readers set up their own bullshit filters. And the best way to do this, as I’ve written before, is to invite readers to participate in understanding why something is bullshit.

However, to what extent do we think unreadability is a bullshit indicator? And from whose perspective?

There’s no exonerating the ‘time mindness’ paper because those who get beyond the language are able to see that it’s simply not even wrong. But if you had judged it only by its language, you would’ve landed yourself in murky waters. In fact, no paper should be judged by how it exercises the grammar of the language its authors have decided to write it in. Two reasons:

1. English is not the first language for most of India. Those who’ve been able to afford an English-centred education growing up or hail from English-fluent families (or both) are fine with the language, but I remember most of my college professors preferring Hindi in the classroom. And I assume that’s the picture in most universities, colleges and schools around the country. You only need access to English if you’ve also had the opportunity to afford a certain lifestyle (a cosmopolitan one, for example).

2. There are not enough good journals publishing in vernacular languages in India – at least not that I know of. The ‘best’ journal is automatically assumed to be the one in English, among other factors. Even the government thinks so. Earlier this year, the University Grants Commission published a ‘preferred’ list of journals; only papers published therein were to be considered for career advancement evaluations. The list left out most major local-language publications.

Now, imagine the scientific vocabulary of a researcher who prefers Hindi over English, for example, because of her educational upbringing as well as for teaching in the classroom. Wouldn’t it be composed of Latin and English jargon suspended from Hindi adjectives and verbs, a web of Hindi-speaking sensibilities straining to sound like a scientist? Oh, that recalls a third issue:

3. Scientific papers are becoming increasingly hard to read, with many scientists choosing to actively include words they wouldn’t use around the dinner table because they like how the ‘sciencese’ sounds. In time, to write like this becomes fashionable – and to not write like this becomes a sign of complacency, disinterest or disingenuousness.

… to the mounting detriment of those who are not familiar with even colloquial English in the first place. To sum up: if a paper shows other, more ‘proper’ signs of bullshit, then it is bullshit no matter how much its author struggled to write it. On the other hand, a paper can’t be suspected of badness just because its language is off – nor can it be called bad as such if that’s all that’s off about it.


A conference’s peer-review was found to be sort of random, but whose fault is it?

It’s not a good time for peer-review. Sure, if you’ve been a regular reader of Retraction Watch, it’s never been a good time for peer-review. But aside from that, the process has increasingly been bearing the brunt of criticism for not being able to stem the publication of results that – after publication – have been found to be the product of bad research practices.

The immediate problem may be that reviewers are letting ‘bad’ papers through, but the bigger issue is that, even though the system has been shown to have many flaws – personal biases not excluded – journals rely on the reviewers and naught else to stamp accepted papers with their approval. And some of those stamps, especially from Nature or Science, are weighty indeed. Now add to this muddle the NIPS wrangle, where researchers may have found that some peer reviews are just arbitrary.

NIPS stands for the Neural Information Processing Systems (Foundation), whose annual conference was held in the second week of December 2014 in Montreal. It’s considered one of the main conferences in the field of machine learning. Around that time, two of the conference’s program chairs – Corinna Cortes and Neil Lawrence – performed an experiment to judge how arbitrary the conference’s peer-review could get.

Their modus operandi was simple. All the papers submitted to the conference were peer-reviewed before they were accepted. Cortes and Lawrence routed a tenth of all submitted papers through a second, parallel peer-review stage, and observed which papers were accepted or rejected in that stage (according to Eric Price, NIPS ultimately accepted a paper if either group of reviewers accepted it). Their findings were distressing.

About 57%* of all papers accepted in the first review were rejected during the second review. To be sure, each stage of the review was presumably equally competent – it wasn’t as if the second stage was more stringent than the first. That said, 57% is a very big number. More than five times out of ten, peer-reviewers disagreed on what could be published. In other words, in an alternate universe, the same conference with only the second group of reviewers in place would have been generating different knowledge.

Lawrence was also able to eliminate a possibly redeeming confounding factor, which he described in a Facebook discussion on this experiment:

… we had a look through the split decisions and didn’t find an example where the reject decision had found a ‘critical error’ that was missed by the accept. It seems that there is quite a lot of subjectivity in these things, which I suppose isn’t that surprising.

It doesn’t bode well that the NIPS conference is held in some esteem among its attendees for having one of the better reviewing processes. Including the 90% of the papers that did not go through a second peer-review, the total predetermined acceptance rate was 22%, i.e. reviewers were tasked with accepting 22 papers out of every 100 submitted. Put another way, the reviewers were rejecting 78%. And this sheds light on a more troubling aspect of their decisions.

If the second group had been rejecting papers at random, it would have rejected about 78% of the papers the first group had accepted, simply because it was rejecting 78% of everything. One can only hope the NIPS reviewers weren’t doing that. At the other extreme, if reviewing were perfectly consistent and logical, the figure would have been 0%. The observed 57% sits closer to 78% than to 0%, which implies that a good deal of the decision-making was effectively random. Hmm.
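
To make those two bounds concrete, here’s a minimal simulation sketch – my own illustration, with made-up numbers (1,000 hypothetical papers, a 22% acceptance rate), not part of the Cortes–Lawrence experiment – comparing committees that decide completely at random with committees that apply an identical criterion:

```python
import random

random.seed(1)
n_papers, accept_rate = 1000, 0.22
papers = range(n_papers)

def random_committee():
    """Accept a random 22% of the papers."""
    return set(random.sample(papers, int(accept_rate * n_papers)))

# Case 1: two committees deciding independently at random.
a, b = random_committee(), random_committee()
print(f"Random committees: {len(a - b) / len(a):.0%} of accepts overturned")  # ~78%

# Case 2: two committees applying the same deterministic criterion
# (a shared 'quality' score) -- they never disagree.
quality = {p: random.random() for p in papers}
top = sorted(papers, key=quality.get, reverse=True)[:int(accept_rate * n_papers)]
c, d = set(top), set(top)
print(f"Identical criteria: {len(c - d) / len(c):.0%} of accepts overturned")  # 0%
```

The real committees landed at 57% – far from consistent, though not quite a coin toss either.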

While this is definitely cause for concern, forging ahead on the basis of arbitrariness – which machine-learning theorist John Langford defines as the probability that the second group rejects a paper that the first group has accepted – wouldn’t be the right way to go about it. This is similar to the case with A/B-testing: we have a test whose outcome can be used to inform our consequent actions, but using the test itself as a basis for the solution wouldn’t be right. For example, the arbitrariness can be reduced to 0% simply by having both groups accept every nth paper – a meaningless exercise.

Is our goal to reduce the arbitrariness to 0% at all? You’d say ‘yes’, but consider the volume of papers being submitted to important conferences like NIPS and the number of reviewer-hours available to evaluate them. In the history of conferences, surely some judgments must have been arbitrary simply for the reviewer to fulfil his/her responsibilities to his/her employer. So you see the bigger issue: the fault lies not so much with the reviewers as with the so-called system within which they work.

Langford’s piece raises a similarly confounding topic:

Perhaps this means that NIPS is a very broad conference with substantial disagreement by reviewers (and attendees) about what is important? Maybe. This even seems plausible to me, given anecdotal personal experience. Perhaps small highly-focused conferences have a smaller arbitrariness?

Problems like these are necessarily difficult to solve because of the number of players involved. In fact, it wouldn’t be entirely surprising if we found that no person or institution was at fault, only the way they all interact with each other – and not just in fields like machine learning. A study conducted in January 2015 found that minor biases during peer-review could result in massive changes in funding outcomes if the acceptance rate was low – such as with the annual awarding of grants by the National Institutes of Health. Even Nature is wary about the ability of its double-blind peer-review to solve the problems ailing normal ‘peer-review’.

For the near future, the takeaway is likely that ambitious young scientists will have to remember, first, that acceptance – just as much as rejection – can be arbitrary and, second, that the impact factor isn’t everything. In the interim, though, it seems hard not to lower our expectations of peer-reviewing itself.

*The number of papers routed to the second group after the first was 166. The overall disagreement rate was 26%, so the two groups would have disagreed on the fates of about 43 papers. And because they were tasked with accepting 22% of them – 37 or 38 papers – group 1 could be said to have accepted 21 that group 2 rejected, and group 2 could be said to have accepted 22 that group 1 rejected. The headline figure of 57% lies between 21/37 (≈56.8%) and 22/38 (≈57.9%).
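
For anyone who wants to retrace that arithmetic, here’s the same calculation as a tiny script (it simply restates the footnote’s figures; nothing here is re-derived from the NIPS data itself):

```python
routed = 166                           # papers reviewed by both groups
disagreements = round(0.26 * routed)   # 26% overall disagreement -> 43 papers
accepts_g1, accepts_g2 = 37, 38        # ~22% of 166, depending on rounding

# If the disagreements split nearly evenly between the two directions:
overturned_by_g2 = 21 / accepts_g1     # group 2 rejected 21 of group 1's accepts
overturned_by_g1 = 22 / accepts_g2     # group 1 rejected 22 of group 2's accepts

print(disagreements)                   # 43
print(f"{overturned_by_g2:.1%}")       # 56.8%
print(f"{overturned_by_g1:.1%}")       # 57.9%
```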

Hat-tip: Akshat Rathi.

Some research misconduct trends by the numbers

A study published in eLife on August 14, 2014, looked at data pertaining to papers published between 1992 and 2012 that the Office of Research Integrity had determined involved research misconduct. From the abstract:

Data relating to retracted manuscripts and authors found by the Office of Research Integrity (ORI) to have committed misconduct were reviewed from public databases. Attributable costs of retracted manuscripts, and publication output and funding of researchers found to have committed misconduct were determined. We found that papers retracted due to misconduct accounted for approximately $58 million in direct funding by the NIH between 1992 and 2012, less than 1% of the NIH budget over this period. Each of these articles accounted for a mean of $392,582 in direct costs (SD $423,256). Researchers experienced a median 91.8% decrease in publication output and large declines in funding after censure by the ORI.

While the number of retractions worldwide is on the rise – partly because the numbers of papers being published and of journals are also on the rise – the study addresses only a subset of these papers: those produced by researchers who received funding from the National Institutes of Health (NIH).

[Chart: publication frequency]

Among them, there is no discernible trend in terms of impact factors and attributable losses. In the chart below, the size of each datapoint corresponds to the direct attributable loss and its color, to the impact factor of the journal that published the paper.

[Chart: direct attributable losses (datapoint size) and journal impact factors (color) for the retracted papers]

However, is the time to retraction dropping?

The maximum time to retraction has been on the decline since 1997. However, on average, the time to retraction is still fluctuating, influenced as it is by the number of papers retracted and the nature of misconduct.

[Chart: trend in time to retraction]

No matter the time to retraction or the impact factors of the journals, most scientists experience a significant difference in funding before and after the ORI report comes through, as the chart below shows, sorted by amount of funding. The right axis displays total funding pre-ORI and the left, total funding post-ORI.

[Chart: total funding pre- and post-ORI censure]

As the study’s authors summarize in their abstract: “Researchers experienced a median 91.8% decrease in publication output and large declines in funding after censure by the ORI,” while total funding toward all implicated researchers went from $131 million to $74.5 million.

There could be some correlation between the type of misconduct and the decline in funding, but there’s not enough data to determine that. Nonetheless, there are eight instances in 1992-2012 when the amount of funding increased after the ORI report; the lowest such rise is seen for John Ho, who committed fraud, and the highest for Alan Landay, implicated for plagiarism, a ‘lesser’ charge.

[Chart: instances of increased funding after the ORI report]

From the paper:

The personal consequences for individuals found to have committed research misconduct are considerable. When a researcher is found by the ORI to have committed misconduct, the outcome typically involves a voluntary agreement in which the scientist agrees not to contract with the United States government for a period of time ranging from a few years to, in rare cases, a lifetime. Recent studies of faculty and postdoctoral fellows indicate that research productivity declines after censure by the ORI, sometimes to zero, but that many of those who commit misconduct are able to find new jobs within academia (Redman and Merz, 2008, 2013). Our study has found similar results. Censure by the ORI usually results in a severe decrease in productivity, in many cases causing a permanent cessation of publication. However the exceptions are instructive.

Retraction Watch reported the findings with a special focus on the cost of research misconduct. They spoke to Daniele Fanelli, one part of whose quote is particularly notable – though the rest is no less so.

The question of collateral damage, by which I mean the added costs caused by other research being misled, is controversial. It still has to be conclusively shown, in other words, that much research actually goes wasted directly because of fabricated findings. Waste is everywhere in science, but the role played by frauds in generating it is far from established and is likely to be minor.

References

Stern, A.M., Casadevall, A., Steen, R.G. and Fang, F.C. (2014). Financial costs and personal consequences of research misconduct resulting in retracted publications. eLife 3:e02956, August 14, 2014.

Plagiarism is plagiarism

In a Nature article, Praveen Chaddah argues that a paper containing textual plagiarism should only carry a correction and not be retracted, because retraction makes the useful ideas and results in the paper unavailable. On the face of it, this is an argument that draws a distinction between the writing of a paper and the production of its technical contents.

Chaddah proposes to preserve the distinction for the benefit of science by punishing plagiarists only for what they plagiarized. If they pinched text, then issue a correction and an apology but let the results stay. If they pinched the hypothesis or results, then retract the paper. He thinks this line of thought is justifiable because it does not retard the introduction of new ideas into the pool of knowledge, and because it does not harm the notion of “research as a creative enterprise” as long as the hypothesis, method and/or results are original.

I disagree. Textual plagiarism is also the violation of an important creative enterprise that, in fact, has become increasingly relevant to science today: communication. Scientists have to use communication effectively to convince people that their research deserves tax-money. Scientists have to use communication effectively to make their jargon understandable to others. Plagiarizing the ‘descriptive’ part of papers, in this context, is to disregard the importance of communication, and copying the communicative bits should be tantamount to copying the results, too.

He goes on to argue that if textual plagiarism has been detected but the hypothesis/results are original, the latter must be allowed to stand. His argument appears to assume that scientific journals are the same as specialist forums that prioritize results over the full package: introduction, formulation, description, results, discussion, conclusion, etc. Scientific journals are not just the “guarantors of the citizen’s trust in science” (The Guardian) but also resources that people like journalists, analysts and policy-makers use to understand the extent of that guarantee.

What journalist doesn’t appreciate a scientist who’s able to articulate his/her research well? And who would begrudge him/her the publicity it brings?

In September 2013, the journal PLoS ONE retracted a paper by a group of Indian authors for textual plagiarism. The incident exemplified a disturbing attitude toward plagiarism. One of the authors of the paper, Ram Dhaked, complained that it was the duty of PLoS ONE to detect their plagiarism before publishing it, glibly abdicating responsibility.

As Chaddah argues, the authors of a paper could be plagiarizing text for a variety of reasons – but somehow they believe lifting chunks of text from other papers during the paper-production process is allowable or will go unchecked. As an alternative to this, publishers could consider – or might already be considering – the ethics of ghost-writing.

He finally posits that papers with plagiarized text should be made available along with the correction, too. That would increase the visibility of the offense and, over time, presumably shame scientists into not plagiarizing – but that’s not the point. The point is to get scientists to understand why it is important to think about what they’ve done and communicate their thoughts. That journals retract both the text and the results even if only the text was plagiarized is an important way to reinforce that point. If anything, Chaddah could instead have argued for reducing the implications of having a retraction against one’s bio.

R&D in China and India

“A great deal of the debate over globalization of knowledge economies has focused on China and India. One reason has been their rapid, sustained economic growth. The Chinese economy has averaged a growth rate of 9-10 percent for nearly two decades, and now ranks among the world’s largest economies. India, too, has grown steadily. After years of plodding along at an average annual increase in its gross domestic product (GDP) of 3.5 percent, India has expanded by 6 percent per annum since 1980, and more than 7 percent since 1994 (Wilson and Purushothaman, 2003). Both countries are expected to maintain their dynamism, at least for the near future.”

– Gereffi et al, ‘Getting the Numbers Right: International Engineering Education in the United States, China and India’, Journal of Engineering Education, January 2008

A June 16 paper in Proceedings of the National Academy of Sciences, titled ‘China’s Rise as a Major Contributor to Science and Technology’, analyses the academic and research environment in China over the last decade or so, and discusses the factors involved in the country’s increasing fecundity in recent years. It concludes that four factors have played an important role in this process:

  1. Large human capital base
  2. A labor market favoring academic meritocracy
  3. A large diaspora of Chinese-origin scientists
  4. A centralized government willing to invest in science

A simple metric they cite to make their point is the publication trend by country. Between 2000 and 2010, for example, the number of science and engineering papers published by China increased by 470%. The next highest climb was for India, at 234%.

[Interactive chart: growth in science and engineering papers by country, 2000-2010]

“The cheaters don’t have to worry they will someday be caught and punished.”

This is a quantitative result. A common criticism of the rising volume of Chinese scientific literature in the last three decades is the quality of research coming out of it. Dramatic increases in research output are often accompanied by a publish-or-perish mindset that fosters a desperation among scientists to get published, leading to padded CVs, falsified data and plagiarism. Moreover, it’s plausible that since R&D funding in China is still controlled by a highly centralized government, flow of money is restricted and access to it is highly competitive. And when it is government officials that are evaluating science, quantitative results are favored over qualitative ones, reliance on misleading performance metrics increases, and funds are often awarded for areas of research that favor political agendas.

For this, the PNAS paper cites the work of Shi-min Fang, a science writer who won the inaugural John Maddox Prize in 2012 for exposing scientific fraud in Chinese research circles. In an interview with New Scientist in November of that year, he explained the source of the widespread misconduct:

It is the result of interactions between totalitarianism, the lack of freedom of speech, press and academic research, extreme capitalism that tries to commercialise everything including science and education, traditional culture, the lack of scientific spirit, the culture of saving face and so on. It’s also because there is not a credible official channel to report, investigate and punish academic misconduct. The cheaters don’t have to worry they will someday be caught and punished.

At this point, it’s tempting to draw parallels with India. While China has seen increased funding for R&D…

[Interactive chart: R&D funding in China]

… India has been less fortunate.

[Interactive chart: R&D funding in India]

The issue of funding is slightly different in India, in fact. While Chinese science is obstinately centralized and publicly funded, Indian science is centralized in some parts and decentralized in others; public funding is not high enough, presumably because we lack a meritocratic academic environment, and private funding is not as high as it needs to be.

[Interactive chart]

Even though the PNAS paper’s authors say their breakdown of what has driven scientific output from China could inspire changes in other countries, India faces different issues, as the charts above show. Indeed, the very first chart shows how, despite the number of published papers having doubled in the last decade, we have only jumped from one small number to another small number.

“Scientific research in India has become the handmaiden of defense technology.”

There is also a definite lack of visibility: little scientific output of any kind is accessible to 1) the common man, and 2) the world outside. Apart from minimal media coverage, there is a paucity of scientific journals – or they exist but are not well known, accessible or both. This Jamia Millia collection lists a paltry 226 journals – including those in regional languages – but it’s likelier that there are hundreds more, both credible and dubious. A journal serves as an aggregation of reliable scientific knowledge not just for scientists but also for journalists and other decision-makers who rely on it. It is one place to find the latest developments.

In this context, Current Science appears to be the most favored journal in the country, not to mention the loneliest. Then again, a couple of fingers can be pointed at years of reliance on quantitative performance metrics, which drive many Indian researchers to publish in journals with very high impact factors, such as Nature or Science, which are often based outside the country.

In the absence of lists of Indian and Chinese journals, let’s turn to a table used in the PNAS paper showing the average number of citations per article as a percentage of the US figure. It shows both India and China close to 40% in 2010-2011.

The poor showing may not be a direct consequence of low quality. For example, a paper may detail research conducted to resolve a niche issue in Indian defense technology. In such a case, the quality of the article may be high but the citability of the research itself will be low. Don’t be surprised if this is common in India, given our devotion to the space and nuclear sciences. And perhaps this is what a friend of mine referred to when he said, “Scientific research in India has become the handmaiden of defense technology”.

To sum up, although India and China both lag the USA and the EU in the productivity and value of research (at least by quantitative metrics), China is facing problems associated with the maturity of a voluminous scientific workforce, whereas India is quite far from that maturity. The PNAS paper is available here. If you’re interested in an analysis of engineering education in the two countries, see this paper (from which the opening lines of this post were borrowed).

Replication studies, ceiling effects, and the psychology of science

On May 25, I came across a tweet by Erika Salomon linking to a discussion on the SPSP blog.

The story started when the journal Social Psychology decided to publish successful and failed replication attempts, instead of conventional papers, in a Replications Special Issue (Volume 45, Number 3, 2014). It accepted proposals from scientists stating which studies they wanted to try to replicate, and registered the accepted ones. This way, the issue’s editors, Brian Nosek and Daniel Lakens, could ensure that a study was published no matter the outcome – successful or not.

All the replication studies were direct replications, which means they used the same experimental procedure and statistical methods as the original studies to analyze the data. Before a replication attempt began, the original data, procedure and analysis methods were scrutinized, and the data was shared with the replicating group. Moreover, an author of the original paper was invited to review the respective proposal and have a say in whether it could be accepted. All of this happened before the studies themselves.

Finally, the replication studies were performed and their results published.


The consequences of failing to replicate a study

Now comes the problem: What if the second group failed to replicate the findings of the first group? There are different ways of looking at this from here on out. The first person such a negative outcome affects is the original study’s author, whose reputation is at stake. Given the gravity of the situation, is the original author allowed to ask for a replication of the replication?

Second, during the replication study itself (and given the eventual negative outcome), how much of a role is the original author allowed to play when performing the experiment, analyzing the results and interpreting them? This could swing both ways. If the original author is allowed to be fully involved during the analysis process, there will be a conflict of interest. If the original author is not allowed to participate in the analysis, the replicating group could get biased toward a negative outcome for various reasons.

Simone Schnall, a psychology researcher from Cambridge, writes on the SPSP blog (linked to in the tweet above) that, as an author of a paper whose results have been unsuccessfully replicated and reported in the Special Issue, she feels “like a criminal suspect who has no right to a defense and there is no way to win: The accusations that come with a “failed” replication can do great damage to my reputation, but if I challenge the findings I come across as a “sore loser.””

People on both sides of this issue recognize the importance of replication studies; there’s no debate there. But these issues call into question how replication studies are designed, reviewed and published – they need a support structure just as firm as the one behind original research, or they run the risk of becoming personalized. Forget who replicates the replicators; it could just as well become who bullies the bullies. And in the absence of such rules, replication studies are being actively disincentivized. Simone Schnall acceded to a request to replicate her study, but the fallout could set a bad example.

In her commentary, Schnall links to a short essay by Princeton University psychologist Daniel Kahneman titled ‘A New Etiquette for Replication‘. In the piece, Kahneman writes, “… tension is inevitable when the replicator does not believe the original findings and intends to show that a reported effect does not exist. The relationship between replicator and author is then, at best, politely adversarial. The relationship is also radically asymmetric: the replicator is in the offense, the author plays defense.”

In this blog post by one of the replicators, the phrase “epic fail” is an example of how things could be personalized. Note: the author of the post has struck out the words and apologized.

To eliminate these issues, the replicators could be asked to keep things specific. Various stakeholders have suggested ways to do this. For one, replicators should address the questions and answers raised in the original study rather than the author and her/his credentials. Another way is to regularly publish reports of replication results, as part of the scientific literature, instead of devoting a special issue to them.

This is one concern that Schnall raises in her answers (in response to question #13): “I doubt anybody would have widely shared the news had the replication been considered “successful.”” So there’s a bias that needs addressing: are journals likelier to publish replication studies that fail to replicate previous results? Erasing this bias requires publishers to actively incentivize replication studies.

A paper published in Perspectives on Psychological Science in 2012 paints a slightly different picture. It looks at the number of replication studies published in the field and pegs the replication rate at 1.07%. Despite the low rate, one of the paper’s conclusions was that among all published replication studies, most of them reported successful, not unsuccessful, replications. It also notes that since 2000, among all replication studies published, the fraction reporting successful outcomes stands at 69.4%, and that reporting unsuccessful outcomes at 11.8%.

[Chart: replication rates reported in the 2012 Perspectives on Psychological Science paper]

At the same time, Nosek and Lakens concede in this editorial that, “In the present scientific culture, novel and positive results are considered more publishable than replications and negative results.”


The ceiling effect

Schnall does raise many questions about the replication, including alleging the presence of a ceiling effect. As she describes it (in response to question #8):

“Imagine two people are speaking into a microphone and you can clearly understand and distinguish their voices. Now you crank up the volume to the maximum. All you hear is this high-pitched sound (“eeeeee”) and you can no longer tell whether the two people are saying the same thing or something different. Thus, in the presence of such a ceiling effect it would seem that both speakers were saying the same thing, namely “eeeeee”.

The same thing applies to the ceiling effect in the replication studies. Once a majority of the participants are giving extreme scores, all differences between two conditions are abolished. Thus, a ceiling effect means that all predicted differences will be wiped out: It will look like there is no difference between the two people (or the two experimental conditions).”

She states this as an important reason to get the replicators’ results replicated.
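
To see in numbers what Schnall’s microphone analogy describes, here’s a minimal simulation sketch with invented values – a 9-point response scale and a one-point true difference between conditions; none of it comes from the actual studies:

```python
import random, statistics

random.seed(0)

def ratings(true_mean, n=500, scale_max=9):
    """Simulate responses on a 1-9 scale: 'true' judgments rounded and clipped."""
    return [min(scale_max, max(1, round(random.gauss(true_mean, 1.5)))) for _ in range(n)]

# The two conditions genuinely differ by one point on the underlying judgment.
mid = statistics.mean(ratings(6.0)) - statistics.mean(ratings(5.0))
print(f"Mid-scale difference: {mid:.2f}")        # roughly 1.0 -- visible

# Push both conditions past the top of the scale: responses pile up at 9
# and the same underlying difference is largely wiped out.
high = statistics.mean(ratings(11.5)) - statistics.mean(ratings(10.5))
print(f"At-ceiling difference: {high:.2f}")      # close to 0
```

In this toy setup the predicted difference survives when responses sit in the middle of the scale but largely vanishes once most responses hit the maximum – which is the worry Schnall raises about the replication data.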


My opinions

// Because Schnall thinks the presence of a ceiling effect is a reason to have the replicators’ results replicated, it follows that there could be a problem with the method used to evaluate the authors’ hypothesis. Both the original and the replication studies used the same method, and the emergence of an effect in one of them but not the other implies that the “fault”, if any, could lie with the replicator – for improperly performing the experiment – or with the original author – for choosing an inadequate set-up to verify the hypothesis. Therefore, one thing that Schnall felt strongly about, the scrutiny of her methods, should also have been formally outlined, i.e. a replication study is not just about the replication of results but about the replication of methods as well.

// Because both papers have passed scrutiny and have been judged worthy of publication, it makes sense to treat them as individual studies in their own right instead of one being a follow-up to the other (even though technically that’s what they are), and to consider both together instead of selecting one over the other – especially in terms of the method. This sort of debate gives room for Simone Schnall to publish an official commentary in response to the replication effort and makes the process inclusive. In some sense, I think this is also the sort of debate that Ivan Oransky and Adam Marcus think scientific publishing should engender.

// Daniel Lakens explains in a comment on the SPSP blog that the introduction, method, and analysis plan were peer-reviewed by the original authors and not an independent group of experts. This was termed “pre-data peer review”: a review of the methods and not the numbers. It is unclear to what extent this was sufficient, because it is only with a scrutiny of the numbers that any ceiling effect becomes apparent. While post-publication peer-review can check for this, it’s not formalized (at least in this case) and does little to mitigate Schnall’s situation.

// Schnall’s paper was peer-reviewed. The replicators’ paper was peer-reviewed by Schnall et al. Even if both passed the same level of scrutiny, they didn’t pass the same type of it. On this basis, there might be reason for Schnall to be involved with the replication study. Ideally, however, the replication would have been better formulated, with normal peer-review, so as to eliminate the need for Schnall’s involvement. Apart from the conflict of interest that could arise, a replication study needs to be fully independent to be credible, just as the peer-review process is trusted because it is independent. So while it is commendable that Schnall shared all the details of her study, it should have been made possible for her participation to end there.

// While I’ve disagreed with Kahneman over the previous point, I do agree with point #3 in his essay that describes the new etiquette: “The replicator is not obliged to accept the author’s suggestions [about the replicators’ M.O.], but is required to provide a full description of the final plan. The reasons for rejecting any of the author’s suggestions must be explained in detail.” [Emphasis mine]

I’m still learning about this fascinating topic, so if I’ve made mistakes in interpretations, please point them out.


Featured image: shutterstock/(c)Sunny Forest