Why India’s rabies problem is especially bad

India bears the world’s heaviest rabies burden, according to a new study from the Global Alliance for Rabies Control, accounting for 35% of all deaths due to the disease. Here’s why you shouldn’t be surprised (data from GARC).

1. Vaccination coverage of dogs

Vaccination coverage of dogs in BRICS nations.

Among the BRICS nations, India has the highest population of dogs and one of the lowest rates of vaccination.

2. Chances of receiving care

Chances of receiving prophylactic care after a rabid animal bite, in BRICS countries.

If you are bitten by an animal in India, there is a 54% chance that the animal is rabid; in China, the chance is 55%. But of every thousand people bitten by rabid animals, 24 don’t receive prophylactic care in India, while only 4 don’t receive it in China.
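
To put the two sets of figures side by side, here is a minimal Python sketch of the arithmetic. It assumes, purely for illustration, that the chance of a bite being rabid and the chance of going without care can be multiplied together; the function name is mine, not GARC's.

```python
# Figures quoted in the paragraph above (GARC data as reported in this post).
P_RABID = {"India": 0.54, "China": 0.55}        # chance a biting animal is rabid
NO_CARE_PER_1000 = {"India": 24, "China": 4}    # rabid-bite victims left without care

def untreated_rabid_bite_share(country: str) -> float:
    """Share of all bites that are rabid AND go without prophylactic care.

    Illustrative assumption: the two figures are independent and can be multiplied.
    """
    return P_RABID[country] * NO_CARE_PER_1000[country] / 1000

india = untreated_rabid_bite_share("India")
china = untreated_rabid_bite_share("China")
ratio = india / china

print(f"India: {india:.4%} of bites, China: {china:.4%} of bites, ratio: {ratio:.1f}x")
```

Under that assumption, a bite in India is roughly six times as likely as a bite in China to be both rabid and left untreated, even though the animals are about equally likely to be rabid.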

3. Access to post-exposure care

Years of life lost due to rabies, in BRICS countries.

Despite China being more populous than India and having a greater bite-incidence (1,107 vs. 691, per 100,000 people), the number of years of life lost due to rabies is higher in India. The GARC report uses multiple studies to come up with different estimates of that number, but India’s lower limit is comfortably higher than other BRICS countries’ upper limits. This is about there being more people in India exposed to dog-bites – as well as about the physical access to, the quality of and the affordability of care.

The result…

Types of losses incurred due to the burden of rabies, in BRICS countries.

From Orwell to Kafka, Markov to Doctorow: Understanding Big Data through metaphors

Big Data... right? Credit: DARPA

On March 20, I attended a short talk by Malavika Jayaram, a fellow at the Berkman Center for Internet & Society, titled ‘What we talk about when we talk about Big Data’ at the T.A.J. Residency in Bengaluru. It was something of an initiation into the social and political contexts of Big Data and its usage, and the important ethical conundrums assailing these contexts.

Even if it was a little slow during the first 15 minutes, Jayaram’s talk picked up pace later on as she piled criticism after criticism upon the concept’s foundation, which was steadily revealed to be immature. Perhaps those familiar with Jayaram’s past research did (or didn’t) find the contents of her talk to contain more nuances than she’s let on before, but to me it revealed an array of perspectives I’ve remained woefully ignorant of.

The first in line was about the metaphors used to describe Big Data – and how our use of metaphors at all betrays our inability to comprehend Big Data in its entirety. Jayaram quoted at length but loosely from an essay by Sara M. Watson, her colleague at Berkman, titled Data is the new “____”. It describes how the dominant metaphors are industrial, dealing with the data itself as if it were a natural resource and the process of analyzing it as if it were being mined or refined.

Data as a natural resource suggests that it has great value to be mined and refined but that it must be handled by experts and large-scale industrial processes. Data as a byproduct describes the transactional traces of digital interactions but suggests it is also wasteful, pollutive, and may not be meaningful without processing. Data has also been described as a fungible resource, as an asset class, suggesting that it can be traded, stored, and protected in a data vault. One programmatic advertising professional related to me that he thinks “data is the steel of the digital economy,” an image that avoids the negative connotations of oil while at the same time expressing concern about monopolizing forces of firms Google and Facebook.

Not Orwellian but Kafkaesque

There are two casualties of this perspective. The first is that the people behind the data – those whose features, actions, choices, etc. have become numbers – are forgotten even as the data they have given “birth” to becomes more important and valuable. The second casualty is the constant reminder that data is valuable, and large amounts of data more so, condemning it to a life where it can’t hope to be stagnant for long.

The dehumanization of Big Data, according to Jayaram, extends beyond analysts forgetting that the data belongs to faces and names, to the restriction of personal ownership. The people the data represents often don’t have access to it. This implies an existential anxiety quite unlike the one found in George Orwell’s 1984 and more like the one in Franz Kafka’s The Trial. In Jayaram’s words,

You are in prison awaiting your trial. Suddenly you find out the trial has been postponed and you have no idea why or how. There seem to be people who know things that you never will. You don’t know what you can do to encourage their decisions to keep the trial permanently postponed. You don’t know what it was about you and you have no way of changing your behavior accordingly.

In 2013, American attorney John Whitehead popularized this comparison in an article titled Kafka’s America. Whitehead argues that the sentiments of Josef K., the protagonist of The Trial, are increasingly becoming the sentiments of a common American.

Josef K’s plight, one of bureaucratic lunacy and an inability to discover the identity of his accusers, is increasingly an American reality. We now live in a society in which a person can be accused of any number of crimes without knowing what exactly he has done. He might be apprehended in the middle of the night by a roving band of SWAT police. He might find himself on a no-fly list, unable to travel for reasons undisclosed. He might have his phones or internet tapped based upon a secret order handed down by a secret court, with no recourse to discover why he was targeted. Indeed, this is Kafka’s nightmare, and it is slowly becoming America’s reality.

Kafka-biographer Reiner Stach summed up these activities as well as the steadily unraveling realism of Kafka’s book as proof of “the extent to which power relies on the complicity of its victims” – and the ‘evil’ mechanism used to achieve this state is a concern that Jayaram places among the prime contemporary problems threatening civil liberties.

If your hard drive’s not in space…

There is an added complication. If the use of Big Data were predominantly suspect, it would have been easier to build consensus against its abuse. However, that isn’t the case: Big Data is more often than not used in ways that don’t harm our personal liberties, and the misfortune is that the collective beneficence of these uses has as yet been no match for the collective harm some of its misuses have achieved. Could this be because the potential for its misuse is almost everywhere?

Yes. An often overlooked facet of using Big Data is the idea that the responsible use of Big Data is not a black-and-white deal. Facebook is not all evil and academic ethnographers are not all benign. Zuckerberg’s social network may collect and store large amounts of information that it nefariously trades with advertisers – and may even comply with the NSA’s “requests” – but there is a systematicity, an orderliness, with which the data is being passed around. The complex’s existence alone presents a problem, no doubt, but that there is a complex at all makes it easier to attempt to fix the problem than if the orderliness were absent.

And this orderliness is often absent among academicians, scholars, journalists, etc., who may not think data is a dollar note but at the same time are processing prodigious amounts of it without being as careful as is necessary about how they are logging, storing and sharing it. Jayaram rightly believes that even if information is collected for benevolent purposes, the moment it becomes data it loses its memory and stays on the Internet as data; that if we are to be responsible data-scientists, being benevolent alone will be inadequate.

To drive the point home, she recalled a comment someone had made to her during a data workshop.

The Utopian way to secure data is to shoot your hard drive into space.

Every other recourse will only fall short.

Consent is not enough

This memoryless, Markovian character of the data-economy demands a redefinition of consent as well. The question “What is consent?” is dependent on what a person is consenting to. However, almost nobody knows how the data will be used, what for, or over what time-frames. Like a variable flowing through different parts of a computer, data can pass through a variety of contexts to each of which it provides value of varying quality. So, the same question of contextual integrity should retrospectively apply to the process of consent-giving as well: What are we consenting to when we’re consenting to something?

And when both the party asking for consent and the party asked for consent can’t know all the ways in which the data will be used, the typical way-out has been to seek consent that protects one against harm – either by ensuring that one’s civil liberties are safeguarded or by explicitly prohibiting choices that will impinge upon, again, one’s civil liberties. This has also been increasingly done in a one-size-fits-all manner that the average citizen doesn’t have the bargaining power to modify.

However, it’s become obvious by now that just protecting these liberties isn’t enough to ensure that data and consent are both promised a contextual integrity.

Why not? Because the statutes that enshrine many of these liberties are yet to be refashioned for the Internet age. In India, at least, the six fundamental rights are to equality, to freedom, against exploitation, to freedom of religion, cultural and educational rights, and to constitutional remedies. Between them, the promise of protecting against the misuse of not one’s person but one’s data is tenuous (although a recent document from the Telecom Regulatory Authority of India could soon fix this).

The Little Brothers

Anyway, an immediate consequence of this typical way-out has been that one needs to be harmed to get remedy, at a time when it remains difficult to define when one’s privacy has been harmed. And since privacy has been an enabler of human rights, even unobtrusive acts of tagging and monitoring that don’t violate the law can force compliance among the people. This is what hacker Andrew Huang talks about in his afterword to Cory Doctorow’s novel Little Brother (2008),

[In] January 2007, … Boston police found suspected explosive devices and shut down the city for a day. These devices turned out to be nothing more than circuit boards with flashing LEDs, promoting a show for the Cartoon Network. The artists who placed this urban graffiti were taken in as suspected terrorists and ultimately charged with felony; the network producers had to shell out a $2 million settlement, and the head of the Cartoon Network resigned over the fallout.

Huang’s example further weakens the Big Brother metaphor by implicating not one malevolent central authority but an epidemic, Kafkaesque paranoia that has “empowered” a multitude of Little Brothers all convinced that God is only in the detail.

While Watson’s essay (Data is the new “____”) is explicit about the power of metaphors to shape public thought, Doctorow’s book and Huang’s afterword take the next logical step in that direction and highlight the clear and present danger for what it is.

It’s not the abuse of power by one head of state but the evolution of statewide machines that (exhibit the potential to) exploit the unpreparedness of the times to coerce and compel, using as their fuel the mountainous entity – sometimes so gargantuan as to be formless, and sometimes equally absurd – called Big Data (I exaggerate – Jayaram was more measured in her assessments – but not much).

And even if Whitehead and Stach only draw parallels between The Trial and American society, the relevant, singular “flaw” of that society exists elsewhere in the world, too: the more we surveil others, the more we’ll be surveilled ourselves, and the longer we choose to stay ignorant of what’s happening to our data, the more our complicity in its misuse. It is a bitter pill to swallow.

Featured image credit: DARPA


Oxygen may be a carcinogen

In inordinate amounts or forms, anything can be poison to life – even the air we breathe. But its threat seems more ominous when you think that even in small quantities, accumulated over time, the oxygen in the air can cause cancer. Two American scientists, Kamen Simeonov and Daniel Himmelstein, have concluded exactly that after analyzing cancer-incidence data compiled between 2005 and 2009 among people populating counties along the US’s west coast. Their calculation doesn’t show a dramatic drop in incidence with altitude yet the statistical methods used to refine the results suggest the relationship is definitely there: oxygen contributes to the growth of cancerous tumors. As they write in their paper,

“As a predictor of lung cancer incidence, elevation was second only to smoking prevalence in terms of significance and effect size.

A relative-importance test in R with the data, available on Himmelstein’s GitHub, attests to this (regression indices: LMG, Pratt, first and last). Additionally,

the lung cancer association was robust to varying regression models, county stratification, and population subgrouping; additionally seven environmental correlates of elevation, such as exposure to sunlight and fine particulate matter, could not capture the association.”

Simeonov and Himmelstein found that with every 1,000 m rise in elevation, lung cancer incidence decreased by 7.23 cases per 100,000 individuals (interval: 5.18-9.29), which is fully 12.7% of the mean incidence (56.8 per 100,000 individuals). Overall, the duo attributes a decrease of roughly 25 lung cancer cases per 100,000 individuals to the “range of elevation of counties of the Western United States”. In other words,

Were the entire United States situated at the elevation of San Juan County, CO (3,473 m), we estimate 65,496 [46,855–84,136] fewer new lung cancer cases would arise per year.
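
The "12.7% of the mean incidence" figure can be checked with a few lines of Python. This is only a sketch of the arithmetic: it reads 7.23 (and the 5.18-9.29 interval) as cases per 100,000 individuals per 1,000 m of elevation, which is the reading under which the quoted percentage works out. The variable names are mine.

```python
# Numbers quoted in the text above; units are cases per 100,000 individuals.
DROP_PER_KM = 7.23            # decrease per 1,000 m rise in elevation
DROP_INTERVAL = (5.18, 9.29)  # interval quoted alongside the estimate
MEAN_INCIDENCE = 56.8         # mean lung cancer incidence

def share_of_mean(drop: float, mean: float = MEAN_INCIDENCE) -> float:
    """Express an absolute drop in incidence as a percentage of the mean incidence."""
    return drop / mean * 100

central = share_of_mean(DROP_PER_KM)
low, high = (share_of_mean(d) for d in DROP_INTERVAL)

print(f"{central:.1f}% of mean incidence (interval: {low:.1f}% to {high:.1f}%)")
```
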

Their paper was published in the open-access journal PeerJ on January 13, 2015. The validity of the result lies in the strength of the statistical analysis backing it. Cancers are caused by a variety of agents. Respiratory cancers, in turn, are often the result of exposure to certain heavy metals, fine particulate matter, radiation, inhalation of toxic substances and genetic predisposition. To say oxygen could be one such toxic substance requires the claimants to show its significance relative to other known carcinogens and its covariance with the incidence of cancer. Only statistics enables this.

First, the data shows that the incidence of cancer dropped with increasing altitude.

My plot from data. The grey band represents the confidence interval. Lung cancer incidence in per 100,000 individuals, elevation in 1,000s of meters.

Next, it shows that the drop in incidence couldn’t be explained by the other variables considered, only by the elevation. (‘Pearson’ is the Pearson correlation coefficient: the closer its absolute value is to 1, the stronger the correlation.)

“Predictors displayed expected correlations such as a strong positive correlation between obesity and diabetes. Collinearity was moderate but pervasive. Elevation covaried with most variables including cancers indicating the need to adjust for covariates while carefully considering collinearity.” Credit: http://dx.doi.org/10.7717/peerj.705
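
For readers who haven't met the Pearson coefficient before, here is a minimal Python sketch of how it is computed. The county numbers below are invented for illustration and are not from the paper; the function is a plain textbook implementation, not the authors' code.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical counties (made-up values): elevation in 1,000s of meters and
# lung cancer incidence per 100,000 individuals.
elevation_km = [0.1, 0.5, 1.0, 1.5, 2.0, 2.5]
incidence = [62.0, 60.5, 55.0, 52.5, 49.0, 47.5]

r = pearson(elevation_km, incidence)
print(f"Pearson r = {r:.3f}")
```

A value near -1 means incidence falls almost in lockstep with elevation, a value near 0 means no linear relationship, and a value near +1 means the two rise together. On its own the coefficient says nothing about causation, which is why the authors also adjust for covariates.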

To corroborate their results, the authors were also able to show that their statistical models were able to point out known risks – such as variation of incidence with smoking and exposure to radon. On the other hand, unlike smoking, exposure to radon also varies with altitude. The paper however does not clarify how it eliminates the resulting confounding fully.

Alternatively, Van Pelt (2003) attributed “some, but not all” of the Cohen (1995) radon association to elevation. Follow-up correspondences by each author revolved around the difficulty in assigning the effect wholly to elevation or radon when both of these highly-correlated predictors remained significant (Cohen, 2004; Van Pelt, 2004). We believe that our data quality improvements, including county-specific smoking prevalences and population-weighted elevations, were responsible for wholly attributing the effect to elevation.

In fact, this admission betrays the study’s ultimate problem (and that of others like it): a profusion of influences on the final results. Cancer – lung or another – can be caused by so many things. To assess its incidence in terms of a few variables – such as elevation, smoking and sunlight – could only be for the sake of convenience. Because, beyond a point, to think cancer could be the result of just one or two factors is to be foolishly reductionist. At the same time, this issue is typical of so many statistical investigations that it would be more productive to consider Simeonov’s and Himmelstein’s find as a springboard off which to launch more studies than to think it the final word on anything. They endorse the same thing with their final admission, that their study is still a victim of the ‘ecological fallacy’ – when studies of groups are thought to be equivalent to studies of individuals but are really not so. As this essay states,

Serious errors can result when an investigator makes the seemingly natural assumption that the inferences from an ecological analysis must pertain either to the individuals within the groups or to individuals across groups. A frequently cited early example of an ecological inference was Durkheim’s study of the correlation between suicide rates and religious denominations in Prussia in which the suicide rate was observed to be correlated with the number of Protestants. However, it could as well have been the Catholics who were committing suicide in largely Protestant provinces.
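
The Durkheim example can be made concrete with a toy calculation in Python. Every number below is invented for illustration (none comes from Durkheim or from the essay); the sketch only shows how a group-level correlation can point one way while the individual-level picture points the other.

```python
# Hypothetical provinces: share of Protestants, and suicide rates per 100,000
# within each denomination. All values are made up for illustration.
PROTESTANT_RATE = 10.0  # assumed identical in every province

provinces = [
    {"name": "A", "protestant_share": 0.3, "catholic_rate": 20.0},
    {"name": "B", "protestant_share": 0.6, "catholic_rate": 50.0},
    {"name": "C", "protestant_share": 0.9, "catholic_rate": 300.0},
]

def province_rate(p):
    """Overall suicide rate of a province: a population-weighted average."""
    share = p["protestant_share"]
    return share * PROTESTANT_RATE + (1 - share) * p["catholic_rate"]

rates = [province_rate(p) for p in provinces]

for p, rate in zip(provinces, rates):
    print(f"Province {p['name']}: {p['protestant_share']:.0%} Protestant, "
          f"overall rate {rate:.1f} per 100,000")
```

At the province level, the overall rate climbs as the Protestant share climbs, so an ecological analysis would link suicide to Protestantism. Yet in every one of these made-up provinces it is the Catholics who have the higher rate. The same caution applies to reading county-level cancer data as a statement about individuals.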

Counting warheads

Two researchers associated with the Bulletin of the Atomic Scientists have published their research on the number of nuclear warheads possessed by countries worldwide, together with data on where they have been deployed based on various sources. The best part is their paper is available for free, and from the looks of it many of the sources the authors draw on to discuss nuclear proliferation seem to be publicly available, too. I plotted the salient numbers here. For the full paper, go here.


Some research misconduct trends by the numbers

A study published in eLife on August 14, 2014, looked at data pertaining to some papers published between 1992 and 2012 that the Office of Research Integrity had determined contained research misconduct. From the abstract:

Data relating to retracted manuscripts and authors found by the Office of Research Integrity (ORI) to have committed misconduct were reviewed from public databases. Attributable costs of retracted manuscripts, and publication output and funding of researchers found to have committed misconduct were determined. We found that papers retracted due to misconduct accounted for approximately $58 million in direct funding by the NIH between 1992 and 2012, less than 1% of the NIH budget over this period. Each of these articles accounted for a mean of $392,582 in direct costs (SD $423,256). Researchers experienced a median 91.8% decrease in publication output and large declines in funding after censure by the ORI.

While the number of retractions worldwide is on the rise – also because the numbers of papers being published and of journals are on the rise – the study addresses a subset of these papers and only those drawn up by researchers who received funding from the National Institutes of Health (NIH).


Among them, there is no discernible trend in terms of impact factors and attributable losses. In the chart below, the size of each datapoint corresponds to the direct attributable loss and its color, to the impact factor of the journal that published the paper.


However, is the time to retraction dropping?

The maximum time to retraction has been on the decline since 1997. However, on average, the time to retraction is still fluctuating, influenced as it is by the number of papers retracted and the nature of misconduct.


No matter the time to retraction or the impact factors of the journals, most scientists experience a significant difference in funding before and after the ORI report comes through, as the chart below shows, sorted by quanta of funds. The right axis displays total funding pre-ORI and the left, total funding post-ORI.


As the study’s authors summarize in their abstract: “Researchers experienced a median 91.8% decrease in publication output and large declines in funding after censure by the ORI,” while total funding toward all implicated researchers went from $131 million to $74.5 million.
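
As a rough sense of scale, the figures quoted from the study can be turned into two derived numbers with a short Python sketch. Both are my own back-of-the-envelope arithmetic on the quoted values, not results reported by the authors, and the implied article count rests on the "approximately $58 million" total being exact enough to divide.

```python
# Figures quoted from the study (US dollars).
FUNDING_PRE_ORI = 131.0e6       # total funding to implicated researchers before censure
FUNDING_POST_ORI = 74.5e6       # total funding after censure
TOTAL_DIRECT_COST = 58.0e6      # approximate NIH funding behind retracted papers
MEAN_COST_PER_ARTICLE = 392_582

# Relative drop in total funding after the ORI report.
decline_pct = (FUNDING_PRE_ORI - FUNDING_POST_ORI) / FUNDING_PRE_ORI * 100

# Rough number of articles implied by the total and the mean (an approximation).
implied_articles = TOTAL_DIRECT_COST / MEAN_COST_PER_ARTICLE

print(f"Funding decline: {decline_pct:.1f}%")
print(f"Implied number of retracted articles: about {implied_articles:.0f}")
```

Note that the decline in total funding is much smaller than the 91.8% median decline in publication output, partly because a handful of researchers saw their funding rise after the ORI report.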

There could be some correlation between the type of misconduct and decline in funding, but there’s not enough data to determine that. Nonetheless, there are eight instances in 1992-2012 when the amount of funding increased after the ORI report, of which the lowest rise is seen for John Ho, who committed fraud, and the highest for Alan Landay, implicated for plagiarism, a ‘lesser’ charge.

From the paper:

The personal consequences for individuals found to have committed research misconduct are considerable. When a researcher is found by the ORI to have committed misconduct, the outcome typically involves a voluntary agreement in which the scientist agrees not to contract with the United States government for a period of time ranging from a few years to, in rare cases, a lifetime. Recent studies of faculty and postdoctoral fellows indicate that research productivity declines after censure by the ORI, sometimes to zero, but that many of those who commit misconduct are able to find new jobs within academia (Redman and Merz, 2008, 2013). Our study has found similar results. Censure by the ORI usually results in a severe decrease in productivity, in many cases causing a permanent cessation of publication. However the exceptions are instructive.

Retraction Watch reported the findings with especial focus on the cost of research misconduct. They spoke to Daniele Fanelli, one part of whose quote is notable – albeit no less than the rest.

The question of collateral damage, by which I mean the added costs caused by other research being misled, is controversial. It still has to be conclusively shown, in other words, that much research actually goes wasted directly because of fabricated findings. Waste is everywhere in science, but the role played by frauds in generating it is far from established and is likely to be minor.


Stern, A.M., Casadevall, A., Steen, R.G. and Fang, F.C., Financial costs and personal consequences of research misconduct resulting in retracted publications, eLife. August 14, 2014;3:e02956.


Wealth and religiosity disagree while some Hindus look the other way

My extended family’s annual trip to Tirupati is coming up. Because a more indecisive bunch doth not exist, my relatives have been planning the trip for the last week. One creepy fact their discussions threw up is that, in 2013, the temple earned Rs. 220 crore from its sale of human hair. Pilgrims shave their heads at Tirupati as a token offering, and about 40 million people visit it annually. Although not all of them offer their hair, Rs. 220-crore’s worth must be a lot.

According to this PDF detailing the temple’s finances, the biggest chunk of its income comes from cash offerings from devotees, listed as ‘Kanuka’.


(All figures in Rs. crore)

Its other revenue receipts, including hair, are listed as such:


(All figures in Rs. crore)

Many of the world’s richest temples are in India. Some of the richest include the shrine at Shirdi, Maharashtra, for Sai Baba; the Padmanabhaswamy Temple, Thiruvananthapuram, Kerala; the Mahabodhi temple in Bodh Gaya, Bihar; and the Vaishno Devi temple in Jammu and Kashmir. Besides boasting overwhelming attendances, they’re also proof that Hinduism is a very materialistic religion when it comes to offerings despite its abstemious philosophies.

No matter this hypocrisy – the world at large rejects it anyway because religion and wealth share a negative relationship. Specifically, countries with higher GDP have lower religiosity. This document, wherefrom the religiosity numbers were pulled, defines religiosity as simply the fraction of people who identified themselves as religious in a survey.