February 27, 2013

When Public Data Is Too Public

by Jay Pinho

In a recent column for the New York Times, media reporter David Carr asked, “Should data have a conscience?” The context for his query was the recent decision by the Journal News, in the wake of the Newtown school massacre, to publish an interactive map displaying the names and addresses of gun permit applicants throughout two New York counties. The act was met with confusion and consternation, especially by gun-rights advocates but even by some less inclined to support the individual interpretation of the Second Amendment. The newspaper eventually scrapped the project.

Such controversies are inevitable, and will only grow in frequency and stakes. The era of Big Data is well upon us, replete with mesmerizing catalogues of information that continue to snake their way into the public domain.

This data deluge has prompted a collective hand-wringing by privacy advocates, who quite understandably fret about the long-term implications of massive data availability. But what is often forgotten in the rush to comprehend this new data landscape is that much of what is considered newly public has actually long been so, often for years or decades. The debate has thus shifted to unfamiliar ground, and the consequence is a panoply of mostly unsatisfying solutions.

Gun permit applications are a prime example of this newfound conception of public data. As Jack Shafer noted, “By its very definition, the public record is not private. Under New York state law, the information the Journal News obtained from Westchester and Rockland county authorities can be obtained by anybody who asks for it.” The only remarkable aspect of the map was that the newspaper organized the available gun permit applications database into an easily navigable online feature. The Journal News thereby bridged the gap between the public data and the public.

This is an important distinction. In a way, the uproar over the Journal News’ online map represents the emergence of adolescence for the Big Data epoch. Early on in the online data era, most of the clamor on both sides concerned the explosion of large digital datasets, and what this development meant for privacy and the overall user experience. It was, in other words, a debate over data presence: the very existence of these databases was seen as cause for concern.

But we have now moved into a sophomore phase, in which presence has ceded the spotlight to accessibility. We have already accepted, with either youthful exuberance (millennials) or grudging resignation (baby boomers), the rapid proliferation of data and, especially, its migration en masse to the Internet. And yet, our apathy does not always extend to the enterprising minds who cull the contents of these large, unwieldy databases to discover innovative and sometimes terrifying uses for them. The expanding accessibility of data now frightens us in much the same way its mere presence once did.

Our reaction, however, quite predictably depends upon whose data is being publicized.

When online commons advocate Aaron Swartz took his own life on January 11, many lamented his passing and demanded the firing of Carmen Ortiz, the aggressive prosecutor who some felt was responsible for pushing the online pioneer over the brink. Swartz — and, to a lesser extent, polarizing groups such as Anonymous and WikiLeaks — has been widely celebrated as a paragon of data transparency and openness.

While hardly universal, the general sentiment expressed by the oft-repeated refrain “Information wants to be free” has been embraced by much of the online public. Seen in this light, the Journal News’ efforts run along similar lines. Indeed, the public nature of the gun permit records arguably tilts the ethical balance in favor of the newspaper, not Swartz.

Of course, the implications of the two scenarios were different. For one, although the JSTOR articles that Swartz was charged with downloading en masse are technically private, many of their authors shared Swartz’ desire to make them public. No personally identifiable information was involuntarily compromised as a result. And perhaps most importantly, no highly divisive cultural wound was reopened as a consequence of the articles’ mass distribution.

But the real lesson may be that virtually no one applies open data standards to himself. “Information wants to be free — except for mine” is anything but catchy. But it is also a far more accurate depiction of the prevailing consensus. Last year, Viviane Reding, the European Commissioner for Justice, Fundamental Rights, and Citizenship, proposed a “right to be forgotten,” by which any citizen of the European Union “shall have the right — and not only the ‘possibility’ — to withdraw their consent to the processing of the personal data they have given out themselves.”

Differences certainly exist between European and American sensitivities to data accessibility. Nevertheless, Reding’s proposal is starkly at odds with the nearly universal ethos of a generation that freely shares its most intimate moments on social networks, blogs, and other online platforms in exchange (unwittingly or otherwise) for a smorgasbord of free content. For all of the whining that accompanies Facebook’s every revision of its privacy policies, we dutifully fall in line soon enough, mounting only impotent protests, whose invariable expression via Facebook has long since passed from intentional irony to inadvertent self-parody.

Reding’s proposal, then, is premised on the dubious assumption that we are still firmly ensconced in the first phase of the Big Data evolution. But we are not: data presence — proliferation, even — is irreversible. The equilibrium of data accessibility, however, may yet be up for grabs.

Any thoughtful resolution to the debate over data accessibility will be necessarily arbitrary. A prohibition on organizing public data into handy features fails to pass the laugh test, but simply ignoring the dangers altogether is just as naïve. (Indeed, Journal News reporters soon found their own home addresses exposed online in retaliation for their efforts.)

Some of the proposed solutions are unworkable. As Jeffrey Rosen warns, “The right to be forgotten could make Facebook and Google, for example, liable for up to two percent of their global income if they fail to remove photos that people post about themselves and later regret, even if the photos have been widely distributed already.”

What does this mean for data accessibility in the future? The evidence is mixed. The muted uproar over Facebook’s policy changes suggests that simple crowd displeasure may not be enough to alter the current trajectory. But recent public behavior suggests the possible existence of a minimum privacy threshold, as mass disapproval of the privacy policies of Facebook’s new corporate acquisition, Instagram, spawned a hasty retreat to a more palatable settlement.

The inherent tension between open data and privacy concerns must not become a deterrent to finding an acceptable compromise. The past several years have seen virtually no meaningful debate over the rapidly growing presence of Big Data. And unless we act quickly, the brief window of opportunity to shape the contours of data accessibility may close as well.
