Category Archives: Big Data

Big Personal Data

Reading this article in a Blog on New York Times today I kept thinking about the what was not being said in it. Data governance.
The assumption, not doubt valid, is that ever larger pools of data will be amassed; and suitable analyzed, solutions will be found to existing problems and fantastic new possibilities will emerge.
I don’t doubt the potential in “Big Data” and look forward to its exploitation with interest. But also with some trepidation. With great power come great scope for abuse. This is as true for technology as it is for people. Necessarily so, since it is people how design and use the technology.
The article mentions car sensor data and its collection. But what is this data and who get’s to use it ?
Going on to mention ever higher transfer speeds allowing larger data quantities to exchanged ever faster.

The term Data Lake is used to describe GE accumulation of sensor data. Siting that it permitted speeding up GE’s error detection by a factor of 2000.

With ever larger quantities gathered and ever faster transfer speeds and distributed analytical tools all the worlds data eventually becomes available in a Data Sea, ready for analysis.

The article argues that this is not just technically possible but also both the trend and desirable. The first two are clearly correct. The problems with different data structures and qualities are surmountable. As for whether or not it is desirable the issues are less clear.
That there are advantages to be had is obvious. It is also probable that the advantages are great enough to justify the expense involved. Indeed it would not happen in the commercial world unless there was money to be made. Governments have other objectives, social control including law enforcement, where the expense is not a primary concern, only technical feasibility.

In engineering technical feasibility is a function of resources expended. And the Big Data field is certainly seeing a great deal of resources expended on it, whatever the motivation. So what is possible in future will certainly expand from what is possible today.

Data governance has become a political issue of late. The relentless surveillance by governments of their citizens is becoming more widely known. Not all governments have the resources of the US or the determination of Chinese. But that is a temporary relief. The capabilities will expand and the costs will decline. In due course everyone can aspire to total surveillance.
At present only the richest and most powerful can access the whole data sea. Others can only access the part of it they gathered themselves. This article implies that limitation will fall away and everything will be accessible to those that can pay. But who gets the money? For now it has been those who accumulate the data that get the money. They incur the expenses; servers, storage and bandwidth cost money. The source of the data gets nothing. Certainly no money but in some cases a service. Social media run on expensive server platforms but the user can use them “for free” in exchange with surrendering all their data and any other information the platform can gather about them. Many consider this fair exchange. Even if this is so, it still in the users interest to control where the data is going, who gets to use it and for what. The stories of social media posts that have circulated outside their intended target audience and caused embarrasment are legion. The columnist Kathleen Parker advocated self-sensorship to a degree inconsistent with civilization.
It is unclear whether she honestly meant what she wrote or had simply disengaged her mental faculties before writing the piece. Bill Mahers response was withering and to the point. We can’t live in a world where the only privacy we have is inside our heads. Privacy matters also for those less exalted among us. Scanning social media before a hire is now common practice. And individuals go not get to decide what they need to “hide” – anything that may cause offense or raise the slightest question can cause later inconvenience or worse. Made even more intractable by ever changing social conventions.
Expect to see much, much more on this subject.

Some gatherers of user data have policies in place and tell the user explicitly that while they gather as much data as possible they do not pass it on to any one else. Or if they do, it is in some lesser, anonymized form.

That data is passed along at all suggest convincingly that the data has commercial value. As interconnections improve and data sales increase it is also clear that the source of this data are being underpaid; the data gatherer can increase their revenue per data item without incurring any additional source costs.

Persons who are the source of this data then have both a financial and a data governance issue (read privacy) at stake.

Social media users voluntarily submit to being guinea pigs to marketers. Their data has a clear commercial value and this is what they use to pay for the service.

In due course platforms will appear that also allow users to pay in other forms than just with their own personal information and data they provide themselves. “Pay small monthly fee and we will not sell your emails to your local grocery chain”.
Pay-for-service is available in social media, particularly in the more specialized subsets, like dating, but it is always either a pay or spy business model. Never a commingling of the two on the same platform. This is in most cases driven by technical considerations. It can be tricky to adequately safeguard the user data that is not available for resale. But see

Internet of Things is the next step. How are we going to handle that data. The Big Data providers have the datastore ready to take on the data and the tools to analyze it. But where are the tools to safeguard the individuals ?