Author Archives: larsrein

What will the STORK bring us?

STORK is a federated security initiative from the EU. The acronym is ridiculously forced, being short for “Secure idenTity acrOss boRders linKed.” Less than auspicious.
It is meant to secure interoperability between electronic identity (eID) schemes in different countries, with a few extra bits thrown in on top. Interoperability sounds very nice. Yes, let’s have some of that. I applaud the effort.

However, I am dubious about the stuff “under the hood”. Can it be made to work as promised, and even if it can, do we want it or need it?
I’ll examine this STORK thing more closely and report back here.

sailing the data ocean

What if access to data could be authorized everywhere? If you were authorized to access a piece of data, you could get access to it wherever it happened to be located.

This is certainly not how things work at the moment, but if it could be made to work conveniently, what should it look like?

When the first web browsers appeared some 20+ years ago, static HTML pages and other media and document files were made available by requesting a URL over HTTP. Security was added, at first pretty coarse-grained: if you were logged in, you could access pretty much anything. It got better.
Comprehensive tools became available to centrally manage all web access to any document, with the finest granularity. Writing the rules for who could access what, when and where, could be delegated to those actually in a position to know.
But, crucially, these tools could only manage access to stuff directly under their control, often operating in reverse-proxy mode, intercepting HTTP traffic. APIs were available through which other applications could tap into them and use the access control rules they contained to do their own authorization. In this way the data under the control of a unified set of access control rules could be extended corporate-wide, with access to all of a corporation’s data governed by rules maintained in one place. Everyone would play together in the same data security pool.
In practice this never happened. (Re)writing applications to use the API of the chosen security platform was too expensive. Other tools emerged to export the security rules from one platform to another, leaving each platform to do its own enforcement through its own rule infrastructure. This didn’t work very well because it was too complicated. Rules are fundamentally about meaning, and meaning doesn’t translate easily. Nevertheless, this was an attempt to federate authorization.

Data protected by the same access control rule infrastructure is part of the same pool. A database is a single pool. It has its own internal security governing access to the individual pieces of data it contains, but has no reach outside. The database maintains its own list of who gets to access which columns in which tables.
A server has its own internal arrangement for governing access to the data in its own file systems. It may also have access to remote file systems. Some of these would be on other servers, which govern access themselves (NFS, FTP, Samba etc.) and would therefore not be part of the server’s own pool.

If authorizations could be federated between pools, all data would exist in one big virtual pool.
A virtual pool made of multiple physical pools: individual databases, file servers etc. At present this is difficult. There may be user federation between some data pools, but each pool has its own authorization, its own way of enforcing access rules. The rules in one are not known, or directly enforceable, in another. There is no federated authorization.

Let’s further suppose that any piece of data in this virtual data pool, a data ocean really, is accessible over TCP with a URI. The URI may have various formats depending on what type of physical pool is being addressed.
For example, this would be the syntax of a URI accessing a directory (LDAP) store:


And this to access an individual file, using HTTP(S):


Access to one of the secure web reverse proxies mentioned above would look like this too.

There would be many others. Note that the username and password do not appear. There would not be any prompting for this information either.
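To make the idea of one addressing scheme over many kinds of pool a little more concrete, here is a minimal Python sketch that classifies URIs by scheme. The URIs, host names and the scheme-to-pool table are hypothetical placeholders, not the original examples; the point is only that the pool type can be read off the URI and that no credentials travel in it.

```python
from urllib.parse import urlsplit

# Hypothetical URIs for two kinds of physical pool (placeholders only).
URIS = [
    "ldap://ldap.example.com/ou=people,dc=example,dc=com?mail?sub?(uid=jdoe)",
    "https://files.example.com/reports/2014/q4.pdf",
]

# Assumed mapping from URI scheme to the kind of pool that enforces access.
POOL_KIND = {"ldap": "directory", "ldaps": "directory", "http": "web", "https": "web"}


def describe(uri: str) -> dict:
    """Split a URI into the parts a requesting client would care about."""
    parts = urlsplit(uri)
    return {
        "pool": POOL_KIND.get(parts.scheme, "unknown"),
        "host": parts.hostname,
        "path": parts.path,
        "query": parts.query,
        # Authorization travels separately (as tokens), never inside the URI.
        "has_credentials": parts.username is not None or parts.password is not None,
    }


for u in URIS:
    print(describe(u))
```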
Access control would be through PAML tokens, passed in the headers. An SSL handshake would take place to establish that the requesting entity is entitled to use the tokens presented.
All physical pools are defined by the entity that controls access to them, and all of these entities, be they the LDAP server or the file/web server in the URI examples above, must be equipped to handle PAML tokens to verify the authorization for the request. Through accepting these PAML tokens, the pools together form a virtual data ocean. Any application can call on data anywhere else and present PAML tokens for authorization.
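A data store’s side of this can be sketched as two checks: the token must carry the data owner’s valid signature, and the key proved in the TLS handshake must be the key the token was issued to. This is a minimal sketch under stated assumptions: the PamlToken fields, issue() and accept() are hypothetical names rather than a defined PAML format, and an HMAC stands in for the owner’s public-key signature so the example runs with the standard library alone.

```python
import hashlib
import hmac
from dataclasses import dataclass

# ASSUMPTION: an HMAC under a secret held by the data owner stands in for the
# owner's public-key signature, purely to keep this sketch stdlib-only.
OWNER_SECRET = b"hypothetical-owner-secret"


@dataclass(frozen=True)
class PamlToken:
    holder_key: str       # fingerprint of the public key the token was issued to (hypothetical)
    resources: frozenset  # data items the token covers (hypothetical)
    signature: str

    def payload(self) -> bytes:
        # Canonical byte string covered by the signature.
        return (self.holder_key + "|" + ",".join(sorted(self.resources))).encode()


def _sign(payload: bytes) -> str:
    return hmac.new(OWNER_SECRET, payload, hashlib.sha256).hexdigest()


def issue(holder_key: str, resources) -> PamlToken:
    """Data owner: issue a token bound to the holder's key."""
    draft = PamlToken(holder_key, frozenset(resources), "")
    return PamlToken(draft.holder_key, draft.resources, _sign(draft.payload()))


def accept(token: PamlToken, handshake_key: str) -> bool:
    """Data store: the owner's signature must verify, and the key proved in the
    TLS handshake must be the one the token was issued to."""
    signed_ok = hmac.compare_digest(_sign(token.payload()), token.signature)
    return signed_ok and token.holder_key == handshake_key


tok = issue("alice-key", {"/reports/q4.pdf"})
print(accept(tok, "alice-key"))    # True
print(accept(tok, "mallory-key"))  # False: presented by someone else
```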

This leaves quite a bit of scope for application architecture. Using a PAML token requires access to the private key of the user to whom it was issued. This means that if a user is engaged in a transaction with an application, and the application needs access to data kept somewhere else on the user’s behalf, the application can only present its own PAML tokens; it cannot simply forward those it has received from the user. At a minimum the user must contact the other data store directly and engage in an SSL handshake. This establishes the user’s ownership of the public key for the benefit of the data store. The application can then pass the PAML tokens received from the user on to the data store, which now knows they are OK to use; or the user can retrieve the data directly and pass it to the application that needs it. Sort of like a data federation.
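The first of those two architectures, where the user proves key ownership to the store directly and the application then relays the user’s tokens, might look like this. DataStore, handshake() and accept_forwarded() are hypothetical names, and the handshake is reduced to recording a key fingerprint; a real store would learn it from a mutual-TLS session.

```python
from dataclasses import dataclass, field


@dataclass
class DataStore:
    # Keys whose possession has been proved to this store in a direct
    # TLS handshake (hypothetical bookkeeping, not part of any PAML spec).
    proven_keys: set = field(default_factory=set)

    def handshake(self, key_fingerprint: str) -> None:
        # Stands in for a mutual-TLS handshake in which the client proves it
        # holds the private key behind this fingerprint.
        self.proven_keys.add(key_fingerprint)

    def accept_forwarded(self, token_holder_key: str) -> bool:
        # A token relayed by an application is usable only once its holder
        # has proved key ownership to this store directly.
        return token_holder_key in self.proven_keys


store = DataStore()
print(store.accept_forwarded("alice-key"))  # False: Alice has not contacted the store yet
store.handshake("alice-key")                # Alice connects to the store herself
print(store.accept_forwarded("alice-key"))  # True: the application may now relay her token
```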

Note that PAML tokens are tied to data, not to any particular host environment. Among other things, this means that the requesting client may send the server a considerable number of tokens to establish authorization for all the required data. The server will grant the union of all these tokens.
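Granting the union is simple set arithmetic once each token has been checked. A minimal sketch, assuming tokens are plain dictionaries with hypothetical holder_key and resources fields and that signatures have already been verified:

```python
from typing import Iterable, Mapping, Set


def granted(tokens: Iterable[Mapping], proven_key: str) -> Set[str]:
    """Union of the data items covered by every presented token that belongs to
    the requester. Signature checks are assumed to have been done already."""
    allowed: Set[str] = set()
    for token in tokens:
        if token["holder_key"] == proven_key:  # ignore tokens issued to someone else
            allowed |= set(token["resources"])
    return allowed


tokens = [
    {"holder_key": "alice-key", "resources": ["/crm/accounts"]},
    {"holder_key": "alice-key", "resources": ["/hr/leave", "/crm/accounts"]},
    {"holder_key": "bob-key", "resources": ["/finance/ledger"]},
]
print(sorted(granted(tokens, "alice-key")))  # ['/crm/accounts', '/hr/leave']
```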

IT security shapes business models

Text slightly reworked from a talk given to OIC, Oslo, Dec. 2014.

In his article “The Nature of the Firm”, Ronald Coase proposed that a firm forms and grows as long as transaction costs inside the firm are lower than outside it, i.e. as long as the firm can do something cheaper and better in-house than by going out into the marketplace for it.

OK, that’s a company, but what about its data? It seems that something along those lines is going on there too.

Most of us have come across this in the form of federated login. You’re about to log in to a web application and are given the option of logging in via Facebook. Or Twitter. OK, what happens if you click the Facebook button? Clearly something, since you enter the application in question and are now known as yourself.
This is where federated security comes in. If you were already logged in to Facebook, a message was sent to Facebook, where your valid logon session was used to prepare a special access token: a token the new application could use to establish a new session for you. If you were not already logged in to Facebook, you could log in now and the same thing would happen.
In any case you log in only to Facebook, and other web sites take advantage of this to log you in to their site too.
Acting in this capacity, Facebook is called an Identity Provider. And no, that is NOT a deliberate pun. A company or application that uses such a federated login is, in the jargon, called a Service Provider.
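The token exchange described above can be sketched in a few lines. This is a simplified model, not SAML or OpenID Connect: idp_issue() and sp_login() are hypothetical names, and a key shared between the two parties stands in for the Identity Provider’s signature. What it shows is that the Service Provider opens a session on the strength of the signed token alone, without ever handling the user’s password.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Dict, Optional

# ASSUMPTION: a key shared by IdP and SP stands in for the IdP's signing key.
# Real protocols (SAML, OpenID Connect) typically use public-key signatures.
IDP_KEY = secrets.token_bytes(32)


def idp_issue(userid: str, audience: str, ttl: int = 300) -> str:
    """Identity Provider: turn an existing login session into a short-lived
    signed token addressed to one Service Provider."""
    claims = {"sub": userid, "aud": audience, "exp": int(time.time()) + ttl}
    body = json.dumps(claims, sort_keys=True)
    sig = hmac.new(IDP_KEY, body.encode(), hashlib.sha256).hexdigest()
    return body + "." + sig


def sp_login(token: str, audience: str, sessions: Dict[str, str]) -> Optional[str]:
    """Service Provider: check the IdP's signature, audience and expiry, then
    open a local session for the user without ever seeing a password."""
    body, _, sig = token.rpartition(".")  # the hex signature itself contains no "."
    expected = hmac.new(IDP_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        return None
    claims = json.loads(body)
    if claims["aud"] != audience or claims["exp"] < time.time():
        return None
    session_id = secrets.token_hex(16)
    sessions[session_id] = claims["sub"]
    return session_id


sessions: Dict[str, str] = {}
sid = sp_login(idp_issue("alice", "news.example.com"), "news.example.com", sessions)
print(sessions[sid])  # alice
```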

Returning to Coase: the application can have its own login and user database, doing it in-house. Or, as he suggests, you can go out into the market and contract for this service by allowing login through Facebook.
OK, how much would such an external login service cost in the market? As it turns out, not very much.
For one thing, the marginal costs are low. Once you already have the users and their passwords in your database, adding a token exchange whereby other applications can send their users over for authentication is quite cheap.
Not only is it cheap to let third parties use your login process, there is also revenue in it. You can now guarantee that users have the same userid across platforms. All kinds of people would be very interested in that, and pay good money to ensure that it happened. And not just the NSA. Advertising and other businesses that make their money by analyzing consumer behavior like federated login a whole lot. If you let other companies log in their users via your platform, you control the origin of this data.

Considering that the economies of scale are so much in favor of federated login and those who make it happen, why aren’t all logins federated?

Well, more and more are.
But the savings may not be that large on the service provider side, the in-house part of Coase’s analysis. Just because you leave the authentication of your users to someone else doesn’t mean you can then forget about them. There is also the matter of authorization: what are people allowed to do? From a security point of view, the whole point of identifying users in the first place is so that we can then look up our rules for what each user should be permitted to do. If we don’t know who they are, we don’t know what they are allowed to do.

So you still end up needing a user database, and a database of rules for what the users are allowed to do, and you must maintain both. That last part in particular can be complicated and expensive. Once you have all those things, handling the login isn’t that big a deal.

From a revenue point of view things are a bit different. Identifying the users has its own value: selling information about them to third parties.
You could limit yourself to just logging the user in, so that you can accurately track him, and not bother with any authorization.

And this is what we see a lot of. Plenty of media sites accept federated login through the likes of Twitter and Facebook.

They only care about getting your name right. Their revenue comes from knowing it.
There is no fine-grained access control or authorization going on; it is not worth it to them. is an example of this. Your email address becoming your universal user id is another.

Clearly Coase’s balance between transaction costs in the marketplace and in-house is still very much in play.

This suggests a possible future.
Think of a corporation as fundamentally an agglomeration of data and business practices, where interaction between businesses, trade, is interaction between these agglomerations.
Return to Coase’s starting question: where is the boundary between a corporation and the market, and why is it where it is?
What defines the boundary of these data agglomerations? The authorization rules. The data might be located anywhere. At Dropbox even. But if you write the rules governing access to it, that data is part of your agglomeration, wherever it is. If someone wants to interact with your data or your business processes, you must write the rules to allow it to happen. Your rules map out your agglomeration. Outside the reach of your rules is outside your boundary.

This sketches out the premise that the limit on the size and shape of a corporation is really a limit of information technology; specifically, the limit of authorization technology.

This is all very well, but what does it look like in real life? What are the strategic business implications of such limits, if they even exist?

There are grounds for suspecting that they do.

We have technical standards for federated login, hence the Facebook login example. A user known from other places on the web is more valuable on the consumer profiling market than one known just to yourself. Both parties gain. The consumer, not so much.

But we do not have standards for federated authorization, much less any off-the-shelf technologies for it.

What should we expect to see if federated authorization were possible?

Truly distributed content for one thing.
At the moment digital content is licensed, which is another way of saying it is rented. Meaning that the right to authorize access to it has been delegated to the licensee for a limited period of time. And the licensee writes the authorization rules for the duration. A temporary data transfer from one agglomeration to another. The data is not really distributed, certainly not by you. Someone else distributes it on their own behalf.
This disintermediation is risky for both parties. If the owner overcharges for the license, no one will buy one. Undercharge and you leave money on the table. For the licensee the risks are reversed: overpay and you lose money. This makes the distribution process contractually complicated, in no small measure because there is no way to federate authorization. These risks mean that there will be much less of this business than there potentially could be.

So the idea that buying a ticket to a movie would get you a free repeat on Netflix is, as things currently stand, out of the question. Despite the obvious advantages for all concerned. The movie companies and Netflix are separate agglomerations.

And it’s not just media. All kinds of content: medical records, for example. It is impossible to have fine-grained access control over the individual data items in such a record once it has been compiled. Therefore its distribution is exceedingly complex and cumbersome. If you can see any part, you can see the whole. If access to each individual item could be controlled wherever the document was, it would be simple to distribute. Those who needed to, and had authorization to see an individual data field, could see it, without always having to be cleared to see the whole thing. As it is, we now have checkout clerks in pharmacies with access to much of your medical record because they need to verify one small item in it.
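Controlling access per data item, rather than per document, is the difference this paragraph turns on. A minimal sketch, with hypothetical field names, roles and placeholder values, where each field carries its own rule:

```python
from typing import Dict, Set, Tuple

# Hypothetical record: every field carries its own set of roles allowed to
# read it, so the rule belongs to the data item, not to the whole document.
Record = Dict[str, Tuple[str, Set[str]]]

RECORD: Record = {
    "patient_name": ("<name>", {"physician", "pharmacist", "clerk"}),
    "diagnosis": ("<diagnosis>", {"physician"}),
    "prescription": ("<prescription>", {"physician", "pharmacist", "clerk"}),
    "history": ("<history>", {"physician"}),
}


def visible_fields(record: Record, role: str) -> Dict[str, str]:
    """Return only the fields this role is authorized to read."""
    return {name: value for name, (value, roles) in record.items() if role in roles}


print(sorted(visible_fields(RECORD, "clerk")))  # ['patient_name', 'prescription']
```

In the scenario the post has in mind, these per-field rules would also have to travel with the record wherever it goes; the sketch only shows the granularity.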

Federated Authorization

For a while now the use of federated userids has been the norm. A site called the Service Provider (SP) has entered into an agreement with another, called the Identity Provider (IP), that it will trust the userid the IP sends it as properly authenticated and as accurately identifying the user.
There are a number of ways this can be done; passing a SAML token from the IP to the SP is a popular one. The SAML token contains the userid, and the SP trusts it because the token is signed by the IP.

So much for users and their userids, but what about what they have access to: the authorization?
Userids are tied to revenue in significant ways. In businesses that derive revenue from gathering information about users, the userid is clearly of primary concern. More generally, in business applications it is not the userid itself that is central, but what the user has access to and how that access control is enforced.

Consider a standard access control deployment with federated users. The user logs in somewhere else (the IP) and clicks a link to (or returns to) the application in question, with the userid passed along. This is a federated login. The application (the SP) still examines the userid and refers to its own access control infrastructure for what the user is authorized to do. Could this step be federated too? Should it be?
Just as a login and authentication infrastructure costs money, so does maintaining an access control infrastructure. Federated login cuts the cost of the first; Federated Authorization (FA) could likewise cut the second.
From the cost point of view, FA makes sense.
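Even with federated login, that last step stays local. A minimal sketch, assuming a hypothetical rule table kept by the SP and keyed by the userid the IP supplies:

```python
from typing import Dict, Set

# Hypothetical rule table owned and maintained by the Service Provider.
# The federated login delivers only the userid; it says nothing about rights.
LOCAL_RULES: Dict[str, Set[str]] = {
    "alice@idp.example": {"read:invoices", "approve:invoices"},
    "bob@idp.example": {"read:invoices"},
}


def authorize(federated_userid: str, action: str) -> bool:
    """Authentication was federated; authorization is still looked up locally.
    A user the IP vouches for, but the SP has no rules for, may do nothing."""
    return action in LOCAL_RULES.get(federated_userid, set())


print(authorize("bob@idp.example", "approve:invoices"))  # False
```

This table, and the work of keeping it current, is exactly the cost that Federated Authorization would try to move elsewhere.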

Can it be done, and does it make sense from a policy and governance point of view? It is one thing to let someone else identify the user on your behalf. Digital certificates have been around for a while, so the issues surrounding outsourcing the work of identifying users are not new, and the risks are largely accepted. But authorization cuts close to the essentials of governance: what are people permitted to do? True, if the user is misidentified, a thorough access control system does not make much difference.
One significant issue is change. A userid doesn’t change much, if at all. What a user has access to does, sometimes with great urgency. Having direct control of authorization is then desirable.

The day-to-day problem with access control is not primarily enforcement, though that can be tricky enough. It is the creation of, and updates to, policy: the rules governing access. Who gets to decide, and how can those decisions be captured in an enforceable way?

another Silk Road takedown

Here we go again… The powers that be have taken down some evildoers hiding behind TOR. I must ask pardon for being a bit jaundiced about the whole thing.
Clearly there are people doing undesirable things under the protection offered by TOR, but TOR has a legitimate purpose and the support behind it is impeccable. Still, TOR is not perfect; specifically, it is not foolproof. User error compromises it.
But I’m wondering if there is another way for the vendors on Silk Road to conduct their business, leaving aside the problem of payment for the moment: BitCoin has its own weaknesses.
Concentrating merely on how two parties can get in touch with each other: where A is looking for something that B has to offer, put very generically.

If my understanding of the attack against Silk Road is correct, it is based on having control over a significant portion of the onion routers, sufficient to establish a pattern of traffic from yourself to a particular .onion address.
Each request packet is routed every which way by the TOR protocol. But with control of enough routing nodes a pattern will nevertheless emerge, since the request packets must eventually end up in the same place: where the concealed service is hosted.
This last point holds even without implementation errors such as DNS leakage and the like.
What if the hidden service is not hosted on just one particular server, but on many different ones, so that using the service involves traffic to many different locations? Is this possible, and would it make a service concealed behind an onion routing scheme undetectable?
That depends. If the hidden service is hosted on a fixed set of servers rather than just one, there is no real difference; more traffic just needs to be analysed to pinpoint them. This could be counteracted by moving the service often, but that is a defence mechanism independent of how many hosts the hidden service uses. And controlling more onion routers will help the tracker.
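The effect of controlling more relays, and of the service staying put, can be put in rough numbers. This uses a deliberately crude model that is an assumption made for this illustration, not a description of how Tor actually builds circuits: each circuit picks its first and last relay uniformly and independently, an adversary controlling a fraction f of relays can correlate a circuit whenever it holds both ends, and n circuits reach the service while it stays at one location. Real Tor uses guard relays and bandwidth-weighted selection, so the numbers are illustrative only.

```python
def p_seen(f: float, n: int) -> float:
    """Probability that at least one of n circuits has both its first and last
    relay controlled by an adversary holding a fraction f of relays, assuming
    independent uniform relay choice per circuit (a crude model)."""
    per_circuit = f * f
    return 1 - (1 - per_circuit) ** n


for f in (0.05, 0.10, 0.20):
    # Fewer circuits per location (moving the service often) lowers n;
    # controlling more relays raises f.
    print(f, round(p_seen(f, 10), 3), round(p_seen(f, 1000), 3))
```

Both levers in the text show up directly: moving the service often keeps n small, and controlling more onion routers raises f.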

Trusted VPNs are of course an option, where both the client and the hidden service use a VPN to tunnel to some part of the internet. The tracker is then left with finding a proxy rather than the real host. It is still possible that the tracker could use traffic analysis to get from the proxy to the real host, as long as the service stays put long enough.

This suggests a possible course of action. Can the hidden service be dynamically located? Perhaps even randomly?
I think I’ll work on that. A fascinating challenge. Watch this space.

If there were a pay version of Facebook, how much would it cost?

Social media platforms, broadly defined, come in two forms: “free” and for-pay… And never the twain shall meet. Or is that so?
Users of Facebook have no choice: the “free” version is all they’ve got. Yahoo Mail has both a “free” version and a pay version. But it is not clear that the pay version does not have the same drawbacks as the free version: relentless surveillance and selling of the gathered information to third parties. Sometimes the privacy policy states that information is not passed to outside companies. This often means that the information is given to “partnerships”, i.e. to the same outside companies, with payment taken in a different form, perhaps somewhat anonymized but still valuable to the marketing department. Maybe I am being a little cynical here, but in business it is the more prudent assumption.

New social media outfits pop up left and right. Lately some have tried other revenue models. Ello caught my attention the other day. Like Facebook, they try to fake-start exclusivity by being invitation-only to begin with. That will end soon enough. Apparently they carry no advertising. “You are not the product” is their slogan, from the premise that those who pay are the customers and those who don’t are the product. This assumes that cash is the only form of payment. Some, indeed a great many, people are quite content to pay with their (digital) life.

It will be interesting to see how Ello progresses. I wish them every good fortune, but I suspect that business/greed will get the better of them. Once there is information about users that can be monetized, it eventually will be.
Getting back to the original question: how much does a social media platform like Facebook cost? Add a suitable profit margin and we have the price facing the users. A dollar (US) a month? Two dollars? Five?
The problem can be broken down. There are storage costs, network costs, the cost of computing power, and development costs (and profits, of course).
The first three are falling by the day. Development costs should scale nicely too: more users, lower development costs per user. I assume the current level of customer care and support will remain the same, and estimate zero cost here. These do not add up to a definite price estimate, but suggest that whatever it is at the moment, it will be lower in future. Economies of scale can be misleading in social media. There is no special value in having absolutely everyone on the same platform, only enough of them, i.e. those you want to be “social” with, or more particularly use the platform to stay in touch with. LinkedIn is a special case, where there is a premium on getting in touch with the people you DON’T already know. That being said, bigger is better.
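The cost breakdown above can be written as a rough per-user monthly price. The symbols are introduced here for illustration only: each C is a total monthly cost, N is the number of users, m is the profit margin, and customer support is set to zero as assumed above.

```latex
p = (1 + m)\,\frac{C_{\mathrm{storage}} + C_{\mathrm{network}} + C_{\mathrm{compute}} + C_{\mathrm{dev}}}{N}
```

The first three terms fall as unit prices drop, and if C_dev stays roughly fixed, its share C_dev/N shrinks as N grows, which is why the price can only be expected to go down.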

And this leads to my next point: switching costs. Building up a profile takes time and effort. Moving to another platform will in general mean starting afresh. Not quite with a blank slate, as most platforms let you import contact lists from various other applications, like Gmail and Yahoo Mail, and use those lists to find contacts already on the platform, which in the case of a new platform are unlikely to be very many. Then there is any other data you have provided to the platform: pictures, writings etc. The data export features are unlikely to be very helpful. But some may find the loss of self-provided data to be the very reason to switch platforms and start fresh: it is not a bug but a feature.

Nevertheless, switching costs increase the platform owner’s pricing power, and data portability reduces it. This means that every social media platform has an interest in keeping the cost of leaving high and the cost of joining low. The winner-takes-all aspect of social media is well known: there is no prize for second place. Yet new “platforms” keep emerging. However far ahead one platform seems to be, a new one can still emerge. Facebook supplanted MySpace but still felt compelled to buy WhatsApp. Now there is Snapchat. Many people are betting a great deal of money on what the next big thing is going to be. Few doubt that there will be one.
All of which suggests that while people have significant investments in whichever social media applications they happen to be using, they can and will move. This places an upper bound on what users will pay (in cash or in privacy). If the terms are too exorbitant, users will move on that much more quickly.

XACML made portable with PAML

XACML has been pronounced dead. Repeatedly. And in truth it has never been much used. But I think it still has potential. The standard has been around for years (version 2.0 in 2005) and allows for quite a bit of flexibility: role-based and attribute-based. Wikipedia provides a decent rundown on XACML. is a superior resource for all things XACML.

Key for our purposes is the separation between decision and enforcement in XACML: the decision is made in one place and enforced somewhere else. This permits the portability we’re looking for. Nothing in XACML directly mandates online services. A PAML token should be usable for an extended period of time, and XACML allows this.
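The separation can be sketched as two functions that share nothing but a request made of attributes. A minimal sketch with hypothetical names (make_pdp, pep_handle) and a hypothetical policy; the mapping of HTTP methods to actions is an assumption:

```python
from typing import Callable, Dict

Request = Dict[str, str]


def make_pdp(policy: Callable[[Request], bool]) -> Callable[[Request], str]:
    """Decision point: evaluates attribute requests against a policy.
    It knows nothing about HTTP, files or databases."""
    def decide(request: Request) -> str:
        return "Permit" if policy(request) else "Deny"
    return decide


def pep_handle(method: str, path: str, user: str, pdp: Callable[[Request], str]) -> str:
    """Enforcement point: turns a raw request into attributes, asks the PDP,
    and enforces the answer. The GET->read mapping is an assumption."""
    request = {
        "subject-id": user,
        "action-id": "read" if method == "GET" else "write",
        "resource-id": path,
    }
    if pdp(request) != "Permit":
        raise PermissionError(f"{user} may not {request['action-id']} {path}")
    return f"<contents of {path}>"


# Hypothetical policy: anyone may read, nobody may write.
pdp = make_pdp(lambda r: r["action-id"] == "read")
print(pep_handle("GET", "/records/42", "bob", pdp))
```

Because the decision function needs only the request and the policy, the policy could just as well arrive from elsewhere, for instance inside a token.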

An XACML policy sample:

```xml
<Policy xmlns="urn:oasis:names:tc:xacml:3.0:core:schema:wd-17" PolicyId="medi-xpath-test-policy" RuleCombiningAlgId="urn:oasis:names:tc:xacml:1.0:rule-combining-algorithm:first-applicable" Version="1.0">
  <Description>XPath evaluation is done with respect to the content element, checking for a matching value. Here the content element has been bound with a custom namespace and prefix</Description>
  <PolicyDefaults><XPathVersion>http://www.w3.org/TR/1999/REC-xpath-19991116</XPathVersion></PolicyDefaults>
  <Target><AnyOf><AllOf>
    <Match MatchId="urn:oasis:names:tc:xacml:1.0:function:string-regexp-match">
      <AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">read</AttributeValue>
      <AttributeDesignator MustBePresent="false" Category="urn:oasis:names:tc:xacml:3.0:attribute-category:action" AttributeId="urn:oasis:names:tc:xacml:1.0:action:action-id" DataType="http://www.w3.org/2001/XMLSchema#string"/>
    </Match>
  </AllOf></AnyOf></Target>
  <Rule RuleId="rule1" Effect="Permit">
    <Description>Rule to match value in content element using XPath</Description>
    <Condition>
      <Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:any-of">
        <Function FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-equal"/>
        <Apply FunctionId="urn:oasis:names:tc:xacml:1.0:function:string-one-and-only">
          <AttributeDesignator Category="urn:oasis:names:tc:xacml:1.0:subject-category:access-subject" AttributeId="urn:oasis:names:tc:xacml:1.0:subject:subject-id" DataType="http://www.w3.org/2001/XMLSchema#string" MustBePresent="false"/>
        </Apply>
        <AttributeSelector MustBePresent="false" Category="urn:oasis:names:tc:xacml:3.0:attribute-category:resource" Path="//ak:record/ak:patient/ak:patientId/text()" DataType="http://www.w3.org/2001/XMLSchema#string"/>
      </Apply>
    </Condition>
  </Rule>
  <!-- rule2: with first-applicable combining, any read not permitted by rule1 is denied -->
  <Rule RuleId="rule2" Effect="Deny">
    <Description>Deny rule</Description>
  </Rule>
</Policy>
```

The enforcement point examines the incoming request and creates an XACML request, which may look something like this:

```xml
<Request xmlns="urn:oasis:names:tc:xacml:3.0:core:schema:wd-17" ReturnPolicyIdList="false" CombinedDecision="false">
  <Attributes Category="urn:oasis:names:tc:xacml:1.0:subject-category:access-subject">
    <Attribute IncludeInResult="false" AttributeId="urn:oasis:names:tc:xacml:1.0:subject:subject-id">
      <AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">bob</AttributeValue>
    </Attribute>
  </Attributes>
  <Attributes Category="urn:oasis:names:tc:xacml:3.0:attribute-category:resource">
    <Content>
      <ak:record xmlns:ak="…">
        <!-- other record fields omitted -->
        <ak:street>51 Main road</ak:street>
      </ak:record>
    </Content>
  </Attributes>
  <Attributes Category="urn:oasis:names:tc:xacml:3.0:attribute-category:action">
    <Attribute IncludeInResult="false" AttributeId="urn:oasis:names:tc:xacml:1.0:action:action-id">
      <AttributeValue DataType="http://www.w3.org/2001/XMLSchema#string">read</AttributeValue>
    </Attribute>
  </Attributes>
</Request>
```

The request is compared to the policy, and the request is allowed or denied accordingly.
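For the sample policy above, the comparison boils down to three steps: check the target, try rule1, fall through to rule2. Here is a hand-coded Python reading of that one policy, not a general XACML engine. The namespace URI bound to ak is a placeholder, since the original’s is not preserved, and the regular-expression match is treated as an unanchored search.

```python
import re
import xml.etree.ElementTree as ET

# ASSUMPTION: placeholder namespace URI for the "ak" prefix; the request
# content must bind "ak" to the same URI for the path below to match.
NS = {"ak": "urn:example:ak"}


def evaluate(subject_id: str, action_id: str, content_xml: str) -> str:
    """Hand-coded reading of the sample policy, not a general XACML engine."""
    # Target: string-regexp-match of "read" against the action-id.
    if not re.search("read", action_id):
        return "NotApplicable"
    # rule1 (Permit): any-of(string-equal, subject-id,
    #   //ak:record/ak:patient/ak:patientId/text())
    content = ET.fromstring(f"<Content>{content_xml}</Content>")
    patient_ids = [e.text for e in content.findall(".//ak:record/ak:patient/ak:patientId", NS)]
    if subject_id in patient_ids:
        return "Permit"
    # rule2 (Deny): reached only because rule combining is first-applicable.
    return "Deny"


record = ('<ak:record xmlns:ak="urn:example:ak"><ak:patient>'
          '<ak:patientId>bob</ak:patientId></ak:patient></ak:record>')
print(evaluate("bob", "read", record))    # Permit
print(evaluate("alice", "read", record))  # Deny
```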

The enforcement point must be able to create an XACML request from the actual request and compare it to the applicable XACML policy. This is where PAML tokens come in: they link the request with the policy that governs it by carrying the XACML policy inside the token. PAML tokens are issued to users, and the user is responsible for sending a token (or possibly more than one) containing an XACML policy that will allow the request. The issuer of the PAML token owns the data and includes in the token the XACML policy containing the access control rules the data owner wants enforced.
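An enforcement point working this way needs no call to any other party: it checks the owner’s signature over the embedded policy and evaluates the policy locally. A minimal sketch under loudly stated assumptions: the token layout, issue_token(), enforce() and the JSON policy (a simplified stand-in for XACML) are all hypothetical; an HMAC replaces the owner’s public-key signature; and binding the token to the holder’s key is left out for brevity.

```python
import hashlib
import hmac
import json
from typing import Dict, Iterable

# ASSUMPTION: the enforcement point knows each data owner's verification key.
# HMAC stands in for a public-key signature; with real signatures the
# enforcement point would hold only the owners' public keys.
OWNER_KEYS: Dict[str, bytes] = {"owner-1": b"hypothetical-owner-secret"}


def issue_token(owner: str, policy: dict) -> dict:
    """Data owner: embed the access policy in a token and sign it."""
    body = json.dumps({"owner": owner, "policy": policy}, sort_keys=True)
    sig = hmac.new(OWNER_KEYS[owner], body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}


def policy_permits(policy: dict, request: dict) -> bool:
    # Simplified stand-in for evaluating an XACML policy against a request.
    return (request["action"] in policy["actions"]
            and request["resource"].startswith(policy["resource_prefix"]))


def enforce(tokens: Iterable[dict], request: dict) -> bool:
    """Trust a token's policy only if its owner's signature verifies; any
    valid token whose policy permits the request is enough."""
    for token in tokens:
        claims = json.loads(token["body"])
        key = OWNER_KEYS.get(claims["owner"])
        if key is None:
            continue  # unknown data owner
        expected = hmac.new(key, token["body"].encode(), hashlib.sha256).hexdigest()
        if hmac.compare_digest(expected, token["sig"]) and policy_permits(claims["policy"], request):
            return True
    return False


tok = issue_token("owner-1", {"actions": ["read"], "resource_prefix": "/records/"})
print(enforce([tok], {"action": "read", "resource": "/records/42"}))   # True
print(enforce([tok], {"action": "write", "resource": "/records/42"}))  # False
```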

accepting PAML tokens

PAML tokens were developed to address the needs of a distributed environment, providing the ability to make decisions at data access enforcement points without having to contact other parties.
This sounds dandy, but how exactly is it done?

The high-level process is simple enough.

The devil is lurking in the usual place.

The detail here is the digital signature on the resource item in question.
Any data item can be digitally signed, but if the data item is small, the signature can be larger than the original data. There is scope for optimization here.
In the PAML token sample the digital signatures make up the larger part of the total data. The balance will not be so skewed in more typical examples. But the fact remains that digital signatures in XML documents are very verbose.
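A quick calculation shows how lopsided this gets for small items. A sketch assuming a SHA-256 digest and an RSA-2048 signature, both base64-encoded as XML Signature’s DigestValue and SignatureValue are, and not yet counting the surrounding XML markup or key information:

```python
import base64
import hashlib


def signed_item_sizes(value: bytes, signature_len: int = 256) -> dict:
    """Characters needed for a value plus its base64 digest and signature.
    signature_len=256 bytes matches an RSA-2048 signature (an assumption)."""
    digest = hashlib.sha256(value).digest()  # 32 bytes
    signature = bytes(signature_len)         # placeholder: only its length matters
    return {
        "value": len(value),
        "digest_b64": len(base64.b64encode(digest)),
        "signature_b64": len(base64.b64encode(signature)),
    }


print(signed_item_sizes(b"51 Main road"))
# {'value': 12, 'digest_b64': 44, 'signature_b64': 344}
```

Twelve characters of data carry nearly 400 characters of digest and signature before any XML wrapping is added.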
This need not be the case where user data is being stored. Assume the data item is a database column, with metadata, timestamps and such things in other columns. Additional columns can be added containing only the relevant bits of the signature: algorithm, digest, signature value. This would take up far less space and be manageable. For optimization it would also be helpful to store the owner’s public key (not the certificate). This would speed up processing. When a user’s PAML token has been accepted and the user’s request has been determined to be permitted by the token, the owner’s public key is known from the token. This key can be used as a mask on the data, added to SQL statements to further narrow the search and exclude all rows containing data not governed by the PAML token. The user’s request is then applied to the remaining rows.
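The column layout and the owner-key mask can be sketched with SQLite. The table and column names, the placeholder digest, signature and key values, and the LIKE filter standing in for the user’s request are all hypothetical; verifying the stored signatures is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id           INTEGER PRIMARY KEY,
        value        TEXT NOT NULL,
        sig_alg      TEXT NOT NULL,  -- signature algorithm identifier
        digest       TEXT NOT NULL,  -- base64 digest of value
        signature    TEXT NOT NULL,  -- base64 signature value
        owner_pubkey TEXT NOT NULL   -- owner's public key (not certificate)
    );
    CREATE INDEX items_owner ON items (owner_pubkey);
""")
# Placeholder digests, signatures and keys; a real store would hold actual values.
rows = [
    (1, "51 Main road", "rsa-sha256", "d1", "s1", "owner-A-pubkey"),
    (2, "12 High road", "rsa-sha256", "d2", "s2", "owner-B-pubkey"),
    (3, "7 Station road", "rsa-sha256", "d3", "s3", "owner-A-pubkey"),
]
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?, ?, ?)", rows)


def query_under_token(owner_pubkey: str, pattern: str):
    """owner_pubkey comes from an accepted PAML token and masks out rows the
    token does not govern; the user's own criterion is applied to the rest."""
    return conn.execute(
        "SELECT id, value FROM items WHERE owner_pubkey = ? AND value LIKE ? ORDER BY id",
        (owner_pubkey, pattern),
    ).fetchall()


print(query_under_token("owner-A-pubkey", "%road%"))
# [(1, '51 Main road'), (3, '7 Station road')]
```

Indexing owner_pubkey lets the mask cut the candidate rows down before the user’s own criterion is evaluated.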

Big Personal Data

Reading this article in a New York Times blog today, I kept thinking about what was not being said in it: data governance.
The assumption, no doubt valid, is that ever larger pools of data will be amassed and, suitably analyzed, will yield solutions to existing problems and fantastic new possibilities.
I don’t doubt the potential in “Big Data” and look forward to its exploitation with interest, but also with some trepidation. With great power comes great scope for abuse. This is as true for technology as it is for people; necessarily so, since it is people who design and use the technology.
The article mentions car sensor data and its collection. But what is this data, and who gets to use it?
It goes on to mention ever higher transfer speeds, allowing larger quantities of data to be exchanged ever faster.

The article uses the term Data Lake to describe GE’s accumulation of sensor data, citing that it allowed GE to speed up its error detection by a factor of 2000.

With ever larger quantities gathered, ever faster transfer speeds and distributed analytical tools, all the world’s data eventually becomes available in a Data Sea, ready for analysis.

The article argues that this is not just technically possible but also the trend, and desirable. The first two claims are clearly correct; the problems with differing data structures and quality are surmountable. Whether it is desirable is less clear.
That there are advantages to be had is obvious. It is also probable that they are great enough to justify the expense involved; indeed, it would not happen in the commercial world unless there was money to be made. Governments have other objectives, social control including law enforcement, where expense is not a primary concern, only technical feasibility.

In engineering technical feasibility is a function of resources expended. And the Big Data field is certainly seeing a great deal of resources expended on it, whatever the motivation. So what is possible in future will certainly expand from what is possible today.

Data governance has become a political issue of late. The relentless surveillance by governments of their citizens is becoming more widely known. Not all governments have the resources of the US or the determination of Chinese. But that is a temporary relief. The capabilities will expand and the costs will decline. In due course everyone can aspire to total surveillance.
At present only the richest and most powerful can access the whole data sea. Others can only access the part of it they gathered themselves. This article implies that limitation will fall away and everything will be accessible to those that can pay. But who gets the money? For now it has been those who accumulate the data that get the money. They incur the expenses; servers, storage and bandwidth cost money. The source of the data gets nothing. Certainly no money but in some cases a service. Social media run on expensive server platforms but the user can use them “for free” in exchange with surrendering all their data and any other information the platform can gather about them. Many consider this fair exchange. Even if this is so, it still in the users interest to control where the data is going, who gets to use it and for what. The stories of social media posts that have circulated outside their intended target audience and caused embarrasment are legion. The columnist Kathleen Parker advocated self-sensorship to a degree inconsistent with civilization.
It is unclear whether she honestly meant what she wrote or had simply disengaged her mental faculties before writing the piece. Bill Maher’s response was withering and to the point: we can’t live in a world where the only privacy we have is inside our heads. Privacy also matters for those less exalted among us. Scanning social media before a hire is now common practice. And individuals do not get to decide what they need to “hide”: anything that may cause offense or raise the slightest question can cause inconvenience or worse later on. Ever-changing social conventions make this even more intractable.
Expect to see much, much more on this subject.

Some gatherers of user data have policies in place and tell users explicitly that, while they gather as much data as possible, they do not pass it on to anyone else, or if they do, only in some lesser, anonymized form.

That data is passed along at all suggests convincingly that it has commercial value. As interconnections improve and data sales increase, it is also clear that the sources of this data are being underpaid: the data gatherers can increase their revenue per data item without incurring any additional cost at the source.

The people who are the source of this data therefore have both a financial and a data governance (read: privacy) issue at stake.

Social media users voluntarily submit to being guinea pigs for marketers. Their data has clear commercial value, and this is what they use to pay for the service.

In due course platforms will appear that also allow users to pay in forms other than their own personal information and the data they provide. “Pay a small monthly fee and we will not sell your emails to your local grocery chain.”
Pay-for-service is available in social media, particularly in the more specialized niches such as dating, but it is always either a pay or a spy business model, never a mix of the two on the same platform. In most cases this is driven by technical considerations: it can be tricky to adequately safeguard the user data that is not available for resale. But see

The Internet of Things is the next step. How are we going to handle that data? The Big Data providers have the datastores ready to take on the data and the tools to analyze it. But where are the tools to safeguard individuals?

trivial, secure email application, using command line tools

Demonstrating how to have secure email through the service.
It uses text files and command line tools. Perhaps not for novice users; it is simply a demonstration of capabilities.

To start with you download the code base (
Both the sender and the recipient need an address book xml file. It contains the information on the other party: name, public key and internet alias. The internet alias should be unique. It doesn’t have to be, for reasons that will become clear further down, but it saves bother if it is.
Both parties also have a properties xml file containing their own information as well as their private key.

The sample address book file for the mail.

The war file is unzipped into its own directory. Open a command line prompt, DOS or bash as you prefer, though the sample commands below are for DOS.

This example is hardly user friendly. It is not meant to be. But put a GUI wrapper around the commands and it is as user friendly as you like. The point is to demonstrate the possibility of sending secure messages over plain HTTP through a generic service like The bit about generic is a key point. It is easy enough for the powers that be to block access to more specialized services not used by many people. But the more users a service (i.e. a website) has, the less likely it is to be blocked: it serves too many necessary functions. Moreover, and perhaps more significantly, using an obscure service might attract attention in certain quarters, attention that a user might not want. Using a generic service should help the user avoid that attention. Of course, a generic service mostly won’t offer you what you need, which was why you used the obscure one in the first place. This is where comes in. It offers the service you program. is an on-demand service generator.
In this example we are using the API locally, having downloaded it from them, and are just passing the messages through their server. We could also have called the API on their server, but the point was also to remove the need to trust a remote server; it could be monitored for all we know. In this example the messages are encrypted before they are sent and cannot be decrypted by anyone other than the intended recipient. This is why we are not much bothered about who might read our messages while they pass through the network and remote systems, and why we can even use plain HTTP.
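The post does not say how the messages are encrypted. The sample properties files hold RSA key pairs plus an AES key, and a common way to get “only the intended recipient can decrypt” from such keys is hybrid encryption: encrypt the body with a symmetric key, then encrypt (wrap) that key with the recipient’s public key. The following is a minimal sketch of that general technique using only the JDK, not the code in the download; the fresh per-message AES key and the choice of AES-GCM and RSA-OAEP are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hybrid encryption sketch: AES-GCM encrypts the body, RSA-OAEP wraps the AES key. */
final class HybridEnvelope {
    private static final String RSA_MODE = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES_MODE = "AES/GCM/NoPadding";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** What would travel over the wire: wrapped key, GCM nonce and ciphertext. */
    static final class Envelope {
        final byte[] wrappedKey;
        final byte[] iv;
        final byte[] ciphertext;

        Envelope(byte[] wrappedKey, byte[] iv, byte[] ciphertext) {
            this.wrappedKey = wrappedKey;
            this.iv = iv;
            this.ciphertext = ciphertext;
        }
    }

    /** Encrypts for one recipient; only the matching private key can open it. */
    static Envelope seal(String plaintext, PublicKey recipient) throws GeneralSecurityException {
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey messageKey = keyGen.generateKey(); // fresh key for this message only

        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance(AES_MODE);
        aes.init(Cipher.ENCRYPT_MODE, messageKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = aes.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));

        Cipher rsa = Cipher.getInstance(RSA_MODE);
        rsa.init(Cipher.WRAP_MODE, recipient);
        return new Envelope(rsa.wrap(messageKey), iv, ciphertext);
    }

    /** Throws GeneralSecurityException if the key is wrong or the data was altered. */
    static String open(Envelope env, PrivateKey recipientKey) throws GeneralSecurityException {
        Cipher rsa = Cipher.getInstance(RSA_MODE);
        rsa.init(Cipher.UNWRAP_MODE, recipientKey);
        SecretKey messageKey = (SecretKey) rsa.unwrap(env.wrappedKey, "AES", Cipher.SECRET_KEY);

        Cipher aes = Cipher.getInstance(AES_MODE);
        aes.init(Cipher.DECRYPT_MODE, messageKey, new GCMParameterSpec(128, env.iv));
        return new String(aes.doFinal(env.ciphertext), StandardCharsets.UTF_8);
    }

    static KeyPair newRsaKeyPair() throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        return gen.generateKeyPair();
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPair recipient = newRsaKeyPair();
        Envelope env = seal("Meet at noon.", recipient.getPublic());
        System.out.println(open(env, recipient.getPrivate()));
        try {
            open(env, newRsaKeyPair().getPrivate()); // someone else's private key
            System.out.println("unexpected: opened with the wrong key");
        } catch (GeneralSecurityException e) {
            System.out.println("wrong key rejected");
        }
    }
}
```

Anyone relaying the envelope sees only the wrapped key, a random nonce and ciphertext, which is why plain HTTP and an untrusted server are tolerable here. A fresh AES key per message also avoids reusing one fixed key (such as the `TheBestSecretKey` entry in the sample properties) for every transmission.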

The contains sample xml files for a demonstration. There are also two txt files containing the DOS commands to be used on Windows. If you prefer bash or another shell on Linux, the modifications needed are modest.

Both the sender and the recipient have two static xml files: one containing address book information, the other a properties file containing the private information of the party in question such as the user’s private key.

Sender address book, SenderAddressBook.xml

 <contact name="">
  <name>Recipient Lastname001</name>
  <publickey>-----BEGIN PUBLIC KEY-----\n
-----END PUBLIC KEY-----</publickey>

Sender properties file,

  <name>Sender Lastname002</name>
  <publickey encoding="none">-----BEGIN PUBLIC KEY-----\n
-----END PUBLIC KEY-----</publickey>
  <privatekey encoding="none">-----BEGIN RSA PRIVATE KEY-----\n
-----END RSA PRIVATE KEY-----</privatekey>

    <encryption algo="AES" >
      <key encoding="UTF-8">TheBestSecretKey</key>


Recipient address book, RecipientAddressBook.xml

  <contact name="">
    <name>Sender Lastname002</name>
    <publickey>-----BEGIN PUBLIC KEY-----\n
-----END PUBLIC KEY-----</publickey>


Recipient properties file,

  <name>Recipient Lastname001</name>

  <publickey encoding="none">-----BEGIN PUBLIC KEY-----\n
-----END PUBLIC KEY-----</publickey>
  <privatekey encoding="none">-----BEGIN RSA PRIVATE KEY-----\n
-----END RSA PRIVATE KEY-----</privatekey>
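To illustrate how a client could pull a contact’s details out of an address book like the samples above, here is a minimal JDK-only sketch. It assumes the `name` attribute on `<contact>` holds the alias used for lookup (the attribute values are blank in the samples) and that the contacts sit under some root element; the `<addressbook>` root and the alias `r001` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Looks up a contact in an address book shaped like the samples (structure assumed). */
final class AddressBook {
    /**
     * Returns the contact's child elements as name -> trimmed text,
     * or null if no contact has a matching name attribute.
     */
    static Map<String, String> findContact(String xml, String alias) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Address book files may come from elsewhere; refuse DOCTYPEs (guards against XXE).
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList contacts = doc.getElementsByTagName("contact");
        for (int i = 0; i < contacts.getLength(); i++) {
            Element contact = (Element) contacts.item(i);
            if (!alias.equals(contact.getAttribute("name"))) {
                continue;
            }
            Map<String, String> fields = new LinkedHashMap<>();
            for (Node n = contact.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(n.getNodeName(), n.getTextContent().trim());
                }
            }
            return fields;
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical root element and alias; key material elided as in the samples.
        String xml = "<addressbook><contact name=\"r001\">"
                + "<name>Recipient Lastname001</name>"
                + "<publickey>-----BEGIN PUBLIC KEY-----...-----END PUBLIC KEY-----</publickey>"
                + "</contact></addressbook>";
        System.out.println(findContact(xml, "r001"));
    }
}
```

Returning every child element as a map keeps the sketch agnostic about which other fields (such as the internet alias) the real files carry.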

Sending a message

The flow for sending a message is described here

The individual processing steps follow below. The numbers correspond to those on the graphic.

# step 1
Author the message, enclosed within the ‘text’ tags, in Written.xml
Update the timestamp.
The “to” field must match the alias of an entry in the address book. The recipient’s deliberately obscure internet alias is retrieved from the address book using this alias. The obscure internet alias acts as the recipient’s email address, but unlike a real email address its uniqueness is not assured. That uniqueness is not really required, though, as only the correct recipient is able to decrypt the message anyway.
The “from” field is populated with the sender’s internet alias kept in the

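During subsequent steps, digital signatures are created and validated. A signature covers the exact characters of the signed document, so a line break that changes on the way (for example LF becoming CR LF) is enough to make validation fail. To avoid such errors it is best to keep line break characters out of the xml documents.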
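The effect of a stray carriage return can be shown with a minimal JDK-only sketch. It signs the raw UTF-8 bytes with `SHA256withRSA`, which is an assumption: SmartDocument’s actual signing scheme is not shown in the post.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

/** Shows that changing LF to CR LF invalidates a signature over the raw bytes. */
final class LineBreakSignature {
    static byte[] sign(String doc, PrivateKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initSign(key);
        s.update(doc.getBytes(StandardCharsets.UTF_8));
        return s.sign();
    }

    static boolean verify(String doc, byte[] sig, PublicKey key) throws GeneralSecurityException {
        Signature s = Signature.getInstance("SHA256withRSA");
        s.initVerify(key);
        s.update(doc.getBytes(StandardCharsets.UTF_8));
        return s.verify(sig); // false when the bytes differ from what was signed
    }

    public static void main(String[] args) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair kp = gen.generateKeyPair();

        String unixDoc = "<message>\n<text>Hi</text>\n</message>";
        String windowsDoc = unixDoc.replace("\n", "\r\n"); // what a Windows editor might do
        byte[] sig = sign(unixDoc, kp.getPrivate());

        System.out.println(verify(unixDoc, sig, kp.getPublic()));    // true
        System.out.println(verify(windowsDoc, sig, kp.getPublic())); // false
    }
}
```

The two documents look identical on screen, yet only the one with unchanged bytes validates; a document with no line breaks at all has nothing to be rewritten.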

Flatten the Written.xml document using the flatten.xsl stylesheet.

java -cp "WEB-INF\lib\xalan.jar" org.apache.xalan.xslt.Process -IN Written.xml -XSL flatten.xsl -OUT Written_flattened.xml

cat Written_flattened.xml > Written.xml
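The `flatten.xsl` stylesheet itself is not shown. A plausible stand-in, and only an assumption about what it does, is an identity copy with `xsl:strip-space`, which drops the whitespace-only text nodes between tags. This sketch runs such a stylesheet through the standard `javax.xml.transform` API; `TransformerFactory.newInstance()` uses the JDK’s built-in XSLT processor, or Xalan if it is on the classpath. Line breaks inside the message text itself are not touched by it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical equivalent of flatten.xsl: identity copy minus whitespace-only text nodes. */
final class Flatten {
    private static final String FLATTEN_XSL =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" indent=\"no\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:strip-space elements=\"*\"/>"  // remove whitespace-only text nodes
            + "<xsl:template match=\"@*|node()\">" // identity template: copy everything else
            + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    static String flatten(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(FLATTEN_XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(flatten("<message>\n  <to>bob</to>\n  <text>Hi</text>\n</message>"));
    }
}
```

Run on the sample in `main`, it should print the message on a single line with the indentation and newlines between elements gone.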

# step 2
# Apply XSLT to create a smart document.
java -cp "WEB-INF\lib\xalan.jar" org.apache.xalan.xslt.Process -IN Written.xml -XSL Written2Transmitted_sd.xsl -OUT Transmitted_sd.xml

# step 3
# Execute SmartDocument

java -cp "..\build\classes;WEB-INF\lib\servlet-api.jar;WEB-INF\classes;WEB-INF\lib\commons-io-2.4.jar;WEB-INF\lib\commons-codec-1.8.jar" com.any14.smartdoc.SmartDocument Transmitted_sd.xml > Transmitted.xml

# step 4
# Send the message to

curl -d@Transmitted.xml
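For a GUI wrapper, step 4 could be done from Java rather than curl. This is a minimal sketch with the Java 11+ `java.net.http` client; the service URL is not given in the text, so it is taken from the command line. It sets the same `Content-Type` that `curl -d` sends by default, though whether the service requires it is not stated. Note that curl’s documentation says `-d` strips carriage returns and newlines when reading from a file (`--data-binary` does not), whereas `BodyPublishers.ofFile` sends the bytes verbatim, which is one more reason to flatten the document first.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

/** Java 11+ sketch of "curl -d@Transmitted.xml <url>": POST a file's bytes. */
final class PostMessage {
    static HttpRequest buildRequest(URI endpoint, Path file) throws FileNotFoundException {
        return HttpRequest.newBuilder(endpoint)
                // The content type curl -d sends by default.
                .header("Content-Type", "application/x-www-form-urlencoded")
                // Sent verbatim: unlike curl -d @file, no CR/LF stripping.
                .POST(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: the service endpoint, which the text does not give.
        HttpRequest request = buildRequest(URI.create(args[0]), Path.of("Transmitted.xml"));
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```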

# step 5
# Apply XSLT to the sent message to create the smart document used to create a secure local copy

java -cp "WEB-INF\lib\xalan.jar" org.apache.xalan.xslt.Process -IN Written.xml -XSL Written2SenderStorable_sd.xsl -OUT SenderStorable_sd.xml

# step 6
# Execute SmartDocument

java -cp "..\build\classes;WEB-INF\lib\servlet-api.jar;WEB-INF\classes;WEB-INF\lib\commons-io-2.4.jar;WEB-INF\lib\commons-codec-1.8.jar" com.any14.smartdoc.SmartDocument SenderStorable_sd.xml > SenderStorable.xml

# step 7
# Apply XSLT to the secure local copy to create smartdocument for reading the sent message

java -cp "WEB-INF\lib\xalan.jar" org.apache.xalan.xslt.Process -IN SenderStorable.xml -XSL SenderStorable2SenderReadable_sd.xsl -OUT SenderReadable_sd.xml

# step 8
# Execute SmartDocument

java -cp "..\build\classes;WEB-INF\lib\servlet-api.jar;WEB-INF\classes;WEB-INF\lib\commons-io-2.4.jar;WEB-INF\lib\commons-codec-1.8.jar" com.any14.smartdoc.SmartDocument SenderReadable_sd.xml > SenderReadable.xml
cat SenderReadable.xml

Receiving a message

The processing flow for receiving a message is described here

And the individual processing steps follow here.

# step 1
# User retrieves recent message

# URL encode the xpath part
# update the timestamp as required (see Written.xml) to get recent messages

curl > Received.xml
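The XPath part of the retrieval URL has to be URL-encoded, since characters like `/`, `[`, `@`, `=` and quotes are not safe in a query string. A minimal sketch using the JDK’s `URLEncoder` (Java 10+ overload taking a `Charset`); the XPath expression and its `@to`/`@timestamp` attributes are hypothetical, as the post does not show the query it uses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Encodes an XPath expression for use as a query-string value. */
final class XPathQuery {
    static String encode(String xpath) {
        // application/x-www-form-urlencoded rules: space -> '+', unsafe chars -> %XX
        return URLEncoder.encode(xpath, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical query: messages to an alias, newer than a timestamp.
        String xpath = "//message[@to='x7f3q' and @timestamp>'2014-05-01']";
        System.out.println(encode(xpath));
    }
}
```

The encoded string is what would be appended to the (elided) service URL in the curl command; updating the timestamp in the expression is how only recent messages are requested.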

# step 2
# Apply XSLT Received2RecipientStorable_sd.xsl

java -cp "WEB-INF\lib\xalan.jar" org.apache.xalan.xslt.Process -IN Received.xml -XSL Received2RecipientStorable_sd.xsl -OUT RecipientStorable_sd.xml

# step 3

# Execute SmartDocument
# note: this step checks the digital signature. If the file has been on a Windows system, carriage return characters may well have been introduced, in which case the signature validation may fail.

java -cp "..\build\classes;WEB-INF\lib\servlet-api.jar;WEB-INF\classes;WEB-INF\lib\commons-io-2.4.jar;WEB-INF\lib\commons-codec-1.8.jar" com.any14.smartdoc.SmartDocument RecipientStorable_sd.xml > RecipientStorable.xml
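If step 3 fails because carriage returns crept in, one possible workaround is to strip them from the affected file and rerun the step, much as `tr -d '\r'` would in bash. This is a sketch, not part of the download, and it only helps if the signed content had no carriage returns to begin with, which the flattening step is meant to ensure. Which file picked up the CRs (for example `Received.xml` before step 2) depends on where it passed through a Windows system, so the path is a command-line argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Removes carriage returns from a file in place, like tr -d '\r'. */
final class StripCarriageReturns {
    /** Returns the number of CR bytes removed. */
    static int strip(Path file) throws IOException {
        byte[] in = Files.readAllBytes(file);
        byte[] out = new byte[in.length];
        int kept = 0;
        for (byte b : in) {
            // 0x0D never occurs inside a multi-byte UTF-8 sequence, so byte-wise removal is safe.
            if (b != '\r') {
                out[kept++] = b;
            }
        }
        Files.write(file, Arrays.copyOf(out, kept));
        return in.length - kept;
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of(args[0]);
        System.out.println("removed " + strip(file) + " carriage return(s) from " + file);
    }
}
```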

# step 4
Check whether the message is already in the “mail database”, and add it if not.
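The “mail database” is left unspecified. A minimal sketch, assuming a plain directory serves as the database and each message is stored under the SHA-256 digest of some stable identifying bytes. What those bytes should be is a hypothetical choice: something that is identical every time the same message is retrieved, such as its signature value or the bytes received from the server. If the storable copy is produced with fresh randomness on each run, hashing it would not catch duplicates.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** A directory used as a tiny "mail database": one file per distinct message. */
final class MailStore {
    /**
     * Stores the message unless one with the same identity is already present.
     * Returns true if it was added, false if it was a duplicate.
     */
    static boolean addIfAbsent(Path storeDir, byte[] identity, byte[] storable)
            throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(identity);
        StringBuilder name = new StringBuilder();
        for (byte b : digest) {
            name.append(String.format("%02x", b)); // hex-encode the digest
        }
        Files.createDirectories(storeDir);
        Path target = storeDir.resolve(name + ".xml");
        try {
            // CREATE_NEW fails atomically if the file already exists.
            Files.write(target, storable, StandardOpenOption.CREATE_NEW);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Path.of("RecipientStorable.xml"));
        // Hypothetical identity: the whole file. Swap in a stable message ID if one exists.
        boolean added = addIfAbsent(Path.of("maildb"), bytes, bytes);
        System.out.println(added ? "added" : "already in mail database");
    }
}
```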

# step 5 – retrieve a message to read it
# Apply XSLT RecipientStorable2RecipientReadable_sd.xsl

java -cp "WEB-INF\lib\xalan.jar" org.apache.xalan.xslt.Process -IN RecipientStorable.xml -XSL RecipientStorable2RecipientReadable_sd.xsl -OUT RecipientReadable_sd.xml

# step 6
# Execute SmartDocument

java -cp "..\build\classes;WEB-INF\lib\servlet-api.jar;WEB-INF\classes;WEB-INF\lib\commons-io-2.4.jar;WEB-INF\lib\commons-codec-1.8.jar" com.any14.smartdoc.SmartDocument RecipientReadable_sd.xml > RecipientReadable.xml