Category Archives: shared data

A Digital Commons

I was just reading Schneier's latest book “Data and Goliath” and was struck by a concept he mentioned: the commons. In the sense of a digital commons: open and available for anyone to use, not under the control of any specific entity.
Usenet still exists, but most online discussions are carried out on (corporate) entity-owned websites and platforms. Entities that can and do exercise editorial control.
Anyone who has spent any time in newsgroups on Usenet knows full well that it is the “wild west” out there. To the considerable chagrin of some.
A number of academic publishers have banded together to create a repository for academic publications and named it (a) Digital Commons.
Not under anyone's control and open to all, it satisfies some of the characteristics of a commons.
But going beyond the publication of static documents, what is there? In a commons many different activities were possible: pasturing animals, gathering firewood and foodstuffs. Perhaps a digital commons should allow for something more than making static files available to the public. Something interactive, something with some logic – i.e. code.

Steganographic applications

Steganography is the practice of concealing something in something else. For instance hiding a text message inside a larger body of text. There is no cryptography. The security of the message, such as it is, lies in obscurity. That may be enough. Cryptography is computationally expensive and requires management of encryption keys. And it is no more secure than the keys employed. It appears that the better cryptographic algorithms are quite good, and attacks tend to focus on the keys.
Attacking steganography would be different. Rather than asking “where is the key?”, one would ask “is there anything there at all?”
There are applications for conducting steganography, capable of embedding a message inside an image, for example. But this is not about conducting steganography.

What I am after here is for the application itself to be steganographic. To conceal the application, not inside another application, like a trojan, but altogether.

What would it mean for an application itself to be steganographic?

Imagine an application as a book. And using an application as the equivalent of reading that book from cover to cover.
Let's further imagine that we cut the spines off a number of books and shuffle the loose pages together like so many decks of cards. Assuming uniformity of paper quality, page size, layout and font, and no metadata on the page – having a loose page tells you little about which book it was from. Or whether it came from any book at all.
If you have access to the full text of the original books that were shuffled together, you could do a text comparison and quickly identify which book the page came from. But in the absence of that there would be little to go on. Analyzing the text on the pages would yield some clues and more or less probable guesses could be made. Particularly if the books had comparatively few pages, i.e. if any one page contained a large fraction of the complete text.
The obfuscation could be further improved by cutting the pages up into individual lines of text.
Identifying a line as belonging to only one particular book would be difficult in many cases.

On a side note: what difference does it make whether we cut the pages up along the lines or across them, for the purpose of putting the original page back together? Again assuming uniformity of paper, layout and font. I venture that a line of text is harder to match than a column of text. There is a certain balance here. While a line contains more meaningful information, a whole sentence perhaps – it also has fewer markers, fragments of words, that would match it to the strips of paper on either side of it. You gain information in one way and lose it in another. Specifically the anchoring within the larger body.

But the analogy we’re pursuing here is application functionality as reading a complete text: whole book to page to line.

Alright, so we have a large pile of individual strips of paper. Now, how do we read the book?

The premise being that the application, the book in our analogy, is concealed among a large body of code, where only those who know exactly what to look for can find it. The secret in steganography is not the decryption key but how to find what you are looking for – the detection key, if you will.
A steganographic application would have to have a detection key. With it you can locate the application; without it you cannot. A link to the first page, in the book analogy.

In a book you can just turn the pages, but here all the pages are separated from one another. How do you get from page to page – or from one line to the next, in the more fine-grained analogy?

Each page has a link to the next page, but that link is only usable by someone who has the key. Otherwise anyone stumbling on a page would discover the whole book. The idea was that a page does not guarantee a book. Given a pile of pages you can't know how many books are in it, or even if there are any at all. All you have are disjointed pages. Or code.
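As a minimal sketch of how such keyed links could work – the flat fragment store, the address derivation and all the names below are my own illustration, not a specification – the address of each fragment could be derived from the detection key, so that only a key holder can walk the chain:

import hmac, hashlib

def fragment_address(detection_key: bytes, book_id: str, page_no: int) -> str:
    """Derive the address under which a fragment is stored."""
    msg = f"{book_id}:{page_no}".encode()
    return hmac.new(detection_key, msg, hashlib.sha256).hexdigest()

# A flat store of fragments. Without the key the addresses look like
# random identifiers and reveal nothing about which "book" (application)
# a fragment belongs to, or whether there is a book at all.
store = {}

def publish(detection_key: bytes, book_id: str, pages: list[str]) -> None:
    for i, page in enumerate(pages):
        store[fragment_address(detection_key, book_id, i)] = page

def read(detection_key: bytes, book_id: str) -> str:
    pages, i = [], 0
    while (addr := fragment_address(detection_key, book_id, i)) in store:
        pages.append(store[addr])
        i += 1
    return "".join(pages)

Anyone holding only the store sees a pile of fragments; anyone holding the detection key can reassemble the book.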

sailing the data ocean

What if there were a way for access to data to be authorized everywhere? If you were authorized to access a piece of data, you could get access to it wherever it happened to be located.

This is not the way things work at the moment, for sure, but if it could be made to work in a convenient way, what should it be like?

When the first web browsers arose some 20+ years ago, static HTML pages and other media and document files were available by calling a URL over the HTTP protocol. Security was added – at first pretty coarse-grained: if you were logged in you could access pretty much anything. It got better.
Comprehensive tools became available to centrally manage all web access to any document, with the finest granularity. The writing of the rules for who should have access to do what, when and where, could be delegated to those who were actually in a position to know.
But crucially, these tools could only manage access to what was directly under their control, often operating in reverse-proxy mode intercepting HTTP traffic. APIs were available through which other applications could tap into them and take advantage of the access control rules contained in them to do their own authorization. In this way the data under the control of a unified set of access control rules could be made corporate-wide. Access to all of the data in a corporation would be governed by rules maintained in one place. Everyone would play together in the same data security pool.
In practice this never happened. (Re-)writing applications to take advantage of the API of the chosen security software platform was too expensive. Other tools emerged to export the security rules from one software platform to another, leaving them to do their own enforcement through their own rule infrastructures. This didn't work very well because it was too complicated. Rules are fundamentally about meaning, and meaning doesn't translate easily. Nevertheless this was an attempt to federate authorization.
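To make the idea concrete, the kind of API call involved might look something like the sketch below. The class and method names are hypothetical, not those of any particular product; the point is simply that an application asks a central rule set for a decision instead of keeping its own copy of the rules.

class CentralPolicyService:
    def __init__(self):
        # Rules maintained in one place: (role, action, resource prefix).
        self.rules = [("clerk", "read", "/invoices/"),
                      ("auditor", "read", "/")]

    def is_authorized(self, role: str, action: str, resource: str) -> bool:
        return any(action == a and resource.startswith(prefix)
                   for r, a, prefix in self.rules if r == role)

# An application delegating its authorization decision to the shared rules.
policy = CentralPolicyService()
assert policy.is_authorized("clerk", "read", "/invoices/2014/0042")
assert not policy.is_authorized("clerk", "read", "/payroll/ceo")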

Data protected by the same access control rule infrastructure is part of the same pool. A database is a single pool. It has its own internal security governing access to the individual pieces of data contained in it, but has no reach outside. The database maintains its own list of who gets to access which columns in which tables.
A server has its own internal arrangement for governing access to the data in its own file systems. It may also have access to remote file systems. Some remote file systems would be on other servers, which would govern access themselves (NFS, FTP, Samba etc.) and would therefore not be part of the server's own pool.

If authorizations could be federated between pools, all data would exist in one big virtual pool.
A virtual pool made up of multiple physical pools; individual databases, file servers etc. At present this is difficult: there may be user federation between some data pools, but each pool has its own authorization, its own way to enforce access rules. The rules in one are not known, or directly enforceable, in another. There is no federated authorization.

Let's further suppose that any piece of data in this virtual data pool – a data ocean, really – is accessible over TCP with a URI. The URI may have various formats depending on what type of physical pool is being addressed.
For example, this would be the syntax of a URI accessing a directory (LDAP) store

ldap[s]://hostname:port/base_dn?attributes?scope?filter

And this to access an individual file, using HTTP(S)

https://host:port/path

Access to one of the secure web reverse-proxies mentioned above would look like this too.

There would be many others. Note that the username and password do not appear. There would not be any prompting for this information either.
Access control would be through PAML tokens, passed in the headers. An SSL handshake would take place to establish the requesting entity's authorization for the tokens presented.
Each physical pool is defined by the entity that controls access to it, and all of these entities – be they LDAP servers or file/web servers as in the URI examples above – must be equipped to handle PAML tokens to verify the authorization of the request. Through the acceptance of these PAML tokens the pools together form a virtual data ocean. Any application can call on data anywhere else and present PAML tokens for authorization.
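In practice a request into the data ocean might look something like the sketch below. The header name, token strings and hostnames are assumptions of mine; the point is that the tokens ride in the headers while the TLS handshake (here with a client certificate) ties the request to the holder of the corresponding private key.

import requests

def fetch(uri: str, paml_tokens: list[str], client_cert: tuple[str, str]) -> bytes:
    response = requests.get(
        uri,
        headers={"PAML-Token": ", ".join(paml_tokens)},  # hypothetical header
        cert=client_cert,   # client key pair used in the TLS handshake
        timeout=10,
    )
    response.raise_for_status()
    return response.content

# Usage, with made-up host and token names:
# data = fetch("https://files.example.com:8443/reports/q3.pdf",
#              ["token-for-reports", "token-for-q3"],
#              ("client.crt", "client.key"))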

This leaves quite a bit of scope for application architecture. The use of a PAML token requires access to the private key of the user to whom the PAML token was issued. Which means that if a user is engaged in a transaction with an application, and this application needs access to data kept somewhere else on behalf of the user, the application can only present its own PAML tokens, not forward those it has received from the user. The user must at a minimum contact this other data store directly and engage in an SSL handshake. This way the user's ownership of the public key is established for the benefit of the data store. The application can then pass the PAML tokens received from the user on to the data store, and the store would now know that those PAML tokens are OK to use; or the user could make the data retrieval directly and pass the data to the application that needs it. Sort of like a data federation.

Note that PAML tokens are tied to data, not to any particular host environment. Among other things this means that the requesting client may send the server a considerable number of tokens in order to establish authorization for all the required data. The server will grant access to the union of what those tokens authorize.
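The union rule itself is simple enough to sketch. The token format below is invented for the illustration; whatever the real format, the server just merges the resources the presented tokens cover:

def authorized_resources(tokens: list[dict]) -> set[str]:
    granted: set[str] = set()
    for token in tokens:
        granted |= set(token.get("resources", []))
    return granted

tokens = [
    {"issuer": "hr-pool",      "resources": ["/employees/42/name"]},
    {"issuer": "finance-pool", "resources": ["/payroll/42", "/expenses/42"]},
]
print(authorized_resources(tokens))
# {'/employees/42/name', '/payroll/42', '/expenses/42'}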

IT security shapes business models

text slightly reworked from a talk given to OIC, Oslo, Dec. 2014

In his article “The Nature of the Firm”, Ronald Coase proposed that a firm forms and grows as long as transaction costs inside the firm are lower than outside it. I.e. a firm can do something cheaper and better in-house than by going out into the marketplace for it.

OK, that’s a company, but what about their data? It seems that something along those lines is going on there too.

Most of us have come across this in the form of federated login. You’re about to log in to a web application and are given the option of logging in via Facebook. Or Twitter. OK, what happens if you click on the Facebook button? Clearly something, since you enter the application in question and are now known as yourself.
This is where federated security comes in. If you were already logged in to Facebook, a message was sent to Facebook where your valid logon session was used to prepare a special access token. A token that the new application could use to establish a new session for you. If you were not already logged in to Facebook, you could log in then and the same thing would happen.
In any case you log in only to Facebook, and other web sites take advantage of this to log you in to their sites too.
Acting in this capacity Facebook is called an Identity Provider. And no, that is NOT a deliberate pun. A company or application that uses such a federated login is, in the jargon, called a service provider.
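Stripped of the protocol details, the exchange behind the button click amounts to something like the sketch below. It is a generic illustration, not Facebook's actual API; the class names and the session cookie are made up.

import secrets

class IdentityProvider:
    """Facebook's role in the example: it already knows who is logged in."""
    def __init__(self):
        self.sessions = {"alice-session-cookie": "alice"}
        self.issued = {}

    def issue_token(self, session_cookie: str) -> str:
        user = self.sessions[session_cookie]      # existing login session
        token = secrets.token_urlsafe(16)
        self.issued[token] = user
        return token

    def verify_token(self, token: str) -> str:
        return self.issued.pop(token)             # one-time use

class ServiceProvider:
    """The web application that outsources its login."""
    def __init__(self, idp: IdentityProvider):
        self.idp = idp

    def login(self, token: str) -> str:
        user = self.idp.verify_token(token)
        return f"new local session for {user}"

idp = IdentityProvider()
app = ServiceProvider(idp)
token = idp.issue_token("alice-session-cookie")   # the user clicks the button
print(app.login(token))                           # new local session for alice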

Returning to Coase again. The application can have its own login and user database, doing it in-house. Or you can go out into the market and contract for this service, as he suggests, allowing login through Facebook.
OK, how much would such an external login service cost in the market? As it turns out, not very much.
For one thing, the marginal costs are low. Once you already have the users and their passwords in your database, adding a token exchange whereby other applications can send their users over for authentication is quite cheap.
Not only is it cheap to allow third parties to use your login process, there is also revenue in it. You can now guarantee that the users have the same user id across platforms. All kinds of people would be very interested in that, and would pay good money to ensure that it happened. And not just the NSA. Advertising and other businesses who make their money by analyzing consumer behavior like federated login a whole lot. If you allow other companies to log in their users via your platform, you control the origin of this data.

Considering that the economies of scale are so much in favor of federated login and those who make it happen, why aren’t all logins federated?

Well, more and more are.
But the savings may not be that large on the service provider side, the in-house part of Coase's analysis. Just because you leave the authentication of your users to someone else doesn’t mean you can then forget about them. There is also that bit about authorization. What are people allowed to do? From a security point of view the whole point of identifying the users in the first place is so that we can then look at our rules for what the user should be permitted to do. If we don’t know who they are, we don’t know what they are allowed to do.

So you still end up needing a user database. And a database of rules for what the users are allowed to do. And you have to maintain those things. Particularly that last bit can be complicated and expensive. Once you have all those things, handling the login isn’t that big of a deal.

From a revenue point of view things are a bit different. Identifying the users has its own value – selling information about the user to third parties.
You could limit yourself to just logging the user in, so that you may accurately track him, but not bother with doing any authorization.

And this is what we see a lot of. Plenty of media sites accept federated login through the likes of Twitter and Facebook.

They only care about getting your name accurately. Their revenue comes from knowing it.
There is no fine-grained access control or authorization going on. It is not worth it to them. Disqus.com is one example of this; your email address becoming your universal user id is another.

Clearly Coase’s balance between transaction costs in the marketplace and in-house is still very much in play.

This suggests a possible future.
If we think about a corporation as being fundamentally an agglomeration of data and business practices, then interaction between businesses – trade – is an interaction between these agglomerations.
Returning to Coase’s starting question: the boundary between a corporation and the market, and why it lies where it does.
What defines the boundary of these data agglomerations? The authorization rules. The data might be located anywhere. At Dropbox even. But if you write the rules governing access to it, that data is part of your agglomeration, wherever it is. If someone wants to interact with your data or your business processes, you must write the rules to allow it to happen. Your rules map out your agglomeration. Outside the reach of your rules is outside your boundary.

Which sketches out the premise that the limit on the size and shape of a corporation is really a limit of information technology. Specifically the limit of authorization technology.

This is all very well, but what does that look like in real life? What are the business strategic implications of such limits? If they even exist.

There are grounds for suspecting that they do.

We have technical standards for federated login. Therefore we have the Facebook login example. A user known from other places on the web is more valuable on the consumer profiling market than one known just to yourself. Both parties gain. The consumer not so much.

But we do not have standards for federated authorization, much less any off-the-shelf technologies for it.

What should we expect to see if federated authorization were possible?

Truly distributed content for one thing.
At the moment digital content is licensed, which is another way of saying it is rented. Meaning that the right to authorize access to it has been delegated to the licensee for a limited period of time. And the licensee writes the authorization rules for the duration. A temporary data transfer from one agglomeration to another. The data is not really distributed, certainly not by you. Someone else distributes it on their own behalf.
This arrangement is risky for both parties. If the owner overcharges for the license no one will buy one. Undercharge and you leave money on the table. For the licensee the risks are the reverse. Overpay and you lose money. This makes the distribution process contractually complicated, in no small measure because there is no way to federate authorization. These risks mean that there will be much less of this business than there potentially could be.

So the idea that buying a ticket to a movie would get you a free repeat on Netflix is, as things currently stand, out of the question. Despite the obvious advantages for all concerned. The movie companies and Netflix are separate agglomerations.

And it’s not just media. All kinds of content. Medical records: it is impossible to have fine-grained access control to the individual data items in such a record after it has been compiled. Therefore its distribution is exceedingly complex and cumbersome. If you can see any part, you can see the whole. If access to an individual item could be controlled wherever the document was, it would be simple to distribute it. Those who needed to see an individual data field, and had authorization to see it, could see it, without always having to be cleared to see the whole thing. As it is, we now have checkout clerks in pharmacies with access to much of your medical records because they need to verify one small item in them.

another Silk Road takedown

here we go again… The powers that be have taken down some evildoers hiding behind TOR. I must ask pardon for being a bit jaundiced about the whole thing.
Clearly there are people doing undesirable things taking advantage of the protection offered by TOR. But TOR has a legitimate purpose, and the support behind TOR is impeccable. Still, TOR is not perfect; specifically, it is not foolproof. User error compromises it.
But I’m wondering if there is another way for the vendors on Silk Road to conduct their business. Leaving aside the problems of payment for the moment: Bitcoin has its own weaknesses.
Let’s concentrate merely on how two parties can get in touch with each other. Where A is looking for something that B has to offer, put very generically.

If my understanding of the attack against Silk Road is correct: It is based on having control over a significant portion of the onion routers. Sufficient for being able to establish a pattern of traffic from yourself to a particular .onion address.
Each request packet is routed every which way by the TOR protocol. But with control of enough routing nodes a pattern will nevertheless emerge, since the request packets must eventually end up in the same place: where the concealed service is hosted.
This last bit is true even without implementation errors. DNS leakage and the like.
What if the hidden service is not hosted on just one particular server, but on many different ones, so that using the service involves traffic to many different locations? Is this possible, and would it make a service concealed behind an onion routing scheme undetectable?
That depends. If the hidden service is hosted on a fixed set of servers rather than just one, there is no real difference. Just that more traffic now needs to be analysed to be able to pinpoint them. This could be counteracted by moving the service often, but that is a defence mechanism independent of how many hosts the hidden service is using. And controlling more onion routers will help the tracker.

Trusted VPNs are of course an option, where both the client and the hidden service use a VPN to tunnel to some other part of the internet. The tracker is then left with finding a proxy rather than the real host. It is still possible that the tracker might use traffic analysis to get from the proxy to the real host, as long as the service stays put long enough.

This suggests a possible course of action. Can the hidden service be dynamically located? Perhaps even randomly.
I think I’ll work on that. A fascinating challenge. Watch this space.

computing when you don’t trust the platform

This appears to be virtually a contradiction in terms. In the IT security racket we have always stressed the importance of having physical control of the hardware. Anyone with physical access to the device could gain root access to the system and from then on do pretty much anything they wanted.
By extension we could only trust the application, and the safety of our data contained in it, if we also trusted the physical safety of the hardware.
The Microsoft verdict is interesting in this regard. If the cloud vendor is a US company, a US company owns the hardware, and the data is thus subject to US jurisdiction regardless of where it is located. At the moment we do not know whether this verdict will survive a Supreme Court appeal. But in any eventuality the conundrum remains: can we have trusted computing on an untrusted computing platform?

I’m going to try to demonstrate that this is possible for a server application.

Clearly a cloud-based web application constructed to work on an untrusted platform would be very different in its internal workings from how things are generally done now.
The working model that I propose to pursue will be a web service with thick clients. The web service will run on a cloud vendor subject to US jurisdiction: AWS. I will demonstrate that the sensitive application data will sit on AWS in encrypted form, will at no time be decrypted on AWS, and that the encryption keys will at no point be present on AWS in the clear.
The thick client will be an Android app.
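The core of the idea can be sketched briefly. This is my own illustration of the principle, not the actual implementation, and the real client would be the Android app rather than Python: encryption and decryption happen on the client, so the cloud side only ever holds ciphertext.

from cryptography.fernet import Fernet

# The key is generated and kept on the client; it is never sent to the server.
key = Fernet.generate_key()
f = Fernet(key)

plaintext = b"sensitive application data"
ciphertext = f.encrypt(plaintext)      # this is what gets uploaded to AWS

# The untrusted server can store and return ciphertext, nothing more.
stored_on_server = ciphertext

# Back on the client, the data is decrypted locally.
assert f.decrypt(stored_on_server) == plaintext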

In the event that I am successful using a thick client (app), I will attempt the same using a thin client (web browser).

untrusted computing base

Writing this as Microsoft has lost an appeal against a court ruling that it must hand over data stored in one of its overseas data centers to a US authority, as stipulated by a search warrant issued by a US court.

This was likely to be the verdict for a while now, and it will cause problems for other US-based IT cloud vendors hoping to attract business outside the US. Not as much as some may fear, as most consumers of IT services are quite oblivious. But a few, and perhaps many more than a few, will have concerns. Possibly decisive ones.

We implicitly trust our IT to do what we want it to, and nothing else. When all of the IT infrastructure was in-house this was easy enough. As more and more of our IT has moved to external vendors or to the cloud, this trust has become more explicit. The trust was always there. Malicious insiders are a problem of long standing, and moving IT away from their grasp has reduced the risk they pose, but in exchange for other, less clear, risks. One of those risks has been made apparent by the Microsoft case: external jurisdictions.

The case of an employee of a Swiss bank who stole information about account holders and sold that information to various countries’ tax authorities is a notorious example of how IT insiders can harm the business and extend the reach of foreign jurisdictions. That case involved a plethora of criminal behavior, most obviously theft and the buying of stolen property. With the Microsoft verdict the tax authorities can achieve the same results without resorting to enlisting the aid of criminals.

If the bank in question had used an external data center owned or controlled by an entity in a compliant jurisdiction, the tax authorities could have gotten a search warrant against the data center and obtained the desired data – a warrant they would not have been able to obtain against the bank itself.

Clearly the legal waters have been muddied, and the only thing that is clear is that we cannot trust our computing infrastructure in quite the same way that we have become used to.


the case for a virtual application server

Virtual hardware has been around for decades. Mainframes have been able to host multiple operating systems, and IBM is doing good business selling boxes capable of hosting thousands of Linux instances. The economics of such co-location are very compelling.

Each such virtual OS instance thinks it is alone on the hardware but is in fact sharing it with others. If done well this reduces the amount of hardware sitting idly waiting for work. Something which was distressingly prevalent back when every single server application needed its own hardware.

Times have moved on. To the regret of Sun Microsystems, which was a leading casualty of the move from physical to virtual hardware. But all the box vendors have taken a hit.

But it is still the case that applications tend to have their own separate virtual box. The argument springs from several points. Developers develop against particular versions of software libraries and so depend on exact versions of application servers. A virtual OS instance tends to have only one instance of an application server, and so you end up with one OS instance per application. Plus more for redundancy and DR.

The upshot is that to run your application you end up running a stack of other software too. Software which is not part of your business logic. Without doing any useful work, all this software still needs to hum along and consume precious hardware resources. And that’s with virtualization hopefully packing everything efficiently onto the hardware so as not to waste it.

Then there are all those system instances that need to be configured and maintained.

The virtualization of hardware saved a lot of space and service work in the server room. I recall having to go in there to load CDs and reboot servers. With virtualization this is all gone.

Still I have to work with operating systems and application servers.

I want to find a way to get rid of those too.

Just like hardware was virtualized so that multiple applications could easily run on the same hardware, it is time for application servers to be virtualized too.

How should something like that work?

A virtual application server. It should be able to run any number of different applications simultaneously. Currently it is good practice to have separate AS instances per application. For a virtual AS this no longer makes sense.

Physical hardware can run any number of virtual OSes (resources permitting) so the virtual AS should be able to run any number of applications.

This would be something quite different from a Java application running on Tomcat. A collection of Java classes, supported by libraries. But which versions of those libraries? The developer depends on very specific versions. It is not accidental that one Tomcat instance per application is preferred. The whole programming model demands it. So something new is needed.

One can argue that with hyperlinks the whole internet is such a virtual application server, in that all functions are available everywhere over URLs. Web services essentially work along these lines. But that still leaves the problem of all the application server instances. One for every web service. But what if we could have one application server that could act as any web service – simultaneously?
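As a toy illustration of one server process hosting several services side by side – not a design for the virtual application server itself, and the application names below are made up – a single dispatcher can serve any number of registered applications at once:

from http.server import BaseHTTPRequestHandler, HTTPServer

# Applications are just callables registered under a path prefix.
applications = {
    "/store": lambda path: f"store app handling {path}",
    "/wiki":  lambda path: f"wiki app handling {path}",
}

class VirtualAppServer(BaseHTTPRequestHandler):
    def do_GET(self):
        for prefix, app in applications.items():
            if self.path.startswith(prefix):
                self.send_response(200)
                self.end_headers()
                self.wfile.write(app(self.path).encode())
                return
        self.send_response(404)
        self.end_headers()

if __name__ == "__main__":
    # Both "applications" are served by the same instance, simultaneously.
    HTTPServer(("localhost", 8080), VirtualAppServer).serve_forever()

What this sketch leaves out is precisely the hard part discussed above: letting each application bring its own library versions without dragging in a separate server instance.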

This ties into an objection I have against AWS. Setting up an application in Amazon’s cloud infrastructure requires configuring instances of OS and application server, and selecting the size and version of those instances. This suggests to me more scope for virtualization. True, it gives me a precise level of control, much like what I’d have if I hosted everything myself. But I don’t want it.

When I drive a car I don’t want to care about the surface of the road. I just want to get from point A to point B by car.

There is another trend going on now: thick clients. Or apps, in the form most of us encounter them. Web services consumed by other applications rather than by thin clients have been around for a while, but the massive profusion of apps on tablets and cellphones is recent.

Many of these apps are stand-alone; there is no server component to their functionality. An app version of Solitaire can be solitary in every sense. But many, perhaps most, do have one. So to create an app is also to create a server-side application to go with it. Possibly one server application for every app, maybe more; but in many cases one server application can serve multiple apps. Google’s APIs are an example of this. In any case a lot of server applications are required, with functionality tailored to the requirements of the app. The app designer can avail themselves of APIs, like those of Google mentioned above, or other forms of more or less generic web services. But in all those cases the app designer is constrained by someone else’s design. To not be limited, app designers will have to have a server application completely of their own design.

Here the virtual application server can come into its own.