Monday, January 25, 2010

What is zkBox?

In my previous post I gave a short account of how the zkBox idea appeared. Now I’ll try to refine the concept a bit and describe in more detail what zkBox is meant to be used for.

Although the concept was very simple, I could not find any easy-to-use implementations of it. The idea had everything it needed to have great potential. In short, having a host-proof storage platform on top of which client applications could be easily developed sounded very appealing to me.

Some might wonder what “host-proof” storage is. It’s about having a storage back-end designed in such a way that no one but the owner of the data is able to read it. The answer is a no-brainer: just store everything in encrypted form. Add to this one extra ingredient, the fact that the server side should never be able to obtain the user’s encryption key (this is where a “zero-knowledge password proof” kicks in; I’ll present this later), and voilà, you have a storage solution that stays secure even if the database gets stolen: the attacker will not be able to do anything with the data.
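To make the “server never gets the key” part more concrete, here is a minimal sketch, assuming Node.js and its built-in crypto module. The helper name deriveKeys, the salt handling and the PBKDF2 parameters are my own illustrative assumptions, not zkBox’s actual design, and this is a plain key-separation trick rather than a real zero-knowledge password proof. It only shows that one password can yield a value to authenticate with and a separate encryption key that stays on the client.

```javascript
const crypto = require("crypto");

// Hypothetical helper: derive two unrelated values from one password.
function deriveKeys(username, password) {
  // Assumption: the username doubles as a salt to keep the demo short;
  // a real system would rather use a random per-user salt.
  const master = crypto.pbkdf2Sync(password, "zkbox-demo:" + username, 100000, 32, "sha256");
  // HMAC with distinct labels yields two values; knowing one does not reveal the other.
  const authToken = crypto.createHmac("sha256", master).update("auth").digest("hex");
  const encryptionKey = crypto.createHmac("sha256", master).update("encrypt").digest();
  return { authToken, encryptionKey };
}

const keys = deriveKeys("john", "my secret password");
// Only authToken would ever be sent to the server; encryptionKey stays in the client.
console.log(keys.authToken.length, keys.encryptionKey.length); // prints: 64 32
```

Because the two values are derived with different labels, a stolen database of auth tokens would not, under these assumptions, hand the attacker the encryption keys.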

Architectural thoughts
After endless analysis sessions, things were starting to connect to each other and the shape of a complete solution started to emerge.
  • It was becoming clear that an online solution was a good decision (e.g. everything is getting connected nowadays, there is a central point of maintenance, etc.), but room should be left for the possibility of going offline, with synchronization afterwards, when needed (e.g. HTML5’s offline storage, Google Gears, etc.)
  • The main functionality of the system is to keep the data encrypted and accessible only to one user by a given key, so the system started to slip into the realm of key-value storage systems. That’s it: no complex database schemas, no relations, no graphs; only keys mapped to encrypted values, accessible to the user. This certainly limits the type of applications that can be built on top of zkBox but, depending on the project’s impact, the philosophy might change in the future to allow something like “object sharing” functionality (e.g. via shared keys). However, in the first version, the stored objects will be accessible only to one user
  • We are discussing a hosted solution, so it seems natural that the service will have to be exposed via an API; the most likely client is obviously a browser scripting language, so a JavaScript client must be implemented from the beginning
  • Many computations, like the CPU-heavy encryption, will be performed on the client; this has the advantage that there will be less pressure on the zkBox server, since the encryption will be distributed across the clients, but it has the disadvantage that some clients are not powerful enough. However, the disadvantage tends to go away as technology evolves and clients get faster and faster. At the moment of writing this article, the blazing fast WebKit engine (e.g. as embedded in Chrome or Safari) or the JavaScript engine from Firefox can do heavy computations without disrupting the user’s experience, but other browsers, like Internet Explorer, are not there yet
  • Since zkBox is a key-value storage, it should not be a big problem to have an architecture that is truly scalable and reliable; there are already nice solutions out there, ranging from ordered key-value stores (e.g. Tokyo Tyrant or MemcacheDB) and document-oriented stores (e.g. MongoDB or CouchDB) to hosted solutions in the big clouds (e.g. Google’s BigTable or Amazon’s SimpleDB).
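The key-value model from the list above can be pictured with a tiny in-memory stand-in for the back-end. This is only a sketch: the class name OpaqueStore and its put/get methods are hypothetical, and a real deployment would sit on one of the stores mentioned above rather than on a Map. The point is that the server needs no schema at all; it files an opaque blob under a user and a key.

```javascript
// Hypothetical in-memory stand-in for the key-value back-end.
class OpaqueStore {
  constructor() {
    this.users = new Map(); // userId -> Map(objectKey -> opaque ciphertext)
  }

  put(userId, objectKey, ciphertext) {
    if (!this.users.has(userId)) {
      this.users.set(userId, new Map());
    }
    this.users.get(userId).set(objectKey, ciphertext);
  }

  get(userId, objectKey) {
    // Returns undefined when this user has nothing stored under the key.
    const objects = this.users.get(userId);
    return objects ? objects.get(objectKey) : undefined;
  }
}

const store = new OpaqueStore();
// The value is a placeholder for whatever ciphertext the client produced.
store.put("john", "expense-1", "<ciphertext produced by the client>");
console.log(store.get("john", "expense-1"));
console.log(store.get("alicia", "expense-1")); // another user sees nothing
```

Keeping one inner map per user mirrors the “accessible only to one user” rule of the first version; sharing would need a different layout.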

Main goals
The above are only a few points presenting how zkBox should work, what zkBox is and what it’s not, what it should do and how. Some initial ideas were improved, others were postponed, and new functionality appeared during the first months, but the main goals of zkBox were always the same:
  • To provide developers with a solution that they can use from their applications to securely store their users’ data
  • The design should be such that the storage provider is not able to see the data that is stored
  • Moreover, the application that is using the zkBox storage should not be able to see the data that the application users are manipulating
  • To have an anonymous authentication mechanism
  • To be a scalable and reliable solution, preferably built on top of existing (cloud) architectures

The implementation behind this idea is, again, not rocket science and, in short, can be presented in a few steps:
  • Authenticate to the storage
  • Encrypt some data with my secret key
  • Send the encrypted data to be stored on the server
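The encrypt-then-send steps above can be sketched roughly as follows, assuming Node.js and its built-in crypto module. The function names encryptObject and decryptObject and the choice of AES-256-GCM are my assumptions for illustration; authentication is left out, and the secret key is simply random here instead of being derived from the user’s password.

```javascript
const crypto = require("crypto");

// Encrypt an object with a 32-byte secret key (AES-256-GCM is an assumed choice).
function encryptObject(secretKey, object) {
  const iv = crypto.randomBytes(12); // fresh nonce for every message
  const cipher = crypto.createCipheriv("aes-256-gcm", secretKey, iv);
  const data = Buffer.concat([cipher.update(JSON.stringify(object), "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

// Reverse of encryptObject; throws if the key is wrong or the data was tampered with.
function decryptObject(secretKey, box) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", secretKey, Buffer.from(box.iv, "base64"));
  decipher.setAuthTag(Buffer.from(box.tag, "base64"));
  const plain = Buffer.concat([decipher.update(Buffer.from(box.data, "base64")), decipher.final()]);
  return JSON.parse(plain.toString("utf8"));
}

// Assumption: the key already exists on the client; it is random for this demo.
const secretKey = crypto.randomBytes(32);
const box = encryptObject(secretKey, { spent: "47$", description: "dinner with Alicia" });
// `box` is what would be sent to the server; it carries no plaintext.
console.log(decryptObject(secretKey, box).spent); // prints: 47$
```

Only the holder of secretKey can turn the stored box back into the original object, which is the whole point of the host-proof approach.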

What’s challenging is to have the project work as a whole, with everything in the right place and all the pieces functioning together, so that someone who wants to use it when developing, for example, a personal finance application will just write a few lines of JavaScript like:

// John logs in to the application
$zkBox.Authenticate("john", "my secret password");


// store some data in the system
$zkBox.AddObject({ spent: "47$", when: "11/26/2009", description: "dinner with Alicia"});


Only two lines, that simple. Everything else happens behind the scenes: encryption, caching, authentication, authorization, redundant storage, etc.

In the next post I’ll talk in more detail about the technical solution chosen.
