Amazon S3 – Backstory for Nerds

Note to non-nerds – back to English tomorrow; you may want to skip this one.

Here’s the problem: My broadbandwiki browser application does two things for users: it shows them what broadband access people around them are using and it enables them to add themselves to the map so that their information can be helpful to others. Clearly the data the users provide needs to be stored. However, only some of this data is meant to be visible to other users. Street addresses, for example, are used to generate latitude and longitude but are not meant to be publicly accessible even though they may later be used for engineering studies.

Controlling who has access to what data would be relatively straightforward if there were an application running on a server somewhere between the app running in the browser and the Amazon S3 servers where the data’s stored – but there is no such server application in this case. Nevertheless, the needed security CAN be maintained using available S3 tools.

Data on S3 is stored in buckets. The model is simple; you put objects (lumps of data) in buckets. Each object has exactly one key by which it can be retrieved from the bucket. Objects can also have reasonably elaborate headers.

You, the owner of the S3 buckets, get to decide who can read or write (or change the permissions) on a bucket-by-bucket basis. Each object also has an access control list (ACL) associated with it, which it does NOT inherit from its bucket; the ACL is set when the object’s created and may be changed later by those who have permission to change it.

So I set up a bucket called broadbandwiki for my beta; anyone can read this bucket but only I can write to it or change its permissions. Reading the bucket, however, doesn’t mean reading the contents of the objects in the bucket (they have their own permissions). In practice, the right to read the bucket means the right to read the index of objects stored in the bucket which includes keys and creation dates for objects but does NOT include headers or payload.

Offline, I used my secret key (which should never be transmitted) to generate a very specific permission: the right to write a very specific type of object to this specific bucket. The permission restricts the access policy of created objects, so it can’t be used to create objects that ordinary users can read or objects that I can’t access. This encoded permission, signed with my secret key, is safe to embed in a browser page and to transmit.
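For the curious, here is a minimal sketch of what that offline signing step could look like, assuming the permission is an S3 browser-POST policy signed in the legacy HMAC-SHA1 style. Only the bucket name comes from this post; the function name sign_post_policy, the "pin/" key prefix, the size limit, the expiration date and the placeholder secret are all made up for illustration.

```python
import base64
import hashlib
import hmac
import json


def sign_post_policy(secret_key: str, policy: dict) -> tuple[str, str]:
    """Return (base64 policy, base64 HMAC-SHA1 signature of that policy)."""
    # The policy document is serialized to JSON and base64-encoded first ...
    policy_b64 = base64.b64encode(json.dumps(policy).encode("utf-8")).decode("ascii")
    # ... and the signature is computed over the base64 text, not the raw JSON.
    digest = hmac.new(
        secret_key.encode("utf-8"), policy_b64.encode("ascii"), hashlib.sha1
    ).digest()
    return policy_b64, base64.b64encode(digest).decode("ascii")


# Hypothetical policy: only the bucket name is taken from the post.
EXAMPLE_POLICY = {
    "expiration": "2030-01-01T00:00:00Z",
    "conditions": [
        {"bucket": "broadbandwiki"},        # only this bucket
        {"acl": "private"},                 # new objects readable by the owner only
        ["starts-with", "$key", "pin/"],    # assumed key prefix
        ["content-length-range", 0, 4096],  # assumed upper bound on payload size
    ],
}

if __name__ == "__main__":
    # Run offline; the secret key itself never goes into the page.
    policy_b64, signature = sign_post_policy("EXAMPLE-SECRET-KEY", EXAMPLE_POLICY)
    print(policy_b64)
    print(signature)
```

Only the two printed strings (plus the public access key ID) would go into the browser page. Anyone can read them, but changing the policy invalidates the signature.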

The keys of the objects created and stored by the browser app include all the data which is supposed to be visible to ordinary users. The payload of each object contains the data meant to be protected. When a user launches the browser app, it reads the index of the broadbandwiki bucket (which anyone can read) and uses that to put pins on the map representing data supplied by prior users. A beneficial side effect is that a single read of the index returns up to a thousand records. Much cheaper to get the data most users need out of the index than to read each of a thousand objects (because Amazon charges for GETs).
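To make the trick concrete, here is a small sketch of pulling pin data out of a bucket listing without fetching a single object. The key layout (latitude, longitude and provider joined by underscores) and the sample values are assumptions for illustration, not the app's real format; the XML shape and namespace are those of a standard S3 bucket listing.

```python
import xml.etree.ElementTree as ET

# Namespace used by S3 bucket listings.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Trimmed, made-up example of what a listing response looks like.
SAMPLE_LISTING = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Name>broadbandwiki</Name>
  <IsTruncated>false</IsTruncated>
  <Contents>
    <Key>44.4759_-73.2121_cable</Key>
    <LastModified>2007-05-01T12:00:00.000Z</LastModified>
  </Contents>
  <Contents>
    <Key>44.2601_-72.5754_dsl</Key>
    <LastModified>2007-05-02T08:30:00.000Z</LastModified>
  </Contents>
</ListBucketResult>"""


def pins_from_listing(listing_xml: str) -> list[dict]:
    """Decode the public fields packed into each key (hypothetical layout)."""
    root = ET.fromstring(listing_xml)
    pins = []
    for item in root.findall("s3:Contents", S3_NS):
        key = item.findtext("s3:Key", namespaces=S3_NS)
        # Assumed layout: <lat>_<lon>_<provider>; provider may contain underscores.
        lat, lon, provider = key.split("_", 2)
        pins.append({
            "lat": float(lat),
            "lon": float(lon),
            "provider": provider,
            "created": item.findtext("s3:LastModified", namespaces=S3_NS),
        })
    return pins


if __name__ == "__main__":
    for pin in pins_from_listing(SAMPLE_LISTING):
        print(pin)
```

One listing request feeds the whole map; the private payloads are never touched.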

When users put their own pin in the map, the presigned permission ensures that only the kind of object we want here gets created and that other users will not have access to its data.

Problem pretty much solved. The protected data – actually all the data – is retrieved by an administrative utility, whose user must know and enter the secret key but which never transmits that key. The administrative program converts the data to an XML file before downloading so it can be fed into nearly any analysis tool. Excel works fine.
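The XML conversion step might look roughly like the sketch below. The element names (records, record) and the field names in the sample are invented for illustration; the real utility's output format isn't described here.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records: list[dict]) -> str:
    """Flatten a list of records into a simple XML document, one element per field."""
    root = ET.Element("records")
    for record in records:
        node = ET.SubElement(root, "record")
        for field, value in record.items():
            # Field names must be valid XML element names; values are escaped for us.
            ET.SubElement(node, field).text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Made-up sample record combining public and protected fields.
    sample = [{"lat": 44.4759, "lon": -73.2121, "address": "1 Main St & Elm"}]
    print(records_to_xml(sample))
```

A flat layout like this, one row per record and one child element per field, is the kind of file that spreadsheet-style tools tend to import without fuss.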

An invaluable tool for anyone doing S3 development is a free Firefox add-in called S3 Firefox Organizer. I did donate to the tip jar, though.

More on the broadbandwiki project is here.

Nerd tips on using AJAX GET, PUT, and DELETE with S3 are here.

Economics of S3 and possible implications are here.

And the code for this browser app is here.

Amazon S3 – Backstory for Nerds – Part 2