Amazon offers a web service for simplified database access called SimpleDB: you can store anything you want in the cloud.
SimpleDB: a simplified database
Unlike relational databases, SimpleDB simply lets you store heterogeneous information in the same table.
Here is the structure SimpleDB provides:
- Amazon gives you multiple databases, identified by the region they are hosted in (for instance: West Europe, Asia, West US, …).
- In those databases, you can create up to 250 tables, called domains.
The way data is stored is a bit confusing at first: to store an item, you give it a Name and then its attributes. You can even give more than one value to the same attribute. There is no fixed schema for a table.
Data example
Take a list of people, with their age and computers: we can store the data using the following structure:
- Name => John, Attributes => { Age => 16, Computers => PC }
- Name => Paul, Attributes => { Age => 19, Computers => { Apple Mac, Apple iPhone } }
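As a rough illustration, the two items above could be modelled in Python as a mapping from item name to attributes, where every attribute holds a list of string values. This is only a local sketch of the data model: the `domain` variable and the `put` helper are hypothetical and make no call to SimpleDB.

```python
from collections import defaultdict

# A domain maps an item name to its attributes; each attribute holds
# one or more string values. This mimics the model locally only.
domain = defaultdict(lambda: defaultdict(list))


def put(item_name, attribute, value):
    """Add a value to an attribute of an item (hypothetical helper)."""
    domain[item_name][attribute].append(str(value))


put("John", "Age", 16)
put("John", "Computers", "PC")
put("Paul", "Age", 19)
put("Paul", "Computers", "Apple Mac")      # same attribute...
put("Paul", "Computers", "Apple iPhone")   # ...with a second value
```

Note that the ages end up as the strings "16" and "19", since everything is stored as text.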
SimpleDB and Select
The database has a single data format: text. Thus, you cannot handle data the same way you would with MySQL, for instance.
Data Format
As I said, all data are strings. But you can still store binary data easily, as long as you encode it as text.
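One possible way to turn binary data into text is Base64 encoding. This is an assumption on my side, not something SimpleDB imposes, and the `to_text` and `from_text` helpers are hypothetical names. A minimal sketch with Python's standard library:

```python
import base64


def to_text(blob: bytes) -> str:
    """Encode raw bytes as an ASCII string that can be stored as an attribute value."""
    return base64.b64encode(blob).decode("ascii")


def from_text(value: str) -> bytes:
    """Decode a stored string back to the original bytes."""
    return base64.b64decode(value)


raw = bytes([0, 255, 16, 32])   # arbitrary binary payload
stored = to_text(raw)           # plain text, safe to store as a string
restored = from_text(stored)    # same bytes as `raw`
```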
Compare
This is where the difficulties are: if you have to compare attributes that are not plain strings (like numbers or dates), you have to format your data so that string comparison gives the right result. Indeed, data are compared in lexicographic order (like an ASCII strcmp).
Here is how you have to format your data if you want to use the less-than (<) and greater-than (>) operators:
- Strings: no issue!
- Numbers:
  - They all have to be non-negative, because lexicographic order does not understand that -5 < -3: for it, this is equivalent to 5 < 3! This can be done by adding an offset, which should be at least abs(min(values)). If we want to store numbers between -1000 and 10000, we can add 1000 to all of them: we then deal with numbers between 0 and 11000.
  - You have to zero-pad them so that they all have the same length as strings. You have to add zeros (“0”) to the left of the number, otherwise you will get 5 > 10 in lexicographic order. In our example, they all have to be 5-character strings, so that 00005 < 00010.
- Dates: you can use timestamps, but it is not recommended: you will have the same issue as with numbers if you deal with dates prior to September 9, 2001, when Unix timestamps had fewer than 10 digits. What you can use instead is the MySQL or ISO 8601 format, which complies with lexicographic order. Example: 2011-11-30 10:00:00 < 2013-01-01 00:00:00.
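The formatting rules above can be sketched in a few lines of Python. The function names, `OFFSET` and `WIDTH` are hypothetical and simply follow the -1000 to 10000 example; adapt them to your own value range.

```python
from datetime import datetime

OFFSET = 1000   # at least abs(min(values)); values range from -1000 to 10000
WIDTH = 5       # number of digits of the largest shifted value (11000)


def encode_number(n: int) -> str:
    """Shift by OFFSET so the value is non-negative, then zero-pad to WIDTH."""
    shifted = n + OFFSET
    if shifted < 0 or len(str(shifted)) > WIDTH:
        raise ValueError(f"{n} is outside the supported range")
    return str(shifted).zfill(WIDTH)


def decode_number(s: str) -> int:
    """Reverse encode_number."""
    return int(s) - OFFSET


def encode_date(d: datetime) -> str:
    """Format as 'YYYY-MM-DD HH:MM:SS', which sorts chronologically as text."""
    return d.strftime("%Y-%m-%d %H:%M:%S")


# Plain strings compare the wrong way round...
print("5" > "10")                              # True
# ...while the encoded forms keep the numeric order.
print(encode_number(-5) < encode_number(-3))   # True ("00995" < "00997")
print(encode_number(5) < encode_number(10))    # True ("01005" < "01010")
print(encode_date(datetime(2011, 11, 30, 10)) < encode_date(datetime(2013, 1, 1)))  # True
```

Remember to apply `decode_number` when you read the values back, since the stored strings include the offset.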
SimpleDB Group By
If you need Group By and aggregation functions (avg, sum…), then SimpleDB is not for you: SimpleDB does not have aggregation. The only available function is count().
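If you still need an aggregate, one workaround is to fetch the values and compute it on the client. Here is a minimal sketch, assuming the rows have already been retrieved into a list of dicts; the `rows` variable and both helpers are hypothetical, and this only makes sense for small result sets.

```python
from collections import defaultdict

# Rows as they might come back from a select: every value is a string.
rows = [
    {"Name": "John", "Age": "16", "Computers": ["PC"]},
    {"Name": "Paul", "Age": "19", "Computers": ["Apple Mac", "Apple iPhone"]},
]


def average(rows, attribute):
    """Average of a numeric attribute, computed on the client."""
    values = [int(row[attribute]) for row in rows if attribute in row]
    return sum(values) / len(values) if values else None


def group_ages_by_computer(rows):
    """Emulate GROUP BY on a multi-valued attribute: computer -> list of ages."""
    groups = defaultdict(list)
    for row in rows:
        for computer in row.get("Computers", []):
            groups[computer].append(int(row["Age"]))
    return dict(groups)


print(average(rows, "Age"))  # 17.5
```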
Performance and real-time
After a small benchmark of inserts into nearly empty tables, I’m not convinced: 500 ms for a simple insert from outside Amazon, 200 ms from the inside (EC2). That’s not great…
Moreover, if you need your data right after the insert, then again, SimpleDB is not for you. It might take up to a few seconds before the data is replicated to the other servers: SimpleDB will give you a stale result if the select comes too soon after the insert.
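If you cannot avoid reading shortly after writing, one option is to retry the read until the item shows up. The sketch below is generic: `fetch` stands for whatever function you use to query SimpleDB and is an assumption, as are the retry count and the delay.

```python
import time


def read_when_visible(fetch, attempts=5, delay=0.5):
    """Call fetch() until it returns something, waiting `delay` seconds between tries.

    Returns None if the data is still not visible after `attempts` tries.
    """
    for attempt in range(attempts):
        result = fetch()
        if result:
            return result
        # Do not sleep after the last attempt.
        if attempt < attempts - 1:
            time.sleep(delay)
    return None
```

This only hides the replication delay from the caller: the read still takes as long as the data needs to become visible.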
So, why use AWS SimpleDB?
SimpleDB is not made for most applications! But in some cases, it can be a good choice: no size issues, high availability, replication…
There are a lot of cons, more than the ones I just listed. But one good application is a logging system: you store data you do not need right away, and you can store extra data and filter it very easily.
Going further
“SimpleDB is not made for most applications”: yes, and there is a good article on that, One Size Does Not Fit All:
http://perspectives.mvdirona.com/2009/11/03/OneSizeDoesNotFitAll.aspx