Tuesday, May 13, 2014

Distributed Datastores - let's take a look under the hood

This post is a continuation of my last post, where I looked into various alternatives to traditional RDBMS databases. In this post I will cover some of the basics and go over the factors that influence the choice of a datastore in general, and of NoSQL databases in particular. I will also cover the trade-offs associated with each choice.

Fundamentally, a database is a specialized software system that allows you to write (i.e. create, update, delete), read, and even do some amount of processing of the data, e.g. executing aggregate functions.

In a world dominated by RDBMSs, databases are expected to be ACID compliant; in fact, ACID compliance is a must-have and an important measure of quality. All RDBMSs provide it, and they have been doing that job fairly well for decades. So, what changed recently? At the core, a handful of needs became very important:
  • Increased Complexity of relationship
  • Need for Flexible Data Structure
  • High Availability
  • Scalability (typically referred to as Web Scale) 
Increased complexity of relationships between entities is handled well by Graph Databases. Typical applications include recommendations, social networks, etc.

Document Databases do exceedingly well when it comes to supporting flexible data structures. Column Family Databases also provide some amount of flexibility: each row can have a different set of attributes. However, in this post, without getting into further detail on those factors, I will shift the focus to the last two points and explore how various parameters really influence the choice.

So, how does anyone achieve High Availability (HA) for any system? By building redundancy into the system, and databases are no exception: they create replicated failover nodes. Failover nodes are exact replicas of the master node and remain passive unless required. Most databases can ensure HA, but the challenge is different when it comes to distributed, partitioned databases. Also, it is one thing to ensure HA against a node or machine failure, and an entirely different thing to ensure that the database is never unavailable in the face of a network, machine, power, or data-center failure. Typically this is achieved by placing replicas across data centers spread over different geographies, and those replicas are not offline. This is known as Geographically Distributed High Availability (GDHA), and it makes network partition tolerance critical. Not all databases support GDHA. Note that GDHA is more than Disaster Recovery (DR), where the replicated nodes remain offline and are used only when a disaster hits the master node. The focus of DR systems is usually not limited to databases; they keep the entire stack ready.

The other big issue is scalability. How much can a database (read: RDBMS) grow? As much as the largest machine will allow. But what if you hit that ceiling too? The obvious answer is to add a second machine. That's correct, but can the database still meet the important quality measure called ACID, or remain highly available, when some of the operations (reading, writing, or processing the data) happen across distributed systems? The simple answer is no, and that's when you start looking into the trade-off matrix. You take a second look at the operations as discrete activities and decide what is critical for your business and what you can give up.

Before we go any further, let's put down the definition of ACID for reference:
  • Atomic: Atomicity refers to the ability of the database to guarantee that either all of the tasks of a transaction are performed or none of them are. 
  • Consistent: The consistency property ensures that the database remains in a consistent state before the start of the transaction and after the transaction is over (whether successful or not).
  • Isolated: Isolation refers to the requirement that other operations/transactions cannot access or see the data in an intermediate state during a transaction.
  • Durable: Durability refers to the guarantee that once the user has been notified of success, the transaction will persist, and not be undone.
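Atomicity and durability can be seen in action with any ACID-compliant engine. Here is a minimal sketch using Python's built-in sqlite3 module; the table and the simulated failure are made up purely for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
        raise RuntimeError("crash mid-transaction")  # simulate a failure
except RuntimeError:
    pass

# Atomicity: the partial debit was rolled back, so the data is unchanged.
balance = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)  # 100
```

Had the transaction completed, the same context manager would have committed it, and durability guarantees it would survive a restart.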
Databases achieve these properties through effective handling of concurrency, i.e. controlling how many users can act on or modify the state of the data at once. Here are the common concurrency-handling mechanisms:
  • Lock, Exclusive Lock, or Pessimistic Lock: some databases allow only one user to modify a record, row, or document at a time. This approach is preventive.
  • MVCC (multi-version concurrency control), or Optimistic Lock, is a mechanism that guarantees consistent reads. It allows multiple users to modify a record concurrently, holding multiple conflicting versions, without acquiring an exclusive lock. However, it puts a check in place at commit time: only the first user to attempt the commit succeeds. 
In either case, changes are committed when the transaction succeeds and everything is rolled back when it fails.
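The commit-time check of optimistic locking can be sketched as a compare-and-set on a version number. This is a toy in-memory model, not any particular database's implementation:

```python
class VersionedStore:
    """Toy optimistic concurrency: a commit succeeds only if the version
    the writer originally read is still the current one."""
    def __init__(self):
        self.data = {}  # key -> (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        current_version, _ = self.data.get(key, (0, None))
        if current_version != read_version:
            return False  # someone committed first; caller must re-read and retry
        self.data[key] = (current_version + 1, new_value)
        return True

store = VersionedStore()
v, _ = store.read("x")
# Two users read the same version, then both try to commit:
print(store.commit("x", v, "first writer"))   # True
print(store.commit("x", v, "second writer"))  # False: conflict detected at commit
```

The losing writer is not blocked while editing; it simply discovers the conflict when it tries to commit, which is why this style is called optimistic.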

Next, let's take a look at replication, i.e. copying the datastore to a different node. High Availability is achieved by replicating a database node. Replication comes in two forms:
  • Master-slave replication makes one node the authoritative copy that handles writes while slaves synchronize with the master and may handle reads.
  • Peer-to-peer/Master-Master replication allows writes to any node; the nodes coordinate to synchronize their copies of the data.
Master-slave replication reduces the chance of update conflicts, while peer-to-peer replication avoids loading all writes onto a single point of failure. Another important factor to consider: when data is written on one node, it takes time before it is reflected on all the nodes. Replication can be done synchronously or asynchronously for a particular transaction, and your choice determines whether your database supports Consistency or Eventual Consistency. In the case of peer-to-peer replication, the same record can be modified by two different transactions on two different nodes; how a database handles these scenarios is also influenced by the choice of Consistency vis-à-vis Eventual Consistency. There are specific databases that excel in one use case over the other. I plan to cover that in my next blog.
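The synchronous-versus-asynchronous choice, and the consistency it buys or gives up, can be illustrated with a toy master-slave pair. The class names are hypothetical; real replication protocols are far more involved:

```python
class Node:
    def __init__(self):
        self.data = {}

class MasterSlave:
    def __init__(self, synchronous):
        self.master, self.slave = Node(), Node()
        self.synchronous = synchronous
        self.pending = []  # replication log not yet applied to the slave

    def write(self, key, value):
        self.master.data[key] = value
        if self.synchronous:
            self.slave.data[key] = value       # ack only after the replica has it
        else:
            self.pending.append((key, value))  # ship to the replica later

    def replicate(self):
        for key, value in self.pending:
            self.slave.data[key] = value
        self.pending.clear()

sync = MasterSlave(synchronous=True)
sync.write("k", 1)
print(sync.slave.data["k"])        # 1: slave reads are consistent immediately

lazy = MasterSlave(synchronous=False)
lazy.write("k", 1)
print(lazy.slave.data.get("k"))    # None: the slave is stale for a while
lazy.replicate()
print(lazy.slave.data["k"])        # 1: eventually consistent
```

The synchronous variant pays replication latency on every write; the asynchronous one returns fast but allows stale reads until the log catches up.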

While database replication primarily helps handle failover and ensures higher availability, it also helps scalability. Master-Slave replication works well for read scalability: write operations can take place only on the master node, and the slaves then sync up with the master either synchronously or asynchronously. Peer-to-Peer or Master-Master replication helps achieve both read and write scalability, as both operations can take place on any replica. In both cases, however, every replica holds a full copy of the database, so the system can scale only as much as a single node can grow; this is traditionally known as Scaling Up or Vertical Scaling. That works well for most systems. For infinite scale or web scale, though, one needs to go for Scale-Out or Horizontal Scaling, where data is partitioned (sharded) across multiple nodes. This allows a database to grow almost without limit just by adding new (usually commodity) hardware. Note that each partitioned node holds a different subset of the data and may have its own replicas for high availability; each partitioned node is actually a database in its own right.

Scale Up vs Scale Out
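The scale-out idea of partitioning data across nodes can be sketched as routing each key to a shard by a hash. This is a simple modulo scheme for illustration only; production systems usually use consistent hashing so that adding nodes moves less data:

```python
import hashlib

class ShardedStore:
    def __init__(self, num_shards):
        # Each shard stands in for an independent node holding a disjoint slice of keys.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

store = ShardedStore(num_shards=3)
for i in range(100):
    store.put(f"user:{i}", i)

print(store.get("user:42"))                        # 42
print([len(shard) for shard in store.shards])      # keys spread across all shards
```

Since the router is deterministic, any front-end can compute which node owns a key, and capacity grows by adding shards rather than by buying a bigger machine.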

Now let's look at the trade-off matrix I mentioned earlier in this post. It is known as the CAP Theorem, also called Brewer's Theorem. It states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
  • Consistency (all nodes see the same data at the same time). Note this consistency is different from the one in ACID.
  • Availability (a guarantee that every request receives a response about whether it was successful or failed)
  • Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
A distributed system can guarantee only two of the three at a time.
Here is a nice summary of how different datastores comply with the CAP Theorem, from a presentation by Aleksandar Bradic:




Finally, here is a matrix I prepared to capture the various parameters one would consider while analyzing a distributed DB system.


Not all values are filled in yet; I will continue to work on this and update it further. 

Connect to me on twitter @satya_paul
Check out my storyboard on www.fanffair.com - http://www.fanffair.com/storyboard/satyajitp2011

Tuesday, April 1, 2014

RDBMS - one size fits all, but not anymore

Relational databases (RDBMS) have been the de facto standard for storing information for a fairly significant period of time, almost since the dawn of the Information Technology industry. For software developers, the obvious choice has traditionally been an RDBMS, and most of us never gave serious thought to storing data in any other format. That's not the case anymore; developers are now spoilt for choice. In the last decade, many new offerings came out of the labs of Google, Amazon, Facebook, the Apache Foundation, and many niche DB vendors. They are now mature, with proven track records, and are challenging the dominance of relational databases. Yes, I am referring to the new generation of databases: NoSQL databases (also known as Schemaless Databases), in-memory data stores, and what are called NewSQL databases.

In this post, I will introduce the broad categories of NoSQL databases and the various offerings under each category, briefly mention the drivers behind this movement, cover a few use cases where NoSQL databases score better than regular RDBMS databases, and finally discuss what you give up when you gain so much. What I don't intend to cover here is a detailed technical explanation of key concepts like ACID, BASE, or the CAP Theorem, or details on sharding, partitioning, replication, map-reduce, and other associated technological concepts. I plan to cover them in a separate blog post.

Broad Classification of NoSQL Databases

NoSQL or Schemaless Databases can broadly be categorized as follows:

A key-value store is a storage system that stores values indexed by a key. Typically, you can query only by the key; the values are opaque and cannot be used for querying. This allows very fast read and write operations (a simple disk access), and this model can be seen as a kind of non-volatile cache (i.e. well suited if you need fast access by key to long-lived data). This category of databases is largely inspired by Amazon's Dynamo paper.
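The key-value model reduces to something like the following sketch: values are opaque blobs, so the only access path is the key. This is a toy illustration, not any specific product's API:

```python
import pickle

class KeyValueStore:
    """Values are opaque byte blobs: the store can fetch by key
    but cannot inspect or query the values themselves."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = pickle.dumps(value)  # serialized; the store never looks inside

    def get(self, key):
        blob = self._data.get(key)
        return pickle.loads(blob) if blob is not None else None

kv = KeyValueStore()
kv.put("session:42", {"user": "alice", "cart": [1, 2, 3]})
print(kv.get("session:42"))  # {'user': 'alice', 'cart': [1, 2, 3]}
```

Because the store never parses the value, reads and writes are essentially a single lookup, which is where the speed of this model comes from.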



A document-oriented database extends the previous model: values are stored in a structured format (a document, hence the name) that the database can understand. For example, a document could be a blog post with its comments and tags stored in a denormalized way. Since the data is transparent to the store, it can do more work (like indexing fields of the document), and you're not limited to querying by key only. IBM's Lotus Notes database has largely influenced this category of databases.
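Because the store understands the document structure, it can index and query individual fields. A minimal in-memory sketch follows; the class and field names are my own, not any product's API:

```python
class DocumentStore:
    def __init__(self):
        self.docs = {}

    def insert(self, doc_id, doc):
        self.docs[doc_id] = doc

    def find(self, **criteria):
        """Query by any top-level field, not just the key."""
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

store = DocumentStore()
store.insert(1, {"title": "Hello", "author": "satya", "tags": ["nosql"],
                 "comments": [{"by": "bob", "text": "nice"}]})  # denormalized, nested
store.insert(2, {"title": "World", "author": "alice"})

print([d["title"] for d in store.find(author="satya")])  # ['Hello']
```

Contrast this with the key-value sketch above: the same data could be stored there, but only fetched by key, never matched on `author`.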

A graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data. A graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary.


Source: Presentation by Emil Eifrem, CEO of neo4j
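The index-free adjacency described above can be sketched as nodes holding direct references to their neighbours, so traversal is pointer-chasing rather than index lookups. This is a toy model, not Neo4j's API:

```python
class GraphNode:
    """Index-free adjacency: each node holds direct references to its neighbours."""
    def __init__(self, name):
        self.name = name
        self.edges = {}  # relationship label -> list of neighbour nodes

    def connect(self, label, other):
        self.edges.setdefault(label, []).append(other)

alice, bob, carol = GraphNode("alice"), GraphNode("bob"), GraphNode("carol")
alice.connect("FRIEND", bob)
bob.connect("FRIEND", carol)

# A friends-of-friends traversal: no index lookup, just following pointers.
friends_of_friends = [n.name for f in alice.edges["FRIEND"]
                      for n in f.edges.get("FRIEND", [])]
print(friends_of_friends)  # ['carol']
```

The cost of each traversal step depends only on the node's local edges, not on the total size of the database, which is why deep relationship queries stay fast.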
Wide-Column Stores are derived from Google's BigTable. BigTable is a compressed, high-performance, proprietary data storage system built on the Google File System. It maps two arbitrary string values (a row key and a column key) plus a timestamp (hence a three-dimensional mapping) to an associated arbitrary byte array. It is not a relational database and is better described as a sparse, distributed, multi-dimensional sorted map. BigTable is designed to scale into the petabyte range across hundreds or thousands of machines. Every row can have its own set of columns.
Paper by Kai Orend
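The "sparse, multi-dimensional sorted map" description can be modeled on a single machine as a map from (row key, column key, timestamp) to a byte array. This toy sketch covers only the data model; the distribution and compression are what make BigTable hard:

```python
class WideColumnStore:
    """Toy model of the BigTable data model: (row, column, timestamp) -> bytes."""
    def __init__(self):
        self.cells = {}  # sparse: only populated cells exist at all

    def put(self, row, column, timestamp, value):
        self.cells[(row, column, timestamp)] = value

    def latest(self, row, column):
        versions = [(ts, v) for (r, c, ts), v in self.cells.items()
                    if r == row and c == column]
        return max(versions)[1] if versions else None

t = WideColumnStore()
t.put("com.example/page1", "contents:html", 1, b"<html>v1</html>")
t.put("com.example/page1", "contents:html", 2, b"<html>v2</html>")
# Rows are sparse: page2 has a completely different set of columns, and that's fine.
t.put("com.example/page2", "anchor:refs", 1, b"page1")

print(t.latest("com.example/page1", "contents:html"))  # b'<html>v2</html>'
```

The timestamp dimension gives each cell a version history, and the sparseness means an absent column costs nothing, unlike a NULL-filled relational column.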

A NewSQL Database: simply put, it's a scalable RDBMS.

Note: the above definitions are summarized from various sources.

Examples of NoSQL Databases 

Here are a few examples of each; I tried to pick the popular ones from each category.
Key-Value Stores: Redis, Riak, DynamoDB, Voldemort
Wide-Column Stores: BigTable (internal to Google, partially available via Google App Engine), Cassandra, HBase
Document Stores: MongoDB, Couchbase, CouchDB, SimpleDB
Graph Databases: Neo4j, InfiniteGraph, FlockDB

Although the following are not part of the NoSQL family, they are still very relevant and significant, so here are a few examples:

In-Memory Stores (Key-Value): memcached, Ehcache

New SQL Databases: VoltDB, Amazon RDS, NuoDB, MySQL Cluster, Clustrix

And for old time's sake -

Relational Databases: MariaDB, MySQL, Microsoft SQL Server, IBM DB2, Oracle.

As you can see, the list is pretty long and at times slightly confusing as well. In the lists above, products are grouped under a specific category based on what they claim or on their dominant characteristics, but there are instances where an offering may overlap two different categories.

Here is a nice infographic grouping the different offerings under different categories, published by Matthew Aslett on his blog.



Use Cases and Drivers behind the Adoption of NoSQL Databases

Here are a few drivers and use cases behind the adoption of NoSQL databases.

Scalability: With the advent of the consumer Internet and the massive penetration of mobile devices, scalability challenges increased exponentially. The answer was to go for a scale-out architecture (horizontal scaling) over scale-up (vertical scaling). Traditional RDBMS offerings either failed to adopt the scale-out architecture or became too complex to manage at that scale. NoSQL databases, particularly key-value stores and Wide-Column (Column Family) databases, came to the rescue. Shining examples include Google's BigTable and others in its family (e.g. HBase, Cassandra), Amazon's DynamoDB, other key-value stores like Redis and Riak, and in-memory caches like memcached.

Here is a slide from Emil Eifrem's presentation that shows how the various types of DBs stack up when it comes to scaling:


Schemaless Databases: Scalability was one major reason why companies experiencing web-scale traffic started looking beyond RDBMS, but a large section of developers adopted NoSQL databases for their flexibility. Document databases, with their ability to store as well as query JSON/BSON data, became popular among developers. It would be a nightmare to store data like blog posts, comments, nested comments, etc. using a traditional RDBMS, and more so as those structures keep changing. A lot of the time, particularly in the case of configuration data, each record may have a different set of attributes. Managing such use cases with a regular RDBMS gets very messy: typically, each record ends up as a pile of name-value-pair rows, one per attribute, or in one large table holding all possible columns of which each record uses only a fraction. Either way it is painful from a reading, writing, and maintainability point of view. Document databases and wide-column databases, however, work just fine for these use cases. Unlike traditional RDBMS databases, schemaless databases keep related information together and avoid joins.
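For example, the blog-post-with-nested-comments case fits naturally into one denormalized document. A sketch using plain JSON; the field names are made up:

```python
import json

post = {
    "title": "RDBMS - one size fits all, but not anymore",
    "author": "satya",
    "tags": ["nosql", "rdbms"],      # no separate tags table
    "comments": [                     # nested, even recursively
        {"by": "alice", "text": "Great post!",
         "replies": [{"by": "bob", "text": "Agreed."}]},
    ],
}

# One read fetches everything related to the post; no joins needed.
doc = json.dumps(post)
loaded = json.loads(doc)
print(loaded["comments"][0]["replies"][0]["by"])  # bob
```

In a relational schema the same data would be spread over posts, comments, and tags tables joined back together at read time, and every new attribute would mean a schema migration.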

Economics: The third major reason lies in the economics of capacity building. Unlike large enterprises, which can forecast future capacity requirements and have deep pockets to finance large servers upfront, start-ups really don't know whether they will be successful enough to justify large servers, and capital is a far scarcer resource in the start-up world. Upfront investment in large servers is therefore a very difficult proposition and a huge entry barrier. Start-ups needed a way to add capacity as they go (and only as far as they go), without impacting existing services. The architectures proposed by Amazon's Dynamo and Google's BigTable came to the rescue: scale-out took over from scale-up, enabling companies to add capacity as they need it and to grow almost without limit. This reduced the upfront capital requirement and lowered the entry barrier.

Graph Databases: The need for graph databases is distinctly different from the others; they represent real-world problems as they appear. In the real world everything is interconnected, either directly or through others. A graph represents all those connected entities as "nodes" and the connections as "relationships". It can help answer questions like "How do I reach New York?", "How am I connected to Matt?", or "What movie might Julie like to watch?". Traditional databases fail to scale as the relationships become deeper. Large social network sites like Facebook, LinkedIn, and Twitter have their own graph database implementations, and there are a few commercially available graph databases for everyone's use. However, graph databases are not as scalable as the other families of NoSQL databases.

One point I did not highlight here is availability. That's really not a point of differentiation anymore but rather a point of parity: most RDBMS vendors as well as NoSQL databases provide high availability through replication and failover.

Conclusion

As you start benefiting from some of the NoSQL features, you must be aware of what you give up. Here is a short list of areas where you may have to compromise:
  • Accepting Eventual Consistency over transaction level consistency (ACID)
  • Increased Complexity: Organizing data in a way that all related information remain under one sharded node. 
  • Absence of SQL Query (for some of them).
  • Absence of Joins or Aggregate Functions.
  • Slowness of Two Phase Commits (2PC) when data spread over multiple nodes.
  • Difficulty of performing mass updates across nodes.
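The two-phase commit mentioned in the list can be sketched as a coordinator that asks every participant to vote before anyone commits; the extra voting round trip across nodes is exactly where the slowness comes from. This is a toy in-process model with hypothetical names:

```python
class Participant:
    def __init__(self, will_succeed=True):
        self.will_succeed = will_succeed
        self.state = "init"

    def prepare(self):                 # phase 1: vote yes/no
        self.state = "prepared" if self.will_succeed else "aborted"
        return self.will_succeed

    def commit(self):                  # phase 2: make it permanent
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    # One full network round trip just to collect votes...
    if all(p.prepare() for p in participants):
        for p in participants:         # ...then a second round trip to commit.
            p.commit()
        return True
    for p in participants:             # any "no" vote aborts everyone
        p.rollback()
    return False

nodes = [Participant(), Participant(), Participant(will_succeed=False)]
print(two_phase_commit(nodes))  # False: one node votes no, so all roll back
```

Every transaction pays at least two cross-node round trips, and a slow or failed participant stalls the whole group, which is why 2PC across shards hurts throughput.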
So, it really comes down to the point that no single solution will address every problem we have, nor is any meant to. But the good news is that we have options.

To put things in perspective, take a look at the projected revenue growth for the various categories, as per a study done by The 451 Group.


Connect to me on twitter @satya_paul
Check out my storyboard on www.fanffair.com - http://www.fanffair.com/storyboard/satyajitp2011