Why use a NoSQL solution

We are living in the age of data, and the volume of data is growing exponentially. Forget data-intensive apps or companies like YouTube and Facebook: nowadays even small and medium-sized enterprises are facing a never-before-seen kind of “data boom”. I read somewhere that Facebook CEO Mark Zuckerberg once remarked that there are more people on Facebook today than there were people living on the planet some 80 years ago!

So the question we now face is how to manage this huge amount of data, and how to store and retrieve it economically. One thing is certain: this volume of data can’t be managed efficiently or economically with standard RDBMS solutions. This has given rise to a completely different genre of databases, commonly referred to as NoSQL databases.

NoSQL databases are a cost-effective way of storing and using huge amounts of data. Most NoSQL databases leverage distributed computing at their core: they divide the data into multiple chunks (popularly known as shards), spread those shards across many nodes, and keep track (often through a master node) of which shard lives on which node. Better still, each shard is also replicated on more than one node to safeguard against machine failure.
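To make the shard/master idea concrete, here is a minimal Python sketch. It assumes a fixed number of shards chosen by hashing the key, and a hypothetical `ShardMaster` class that places each shard on `replication_factor` consecutive nodes; the `Cluster` class is likewise a made-up in-memory stand-in for the data nodes. Real databases use more elaborate schemes (range partitioning, consistent hashing, rebalancing), so treat this as an illustration rather than how any particular product works.

```python
import zlib


class ShardMaster:
    """Hypothetical 'master' that knows which shard lives on which nodes.

    Illustrative assumptions: a fixed number of shards chosen by hashing
    the key, and each shard copied onto `replication_factor` nodes.
    """

    def __init__(self, nodes, num_shards=8, replication_factor=2):
        if replication_factor > len(nodes):
            raise ValueError("need at least as many nodes as replicas")
        self.nodes = list(nodes)
        self.num_shards = num_shards
        # shard id -> nodes holding a copy (the first one acts as primary)
        self.placement = {
            shard: [self.nodes[(shard + i) % len(self.nodes)]
                    for i in range(replication_factor)]
            for shard in range(num_shards)
        }

    def shard_for(self, key):
        # crc32 is stable across runs, unlike Python's built-in hash() for str
        return zlib.crc32(key.encode("utf-8")) % self.num_shards

    def nodes_for(self, key):
        return self.placement[self.shard_for(key)]


class Cluster:
    """In-memory stand-in for the data nodes (one dict per node)."""

    def __init__(self, master):
        self.master = master
        self.storage = {node: {} for node in master.nodes}
        self.down = set()  # nodes that have "crashed"

    def put(self, key, value):
        # Write the value to every live replica of the key's shard
        for node in self.master.nodes_for(key):
            if node not in self.down:
                self.storage[node][key] = value

    def get(self, key):
        # Read from the first replica that is still up
        for node in self.master.nodes_for(key):
            if node not in self.down:
                return self.storage[node].get(key)
        raise RuntimeError("all replicas for this shard are down")


master = ShardMaster(["node-a", "node-b", "node-c"], num_shards=6)
cluster = Cluster(master)
cluster.put("user:42", {"name": "Alice"})

primary = master.nodes_for("user:42")[0]
cluster.down.add(primary)       # simulate the primary machine failing
print(cluster.get("user:42"))   # {'name': 'Alice'}, served by the surviving replica
```

Because every shard lives on two different nodes in this sketch, losing the primary machine does not lose the data: the read is simply answered by the replica.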

At the moment, there are 150+ NoSQL databases. Although each one differs from the others in some way, the following are some common properties that can be attributed to all of them:

Interestingly, NoSQL databases are broadly divided into the following main types:

Let’s understand the CAP theorem and the debate between Consistency & Availability

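Interestingly, there is a well-known theorem that clearly explains the challenges of a distributed architecture: the CAP theorem. Essentially, it states that of the following 3 properties, a distributed data store can guarantee at most 2 and can never fully guarantee all 3 at the same time. Well, below are the 3 properties we were talking about:

Consistency: every read receives the most recent write (or an error).
Availability: every request receives a non-error response, although it may not reflect the most recent write.
Partition tolerance: the system keeps working even when the network drops or delays messages between nodes.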

As you might have guessed, the first 2 characteristics (Consistency & Availability) are often a tradeoff. Network partitions can’t be ruled out in a distributed system, so when one happens, the system has to favor one of the two. If you want higher consistency, you have to give up some availability. In other words, if you want to always show correct responses to your clients, some clients may not receive a response at all, but whoever does receive one will always get the latest data. Imagine a messaging platform, where consistency is of paramount importance: you can’t afford to show a stale message to any client, but it is probably okay if you can’t respond to every client.

On the other hand, if you want more availability, you have to let go of some consistency. If you want to respond to all clients, some of them will see the latest data while others may see slightly stale data for the moment. As an example, imagine an ecommerce platform where all users are shown a large catalog of products. It is very important that every user is shown the product list, while it might be okay if some users don’t yet see the latest product that others can already see.
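This tradeoff can be simulated with a toy Python model. `ReplicatedRegister` is a hypothetical single-value store copied onto three nodes. In "CP" mode it only answers when it can reach a strict majority of replicas (a quorum rule, assumed here as one common way to stay consistent); in "AP" mode it always answers from whatever replicas it can reach. This is a sketch under those assumptions, not how any specific database implements these modes.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0


class ReplicatedRegister:
    """Toy single-value store replicated on several nodes (hypothetical model)."""

    def __init__(self, names, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.replicas = {n: Replica(n) for n in names}
        self.mode = mode
        self.partitioned = set()  # nodes cut off from the rest of the cluster

    def _reachable_from(self, name):
        # A client talking to `name` can only reach nodes on the same side
        if name in self.partitioned:
            side = self.partitioned
        else:
            side = set(self.replicas) - self.partitioned
        return [self.replicas[n] for n in side]

    def _check_quorum(self, group):
        # CP mode: refuse to answer unless a strict majority is reachable
        if self.mode == "CP" and len(group) <= len(self.replicas) // 2:
            raise RuntimeError("no majority reachable, refusing request")

    def write(self, via, value):
        group = self._reachable_from(via)
        self._check_quorum(group)
        version = max(r.version for r in group) + 1
        for r in group:
            r.value, r.version = value, version

    def read(self, via):
        group = self._reachable_from(via)
        self._check_quorum(group)
        # Return the newest value among the replicas we can reach
        return max(group, key=lambda r: r.version).value


for mode in ("CP", "AP"):
    store = ReplicatedRegister(["n1", "n2", "n3"], mode)
    store.write("n1", "v1")      # all three nodes now hold v1
    store.partitioned = {"n3"}   # network split: n3 is isolated
    store.write("n1", "v2")      # the majority side (n1, n2) moves to v2
    try:
        print(mode, "read via n3:", store.read("n3"))
    except RuntimeError as err:
        print(mode, "read via n3 failed:", err)
```

Under these assumptions, the CP store refuses the read from the isolated node n3 (the messaging-platform choice), while the AP store answers it with the stale value v1 (the ecommerce-catalog choice).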

So in the world of distributed databases, consistency and availability are a tradeoff whenever a network partition occurs. That is why NoSQL databases are often said to follow the BASE model (in contrast to the ACID properties of RDBMS): they are Basically Available, they are in a Soft state, and they are Eventually Consistent.
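The “soft state” and “eventually consistent” parts of BASE can be illustrated with another small Python sketch. It assumes last-writer-wins conflict resolution (the highest timestamp wins, with the writer’s name as a tie-breaker) and a single anti-entropy round in which every replica merges the state of every other replica. Both are illustrative choices (real systems use a variety of conflict-resolution and repair mechanisms), and `LWWReplica` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class LWWReplica:
    """Hypothetical replica: key -> (timestamp, writer, value), last writer wins."""
    name: str
    data: dict = field(default_factory=dict)

    def put(self, key, value, timestamp, writer):
        current = self.data.get(key)
        # Keep the entry with the higher (timestamp, writer); the writer name
        # breaks ties so that every replica picks the same winner
        if current is None or (timestamp, writer) > current[:2]:
            self.data[key] = (timestamp, writer, value)

    def merge_from(self, other):
        for key, (timestamp, writer, value) in other.data.items():
            self.put(key, value, timestamp, writer)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]


replicas = [LWWReplica("r1"), LWWReplica("r2"), LWWReplica("r3")]

# While the replicas can't talk to each other, r1 and r3 accept different
# writes for the same product price
replicas[0].put("product:123:price", 499, timestamp=1, writer="r1")
replicas[2].put("product:123:price", 449, timestamp=2, writer="r3")
print([r.get("product:123:price") for r in replicas])  # [499, None, 449]

# Anti-entropy round: every replica merges the state of every other replica
for a in replicas:
    for b in replicas:
        if a is not b:
            a.merge_from(b)
print([r.get("product:123:price") for r in replicas])  # [449, 449, 449]
```

Between the two prints the replicas disagree (soft state); after the merge round they all hold the same value (eventual consistency). Note that last-writer-wins silently discards the older concurrent write, which is one reason some systems prefer other conflict-resolution strategies.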