What is the CAP Theorem? MongoDB vs Cassandra vs RDBMS, where do they stand in the CAP theorem?

CAP Theorem

CAP stands for Consistency, Availability and Partition Tolerance.

What is the CAP Theorem?

A distributed system always needs to be partition tolerant: we shouldn’t build a system where a network partition brings down the whole system.
So a distributed system is always built to be Partition Tolerant, and the real trade-off during a partition is between Consistency and Availability.

How Does a Distributed System Break Consistency or Availability?

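Scenario 1: Failing to propagate an update request to other nodes.
Say we have two nodes (N1 and N2) in a cluster, and both nodes can accept read and write requests. If an update reaches one node but cannot be propagated to the other because of a network partition, the node that missed the update has two options when it receives a read request: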

  1. Respond with whatever data it has (in this case, salary = 800) and update the data once the network partition resolves. This makes the system Available but Inconsistent.
  2. Simply return an error saying “I do not have updated data”. This makes the system Consistent but Unavailable, because it refuses to return inconsistent data.
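The two options can be sketched as a tiny simulation. This is only an illustration under assumptions: the `Node` class, the `mode` flag, and the new salary value of 1000 are all hypothetical and not part of any real database.

```python
from dataclasses import dataclass, field


class StaleDataError(Exception):
    """Raised by a node that prefers consistency over availability."""


@dataclass
class Node:
    name: str
    mode: str  # "AP" = answer anyway, "CP" = refuse during a partition
    data: dict = field(default_factory=dict)
    partitioned: bool = False  # True while cut off from its peer

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Option 2: return an error instead of possibly stale data.
            raise StaleDataError("I do not have updated data")
        # Option 1 (or no partition): answer with whatever is stored locally.
        return self.data[key]


def write(target, peer, key, value):
    """Apply a write on `target`; replicate to `peer` only if reachable."""
    target.data[key] = value
    if not (target.partitioned or peer.partitioned):
        peer.data[key] = value


def run_scenario(mode):
    """Partition N1 from N2, update N1, then read from N2."""
    n1 = Node("N1", mode, {"salary": 800})
    n2 = Node("N2", mode, {"salary": 800})
    n1.partitioned = n2.partitioned = True
    write(n1, n2, "salary", 1000)  # hypothetical update, never reaches N2
    return n2.read("salary")
```

Calling `run_scenario("AP")` returns the stale value 800, while `run_scenario("CP")` raises `StaleDataError`.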

MongoDB vs Cassandra vs RDBMS In CAP Theorem

Many of us have seen the diagram below, which places databases in the CAP theorem. This categorization of the databases is not entirely correct.

RDBMS (MySQL, Oracle, MS SQL Server, etc.)

It’s a no-brainer that a single-node RDBMS is Consistent, as all reads and writes go to a single node/server.

Once an RDBMS is replicated behind a leader, availability is no longer guaranteed:

  1. If the leader disconnects from the cluster, it takes a few seconds to elect a new leader. So it is definitely not an available system.
  2. A client can be cut off from the leader by a network partition even when both the client and the leader node are running fine, which again makes the system unavailable.

Consistency in a replicated setup depends on where reads are served from:

  1. If data is read and written only through the master/primary node, it is always Consistent.
  2. If read requests are sent to any of the secondaries, we lose consistency and might serve inconsistent data during a network partition, or whenever the master takes time to replicate data.
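The consistency point above can be illustrated with a toy model of asynchronous replication. This is a sketch, not how any particular RDBMS replicates: `ReplicatedDB`, `replicate()`, and the `from_secondary` flag are hypothetical names chosen for the example.

```python
class ReplicatedDB:
    """Toy primary/secondary pair with asynchronous replication."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = []  # writes not yet applied on the secondary

    def write(self, key, value):
        # All writes go to the primary; replication happens later.
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Simulates the secondary catching up with the primary.
        for key, value in self.pending:
            self.secondary[key] = value
        self.pending.clear()

    def read(self, key, from_secondary=False):
        source = self.secondary if from_secondary else self.primary
        return source.get(key)


db = ReplicatedDB()
db.write("salary", 800)
db.replicate()
db.write("salary", 1000)  # hypothetical update, not yet replicated

fresh = db.read("salary")                       # read from the primary
stale = db.read("salary", from_secondary=True)  # read from the lagging secondary
```

Here `fresh` is 1000 and `stale` is still 800 until `replicate()` runs again, which is exactly the window in which a secondary read is inconsistent.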

MongoDB

Things you should know about MongoDB:

  1. MongoDB is a single-leader system that can have multiple replicas. These replicas update themselves asynchronously from the leader’s OpLog.
  2. Each node tracks the heartbeat of every other node to know whether the other replicas or the leader are alive or dead.
  3. If the leader/primary node goes down, the replicas can detect this and elect a new leader based on priority, provided they can form a majority.

This makes MongoDB unavailable in two situations:

  1. If the leader disconnects from the cluster, it takes a few seconds to elect a new leader, during which the system is unavailable for writes and reads.
  2. A client can be cut off from the leader by a network partition even when both the client and the leader node are running fine, which again makes the system unavailable.
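The majority rule for elections can be sketched as follows. This is a deliberately simplified illustration, not MongoDB's actual election protocol: it only models "majority plus highest priority" and ignores everything else a real election considers. The function name and the member names are hypothetical.

```python
def elect_leader(members, reachable):
    """Pick a leader among nodes that can still see each other.

    members:   dict mapping node name -> priority (higher wins)
    reachable: set of node names on the same side of the partition
    Returns the elected node name, or None if there is no majority.
    """
    # A strict majority of ALL members is required, not just of the reachable ones.
    if len(reachable) * 2 <= len(members):
        return None
    # Highest priority wins; the name breaks ties deterministically.
    return max(reachable, key=lambda name: (members[name], name))


members = {"A": 3, "B": 2, "C": 1}  # hypothetical three-node replica set

majority_side = elect_leader(members, {"B", "C"})  # A is cut off
minority_side = elect_leader(members, {"A"})       # A alone cannot elect
```

With these inputs the side holding two of three nodes elects "B", while the isolated node gets `None` and cannot act as leader. Until the majority side finishes its election, no node accepts writes.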

Cassandra

In Cassandra, any coordinator node can accept read or write requests and forward them to the respective replicas based on the partition key. Hence, even if a replica/node goes down, others can serve the read/write requests. So, is it safe to say Cassandra is always available?
Hmm, not entirely. We will find out why soon.
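The coordinator behaviour can be sketched like this. It is a rough model under assumptions: a SHA-256 hash modulo the node count stands in for Cassandra's real partitioner and token ring, and `required_acks` is a hypothetical parameter standing in for how many replicas must respond before a request succeeds.

```python
import hashlib


def replicas_for(partition_key, nodes, replication_factor):
    """Map a partition key to `replication_factor` consecutive nodes."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]


def coordinate(partition_key, nodes, down, replication_factor=3, required_acks=1):
    """Forward a request to the live replicas for this key.

    Raises RuntimeError when fewer than `required_acks` replicas are alive.
    """
    replicas = replicas_for(partition_key, nodes, replication_factor)
    alive = [node for node in replicas if node not in down]
    if len(alive) < required_acks:
        raise RuntimeError("not enough live replicas to serve the request")
    return alive


nodes = ["n1", "n2", "n3", "n4"]
owners = replicas_for("user:42", nodes, 3)
down = {owners[0]}  # one replica for this key is unreachable

served_by = coordinate("user:42", nodes, down, required_acks=1)
```

With one replica down, a request that needs a single acknowledgement still succeeds, but the same request with `required_acks=3` raises an error. This is one way a system like this can stop being available even though most nodes are up.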

Summary Table

The table below summarizes where each DB sits in the CAP theorem under different configurations.

Conclusion

In this blog post, we saw how each DB is categorized in the CAP theorem, and why that categorization is difficult: each one behaves differently depending on how you configure it.
Hope this helps :)

Bikas Katwal

Coder | Distributed Systems | Search | Software Engineer @Walmartlabs