What is Cassandra?

In this new Refactorizando entry, "What is Cassandra and when to use it", we discuss the principles and basic features of this NoSQL database, as well as when it is beneficial to use it.

What are NoSQL databases?

A NoSQL database is a type of database that departs from the relational model: it does not rely on fixed tables related through joins, and it usually does not use SQL as its main language. Instead, it provides its own query language or API and a more flexible data model.

Types of NoSQL Databases

We can narrow down the types of NoSQL databases to four:

  • Columnar (wide-column) Databases: Data is organized in tables with rows and columns, much like in SQL databases, but each row can hold a different set of columns. Cassandra is a clear example of this type of database.
  • Key-Value Databases: This type of database stores data in a key-value format. It can keep information in memory or persist it. An example could be Redis.
  • Document-oriented Databases: This type of database allows the retrieval, storage, and management of documents or semi-structured data. An example could be MongoDB.
  • Graph Databases: This type of database is composed of nodes and edges and is based on graph theory. An example could be ArangoDB.

Origin of Cassandra

Cassandra was initially created by Facebook to improve search in its inbox and made its appearance as an open-source project in 2008. It entered the Apache Incubator in 2009, and since 2010 it has been a top-level project of the Apache Software Foundation.

Cassandra was influenced by Amazon Dynamo and Google Bigtable. As a curiosity, its name comes from Cassandra, the Trojan priestess of Greek mythology who could prophesy the future and correctly warned about the deceit of the Trojan Horse.

Key Features of Cassandra

  • Scalability: Cassandra scales horizontally, so capacity can be increased by adding more nodes to the cluster.
  • Fault Tolerance: Cassandra is fault-tolerant, with each node functioning independently. If one node goes down, the service continues with the remaining nodes.
  • Flexible Storage: It allows storing flexible structures, meaning not all records have to have the same number of columns.
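  • Tunable Consistency rather than full ACID: Cassandra does not offer full ACID transactions like a relational database. Writes are atomic and isolated within a single partition and durable thanks to the commit log, while the consistency level can be chosen for each operation.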
  • Fast Writes: Cassandra enables fast write operations.
  • No Single Point of Failure: Working with independent nodes, Cassandra does not have a single point of failure.
  • Data Partitioning: Information can be partitioned among multiple nodes.
  • Time to Live (TTL): Data can be given a time-to-live, after which it expires automatically, eliminating the need for manual deletion.
  • Custom Query Language: Cassandra provides its own query language, Cassandra Query Language (CQL), and you can interact with the database from the command line using cqlsh.
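
To picture how a time-to-live behaves, here is a minimal Python sketch. It is only an in-memory model, not Cassandra's storage engine: the `TtlStore` class and its `put` and `get` methods are hypothetical names made up for this example, and the clock is injected so the expiry is easy to follow.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlStore:
    """Toy key-value store where each value may expire (hypothetical, for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        # key -> (value, expiry timestamp, or None when the value never expires)
        self._rows: Dict[str, Tuple[Any, Optional[float]]] = {}

    def put(self, key: str, value: Any, ttl_seconds: Optional[float] = None) -> None:
        expires_at = None if ttl_seconds is None else self._clock() + ttl_seconds
        self._rows[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        row = self._rows.get(key)
        if row is None:
            return None
        value, expires_at = row
        if expires_at is not None and self._clock() >= expires_at:
            # Expired data is treated as deleted; nobody had to remove it by hand.
            del self._rows[key]
            return None
        return value
```

In CQL itself, the same idea is expressed by adding `USING TTL <seconds>` to an INSERT or UPDATE statement.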

Cassandra Architecture

The main objective of Cassandra is to manage a large volume of data across multiple nodes. To do so, Cassandra distributes the data among the nodes of the cluster and replicates it on several of them from the moment it is written.

All nodes in the cluster have the same role, and for replication to occur from the beginning, all nodes need to be connected. If a node goes down or fails, the other nodes that hold replicas of its data keep serving requests. Therefore, each node must function independently.
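
The way data is spread and replicated among equal nodes can be sketched with a small hash ring in Python. This is a simplified model under our own assumptions: the `Ring` class and the `token_for` helper are hypothetical names, each node owns a single token, and MD5 is used for hashing, whereas a real cluster uses many tokens per node and Cassandra's default partitioner is based on Murmur3.

```python
import bisect
import hashlib
from typing import List


def token_for(value: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption of this sketch)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy token ring: one token per node, replicas are the next nodes clockwise."""

    def __init__(self, nodes: List[str], replication_factor: int) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and the number of nodes")
        self.replication_factor = replication_factor
        # Sort nodes by token so the ring can be walked in order.
        self._ring = sorted((token_for(node), node) for node in nodes)
        self._tokens = [token for token, _ in self._ring]

    def replicas_for(self, partition_key: str) -> List[str]:
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self._tokens, token_for(partition_key)) % len(self._ring)
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(self.replication_factor)
        ]
```

With a replication factor of 3, every partition key lives on three different nodes, so losing one of them still leaves two replicas able to answer.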

In a Cassandra cluster, several nodes act as replicas of the same data. If a read detects that one of these replicas responds with outdated information, Cassandra returns the most up-to-date version and then updates the stale replica, so that all replicas end up with completely up-to-date information. This mechanism is known as read repair.
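
The repair step described above can be sketched in a few lines of Python. This is an illustrative model only: replicas are plain dictionaries, and the `Versioned` type and `read_with_repair` function are hypothetical names. We assume every value carries a write timestamp and that the largest timestamp wins.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # write time; the larger timestamp wins


def read_with_repair(replicas: List[Dict[str, Versioned]], key: str) -> Optional[str]:
    """Read `key` from every replica, return the newest value, and fix stale replicas."""
    answers = [replica.get(key) for replica in replicas]
    known = [answer for answer in answers if answer is not None]
    if not known:
        return None
    newest = max(known, key=lambda versioned: versioned.timestamp)
    for replica, answer in zip(replicas, answers):
        if answer is None or answer.timestamp < newest.timestamp:
            replica[key] = newest  # repair the out-of-date replica
    return newest.value
```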

Figure: Cassandra Architecture

Nodes use the gossip protocol to communicate with each other and to detect failures.
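
As a rough idea of how gossip spreads state, the following Python sketch lets every node exchange heartbeat counters with one random peer per round. It is a simplification under our own assumptions, not Cassandra's implementation: `merge` and `gossip_round` are hypothetical helpers, and real failure detection is more elaborate than comparing counters.

```python
import random
from typing import Dict

# node name -> highest heartbeat counter seen for that node
State = Dict[str, int]


def merge(local: State, remote: State) -> None:
    """Keep, for every node, the highest heartbeat either side has seen."""
    for node, heartbeat in remote.items():
        if heartbeat > local.get(node, -1):
            local[node] = heartbeat


def gossip_round(views: Dict[str, State], rng: random.Random) -> None:
    """Each node bumps its own heartbeat and exchanges state with one random peer."""
    for name in list(views):
        views[name][name] = views[name].get(name, 0) + 1
        peer = rng.choice([other for other in views if other != name])
        # The exchange goes in both directions.
        merge(views[peer], views[name])
        merge(views[name], views[peer])
```

After a few rounds every node has heard about every other node. A node whose heartbeat stops increasing in the views of its peers becomes a candidate to be marked as down.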

Cassandra Components

Cassandra typically consists of the following components:

  • Node: The basic component of any installation, where the data is stored.
  • Data Center: A collection or group of nodes.
  • Cluster: A collection of one or more data centers.
  • Commit log: As in other databases, Cassandra has a recovery mechanism: every write operation is first appended to a log on disk so that it can be replayed after a failure.
  • Memtable: An in-memory structure where the information is written once it has been recorded in the commit log.
  • SSTable: An immutable file on disk to which the contents of the memtable are flushed.
  • Bloom filter: A probabilistic structure used to check very quickly whether an SSTable may contain the requested data. It can return false positives but never false negatives, so it allows SSTables that definitely do not hold the data to be skipped.
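
To see how the commit log, memtable, SSTables, and Bloom filter fit together, here is a toy Python model of the write and read path. It is a sketch under stated assumptions, not how Cassandra is implemented: `BloomFilter`, `SSTable`, and `ToyNode` are hypothetical classes, "disk" is just a dictionary, and the memtable is flushed after a fixed number of keys.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class BloomFilter:
    """Tiny Bloom filter: may wrongly answer "maybe present", never wrongly "absent"."""

    def __init__(self, size_bits: int = 1024, hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.hashes = hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> List[int]:
        return [
            int.from_bytes(hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()[:8], "big")
            % self.size_bits
            for i in range(self.hashes)
        ]

    def add(self, key: str) -> None:
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key: str) -> bool:
        return all((self.bits >> position) & 1 for position in self._positions(key))


class SSTable:
    """Immutable, sorted snapshot of a memtable (a dict stands in for the disk file)."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(sorted(rows.items()))
        self.bloom = BloomFilter()
        for key in self.rows:
            self.bloom.add(key)


class ToyNode:
    def __init__(self, memtable_limit: int = 2) -> None:
        self.commit_log: List[Tuple[str, str]] = []
        self.memtable: Dict[str, str] = {}
        self.sstables: List[SSTable] = []
        self.memtable_limit = memtable_limit

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. append to the recovery log first
        self.memtable[key] = value            # 2. then update the in-memory table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self) -> None:
        # 3. write the memtable out as a new SSTable and start a fresh memtable
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs to be replayed

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            # The Bloom filter lets us skip SSTables that cannot contain the key.
            if sstable.bloom.might_contain(key) and key in sstable.rows:
                return sstable.rows[key]
        return None
```

Writes only append to a log and update memory, which is one reason they are fast; reads may have to look at the memtable and several SSTables, which is where the Bloom filter saves work.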

When to use Cassandra?

After considering its main characteristics and architecture, let’s discuss when it might be best to use Cassandra as a columnar database.

Cassandra is worth considering when:

  • Write operations are significantly more frequent than read operations.
  • Read operations are performed using the primary key (in particular, the partition key), as reads that do not use it are slow or restricted.
  • Flexible data storage is required, meaning not all fields in the structure are mandatory.
  • Joins are not a primary concern.
  • Data updates are idempotent, so repeating the same write leaves the same result.
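
The point about idempotent updates matters because a client may retry a write after a timeout without knowing whether the first attempt was applied. The Python sketch below contrasts the two cases; the functions and the dictionary used as a row are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict


def set_status(row: Dict[str, object], status: str) -> None:
    """Idempotent: applying it once or many times leaves the same row."""
    row["status"] = status


def add_to_counter(row: Dict[str, object], amount: int) -> None:
    """Not idempotent: every retry changes the result again."""
    row["count"] = int(row.get("count", 0)) + amount


def apply_with_retry(
    operation: Callable[..., None],
    row: Dict[str, object],
    *args: object,
    attempts: int = 2,
) -> None:
    """Simulate a client that retries because it never saw the acknowledgement."""
    for _ in range(attempts):
        operation(row, *args)
```

Retrying `set_status` is harmless, while retrying `add_to_counter` counts the same event twice, which is why overwrite-style updates are the safer fit.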

Examples of Use Cases for Cassandra

Here are some examples of use cases for Cassandra:

Messaging applications:

Cassandra suits messaging applications such as chats exceptionally well, as they require fast and efficient message writes. Additionally, Cassandra's ability to expire data after a certain period of time makes it easy to implement temporary messages.

Weather applications:

Cassandra can store large amounts of information quickly, making it suitable for weather applications that require the storage and processing of vast amounts of data.

Industry or services with high data loads:

Healthcare, logistics, industry, and agriculture services often deal with processing and storing large amounts of real-time data. Cassandra enables efficient management of incoming data flow to provide real-time data analysis.

Web tracking:

We can use Cassandra on websites to store user data and actions, allowing us to build recommendation engines and other functionality.

Internet of Things (IoT):

As IoT becomes more widespread, organizations look for databases capable of storing data quickly. Because IoT applications have to manage large volumes of data, Cassandra is a common choice for them.

Conclusion

Before starting to use Cassandra, it is best to conduct a thorough analysis to determine whether it fits your use case. In general, Cassandra is a good choice when write operations are a priority and the data load is high. In my own experience, we used it for a real-time IoT application in which we had to store and analyze a large amount of data quickly. We chose Cassandra because it provided the necessary speed for data storage.

We hope this article on Cassandra, its features, and use cases has helped you learn more about this database.

If you need more information, you can leave us a comment or send an email to refactorizando.web@gmail.com. You can also contact us through our social media channels on Facebook or Twitter, and we will be happy to assist you!
