Here is what Elasticsearch is and why you should use it: Part 1

What happens when you type a query and press that search button? What makes it so fast?

Jul 03, 2024

Have you ever wondered why search on most websites feels so fast?

Elasticsearch is a distributed, open-source search and analytics engine built on Apache Lucene and developed in Java.

In very simple terms, you can think of Elasticsearch as a server that can process JSON requests and give you back JSON data.

Let us take an example where we go to an e-commerce website and search for a laptop. If the website uses Elasticsearch in the backend, here is how the request and response would look.

POST /products/_search
{
  "query": {
    "match": {
      "name": "laptop"
    }
  }
}

Response

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "products",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.0,
        "_source": {
          "name": "laptop",
          "brand": "Brand A",
          "price": 1000
        }
      }
    ]
  }
}
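As a rough illustration of the "JSON in, JSON out" idea, the sketch below builds the same request body in Python and reads the products out of a response shaped like the one above. The helper names build_match_query and extract_sources are made up for this example, and the response is a trimmed, hard-coded sample rather than output from a live cluster.

```python
import json


def build_match_query(field, text):
    """Return the body of a `match` query as a plain dict (hypothetical helper)."""
    return {"query": {"match": {field: text}}}


def extract_sources(response):
    """Pull the original documents (`_source`) out of a search response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Trimmed sample shaped like the response above; not real cluster output.
SAMPLE_RESPONSE = json.loads("""
{
  "took": 5,
  "timed_out": false,
  "hits": {
    "total": {"value": 3, "relation": "eq"},
    "max_score": 1.0,
    "hits": [
      {"_index": "products", "_id": "1", "_score": 1.0,
       "_source": {"name": "laptop", "brand": "Brand A", "price": 1000}}
    ]
  }
}
""")

if __name__ == "__main__":
    # This is the JSON that would be sent to POST /products/_search.
    print(json.dumps(build_match_query("name", "laptop"), indent=2))
    for product in extract_sources(SAMPLE_RESPONSE):
        print(product["name"], product["brand"], product["price"])
```

The matching documents always sit under hits.hits, and each one carries the original JSON in _source, so the application code usually only needs that one path.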

Benefits of Using Elasticsearch

Elasticsearch offers a variety of benefits. Here are some of them:

High-Speed Search: It can provide you with fast and near-real-time search capabilities.
Scalability: It can scale horizontally. You can distribute data among multiple nodes in a cluster.
Full-Text Search: It can perform a full-text search with ease. Users can perform complex searches across multiple fields and documents.
Rich Querying: It offers a flexible and powerful query language. It allows you to write complex queries that include various search criteria and filters.
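To make "rich querying" a little more concrete, here is a minimal sketch of a bool query that combines a full-text match with two filters. The field names (name, brand, price) follow the laptop example earlier in this post, the sketch assumes brand is mapped as an exact-match keyword field, and build_product_query is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_product_query(text, brand, max_price):
    """Combine a scored full-text clause with non-scoring filters."""
    return {
        "query": {
            "bool": {
                # `must` clauses contribute to the relevance score.
                "must": [{"match": {"name": text}}],
                # `filter` clauses only include or exclude documents.
                "filter": [
                    {"term": {"brand": brand}},  # assumes a keyword field
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    # Body for POST /products/_search: laptops from one brand under a price cap.
    print(json.dumps(build_product_query("laptop", "Brand A", 1500), indent=2))
```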

How Does Elasticsearch Work?

Behind the scenes, Elasticsearch stores data as JSON documents, which are grouped into indices, much as rows are grouped into tables in a database. For efficient search, it uses inverted indices, a data structure that maps words to the documents in which they appear.

To better understand Elasticsearch, here are some of the concepts we need to know:

Cluster: In Elasticsearch, a cluster is a collection of one or more nodes (servers). These nodes work together to store data and perform distributed operations. When multiple nodes join together to form a cluster, they share the workload and provide high availability.
Node: A node is a single running instance of Elasticsearch and is part of a cluster. Each node can independently store data and perform operations. Nodes can run on physical machines, virtual machines, or containers.
Shards: Shards are the building blocks for distributing data. An index is divided into smaller parts, and these parts are called shards. Each shard is a self-contained piece of the index and is stored on a single node.
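The relationship between documents, shards, and nodes can be sketched in a few lines. This is a simplified model, not Elasticsearch's actual routing code: Elasticsearch hashes a routing value (the document ID by default) with its own hash function, while this sketch substitutes zlib.crc32, and the node names and round-robin shard placement are assumptions made purely for illustration.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # same shard count as in the sample search response
NODES = ["node-1", "node-2", "node-3"]  # hypothetical node names


def shard_for(doc_id, num_shards=NUM_PRIMARY_SHARDS):
    """Pick a shard from a stable hash of the document ID (crc32 stand-in)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def node_for(shard, nodes=NODES):
    """Place shards on nodes round-robin (an assumed placement policy)."""
    return nodes[shard % len(nodes)]


if __name__ == "__main__":
    for doc_id in ["1", "2", "3", "12345"]:
        shard = shard_for(doc_id)
        print(f"document {doc_id} -> shard {shard} -> {node_for(shard)}")
```

The point of the hash is that the same document ID always lands on the same shard, so any node can work out where a document lives without consulting a lookup table.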

To understand how Elasticsearch works, we will break it into three parts:

Storage
Search
Results

Storage

Documents

Documents are the basic unit of information that can be indexed in Elasticsearch, and they are expressed in JSON. A document is like a row in a relational database: it represents a given entity, the thing you are searching for. In Elasticsearch, a document can be more than just text; it can be any structured data encoded in JSON, such as numbers, strings, and dates. Each document has a unique ID. Older versions of Elasticsearch also gave each document a type describing what kind of entity it is; mapping types have since been deprecated, which is why recent versions use the generic _doc.

Example of a document:

{
  "product_id": "12345",
  "name": "Wireless Bluetooth Headphones",
  "brand": "SoundMagic",
  "category": "Electronics",
  "price": 59.99,
  "availability": "In Stock",
  "rating": 4.5,
  "reviews": [
    {
      "user": "JohnDoe",
      "rating": 5,
      "comment": "Excellent sound quality and battery life!"
    },
    {
      "user": "JaneSmith",
      "rating": 4,
      "comment": "Good value for money, but the fit could be better."
    }
  ],
  "features": [
    "Bluetooth 5.0",
    "Noise Cancelling",
    "20 hours battery life",
    "Built-in microphone"
  ],
  "release_date": "2023-05-15",
  "dimensions": {
    "weight": "200g",
    "length": "15cm",
    "width": "10cm",
    "height": "5cm"
  }
}

Indices

For data to be queryable, it has to be stored in an index. An index is a logical namespace: a collection of documents that share similar characteristics. It is similar to a table in a relational database system. You can think of an index as a way to organise and group related data.

Indexing in Elasticsearch involves defining an index (namespace) and specifying the document structure for the data. Documents in JSON format are then added to the index. Indexing is what makes the data searchable.
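As a hedged sketch of what "specifying the document structure" can look like, the snippet below builds a mapping for a few fields of the headphones document and prints the request that would create the index. The index name products comes from the earlier search example; the choice of field types is my assumption, create_index_request is a hypothetical helper, and nothing is actually sent to a cluster.

```python
import json

INDEX_NAME = "products"

# Assumed field types for a subset of the example document's fields.
PRODUCT_MAPPING = {
    "mappings": {
        "properties": {
            "product_id": {"type": "keyword"},  # exact-match identifier
            "name": {"type": "text"},           # analyzed for full-text search
            "brand": {"type": "keyword"},
            "price": {"type": "float"},
            "rating": {"type": "float"},
            "release_date": {"type": "date"},
            "features": {"type": "text"},
        }
    }
}


def create_index_request(index, body):
    """Describe the create-index call as (method, path, JSON body)."""
    return "PUT", f"/{index}", json.dumps(body, indent=2)


if __name__ == "__main__":
    method, path, payload = create_index_request(INDEX_NAME, PRODUCT_MAPPING)
    print(method, path)
    print(payload)
```

The distinction that matters most here is text versus keyword: text fields are broken into terms for full-text search, while keyword fields are kept whole for exact matching, sorting, and filtering.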

Inverted Index

Image credit: Jay Gopalakrishnan's blog

The inverted index is the core data structure behind most full-text search engines. It stores a mapping from content, such as words or numbers, to its locations in a document or a set of documents. Instead of storing each document's text as one string, an inverted index splits every document into individual search terms (roughly, each word) and then maps each term to the documents in which it occurs.

Example of Inverted Index Creation

Let’s break down how an inverted index is created with an example. Suppose we have the following documents:

Document 1:

{
  "id": 1,
  "text": "Elasticsearch is a search engine"
}

Document 2:

{
  "id": 2,
  "text": "Elasticsearch is fast"
}

Document 3:

{
  "id": 3,
  "text": "A search engine for all your needs"
}

Tokenization and Inverted Index Creation

1. Tokenization: Splitting the text into individual terms (tokens). Elasticsearch's default standard analyzer also lowercases each token.

Document 1: [“elasticsearch”, “is”, “a”, “search”, “engine”]
Document 2: [“elasticsearch”, “is”, “fast”]
Document 3: [“a”, “search”, “engine”, “for”, “all”, “your”, “needs”]

2. Inverted Index: Mapping terms to document IDs.

Term            Document IDs
elasticsearch   [1, 2]
is              [1, 2]
a               [1, 3]
search          [1, 3]
engine          [1, 3]
fast            [2]
for             [3]
all             [3]
your            [3]
needs           [3]
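The two steps above can be sketched in plain Python. This is a toy model for illustration, not how Lucene implements its index: the tokenizer is a simple regular expression that lowercases the text, so “A” in document 3 and “a” in document 1 collapse into the same term, and the function names are my own.

```python
import re
from collections import defaultdict

DOCUMENTS = {
    1: "Elasticsearch is a search engine",
    2: "Elasticsearch is fast",
    3: "A search engine for all your needs",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(documents):
    """Map each term to the sorted list of document IDs that contain it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(index, query):
    """Return IDs of documents containing every term in the query."""
    term_sets = [set(index.get(term, [])) for term in tokenize(query)]
    return sorted(set.intersection(*term_sets)) if term_sets else []


if __name__ == "__main__":
    index = build_inverted_index(DOCUMENTS)
    for term in sorted(index):
        print(f"{term:15} {index[term]}")
    print(search(index, "search engine"))  # [1, 3]
```

Looking up a term is a single dictionary access, and a multi-word query is just an intersection of small ID lists. No document text is scanned at query time, which is where the speed comes from.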

In this part of the blog, we discussed how data is stored in Elasticsearch. In the next part, I will cover how data is retrieved and how distributed search works in Elasticsearch.

Schedule a mock System Design Interview with me: https://www.meetapro.com/provider/listing/160769

Linkedin: https://www.linkedin.com/in/mayank-sharma-2002bb10b/

My website: https://www.imayanks.com/
