Elasticsearch Tutorial: Introduction to Elasticsearch
When asked "what is Elasticsearch?", some people would say "an index," "a search engine," "an analytics database," "a big data solution," "it's quick and scalable," or "it's like Google." Depending on your level of expertise with this technology, these answers may either bring you closer to the answer or confuse you further. However, all of these responses are correct, which is part of Elasticsearch's appeal.
In this article, we'll take a closer look at what Elasticsearch is, how it works, where it is used, its benefits, and some fundamental definitions related to it.
What is Elasticsearch?
Elasticsearch is a distributed, free, and open search and analytics platform that supports all types of data, including textual, numerical, geospatial, structured, and unstructured data.
Elasticsearch's speed and scalability, as well as its ability to index a wide range of content types, make it suitable for a variety of applications:
- Application search
- Website search
- Infrastructure metrics and container monitoring
- Application performance monitoring
- Enterprise search
- Geospatial data analysis and visualization
- Security analytics
- Logging and log analytics
- Business analytics
Let's dive into Elasticsearch, which is a backend database option for Zenarmor (Sensei).
What Does Elasticsearch Do?
Now that we know the definition of Elasticsearch, let's learn what it actually does.
Elasticsearch helps you store, search, and analyze large datasets in near real time and returns results in milliseconds. It achieves this by searching an index rather than scanning the raw text directly.
To understand what Elasticsearch does, we need to understand its working mechanism.
How does Elasticsearch work?
Elasticsearch stores and manages document-oriented, semi-structured data.
Elasticsearch's primary data structure is an inverted index maintained through the Apache Lucene APIs. An inverted index maps each distinct word (token) to the list of documents (locations) containing that word, which lets users quickly find the documents that contain given keywords. Index data is stored in one or more partitions, called shards. Elasticsearch automatically distributes and allocates shards to the nodes of a cluster.
To understand how Elasticsearch indexes documents, consider the following example.
Assume that we have three simple documents:
Document 1: "A picture is worth a thousand words."
Document 2: "No man is an island."
Document 3: "Honesty is the best policy."
After some basic text processing (lowercasing, removing punctuation, and splitting into words), we obtain a table like the one below.
Figure 1. How elasticsearch indexes documents
The inverted index associates each term with the documents (and sometimes the positions within the text) that contain it. Because the terms in the dictionary are sorted, we can quickly look up a term and find its occurrences in the postings structure.
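The indexing steps described above can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not how Lucene implements it; the function names `tokenize` and `build_inverted_index` are made up for this example, and the third document is numbered 3 here.

```python
import string
from collections import defaultdict

# The three example documents from the text, keyed by document number.
documents = {
    1: "A picture is worth a thousand words.",
    2: "No man is an island.",
    3: "Honesty is the best policy.",
}

def tokenize(text):
    """Lowercase the text, strip punctuation, and split it on whitespace."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return cleaned.split()

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    # Keep the terms sorted so the dictionary is ordered, as described above.
    return {term: sorted(ids) for term, ids in sorted(index.items())}

inverted_index = build_inverted_index(documents)
print(inverted_index["is"])      # [1, 2, 3]
print(inverted_index["island"])  # [2]
```

Looking up a keyword is then a single dictionary access, which is why a search does not need to scan the documents themselves.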
We have looked at the definition and working principles of Elasticsearch. Now let's explore what it is used for in real-world scenarios.
What Is Elasticsearch Used For?
Because of its speed and configurable scale, Elasticsearch is used in many different areas that involve big data. The major usage areas are:
1. Textual Search
As the name implies, this is the full-text query capability of Elasticsearch. Full-text search applies linguistic search to documents that consist of many words (email bodies, Word documents, etc.).
Elasticsearch is a powerful, platform-independent search engine that can run full-text searches over millions of documents in near real time.
2. Product Search
E-commerce sites are the second most popular use of search engines, behind web search platforms. For e-commerce, it is critical, especially in terms of potential sales, to have the most relevant goods returned at the top of the list. Elasticsearch helps solve the e-commerce product search problem through the efficiency and relevance of its search function.
3. Data Aggregation
One of the most common uses of Elasticsearch is data aggregation. An aggregation summarizes data as metrics, percentages, or other analytics, which lets you look at the data as a whole rather than document by document.
Elasticsearch categorizes aggregations into three types:
- Metric aggregations compute metrics, such as a sum or an average, from field values. Metrics are simple mathematical operations such as min, max, average, total, percentiles, and so on. They take values from documents to use in the calculation.
- Bucket aggregations divide documents into buckets (also known as bins) based on field values, ranges, or other criteria. A bucket aggregation defines a criterion, and documents that meet that criterion are added to the bucket. For example, task data can be bucketed as "Backlog", "Todo", "In progress", "Q&A", and "Done".
- Pipeline aggregations use the output of other aggregations as their input rather than documents or fields.
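To make the three aggregation types concrete, here is a minimal plain-Python sketch of what each one computes. It does not call Elasticsearch; the task documents and the `status` and `hours` field names are hypothetical and only mirror the bucketing example above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical task documents; the field names are invented for illustration.
tasks = [
    {"status": "Backlog", "hours": 5},
    {"status": "Todo", "hours": 3},
    {"status": "Todo", "hours": 8},
    {"status": "Done", "hours": 2},
    {"status": "Done", "hours": 4},
]

# Metric aggregation: compute values directly from a document field.
hours = [task["hours"] for task in tasks]
metrics = {"min": min(hours), "max": max(hours), "avg": mean(hours), "sum": sum(hours)}

# Bucket aggregation: group documents by a criterion (here, the status field).
buckets = defaultdict(list)
for task in tasks:
    buckets[task["status"]].append(task)

# A metric nested inside each bucket: average hours per status.
avg_per_bucket = {
    status: mean([doc["hours"] for doc in docs]) for status, docs in buckets.items()
}

# Pipeline aggregation: operates on the output of another aggregation,
# here the largest of the per-bucket averages.
max_bucket_avg = max(avg_per_bucket.values())

print(metrics)
print(avg_per_bucket)
print(max_bucket_avg)
```

In Elasticsearch itself these computations are expressed declaratively in the search request and executed across shards; the sketch only shows how the three kinds of result relate to each other.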
4. JSON Document Storage
By default, Elasticsearch stores a copy of every JSON document you index in a field called _source. Any query that matches a document returns a copy of this stored data. Because you can both store your data in Elasticsearch and retrieve it again, it also serves as a document storage system.
5. Geospatial Search
One of the prominent features of Elasticsearch is its ability to query spatial data. It supports queries on both points (lat-lon) and geospatial shapes such as lines, circles, polygons, multi-polygons, etc. The geo_point and geo_shape field types help users manage and visualize their spatial data. Many companies with large amounts of geospatial data (like Uber) use Elasticsearch to handle thousands of queries per second on their systems.
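The idea behind a point-based distance query can be sketched in plain Python as follows. This is an illustration only and not Elasticsearch's implementation: it assumes each document carries a hypothetical `location` field with `lat` and `lon` values, and it uses the haversine formula with a mean Earth radius to filter documents within a given distance of a center point.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_distance(docs, center, max_km):
    """Return the documents whose location lies within max_km of center."""
    return [
        doc for doc in docs
        if haversine_km(center["lat"], center["lon"],
                        doc["location"]["lat"], doc["location"]["lon"]) <= max_km
    ]

# Hypothetical documents with a point location each.
places = [
    {"name": "paris", "location": {"lat": 48.8566, "lon": 2.3522}},
    {"name": "versailles", "location": {"lat": 48.8049, "lon": 2.1204}},
    {"name": "london", "location": {"lat": 51.5074, "lon": -0.1278}},
]

nearby = within_distance(places, {"lat": 48.8566, "lon": 2.3522}, 50)
print([doc["name"] for doc in nearby])  # ['paris', 'versailles']
```

A real deployment would rely on the engine's geo queries and spatial indexing instead of checking every document as this sketch does.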
6. Auto-Suggest
When you search for something on Google, you may see "Did you mean …?": Google suggests alternatives based on your search input. Elasticsearch's suggestion feature similarly helps users get relevant results based on the keywords they searched for. Term, phrase, and context suggestions are the main functions of the Elasticsearch suggestion system.
Figure 2. Elasticsearch auto-suggest
7. Auto-Complete
The Elasticsearch auto-complete feature lets users get relevant results based on their search keywords as they type. Its built-in functionality, called the Completion Suggester, covers most auto-completion needs.
Open a Google search and start typing, and it will automatically complete your query. This feature works the same way in Elasticsearch.
Figure 3. Elasticsearch auto-complete
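The search-as-you-type behaviour described above boils down to a fast prefix lookup. The sketch below shows one simple way to do that with a sorted list and binary search. It is a stand-in to illustrate the concept, not how the Completion Suggester is implemented, and the class name `PrefixCompleter` is invented for this example.

```python
import bisect

class PrefixCompleter:
    """Toy auto-complete: returns stored phrases that start with a prefix."""

    def __init__(self, phrases):
        # Keep phrases lowercased and sorted so a prefix maps to one contiguous run.
        self._phrases = sorted(phrase.lower() for phrase in phrases)

    def complete(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search for the first phrase that is >= the prefix.
        start = bisect.bisect_left(self._phrases, prefix)
        results = []
        for phrase in self._phrases[start:]:
            if not phrase.startswith(prefix):
                break  # Past the run of matching phrases.
            results.append(phrase)
            if len(results) == limit:
                break
        return results

completer = PrefixCompleter(["Elasticsearch", "Elastic Stack", "Kibana", "Elastic License"])
print(completer.complete("elastic"))  # ['elastic license', 'elastic stack', 'elasticsearch']
print(completer.complete("ki"))       # ['kibana']
```

Each keystroke triggers a new `complete` call with the longer prefix, which is what makes the suggestions narrow down as the user types.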
8. Analytical Power and Metrics
One of the biggest advantages of using Elasticsearch is its analytical power. Because of its architecture, it can analyze billions of documents in a short amount of time.
In addition, within the Elastic Stack, Metricbeat modules allow you to collect metrics from servers, Docker containers, and Kubernetes deployments, as well as Prometheus-style metrics and device telemetry, which you can then explore and analyze in Kibana.
How to Use Elasticsearch in Zenarmor (Sensei)?
Zenarmor is an all-software instant firewall that can be deployed virtually anywhere. Although you can use it on open-source firewalls like OPNsense and pfSense ® software, it is also available for many Linux environments.
You can find detailed information on the Zenarmor About page.
In a Zenarmor environment, you can install and configure Elasticsearch as the backend database. It is also possible to use Elasticsearch for reporting purposes. To use a remote Elasticsearch instance for Zenarmor Reporting, you must follow the 3 main steps given below.
- Microsoft Windows Firewall Configuration
- Elasticsearch Installation and Configuration
- Kibana Installation and Configuration
You can learn more about using Elasticsearch for reporting on Zenarmor from the Sunny Valley Elasticsearch Guide.
What Are the Companies Using Elasticsearch?
Elasticsearch is a technology stack tool that falls under the Search as a Service category. Elastic provides search solutions for thousands of enterprises around the world, from startups to the Global 2000, helping them find documents, monitor infrastructure, protect against security threats, and more.
Here are a few of them:
- Stack Overflow
What are the Benefits of Elasticsearch?
Elasticsearch is a scalable, enterprise-level, open-source search engine that works in near real time and is based on Apache Lucene. Let's go through some of the primary advantages of using Elasticsearch in your business.
1- Fast performance
Thanks to its inverted index architecture, Elasticsearch can find a search term quickly even in very large data sets.
2- Multilingual
Elasticsearch provides multilingual capability with the help of the ICU plugin, which is an Elasticsearch plugin based on the Lucene implementation of the Unicode text segmentation standard.
3- Scalability
Elasticsearch is designed to be scalable: it can run on a single machine or in a cluster of hundreds of nodes. The transition from a small cluster to a large-scale one is almost completely seamless and straightforward.
4- Auto-completion and instant search
The auto-suggest and auto-complete algorithms in Elasticsearch make searching easier. The Term Suggester, Phrase Suggester, Completion Suggester, and Context Suggester are the core components of its auto-completion and instant search capability.
5- Schema free
Some definitions, such as the index, type, and field types, do not have to be declared before indexing. When a document is later indexed with a new field, that field is automatically added to the mapping definitions. In this sense Elasticsearch is schema-less: documents can be indexed and searched without explicitly specifying a schema.
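The schema-free behaviour described above can be pictured as a mapping that grows whenever a document introduces a new field. The following Python sketch is a simplified model under stated assumptions: the helper names are hypothetical, and the type names ("text", "long", "float", "boolean", "object") only approximate the defaults Elasticsearch applies when it detects a field type.

```python
def infer_field_type(value):
    """Guess a field type from a JSON value (simplified model)."""
    # bool must be checked before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value type: {type(value).__name__}")

def update_mapping(mapping, document):
    """Add any field not yet in the mapping; existing fields keep their type."""
    for field, value in document.items():
        mapping.setdefault(field, infer_field_type(value))
    return mapping

mapping = {}
update_mapping(mapping, {"name": "Widget", "price": 9.99})
# A later document with a new attribute extends the mapping automatically.
update_mapping(mapping, {"name": "Gadget", "in_stock": True})
print(mapping)  # {'name': 'text', 'price': 'float', 'in_stock': 'boolean'}
```

Note that once a field has a type, later documents do not change it, which is why explicitly defined mappings are still useful when the inferred type is not the one you want.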
6- Document oriented (JSON)
What are the Basic Concepts of Elasticsearch?
After learning what Elasticsearch is, how it works, and what it is used for, let's briefly talk about some of the indispensable terms of the Elastic world.
1- JVM (Java Virtual Machine)
Java programs run on a virtual machine. As a result, a Java Virtual Machine must be installed on a server in order to run Java programs such as Elasticsearch. The JVM is what allows Java programs to run on that server.
2- Shard
A shard is an Apache Lucene index in its own right; it holds and indexes a portion of the data within a node.
3- Index
Each record in Elasticsearch is a JSON document. An Elasticsearch index is a collection of JSON documents. In short, each index is a kind of database.
4- Segment
A Lucene index is split into smaller parts called segments. A segment is a subset of the Lucene index.
5- Mapping
Mapping is the process of specifying how a document and the fields it contains are stored and indexed. Each index has a single mapping that defines how its documents are indexed.
6- Node
Any single instance (a machine with Elasticsearch installed) is called a node.
7- Document
In Elasticsearch, a document is the basic unit of information that can be indexed.
8- Replica
Elasticsearch keeps a copy of each piece of data on other machines, which prevents data loss if one of the machines goes down. These copied shards are called replicas.
9- Cluster
A cluster in Elasticsearch is a set of nodes that share the same cluster.name attribute. As nodes join or leave a cluster, the cluster reorganizes itself to spread data evenly over the available nodes.
10- Type
In Elasticsearch, a type represents a class of related documents and is identified by a name such as customer or product. Note that mapping types have been deprecated in recent Elasticsearch versions.
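To tie the concepts of document, shard, replica, node, and cluster together, here is a small Python sketch of how a document could be routed to a primary shard and how primaries and replicas could be spread over nodes. It rests on several assumptions: routing is modelled as a hash of the document id modulo the number of primary shards, `zlib.crc32` is only a stand-in for Elasticsearch's own hash function, and the round-robin `allocate` function is a made-up placement rule, far simpler than the real allocator.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # assumed value for this example

def shard_for(doc_id, num_primary_shards=NUM_PRIMARY_SHARDS):
    """Pick the primary shard for a document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def allocate(num_primary_shards, nodes):
    """Place each primary on one node and its replica on a different node."""
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate primary and replica")
    allocation = {}
    for shard in range(num_primary_shards):
        primary = nodes[shard % len(nodes)]
        replica = nodes[(shard + 1) % len(nodes)]  # always a different node
        allocation[shard] = {"primary": primary, "replica": replica}
    return allocation

cluster_nodes = ["node-1", "node-2", "node-3"]  # hypothetical node names
allocation = allocate(NUM_PRIMARY_SHARDS, cluster_nodes)

shard = shard_for("customer-42")
print(f"document goes to shard {shard}, stored on {allocation[shard]}")
```

Because the replica of every shard lives on a different node than its primary, losing a single node in this model never removes the only copy of a shard, which is the data-loss protection described under Replica.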
Is Elasticsearch free?
After learning about so many of Elasticsearch's strengths, you may wonder whether you have to pay for it.
Elasticsearch’s free and unrestricted features are available for use under either the SSPL or the Elastic License. The Elastic License has additional free features. Subscribed versions have access to support as well as specialized features like alerting and machine learning.