Elasticsearch is a near-real-time search engine: a change to an index becomes visible across the whole cluster within about a second.
An Elasticsearch cluster is a collection of one or more nodes that together hold all the data. The default cluster name is elasticsearch.
A node is a single server that is part of a cluster. Nodes participate in searching and indexing.
An index is a collection of documents, equivalent to a database in a relational system. Index names must be lowercase.
A type represents a class of documents, equivalent to a table.
A mapping is equivalent to the schema of a table.
A document is equivalent to a row in a table.
An index is divided into multiple pieces; each piece is called a shard.
- Useful when an index contains more data than the hard drive of a single node can store (e.g. 1 TB of data on a 500 GB disk).
A shard is a fully functional and independent index.
A shard can be stored on any node in a cluster.
The default number of primary shards is 5, and by default there is one replica for each primary shard.
Shards allow operations to be distributed and parallelized across nodes, which improves performance and scalability.
A replica is a copy of a shard.
A replica never resides on the same node as its primary shard, so if a given node fails the replica is still available.
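The placement rule above can be illustrated with a toy allocator. This is only a sketch: `place_shards` and the node names are hypothetical, and the round-robin scheme is an assumption made for illustration, not how Elasticsearch actually allocates shards.

```python
def place_shards(nodes, num_primaries=5, num_replicas=1):
    """Toy round-robin placement; NOT Elasticsearch's real allocator."""
    if len(nodes) < num_replicas + 1:
        # With too few nodes a replica would have to share a node with its primary.
        raise ValueError("need at least num_replicas + 1 nodes")
    placement = {}
    for shard in range(num_primaries):
        # Copy 0 is the primary; copies 1..num_replicas are its replicas.
        # Consecutive offsets guarantee that all copies land on different nodes.
        placement[shard] = [
            nodes[(shard + copy) % len(nodes)] for copy in range(num_replicas + 1)
        ]
    return placement


layout = place_shards(["node-1", "node-2", "node-3"])
for shard, (primary, *replicas) in layout.items():
    print(f"shard {shard}: primary on {primary}, replicas on {replicas}")
```

With the defaults (5 primaries, 1 replica each) the index has 10 shard copies in total, and losing any single node still leaves one copy of every shard.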
- A search request hits a node.
- The node broadcasts the request to one copy (primary or replica) of every shard in the index.
- Each shard performs the query.
- Each shard returns its results.
- The results are merged, sorted and returned to the client.
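The search steps above can be sketched as a small scatter-gather routine. Everything here is an illustrative assumption: the shards are plain dictionaries, the score is a naive term count rather than Elasticsearch's real relevance scoring, and document 1004 is made up for the example.

```python
import heapq


def search_shard(shard_docs, term, size):
    """Run the query on one shard and return its local top hits."""
    hits = []
    for doc_id, text in shard_docs.items():
        score = text.lower().split().count(term)  # naive score: term frequency
        if score > 0:
            hits.append((score, doc_id))
    return heapq.nlargest(size, hits)


def search(shards, term, size=10):
    """Coordinating node: broadcast, then merge and sort the shard results."""
    per_shard = [search_shard(docs, term.lower(), size) for docs in shards]
    merged = heapq.nlargest(size, (hit for hits in per_shard for hit in hits))
    return [doc_id for _score, doc_id in merged]


shards = [
    {"1002": "Why elasticsearch is Awesome"},
    {"1003": "Dark chocolate", "1004": "chocolate chocolate bar"},
]
print(search(shards, "chocolate"))
```

Each shard only has to return its own best `size` hits, which is why the work parallelizes well across shards.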
# create the index
PUT /ecommerce
{
}
# delete the index
DELETE /ecommerce
# list all indices
GET /_cat/indices?v
# creates the index with a mapping for the product type (the "table")
PUT /ecommerce
{
"mappings": {
"product": {
"properties": {
"name": {
"type": "string"
},
"price": {
"type": "double"
},
"description": {
"type": "string"
},
"status": {
"type": "string"
},
"quantity": {
"type": "integer"
},
"categories": {
"type": "nested",
"properties": {
"name": {
"type": "string"
}
}
},
"tags": {
"type": "string"
}
}
}
}
}
# insert a record
PUT /ecommerce/product/1001
{
"name": "rails framework from beginner to professional",
"price": 30.00,
"description": "Learn rails framework in just few hours",
"status": "active",
"quantity": 1,
"categories": [
{"name": "software"}
],
"tags": ["rails framework", "ror1", "ruby","programming"]
}
# index the same id again: the whole document is replaced
PUT /ecommerce/product/1001
{
"name": "rails framework from beginner to professional",
"price": 40.00,
"description": "Learn rails framework in just few hours",
"status": "active",
"quantity": 1,
"categories": [
{"name": "software"}
],
"tags": ["rails framework", "ror1", "ruby","programming"]
}
# partial update: only the fields given under "doc" change
POST /ecommerce/product/1001/_update
{
"doc": {
"price": 50.00
}
}
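The difference between re-indexing with PUT and a partial `_update` can be sketched with an in-memory dictionary. The helper names are hypothetical and the merge is shallow, which is a simplification made for this example.

```python
def put_document(store, doc_id, source):
    """Model of PUT /index/type/id: the stored document is replaced wholesale."""
    store[doc_id] = dict(source)


def update_document(store, doc_id, partial):
    """Model of POST .../_update with {"doc": ...}: merge fields into the document."""
    if doc_id not in store:
        raise KeyError(f"document {doc_id} is missing")
    store[doc_id].update(partial)  # shallow merge, a simplification


store = {}
put_document(store, "1001", {"name": "rails framework", "price": 30.00, "quantity": 1})
put_document(store, "1001", {"name": "rails framework", "price": 40.00})
print(store["1001"])  # quantity is gone: the second PUT replaced everything
update_document(store, "1001", {"price": 50.00})
print(store["1001"])  # name is kept, only price changed
```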
# delete a record
DELETE /ecommerce/product/1001
# insert a set of records using bulk
POST /ecommerce/product/_bulk
{"index":{"_id":"1002"}}
{"name":"Why elasticsearch is Awesome","price":50.00,"description":"A book about elasticsearch!","status":"active","quantity":10,"categories":[{"name":"Software"}],"tags":["elasticsearch","programming"]}
{"index":{"_id":"1003"}}
{"name":"Dark chocolate","price":4.00,"description":"Yummy dark chocolate.","status":"active","quantity":100,"categories":[{"name":"chocolate"}],"tags":["chocolate"]}
# execute different actions in a single bulk request
POST /ecommerce/product/_bulk
{"delete" : {"_id": "1" }}
{"update" : {"_id": "1002" }}
{"doc" : {"quantity": 9 }}
GET /ecommerce/product/1
# the index and type can also be given per action in the metadata line
POST /_bulk
{"update" : {"_id": "1002", "_index":"ecommerce", "_type": "product"}}
{"doc" : {"quantity" : 8 }}
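Outside of Sense, a bulk body has to be built by hand as newline-delimited JSON: one action line, optionally followed by one source line, with a trailing newline at the end. A minimal sketch of that layout, where `bulk_body` is a hypothetical helper and not part of any client library:

```python
import json


def bulk_body(actions):
    """Build a newline-delimited bulk body from (action, metadata, source) tuples."""
    lines = []
    for action, meta, source in actions:
        lines.append(json.dumps({action: meta}))
        if source is not None:  # "delete" has no source line
            lines.append(json.dumps(source))
    # The body must end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body([
    ("delete", {"_id": "1"}, None),
    ("update", {"_id": "1002"}, {"doc": {"quantity": 9}}),
])
print(body, end="")
```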
# get a specific product
GET /ecommerce/product/1002
# search for all products
GET /ecommerce/product/_search?q=*
GET /ecommerce/product/_search?q=chocolate
# search for products whose name contains 'Awesome'
GET /ecommerce/product/_search?q=name:Awesome
GET /ecommerce/product/_search?q=name:foobar
# the name field must contain both keywords
GET /ecommerce/product/_search?q=name:(chocolate AND dark)
GET /ecommerce/product/_search?q=name:(framework OR professional)
# the name contains at least one of the keywords and the status is active
GET /ecommerce/product/_search?q=(name:(framework OR professional) AND status:active)
# the name field must contain 'framework' but not 'rails' (the parentheses apply both terms to the name field)
GET /ecommerce/product/_search?q=name:(+framework -rails)
# matches names containing 'from' or 'framework'; the default operator is OR, so 'framework' is not guaranteed to be in every result
GET /ecommerce/product/_search?q=name:(from framework)
# searching for a specific sentence (order matters)
GET /ecommerce/product/_search?q=name:"framework from"
# the search still works: the hyphen is dropped
GET /ecommerce/product/_search?q=name:"framework - from"
# special characters are ignored in the search, as the analyzer output shows
GET /_analyze?analyzer=standard&text=framework - from
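Why the hyphen does not matter can be seen with a rough stand-in for the analyzer. This is a toy approximation (lowercase, keep only word characters), not the real standard analyzer, and `toy_standard_analyze` is a hypothetical name.

```python
import re


def toy_standard_analyze(text):
    """Very rough approximation: split on non-word characters and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


print(toy_standard_analyze("framework - from"))  # ['framework', 'from']
print(toy_standard_analyze("framework from"))    # ['framework', 'from']
```

Both phrases produce the same token sequence, so the phrase query with the hyphen matches the same documents as the one without it.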
Sum, min, max and stats aggregations:
Using match_all:
# sum the quantities of all products
GET /ecommerce/product/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"quantity_sum": {
"sum": {
"field": "quantity"
}
}
}
}
To aggregate using match:
GET /ecommerce/product/_search
{
"query": {
"match": {
"name": {
"query": "chocolate"
}
}
},
"size": 0,
"aggs": {
"quantity_sum": {
"sum": {
"field": "quantity"
}
}
}
}
# gets the average quantity of the matching documents
GET /ecommerce/product/_search
{
"query": {
"match": {
"name": {
"query": "chocolate"
}
}
},
"size": 0,
"aggs": {
"quantity_avg": {
"avg": {
"field": "quantity"
}
}
}
}
# max aggregation (use "min" in place of "max" to get the minimum)
GET /ecommerce/product/_search
{
"query": {
"match": {
"name": {
"query": "car"
}
}
},
"size": 0,
"aggs": {
"max_quantity": {
"max": {
"field": "quantity"
}
}
}
}
# stats aggregation (count, min, max, avg, sum)
GET /ecommerce/product/_search
{
"query": {
"match": {
"name": {
"query": "car"
}
}
},
"size": 0,
"aggs": {
"quantity_stat": {
"stats": {
"field": "quantity"
}
}
}
}
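What the stats aggregation computes over the matching documents can be sketched in a few lines. The input list is just the quantities of the two bulk-indexed sample products, and the values returned for an empty input are this sketch's own choice.

```python
def stats(values):
    """Compute count, min, max, avg and sum in one pass over the values."""
    values = list(values)
    if not values:
        # No matching documents: nothing to aggregate.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


print(stats([10, 100]))
```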
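Bucket aggregation:
This gives the number of documents whose quantity falls in each of the ranges 1 to 50 and 50 to 100. In a range aggregation "from" is inclusive and "to" is exclusive, so a quantity of exactly 100 falls in neither bucket.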
GET /ecommerce/product/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"quantity_ranges": {
"range": {
"field": "quantity",
"ranges": [
{
"from": 1,
"to": 50
},
{
"from": 50,
"to": 100
}
]
}
}
}
}
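The bucketing rule of the range aggregation ("from" inclusive, "to" exclusive) can be sketched as follows. The function name and the sample quantities are assumptions chosen to show the boundary cases.

```python
def range_buckets(values, ranges):
    """Count values per range; the lower bound is inclusive, the upper exclusive."""
    buckets = []
    for low, high in ranges:
        count = sum(1 for value in values if low <= value < high)
        buckets.append({"from": low, "to": high, "doc_count": count})
    return buckets


for bucket in range_buckets([1, 10, 50, 100], [(1, 50), (50, 100)]):
    print(bucket)
```

Here 50 lands in the second bucket only, and 100 is counted in neither; adding a third range starting at 100 would catch it.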
Nested aggregations:
To compute stats on the documents within each bucket, use a sub-aggregation: the inner aggregation runs on each range of the parent aggregation. There is no limit on how deeply aggregations can be nested, but each additional level costs performance.
GET /ecommerce/product/_search
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"quantity_ranges": {
"range": {
"field": "quantity",
"ranges": [
{
"from": 1,
"to": 50
},
{
"from": 50,
"to": 100
}
]
},
"aggs": {
"quantity_stats": {
"stats": {
"field": "quantity"
}
}
}
}
}
}
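A sub-aggregation simply repeats the inner computation once per parent bucket. A minimal sketch under the same range semantics ("from" inclusive, "to" exclusive); `range_with_stats` and the sample quantities are hypothetical.

```python
def range_with_stats(values, ranges):
    """Bucket values by range, then compute stats inside every bucket."""
    buckets = []
    for low, high in ranges:
        members = [value for value in values if low <= value < high]
        bucket = {"from": low, "to": high, "doc_count": len(members)}
        if members:  # the inner aggregation only has input for non-empty buckets
            bucket["quantity_stats"] = {
                "min": min(members),
                "max": max(members),
                "avg": sum(members) / len(members),
                "sum": sum(members),
            }
        buckets.append(bucket)
    return buckets


for bucket in range_with_stats([1, 9, 60, 80], [(1, 50), (50, 100)]):
    print(bucket)
```

Every extra nesting level multiplies the work by the number of parent buckets, which is where the performance cost comes from.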