- I noticed a problem in one of our Elasticsearch clusters: it had reached the low disk watermark, with disk usage at 87%
- Shards were no longer being allocated to the nodes that had passed the watermark (a couple of them)
- I raised the watermarks:
- low - 95%
- high - 97%
- flood - 99%
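For reference, a change like this goes through the cluster settings API. A minimal sketch, assuming a node listening on http://localhost:9200 without authentication and a transient override (as in the flood-stage command later in this log). The helper names watermark_payload and apply_watermarks are made up for illustration:

```shell
# Assumed endpoint: adjust to your own cluster.
ES_URL="${ES_URL:-http://localhost:9200}"

# Hypothetical helper: print the JSON body for a transient watermark override.
# Arguments: low, high and flood-stage thresholds, e.g. 95% 97% 99%.
watermark_payload() {
  printf '{"transient":{"cluster.routing.allocation.disk.watermark.low":"%s","cluster.routing.allocation.disk.watermark.high":"%s","cluster.routing.allocation.disk.watermark.flood_stage":"%s"}}\n' "$1" "$2" "$3"
}

# Hypothetical helper: send that body to the cluster settings API.
apply_watermarks() {
  curl -XPUT "$ES_URL/_cluster/settings" \
    -H 'Content-Type: application/json' \
    -d "$(watermark_payload "$1" "$2" "$3")"
}
```

Calling apply_watermarks 95% 97% 99% would set the three values listed above in a single request.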
- In the meantime we discussed our plan and strategy and decided to scale out the cluster by 3 nodes
- I’ve created 3 nodes using Terraform
- I deployed Elasticsearch to them one node at a time using Ansible
- I lowered the watermarks back to their defaults to encourage the cluster to rebalance the shards and spread the data across the nodes
- One of the developers reached out saying he was getting multiple errors in one of the services:
[FORBIDDEN/12/index read-only]
- I googled the error and found this: https://selleo.com/til/posts/esrgfyxjee-how-to-fix-elasticsearch-forbidden12index-read-only TL;DR: once a node reaches the flood-stage watermark, which I had just restored to its default (95%), every index with a shard on that node gets locked and becomes read-only.
- SHIT - at that point I realized that multiple nodes had already reached 95%
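A check like the one below, run before lowering the watermarks, could have flagged the nodes that were already over the flood stage. It is only a sketch: it assumes the same unauthenticated localhost:9200 endpoint, uses the _cat/allocation API with the disk.percent and node columns, and the helper name nodes_over is made up for illustration:

```shell
# Hypothetical helper: print the names of nodes whose disk usage is at or
# above a threshold. Reads "disk.percent node" pairs on stdin, one per line.
# Lines without a numeric first column (e.g. the UNASSIGNED row) are skipped.
nodes_over() {
  awk -v limit="$1" '$1 ~ /^[0-9]+$/ && $1 + 0 >= limit + 0 { print $2 }'
}

# Against a live cluster (assumed at localhost:9200, no auth):
# curl -s 'http://localhost:9200/_cat/allocation?h=disk.percent,node' | nodes_over 95
```

Any node name printed here would trip the read-only block as soon as the flood stage is set to that threshold.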
- I raised the flood-stage watermark to 99%:
curl -XPUT 'http://localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '{ "transient" : { "cluster.routing.allocation.disk.watermark.flood_stage" : "99%" } }'
- I used the command from the post to unlock all the locked indices:
curl -XPUT -H "Content-Type: application/json" http://localhost:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'
- Lag decreased and shards slowly started spreading across the cluster. ALL GOOD, GOOD VIBES
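To keep an eye on the rebalancing, the shard states can be tallied from the _cat/shards API. A rough sketch, again assuming the unauthenticated localhost:9200 endpoint; the helper name shard_state_counts is made up for illustration:

```shell
# Hypothetical helper: tally shard states read from stdin, one state per line
# (STARTED, RELOCATING, INITIALIZING, UNASSIGNED), sorted by state name.
shard_state_counts() {
  awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}

# Against a live cluster (assumed at localhost:9200, no auth):
# curl -s 'http://localhost:9200/_cat/shards?h=state' | shard_state_counts
```

A shrinking RELOCATING count with no UNASSIGNED shards would suggest the cluster is settling.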