32 essential checks across performance, reliability, security, architecture, and monitoring. Ensure your Elasticsearch cluster is production-ready and optimized.
Map fields as keyword for exact matching and text for full-text search. Avoid leading wildcards in queries (for example, *error), since they must check every term in the field.
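A minimal sketch of the keyword/text split as request bodies, assuming a hypothetical index `articles` with hypothetical fields `title` and `status`. The `title.raw` keyword sub-field is a common pattern, not something this checklist prescribes.

```python
import json

# Hypothetical mapping body for PUT /articles (index and field names are illustrative).
# "title" is analyzed text for full-text search; its "raw" sub-field is a keyword
# copy for exact matching, sorting and aggregations. "status" is keyword only.
articles_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "fields": {"raw": {"type": "keyword"}}},
            "status": {"type": "keyword"},
        }
    }
}

# Exact match: a term query compares the stored keyword value as-is (no analysis).
exact_query = {"query": {"term": {"status": "published"}}}

# Full-text search: a match query analyzes the input and searches the text field.
fulltext_query = {"query": {"match": {"title": "production readiness"}}}

print(json.dumps(articles_mapping, indent=2))
```

An exact match on the title itself would target `title.raw` with a term query instead of `title`.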
Oversized shards slow searches and recoveries; undersized shards waste resources, since every shard carries memory and CPU overhead. Aim for 20-50GB per shard.
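A small helper for turning an expected index size into a primary shard count. The 40 GB target is an assumption picked from inside the 20-50GB range; replicas are not counted.

```python
import math


def primary_shard_count(expected_index_gb: float, target_shard_gb: float = 40.0) -> int:
    """Return how many primary shards keep each shard at or below target_shard_gb.

    The 40 GB default is an assumed target inside the 20-50 GB range; tune it
    for your workload. Replica shards are not included.
    """
    if target_shard_gb <= 0:
        raise ValueError("target_shard_gb must be positive")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


# Example: a 300 GB index with a 40 GB target needs 8 primaries (~37.5 GB each).
print(primary_shard_count(300))  # 8
```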
The default 1s refresh interval is often unnecessary. Increase index.refresh_interval to 30s or more for bulk indexing workloads.
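A sketch of the settings bodies involved, sent with PUT /<index>/_settings (the index name is yours to fill in). The 30s value comes from the check above; the "-1" variant for a one-off initial load is an optional extra.

```python
import json

# Bodies for PUT /<index>/_settings; the index name is a placeholder.
# During heavy indexing, refresh less often so fewer small segments are created.
bulk_load_settings = {"index": {"refresh_interval": "30s"}}

# For a one-off initial load, "-1" disables periodic refresh entirely.
initial_load_settings = {"index": {"refresh_interval": "-1"}}

# Afterwards, setting the value to null restores the default (1s).
restore_default_settings = {"index": {"refresh_interval": None}}

print(json.dumps(restore_default_settings))  # {"index": {"refresh_interval": null}}
```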
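Batch indexing operations with the _bulk API. 1,000-5,000 documents per bulk request is a good starting point, but the best size depends on your document size, so benchmark it.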
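A minimal sketch of batching documents into _bulk request bodies, assuming a hypothetical index `logs-demo` and a batch size of 2,000 picked from the 1,000-5,000 range. Sending the bodies (with `Content-Type: application/x-ndjson`) is left to whatever HTTP client you use.

```python
import json
from typing import Iterable, Iterator, List


def bulk_bodies(index: str, docs: Iterable[dict], batch_size: int = 2000) -> Iterator[str]:
    """Yield NDJSON bodies for POST /_bulk, each holding up to batch_size documents.

    Every document becomes an action line followed by a source line, and each
    body ends with the trailing newline the _bulk API requires.
    """
    lines: List[str] = []
    count = 0
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
        count += 1
        if count == batch_size:
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:  # flush the final partial batch
        yield "\n".join(lines) + "\n"


# 4,500 hypothetical documents -> batches of 2,000, 2,000 and 500.
bodies = list(bulk_bodies("logs-demo", ({"n": i} for i in range(4500))))
print([body.count("\n") // 2 for body in bodies])  # [2000, 2000, 500]
```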
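Cache frequently used aggregations and filters to reduce query load: put non-scoring clauses in filter context, and let the shard request cache serve repeated aggregation-only (size: 0) requests.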
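A sketch of a cache-friendly aggregation request, assuming a hypothetical index `articles` with fields `status`, `published_at` and `category`. It would be sent as GET /articles/_search?request_cache=true.

```python
import json

# Hypothetical aggregation-only search for
# GET /articles/_search?request_cache=true (index and field names are illustrative).
# - Clauses under bool.filter are not scored, and frequently used filters can be
#   cached by the node query cache.
# - size=0 makes the response aggregation-only, which the shard request cache can store.
# - The request cache keys on the whole body, so fixed date bounds keep it reusable.
agg_request = {
    "size": 0,
    "query": {
        "bool": {
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"published_at": {"gte": "2024-01-01", "lt": "2024-02-01"}}},
            ]
        }
    },
    "aggs": {"by_category": {"terms": {"field": "category"}}},
}

print(json.dumps(agg_request, indent=2))
```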
Disable _source for indices where you only need search, not retrieval. Without _source you lose the update and reindex APIs and cannot return the original documents.
Pre-sort indices (index sorting) by the fields queries most often sort on, so those searches can terminate early.
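A sketch of index sorting, assuming a hypothetical `events` index sorted by `@timestamp` descending. Index sorting can only be set at index creation, and the search has to sort the same way to benefit.

```python
# Hypothetical create-index body for PUT /events (names are illustrative).
# Index sorting can only be configured when the index is created.
events_index = {
    "settings": {
        "index": {
            "sort.field": "@timestamp",
            "sort.order": "desc",
        }
    },
    "mappings": {"properties": {"@timestamp": {"type": "date"}}},
}

# A search whose sort matches the index sort; with track_total_hits disabled,
# each segment can stop after collecting the top "size" hits.
latest_events_query = {
    "size": 10,
    "sort": [{"@timestamp": "desc"}],
    "track_total_hits": False,
}
```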
Enable slow log, analyze patterns, and optimize or cache frequent slow queries.
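A sketch of a settings body that turns on the search and indexing slow logs for one index via PUT /<index>/_settings. The threshold values are illustrative assumptions; derive yours from your latency targets.

```python
import json

# Body for PUT /<index>/_settings enabling search and indexing slow logs.
# Threshold values are illustrative; pick them from your own latency targets.
slowlog_settings = {
    "index.search.slowlog.threshold.query.warn": "10s",
    "index.search.slowlog.threshold.query.info": "2s",
    "index.search.slowlog.threshold.fetch.warn": "1s",
    "index.indexing.slowlog.threshold.index.warn": "10s",
}

print(json.dumps(slowlog_settings, indent=2))
```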
Replicas provide redundancy and increase search capacity. Never run production with 0 replicas.
Schedule daily snapshots to S3/GCS. Test restoration regularly.
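One way to automate daily snapshots is snapshot lifecycle management (SLM). The policy below is a sketch for PUT /_slm/policy/nightly-snapshots, assuming an already registered S3 or GCS repository named `my_s3_repo` (hypothetical); the schedule and retention values are illustrative.

```python
import json

# Hypothetical SLM policy body for PUT /_slm/policy/nightly-snapshots.
# "my_s3_repo" is an assumed, already registered snapshot repository.
nightly_policy = {
    "schedule": "0 30 1 * * ?",        # cron with a seconds field: daily at 01:30
    "name": "<nightly-snap-{now/d}>",  # date-math snapshot name
    "repository": "my_s3_repo",
    "config": {"indices": ["*"], "include_global_state": True},
    "retention": {"expire_after": "30d", "min_count": 5, "max_count": 50},
}

print(json.dumps(nightly_policy, indent=2))
```

Scheduling only covers half the check: restores still need to be tested, ideally into a separate cluster.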
Alert on yellow/red cluster status, high JVM heap usage, and free disk space below 15%.
Dedicated master-eligible nodes prevent cluster instability during high load.
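Keep the JVM heap at or below about 31GB, so compressed object pointers stay enabled, and at no more than half of RAM. Leave the remaining RAM to the OS filesystem cache, which Lucene relies on.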
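A sketch of the sizing rule as arithmetic: half of RAM, capped at 31 GB, with minimum and maximum heap set to the same value. The exact compressed-oops cutoff varies by JVM, so confirm it in the Elasticsearch startup log.

```python
from typing import List


def recommended_heap_gb(ram_gb: int, cap_gb: int = 31) -> int:
    """Half of RAM, capped at cap_gb to stay below the compressed-oops threshold.

    The other half of RAM is left to the OS filesystem cache that Lucene uses.
    """
    return min(ram_gb // 2, cap_gb)


def jvm_heap_options(ram_gb: int) -> List[str]:
    """jvm.options lines; initial (-Xms) and maximum (-Xmx) heap are equal."""
    heap = recommended_heap_gb(ram_gb)
    return [f"-Xms{heap}g", f"-Xmx{heap}g"]


print(jvm_heap_options(64))  # ['-Xms31g', '-Xmx31g']
print(jvm_heap_options(16))  # ['-Xms8g', '-Xmx8g']
```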
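On Elasticsearch 6.x and earlier, set discovery.zen.minimum_master_nodes to floor(master_eligible_nodes / 2) + 1 to prevent split brain. From 7.0 onward, Elasticsearch manages the voting quorum automatically and this setting is no longer used; run an odd number (typically three) of master-eligible nodes.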
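The quorum rule for discovery.zen.minimum_master_nodes (relevant to Elasticsearch 6.x and earlier) is a strict majority, which integer division expresses directly. The loop also shows how many master-eligible node failures each cluster size tolerates.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


for n in (1, 2, 3, 4, 5):
    quorum = minimum_master_nodes(n)
    print(n, quorum, n - quorum)  # nodes, quorum, failures tolerated
```

Four nodes tolerate only one failure, the same as three, which is why an odd number of master-eligible nodes is the usual choice.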
Have runbooks for common failures. Practice cluster recovery quarterly.
Never expose Elasticsearch without authentication. Use the built-in Elasticsearch security features (formerly X-Pack Security) or an equivalent.
Encrypt data in transit with TLS, both between nodes and between clients and the cluster. Use valid certificates.
Encrypt regulated data at rest, using encrypted EBS volumes or disk- or filesystem-level encryption.
Apply the principle of least privilege: separate read-only and admin roles.
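A sketch of two role definitions for the security API (PUT /_security/role/<name>), assuming hypothetical role names `logs_reader` and `logs_admin` scoped to a hypothetical `logs-*` index pattern.

```python
# Hypothetical role bodies for PUT /_security/role/<name>; role names and index
# patterns are illustrative.
logs_reader_role = {  # PUT /_security/role/logs_reader
    "cluster": ["monitor"],
    "indices": [
        {"names": ["logs-*"], "privileges": ["read", "view_index_metadata"]}
    ],
}

logs_admin_role = {  # PUT /_security/role/logs_admin
    "cluster": ["monitor", "manage_ilm", "manage_index_templates"],
    "indices": [{"names": ["logs-*"], "privileges": ["all"]}],
}
```

Users and service accounts are then mapped to the narrowest role that covers their needs.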
Log authentication attempts, index changes, and admin actions.
Apply security patches promptly. Test updates in staging first.
Use index lifecycle management (ILM) to automatically move old data to cheaper storage and delete expired data.
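A sketch of an index lifecycle management (ILM) policy for PUT /_ilm/policy/logs-policy, where the policy name, ages and sizes are illustrative assumptions. Rollover only works when indices are written through a data stream or a rollover alias.

```python
import json

# Hypothetical body for PUT /_ilm/policy/logs-policy. Ages and sizes are illustrative.
logs_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over after one day or when a primary shard reaches 50 GB
                    # (max_primary_shard_size requires Elasticsearch 7.13+).
                    "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                }
            },
            "warm": {
                # On clusters with data tiers, ILM moves indices to warm nodes here.
                "min_age": "7d",
                "actions": {"forcemerge": {"max_num_segments": 1}},
            },
            "delete": {"min_age": "30d", "actions": {"delete": {}}},
        }
    }
}

print(json.dumps(logs_policy, indent=2))
```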
Tier storage by age: recent data on fast SSDs, older data on HDDs, and cold data in object storage such as S3 (for example, via searchable snapshots).
Dedicated node roles (master, data, ingest, coordinating-only) improve performance and resource allocation at scale.
Define index templates for time-series indices to ensure consistent settings and mappings.
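A sketch of a composable index template (Elasticsearch 7.8+) for PUT /_index_template/logs-template. The template name, the `logs-*` pattern, the field list and the referenced ILM policy `logs-policy` are all hypothetical.

```python
# Hypothetical body for PUT /_index_template/logs-template. Every new index whose
# name matches "logs-*" gets these settings and mappings. "logs-policy" refers
# to an assumed, existing ILM policy.
logs_template = {
    "index_patterns": ["logs-*"],
    "priority": 200,
    "template": {
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
            "index.lifecycle.name": "logs-policy",
        },
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},
                "host": {"type": "keyword"},
            }
        },
    },
}
```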
Leave capacity headroom for traffic spikes and for maintenance operations such as shard relocation and rolling restarts.
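Cross-cluster search (CCS) adds network latency to every query. For latency-sensitive workloads, query data held locally instead, for example in a separate cluster per region or in follower indices kept in sync with cross-cluster replication (CCR).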
Monitor CPU, memory, disk, JVM heap, query rate, indexing rate.
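Alert at 85% disk usage, the default low watermark, above which Elasticsearch no longer allocates shards to the node. At 90% (high watermark) it starts relocating shards away, and at 95% (flood stage) it blocks writes to every index with a shard on that node.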
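A small classifier for disk usage against Elasticsearch's default watermarks (85/90/95%), which an alerting script could call per node. The defaults mirror the stock settings; if your cluster overrides cluster.routing.allocation.disk.watermark.*, pass those values instead.

```python
def disk_watermark_stage(used_percent: float, low: float = 85.0,
                         high: float = 90.0, flood: float = 95.0) -> str:
    """Classify node disk usage against the default disk watermarks.

    low:   Elasticsearch stops allocating shards to the node
    high:  Elasticsearch tries to relocate shards away from the node
    flood: indices with a shard on the node get a read-only (allow delete) block
    """
    if used_percent >= flood:
        return "flood_stage"
    if used_percent >= high:
        return "high"
    if used_percent >= low:
        return "low"
    return "ok"


for used in (70, 86, 92, 96):
    print(used, disk_watermark_stage(used))
```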
Track p95/p99 latencies. Alert on degradation.
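A minimal sketch of client-side p95/p99 tracking with a nearest-rank percentile and a degradation check. The 20% tolerance and the sample latencies are assumptions for illustration; Elasticsearch's `percentiles` aggregation can compute approximate percentiles server-side if latencies are indexed.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of latency samples, with p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_degraded(current_ms: Sequence[float], baseline_ms: Sequence[float],
                     tolerance: float = 1.2) -> bool:
    """True if current p99 exceeds baseline p99 by more than the (assumed) 20%."""
    return percentile(current_ms, 99) > tolerance * percentile(baseline_ms, 99)


baseline = list(range(1, 101))         # hypothetical latencies: 1..100 ms
current = [x * 1.5 for x in baseline]  # everything 50% slower
print(percentile(baseline, 95), percentile(baseline, 99))  # 95 99
print(latency_degraded(current, baseline))                 # True
```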
Review slow queries weekly. Optimize or cache problematic patterns.
Thread pool rejections (for example, in the write or search pools) indicate resource constraints. Add capacity or optimize the workload.
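A sketch of rejection alerting based on two polls of `GET /_cat/thread_pool/write,search?h=node_name,name,rejected` (no header row). The `rejected` counter is cumulative since node start, so the sketch alerts on its increase; the node names and counts below are made up.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (node_name, thread pool name)


def parse_rejections(cat_output: str) -> Dict[Key, int]:
    """Parse whitespace-separated 'node_name name rejected' rows."""
    counts: Dict[Key, int] = {}
    for line in cat_output.strip().splitlines():
        node, pool, rejected = line.split()
        counts[(node, pool)] = int(rejected)
    return counts


def new_rejections(before: str, after: str) -> Dict[Key, int]:
    """Pools whose rejected counter grew between two polls, with the increase.

    A negative delta (counter reset by a node restart) is ignored.
    """
    old = parse_rejections(before)
    grown: Dict[Key, int] = {}
    for key, count in parse_rejections(after).items():
        delta = count - old.get(key, 0)
        if delta > 0:
            grown[key] = delta
    return grown


poll_1 = "node-1 search 0\nnode-1 write 12\nnode-2 write 0\n"
poll_2 = "node-1 search 0\nnode-1 write 40\nnode-2 write 0\n"
print(new_rejections(poll_1, poll_2))  # {('node-1', 'write'): 28}
```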