How it works...
Lucene stores your data in several segments on disk. These segments are created when you index a new document or record, or when you delete a document.
In Elasticsearch, a deleted document is not removed from disk; it is only marked as deleted (a so-called tombstone). To free up that space, you need to run a force merge, which purges deleted documents.
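To make the tombstone idea concrete, here is a toy Python model. It is not Lucene's actual data structure, and `Segment`, `ToyIndex` and their methods are hypothetical names used only for illustration. Each indexed batch becomes an immutable segment, a delete only records a tombstone, and `force_merge` rewrites the live documents into a single new segment. Only at that point does the deleted data actually disappear.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Segment:
    """Toy stand-in for a Lucene segment (hypothetical, illustration only)."""
    docs: dict[str, dict]                            # doc id -> stored source
    deleted: set[str] = field(default_factory=set)   # tombstoned doc ids

    def live_docs(self) -> dict[str, dict]:
        # Documents that have not been tombstoned.
        return {i: d for i, d in self.docs.items() if i not in self.deleted}


class ToyIndex:
    def __init__(self) -> None:
        self.segments: list[Segment] = []

    def index_batch(self, docs: dict[str, dict]) -> None:
        # Each flushed batch becomes a new, immutable segment.
        self.segments.append(Segment(dict(docs)))

    def delete(self, doc_id: str) -> None:
        # Deleting only adds a tombstone; the document stays in its segment.
        for seg in self.segments:
            if doc_id in seg.docs:
                seg.deleted.add(doc_id)

    def stored_docs(self) -> int:
        # Documents still occupying space, including tombstoned ones.
        return sum(len(s.docs) for s in self.segments)

    def force_merge(self) -> None:
        # Copy every live document into one new segment; tombstoned docs are dropped.
        merged: dict[str, dict] = {}
        for seg in self.segments:
            merged.update(seg.live_docs())
        self.segments = [Segment(merged)] if merged else []


idx = ToyIndex()
idx.index_batch({"1": {"title": "a"}, "2": {"title": "b"}})
idx.index_batch({"3": {"title": "c"}})
idx.delete("2")
print(len(idx.segments), idx.stored_docs())  # 2 3
idx.force_merge()
print(len(idx.segments), idx.stored_docs())  # 1 2
```

Before the merge, the deleted document still counts toward stored data; after it, one segment holds only the two live documents.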
Because of these factors, the number of segments can grow large. (For this reason, during the setup, we increased the file descriptor limit for the Elasticsearch processes.)
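On Unix-like systems, the open-file limit can be inspected with Python's standard `resource` module. This sketch reports the limits of the process that runs it, so it only reflects Elasticsearch if that process was started with the same limits (same user and service settings). The 65,535 threshold is the minimum recommended in the Elasticsearch documentation, and `nofile_limits`/`soft_limit_ok` are hypothetical helper names. For a node that is already running, the nodes stats API (`GET _nodes/stats/process`) reports `max_file_descriptors` directly.

```python
import resource

# Minimum open-file limit recommended by the Elasticsearch docs.
RECOMMENDED_NOFILE = 65535


def nofile_limits() -> tuple:
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def soft_limit_ok(minimum: int = RECOMMENDED_NOFILE) -> bool:
    soft, _hard = nofile_limits()
    # RLIM_INFINITY means "no limit", which always satisfies the minimum.
    return soft == resource.RLIM_INFINITY or soft >= minimum


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"soft={soft} hard={hard} ok={soft_limit_ok()}")
```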
Internally, Elasticsearch runs a background merger that tries to reduce the number of segments, but it is designed to improve indexing performance rather than search performance. The force merge operation in Lucene reduces the number of segments in an IO-heavy way: it removes unused segments, purges deleted documents, and rebuilds the index with a minimal number of segments.
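A force merge is triggered through the REST endpoint `POST /<index>/_forcemerge`. The sketch below only builds the request with the Python standard library, so it runs without a cluster. It assumes a node reachable at `http://localhost:9200` and a hypothetical index named `myindex`; `max_num_segments` and `only_expunge_deletes` are parameters of the Elasticsearch force merge API, while `build_forcemerge_request` is a hypothetical helper name.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_forcemerge_request(
    index: str,
    max_num_segments: Optional[int] = 1,
    only_expunge_deletes: bool = False,
    base_url: str = "http://localhost:9200",
) -> Request:
    """Build (but do not send) a POST /<index>/_forcemerge request."""
    params = {}
    if only_expunge_deletes:
        # Merge only segments that contain deleted docs; this sketch never
        # sends this together with max_num_segments.
        params["only_expunge_deletes"] = "true"
    elif max_num_segments is not None:
        # Target segment count; 1 rewrites each shard into a single segment.
        params["max_num_segments"] = str(max_num_segments)
    url = f"{base_url}/{index}/_forcemerge"
    if params:
        url += "?" + urlencode(params)
    return Request(url, method="POST")


req = build_forcemerge_request("myindex")
print(req.get_method(), req.full_url)
# POST http://localhost:9200/myindex/_forcemerge?max_num_segments=1
```

Against a running cluster, the request would be sent with `urllib.request.urlopen(req)`. Because the operation is IO-heavy, it is best scheduled for indices that are no longer being written to.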
The main advantages of this are as follows:
- Reducing the number of open file descriptors
- Freeing memory used by the segment readers
- Improving search performance, because fewer segments have to be managed