Recently, I upgraded the Lucene library (the core jar file) in my project from version 3.0.2 to version 3.0.3 (released in December 2010). The only purpose of the upgrade was to keep up with a release that has more bugs fixed.
However, after upgrading, I noticed that the optimized index folder contained more index files than it did with version 3.0.2. It looked as if the index merge during IndexWriter.optimize() stopped at some point. I am not sure whether the extra, un-merged index files cause any performance degradation during index search, but I am not happy that so many files stay in the index folder (although there are not too many).
After reading the change document for the Lucene 3.0.3 release, I realized that some changes had been made to avoid high disk usage during indexing. The original change log item from Lucene 3.0.3 reads as follows:
LUCENE-2773: LogMergePolicy accepts a double noCFSRatio (default = 0.1), which means any time a merged segment is greater than 10% of the index size, it will be left in non-compound format even if compound format is on. This change was made to reduce peak transient disk usage during optimize which increased due to LUCENE-2762.
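To make the rule in that change log item concrete, here is a minimal sketch of the decision as I read it. It is a simplified model rather than Lucene's actual source: the class name NoCfsRatioSketch and the method staysCompound are made up for illustration, and the sizes are plain byte counts instead of real segment data.

```java
// Simplified model of the noCFSRatio rule described in LUCENE-2773.
// NOT Lucene code: the class and method names are hypothetical.
public class NoCfsRatioSketch {

    /**
     * Returns true if a merged segment would still be written in
     * compound (.cfs) format, assuming compound format is switched on.
     */
    public static boolean staysCompound(long mergedSegmentBytes,
                                        long totalIndexBytes,
                                        double noCFSRatio) {
        // A ratio of 1.0 means "always use the compound format".
        if (noCFSRatio >= 1.0) {
            return true;
        }
        // Segments larger than noCFSRatio * index size stay non-compound.
        return mergedSegmentBytes <= noCFSRatio * totalIndexBytes;
    }

    public static void main(String[] args) {
        long indexBytes = 1000L;

        // A small merge (5% of the index) with the default ratio of 0.1.
        System.out.println(staysCompound(50L, indexBytes, 0.1));

        // optimize() merges everything into one segment (100% of the index).
        System.out.println(staysCompound(indexBytes, indexBytes, 0.1));

        // The same full merge with the ratio raised to 1.0.
        System.out.println(staysCompound(indexBytes, indexBytes, 1.0));
    }
}
```

Under this model, the single segment produced by optimize() is always the whole index, so with the default ratio of 0.1 it can never end up in compound format. That matches the extra files I saw in the index folder.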
Since my index is not very big and I don't care about the "peak transient disk usage", I still want the index to be created and optimized in a cleaner way. This means the merged result should still end up in a single compound-format file.
To get that, I now need to override Lucene's default indexing and optimizing behavior by adding some extra code to my project. The following is my tweak:
// an IndexWriter instance created through method getWriter
IndexWriter writer = getWriter(dir);
// The tweak starts here
MergePolicy mp = writer.getMergePolicy();
if (mp instanceof LogByteSizeMergePolicy) {
    LogByteSizeMergePolicy lbsmp = (LogByteSizeMergePolicy) mp;
    // 1.0 means merged segments always use the compound format
    lbsmp.setNoCFSRatio(1.0);
}
Compiled, deployed, run. Hooray! The cleaner index folder is back!
Very cool, it just solved one of my problems with this merging thing. Now tell me how you did it for the spellIndex, because it has the same issue: it is no longer compounded into a .cfs file like before.
Best regards,
Wilson