Lucene 5.1.0 Released: Java Search Engine

Published 2015-06-03 01:47:29 | Source: reader submission

Apache Lucene: Full-Text Search Engine Toolkit

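Lucene, originally a subproject of the Apache Software Foundation's Jakarta project, is an open-source full-text search engine toolkit. It is not a complete full-text search engine but a framework for building one: it provides a complete query engine and indexing engine, plus a partial text-analysis engine (covering two Western languages, English and German). Lucene aims to give software developers a simple, easy-to-use toolkit for adding full-text search to a target system, or for building a complete full-text search engine on top of it.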

Lucene 5.1.0 has been released. This version is now available at: http://www.apache.org/dyn/closer.cgi/lucene/java/5.1.0

The changes are as follows:

New Features (9)
1. LUCENE-6066: Added DiversifiedTopDocsCollector to misc for collecting no more than a given number of results under a choice of key. Introduces new remove method to core's PriorityQueue.
  (Mark Harwood)
2. LUCENE-6191: New spatial 2D heatmap faceting for PrefixTreeStrategy.
  (David Smiley)
3. LUCENE-6227: Added BooleanClause.Occur.FILTER to filter documents without participating in scoring (in contrast to MUST).
  (Adrien Grand)
4. LUCENE-6294: Added oal.search.CollectorManager to allow for parallelization of the document collection process on IndexSearcher.
  (Adrien Grand)
5. LUCENE-6303: Added filter caching baked into IndexSearcher, disabled by default.
  (Adrien Grand)
6. LUCENE-6304: Added a new MatchNoDocsQuery that matches no documents.
  (Lee Hinman via Adrien Grand)
7. LUCENE-6341: Add a -fast option to CheckIndex.
  (Robert Muir)
8. LUCENE-6355: IndexWriter's infoStream now also logs time to write FieldInfos during merge
  (Lee Hinman via Mike McCandless)
9. LUCENE-6339: Added a near-real-time document suggester via a custom postings format.
  (Areek Zillur, Mike McCandless, Simon Willnauer)
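
The FILTER clause from LUCENE-6227 is easier to picture with a toy model. The sketch below is plain Java and does not use Lucene: the class `FilterClauseSketch`, its `score` method, and the term weights are all hypothetical, and the scoring (a plain sum of weights) is an assumption made for illustration, not Lucene's similarity. It only shows the rule stated in the entry: a filter term must be present for a document to match, but it contributes nothing to the score.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.OptionalDouble;
import java.util.Set;

// Toy model only: hypothetical names, no Lucene classes involved.
final class FilterClauseSketch {

    /**
     * Returns an empty result when the document lacks any required term.
     * Otherwise returns the sum of the weights of the scoring terms;
     * filter terms gate the match but add nothing to the score.
     */
    static OptionalDouble score(Set<String> docTerms,
                                Map<String, Double> scoringTerms,
                                Set<String> filterTerms) {
        if (!docTerms.containsAll(scoringTerms.keySet())
                || !docTerms.containsAll(filterTerms)) {
            return OptionalDouble.empty();
        }
        double sum = 0.0;
        for (double weight : scoringTerms.values()) {
            sum += weight;
        }
        return OptionalDouble.of(sum);
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("lucene", "published"));

        // Both terms act like MUST clauses: both contribute to the score.
        Map<String, Double> both = new HashMap<>();
        both.put("lucene", 2.0);
        both.put("published", 0.5);
        System.out.println(score(doc, both, Collections.<String>emptySet()));

        // "published" acts like a FILTER clause: required, but not scored.
        System.out.println(score(doc,
                Collections.singletonMap("lucene", 2.0),
                Collections.singleton("published")));
    }
}
```

With the document `{lucene, published}`, treating both terms as scoring clauses gives 2.5, while moving `published` into the filter set gives 2.0. The document matches either way; only its score changes.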
Bug Fixes (10)
1. LUCENE-6368: FST.save can truncate output (BufferedOutputStream may be closed after the underlying stream).
  (Ippei Matsushima via Dawid Weiss)
2. LUCENE-6249: StandardQueryParser doesn't support pure negative clauses.
  (Dawid Weiss)
3. LUCENE-6190: Spatial pointsOnly flag on PrefixTreeStrategy shouldn't switch all predicates to Intersects.
  (David Smiley)
4. LUCENE-6242: Ram usage estimation was incorrect for SparseFixedBitSet when object alignment was different from 8.
  (Uwe Schindler, Adrien Grand)
5. LUCENE-6293: Fixed TimSorter bug.
  (Adrien Grand)
6. LUCENE-6001: DrillSideways hits NullPointerException for certain BooleanQuery searches.
  (Dragan Jotannovic, jane chang via Mike McCandless)
7. LUCENE-6311: Fix NIOFSDirectory and SimpleFSDirectory so that the toString method of IndexInputs reveals when they come from a compound file.
  (Robert Muir, Mike McCandless)
8. LUCENE-6381: Add defensive wait time limit in DocumentsWriterStallControl to prevent hangs during indexing if we miss a .notify/All somewhere
  (Mike McCandless)
9. LUCENE-6386: Correct IndexWriter.forceMerge documentation to state that up to 3X (X = current index size) spare disk space may be needed to complete forceMerge(1).
  (Robert Muir, Shai Erera, Mike McCandless)
10. LUCENE-6395: Seeking by term ordinal was failing to set the term's bytes in MemoryIndex
  (Mike McCandless)
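
LUCENE-6368 belongs to a familiar class of bugs: bytes still held in a `BufferedOutputStream` are lost if the buffer is not flushed before the underlying stream is closed or read. The following is a minimal JDK-only sketch of that failure mode, not the `FST.save` code itself; the class and method names are hypothetical, and a `ByteArrayOutputStream` stands in for the file.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows why the buffer must be flushed or closed first.
final class BufferedCloseOrder {

    /** Writes through a buffer but never flushes: reports bytes that reached the sink. */
    static int writeWithoutFlush(byte[] payload) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            OutputStream buffered = new BufferedOutputStream(sink, 8192);
            buffered.write(payload);
            // The payload is smaller than the buffer, so it is still buffered here.
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes the buffered wrapper first, which flushes it into the sink. */
    static int writeAndClose(byte[] payload) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (OutputStream buffered = new BufferedOutputStream(sink, 8192)) {
                buffered.write(payload);
            }
            return sink.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] payload = new byte[100];
        System.out.println("without flush: " + writeWithoutFlush(payload));
        System.out.println("closed wrapper first: " + writeAndClose(payload));
    }
}
```

For a 100-byte payload and an 8192-byte buffer, the first method sees 0 bytes in the sink and the second sees all 100. Closing the outermost wrapper first is the simplest way to guarantee the flush.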
Optimizations (16)
1. LUCENE-6183, LUCENE-5647: Avoid recompressing stored fields and term vectors when merging segments without deletions. Lucene50Codec's BEST_COMPRESSION mode uses a higher deflate level for more compact storage.
  (Robert Muir)
2. LUCENE-6184: Make BooleanScorer only score windows that contain matches.
  (Adrien Grand)
3. LUCENE-6161: Speed up resolving of deleted terms to docIDs by doing a combined merge sort between deleted terms and segment terms instead of a separate merge sort for each segment. In delete-heavy use cases this can be a sizable speedup.
  (Mike McCandless)
4. LUCENE-6201: BooleanScorer can now deal with values of minShouldMatch that are greater than one and is used when queries produce dense result sets.
  (Adrien Grand)
5. LUCENE-6218: Don't decode frequencies or match all positions when scoring is not needed.
  (Robert Muir)
6. LUCENE-6233: Speed up CheckIndex when the index has term vectors.
  (Robert Muir, Mike McCandless)
7. LUCENE-6198: Added the TwoPhaseIterator API, exposed on scorers which is for now only used on phrase queries and conjunctions in order to check positions lazily if the phrase query is in a conjunction with other queries.
  (Robert Muir, Adrien Grand, David Smiley)
8. LUCENE-6244, LUCENE-6251: All boolean queries but those that have a minShouldMatch > 1 now either propagate or take advantage of the two-phase iteration capabilities added in LUCENE-6198.
  (Adrien Grand, Robert Muir)
9. LUCENE-6241: FSDirectory.listAll() doesn't filter out subdirectories anymore, for faster performance. Subdirectories don't matter to Lucene. If you need to filter out non-index files with some custom usage, you may want to look at the IndexFileNames class.
  (Robert Muir)
10. LUCENE-6262: ConstantScoreQuery does not wrap the inner weight anymore when scores are not required.
  (Adrien Grand)
11. LUCENE-6263: MultiCollector automatically caches scores when several collectors need them.
  (Adrien Grand)
12. LUCENE-6275: SloppyPhraseScorer now uses the same logic as ConjunctionScorer in order to advance doc IDs, which takes advantage of the cost() API.
  (Adrien Grand)
13. LUCENE-6290: QueryWrapperFilter propagates approximations and FilteredQuery rewrites to a BooleanQuery when the filter is a QueryWrapperFilter in order to leverage approximations.
  (Adrien Grand)
14. LUCENE-6318: Reduce RAM usage of FieldInfos when there are many fields.
  (Mike McCandless, Robert Muir)
15. LUCENE-6320: Speed up CheckIndex.
  (Robert Muir)
16. LUCENE-4942: Optimized the encoding of PrefixTreeStrategy indexes for non-point data: 33% smaller index, 68% faster indexing, and 44% faster searching. YMMV
  (David Smiley)
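
The two-phase iteration idea behind LUCENE-6198 and LUCENE-6244 can be sketched without Lucene. The code below is a toy model, not the TwoPhaseIterator API: `TwoPhaseSketch`, `conjunction`, and the `verifications` counter are hypothetical names. It first intersects two sorted doc-ID arrays (the cheap approximation), then runs the costly per-document check, standing in for phrase position matching, only on documents that survive the intersection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: hypothetical names, no Lucene classes involved.
final class TwoPhaseSketch {

    /** Counts how many times the costly check was executed. */
    static int verifications = 0;

    /**
     * Phase 1: advance both sorted doc-ID arrays until they agree on a doc.
     * Phase 2: only then run the costly check on that candidate.
     */
    static List<Integer> conjunction(int[] approximation, int[] other,
                                     IntPredicate costlyMatch) {
        List<Integer> hits = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < approximation.length && j < other.length) {
            if (approximation[i] < other[j]) {
                i++;
            } else if (approximation[i] > other[j]) {
                j++;
            } else {
                verifications++;
                if (costlyMatch.test(approximation[i])) {
                    hits.add(approximation[i]);
                }
                i++;
                j++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] phraseCandidates = {1, 3, 5, 7, 9}; // docs containing all phrase terms
        int[] otherClause = {3, 4, 9};            // docs matching the other clause
        // Pretend the positions line up in every candidate except doc 3.
        List<Integer> hits = conjunction(phraseCandidates, otherClause, doc -> doc != 3);
        System.out.println(hits + " after " + verifications + " verifications");
    }
}
```

In the example in `main`, the approximation has five candidate documents, but only the two that also appear in the other clause are verified, which is the saving the changelog entries describe for phrase queries inside conjunctions.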
API Changes (21)
1. LUCENE-6204, LUCENE-6208: Simplify CompoundFormat: remove files() and remove files parameter to write().
  (Robert Muir)
2. LUCENE-6217: Add IndexWriter.isOpen and getTragicException.
  (Simon Willnauer, Mike McCandless)
3. LUCENE-6218, LUCENE-6220: Add Collector.needsScores() and needsScores parameter to Query.createWeight().
  (Robert Muir, Adrien Grand)
4. LUCENE-4524, LUCENE-6246, LUCENE-6256, LUCENE-6271: Merge DocsEnum and DocsAndPositionsEnum into a single PostingsEnum iterator. TermsEnum.docs() and TermsEnum.docsAndPositions() are replaced by TermsEnum.postings().
  (Alan Woodward, Simon Willnauer, Robert Muir, Ryan Ernst)
5. LUCENE-6222: Removed TermFilter, use a QueryWrapperFilter(TermQuery) instead. This will be as efficient now that queries can opt out from scoring.
  (Adrien Grand)
6. LUCENE-6269: Removed BooleanFilter, use a QueryWrapperFilter(BooleanQuery) instead.
  (Adrien Grand)
7. LUCENE-6270: Replaced TermsFilter with TermsQuery, use a QueryWrapperFilter(TermsQuery) instead.
  (Adrien Grand)
8. LUCENE-6223: Move BooleanQuery.BooleanWeight to BooleanWeight.
  (Robert Muir)
9. LUCENE-1518: Make Filter extend Query and return 0 as score.
  (Uwe Schindler, Adrien Grand)
10. LUCENE-6245: Force Filter subclasses to implement toString API from Query.
  (Ryan Ernst)
11. LUCENE-6268: Replace FieldValueFilter and DocValuesRangeFilter with equivalent queries that support approximations.
  (Adrien Grand)
12. LUCENE-6289: Replace DocValuesRangeFilter with DocValuesRangeQuery which supports approximations.
  (Adrien Grand)
13. LUCENE-6266: Remove unnecessary Directory params from SegmentInfo.toString, SegmentInfos.files/toString, and SegmentCommitInfo.toString.
  (Robert Muir)
14. LUCENE-6272: Scorer extends DocIdSetIterator rather than DocsEnum.
  (Alan Woodward)
15. LUCENE-6281: Removed support for slow collations from lucene/sandbox. Better performance would be achieved through CollationKeyAnalyzer or ICUCollationKeyAnalyzer.
  (Adrien Grand)
16. LUCENE-6286: Removed IndexSearcher methods that take a Filter object. A BooleanQuery with a filter clause must be used instead.
  (Adrien Grand)
17. LUCENE-6300: PrefixFilter, TermRangeFilter and NumericRangeFilter have been removed. Use PrefixQuery, TermRangeQuery and NumericRangeQuery instead.
  (Adrien Grand)
18. LUCENE-6303: Replaced FilterCache with QueryCache and CachingWrapperFilter with CachingWrapperQuery.
  (Adrien Grand)
19. LUCENE-6317: Deprecate DataOutput.writeStringSet and writeStringStringMap. Use writeSetOfStrings and writeMapOfStrings instead.
  (Mike McCandless, Robert Muir)
20. LUCENE-6307: Rename SegmentInfo.getDocCount -> .maxDoc, SegmentInfos.totalDocCount -> .totalMaxDoc, MergeInfo.totalDocCount -> .totalMaxDoc and MergePolicy.OneMerge.totalDocCount -> .totalMaxDoc
  (Adrien Grand, Robert Muir, Mike McCandless)
21. LUCENE-6367: PrefixQuery now subclasses AutomatonQuery, removing the specialized PrefixTermsEnum.
  (Robert Muir, Mike McCandless)
Other (6)
1. LUCENE-6248: Remove unused odd constants from StandardSyntaxParser.jj
  (Dawid Weiss)
2. LUCENE-6193: Collapse identical catch branches in try-catch statements.
  (shalin)
3. LUCENE-6239: Removed RAMUsageEstimator's sun.misc.Unsafe calls.
  (Robert Muir, Dawid Weiss, Uwe Schindler)
4. LUCENE-6292: Seed StringHelper better.
  (Robert Muir)
5. LUCENE-6333: Refactored queries to delegate their equals and hashcode impls to the super class.
  (Lee Hinman via Adrien Grand)
6. LUCENE-6343: DefaultSimilarity javadocs had the wrong float value to demonstrate precision of encoded norms
  (András Péteri via Mike McCandless)

For more details, see the changelog.


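Lucene was originally written by Doug Cutting, a veteran expert in full-text indexing and retrieval. He was a principal developer of the V-Twin search engine, later served as a senior systems architect at Excite, and currently works on research into low-level Internet infrastructure. His goal in contributing Lucene was to bring full-text search to all kinds of small and medium-sized applications.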
