Lucene 3.6 Released: Java Full-Text Search Engine

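Lucene is an open-source library for full-text indexing and search, supported and distributed by the Apache Software Foundation. It offers a simple yet powerful API for full-text indexing and searching, and in the Java ecosystem it is a mature, free, open-source tool. It is currently, and has been for several years, the most popular free Java information retrieval library. Information retrieval libraries are often spoken of as if they were search engines, but they should not be confused with web search engines.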

Lucene 3.6 includes a large number of bug fixes, optimizations, and improvements. Highlights:

* Full support for Java 7 (JDK 7u1 or later is required).
* TypeTokenFilter filters tokens based on their TypeAttribute.
* Fixed offset bugs in a number of CharFilters, Tokenizers and TokenFilters that could lead to exceptions during highlighting.
* Added phonetic encoders: Metaphone, Soundex, Caverphone, Beider-Morse, etc.
* CJKBigramFilter and CJKWidthFilter replace CJKTokenizer.
* Kuromoji morphological analyzer tokenizes Japanese text, producing both compound words and their segmentation.
* Static index pruning (Carmel pruning) removes postings with low within-document term frequency.
* QueryParser now interprets '*' as an open end for range queries.
* FieldValueFilter excludes documents missing the specified field.
* CheckIndex and IndexUpgrader allow you to specify the specific FSDirectory implementation to use with the new -dir-impl command-line option.
* FSTs can now do reverse lookup (by output) in certain cases and can be packed to reduce their size. There is now a method to retrieve top N shortest paths from a start node in an FST.
* New WFSTCompletionLookup suggester supports finer-grained ranking for suggestions.
* FST based suggesters now use an offline (disk-based) sort, instead of in-memory sort, when pre-sorting the suggestions.
* ToChildBlockJoinQuery joins in the opposite direction (parent down to child documents).
* New query-time joining is more flexible (but less performant) than index-time joins.
* Added HTMLStripCharFilter to strip HTML markup.
* Security fix: better prevention of virtual machine SIGSEGVs when using MMapDirectory. Code using cloned IndexInputs of already closed indexes could crash the VM, allowing DoS attacks on your application.
* Many bug fixes...
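
To make the CJKBigramFilter item above more concrete, here is a minimal sketch of the overlapping-bigram idea in plain Java. It does not use the Lucene API; the class `CjkBigramSketch` and its methods are hypothetical names chosen for illustration. The choice of scripts treated as CJK, and the rule that a lone character is emitted on its own, are simplifying assumptions rather than a description of Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of Lucene.
public class CjkBigramSketch {

    // Assumption: these four scripts are what this sketch treats as CJK.
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || script == Character.UnicodeScript.KATAKANA
                || script == Character.UnicodeScript.HANGUL;
    }

    // Emits overlapping two-character tokens for each run of CJK characters.
    // Non-CJK characters are not emitted; they only separate runs.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        List<String> run = new ArrayList<String>();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (isCjk(cp)) {
                run.add(new String(Character.toChars(cp)));
            } else {
                flush(run, tokens);
            }
            i += Character.charCount(cp);
        }
        flush(run, tokens);
        return tokens;
    }

    // Turns the current run into tokens, then clears it.
    // A run of length one is emitted as a single-character token.
    static void flush(List<String> run, List<String> tokens) {
        if (run.size() == 1) {
            tokens.add(run.get(0));
        }
        for (int j = 0; j + 1 < run.size(); j++) {
            tokens.add(run.get(j) + run.get(j + 1));
        }
        run.clear();
    }

    public static void main(String[] args) {
        // Input: four Han characters, the word "Lucene", one Han character.
        // Unicode escapes keep this source file pure ASCII.
        String text = "\u5168\u6587\u68C0\u7D22 Lucene \u641C";
        System.out.println(bigrams(text));
    }
}
```

For the input 全文检索 Lucene 搜, the sketch yields the tokens 全文, 文检, 检索 and 搜. Overlapping bigrams are a common way to index text that has no spaces between words: a query for 检索 can match without a dictionary-based segmenter, at the cost of a larger index.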

Download: http://www.apache.org/dyn/closer.cgi/lucene/java/
