Published 2017-05-20 00:23:13 | Source: reader submission
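OpenNLP: a natural language processing toolkit
OpenNLP is a machine learning toolkit for processing natural language text. It supports most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, and parsing.
Apache OpenNLP 1.8.0 has been released. This version brings many new features, improvements, and bug fixes. The API has been reworked for better consistency, and many deprecated methods have been removed.
Changes: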
POS Tagger context generator now supports feature generation XML
Add a Name Finder feature generator that adds POS Tag features
Add CoNLL-U format support
Improve default Name Finder settings
TokenNameFinderEvaluator CLI now supports nameTypes argument
Stupid backoff is now the default in NGramLanguageModel
Language codes are now ISO 639-3 compliant
Add many unit tests
Distribution package now includes example parameters file
Prefix and suffix feature generators are now configurable
Remove API in Document Categorizer for user specified tokenizer
Learnable lemmatizer now returns all possible lemmas for a given word and POS tag
Lemmatizer API backward-compatibility break: lemmas no longer need to be encoded/decoded; the LemmatizerME lemmatize method now returns the actual lemma
Add stemmer, detokenizer and sentence detection abbreviations for Irish
Chunker SequenceValidator signature changed to allow access to both token and POS tag
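One item above notes that stupid backoff is now the default in NGramLanguageModel. For readers unfamiliar with the technique, here is a minimal Python sketch of stupid backoff scoring in general. It is not OpenNLP's implementation: the function names, the toy corpus, and the bigram limit are all hypothetical, and the backoff factor 0.4 is the value commonly cited from Brants et al. (2007).

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor, the value commonly cited from Brants et al. (2007)


def count_ngrams(tokens, max_n):
    """Count every n-gram of order 1..max_n in the token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens, alpha=ALPHA):
    """Return the stupid backoff score S(word | context).

    The score is a relative frequency when the full n-gram was seen;
    otherwise it backs off to a shorter context, scaled by alpha.
    Scores are not normalized probabilities.
    """
    context = tuple(context)
    if not context:
        # Base case: unigram relative frequency.
        return counts[(word,)] / total_tokens
    ngram = context + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and discount.
    return alpha * stupid_backoff(word, context[1:], counts, total_tokens, alpha)


# Hypothetical toy corpus, for illustration only.
tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, 2)

seen = stupid_backoff("cat", ["the"], counts, len(tokens))    # bigram "the cat" was observed
unseen = stupid_backoff("sat", ["the"], counts, len(tokens))  # backs off to the unigram "sat"
```

Because the scores are unnormalized, the method skips the discounting bookkeeping of smoothing schemes such as Kneser-Ney, which is the usual reason it is chosen for simplicity and speed.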
Download:
https://opennlp.apache.org/download.html