Published 2017-07-12 23:52:06 | Source: reader submission
Apache Tika: a content extraction toolkit
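Apache Tika builds on existing parser libraries to detect the type of a document and to extract metadata and structured content from many formats (for example HTML, PDF, and DOC).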
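To make the "detect" step concrete, here is a minimal sketch of magic-byte sniffing, one of the signals a format detector can use, covering the three formats named above. It is a toy illustration written against the Java standard library only, not Tika's implementation; the class name `MagicSniffer` and its `detect` method are hypothetical names invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper for illustration only; this is not Tika code.
final class MagicSniffer {
    // Leading bytes of an OLE2 compound file, the container used by legacy .doc files.
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // Every PDF file begins with this ASCII marker.
    private static final byte[] PDF = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    private MagicSniffer() {}

    /** Returns a best-guess media type for the leading bytes of a document. */
    static String detect(byte[] head) {
        if (startsWith(head, PDF)) {
            return "application/pdf";
        }
        if (startsWith(head, OLE2)) {
            // OLE2 also holds .xls and .ppt files; labelling it as Word is a
            // simplification. A real detector would inspect the container.
            return "application/msword";
        }
        // HTML has no fixed magic number, so look at the first markup token.
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .trim()
                .toLowerCase(Locale.ROOT);
        if (text.startsWith("<!doctype html") || text.startsWith("<html")) {
            return "text/html";
        }
        // Unknown content falls back to the generic binary type.
        return "application/octet-stream";
    }

    // True when data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling `MagicSniffer.detect` on the first few bytes of a file gives a media type guess without parsing the whole document. In Tika itself, detection and text extraction are exposed through the `org.apache.tika.Tika` facade (`detect` and `parseToString`), which combines several signals rather than relying on leading bytes alone.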
Apache Tika 1.16 has been released.
Some of the changes in this release:
- Exclude jj2000 from edu.ucar grib to avoid potential license conflicts with ASL 2.0.
- Add age recognition using an ensemble model for linear regression and Apache OpenNLP Maximum Entropy. Tika can now detect age from text (TIKA-1988).
- Add Tika Deep Learning support for the VGG16 model from "Very Deep Convolutional Networks for Large-Scale Image Recognition". Tika now supports both Inception v3/v4 and VGG16 based image recognition (TIKA-2298).
- Extract macros from PPT (TIKA-2089).
Download link: