发布于 2016-03-11 06:02:25 | 184 次阅读 | 评论: 0 | 来源: 网友投递
Apache Spark
Spark是UC Berkeley AMP lab所开源的类Hadoop MapReduce的通用的并行,Spark,拥有Hadoop MapReduce所具有的优点;但不同于MapReduce的是Job中间输出结果可以保存在内存中,从而不再需要读写HDFS,因此Spark能更好地适用于数据挖掘与机器学习等需要迭代的map reduce的算法。
Apache spark 1.6.1 发布了,
新特性
[SPARK-10359] - Enumerate Spark's dependencies in a file and diff against it for new pull requests
Bug 修复
[SPARK-7615] - MLLIB Word2Vec wordVectors divided by Euclidean Norm equals to zero
[SPARK-9844] - File appender race condition during SparkWorker shutdown
[SPARK-10524] - Decision tree binary classification with ordered categorical features: incorrect centroid
[SPARK-10847] - Pyspark - DataFrame - Optional Metadata with `None` triggers cryptic failure
[SPARK-11394] - PostgreDialect cannot handle BYTE types
[SPARK-11624] - Spark SQL CLI will set sessionstate twice
[SPARK-11972] - [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session
[SPARK-12006] - GaussianMixture.train crashes if an initial model is not None
[SPARK-12010] - Spark JDBC requires support for column-name-free INSERT syntax
[SPARK-12016] - word2vec load model can't use findSynonyms to get words
[SPARK-12026] - ChiSqTest gets slower and slower over time when number of features is large
[SPARK-12268] - pyspark shell uses execfile which breaks python3 compatibility
[SPARK-12300] - Fix schema inferance on local collections
[SPARK-12316] - Stack overflow with endless call of `Delegation token thread` when application end.
[SPARK-12327] - lint-r checks fail with commented code
下载地址:http://spark.apache.org/downloads.html