Published 2017-01-10 08:47:15 | Source: user submission
JStorm distributed computing system
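JStorm is a system similar to Hadoop MapReduce. The user implements a job (a topology) against the specified interfaces and submits it to JStorm, which runs it 24/7. If a worker fails unexpectedly, the scheduler immediately allocates a new worker to replace the failed one. From the application's point of view, a JStorm application is therefore a distributed application that follows a particular programming model.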
JStorm 2.2.1 has been released. The changes are as follows:
New features
In several test scenarios, performance improved by 200%~300% compared with releases 2.1.1 and 0.9.8.1, by 120%~200% compared with Flink, and by 300%~400% compared with Storm.
Restructure the batch solution
Improve serialization and deserialization to reduce CPU and network cost
Reduce CPU cost on the critical path and in metrics
Improve the strategy of the Netty client and Netty server
Support consuming from and publishing to the Disruptor queue in batch mode
Introduce a snapshot-based exactly-once framework
Compared with the Trident solution, the new framework performs several times better.
The new framework also supports an "at least once" mode. Compared with the acker mechanism, it removes the related computation in the acker and the associated network cost, which improves performance significantly.
Support JStorm on YARN
A JStorm cluster can now be deployed quickly and scaled in/out quickly, which improves resource utilization.
Redesign the backpressure solution. Flow control is now applied stage by stage.
The new solution is simple and effective, and it responds much faster when backpressure is switched on or off.
Performance and stability are significantly improved compared with the original solution.
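Stage-by-stage flow control can be illustrated with a minimal sketch, assuming each stage owns a bounded input queue with a high and a low watermark: the stage raises backpressure toward its direct upstream when the queue reaches the high watermark and releases it once the queue drains to the low watermark. The class `StageQueue` and both watermark values are hypothetical illustrations, not JStorm's actual classes or configuration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of one stage's input queue; not JStorm's implementation.
final class StageQueue {
    private final Deque<Integer> queue = new ArrayDeque<>();
    private final int highWatermark;
    private final int lowWatermark;
    private boolean backpressureOn = false;

    StageQueue(int highWatermark, int lowWatermark) {
        if (lowWatermark >= highWatermark) throw new IllegalArgumentException("low must be < high");
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /** Called by the direct upstream stage; returns false when it must pause sending. */
    boolean offer(int msg) {
        if (backpressureOn) return false;
        queue.addLast(msg);
        if (queue.size() >= highWatermark) backpressureOn = true; // switch backpressure on
        return true;
    }

    /** Called by this stage's consumer; releases backpressure once the queue has drained enough. */
    Integer poll() {
        Integer msg = queue.pollFirst();
        if (backpressureOn && queue.size() <= lowWatermark) backpressureOn = false; // switch it off
        return msg;
    }

    boolean isBackpressureOn() { return backpressureOn; }

    int size() { return queue.size(); }

    public static void main(String[] args) {
        StageQueue q = new StageQueue(4, 1);
        int accepted = 0;
        for (int i = 0; i < 10; i++) if (q.offer(i)) accepted++;
        System.out.println("accepted=" + accepted + " backpressure=" + q.isBackpressureOn()); // accepted=4 backpressure=true
        q.poll(); q.poll(); q.poll();
        System.out.println("after draining: backpressure=" + q.isBackpressureOn()); // after draining: backpressure=false
    }
}
```

Using two thresholds instead of one gives hysteresis, so the on/off switch does not flap when the queue length hovers around a single limit; and because each queue only signals its immediate upstream, pressure propagates back one stage at a time.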
Introduce Window API
Support tumbling windows and sliding windows
Windows support two collection modes: count and duration.
Support the watermark mechanism
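The window shapes and the watermark idea can be sketched in plain Java, independent of JStorm's actual Window API (whose class and method names are not shown in these notes). `CountWindows`, its methods, and the lateness policy `watermark = max timestamp - allowedLateness` are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of window semantics; not JStorm's Window API.
final class CountWindows {
    /** Tumbling: consecutive, non-overlapping windows of `size` events; a trailing partial window is dropped. */
    static List<List<Integer>> tumbling(List<Integer> events, int size) {
        return sliding(events, size, size); // a tumbling window is a sliding window whose slide equals its size
    }

    /** Sliding: windows of `size` events, starting a new one every `slide` events (overlapping when slide < size). */
    static List<List<Integer>> sliding(List<Integer> events, int size, int slide) {
        if (size <= 0 || slide <= 0) throw new IllegalArgumentException("size and slide must be positive");
        List<List<Integer>> out = new ArrayList<>();
        for (int start = 0; start + size <= events.size(); start += slide) {
            out.add(new ArrayList<>(events.subList(start, start + size)));
        }
        return out;
    }

    /**
     * Duration mode with a watermark: each timestamp falls into the tumbling window
     * [k * duration, (k + 1) * duration). A window is closed, and may fire, once its end
     * is <= the watermark, assumed here to be (max timestamp seen - allowedLateness).
     */
    static List<Long> closedWindowStarts(List<Long> timestamps, long duration, long allowedLateness) {
        List<Long> closed = new ArrayList<>();
        if (timestamps.isEmpty()) return closed;
        TreeSet<Long> starts = new TreeSet<>();
        long maxTs = Long.MIN_VALUE;
        for (long ts : timestamps) {
            starts.add(Math.floorDiv(ts, duration) * duration);
            maxTs = Math.max(maxTs, ts);
        }
        long watermark = maxTs - allowedLateness;
        for (long start : starts) {
            if (start + duration <= watermark) closed.add(start);
        }
        return closed;
    }

    public static void main(String[] args) {
        List<Integer> events = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(tumbling(events, 3));   // [[1, 2, 3], [4, 5, 6]]
        System.out.println(sliding(events, 3, 2)); // [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
        // Watermark = 3200 - 500 = 2700, so only the window [1000, 2000) has closed.
        System.out.println(closedWindowStarts(List.of(1000L, 1500L, 2100L, 3200L, 2900L), 1000, 500)); // [1000]
    }
}
```

The watermark lets a duration window wait for out-of-order events (here, 2900 arriving after 3200) instead of firing as soon as the first later event is seen.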
Introduce support for Flux
Flux is a programming framework/component that helps create and deploy JStorm topologies quickly.
Isolate JStorm's dependencies from those of user topologies with the Maven Shade Plugin to fix dependency conflicts.
Improve the shuffle grouping solution
Integrate shuffle, localOrShuffle and localFirst. The grouping is adapted automatically according to the topology's assignment.
Introduce load awareness into shuffle to keep downstream tasks load-balanced.
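One possible load-aware policy that folds localOrShuffle/localFirst behavior into shuffle is sketched below: prefer the least-loaded task in the same worker unless every local task is busier than a threshold, then fall back to the least-loaded task overall. The class name, the threshold, and the load metric (a value in [0, 1], such as queue fill ratio) are hypothetical; the release notes do not describe JStorm's exact algorithm.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical load-aware task chooser; not JStorm's grouping code.
final class LoadAwareChooser {
    /**
     * Picks a downstream task id. Local tasks (same worker) win if any of them has a
     * load below localThreshold; otherwise the least-loaded task overall is chosen.
     */
    static int choose(Map<Integer, Double> loadByTask, Set<Integer> localTasks, double localThreshold) {
        Integer best = null;
        for (int task : localTasks) {
            Double load = loadByTask.get(task);
            if (load != null && load < localThreshold && (best == null || load < loadByTask.get(best))) {
                best = task;
            }
        }
        if (best != null) return best;
        // All local tasks are busy (or there are none): use the global minimum.
        for (Map.Entry<Integer, Double> e : loadByTask.entrySet()) {
            if (best == null || e.getValue() < loadByTask.get(best)) best = e.getKey();
        }
        if (best == null) throw new IllegalArgumentException("no downstream tasks");
        return best;
    }

    public static void main(String[] args) {
        Map<Integer, Double> loads = Map.of(1, 0.9, 2, 0.2, 3, 0.5, 4, 0.85);
        System.out.println(choose(loads, Set.of(1, 3), 0.8)); // 3: a local task below the threshold
        System.out.println(choose(loads, Set.of(1, 4), 0.8)); // 2: both local tasks are busy
    }
}
```

A real shuffle would add randomness or round-robin among comparably loaded tasks; the deterministic minimum is used here only to keep the sketch short and testable.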
Support configuring a blacklist in Nimbus to exclude problematic nodes
Support batch mode in Trident
Supervisors automatically synchronize the cluster configuration from the Nimbus master
Add buildTs to supervisor info and heartbeats
Add an ext module for Nimbus and Supervisor to support external plugins
Add jstorm-elasticsearch support, thanks to @elloray for the contribution
Improvements
Restructure the Nimbus metrics implementation. The topology metrics runnable is now event-driven.
Restructure the topology master (TM). The processor in TM is now event-driven.
Add some examples to cover more scenarios
Disable stream metrics to reduce the cost of sending metrics to Nimbus
Support metrics in local mode
Improve the gauge implementation: instead of reporting the instantaneous value once per minute, report the average of several values sampled during each minute.
Introduce an approximate histogram calculation to reduce memory usage of histogram metrics
Add Full GC metrics and supervisor network metrics
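The gauge change can be sketched as follows, assuming a hypothetical `SampledGauge` that is sampled several times per minute and reports the mean of those samples, rather than a single reading, at the end of each minute. The scheduling of `sample()` and the fallback to one direct reading when no samples were taken are assumptions.

```java
import java.util.function.DoubleSupplier;

// Hypothetical averaging gauge; not JStorm's metrics classes.
final class SampledGauge {
    private final DoubleSupplier source; // reads the current instantaneous value
    private double sum = 0;
    private int count = 0;

    SampledGauge(DoubleSupplier source) { this.source = source; }

    /** Called several times per minute, e.g. by a timer (scheduling not shown). */
    void sample() {
        sum += source.getAsDouble();
        count++;
    }

    /** Called once per minute: returns the mean of this minute's samples and resets the accumulator. */
    double reportAndReset() {
        double avg = count == 0 ? source.getAsDouble() : sum / count;
        sum = 0;
        count = 0;
        return avg;
    }

    public static void main(String[] args) {
        double[] readings = {10, 20, 30};
        int[] next = {0};
        SampledGauge gauge = new SampledGauge(() -> readings[next[0]++ % readings.length]);
        for (int k = 0; k < 3; k++) gauge.sample();
        System.out.println(gauge.reportAndReset()); // 20.0
    }
}
```

Averaging smooths out a spike or dip that happens to coincide with the reporting instant.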
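The release notes do not say which approximation JStorm's histogram uses. A common way to bound histogram memory is a fixed set of buckets: memory depends on the number of buckets rather than on the number of recorded values, and a percentile is answered with the upper bound of the bucket containing that rank. `BucketHistogram` and the bucket bounds below are hypothetical.

```java
import java.util.Arrays;

// Hypothetical fixed-bucket approximate histogram; not JStorm's implementation.
final class BucketHistogram {
    private final long[] upperBounds; // ascending, inclusive upper bound of each bucket
    private final long[] counts;      // one extra slot for values above the last bound
    private long total = 0;

    BucketHistogram(long[] upperBounds) {
        this.upperBounds = upperBounds.clone();
        Arrays.sort(this.upperBounds);
        this.counts = new long[this.upperBounds.length + 1];
    }

    /** Records a value in O(log buckets) time and constant extra memory. */
    void update(long value) {
        int idx = Arrays.binarySearch(upperBounds, value);
        if (idx < 0) idx = -idx - 1; // insertion point = index of the first bound greater than value
        counts[idx]++;
        total++;
    }

    /** Approximate percentile p in (0, 100]: the upper bound of the bucket that holds that rank. */
    long percentile(double p) {
        if (total == 0) return 0;
        long rank = (long) Math.ceil(p * total / 100.0);
        long seen = 0;
        for (int i = 0; i < counts.length; i++) {
            seen += counts[i];
            if (seen >= rank) return i < upperBounds.length ? upperBounds[i] : Long.MAX_VALUE;
        }
        return Long.MAX_VALUE;
    }

    public static void main(String[] args) {
        BucketHistogram h = new BucketHistogram(new long[] {1, 2, 5, 10, 20, 50, 100});
        for (long v = 1; v <= 100; v++) h.update(v);
        System.out.println(h.percentile(50)); // 50
    }
}
```

The trade-off is accuracy: the answer is only as precise as the bucket width, which is why bucket bounds are usually spaced exponentially to keep relative error roughly constant.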
Bug fixes
Fix message disorder bug
Fix a bug where some ZooKeeper connections were not closed as expected when the supervisor encountered an exception.
deactivate might be called by mistake during task initialization
The rootId could occasionally be duplicated, causing unexpected message failures.
Fix a bug in local mode
Fix a bug in logwriter
Some task metrics (RecvTps, ProcessLatency) might not be aggregated correctly.
Fix a race condition in AsmCounter during flushing
Download: