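With the big data wave in full swing, we keep running into the same kind of problem: which tool should we choose for the job? The same challenge applies to big data SQL engines. AtScale, a vendor of big data reporting tools, offers the following answers based on its benchmark tests: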
1. Spark 2.0 improved its large query performance by an average of 2.4X over Spark 1.6 (so upgrade!). Small query performance was already good and remained roughly the same.
2. Impala 2.6 is 2.8X as fast for large queries as version 2.3. Small query performance was already good and remained roughly the same.
3. Hive 2.1 with LLAP is over 3.4X faster than Hive 1.2 for large queries, and its small query performance doubled. If you're using Hive, this isn't an upgrade you can afford to skip.