ONE FOR ALL! Using Apache Calcite to make SQL smart

Speaker: 葉祐欣 Evans Ye / Technical Expert @ Alibaba
Venue: 綜合科管 B1, Second Lecture Hall
Talk title: ONE FOR ALL! Using Apache Calcite to make SQL smart


When Hadoop was born, the big data world focused on how to build systems that scale. The world has since evolved: HBase has hit 2.0, Cassandra 3.0, Hive 3.0, and so on. With scalability conquered, what’s next? That’s right, usability comes into play. Looking back at the history, NoSQL really just used a divide-and-conquer approach to tackle big data problems, trading away SQL capabilities to do so. But once the big data problem was solved, more and more NoSQL stores and data processing engines started to build SQL or SQL-like interfaces. As a result, a generic SQL engine that provides core SQL capabilities such as query parsing, relational algebra, and query optimization starts to shine.

In this talk, I’ll walk you through the architecture, functionality, and design concepts of Apache Calcite. Note that Calcite itself is not a database, but many well-known systems already incorporate it as a library, including Hive, Drill, Druid, Phoenix, Apex, Flink, Storm, Samza, and more. To better illustrate how Calcite works, I’ll pick some of these systems and describe how they adopt Calcite and which parts it enhances. I’ll also cover several features that Calcite provides, such as query optimization, heterogeneous data sources, materialized views, and streaming SQL. From the user’s perspective, a better understanding of how these systems work behind the scenes equips you to choose the system that best suits your needs.


Yu-Hsin Yeh (Evans Ye) is currently a committer and PMC member of Apache Bigtop. He loves to code, automate things, and tackle big data challenges. Beyond engineering, he is also an enthusiastic speaker who shares software innovations and cutting-edge technologies. Evans has presented several of Bigtop’s new features at DataWorks Summit 2017 San Jose and at Apache: Big Data NA 2017 and 2016 and EU 2015. He also proposed the SDACK architecture at DockerCon 2016. In addition, he has spoken twice at the Big Data Innovation Summit, twice at HadoopCon, and twice at the Taiwan Hadoop user group, and has given dozens of internal company talks, all of which make him an experienced presenter.
