What Makes a Modern Stream Processor: the Science behind Apache Flink

Speaker: Tzu-Li (Gordon) Tai / Software Engineer @ Data Artisans
Venue: Second Lecture Hall, B1, 綜合科管
Title: What Makes a Modern Stream Processor: the Science behind Apache Flink


Stream processing has evolved quickly in a short time: a few years ago, it was mostly simple real-time aggregations with limited throughput and consistency. Today, many stream processing applications combine complex logic, strict correctness guarantees, high performance, and low latency, and maintain large state without databases.

Stream processing has become this sophisticated because the stream processors (the systems that run the application code, coordinate the distributed execution, route the data streams, and ensure correctness in the face of failures and crashes) have become much more technologically advanced.

In this talk, we walk through some of the techniques and innovations behind Apache Flink, one of the most powerful open source stream processors. In particular, we plan to discuss:
- The evolution of fault tolerance in stream processing, Flink’s approach of distributed asynchronous snapshots, and how that approach looks today after multiple years of collaborative work with users running large scale stream processing deployments.
- How Flink supports applications with terabytes of state and offers efficient snapshots, fast recovery, rescaling, and high throughput.
- How to build end-to-end consistency (exactly-once semantics) and transactional integration with other systems.
- How batch and streaming can both run on the same execution model with best-in-class performance.


Tzu-Li (Gordon) Tai is an Apache Flink® PMC member and software engineer at data Artisans. His main contributions to Apache Flink® include work on some of the most widely used Flink connectors (Apache Kafka, AWS Kinesis, Elasticsearch), as well as features surrounding upgrade compatibility of stateful Flink streaming applications. Gordon is a frequent speaker at conferences such as Flink Forward and Strata Data, as well as several Taiwan-based conferences on the Hadoop ecosystem and data engineering in general.
