Speaker: Josh YEH / Software Engineer @ Cloudera
Venue: General Science Building (綜合科館), 1F, First Lecture Hall
Talk title: Machine Learning Model Experiments and Deployment 0-60
Deep learning is cool but hard, and choosing the infrastructure for it is even harder. One of the core ML problems is managing and tracking model experiments, such as training runs in which data scientists want to track model performance across different parameters or model configurations. Once a promising experiment run is produced, reproducing it and continuing to tune it for better accuracy is another challenge, not to mention deploying well-trained models to production at large scale with ease. Once models become inaccurate, retraining them on the latest dataset is another non-trivial task. This talk looks into these major challenges, on which data scientists spend the majority of their time, and addresses them with a unified platform.
* Worked on Cloudera Manager (CM) and Cloudera Distribution Including Apache Hadoop (CDH)
* Worked on HDFS and Hive Backup and Disaster Recovery (BDR)
* Worked on the internal DevOps framework (CDEP) and the end-to-end test framework (SYSTEST)
* Working on Cloudera Data Science Workbench (CDSW) with Kubernetes/Docker as the backend, with an emphasis on compatibility with Apache HDFS, YARN, and Spark. Built the automation framework and testing infrastructure for CDSW from scratch.
* Building data pipeline and workload automation in CDSW with ML/DL/AI frameworks such as Keras, Torch, and TensorFlow
Recipient of the 2017 Cloudera Impact Award. Speaker at big data and ML conferences and meetups.