Building-up Centralized Data Catalog Service on AWS among various data/schema file formats, analytic tools

Speakers: 羅子勝 (Tzu-Sheng Lo) & 林振鍇 (Vincent Lin) @ Trend Micro
Time: 15:50~16:30
Venue: 綜合科管 B1, Third Lecture Hall
Title: Building-up Centralized Data Catalog Service on AWS among various data/schema file formats, analytic tools

Abstract:

In this talk, I will share our experience in the 104 hackathon contest, including how to build a chatbot system, and walk through our solutions for three question types (true-or-false, multiple-choice, and fill-in-the-blank).

In this contest, our team (Tobacco AI) built an information retrieval system on Elasticsearch to fetch the source text, and first computed the similarity between the source and the question. Then, depending on the question type, we used the SIF (Smooth Inverse Frequency) sentence embedding for true-or-false questions and a neural network model (context2vec), which uses a bidirectional LSTM to learn context embedding vectors from the 104 text corpus. This let us use three similarity metrics in the embedding space for fill-in-the-blank questions: target-to-context (t2c), context-to-context (c2c), and target-to-target (t2t). As a result, we took first place and won the prize in this contest.

Speaker bios:

Rex TS Lo (Tzu-Sheng Lo) is a software engineer at Trend Micro and a Java and Python developer. He is currently responsible for building backend services on AWS in the DataLake team.

Vincent Lin is a Sr. Data Engineer at Trend Micro. He is an enthusiast of big data and cloud computing technologies such as Spark and Hadoop. He is currently responsible for designing highly available, scalable, and fault-tolerant systems on AWS, and strives to solve any data-related problem in the DataLake.
