PyData Yerevan 2022

Scaling Semi-Supervised Production-Grade ASR on 200 Languages
08-13, 16:15–16:55 (Asia/Yerevan), 113W PAB

Self-supervised pretraining has been wildly successful lately, covering almost every domain: speech, NLP, and vision. Networks such as Wav2Vec2, HuBERT, JUST, and the like have enabled rapid development of speech-related products. In this talk we're going to walk through the end-to-end research and engineering process behind production-grade self-supervised ASR in the multilingual setting. Covered topics include: compute, data, scalability, and engineering for pretraining and downstream tuning.


In contrast to academic research and papers, scaling cutting-edge deep learning products for production is practically a field of its own. The entire pipeline - data acquisition, network design, compute acquisition and planning, parallel experimentation, pretraining, downstream tuning - is vastly different in the production setting. And scaling the full pipeline to hundreds of languages requires top-notch expertise and a lot of resources. In this talk, we're going to cover the entire process - from idea to deployment in production - using novel, cutting-edge deep learning approaches. We are going to define the computational resources and their efficiency, look at data recipes and keys to smarter pretraining, and examine the importance of parallel experimentation.
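To make the downstream-tuning stage concrete, here is a minimal, hypothetical sketch (not taken from the talk) of running ASR inference with a self-supervised pretrained Wav2Vec2 model fine-tuned with CTC, using the Hugging Face transformers library; the checkpoint name and the placeholder waveform are assumptions for illustration only.

```python
# Minimal sketch: ASR inference with a CTC-fine-tuned Wav2Vec2 checkpoint.
# The checkpoint and the dummy waveform are illustrative assumptions.
import torch
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

# Placeholder audio: one second of silence at 16 kHz; in practice this would
# be a real mono waveform loaded from disk or a streaming buffer.
waveform = torch.zeros(16_000)

inputs = processor(waveform, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding: take the most likely token per frame, then collapse
# repeats and blanks inside batch_decode.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```

The production pipeline discussed in the talk (multilingual pretraining, data recipes, parallel experimentation) goes far beyond this single-checkpoint inference path; the sketch only illustrates the downstream-tuning end of the stack.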


Prior Knowledge Expected: Previous knowledge expected

Luka Chkhetiani is a Deep Learning Research & Technology Lead with extensive experience in research, management, deployment, and optimization of end-to-end deep learning services in cloud and edge ecosystems.
At AssemblyAI, he leads research, technology, deployment, and optimization for unsupervised and semi-supervised multilingual ASR.