PyData Yerevan 2022

NVIDIA NeMo: toolkit for conversational AI
08-12, 11:15–11:55 (Asia/Yerevan), 113W PAB

This talk introduces NeMo, NVIDIA's open-source toolkit for conversational AI, which provides a wide collection of models for automatic speech recognition, text-to-speech, natural language processing, and neural machine translation.


Conversational AI is a technology that allows a “machine” to speak with a person in natural language. NVIDIA NeMo is an open-source conversational AI toolkit built for researchers working on automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech synthesis (TTS). Its primary objective is to help researchers from industry and academia develop new models for these tasks, as well as for neural machine translation. NeMo also comes with a large number of step-by-step tutorials and pre-trained models.

The talk is organized as follows:
1. NeMo overview.
2. Where to start: tutorials on ASR, TTS and NLP.
3. NeMo ASR overview.
4. NeMo TTS overview.
5. NeMo NLP overview.
6. From research to production: deploying NeMo models.

After the talk, you will know where to start with conversational AI and how to create and use AI models with NeMo.

Prior deep learning knowledge will help make the most of the presentation.


Prior Knowledge Expected

No previous knowledge expected

Aleksandr Laptev is a Ph.D. student at ITMO University and a senior research scientist at NVIDIA. His scientific interests are automatic speech recognition (ASR), speech synthesis (TTS), and natural language processing (NLP). He writes open-access scientific articles, contributes to open-source software, and participates in international speech recognition competitions. His current research area is differentiable weighted finite-state transducers.