├── ICASSP2021_tutorial_T9_slides.pdf
└── README.md

/ICASSP2021_tutorial_T9_slides.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ICASSP2021-tutorial9/Distant_conversational_ASR_and_analysis/0b79f13cec89f53f1ceccacc970571de31953cb9/ICASSP2021_tutorial_T9_slides.pdf

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## [ICASSP 2021 Tutorial] Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization

This repository contains the materials used for the ICASSP 2021 Tutorial T9 **"Distant conversational speech recognition and analysis: Recent advances, and trends towards end-to-end optimization"** presented by **Keisuke Kinoshita, Yusuke Fujita, Naoyuki Kanda, Shinji Watanabe**.

### Slides

[PDF slides (latest)](ICASSP2021_tutorial_T9_slides.pdf)

### Abstract

Recognizing unsegmented conversational speech recorded with distant microphone(s) is a challenging but essential task that must be solved to unlock a myriad of new speech applications, such as a communication agent that can understand, respond to, and facilitate our conversations. This task contains a number of subtasks, which have been studied rather independently for a decade, such as multichannel/single-channel source separation, speaker diarization with source number counting, and conversational speech recognition. This tutorial first revisits, with demonstration, current state-of-the-art systems for this task, which were developed for challenges such as CHiME-5 and CHiME-6 and for commercial products. These systems typically consist of a combination of well-established, independently optimized modules. While these systems are carefully designed to consolidate these independent modules, there is still considerable room for improvement. In the latter part of the tutorial, we introduce a recent research trend that aims to establish an optimal joint neural system that solves these subtasks together, through end-to-end optimization based on a common integrated objective. By showing the potential of such jointly optimal systems, which are now starting to outperform previous top-performing systems in many tasks, we discuss the future directions and challenges for this task from both industry and academic perspectives.

### Tutorial Presenters

- Keisuke Kinoshita (NTT Corporation, Japan) ([Email](mailto:keisuke.kinoshita@ieee.org))
- Yusuke Fujita (Line Corporation, Japan)
- Naoyuki Kanda (Microsoft, USA)
- Shinji Watanabe (Carnegie Mellon University, USA)

### Changelog

#### 1.0.0 / 2021-06-07

* First public release

--------------------------------------------------------------------------------