

Keynote #1: Nelson Morgan  
Keynote #2: John Apostolopoulos  
Keynote #3: Keiichi Tokuda  



Keynote #1
Artificial Neural Networks for Speech Recognition: A Historical Perspective

Nelson Morgan
EECS Department, University of California at Berkeley, USA
Nelson Morgan has been working on problems in signal processing and pattern recognition since 1974, with a primary emphasis on speech processing. He may have been the first to use neural networks for speech classification in a commercial application. He is a former Editor-in-Chief of Speech Communication, and is a Fellow of the IEEE and of ISCA. In 1997 he received the Signal Processing Magazine best paper award (together with co-author Herve Bourlard) for an article describing the basic hybrid HMM/MLP approach that is used in most of the current “deep” neural network approaches to speech recognition. He also co-wrote a text with Ben Gold on speech and audio signal processing, whose second edition (2011) was revised in collaboration with Dan Ellis of Columbia University. He is the deputy director (and former director) of the International Computer Science Institute (ICSI), and is a Professor-in-Residence in the EECS Department at the University of California at Berkeley.
Abstract
For over twenty years, Gaussian mixtures have been the predominant mechanism for computing the emission probabilities used in hidden Markov model-based systems for speech recognition. In recent years, there has been significant progress in the use of artificial neural networks for this purpose, enough so that many proponents view this method as now being dominant. However, this development did not arise from a vacuum. On the contrary, as often happens in science, the methods now in use have predecessors from decades ago. In this presentation, I will describe the modern history of neural networks for speech recognition, reaching back as far as 50 years ago. I will conclude with a set of concerns that remain, and point to possible insights that we can gain from the one truly effective system for speech recognition: the human brain.
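As a rough, hedged illustration of the two emission-probability mechanisms contrasted in the abstract (a sketch, not material from the talk), the snippet below scores one acoustic frame against an HMM state either with a diagonal-covariance Gaussian mixture or with scaled posteriors from a neural network, in the spirit of the hybrid HMM/MLP idea associated with Bourlard and Morgan; all function and variable names, and the toy numbers, are illustrative assumptions.

```python
import numpy as np

def gmm_emission_loglik(x, weights, means, variances):
    """Log p(x | state) for a diagonal-covariance Gaussian mixture.
    x: (D,) frame; weights: (M,); means, variances: (M, D)."""
    # Per-component Gaussian log-density, summed over feature dimensions
    log_comp = -0.5 * (np.log(2 * np.pi * variances)
                       + (x - means) ** 2 / variances).sum(axis=1)
    # Log of the weighted mixture sum
    return np.logaddexp.reduce(np.log(weights) + log_comp)

def hybrid_scaled_loglik(posteriors, priors):
    """Hybrid HMM/MLP idea: divide network outputs p(state | x) by state
    priors to obtain scaled likelihoods usable in place of p(x | state)."""
    return np.log(posteriors) - np.log(priors)

# Toy usage with made-up numbers (illustrative only)
x = np.array([0.2, -1.0])
print(gmm_emission_loglik(x, np.array([0.6, 0.4]),
                          np.array([[0.0, -1.0], [1.0, 0.5]]),
                          np.array([[1.0, 1.0], [0.5, 0.5]])))
print(hybrid_scaled_loglik(np.array([0.7, 0.2, 0.1]),
                           np.array([0.3, 0.3, 0.4])))
```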




Keynote #2
Advances in Signal Processing for Networked Applications

John Apostolopoulos
CTO and VP of the Enterprise Networking Group, and head of the Enterprise Networking Labs, Cisco Systems, USA
John Apostolopoulos is VP & CTO of the Enterprise Networking Group (ENG) at Cisco. ENG is a $20B/year business that covers wired and wireless networking, mobility/BYOD, software-defined networking, the Internet of Things, and video over enterprise networks. He is also the founder of the Enterprise Networking Labs, whose goal is to increase innovation in areas of strategic importance to ENG. Previously, John was Lab Director for the Mobile & Immersive Experience Lab (MIX Lab) at HP Labs. The MIX Lab’s goal was to create compelling networked media experiences that fundamentally change how people communicate, collaborate, socialize, and entertain. The MIX Lab conducted research on novel mobile devices and sensing, mobile client/cloud multimedia computing, immersive environments, video & audio signal processing, computer vision & graphics, multimedia networking, glasses-free 3D, next-generation plastic displays, wireless, and user experience design. John has received a number of honors and awards for his technical contributions, including IEEE SPS Distinguished Lecturer, IEEE Fellow, being named “one of the world’s top 100 young (under 35) innovators in science and technology” (TR100) by MIT Technology Review, and a Certificate of Honor for contributing to the US Digital TV standard (Engineering Emmy Award, 1997); he also helped create the JPEG-2000 Security (JPSEC) standard. He has published over 100 papers, received several paper awards, and holds 60 granted US patents. John also has strong ties with the academic community: he was a Consulting Associate Professor of EE at Stanford (2000-09) and is a frequent visiting lecturer at MIT. He received his B.S., M.S., and Ph.D. from MIT.
Abstract
Advances in how we capture, process, and deliver information are enabling novel and compelling networked applications. This talk will discuss exciting recent advances and promising ongoing research efforts in the following areas: a) capture and sensing, b) mobile client/cloud computing, c) indoor-location-based services, and d) the Internet of Things. The talk will showcase the central importance of signal processing in advancing these areas, and also highlight some of the interdisciplinary advances in networking and the broader technical landscape that make this possible.




Keynote #3
Flexible Speech Synthesis Based on Hidden Markov Models

Keiichi Tokuda
Department of Computer Science, Nagoya Institute of Technology, Japan
Keiichi Tokuda is the director of the Speech Processing Laboratory and a Professor in the Department of Computer Science at Nagoya Institute of Technology. He has been working on HMM-based speech synthesis since he proposed an algorithm for speech parameter generation from HMMs in 1995. He is also the principal designer of the open-source software packages HTS (http://hts.sp.nitech.ac.jp/) and SPTK (http://sp-tk.sourceforge.net/). In 2005, Dr. Alan Black (CMU) and Keiichi Tokuda organized the largest-ever evaluation of corpus-based speech synthesis techniques, the Blizzard Challenge, which has since become an annual event. He is a Fellow of ISCA. He has published over 80 journal papers and over 200 conference papers, and has received six paper awards and two achievement awards.
Abstract
This talk will give a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective for synthesizing speech. To construct human-like talking machines, speech synthesis systems must be able to generate speech with an arbitrary speaker’s voice, in various speaking styles and different languages, with varying emphasis and focus, and/or with emotional expressions. The main advantage of the HMM-based approach is that such flexibility can easily be realized using mathematically well-defined algorithms. In this talk, the system architecture is outlined, and then the basic techniques used in the system are presented. Advanced techniques for future developments will also be described, along with recent results and demos.
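As a hedged sketch of one such mathematically well-defined algorithm (in the spirit of the speech-parameter-generation method mentioned in the biography above, not code taken from HTS or SPTK), the snippet below solves the weighted least-squares problem that turns per-frame HMM means and variances over static and delta features into a smooth static-feature trajectory; the function names, delta window, and toy data are assumptions for illustration only.

```python
import numpy as np

def generate_parameters(means, variances, window):
    """Sketch of ML parameter generation from an HMM state sequence:
    solve (W' U^-1 W) c = W' U^-1 mu for the static trajectory c,
    where o = W c stacks static and delta features per frame.
    means, variances: (T, 2) arrays of [static, delta] statistics."""
    T = len(means)
    # Build W: maps the static trajectory c (length T) to [static; delta] rows
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                      # static coefficient
        for tau, w in window:                  # delta window, e.g. [(-1,-0.5),(1,0.5)]
            if 0 <= t + tau < T:
                W[2 * t + 1, t + tau] = w
    U_inv = np.diag(1.0 / variances.reshape(-1))
    mu = means.reshape(-1)
    A = W.T @ U_inv @ W
    b = W.T @ U_inv @ mu
    return np.linalg.solve(A, b)               # smooth static trajectory

# Toy example: 5 frames of a 1-dimensional feature with a simple delta window
means = np.column_stack([np.array([0.0, 1.0, 2.0, 2.0, 2.0]), np.zeros(5)])
variances = np.ones((5, 2))
print(generate_parameters(means, variances, [(-1, -0.5), (1, 0.5)]))
```

The point of the sketch is the one highlighted in the abstract: because the model is an HMM with Gaussian statistics, flexible generation reduces to a closed-form linear solve rather than an ad hoc procedure.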
