Keynote Speeches

Transfer Learning: from Bayesian Adaptation to Teacher-Student Modeling

Chin-Hui Lee
School of ECE, Georgia Tech, USA

Chair: Haizhou Li, Professor, National University of Singapore

Abstract
Transfer learning refers to the process of distilling knowledge learned in one task and using it in another, related task. In machine learning, transfer learning and domain adaptation are often treated as synonymous, and both are designed to combat catastrophic forgetting, that is, failing to retain much of what had already been learned during the transfer process. When generative models, such as probability distributions, are used to characterize observed data with a set of parameters to be transferred, a Bayesian formulation is often adopted: knowledge summarized in prior distributions of the parameters is combined with the likelihood of newly observed adaptation data to establish a posterior distribution of the parameters to be optimized. Recently we extended Bayesian adaptation to discriminative models, such as deep neural networks, and obtained similar effectiveness. Another emerging approach, known as teacher-student (T-S) modeling, summarizes what has been learned in a teacher model and transfers it to a student model with a similar or different architecture. An objective function characterizing the discrepancies between the behaviors of the teacher and student models is then optimized for the student model on a set of adaptation data. Generative adversarial networks have also been used to perform adaptation data augmentation. Such a T-S learning framework accommodates a wide variety of scenarios and applications. In this talk, we will present the technical dimensions of transfer learning and highlight its potential opportunities.
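
As a rough illustration of the teacher-student objective mentioned above, the sketch below measures the discrepancy between teacher and student outputs on a single adaptation example. The abstract does not say which discrepancy measure is used; the KL divergence between temperature-softened output distributions, the function names, and the default temperature are all assumptions chosen for illustration, not the speaker's method.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Return log-probabilities of a temperature-scaled softmax (numerically stable)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def ts_kl_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) between softened output distributions.

    Assumption: KL divergence is only one possible discrepancy measure
    between teacher and student behavior; the names here are hypothetical.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (fixed)
    log_q = log_softmax(student_logits, temperature)  # student (to be trained)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

In practice this quantity would be averaged over the adaptation data and minimized with respect to the student's parameters only; it is zero when the student reproduces the teacher's output distribution exactly.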

Speaker's Biography
Chin-Hui Lee is a professor at the School of Electrical and Computer Engineering, Georgia Institute of Technology. Before joining academia in 2001, he had accumulated 20 years of industrial experience, ending at Bell Laboratories, Murray Hill, as a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. Dr. Lee is a Fellow of the IEEE and a Fellow of ISCA. He has published over 500 papers and 30 patents, with more than 45,000 citations and an h-index of 80 on Google Scholar. He has received numerous awards, including the Bell Labs President's Gold Award in 1998. He won the SPS's 2006 Technical Achievement Award for “Exceptional Contributions to the Field of Automatic Speech Recognition”. In 2012 he gave an ICASSP plenary talk on the future of automatic speech recognition. In the same year he was awarded the ISCA Medal in scientific achievement for “pioneering and seminal contributions to the principles and practice of automatic speech and speaker recognition”.

Digital Retina – Improvement of Cloud Artificial Vision System from Enlighten of HVS Evolution

Wen Gao
Department of Computer Science and Technology, Peking University, China

Chair: Hitoshi Kiya, Professor, Tokyo Metropolitan University

Abstract
Edge computing has been a hot topic recently, and the smart-city wave appears to be driving more and more video devices in cloud vision systems to be upgraded from traditional video cameras to edge video devices. However, there is debate over how much intelligence the device should have and how much the cloud should keep. The human visual system (HVS) took millions of years to reach its present highly evolved state; it may not be perfect yet, but it is much better than any existing computer vision system. Most artificial visual systems consist of a camera and a computer, analogous to the human eye and brain, but with a very low-level pathway between the two parts compared with that of human beings. The pathway between the human eye and brain is quite complex, yet it is energy efficient and comprehensively accurate, having evolved through natural selection. In this talk, I will discuss a new idea for improving cloud vision systems with an HVS-like pathway model, called the digital retina, to make them more efficient and smarter. The digital retina has three key features, whose details will be given in the talk.

Speaker's Biography
Wen Gao is a professor in the Department of Computer Science and Technology at Peking University, Beijing, China. He is the founding director of NELVT (National Engineering Lab. on Video Technology) at Peking University. He has also been the Chief Scientist of the National Basic Research Program of China (973 Program) on Video Coding Technology since 2009, and the vice president of the National Natural Science Foundation of China since 2013. He works in the areas of multimedia and computer vision, including video coding, video analysis, multimedia retrieval, face recognition, and multimodal interfaces. He has published 6 books and over 700 technical articles in refereed journals and proceedings in these areas. His publications have been cited over 21,000 times according to Google Scholar. He has served or serves on the editorial boards of several journals, such as IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Multimedia, IEEE Transactions on Autonomous Mental Development, EURASIP Journal of Image Communications, and Journal of Visual Communication and Image Representation. He has chaired a number of prestigious international conferences on multimedia and video signal processing, such as IEEE ICME 2007, ACM Multimedia 2009, and IEEE ISCAS 2013, and has also served on the advisory and technical committees of numerous professional organizations. He has earned many awards, including one second-class award in technology invention and six second-class awards in science and technology achievement from the State Council. He is also active in national and international academic activities. He was featured by IEEE Spectrum in June 2005 as one of the "Ten To Watch" among China's leading technologists. He served as the chairman of the steering committee for intelligent computing systems in the 863 Hi-Tech Program from 1996 to 2001.
He has served or serves as the vice chairman of the Chinese Association of Image and Graphics and the vice chairman of the Chinese Association of Software Industry. He was the Head of the Chinese Delegation to the Moving Picture Experts Group (MPEG) of the International Organization for Standardization (ISO) from 1997 to 2011. He is the chair of the Audio Video Coding Standard (AVS) working group in China, and the chair of the IEEE 1857 standard working group, a new IEEE standards working group for internet multimedia coding.

Applying Deep Learning in Non-native Spoken English Assessment

Kate Knill
Automatic Language Teaching and Assessment Institute (ALTA), Cambridge University, UK

Chair: Hongwu Yang, Professor, Northwest Normal University

Abstract
Over 1.5 billion people worldwide are using and learning English as an additional language. This has created a high and growing demand for certification of learners' proficiency, for example for entry to university or for jobs. Automatic assessment systems can help meet this need by reducing human assessment effort. They can also enable learners to monitor their progress with informal assessment whenever and wherever they choose. Traditionally, automatic speech assessment systems were based on read speech, so what the candidate said was (mostly) known. To properly assess a candidate's spoken communication ability, however, the candidate needs to be assessed on free, spontaneous speech. The text is, of course, unknown in such speech, and we don't speak in fluent sentences: we hesitate, stop, and restart. Added to this, any automatic system has to handle a wide variety of accents and pronunciations from learners across first languages, as well as highly variable audio recording quality. Together this makes non-native spoken English assessment a challenging problem. To help meet the challenge, deep learning has been applied to a number of sub-tasks. This talk will look at some examples of how deep learning is helping to create automatic systems capable of free-speaking spoken English assessment. These will include: 1) efficient ASR systems, and ensemble combination, for non-native English; 2) prompt-response relevance for off-topic response detection; 3) task-specific phone “distance” features for assessment and L1 detection; 4) grammatical error detection and correction for learner English. Deep learning techniques used in the above include recurrent sequence models, sequence ensemble distillation (teacher-student training), attention mechanisms, and Siamese networks.

Speaker's Biography
Dr. Kate Knill is a Principal Research Associate at the Department of Engineering and the Automatic Language Teaching and Assessment Institute (ALTA), Cambridge University. Kate was sponsored by Marconi Underwater Systems Ltd for her 1st class B.Eng. (Jt. Hons) degree in Electronic Engineering and Maths at Nottingham University and holds a PhD in Digital Signal Processing from Imperial College. She has worked for 25 years on spoken language processing, developing automatic speech recognition and text-to-speech synthesis systems in industry and academia. As an individual researcher and as a leader of multi-disciplinary teams, in her roles as Languages Manager at Nuance Communications and Assistant Managing Director at Toshiba Research Europe Ltd, Cambridge Research Lab, she has developed speech systems for over 50 languages and dialects. Her current research focus is on applications for non-native spoken English language assessment and learning, and on the detection of speech and language disorders. She is Secretary of the International Speech Communication Association (ISCA) and a member of the Institution of Engineering and Technology (IET) and the Institute of Electrical and Electronics Engineers (IEEE).