Multimedia Technologies Lab


Lab Name and Affiliation

Multimedia Technologies Lab

Institute of Information Science, Academia Sinica, Taiwan

Lab Director (or Principal Investigator)

MARK LIAO (F) received a BS degree in physics from National Tsing-Hua University, Hsin-Chu, Taiwan, in 1981, and MS and Ph.D. degrees in electrical engineering from Northwestern University in 1985 and 1990, respectively. In July 1991, he joined the Institute of Information Science, Academia Sinica, Taiwan, where he is currently a Distinguished Research Fellow. During 2009-2011, he was the Division Chair of the computer science and information engineering division II, National Science Council of Taiwan. He is jointly appointed as a Professor of the Computer Science and Information Engineering Department of National Chiao-Tung University and the Department of Electrical Engineering of National Cheng Kung University. During 2009-2012, he was jointly appointed as the Multimedia Information Chair Professor of National Chung Hsing University. Since August 2010, he has been an Adjunct Chair Professor of Chung Yuan Christian University. His current research interests include multimedia signal processing, video-based surveillance systems, video forensics, and multimedia protection.

Dr. Liao is a Fellow of the IEEE. He is the recipient of the Young Investigators' Award from Academia Sinica in 1998; the Distinguished Research Award from the National Science Council of Taiwan in 2003 and 2010; the National Invention Award of Taiwan in 2004; the Distinguished Scholar Research Project Award from the National Science Council of Taiwan in 2008; and the Academia Sinica Investigator Award in 2010. His professional activities include: Co-Chair, 2004 International Conference on Multimedia and Expo (ICME); Technical Co-Chair, 2007 ICME; General Co-Chair, 17th International Conference on Multimedia Modeling; President, Image Processing and Pattern Recognition Society of Taiwan (2006-08); Editorial Board Member, IEEE Signal Processing Magazine; Associate Editor, IEEE Transactions on Image Processing, IEEE Transactions on Information Forensics and Security (2009-12) and IEEE Transactions on Multimedia (1998-2001).

Lab Introduction

Over the past two decades, multimedia technology has influenced many aspects of our daily life. Along with biotechnology and nanotechnology, multimedia technology is considered one of the three most promising industries of the twenty-first century. Multimedia research covers a broad scope of techniques and rich applications, including those working on music, video, image, text, and 3-D animation. In the next few years, we will continue to devote our research efforts to advancing key fields in multimedia, including multi-perspective computer vision, compressive sensing/sparse representation, and video forensics. In what follows, we describe these key fields in detail.
A. Video Forensics
Since the 9/11 attacks on the United States, counter-terrorism strategies have been given a high priority in many countries. Surveillance camcorders are now almost ubiquitous in modern cities. As a result, the amount of recorded data is enormous, and it is time-consuming to search the digital video content manually. In the next few years, we shall devote part of our effort to video forensics, in which a major proportion of the research work is mining for criminal evidence in videos recorded by a heterogeneous collection of surveillance camcorders. This is a new interdisciplinary field, and people working in it need video processing skills as well as an in-depth knowledge of forensic science; hence the barrier to entering the field is high. Mining surveillance videos directly for criminal evidence is very different from conventional crime scene investigation. In the latter, detectives need to visit the crime scene in person, check all available details, and collect as much physical evidence as possible. By contrast, to conduct crime scene investigations directly from surveillance videos, forensic experts need to develop software that facilitates the automatic detection, tracking, and recognition of objects in the videos. Because the videos are captured by heterogeneous camcorders, which may differ in characteristics such as viewpoint and image quality, evidence mining on them is more challenging than on footage from a single type of camera. We shall start by addressing the multiple-camera people counting problem as well as visual knowledge transfer among a heterogeneous collection of surveillance camcorders.
B. Compressive Sensing and Sparse Representation
Compressed Sensing/Sampling (CS) is a revolutionary technology that senses and compresses signals simultaneously, and it establishes a sampling theory under which sparse signals can be acquired at rates below the Nyquist rate. It performs joint data acquisition and compression at only slight cost to the encoder (which suits resource-limited mobile devices and sensors) and shifts the major computational overhead to the decoder. Based on the assumption of signal sparsity, CS can, in theory, perfectly reconstruct the original signal from far fewer measurements than conventional sampling requires, via convex optimization or greedy algorithms. This completely new idea has made CS a hot topic in signal processing-related fields since its first appearance in 2006. Furthermore, CS has been adopted in a broad range of areas for problems that are inherently sparse or can be sparsified. Undoubtedly, this emerging area opens opportunities for the study of both fundamental issues and application-oriented problems. In the future, we plan to study the following topics: (1) Fast Compressed Image Sensing (CIS); (2) Fast Orthogonal Matching Pursuit (FOMP); (3) multiple-input systems exploiting sparse representation (e.g., microphone array signal processing); and (4) single-pass codeword learning for sparse representation.
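To make the idea of greedy sparse recovery concrete, the following is a minimal sketch of the standard textbook Orthogonal Matching Pursuit algorithm in Python with NumPy. It is not the lab's Fast OMP (FOMP) method. The function name `omp`, the random Gaussian sensing matrix, and the problem sizes are all assumptions chosen only for illustration.

```python
import numpy as np


def omp(A, y, k):
    """Textbook Orthogonal Matching Pursuit (illustrative sketch).

    Tries to recover a k-sparse vector x from measurements y = A @ x,
    where A has fewer rows (measurements) than columns (signal length).
    """
    n = A.shape[1]
    residual = y.astype(float).copy()
    support = []              # indices of the columns selected so far
    coef = np.zeros(0)
    for _ in range(k):
        # Greedy step: pick the column most correlated with the residual.
        correlations = np.abs(A.T @ residual)
        if support:
            correlations[support] = 0.0   # never pick the same column twice
        support.append(int(np.argmax(correlations)))
        # Orthogonal step: least-squares fit of y on the selected columns.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(n)
    if support:
        x[support] = coef
    return x


if __name__ == "__main__":
    # Hypothetical sizes: signal length 256, 128 measurements, 5 nonzeros.
    rng = np.random.default_rng(0)
    n, m, k = 256, 128, 5
    x_true = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)
    x_true[idx] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)        # unit-norm columns
    y = A @ x_true                        # compressed measurements
    x_hat = omp(A, y, k)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

The sketch mirrors the asymmetry described above: acquisition is a single matrix-vector product, while the decoder does the expensive work, here one least-squares solve per iteration.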
C. Multi-Perspective Computer Vision
Making computers capable of perceiving real-world visual information from various cues is challenging because of highly complex concepts, changing environments, free motion, highly articulated objects, and so on. As many visual concepts are difficult to summarize in simple, plain rules, (statistical) machine learning has played an important role in the past decade (as witnessed in the main conferences such as CVPR, ICCV, and NIPS), and it is still expected to be vital to the progress of computer vision. In addition, because of the considerable growth in the amount of data in the Internet age, training on large-scale (and possibly noisy) datasets has become a significant issue. Furthermore, instead of observing the world only through color images taken from common viewing angles, 3D imaging (which provides additional depth information) and flying cameras (which provide less common viewing angles, such as bird's-eye views) could open up opportunities for developing novel applications in the near future. High-level visual concepts, such as aesthetics, have also been shown to be amenable to machine learning. To address the above issues, we will study several topics toward understanding visual information from multiple perspectives: (1) object detection, recognition, and segmentation from visual saliency; (2) tracking and interacting with flying cameras; (3) on-line aesthetic value assessment when shooting; and (4) deriving the 3D structure of conventional camera images. The research outcomes are expected to help computers understand human intention, help people lead better and safer lives, and enable robots to see and understand the world better.

Lab Contact E-mail