(1 January 2016 - 31 December 2017)
Wen-Huang Cheng, Academia Sinica, Taiwan
Cheng received the B.S. and M.S. degrees in computer science and information engineering
from National Taiwan University, Taipei, Taiwan, in 2002 and 2004, respectively,
and the Ph.D. (Hons.) degree from its Graduate Institute of Networking
and Multimedia in 2008.
He is currently an Associate Research Fellow with the Research Center for Information
Technology Innovation (CITI), Academia Sinica, Taipei, Taiwan, where he is the
Founding Leader of the Multimedia Computing Laboratory (MCLab), CITI, and holds
a joint appointment as an Assistant Research Fellow in the Institute of Information
Science. Before joining Academia Sinica, he was a Principal Researcher with MagicLabs,
HTC Corporation, Taoyuan, Taiwan, from 2009 to 2010. His current research interests
include multimedia content analysis, multimedia big data, deep learning, computer
vision, mobile multimedia computing, social media, and human-computer interaction.
Cheng has received numerous research awards, including the Outstanding Youth Electrical
Engineer Award from the Chinese Institute of Electrical Engineering in 2015, the
Top 10% Paper Award from the 2015 IEEE International Workshop on Multimedia Signal
Processing, the Outstanding Reviewer Award from the 2015 ACM International Conference
on Internet Multimedia Computing and Service, the Prize Award of Multimedia Grand
Challenge from the 2014 ACM Multimedia Conference, the K. T. Li Young Researcher
Award from the ACM Taipei/Taiwan Chapter in 2014, the Outstanding Young Scholar
Awards from the Ministry of Science and Technology in 2014 and 2012, the Outstanding
Social Youth of Taipei Municipality in 2014, the Best Reviewer Award from the 2013
Pacific-Rim Conference on Multimedia, and the Best Poster Paper Award from the 2012
International Conference on 3D Systems and Applications.
Lecture 1: Sensing
Visual Semantics for Interactive Multimedia Applications
For the effective
development of interactive multimedia applications, one key technology is
multimedia content analysis, especially its achievable semantic level, i.e., the
level at which a multimedia system comprehends the multimedia content. However,
visual entities such as objects in real-world photos and videos are usually
captured under uncontrolled conditions, with varying viewpoints, positions, scales,
and background clutter. In this lecture, we will therefore present sensing techniques
for robust visual-semantics retrieval and recognition in real-world scenes.
Several application scenarios will be showcased to demonstrate the effectiveness
of the proposed sensing techniques. One application analyzes
fashion trends in clothing from real video content. Another is
mobile vision for locating visual objects precisely while achieving
real-time performance. The third is video-based human
posture and gesture detection to support the creation of serious-gaming environments
for professional training purposes.
Lecture 2: Exploring Social Semantics from Multimedia Big Data
Users are key elements
in social multimedia, and huge amounts of user-generated multimedia content (multimedia
big data) are created and exchanged through social interactions among users.
Exploring social semantics from multimedia big data is thus an effective way to
understand users and their behaviors. In particular, popularity prediction on
social media is a specific type of social semantics that has attracted extensive
attention because of its widespread applications, such as online marketing, trend
detection, and resource allocation. Generally, given historical user-item pairs,
popularity prediction is defined as the problem of estimating the rating scores,
view counts, or click-throughs of a new post on social media. In this lecture,
we first review existing research on popularity prediction, which predominantly
focuses on exploring the correlation between popularity and user-item factors
such as item content, user cues, social relations, and user-item interactions. In
fact, time also exerts a crucial impact on popularity but is often overlooked.
We further present techniques that investigate popularity prediction from
two complementary perspectives by factoring popularity into two contextual associations,
i.e., user-item context and time-sensitive context. The user-item context links
popularity to user-specific and item-specific contextual information, which
can be derived from user-item sharing behaviors on social media. The time-sensitive
context is affected by 'change over time' information (associated with the sharing
time of photos), including user activeness variability and photo prevalence variability.
Finally, further research directions on exploring social semantics from multimedia big data
will be addressed.
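As a toy illustration of the problem setup described above (not the lecture's actual models), the sketch below fits a least-squares regressor that predicts log view counts from hypothetical user, item, and time features; all feature names and coefficients here are invented for illustration:

```python
import numpy as np

# Toy popularity prediction: synthetic user-item pairs with a time feature.
rng = np.random.default_rng(0)

n_posts = 200
user_activeness = rng.uniform(0, 1, n_posts)    # user cue (hypothetical)
content_quality = rng.uniform(0, 1, n_posts)    # item content cue (hypothetical)
hours_since_post = rng.uniform(1, 48, n_posts)  # time-sensitive context

# Synthetic ground truth: popularity grows with user/item cues, decays with time.
log_views = (2.0 * user_activeness + 1.5 * content_quality
             - 0.03 * hours_since_post + rng.normal(0, 0.1, n_posts))

# Fit a linear model to the historical pairs via least squares.
X = np.column_stack([user_activeness, content_quality,
                     hours_since_post, np.ones(n_posts)])
w, *_ = np.linalg.lstsq(X, log_views, rcond=None)

# Predict the popularity of a new post from its features.
new_post = np.array([0.8, 0.6, 2.0, 1.0])
print("predicted log views:", new_post @ w)
```

Even this crude sketch shows why the time feature matters: omitting the `hours_since_post` column would fold the decay into the noise and bias the user/item coefficients.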
Gene Cheung, National Institute of Informatics, Japan
Cheung (M'00-SM'07) received the B.S. degree in electrical engineering from Cornell University
in 1995, and the M.S. and Ph.D. degrees in electrical engineering and computer
science from the University of California, Berkeley, in 1998 and 2000, respectively.
He was a senior researcher at Hewlett-Packard Laboratories Japan, Tokyo, from 2000
to 2009. He is now an associate professor at the National Institute of Informatics
in Tokyo, Japan, and has been an adjunct associate professor at the Hong Kong University
of Science & Technology (HKUST) since 2015.
His research interests
include image & video representation, immersive visual communication and graph
signal processing. He has served as associate editor for IEEE Transactions on
Multimedia (2007-2011) and DSP Applications Column in IEEE Signal Processing Magazine
(2010-2014). He currently serves as associate editor for IEEE Transactions on
Image Processing (2015-present), the SPIE Journal of Electronic Imaging (2014-present),
and the APSIPA Transactions on Signal and Information Processing (2011-present), and
as area editor for EURASIP Signal Processing: Image Communication (2011-present).
He will serve as associate editor for the IEEE Transactions on Circuits and Systems
for Video Technology starting 2016. He served as the lead guest editor of the special
issue on "Interactive Media Processing for Immersive Communication" in the IEEE
Journal of Selected Topics in Signal Processing, published in March 2015. He served as a member of
the Multimedia Signal Processing Technical Committee (MMSP-TC) in IEEE Signal
Processing Society (2012-2014), and a member of the Image, Video, and Multidimensional
Signal Processing Technical Committee (IVMSP-TC) (2015-2017). He has also served
as technical program co-chair of International Packet Video Workshop (PV) 2010
and IEEE International Workshop on Multimedia Signal Processing (MMSP) 2015, area
chair in IEEE International Conference on Image Processing (ICIP) 2010, 2012-2013,
2015, track co-chair for Multimedia Signal Processing track in IEEE International
Conference on Multimedia and Expo (ICME) 2011, symposium co-chair for CSSMA Symposium
in IEEE GLOBECOM 2012, and area chair for ICME 2013-2015. He was an invited plenary
speaker at IEEE MMSP 2013 on the topic "3D visual communication: media representation,
transport and rendering". He is a co-author of the best student paper award winner
at the IEEE Workshop on Streaming and Media Communications 2011 (in conjunction with
ICME 2011), of best paper finalists at ICME 2011, ICIP 2011, and ICME 2015, of the
best paper runner-up at ICME 2012, and of the best student paper award winner at ICIP 2013.
Lecture 1: Graph Signal
Processing for Image Coding & Restoration
Graph signal processing (GSP)
is the study of discrete signals that live on structured data kernels described
by graphs. By allowing a more flexible graphical description of the underlying
data kernel, GSP can be viewed as a generalization of traditional signal processing
techniques that target signals on regular kernels, while still providing a frequency-domain
interpretation of the observed signals. Though an image is a regularly
sampled signal on a 2D grid, one can nonetheless consider an image patch as a
graph-signal on a sparsely connected graph defined in a signal-dependent manner. Recent
GSP works have shown that such an approach can lead to a compact signal representation
in the graph Fourier domain, resulting in noticeable gains in image compression
and restoration. In this talk, I will overview recent advances in
GSP as applied to image processing. I will first describe how a Graph Fourier
Transform (GFT), a generalization of known transforms like the Discrete Cosine Transform
(DCT), can be defined in a signal-dependent manner and leads to compression gains
for piecewise smooth images, outperforming H.264 intra by up to 6.8 dB. I will
then describe how suitable graph-signal smoothness priors can be constructed for
a graph-based image denoising algorithm, outperforming the state-of-the-art BM3D by
up to 2 dB for piecewise smooth images. Similar graph-signal smoothness priors
can also be used for other image restoration problems, such as de-quantization
of compressed JPEG images.
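As a minimal illustration of the GSP machinery mentioned above (assumed details, not the talk's actual algorithms), the sketch below builds the Laplacian of a small path graph, uses its eigenvectors as a GFT basis, and denoises a signal with a quadratic graph-smoothness prior:

```python
import numpy as np

# Build the combinatorial Laplacian of an 8-node path graph.
n = 8
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0      # unit edge weights along the path
L = np.diag(W.sum(axis=1)) - W           # L = D - W

# GFT basis: eigenvectors of L, ordered by eigenvalue ("graph frequency").
# For a path graph this basis coincides with a DCT basis.
evals, U = np.linalg.eigh(L)

# Denoise with the graph-smoothness prior:
#   x_hat = argmin ||y - x||^2 + lam * x^T L x = (I + lam*L)^{-1} y
rng = np.random.default_rng(1)
clean = np.array([0., 0., 0., 0., 3., 3., 3., 3.])   # piecewise smooth signal
noisy = clean + rng.normal(0, 0.3, n)
lam = 0.5
denoised = np.linalg.solve(np.eye(n) + lam * L, noisy)

print("GFT coefficients of clean signal:", U.T @ clean)
print("denoising error:", np.linalg.norm(denoised - clean))
```

The filter `(I + lam*L)^{-1}` shrinks each GFT coefficient by `1/(1 + lam*mu_k)`, so high graph frequencies (large eigenvalues `mu_k`) are attenuated most, which is exactly what a smoothness prior asks for.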
Lecture 2: 3D Image Representation & Coding for Interactive Navigation
In multi-view image representation, a 3D scene is represented by color (RGB) and depth images
as observed from multiple viewpoints, and intermediate virtual views can be further
rendered via depth-image-based rendering (DIBR). However, conventional transform
coding plus lossy quantization of depth images can lead to geometric distortion,
resulting in undesirable bleeding artifacts in DIBR-synthesized images. Observing
that disparity information, like motion vectors in video coding, should be coarsely
represented but losslessly coded, in this talk I first introduce a graph-based
representation called GBR-plus that compactly represents disparity information
to displace entire pixel patches from one reference view to a target view in a
graphical manner. Second, I discuss how disparity information in GBR-plus can
be approximated and then efficiently coded using arithmetic edge coding (AEC).
Finally, to enable interactive view navigation at the client, so that any viewpoint
image can be flexibly decoded from a number of decoding paths, I present a new
distributed source coding (DSC) framework called merge frame that does
not require traditional channel coding or bit-plane coding, while achieving identical
merging and good rate-distortion performance.
Zhu Li, University of Missouri, USA
Zhu Li is an Associate Professor with the Dept. of Computer Science &
Electrical Engineering (CSEE) at the University of Missouri, Kansas City. He received
his Ph.D. in Electrical & Computer Engineering from Northwestern University,
Evanston, in 2004. He was Sr. Staff Researcher/Sr. Manager with Samsung Research
America's Multimedia Standards Research Lab in Dallas from 2012 to 2015, Senior Staff
Researcher/Group Lead with FutureWei (Huawei)'s Media Lab in Bridgewater, NJ, from
2010 to 2012, an Assistant Professor with the Dept. of Computing, The Hong Kong
Polytechnic University, from 2008 to 2010, and a Principal Staff Research Engineer
with the Multimedia Research Lab (MRL), Motorola Labs, Schaumburg, Illinois, from 2000 to 2008.
His research interests include audio-visual analytics and machine learning, with
applications in large-scale video repository annotation, search, and recommendation,
as well as video adaptation, source-channel coding, and distributed optimization
issues in wireless video networks. He has 30+ issued or pending patents and 90+
publications in book chapters, journals, conference proceedings, and standards
contributions in these areas. He is an IEEE Senior Member, an elected member of the
IEEE Multimedia Signal Processing (MMSP) Technical Committee (2014-16), elected
Vice Chair of the IEEE Multimedia Communication Technical Committee (MMTC) (2008-2010),
and its Standards Liaison (2014-16). He is an Associate Editor for IEEE Trans. on
Multimedia, IEEE Trans. on Circuits & Systems for Video Technology, and the Springer
Journal of Signal Processing Systems, and co-editor of the Springer-Verlag book
"Intelligent Video Communication: Techniques and Applications". He has served
on numerous conference and workshop TPCs, was symposium co-chair at IEEE ICC 2010,
and served on the Best Paper Award Committee for IEEE ICME 2010.
He received the Best Poster Paper Award at the IEEE Int'l Conf. on Multimedia &
Expo (ICME) in Toronto, 2006, and the Best Paper Award at the IEEE Int'l Conf. on
Image Processing (ICIP) in San Antonio, 2007.
Lecture 1: Robust
Visual Object Re-Identification Against Very Large Repositories - The MPEG Mobile
Visual Search Standardization Research
Visual object identification against
a very large repository is a key technical challenge in a variety of mobile visual
search and virtual reality/augmented reality applications. MPEG created a working
group on Compact Descriptors for Visual Search (CDVS) to develop the relevant technology
and standard to address this issue and enable mobile visual search and object
re-identification applications. In this talk, I will review the key technical
challenges in the CDVS pipeline and cover the novel contributions made in the
CDVS work on alternative interest point detection, more efficient aggregation
schemes, indexing/hashing issues, and retrieval system optimization, as well as
future directions of research in this area.
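For a flavor of the descriptor aggregation step in such pipelines, here is a hedged sketch of a VLAD-style scheme (CDVS itself standardizes a different, Fisher-vector-based aggregation; this simpler stand-in is for illustration only):

```python
import numpy as np

def vlad_aggregate(descriptors, codebook):
    """VLAD-style aggregation: assign each local descriptor to its nearest
    codeword, accumulate residuals per codeword, and L2-normalize.

    descriptors: (n, d) local features; codebook: (k, d) cluster centers.
    """
    k, d = codebook.shape
    # Nearest-codeword assignment for every descriptor.
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    agg = np.zeros((k, d))
    for i, c in enumerate(assign):
        agg[c] += descriptors[i] - codebook[c]   # accumulate residuals
    v = agg.ravel()                              # one (k*d,) global descriptor
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v           # L2-normalize for matching

# Toy data: a 4-word codebook over 8-dim descriptors, 50 local descriptors.
rng = np.random.default_rng(2)
codebook = rng.normal(size=(4, 8))
img_desc = rng.normal(size=(50, 8))
g = vlad_aggregate(img_desc, codebook)
print(g.shape)   # (32,) -- one compact global descriptor per image
```

Matching two images then reduces to comparing two fixed-length vectors (e.g., by inner product), which is what makes retrieval against a very large repository tractable.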
Lecture 2: Visual
Recognition over Large Repositories with Subspace Indexing on Grassmann Manifolds
In large-scale visual pattern recognition applications, when the subject set is
large, traditional linear models like PCA/LDA/LPP become inadequate in capturing
the non-linearity and local variations of the visual appearance manifold. Kernelized
non-linear solutions can alleviate the problem to a certain degree, but face the
computational complexity of solving an eigen-problem of size n x n
for n training samples. In this work, we developed a novel solution
that first partitions the big-data training set to obtain a rich
set of local data patch models; the hierarchical structure of this rich set
of models is then computed by subspace clustering on the Grassmann manifold, via a
VQ-like algorithm with a data-partition locality constraint. At query time, a probe
image is first projected onto the data space partition to obtain the probe model,
and the optimal local model is computed by traversing the model hierarchy tree.
Simulation results demonstrate the effectiveness of this solution in capturing
a larger degree of freedom (DoF) of the problem, with good computational efficiency
and recognition accuracy, for applications in large-subject-set face recognition
and image tagging.
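The subspace comparisons underlying such Grassmann-manifold clustering can be illustrated with principal angles; the sketch below (assumed details, not the authors' exact algorithm) measures the distance between two local PCA models as points on the Grassmann manifold:

```python
import numpy as np

def grassmann_distance(U1, U2):
    """Geodesic (arc-length) distance between two p-dim subspaces of R^d,
    given orthonormal bases U1, U2 of shape (d, p), via principal angles."""
    s = np.linalg.svd(U1.T @ U2, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)            # guard against numerical overshoot
    theta = np.arccos(s)                 # principal angles between subspaces
    return np.linalg.norm(theta)

def local_basis(X, p=2):
    """Local PCA model of a data patch: top-p principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:p].T                      # (d, p) orthonormal basis

# Two toy "data patches" in R^5, each summarized by a 2-dim subspace.
rng = np.random.default_rng(3)
A = local_basis(rng.normal(size=(40, 5)))
B = local_basis(rng.normal(size=(40, 5)))
print("distance to itself:", grassmann_distance(A, A))
print("distance A to B:   ", grassmann_distance(A, B))
```

A distance on subspaces like this is what allows a VQ-like clustering to group similar local models and build a hierarchy over them, independent of the particular basis chosen for each subspace.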
Jiaying Liu, Peking University, China
Liu received the B.E. degree in computer science from Northwestern Polytechnic
University, Xi'an, China, and the Ph.D. degree, with the Best Graduate Honor, in
computer science from Peking University, Beijing, China, in 2005 and 2010, respectively.
She is currently an Associate Professor with the Institute of Computer Science and
Technology, Peking University. She has authored or co-authored over 60 papers
and holds 10 granted patents. Her current research interests include image/video
processing, computer vision, and video compression.
Dr. Liu was a Visiting
Scholar with the University of Southern California, Los Angeles, from 2007 to
2008. Supported by the Star Track program, she was a Visiting Researcher at Microsoft
Research Asia (MSRA) in 2015. She has also served as a TC member of APSIPA IVM.
She is also engaged in computing education. She has run the MOOC courses
"C++ Programming" and "Fundamental Algorithm Design" on Coursera/edX/ChineseMOOC,
with more than 30,000 students enrolled. She also received the Peking University
Teaching Excellence Award.
Chia-Hung Yeh, National Sun Yat-Sen University, Taiwan
Yeh (M'03-SM'12) received his B.S. and Ph.D. degrees from National Chung Cheng
University, Taiwan, in 1997 and 2002, respectively, both from the Department of
Electrical Engineering. Dr. Yeh joined the Department of Electrical Engineering,
National Sun Yat-sen University (NSYSU), as an assistant professor in 2007 and
became an associate professor in 2010. In February 2013, he was promoted to
full professor. Dr. Yeh's research interests include multimedia communication,
multimedia database management, and image/audio/video signal processing. He served
on the Editorial Boards of the Journal of Visual Communication and Image Representation
and the EURASIP Journal on Advances in Signal Processing. In addition, he has
rich experience in organizing conferences, serving as keynote speaker,
session chair, and technical program committee and program committee member for
international and domestic conferences. Dr. Yeh has co-authored more than 170 technical
papers in international conferences and journals and holds 42 patents in the U.S.,
Taiwan, and China. He received the 2007 Young Researcher Award of NSYSU, the 2011
Distinguished Young Engineer Award from the Chinese Institute of Electrical Engineering,
the 2013 Distinguished Young Researcher Award of NSYSU, the 2013 IEEE MMSP Top
10% Paper Award, and the 2014 IEEE GCCE Outstanding Poster Award.
Lecture 1: A Light-weight 3D Reconstruction System
3D models allow us to explore
all dimensions of objects, e.g., monuments, sites, and even whole city regions.
Over the last decade, a significant number of 3D-related approaches and applications,
such as 3D printing, 3D films, and 3D archives, have become popular research
topics. Furthermore, to meet the 3D industry's requirements for high accuracy and
flexibility, 3D reconstruction approaches that can be performed anywhere and anytime
are in high demand. To achieve this goal, two main challenges need to be overcome:
the acquisition of acceptable input data and the computational complexity of the
reconstruction procedure. Much research has been devoted to laser scanner calibration
techniques. However, the cost of a 3D laser scanner restricts
its general usage, since it is not affordable for most people. In addition, its
size means the device cannot be part of a mobile
application. In this lecture, we will present a light-weight 3D object reconstruction
approach. It aims to fulfill the increasing demand for fast and reliable 3D reconstruction
in a mobile environment, so that people can directly use their own portable devices
to reconstruct desired objects into 3D models.
Lecture 2: New Intra Coding Schemes for High Efficiency Video Coding
Video coding is a procedure that compresses digital video data to reduce the
bandwidth required to transmit a video. The goal of video coding is to compress a
large amount of data efficiently for transmission over the Internet
while keeping acceptable visual quality in the reconstructed video. Intra coding
plays an important role in video coding because it prevents error propagation and
maintains better visual quality; in addition, it requires far fewer computations
than inter coding because no motion estimation is required. However,
the current intra coding methods in the HEVC standard are still inefficient, so
new intra coding schemes are needed to further improve coding efficiency.
For these reasons, two new directions, pattern matching and predictive
texture synthesis, are used to enhance intra coding efficiency and achieve
better coding performance than HEVC intra prediction.
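For context, the baseline these schemes aim to improve can be illustrated with the simplest intra mode; the sketch below (a simplified, illustrative version of HEVC-style DC prediction, not the proposed schemes) predicts a 4x4 block from its reconstructed neighbors and forms the residual that would be transformed and coded:

```python
import numpy as np

def dc_intra_predict(top, left, size=4):
    """DC intra mode: predict every pixel of the block as the rounded average
    of the reconstructed reference samples above and to the left."""
    n = len(top) + len(left)
    dc = (int(top.sum()) + int(left.sum()) + n // 2) // n   # integer rounding
    return np.full((size, size), dc, dtype=int)

top = np.array([100, 102, 101, 99])      # reconstructed row above the block
left = np.array([98, 100, 103, 101])     # reconstructed column to the left
block = np.array([[100] * 4,             # the block to be coded
                  [101] * 4,
                  [99] * 4,
                  [102] * 4])

pred = dc_intra_predict(top, left)
residual = block - pred                  # only this residual is transformed/coded
print("DC value:", pred[0, 0])
print("residual energy:", (residual ** 2).sum())
```

Because prediction uses only already-decoded neighbors within the same picture, no motion estimation is involved, which is why intra coding is cheap and immune to temporal error propagation.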
Jiangtao Wen, Tsinghua University, China
Jiangtao (Gene) Wen received the BS, MS and Ph.D. degrees with honors from
Tsinghua University, Beijing, China, in 1992, 1994 and 1996 respectively, all
in Electrical Engineering.
From 1996 to 1998, he was a Staff Research Fellow
at UCLA, where he conducted research on multimedia coding and communications.
Many of his inventions were later adopted by international standards such as H.263,
MPEG and H.264. After UCLA, he served as the Principal Scientist of PacketVideo
Corp. (NASDAQ: WAVE/DCM), the CTO of Morphbius Technology Inc., the Director of
Video Codec Technologies of Mobilygen Corp (NASDAQ: MXIM), the Senior Director
of Technology of Ortiva Wireless (NASDAQ: ALLT) and consulted for Stretch Inc.,
Ocarina Networks (NASDAQ: DELL) and QuickFire Networks (NASDAQ: FB). Since 2009,
Dr. Wen has held a Professorship at the Department of Computer Science and Technology
of Tsinghua University. He was a Visiting Professor at Princeton University in
2010 and 2011.
Dr. Wen's research focuses on multimedia communication over
challenging networks and computational photography. He has authored many widely
referenced papers in related fields. Products deploying technologies that Dr.
Wen developed are currently widely used worldwide. Dr. Wen holds over 40 patents
with numerous others pending. Dr. Wen is an Associate Editor for the IEEE Transactions
on Circuits and Systems for Video Technology (CSVT). He is a recipient of the 2010
IEEE Trans. CSVT Best Paper Award.
Dr. Wen was elected a Fellow of the IEEE
in 2011. He is the Director of the Research Institute of the Internet of Things
of Tsinghua University, and a Co-Director of the Ministry of Education Tsinghua-Microsoft
Joint Lab of Multimedia and Networking.
Besides teaching and conducting research,
Dr. Wen also invests in high technology companies as an angel investor.