Keynote Speeches

KEYNOTE SPEECH I

Cinematic Virtual Reality: Immersive Video for Head-Mounted Displays

Bernd Girod
Stanford University, USA

ABSTRACT  The 2014 acquisition of a fledgling head-mounted display company for over $2B has reignited the excitement in virtual reality, not just for rendered 3D computer graphics but primarily for immersive “cinematic” video captured by means of special camera rigs. In this talk, we will review the principles of representing immersive video for head-mounted displays and the challenges that arise for efficient coding and delivery. How many pixels are needed to cover the full field of view? How can we provide binocular stereo in all directions? How can we accommodate head motion parallax? How can we provide defocus cues to overcome the conflict between vergence and accommodation? And what are the best video representations for compact storage and transmission that support all of the above? We show that significant technology challenges remain for cinematic virtual reality to live up to its high expectations, some of them familiar and some new.
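As a rough illustration of the first question, a back-of-the-envelope pixel count can be sketched as follows. It assumes a full-sphere panorama sampled uniformly in angle (360° × 180°, as in an equirectangular projection) and a target of 60 pixels per degree, roughly the one-arcminute resolution often quoted for normal visual acuity; both figures are illustrative assumptions, not numbers from the talk.

```python
def panorama_pixels(h_fov_deg: float, v_fov_deg: float, px_per_deg: float) -> tuple[int, int, int]:
    """Return (width, height, total pixels) for a uniformly sampled panorama.

    Assumes uniform angular sampling (e.g., an equirectangular projection),
    which oversamples near the poles; real systems may use other projections.
    """
    width = round(h_fov_deg * px_per_deg)
    height = round(v_fov_deg * px_per_deg)
    return width, height, width * height


# Full sphere at ~1 arcminute per pixel (60 px/degree), for one eye.
w, h, total = panorama_pixels(360, 180, 60)
print(f"{w} x {h} = {total / 1e6:.1f} megapixels per eye")
```

Under these assumptions a single eye needs about 233 million pixels per frame, roughly 28 times the 8.3 million pixels of a 3840 × 2160 frame, before stereo doubles the count.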

BIOSKETCH  Bernd Girod is the Robert L. and Audrey S. Hancock Professor of Electrical Engineering at Stanford University, California. He also serves as Director of the Stanford Center for Image Systems Engineering (SCIEN) and of the Max Planck Center for Visual Computing and Communication, and as Founding Director Emeritus and now Chair of the Advisory Board of the David and Helen Gurley Brown Institute for Media Innovation, a bicoastal institute between Stanford and Columbia University in New York City. He also served as Senior Associate Dean of the Stanford School of Engineering from 2012 to 2016.

He received his M.S. degree in Electrical Engineering from the Georgia Institute of Technology in 1980 and his doctoral degree from the University of Hannover, Germany, in 1987. He then joined the Massachusetts Institute of Technology, Cambridge, MA, USA, and was an Assistant Professor at the MIT Media Laboratory until 1990. From 1990 to 1993, he was Professor of Computer Graphics and Technical Director of the Academy of Media Arts in Cologne, Germany, jointly appointed with the Computer Science Section of Cologne University. From 1993 to 1999, he held the Chair of Electrical Engineering / Telecommunications at the University of Erlangen-Nuremberg, Germany, and was Head of the Telecommunications Institute I and Director of the Telecommunications Laboratory. He served as Chair of the Electrical Engineering Department from 1995 to 1997.

Professor Girod’s research over more than three decades has spanned a broad range of topics, including image and video coding, networked media systems, and image-based retrieval. He has authored or co-authored one major textbook (printed in three languages), five monographs, and over 600 book chapters, journal articles, and conference papers, and is a named inventor on over 25 US patents. He was a member of the IEEE Image and Multidimensional Signal Processing Technical Committee from 1989 to 1997 and has served on the editorial boards of several journals in his field, among them as founding Associate Editor of the IEEE Transactions on Image Processing and as Area Editor for Speech, Image, Video & Signal Processing of the IEEE Transactions on Communications. He has served on numerous conference committees, e.g., as Tutorial Chair of ICASSP-97 in Munich and again of ICIP-2000 in Vancouver; as General Chair of the 1998 IEEE Image and Multidimensional Signal Processing Workshop in Alpbach, Austria, of the Visual Communication and Image Processing Conference (VCIP) in San Jose, CA, in 2001, and of Vision, Modeling, and Visualization (VMV) at Stanford, CA, in 2004; and as General Co-Chair of ICIP-2008 in San Diego, of VCIP 2010 in China, and of the Packet Video Workshop 2013 in San Jose.

For over 25 years, Professor Girod has worked with start-up ventures as founder, investor, director, or advisor. Most notably, he was a co-founder and Chief Scientist of Vivo Software, Inc., Waltham, MA (1993-98) and, after Vivo’s acquisition, Chief Scientist of RealNetworks, Inc. (Nasdaq: RNWK) from 1998 to 2002. He served on the Board of Directors of 8×8, Inc., Santa Clara, CA (Nasdaq: EGHT), from 1996 to 2004, and of GeoVantage, Inc., Swampscott, MA, from 2000 to 2005. In 2007, he co-founded Dyyno, Inc., Palo Alto, CA. From 2004 to 2007, he also served as Chairman of the Steering Committee of the new Deutsche Telekom Laboratories at the Technical University of Berlin. He has been an angel investor for 20 years, has served on numerous advisory boards, and currently advises HearstLab, a corporate incubator for women-led startup companies in New York City.

Professor Girod was elected Fellow of the IEEE in 1998 ‘for his contributions to the theory and practice of video communications’ and Fellow of EURASIP in 2008. He was named ‘Distinguished Lecturer’ for the year 2002 by the IEEE Signal Processing Society. He received the EURASIP Signal Processing Best Paper Award in 2002, the IEEE Multimedia Communication Best Paper Award in 2007, the EURASIP Image Communication Best Paper Award in 2008, the EURASIP Signal Processing Most Cited Paper Award in 2008, as well as the EURASIP Technical Achievement Award in 2004 and the Technical Achievement Award of the IEEE Signal Processing Society in 2011. The German National Academy of Sciences (Leopoldina) inducted him as a member in 2007. He was elected to the National Academy of Engineering in 2015 “for contributions to video compression, streaming, and multimedia systems.”


KEYNOTE SPEECH II

Deep and Broad Learning on Neurological Disorder

Philip S. Yu
University of Illinois at Chicago, USA

ABSTRACT  Neurological disorders have affected a third of the population in the US and put an enormous strain on the health care system. Mining neuro-imaging data is becoming increasingly popular in healthcare and bioinformatics, due to its potential to discover clinically meaningful structural patterns that could facilitate the understanding and diagnosis of neurological and neuropsychiatric disorders. Modern imaging techniques have allowed us to model the human brain as a network or graph. A brain connectivity network can be constructed from neuro-imaging data, where the nodes of the network correspond to a set of brain regions and links represent the functional or structural connectivity between these regions. The linkage structure in brain networks can encode valuable information about the organizational properties of the human brain as a whole. Most recent research concentrates on applying subgraph mining techniques to discover connected subgraph patterns in the brain network. However, the underlying brain network structure is complicated. As a shallow linear model, subgraph mining cannot capture the highly non-linear structures, resulting in sub-optimal patterns. In this talk, we focus on how to learn representations that can capture the high non-linearity of brain networks and preserve their underlying structures. In addition to brain imaging data, we will also consider how to exploit behavioral data for neurological disorder detection.
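The network construction described above can be illustrated with a minimal sketch. It assumes, as one common convention, that functional connectivity is estimated as the Pearson correlation between the time series of two regions and then thresholded into an unweighted graph; the region names, signals, and the 0.5 threshold are hypothetical, and the pipeline used in the talk may differ.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def connectivity_graph(signals: dict[str, list[float]], threshold: float) -> dict[str, set[str]]:
    """Build an adjacency list: nodes are brain regions, and an edge links two
    regions whose absolute correlation meets the (hypothetical) threshold."""
    graph: dict[str, set[str]] = {region: set() for region in signals}
    for a, b in combinations(signals, 2):
        if abs(pearson(signals[a], signals[b])) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical time series for three regions of interest (ROIs).
signals = {
    "ROI_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ROI_B": [2.0, 4.1, 6.0, 8.2, 9.9],
    "ROI_C": [5.0, 1.0, 4.0, 2.0, 3.0],
}
graph = connectivity_graph(signals, threshold=0.5)
print(graph)
```

Here only ROI_A and ROI_B are strongly correlated, so they form the single edge. Subgraph mining or representation learning would then operate on such a graph, or on the weighted correlation matrix before thresholding.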

BIOSKETCH  Philip S. Yu’s main research interests include big data, data mining (especially graph/network mining), social networks, privacy-preserving data publishing, data streams, database systems, and Internet applications and technologies. He is a Distinguished Professor in the Department of Computer Science at UIC and also holds the Wexler Chair in Information and Technology. Before joining UIC, he was with the IBM Thomas J. Watson Research Center, where he was manager of the Software Tools and Techniques department. Dr. Yu has published more than 970 papers in refereed journals and conferences with more than 74,500 citations and an H-index of 127. He holds or has applied for more than 300 US patents.

Dr. Yu is a Fellow of the ACM and the IEEE. He is the recipient of the ACM SIGKDD 2016 Innovation Award for his influential research and scientific contributions on mining, fusion and anonymization of big data, the IEEE Computer Society’s 2013 Technical Achievement Award for “pioneering and fundamentally innovative contributions to scalable indexing, querying, searching, mining and anonymization of big data”, and the Research Contributions Award from the IEEE Intl. Conference on Data Mining (ICDM) in 2003 for his pioneering contributions to the field of data mining. He also received an IEEE Region 1 Award for “promoting and perpetuating numerous new electrical engineering concepts” in 1999. He has received several UIC honors, including Research of the Year in 2013 and UI Faculty Scholar in 2014. He also received many IBM honors, including two IBM Outstanding Innovation Awards, an Outstanding Technical Achievement Award, two Research Division Awards, and the 94th plateau of Invention Achievement Awards. He was an IBM Master Inventor.

Dr. Yu is the Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data. He is on the steering committee of the ACM Conference on Information and Knowledge Management and was a steering committee member of the IEEE Conference on Data Mining and the IEEE Conference on Data Engineering. He was the Editor-in-Chief of IEEE Transactions on Knowledge and Data Engineering (2001-2004). He also served as an associate editor of ACM Transactions on Internet Technology (2000-2010) and Knowledge and Information Systems (1998-2004). In addition to serving as a program committee member for various conferences, he was the program chair or co-chair of the 2009 IEEE Intl. Conf. on Service-Oriented Computing and Applications, the IEEE Workshop on Scalable Stream Processing Systems (SSPS’07), the IEEE Workshop on Mining Evolving and Streaming Data (2006), the 2006 joint conferences of the 8th IEEE Conference on E-Commerce Technology (CEC’06) and the 3rd IEEE Conference on Enterprise Computing, E-Commerce and E-Services (EEE’06), the 11th IEEE Intl. Conference on Data Engineering, the 6th Pacific Area Conference on Knowledge Discovery and Data Mining, the 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, the 2nd IEEE Intl. Workshop on Research Issues on Data Engineering: Transaction and Query Processing, the PAKDD Workshop on Knowledge Discovery from Advanced Databases, and the 2nd IEEE Intl. Workshop on Advanced Issues of E-Commerce and Web-based Information Systems. He served as the general chair or co-chair of the 2016 IEEE Intl. Conference on BIGDATA, the 2014 IEEE Intl. Conference on Data Science and Advanced Analytics, the 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, the 2012 Pacific-Asia Conference on Knowledge Discovery and Data Mining, the 2009 IEEE Intl. Conf. on Data Mining, the 2009 IEEE Intl. Conf. on Data Engineering, the 2006 ACM Conference on Information and Knowledge Management, the 1998 IEEE Intl. Conference on Data Engineering, and the 2nd IEEE Intl. Conference on Data Mining.

Dr. Yu received the B.S. degree in E.E. from National Taiwan University, the M.S. and Ph.D. degrees in E.E. from Stanford University, and the M.B.A. degree from New York University.


KEYNOTE SPEECH III

Perception of Visual Sentiment: From Experimental Psychology to Computational Modeling

Mohan Kankanhalli
National University of Singapore, Singapore

ABSTRACT  A picture is worth a thousand words. Visual content is one of the dominant forms of social media. The emotions that viewers feel when observing visual content are often referred to as the content’s visual sentiment. Analysis of visual sentiment has become increasingly important due to the huge volume of online visual data generated by users of social media. Automatic assessment of visual sentiment has many applications, such as monitoring the mood of the population on social media platforms (e.g., Twitter, Facebook), facilitating advertising, and understanding user behavior. However, in contrast to the extensive research on predicting textual sentiment, relatively little work has been done on sentiment analysis of visual content. Compared with textual sentiment, visual sentiment is more subjective and implicit, and there exists a significant semantic gap between high-level visual perception and low-level computational attributes.

In this talk, we argue that these challenges can be addressed by incorporating findings from psychology and cognitive science. We will show that a deeper understanding of human perception helps create better computational models. To support this thesis, we will first briefly present our human-centric research framework, which applies the paradigms and methodologies of experimental psychology to computer science. First, we collect visual data annotated with human perception through online or lab-controlled psychophysics studies. Then we use inferential statistics to analyze the psychophysics data and model human perception empirically. Finally, we design computational models based on the empirical findings.

We will present three studies on visual sentiment from our lab, guided by this research framework. In our first study, we aim to understand human visual perception in a holistic way. We first fuse various partially overlapping datasets annotated with human emotion. We then build an empirical model of human visual perception, which suggests that six different types of visual perception (i.e., familiarity, aesthetics, dynamics, oddness, naturalness, and spaciousness) significantly contribute to humans’ positive sentiment (i.e., liking) of a visual scene.

In our second study, we investigate the relation between human attention and visual sentiment. We build a unique emotional eye-fixation dataset with object- and scene-level human annotations and comprehensively explore how human attention is affected by the emotional properties of images. Further, we train a deep convolutional neural network for human attention prediction on our dataset. Results demonstrate that efficiently encoding image sentiment information helps boost the network’s performance.

Our third study explores how human attention influences visual sentiment. We experimentally disentangle the effects of focal information and contextual information on human emotional reactions, and then incorporate the related insights into computational models. On two benchmark datasets, the proposed computational models outperform state-of-the-art methods on visual sentiment prediction.

We will end with future research directions in visual sentiment analysis. Our studies highlight the importance of understanding human cognition for interpreting the latent sentiments behind visual scenes.

BIOSKETCH  Mohan Kankanhalli is Provost’s Chair Professor of Computer Science at the National University of Singapore (NUS). He is also the Dean of the NUS School of Computing. Before becoming Dean in July 2016, he was the NUS Vice Provost (Graduate Education) from 2014 to 2016 and Associate Provost from 2011 to 2013. Mohan obtained his BTech from IIT Kharagpur and his MS and PhD from the Rensselaer Polytechnic Institute.

His current research interests are in Multimedia Computing, Information Security & Privacy, Image/Video Processing, and Social Media Analysis. He directs the SeSaMe (Sensor-enhanced Social Media) Centre, which conducts fundamental research on social cyber-physical systems, with applications in social sensing, sensor analytics, and smart systems. He is on the editorial boards of several journals, including the ACM Transactions on Multimedia, Springer Multimedia Systems Journal, Pattern Recognition Journal, and Springer Multimedia Tools & Applications Journal. He is a Fellow of the IEEE.


KEYNOTE SPEECH IV

Concealing Network Delays in Fast Multi-Player Online Games

Benjamin W. Wah
The Chinese University of Hong Kong, China

ABSTRACT  Just-noticeable difference (JND) refers to the smallest detectable difference between a starting and a secondary level of a given sensory stimulus. The concept was pioneered by Ernst Weber, a 19th-century experimental psychologist. Weber’s Law states that the size of the JND is a constant proportion of the original stimulus value. Although the concept has been known for over a century and a half, it has only recently received more attention in the multimedia community. Given the quality degradations incurred by losses and delays in transferring multimedia signals over the Internet, researchers have found that existing quantitative metrics cannot model the perceptual degradations experienced by users. In this presentation, we examine the limitations of current results on JND and the reasons why they are inadequate for improving the perceptual quality of real-time multiplayer online games. Complicating factors include the presence of multiple and possibly dependent stimuli, which may be related to perceptual quality in a linear or nonlinear fashion and whose effects may be additive or non-additive. We present a new approach for minimizing the effects of multiple changes on user perception. In contrast to previous work that computes the combined effect as a function of the individual changes (such as the maximum or the square root of the changes), we argue that the perception of a change is based on awareness (the probability of perceiving a change when compared with the reference), not on the magnitude of the change. Because players are generally most sensitive to the most prominent artifact (the one with the highest awareness), the perceptual effect of multiple changes is governed by the maximum of the corresponding awareness values, and the optimal solution is formulated as the minimax of these awareness values.

The new formulation allows designers to decompose the evaluation of a multi-dimensional awareness function into the evaluation of individual awareness functions, each corresponding to one control assignment. As a result, the complexity of evaluating the perceptual quality of multiple controls becomes polynomial instead of exponential. We demonstrate the effectiveness of the approach using BZFlag, a popular open-source online shooting game. Understanding the properties of JND with multidimensional stimuli will help reduce the number of subjective tests needed to design better QoE-based control and optimization in multimedia algorithms.
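Two ideas in this abstract lend themselves to a small numerical illustration: Weber’s Law (the JND equals a constant fraction k of the base stimulus) and the minimax formulation, in which the perceived effect of several simultaneous changes is the maximum of their individual awareness values. The sketch below is hypothetical throughout: the Weber fraction, the logistic awareness curve, the idea of splitting a required change across controls, and the example numbers are invented for illustration and are not the model used in the talk or in BZFlag. It shows the decomposition at work: each control’s awareness is evaluated on its own, and a binary search on the awareness level finds the allocation with the smallest maximum awareness without enumerating joint assignments.

```python
import math

# Hypothetical model parameters (not from the talk): the Weber fraction at
# which awareness reaches 0.5, and the slope of a logistic awareness curve.
WEBER_K = 0.1
SLOPE = 0.02


def jnd(base: float, k: float = WEBER_K) -> float:
    """Weber's Law: the just-noticeable change is a constant fraction k of the base stimulus."""
    return k * base


def awareness(change: float, base: float) -> float:
    """Hypothetical probability of noticing `change` relative to `base`,
    modeled as a logistic function of the relative change."""
    return 1.0 / (1.0 + math.exp(-(change / base - WEBER_K) / SLOPE))


def combined_awareness(changes: list[float], bases: list[float]) -> float:
    """Perceived effect of simultaneous changes: the maximum individual awareness."""
    return max(awareness(c, b) for c, b in zip(changes, bases))


def max_change_at(level: float, base: float) -> float:
    """Largest change whose awareness does not exceed `level` (inverse of awareness)."""
    rel = WEBER_K + SLOPE * math.log(level / (1.0 - level))
    return max(0.0, rel * base)


def minimax_allocation(total: float, bases: list[float], iters: int = 60) -> tuple[float, list[float]]:
    """Split a required total change across controls so that the maximum
    awareness is minimized.

    Each control is evaluated on its own, so each search step costs O(n)
    instead of a search over all joint assignments. Binary search on the
    awareness level is valid because each awareness curve increases with its
    change. Assumes total > 0 and that total is achievable below awareness 1.
    """
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(max_change_at(mid, b) for b in bases) >= total:
            hi = mid  # feasible: try a lower maximum awareness
        else:
            lo = mid  # infeasible: allow more awareness
    shares = [max_change_at(hi, b) for b in bases]
    scale = total / sum(shares)  # remove the small slack left by the search
    return hi, [s * scale for s in shares]


# Hypothetical example: absorb 20 units of change across two controls
# whose baseline stimulus levels are 100 and 50.
level, shares = minimax_allocation(20.0, [100.0, 50.0])
print(f"JND at base 100: {jnd(100.0):.1f}")
print(f"max awareness {level:.3f}, shares {[round(s, 2) for s in shares]}")
```

With identical Weber-style curves, the minimax solution equalizes the relative change across controls (here both absorb about 13.3% of their baseline), so no single artifact stands out more than the others.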

BIOSKETCH  Benjamin W. Wah is currently the Provost and Wei Lun Professor of Computer Science and Engineering of the Chinese University of Hong Kong. Before that, he served as the Director of the Advanced Digital Sciences Center in Singapore, as well as the Franklin W. Woeltge Endowed Professor of Electrical and Computer Engineering and Professor of the Coordinated Science Laboratory of the University of Illinois, Urbana-Champaign, USA. He received his Ph.D. degree in computer science from the University of California, Berkeley, CA, in 1979. He also served on the faculty of Purdue University. He has received a number of awards for his research contributions, which include the IEEE CS Technical Achievement Award (1998), the IEEE Millennium Medal (2000), the Society for Design and Process Science Raymond T. Yeh Lifetime Achievement Award (2003), the IEEE-CS W. Wallace McDowell Award (2006), the Pan Wen-Yuan Outstanding Research Award (2006), the IEEE-CS Richard E. Merwin Award (2007), the IEEE-CS Technical Committee on Distributed Processing Outstanding Achievement Award (2007), the IEEE-CS Tsutomu Kanai Award (2009), and the Distinguished Alumni Award in Computer Science of the University of California, Berkeley (2011). Wah’s current research interests are in the areas of nonlinear search and optimization, multimedia signal processing, and computer networks.

Wah cofounded the IEEE Transactions on Knowledge and Data Engineering in 1988 and served as its Editor-in-Chief between 1993 and 1996, and is the Honorary Editor-in-Chief of Knowledge and Information Systems. He currently serves on the editorial boards of Information Sciences, International Journal on Artificial Intelligence Tools, Journal of VLSI Signal Processing, and World Wide Web. He has served the IEEE Computer Society in various capacities, including Vice President for Publications (1998 and 1999) and President (2001). He is a Fellow of the AAAS, ACM, and IEEE.