Visual data is exploding! 500 billion consumer photos are taken worldwide each year, 633 million of them in NYC alone, and 120 hours of new video are uploaded to YouTube every minute. This explosion of digital multimedia data is creating a valuable open source of insights. However, the unconstrained nature of images and video “in the wild” makes automated computer-based analysis very challenging. Furthermore, the most interesting content in multimedia files is often complex, reflecting a diversity of human behaviors, scenes, activities, and events. To address these challenges, this tutorial will provide a unified overview of two emerging techniques, semantic modeling and massive-scale visual recognition, with the goals of introducing people from different backgrounds to this exciting field and reviewing state-of-the-art research in the new computational era.
Dr. John R. Smith
John R. Smith is Senior Manager of the Intelligent Information Management Department at the IBM T. J. Watson Research Center. He received his M.Phil. and Ph.D. degrees in Electrical Engineering from Columbia University in 1994 and 1997, respectively. He currently leads R&D across multiple areas at IBM Research, including multimedia, image/video analytics, biometrics, exploratory computer vision, and machine learning. Dr. Smith is also principal investigator for the IBM Multimedia Analysis and Retrieval System (IMARS) project. Previously, Dr. Smith led IBM’s participation in the MPEG-7 / MPEG-21 standards and served as Chair of the MPEG Multimedia Description Schemes Group and co-Project Editor of the MPEG-7 Standard. Dr. Smith is currently Editor-in-Chief of IEEE Multimedia and a Fellow of the IEEE.
Dr. Smith has given numerous tutorials at major conferences, including ACM Multimedia, the IEEE Intl. Conf. on Multimedia and Expo (ICME), World Wide Web (WWW), the ACM Intl. Conference on Management of Data (SIGMOD), and the International Conference on Conceptual Modeling (ER). Dr. Smith has published more than 100 papers at top conferences and in top journals. His papers have received more than 13,000 citations, and he has an h-index of 55 and an i10-index of 164.
Dr. Liangliang Cao
Liangliang Cao is a Research Staff Member at the IBM T. J. Watson Research Center and an adjunct assistant professor at Columbia University. His research lies at the intersection of computer vision, multimedia, and big data analytics. His work has won three prestigious visual recognition competitions: ImageCLEF Medical Image Classification (2012, 2013), the ImageNet Large Scale Visual Recognition Challenge (2010), and the TRECVID Airport Surveillance Competition (2008). His contributions have been recognized with the IBM Outstanding Accomplishment (2012), the Best Paper Award at the First International Workshop on Big Data Mining (2012), IBM Watson Emerging Leader in Multimedia and Signal Processing (2010), Facebook Fellowship Finalist (2010), and the UIUC Computational Science and Engineering Fellowship (2009-2010).
Dr. Cao has authored more than 40 papers in top conferences and journals, including ICCV, CVPR, ECCV, NIPS, ACM Multimedia, WWW, TPAMI, and PIEEE. Dr. Cao is an area chair of ACM Multimedia 2012 and IEEE WACV 2014. He fulfills review duties for more than 15 journals and various conferences. He was a general chair of the Greater New York Area Multimedia and Vision Meeting in 2012 and 2013. He is a guest editor of ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) and of the journal Computer Vision and Image Understanding (CVIU).