Crowdsourcing for Multimedia Research


Crowdsourcing refers to human computation techniques that exploit human intelligence and also take advantage of a large population of contributors. This tutorial is motivated by the enormous potential that crowdsourcing represents for the multimedia community. For example, in the area of multimedia content analysis, the main hurdle faced is bridging the semantic gap, i.e., the distance between pixel-based representations and human perceptions. Upon first consideration, exploiting human intelligence appears to be a silver-bullet solution for bridging the semantic gap. For this reason, there is an enormous amount of excitement and curiosity concerning crowdsourcing among multimedia researchers.

Upon further consideration, applying crowdsourcing to solve multimedia problems is riddled with challenges: Who are the crowd? What motivates them? How is it possible to guarantee the quality and consistency of crowd input? Crowdsourcing quickly reveals itself to be something far less than a panacea for all the challenges faced by multimedia researchers. There is a substantial chance that the enormous promise of crowdsourcing for multimedia will fail to be realized because the research community becomes disillusioned with crowdsourcing techniques.

In order for crowdsourcing to reach its full potential, researchers must follow the middle way between euphoria and disappointment. This tutorial has been created to provide multimedia researchers with a practical introduction to crowdsourcing for multimedia and with hands-on experience that will allow the multimedia community to make productive use of crowdsourcing methods informed by sensible best-practice guidelines, leading to steady and sustainable advancement of the state of the art.


Mohammad Soleymani

Dr. Mohammad Soleymani is a Marie Curie Fellow at the intelligent Behaviour Understanding Group (iBUG) at Imperial College London, where he conducts research on sensor-based and implicit emotional tagging. His work in the area of crowdsourcing for multimedia has focused on developing techniques for data set design and development. As one of the founding organizers of the MediaEval Multimedia Benchmark, he has put his experience to use to develop multimedia resources used by the larger research community. Soleymani received his PhD in computer science from the University of Geneva, Switzerland in 2011. He has worked extensively, in collaboration with the Swiss Center of Affective Sciences, on assessing emotional reactions in response to video content and developing multimedia techniques to predict these reactions. Soleymani has contributed to educating researchers in the area of crowdsourcing for multimedia by offering a lecture and a lab on the subject at the University of Geneva and also co-teaching a short course on crowdsourcing at the ICT doctoral school at the University of Trento. In 2013, he is serving as an organizer of the International Conference on Affective Computing and Intelligent Interaction (ACII) 2013, the International Workshop on Affective Analysis in Multimedia, and the MediaEval 2013 Workshop. In 2012, he was invited to give an Expert Talk on Affective computing at IEEE ICME 2013. In the past, he has served as a special session chair, program committee member and reviewer for multiple conferences and workshops including ACM ICMR, ACM MM, ACM ICMI, IEEE SMC, and IEEE ICME.

Martha Larson

Dr. Martha Larson is an assistant professor in the Multimedia Information Retrieval Lab at Delft University of Technology in the Netherlands. Her work involves developing algorithms to improve video retrieval, with a special focus on exploiting speech, language and human perceptions of the meaning of multimedia. As a co-founder of the MediaEval Multimedia Benchmark, she has played a central role in introducing crowdsourcing techniques into the multimedia community both for the generation of multimedia data sets and also for video search engine evaluation. Before joining Delft University of Technology, she researched and lectured in the area of audio-visual retrieval at Fraunhofer IAIS, Germany, and at the University of Amsterdam, the Netherlands. Larson holds an MA and PhD in theoretical linguistics from Cornell University and a BS in Mathematics from the University of Wisconsin. Larson has contributed to educating researchers in the area of multimedia by co-teaching a crowdsourcing short course (together with Mohammad Soleymani, as mentioned above). In 2010 and 2011, she co-presented the ACM Multimedia tutorial Frontiers in Multimedia Search together with Alan Hanjalic. In 2013, she is serving as an organizer of the ACM Multimedia CrowdMM 2013 workshop on Crowdsourcing for Multimedia, a continuation of the workshop series established in 2012. Recently she was lead guest editor of the ACM TOIS special issue on searching spontaneous conversational speech, and she also published a monograph on Spoken Content Retrieval together with Gareth Jones of Dublin City University. She has served as a reviewer and program committee member for multiple conferences including ACM MM, ACM ICMR, ACM SIGIR, IEEE ICME, Interspeech, ACM MMSys and MMM.
