3D human reconstruction and action recognition from multiple active and passive sensors
This challenge calls for demonstrations of methods and technologies that support real-time or near-real-time 3D reconstruction of moving humans from multiple calibrated and remotely located RGB cameras and/or consumer depth cameras. It also calls for methods that recognize human gestures and movements from multimodal data. The challenge mainly targets real-time applications, such as collaborative immersive environments and interpersonal communication over the Internet or other dedicated networks.
To this end, we provide two datasets to support the investigation of techniques in 3D signal processing, computer graphics and pattern recognition, and to enable demonstrations of relevant technical achievements.
Consider multiple distant users who are captured in real time by their own visual capture equipment, ranging from a single Kinect (simple users) to multiple Kinects and/or high-definition cameras (advanced users), as well as by non-visual sensors such as Wearable Inertial Measurement Units (WIMUs) and multiple microphones. The captured data is either processed at the capture site to produce 3D reconstructions of the users or coded and transmitted directly, enabling multiple users to be rendered in a shared environment where they can “meet” and “interact” with each other, or with the virtual environment, through a set of gestures/movements.
Of course, we do not expect participants in this challenge to recreate this scenario completely, but rather to work with the provided datasets to illustrate key technical components that would be required to realize such a scenario. The challenges that may be addressed include, but are not limited to:
- Realistic, on-the-fly 3D reconstruction of humans, in the form of polygonal meshes (and/or point clouds), from noisy source data captured by geometrically and photometrically calibrated cameras.
- Fast and efficient compression/coding methods for time-varying meshes or multi-view RGB+Depth video that enable real-time transmission of data over current and future network infrastructures.
- Realistic free-viewpoint rendering of humans, either from full-geometry 3D reconstructions via standard computer graphics, or via view interpolation from the original multiple RGB (+Depth) views.
- Fast and accurate motion tracking of humans (e.g., in the form of skeleton tracking) from the multiple data streams provided.
- Efficient recognition of human gestures/movements from multimodal data, including RGB and/or depth video, WIMU data and audio.
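Several of the tasks above (on-the-fly reconstruction, compression and free-viewpoint rendering) share a first step: turning each calibrated depth map into 3D points and expressing all views in a single world frame. The following is a minimal sketch of that step only, assuming an ideal pinhole model without lens distortion, per-camera intrinsics `fx, fy, cx, cy`, camera-to-world extrinsics `R, t`, and a depth value of 0 marking a missing reading. The function names and data layout are hypothetical; the datasets' actual calibration and file formats are not described here.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def backproject_depth(depth: Sequence[Sequence[float]],
                      fx: float, fy: float, cx: float, cy: float) -> List[Point3]:
    """Back-project a depth image into camera-space points (ideal pinhole model).

    depth[v][u] is the depth along the optical axis at pixel (u, v).
    Assumption: a value of 0 (or less) means no depth was measured.
    """
    points: List[Point3] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip missing readings
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def to_world(points: Sequence[Point3], R, t) -> List[Point3]:
    """Apply camera-to-world extrinsics: X_w = R * X_c + t (R is 3x3, t is length 3)."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]


def fuse_views(views) -> List[Point3]:
    """Merge several calibrated views into one world-frame point cloud.

    Each view is (depth, (fx, fy, cx, cy), (R, t)). Points are simply
    concatenated; no filtering or deduplication is performed.
    """
    cloud: List[Point3] = []
    for depth, (fx, fy, cx, cy), (R, t) in views:
        cloud.extend(to_world(backproject_depth(depth, fx, fy, cx, cy), R, t))
    return cloud
```

For example, `backproject_depth([[1.0, 0.0], [2.0, 1.0]], 1.0, 1.0, 0.5, 0.5)` skips the zero pixel and returns three points. Plain concatenation yields only a raw point cloud; turning noisy Kinect data into a clean mesh would still need filtering and a surface reconstruction step, which this sketch deliberately leaves out.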
Two datasets are provided for this challenge and can be downloaded from:
- Dataset 1 was captured at CERTH/ITI in Greece and consists of synchronized RGB-plus-Depth video streams of multiple humans performing multiple actions, captured by five Kinects, as well as audio streams from the multiple Kinects and WIMU streams.
- Dataset 2 was captured at Fraunhofer HHI in Berlin and consists of synchronized multi-view HD video streams of multiple humans performing multiple actions.
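Even with synchronized capture, the modalities in Dataset 1 need not share a sampling rate (video frames versus WIMU or audio samples), so a multimodal recognizer typically has to pair each video frame with a sensor sample. A minimal sketch, assuming every stream carries timestamps in seconds on a shared clock, pairs each frame with the nearest sample; `align_nearest` is a hypothetical helper, not part of any dataset tooling.

```python
import bisect
from typing import List, Sequence


def align_nearest(frame_times: Sequence[float],
                  sample_times: Sequence[float]) -> List[int]:
    """For each frame timestamp, return the index of the closest sensor sample.

    sample_times must be non-empty and sorted in ascending order.
    Ties are resolved in favour of the earlier sample.
    """
    if not sample_times:
        raise ValueError("sample_times must be non-empty")
    last = len(sample_times) - 1
    indices: List[int] = []
    for t in frame_times:
        # k is the first position whose timestamp is >= t
        k = bisect.bisect_left(sample_times, t)
        if k == 0:
            indices.append(0)  # frame precedes (or equals) the first sample
        elif k > last:
            indices.append(last)  # frame follows the last sample
        else:
            before, after = sample_times[k - 1], sample_times[k]
            indices.append(k - 1 if t - before <= after - t else k)
    return indices
```

For example, `align_nearest([0.0, 0.5, 1.2, 5.0], [0.0, 1.0, 2.0])` returns `[0, 0, 1, 2]`. Binary search keeps each lookup logarithmic in the number of samples; interpolating between the two neighbouring samples would be a natural alternative when the sensor signal is smooth.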
Noel O’Connor: noel.oconnor -at- dcu.ie