This work aims to propose a unified method for region tracking and motion recognition that can be applied to 2D or 3D data depending on the context. First, low-level features will be extracted from optical flow analysis to characterize key elementary events. The spatio-temporal relationships between these elementary events will then be modeled in a compact and discriminative way in order to represent more complex activities. In particular, the PhD will focus on covariance region descriptors for merging visual information of heterogeneous natures (optical flow, depth, color and texture). The resulting representation of complex actions will be applied to urban surveillance and human-computer interaction.
Automatically recognizing an action, a gesture or an activity in a video is a major issue in computer vision, with applications in automatic video surveillance, human-computer interaction and video indexing. Many approaches have been introduced over the past ten years, but reliable application in real conditions remains challenging. To demonstrate the generic nature of the proposed approach and to study two different scales, two applications will be covered:
(1) video surveillance: the aim is to automatically identify and record the events and activities visible in a video and to detect any abnormal events. This is a difficult problem because of the diversity of events, the variability in appearance and the many possible meanings of the concept of abnormality.
(2) human-machine interaction: the aim is to recognize gestures ranging from simple actions (selecting, moving a message, changing viewing options, zooming, scrolling) to complex ones (e.g. a gesture used as a shortcut to an application, writing, elements of sign language).
The objective is to develop new methods for motion description and recognition based on covariance descriptors Tuzel06 for human-computer interaction and video surveillance. These descriptors have mainly been applied to object recognition and tracking and are of growing interest Lui12, since each region is represented by a compact and discriminative matrix (of fixed size regardless of the resolution of the object, typically 7x7) that mixes visual features of heterogeneous types. Each pixel of the object is represented by a feature vector consisting of geometric (spatial coordinates, gradient, texture), radiometric or kinematic descriptors. For action identification, existing methods using a covariance descriptor Guo10, Guo11, Sanin13 are quite promising. The objective is to extend the existing methods to 3D motion and to propose a unified tracking/recognition technique.
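To make the idea of a fixed-size region descriptor concrete, the following is a minimal sketch in Python with NumPy. It assumes a grayscale image and a hypothetical choice of seven per-pixel features (x, y, intensity, and the absolute first and second derivatives along each axis); the function name `region_covariance` and this feature set are illustrative assumptions, not the method of the cited works. Kinematic features such as optical flow components, or depth values, could be stacked as additional rows in the same way.

```python
import numpy as np


def region_covariance(gray, top, left, height, width):
    """Covariance descriptor of a rectangular region of a grayscale image.

    Illustrative feature vector per pixel (an assumption of this sketch):
    [x, y, I, |Ix|, |Iy|, |Ixx|, |Iyy|], which yields a 7x7 matrix.
    """
    gray = np.asarray(gray, dtype=np.float64)

    # Derivatives are computed on the full image (axis 0 is y, axis 1 is x).
    iy, ix = np.gradient(gray)
    iyy = np.gradient(iy, axis=0)
    ixx = np.gradient(ix, axis=1)

    # Pixel coordinates and the slice selecting the region.
    ys, xs = np.mgrid[top:top + height, left:left + width]
    window = (slice(top, top + height), slice(left, left + width))

    # One row per feature, one column per pixel of the region.
    features = np.stack([
        xs.ravel().astype(np.float64),
        ys.ravel().astype(np.float64),
        gray[window].ravel(),
        np.abs(ix[window]).ravel(),
        np.abs(iy[window]).ravel(),
        np.abs(ixx[window]).ravel(),
        np.abs(iyy[window]).ravel(),
    ], axis=0)

    # np.cov treats each row as a variable, so the result is 7x7
    # whatever the number of pixels in the region.
    return np.cov(features)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((40, 60))  # synthetic image, for illustration only
    small = region_covariance(image, 0, 0, 8, 12)
    large = region_covariance(image, 5, 10, 20, 30)
    print(small.shape, large.shape)  # both are (7, 7)
```

Both regions produce a matrix of the same size although they contain different numbers of pixels, which is the property that makes regions of different resolutions directly comparable.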
1- Bibliographic review on covariance descriptors for complex activity recognition (actions and gestures).
2- Proposition of a first methodology on 2D motion characterization. Application to human activity classification in complex scenes, evaluation of the results on public databases and publication.
3- Proposition of a second methodology on 3D motion characterization for gesture analysis. Application to gesture classification in complex scenes, evaluation of the results on public databases and publication.
4- Towards a unified approach of tracking and recognition
5- PhD manuscript and defence
The expected results are clear conclusions regarding the strengths and weaknesses of covariance descriptors for action recognition.