Let's Dance: Learning From Online Dance Videos
Daniel Castro
Steven Hickson
Patsorn Sangkloy
Bhavishya Mittal
Sean Dai
James Hays
Irfan Essa
Abstract
In recent years, deep neural network approaches have naturally extended to the video domain, in their simplest form by aggregating per-frame classifications as a baseline for action recognition. A majority of the work in this area extends from the imaging domain, leading to visual-feature-heavy approaches on temporal data. To address this issue we introduce "Let's Dance", a 1,000-video dataset (and growing) comprised of 10 visually overlapping dance categories that require motion for their classification. We stress the importance of human motion as a key distinguisher, given that, as we show in this work, visual information is not sufficient to classify motion-heavy categories. We compare our dataset's performance against UCF-101 using imaging techniques and demonstrate this inherent difficulty. We present a comparison of numerous state-of-the-art techniques on our dataset using three different representations (video, optical flow and multi-person pose data) in order to analyze these approaches. We discuss the motion parameterization of each of them and their value in learning to categorize online dance videos. Lastly, we release this dataset (and its three representations) for the research community to use.
Dataset

Our dataset is divided into three partitions: the original frames, the computed optical flow, and the extracted skeletons. We have recently updated our dataset to include six additional dances, which are reflected below. Since the publication of our original work, we have also had to remove videos from each of the categories as they have been taken down from YouTube. We updated the optical flow to be computed with FlowNet2 and the pose data to be extracted with Facebook's DensePose.
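
As a minimal sketch, iterating over the three partitions might look like the snippet below. The root and per-representation folder names ("frames", "flow", "skeletons") are assumptions for illustration; check the release for the actual layout.

import sys
from pathlib import Path

# Hypothetical layout: <root>/<representation>/<dance_category>/...
root = Path("lets_dance")

for rep in ["frames", "flow", "skeletons"]:
    rep_dir = root / rep
    if not rep_dir.is_dir():
        continue  # skip partitions that are not downloaded
    for category in sorted(p for p in rep_dir.iterdir() if p.is_dir()):
        # Count the files for each dance category in this representation.
        n_files = sum(1 for f in category.rglob("*") if f.is_file())
        print(f"{rep}/{category.name}: {n_files} files")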

The skeletal data follows the same folder structure as the original frames, and the skeletal points are in the same resolution as their original frame. Each annotation is stored as JSON, so parsing it with a JSON library is the most convenient approach (we use simplejson in Python).
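
A minimal loading sketch is below. The file path is hypothetical, and since the exact JSON schema is not documented here, the snippet simply loads one annotation and prints its top-level structure so you can confirm the schema before writing parsing code against specific keys.

import simplejson as json  # the standard-library json module works too

# Hypothetical path; skeletal files mirror the folder structure of the
# original frames, and coordinates are in the original frame's resolution.
path = "skeletons/ballet/some_video/frame_0001.json"

with open(path) as f:
    skeletons = json.load(f)

# Inspect the top-level structure to confirm the actual schema.
print(type(skeletons).__name__)
print(skeletons)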

Citation
@InProceedings{CastroDance2017,
 author = {Castro, Daniel and Hickson, Steven and Sangkloy, Patsorn and Mittal, Bhavishya and Dai, Sean and Hays, James and Essa, Irfan},
 title = {Let's Dance: Learning From Online Dance Videos},
 booktitle = {eprint arXiv:2139179},
 year = {2018},
}