DAVIS Challenge on Video Object Segmentation 2020

Workshop in conjunction with CVPR 2020, Seattle, Washington

Definition

The unsupervised scenario assumes that the user does not interact with the algorithm to obtain the segmentation masks. Methods should provide a set of object candidates with no overlapping pixels that span the whole video sequence. This set of objects should contain at least the objects that capture human attention when watching the whole video sequence, i.e. the objects that are more likely to be followed by the human gaze.
More information can be found in the DAVIS 2019 publication and on the Codalab submission site.

Dates and Phases

- Test-Dev: Unlimited number of submissions, open-ended.
- Test-Challenge: Limited to 5 submissions in total from 3rd May 2020 23:59 UTC to 15th May 2020 23:59 UTC.

Prizes

All participants invited to the workshop will receive a one-year subscription to Adobe CC.

Datasets (Download here, 480p resolution)

- Train + Val Unsupervised: the 60 Train and 30 Val sequences from DAVIS 2017, re-annotated according to the unsupervised definition.
- Test-Dev 2019: 30 new sequences. Ground truth not publicly available.
- Test-Challenge 2019: 30 new sequences. Ground truth not publicly available.
Feel free to train or pre-train your algorithms on any dataset other than DAVIS (YouTube-VOS, MS COCO, Pascal, etc.), or to use the full-resolution DAVIS annotations and images.

Submission

Submissions to all phases are done through the Codalab site of the challenge. Please register on the site and carefully read the instructions in the Learn the Details -> Overview and Evaluation sections.
The submitted masks should use the indexed PNG format. Each pixel value in a frame should equal the id of the object it belongs to, and an object's id must remain the same for the whole video sequence.
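For illustration, here is a minimal sketch (not part of the official toolkit) of how a single frame could be written as an indexed PNG with Pillow, assuming a hypothetical NumPy array of object ids where 0 is background:

```python
import numpy as np
from PIL import Image

# Hypothetical mask: an HxW array of object ids (0 = background).
# The same object must keep the same id in every frame of the sequence.
mask = np.zeros((480, 854), dtype=np.uint8)
mask[100:200, 300:500] = 1  # object 1
mask[250:350, 100:250] = 2  # object 2

# Save as an indexed ("P" mode) PNG so pixel values are stored as
# palette indices, i.e. the object ids themselves.
img = Image.fromarray(mask, mode='P')
palette = [0, 0, 0, 128, 0, 0, 0, 128, 0]  # RGB colors for ids 0, 1, 2
img.putpalette(palette + [0, 0, 0] * (256 - len(palette) // 3))
img.save('00000.png')
```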

Evaluation

Methods have to provide a pool of N non-overlapping video object proposals for every video sequence, i.e. a segmentation mask for each frame in which the mask id of a given object is consistent through the whole sequence. During evaluation, each annotated object in the ground truth is matched, via bipartite graph matching, to the one of the N video object proposals that maximizes J&F. Note that methods are not penalized for detecting more objects than those annotated in the ground truth. The final J&F result is the mean over all matched objects in all video sequences.
More details can be found in the DAVIS 2019 publication and the evaluation code.
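As a rough illustration of the matching step (the exact procedure is defined in the official evaluation code), the sketch below solves the bipartite assignment with the Hungarian algorithm, assuming a hypothetical precomputed matrix of per-sequence J&F scores between ground-truth objects and proposals:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_proposals(jf_scores):
    """Match each ground-truth object to the proposal that maximizes J&F.

    jf_scores: (num_gt, num_proposals) array where jf_scores[i, j] is the
    mean J&F of ground-truth object i against proposal j over the sequence.
    Unmatched (extra) proposals are simply ignored, i.e. not penalized.
    """
    # linear_sum_assignment minimizes cost, so negate to maximize J&F.
    gt_idx, prop_idx = linear_sum_assignment(-jf_scores)
    return list(zip(gt_idx, prop_idx)), jf_scores[gt_idx, prop_idx].mean()

# Hypothetical example: 2 ground-truth objects, 3 proposals.
scores = np.array([[0.80, 0.10, 0.30],
                   [0.20, 0.75, 0.40]])
matches, mean_jf = match_proposals(scores)
print(matches, mean_jf)  # [(0, 0), (1, 1)] 0.775
```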

Papers

- Right after the Challenge closes (15th May), we will invite participants to submit a short abstract (400 words maximum) describing their method (deadline 19th May 23:59 UTC).
- Based on the abstracts and the results obtained, we will decide which teams are accepted to the workshop. Notification date: 20th May.
- Accepted teams will be able to submit a paper describing their approach (deadline 4th June 23:59 UTC). The paper template is the same as CVPR's, but the length is limited to 4 pages including references.
- Papers will also be presented at the workshop in the form of an oral presentation or a poster.
- Accepted papers will be self-published on the challenge website (not in the official proceedings, although they carry the same value).

Other considerations

- Each entry must be associated with a team and include its affiliation.
- Each team's best entry will be publicly visible on the leaderboard at all times.
- We will only consider the "unsupervised" scenario: no human input of any kind is allowed when testing a sequence. Although we have no way to verify this at the challenge stage, we will do our best to detect violations a posteriori, before the workshop.
- We reserve the right to remove an entry from the competition when it has high technical similarity to methods published in previous conferences or workshops. We do so in order to keep the workshop interesting and to push the state of the art forward.
- The new annotations in this dataset belong to the organizers of the challenge and are licensed under a Creative Commons Attribution 4.0 License.

Citation

arXiv

The 2019 DAVIS Challenge on VOS: Unsupervised Multi-Object Segmentation
S. Caelles, J. Pont-Tuset, F. Perazzi, A. Montes, K.-K. Maninis, and L. Van Gool
arXiv:1905.00737, 2019
[PDF] [BibTex]

@article{Caelles_arXiv_2019,
  author  = {Sergi Caelles and Jordi Pont-Tuset and Federico Perazzi and Alberto Montes and Kevis-Kokitsi Maninis and Luc {Van Gool}},
  title   = {The 2019 DAVIS Challenge on VOS: Unsupervised Multi-Object Segmentation},
  journal = {arXiv:1905.00737},
  year    = {2019}
}
arXiv

The 2017 DAVIS Challenge on Video Object Segmentation
J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbeláez, A. Sorkine-Hornung, and L. Van Gool
arXiv:1704.00675, 2017
[PDF] [BibTex]

@article{Pont-Tuset_arXiv_2017,
  author  = {Jordi Pont-Tuset and Federico Perazzi and Sergi Caelles and Pablo Arbel\'aez and Alexander Sorkine-Hornung and Luc {Van Gool}},
  title   = {The 2017 DAVIS Challenge on Video Object Segmentation},
  journal = {arXiv:1704.00675},
  year    = {2017}
}
CVPR

A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung
Computer Vision and Pattern Recognition (CVPR) 2016
[PDF] [Supplemental] [BibTex]

@inproceedings{Perazzi2016,
  author    = {F. Perazzi and J. Pont-Tuset and B. McWilliams and L. {Van Gool} and M. Gross and A. Sorkine-Hornung},
  title     = {A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation},
  booktitle = {Computer Vision and Pattern Recognition},
  year      = {2016}
}
Please cite these papers in your publications if DAVIS helps your research.

Contact

If you have any further questions, contact us!