DAVIS Challenge on Video Object Segmentation 2019

Workshop in conjunction with CVPR 2019, Long Beach, California


The interactive scenario assumes the user gives iterative refinement inputs to the algorithm, in our case in the form of a scribble, to segment the object of interest. Methods have to produce the segmentation mask for that object in all the frames of a video sequence taking into account all the user interactions.

Dates and Phases

- Test 2019: 5th May 2019 23:59 UTC - 24th May 2019 23:59 UTC.


- Right after the challenge closes (24th May), we will invite all participants to submit a short abstract (400 words maximum) describing their method (deadline: 29th May, 23:59 UTC).
- Based on the abstracts and the results obtained, we will decide which teams are accepted at the workshop. Notification date: 3rd June.
- Accepted teams will be able to submit a paper describing their approach (deadline: 12th June, 23:59 UTC). The paper template is the same as for CVPR, but the length is limited to 4 pages including references.
- Accepted papers will also be invited to be presented at the workshop as an oral presentation or a poster.
- Accepted papers will be self-published on the challenge website (not in the official proceedings, although they have the same value).

Datasets (Download here, 480p resolution)

- Train 2017 + Val 2017: 90 sequences: the 50 original DAVIS 2016 sequences (reannotated with multiple objects) plus 40 new sequences. The scribbles for this set can be obtained here (coming soon).
- Test-Dev 2017: 30 new sequences, available since the beginning of April 2017. Ground truth is not publicly available; the number of submissions is unlimited. The scribbles are obtained by interacting with the server.
Local evaluation of the results is available for the train and val sets. Evaluation on test-dev against the server is available during the Test 2019 period. In both cases, please use the Interactive Python package.
Feel free to train or pre-train your algorithms on any other dataset apart from DAVIS (YouTube-VOS, MS COCO, Pascal, etc.) or to use the full-resolution DAVIS annotations and images.


We have released a Python package that simulates the human interaction; you can find more information here. You don't have to submit the results anywhere: the results of the interactions are logged on the server simply by using the Python package that we provide. Once all the interactions have been done, the server will determine your best submission and update the leaderboard accordingly.
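To make the interaction loop concrete, here is a minimal, self-contained sketch of the idea. It does not use the real Python package or its API: `MockServer`, `run_session`, the one-pixel "scribbles" and the five-pixel "video" are all hypothetical names and simplifications chosen only for illustration.

```python
import time


def jaccard(pred, gt):
    """Intersection over union of two binary masks (flat lists of 0/1)."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return 1.0 if union == 0 else inter / union


class MockServer:
    """Hypothetical stand-in for the challenge server; not the real API."""

    def __init__(self, ground_truth, max_interactions=8):
        self.ground_truth = ground_truth
        self.max_interactions = max_interactions
        self.log = []  # one (elapsed_seconds, jaccard) pair per interaction

    def _scribble(self, mask):
        # A "scribble" here is a single labelled pixel on the first error.
        for idx, (p, g) in enumerate(zip(mask, self.ground_truth)):
            if p != g:
                return [(idx, g)]
        return []

    def first_scribble(self):
        # Start from an empty mask, so the first error is a foreground pixel.
        return self._scribble([0] * len(self.ground_truth))

    def submit(self, mask, elapsed):
        # Register time and quality, then reply with a refinement scribble.
        self.log.append((elapsed, jaccard(mask, self.ground_truth)))
        return self._scribble(mask)


def run_session(server):
    """Alternate between receiving scribbles and submitting masks."""
    mask = [0] * len(server.ground_truth)
    scribble = server.first_scribble()
    for _ in range(server.max_interactions):
        if not scribble:
            break
        start = time.perf_counter()
        for idx, label in scribble:  # toy method: trust the scribble
            mask[idx] = label
        elapsed = time.perf_counter() - start
        scribble = server.submit(list(mask), elapsed)
    return server.log


log = run_session(MockServer([0, 1, 1, 0, 1]))
print([round(j, 2) for _, j in log])  # [0.33, 0.67, 1.0]
```

In this toy run the quality improves with every interaction because each refinement scribble corrects one wrong pixel; a real method would propagate the scribble information across all frames of the sequence.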


This year we are trying to increase the prize pool, so stay tuned!


The main idea is that our servers will simulate human interaction in the form of scribbles. A participant will connect to our servers and receive a set of scribbles, to which they should reply with a video segmentation. The servers will register the time spent and the quality of the results, and then reply with a refinement scribble. We consider two metrics for the challenge:
  • We compute the Area Under the Curve (AUC) of the Time vs Jaccard plot. Each sample in the plot is computed from the average time and the average Jaccard obtained over the whole test-dev set for a given interaction. If a method does not complete all the interactions, the values from its last valid interaction are used for the remaining ones.
  • We interpolate the same Time vs Jaccard plot at 60 seconds to obtain a Jaccard value. This measures the quality a method can obtain in 60 seconds for a sequence containing the average number of objects in the DAVIS 2017 test-dev set (~3 objects).
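The two metrics can be sketched as follows. This is an illustration under stated assumptions, not the official evaluation code: the function names are hypothetical, the area is computed with the plain trapezoidal rule without any normalization (the text does not say whether the official score is normalized), the interpolation is linear, and the sample values are made up.

```python
def pad_with_last(values, num_interactions=8):
    """Repeat the last valid value for interactions that were not computed."""
    if not values:
        raise ValueError("at least one valid interaction is required")
    values = list(values[:num_interactions])
    return values + [values[-1]] * (num_interactions - len(values))


def auc_time_jaccard(times, jaccards):
    """Trapezoidal area under the Time vs Jaccard curve (unnormalized)."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (jaccards[i] + jaccards[i - 1]) / 2
    return area


def jaccard_at(times, jaccards, t=60.0):
    """Linearly interpolate the curve at time t, clamping outside its range."""
    if t <= times[0]:
        return jaccards[0]
    if t >= times[-1]:
        return jaccards[-1]
    for i in range(1, len(times)):
        if t <= times[i]:
            w = (t - times[i - 1]) / (times[i] - times[i - 1])
            return jaccards[i - 1] + w * (jaccards[i] - jaccards[i - 1])


# Made-up averages over a dataset for four interactions (seconds, Jaccard).
times = [20.0, 40.0, 60.0, 80.0]
jaccards = [0.5, 0.6, 0.7, 0.75]

print(round(auc_time_jaccard(times, jaccards), 3))
print(round(jaccard_at(times, jaccards, 60.0), 3))
print(pad_with_last([0.5, 0.6], num_interactions=4))
```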
For the challenge, the maximum number of interactions is 8 and the maximum time is 30 seconds per object for each interaction (so if there are 2 objects in a sequence, your method has 1 minute for each interaction). Therefore, to allow for 8 interactions, the timeout for interacting with a given sequence is computed as 30*num_obj*8 seconds. If the timeout is reached before the 8 interactions are finished, the last interaction is discarded and only the previous ones are considered for evaluation.
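The time budget above works out as in the following sketch. The function names are hypothetical, and treating an interaction that finishes exactly at the timeout as valid is an assumption, since the text does not specify the boundary case.

```python
def interaction_budget(num_obj, seconds_per_object=30):
    """Maximum time in seconds for a single interaction on a sequence."""
    return seconds_per_object * num_obj


def sequence_timeout(num_obj, max_interactions=8, seconds_per_object=30):
    """Total timeout in seconds for a sequence: 30 * num_obj * 8 by default."""
    return seconds_per_object * num_obj * max_interactions


def count_valid_interactions(finish_times, num_obj, max_interactions=8):
    """Count interactions that finished before the sequence timeout.

    finish_times holds the cumulative time in seconds at which each
    interaction ended. Finishing exactly at the timeout is counted as
    valid here (an assumption).
    """
    timeout = sequence_timeout(num_obj, max_interactions)
    valid = [t for t in finish_times[:max_interactions] if t <= timeout]
    return len(valid)


print(interaction_budget(2))   # 60
print(sequence_timeout(2))     # 480
# The fifth interaction ends after the 480 s timeout and is discarded.
print(count_valid_interactions([100, 200, 300, 400, 500], num_obj=2))  # 4
```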
You can find more information here (coming soon; last year's information is available as a reference here) and in the paper below.

Other considerations

- Each entry must be associated with a team and state the team's affiliation.
- The best entry of each team will be public in the leaderboard at all times.
- We will only consider the "interactive" scenario: the only supervision allowed is that provided by the scribbles the server sends. Although we have no way to check this at the challenge stage, we will do our best to detect violations a posteriori, before the workshop.
- We reserve the right to remove an entry from the competition when it has a high technical similarity to methods published in previous conferences or workshops. We do so in order to keep the workshop interesting and to push the state of the art forward.
- The new annotations in this dataset belong to the organizers of the challenge and are licensed under a Creative Commons Attribution 4.0 License.



The 2018 DAVIS Challenge on Video Object Segmentation
S. Caelles, A. Montes, K.-K. Maninis, Y. Chen, L. Van Gool, F. Perazzi, and J. Pont-Tuset
arXiv:1803.00557, 2018
[PDF] [BibTex]

author = {Sergi Caelles and Alberto Montes and Kevis-Kokitsi Maninis and Yuhua Chen and Luc {Van Gool} and Federico Perazzi and Jordi Pont-Tuset},
title = {The 2018 DAVIS Challenge on Video Object Segmentation},
journal = {arXiv:1803.00557},
year = {2018}

The 2017 DAVIS Challenge on Video Object Segmentation
J. Pont-Tuset, F. Perazzi, S. Caelles, P. Arbeláez, A. Sorkine-Hornung, and L. Van Gool
arXiv:1704.00675, 2017
[PDF] [BibTex]

author = {Jordi Pont-Tuset and Federico Perazzi and Sergi Caelles and Pablo Arbel\'aez and Alexander Sorkine-Hornung and Luc {Van Gool}},
title = {The 2017 DAVIS Challenge on Video Object Segmentation},
journal = {arXiv:1704.00675},
year = {2017}

A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
F. Perazzi, J. Pont-Tuset, B. McWilliams, L. Van Gool, M. Gross, and A. Sorkine-Hornung
Computer Vision and Pattern Recognition (CVPR) 2016
[PDF] [Supplemental] [BibTex]

author = {F. Perazzi and J. Pont-Tuset and B. McWilliams and L. {Van Gool} and M. Gross and A. Sorkine-Hornung},
title = {A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation},
booktitle = {Computer Vision and Pattern Recognition},
year = {2016}
Please cite these papers in your publications if DAVIS helps your research.


If you have any further questions, contact us!