Robust Vision Challenge

The Robust Vision Challenge 2020 was a virtual full day event held in conjunction with ECCV 2020 in Glasgow. Videos of the full workshop are available on YouTube:

Introduction to Robust Vision Challenge 2020

First Live Session: 12h-14h UTC+1

YouTube link for session 1
  • 12h00-12h15: Introduction / Announcement of RVC Winners
  • 12h15-12h30: CFNet_RVC (Stereo)
  • 12h30-12h45: NLCA_NET_v2_RVC (Stereo)
  • 12h45-13h00: PRAFlow_RVC (Flow)
  • 13h00-13h15: RMDP_RVC (Depth)
  • 13h15-13h30: wisedet_RVC (Object Det.)
  • 13h30-13h45: EffPS_b1bs4_RVC (Panoptic)
  • 13h45-14h00: Closing

Second Live Session: 22h-24h UTC+1

YouTube link for session 2
  • 22h00-22h15: Introduction / Announcement of RVC Winners
  • 22h15-22h30: RAFT-TF_RVC (Flow)
  • 22h30-22h45: UniDet_RVC (Object Det.)
  • 22h45-23h00: UniDet_RVC (Instance)
  • 23h00-23h15: SN_RN152pyrx8_RVC (Semantic)
  • 23h15-23h30: MSeg1080_RVC (Semantic)
  • 23h30-24h00: Closing

Our 2020 Keynote Speakers:

Keynote: Robustness Across the Data Abundance Spectrum

Ross Girshick is a research scientist at Facebook AI Research (FAIR), working on computer vision and machine learning. He received a PhD in computer science from the University of Chicago under the supervision of Pedro Felzenszwalb in 2012. Prior to joining FAIR, Ross was a researcher at Microsoft Research, Redmond and a postdoc at the University of California, Berkeley, where he was advised by Jitendra Malik and Trevor Darrell. His interests include instance-level object understanding and visual reasoning challenges that combine natural language processing with computer vision. He received the 2017 PAMI Young Researcher Award and is well-known for developing the R-CNN approach to object detection. In 2017, Ross also received the Marr Prize at ICCV for Mask R-CNN.


Keynote: Noisy Student Training for Robust Vision

Quoc Le is a Principal Scientist at Google Brain, where he works on large-scale brain simulation using unsupervised feature learning and deep learning. His work focuses on object recognition, speech recognition and language understanding. Quoc obtained his PhD from Stanford and his undergraduate degree from the Australian National University, where he graduated with First Class Honours and as a Distinguished Scholar. He was also a researcher at National ICT Australia, Microsoft Research and the Max Planck Institute of Biological Cybernetics. Quoc won the best paper award at ECML 2007.


Keynote: What Do Our Models Learn?

Aleksander Mądry is Professor of Computer Science in the MIT EECS Department. He is a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), the Director of the MIT Center for Deployable Machine Learning, and the Faculty Lead of the CSAIL-MSR Trustworthy and Robust AI Collaboration. Aleksander received his PhD from MIT in 2011. Prior to joining MIT, he spent time at Microsoft Research New England and on the faculty of EPFL. Aleksander's research interests span algorithms, continuous optimization, science of deep learning and understanding machine learning from a robustness perspective. His work has been recognized with a number of awards, including an NSF CAREER Award, an Alfred P. Sloan Research Fellowship, an ACM Doctoral Dissertation Award Honorable Mention, and the 2018 Presburger Award.


Epilogue of Robust Vision Challenge 2020

Challenges

RVC 2020 featured seven challenges: stereo, optical flow, single image depth prediction, object detection, semantic segmentation, instance segmentation, and panoptic segmentation. Participants were free to submit to a single challenge or to multiple challenges. For each challenge, the results of a single model had to be submitted to all of that challenge's benchmarks (indicated with an x below).

Stereo
Flow
Depth
Obj. Det.
Semantic
Instance
Panoptic

Winners and prizes for the seven challenges in 2020 were:

1st Place: $1200

2nd Place: $600

Presentation at our ECCV 2020 Workshop

RVC 2020 Stereo Leaderboard

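Entries are listed by overall rank; the numbers in parentheses are the entry's ranks on the individual stereo benchmarks.

  • 1st: CFNet_RVC (2, 2, 1). Submitted by Anonymous
  • 2nd: NLCA_NET_v2_RVC (3, 1, 2)
  • 3rd: HSM-Net_RVC (1, 6, 3)
  • 4th: CVANet_RVC (5, 3, 4)
  • 5th: AANet_RVC (6, 5, 5)
  • 6th: GANetREF_RVC (7, 4, 6). Baseline, submitted by Nicolas Jourdan (RVC Team)
  • 7th: SGM_RVC (4, 7, 7)
  • 8th: ELAS_RVC (8, 8, 8). Baseline, submitted by Thomas Schöps (RVC Team)
  • Unranked: STTRV1_RVC. Incomplete submission, submitted by Anonymous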

RVC 2020 Flow Leaderboard

RVC 2020 Semantic Segmentation Leaderboard

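Entries are listed by overall rank; the numbers in parentheses are the entry's ranks on the individual semantic segmentation benchmarks.

  • 1st: SN_RN152pyrx8_RVC (2, 2, 1, 1, 1, 1, 1)
  • 2nd: MSeg1080_RVC (1, 1, 1, 2, 2, 2, 2). Baseline, submitted by John Lambert (Georgia Tech)
  • Unranked: seamseg_rvcsubset. Incomplete submission; baseline, submitted by Oliver Zendel (RVC Team)
  • Unranked: EffPS_b1bs4_RVC. Incomplete submission, submitted by Rohit Mohan (University of Freiburg)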

RVC 2020 Instance Segmentation Leaderboard

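Entries are listed by overall rank; the numbers in parentheses are the entry's ranks on the individual instance segmentation benchmarks.

  • 1st: UniDet_RVC (1, 1, 1, 1, 1, 1, 1, 1). Submitted by Anonymous
  • Unranked: seamseg_rvcsubset. Incomplete submission; baseline, submitted by Oliver Zendel (RVC Team)
  • Unranked: EffPS_b1bs4_RVC. Incomplete submission, submitted by Rohit Mohan (University of Freiburg)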


FAQ

When will we know the final result/winners?

The final winners are announced at the workshop live sessions on 28th August (12-14; 22-24h UTC+1). The leaderboard might still change after the submission deadline due to long evaluation times, stuck evaluations, or legitimate claims for a reevaluation (e.g. if the benchmark was down/had a bug). Finally, we might have to remove entries for violation of the contest rules.

What format/content should be in the report?

Please send a pdf version of your report (1 or 2 pages, no special format/layout required) to rvc2020eccvw@gmail.com. Note that the report will be publicly available on this website. It should include:
  • Data/Datasets used for training (supervised, semi-, and unsupervised)
  • Short summary of data augmentations used during training
  • Benchmark-specific steps you took for individual datasets during training and submission. If there are label mappings, include a full list of intermediary labels and the mappings to/from this space, or link a public source with this information
  • A short paragraph on your methodology/differences vs. state of the art
  • A short paragraph about the biggest challenge you faced specifically when dealing with multiple leaderboards and data policies
  • If there is a paper connected to the submission: a BibTeX entry
  • Include URLs if code is publicly available

What intrinsics/scale is correct for the mono depth prediction task?

The metrics in this task are scale-invariant! In general, we recommend you use KITTI's intrinsics.
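To illustrate why the scale does not matter, here is a sketch of the scale-invariant logarithmic error (SILog) in the form commonly reported by the KITTI depth prediction benchmark. The exact metric set of each RVC depth benchmark is defined by that benchmark, so treat this as an example rather than official evaluation code; the function name is hypothetical. Multiplying every prediction by a constant s shifts each log difference by log s, which cancels in the variance term.

```python
import math
from typing import Sequence


def silog(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Scale-invariant log error, scaled by 100 (sketch, KITTI-style form).

    d_i = log(pred_i) - log(gt_i);  SILog = sqrt(mean(d^2) - mean(d)^2) * 100.
    Only pixels with positive ground truth are used; predictions must be positive.
    """
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt) if g > 0]
    if not d:
        raise ValueError("no valid ground-truth pixels")
    n = len(d)
    mean_d = sum(d) / n
    mean_sq = sum(x * x for x in d) / n
    # max(...) guards against tiny negative values caused by floating-point rounding
    return math.sqrt(max(mean_sq - mean_d ** 2, 0.0)) * 100.0
```

For example, `silog(pred, gt)` and `silog([10 * p for p in pred], gt)` agree up to rounding, so a prediction in the wrong absolute scale is not penalized by this metric.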

What happened to the Obj365 dataset/benchmark?

Megvii had to pull support for RVC due to internal policy changes. Please see objects365.org for more details. The object detection challenge will be held using the remaining benchmarks: COCO, MVD, and OID. Due to this change, we have extended the submission deadline for all seven tasks to August 14th.

Rules for RVC2020

The aim of RVC is to push real-world usability and reduce dataset bias of solutions for the defined computer vision tasks: stereo, optical flow, monocular depth estimation, object detection, semantic segmentation, instance segmentation, and panoptic segmentation.

Participants shall create solutions which are agnostic to the input dataset. Submissions which deliberately include a dataset recognition part and dataset-specific sub-solutions are prohibited.

In practice, some tasks may require meta-information for successful model training (e.g. negative labels per image for object detection). We allow the use of such meta-information during training as long as the resulting solution is dataset-agnostic. The detection of dataset sources during prediction of the test data is not allowed. A valid RVC submission must create a unified result from the union of the benchmarking frames without identifying the individual dataset sources. Our dev kit helps to apply basic preprocessing to normalize/unify the datasets.

The unified result may be post-processed for individual benchmark submissions; our dev kit also provides support for this. The unified results shall be archived until the challenge is concluded and should be valid on their own (i.e. a proper prediction for the task at hand in a compatible data format; logits per class per pixel are also allowed). The potential winners of prize money (first/second place per task) are required to allow the organizers to inspect the training and prediction source code, the process, and the unified results to verify compliance with the RVC rules. This inspection is done in confidence, and no details about your solution are publicized or shared with the other workshop organizers or participants. We encourage all participants to eventually open source their solutions, but this is not mandatory.
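A minimal sketch of the separation between dataset-agnostic prediction and per-benchmark post-processing, assuming a simple in-memory representation. `predict_unified`, `split_for_benchmarks`, and the frame layout are hypothetical names for illustration, not functions from the RVC dev kit. The model only ever receives images from the pooled frames; benchmark membership is used only afterwards to route the archived unified results, and each post-processing function receives the unified result alone.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical frame representation: (frame_id, image). The image can be any object.
Frame = Tuple[str, object]


def predict_unified(frames: List[Frame], model: Callable[[object], object]) -> Dict[str, object]:
    """Run one dataset-agnostic model over the pooled frames of all benchmarks.

    The model is called with the image only; it never sees the frame id or
    the benchmark the frame came from.
    """
    return {frame_id: model(image) for frame_id, image in frames}


def split_for_benchmarks(
    unified: Dict[str, object],
    benchmark_frames: Dict[str, List[str]],
    postprocess: Dict[str, Callable[[object], object]],
) -> Dict[str, Dict[str, object]]:
    """Turn the archived unified results into one submission per benchmark.

    Benchmark membership is used only here, after prediction, to route results.
    Each post-processing function receives the unified result, not the input frame.
    """
    return {
        bench: {fid: postprocess[bench](unified[fid]) for fid in fids}
        for bench, fids in benchmark_frames.items()
    }
```

Keeping the unified dictionary as its own artifact mirrors the requirement that the unified results be archived and valid on their own, independent of any benchmark-specific output.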

Organizers of the challenge cannot receive prize money. Should such entries be among the top two spots, the next participants in line win the respective prize money.

For tasks that require predicting semantic labels, such as object detection, semantic segmentation, instance segmentation, and panoptic segmentation:

The model should not have manually designed dataset-specific components, such as dataset-specific heads. The model must predict in a dataset-agnostic unified label space. We provide such a space for each task in the dev kits. Participants can use their own unified label space, but its cardinality (number of classes/logits) must not be higher than these limits (defined per task):

  • Object Detection: 700
  • Semantic Segmentation: 300
  • Instance Segmentation: 400
  • Panoptic Segmentation: 200
The motivation for individual labels in a unified label space should be semantically driven rather than dataset-driven.
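A small sanity check of a custom unified label space against the per-task limits listed above might look like this. The task keys and the function name are hypothetical, and rejecting duplicate class names is an extra assumption rather than a stated RVC rule.

```python
from typing import Iterable

# Per-task cardinality limits for a unified label space, as listed in the rules.
# The dictionary keys are hypothetical task names chosen for this sketch.
MAX_CLASSES = {
    "object_detection": 700,
    "semantic_segmentation": 300,
    "instance_segmentation": 400,
    "panoptic_segmentation": 200,
}


def check_label_space(task: str, labels: Iterable[str]) -> int:
    """Return the number of classes, or raise ValueError if the space is invalid."""
    names = list(labels)
    # Assumption: class names must be unique (not an explicit RVC rule).
    if len(set(names)) != len(names):
        raise ValueError("duplicate class names in the unified label space")
    limit = MAX_CLASSES[task]
    if len(names) > limit:
        raise ValueError(f"{task}: {len(names)} classes exceed the limit of {limit}")
    return len(names)
```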

These limits are based on the RVC unified label space for each task, with some added slack, and are meant to prevent cheating. Participants must publicly disclose their label space and upload a specification of it along with their submission to RVC. Participants may use simple post-processing scripts that project from their unified label space to a dataset-specific space for their submission to each specific leaderboard, but all such scripts must be made public and open source, and must be submitted for inspection along with the RVC submission. The scripts must operate on each class separately and can only map a given class (or its logits) in the unified label space to a class in a dataset-specific label space; they cannot access input images or spatial locations of predictions (e.g., boxes, masks, or image coordinates). Operations such as non-maximum suppression are not allowed during dataset-specific post-processing (NMS is allowed during the dataset-agnostic generation of the unified prediction).
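The following sketch illustrates a dataset-specific post-processing step for object detection that stays within these constraints: the output label depends only on the unified class, while boxes and scores are carried through untouched and no NMS is run. The `Detection` type, `remap_detections`, and the class names are hypothetical; dropping unified classes that have no dataset-specific counterpart is one possible policy assumed here, not a rule stated above.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Detection:
    box: Tuple[float, float, float, float]  # (x0, y0, x1, y1); never inspected here
    score: float
    label: str  # class name in the unified label space


def remap_detections(dets: List[Detection], class_map: Dict[str, str]) -> List[Detection]:
    """Map each detection's unified class to a dataset-specific class.

    The decision depends only on the class label: boxes, scores and image
    content are not used, and no NMS is applied. Unified classes without an
    entry in class_map are dropped (an assumed policy for this sketch).
    """
    return [replace(d, label=class_map[d.label]) for d in dets if d.label in class_map]
```

With `class_map = {"car": "vehicle", "sedan": "vehicle"}`, detections labeled `car` or `sedan` both become `vehicle`, and any overlap between them is left as-is because merging boxes would require spatial information.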

Please see the RVC dev kit for more details on how to participate in each task and for help with unifying the training data: https://github.com/ozendelait/rvc_devkit/tree/release


In general, approaches should be dataset agnostic and work well on unconstrained new data. Good: "Solve the task"; Bad: "Solve the dataset". In detail, some leeway has to be given to allow smooth training and the creation of valid submissions. Here is a summary of allowed and prohibited approaches:

Invalid Approaches:

  • Benchmarking using individual versions/parameters/models per dataset to directly generate individual predictions
  • Training individual separate solutions per dataset
  • Designing a solution with the explicit number of datasets in mind (e.g. having the same number of encoder/decoders in parallel as the number of datasets) to guide the network towards the creation of internally separate solutions per dataset. Another invalid example is the design of parallel input layers based on the number of datasets.
  • Using dataset-specific pre-processing during benchmarking
  • Using the input frame or side channel information to post-process the unified label data into dataset-specific submissions during benchmarking. The post-processing should only work with the intermediary "unified" result, a result which is valid and usable on its own, and transform it into a valid submission.

Valid Approaches:

  • Choosing individual sampling strategy per dataset during training
  • Using your own unified label space
  • Using dataset-specific pre- / post- processing steps to convert to a unified label space during training
  • Using dataset-specific post-processing steps during benchmarking
  • Creating or keeping logits as the unified result (while observing the cardinality limits) and combining the logits before argmax during the dataset-specific post-processing
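Combining logits before the argmax can be illustrated for semantic segmentation with the following sketch. The mapping format, the function names, and the choice of log-sum-exp as the combination rule are assumptions for illustration; the rules only state that logits may be combined. Log-sum-exp ranks dataset classes exactly as summing their softmax probabilities would, and the same per-pixel rule is applied everywhere without using pixel positions.

```python
import math
from typing import Dict, List, Optional, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def merge_logits_then_argmax(
    unified_logits: Sequence[float],
    dataset_classes: List[str],
    mapping: Dict[str, List[int]],
) -> Optional[str]:
    """Label one pixel in a dataset-specific label space.

    mapping[c] lists the unified-class indices that map to dataset class c.
    Their logits are combined first; only then is the argmax taken.
    """
    best_class: Optional[str] = None
    best_score = float("-inf")
    for cls in dataset_classes:
        idxs = mapping.get(cls, [])
        if not idxs:
            continue  # this dataset class has no unified counterpart
        score = logsumexp([unified_logits[i] for i in idxs])
        if score > best_score:
            best_class, best_score = cls, score
    return best_class


def remap_label_map(
    logit_map: List[List[Sequence[float]]],
    dataset_classes: List[str],
    mapping: Dict[str, List[int]],
) -> List[List[Optional[str]]]:
    """Apply the same per-pixel rule to every pixel; positions are never used."""
    return [[merge_logits_then_argmax(px, dataset_classes, mapping) for px in row]
            for row in logit_map]
```

With unified logits `[2.0, 1.9, 2.5, 0.0]`, where indices 0 and 1 both map to `vehicle`, index 2 to `person` and index 3 to `sky`, the merged result is `vehicle` (log-sum-exp of 2.0 and 1.9 is about 2.64). Taking the argmax in the unified space first (index 2) and mapping afterwards would give `person` instead, which is why combining before the argmax can matter.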


RVC 2020 was sponsored by:

Gold Sponsors


Silver Sponsors



