1. Information


2. Prerequisites

Two registrations are required:

Please refer to our sample solution for Track 3 for more details on participation.


3. Submission Format

Model Format: Participants can train their models with a variety of libraries, such as PyTorch, TensorFlow, ONNX, and AI Model Efficiency Toolkit (AIMET) quantized models. Qualcomm AI Hub supports models trained with these libraries and can compile them directly for mobile devices. Once the model has been compiled on Qualcomm AI Hub, please follow the next two steps to submit it for evaluation and ranking.
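As a minimal sketch of that workflow, assuming the qai_hub Python client, a placeholder network, a 256x256 input resolution, and an example device name (all of which are assumptions; substitute your own model, resolution, and target device), a traced PyTorch model can be submitted for compilation roughly as follows:

    import torch
    import qai_hub as hub

    # Placeholder network standing in for a participant's depth model (assumption).
    class TinyDepthNet(torch.nn.Module):
        def forward(self, image):
            # Predict one depth value per pixel from an RGB image.
            return image.mean(dim=1, keepdim=True)

    model = TinyDepthNet().eval()
    example_input = torch.rand(1, 3, 256, 256)  # assumed input resolution
    traced_model = torch.jit.trace(model, example_input)

    # Submit the traced model to Qualcomm AI Hub for on-device compilation.
    # The device name is an assumption; use any device supported by AI Hub.
    compile_job = hub.submit_compile_job(
        model=traced_model,
        device=hub.Device("Samsung Galaxy S24 (Family)"),
        input_specs={"image": (1, 3, 256, 256)},
    )

    # The Compile Job ID is what the submission form (Step 2 below) asks for.
    print("Compile Job ID:", compile_job.job_id)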

There are two steps to complete the submission.

Step 1: On Qualcomm AI Hub, share access to the model you want to submit with lowpowervision@gmail.com. This ensures that our evaluation server can retrieve submitted models from Qualcomm AI Hub.

Step 2: Fill out the submission form.

Please refer to our sample solution for Track 3 for more details on submission.

Please note: models will not be evaluated and ranked unless both steps (Step 1 and Step 2) are completed. Each model requires its own submission form because the Compile Job ID must be specified in the form.


4. Evaluation Details

4.1 Data

The evaluation dataset comprises 2,000 RGB images captured in various indoor and outdoor scenes under different lighting conditions (normal and low light) using a range of mobile devices. As detailed in the table below, the dataset includes 500 indoor images with normal light, 500 indoor images with low light, 500 outdoor images with normal light, and 500 outdoor images with low light. 10% of these evaluation RGB images will be publicly accessible.

Scene \ Lighting    Normal Light    Low Light
Indoor              500 images      500 images
Outdoor             500 images      500 images

Each image is accompanied by a corresponding Depth Map and Confidence Map. The Depth Map gives the depth of each pixel in the RGB image, while the Confidence Map gives the confidence level of the corresponding pixel in the Depth Map. The picture below shows several samples (indoor normal light, indoor low light, outdoor low light); each row contains, from left to right, the RGB image, the visualized Depth Map, and the visualized Confidence Map.
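The confidence map determines which pixels of the depth map can be trusted. As a rough sketch only (the array shapes, file loading, and the confidence threshold below are assumptions, not the challenge's specification), low-confidence pixels might be masked out before comparing a prediction against the reference depth:

    import numpy as np

    def masked_abs_rel_error(pred_depth, gt_depth, confidence, conf_threshold=0.5):
        """Mean absolute relative error over pixels whose confidence passes the threshold.

        pred_depth, gt_depth, confidence: float arrays of shape (H, W).
        conf_threshold: hypothetical cutoff; the challenge's actual rule may differ.
        """
        valid = (confidence >= conf_threshold) & (gt_depth > 0)
        abs_rel = np.abs(pred_depth[valid] - gt_depth[valid]) / gt_depth[valid]
        return abs_rel.mean()

    # Hypothetical usage with arrays loaded from the dataset files:
    # pred = ...  # model output as a NumPy array
    # gt = np.load("depth_map.npy"); conf = np.load("confidence_map.npy")
    # print(masked_abs_rel_error(pred, gt, conf))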

4.2 Model input and output

4.3 Metrics

Given that this challenge focuses on low-power computer vision tasks, we propose a two-stage evaluation strategy:

Note: 
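The two stages themselves are not spelled out above. Purely as an illustrative sketch of the on-device side of a low-power evaluation (the job ID and device name below are hypothetical, and the organizers' actual measurement procedure may differ), latency for a compiled model can be measured with a Qualcomm AI Hub profile job:

    import qai_hub as hub

    # Illustrative only: measure on-device latency for an already-compiled model.
    # "jabc12345x" is a hypothetical Compile Job ID; the device name is an assumption.
    compile_job = hub.get_job("jabc12345x")
    target_model = compile_job.get_target_model()

    profile_job = hub.submit_profile_job(
        model=target_model,
        device=hub.Device("Samsung Galaxy S24 (Family)"),
    )

    # The downloaded profile contains on-device inference-time statistics.
    print(profile_job.download_profile())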

4.4 Leaderboard

Once the submission form has been submitted, the model (identified by its Compile Job ID) will be evaluated, and the ranking result will appear on our leaderboard.

