1. Information

____________________________________________________________________________________

2. Prerequisites

Two Registrations:

____________________________________________________________________________________

3. Submission Format

Model Format: Participants can train their models using various libraries, such as PyTorch, ONNX, AI Model Efficiency Toolkit (AIMET) quantized models, and TensorFlow. Qualcomm AI Hub supports models trained with these libraries and can compile them directly for mobile devices. Once the model is compiled on Qualcomm AI Hub, please follow the next two steps to submit it (the compile job on AI Hub) for evaluation and ranking.
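For reference, below is a minimal sketch of compiling a traced PyTorch model on Qualcomm AI Hub with the qai_hub Python client. The device name, input name, and input shape are illustrative assumptions, not competition requirements; consult the [Track 2 Sample Solution] once it is available.

import qai_hub as hub
import torch
import torchvision

# Trace a PyTorch model (R(2+1)D shown only as an example) so AI Hub can compile it.
model = torchvision.models.video.r2plus1d_18(weights="KINETICS400_V1").eval()
example_input = torch.rand(1, 3, 8, 112, 112)  # (N, C, T, H, W)
traced_model = torch.jit.trace(model, example_input)

# Submit a compile job; the device name here is illustrative only.
compile_job = hub.submit_compile_job(
    model=traced_model,
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(video=(1, 3, 8, 112, 112)),
)
print(compile_job.job_id)  # this Compile Job ID is what the submission form asks for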

There are two steps to complete the submission.

Step 1: On Qualcomm AI Hub, share access to the model that you want to submit with lowpowervision@gmail.com. This ensures that our evaluation server can access submitted models from Qualcomm AI Hub.

Step 2: Fill out the submission form.

* Please refer to [Track 2 Sample Solution] for more details of submission. (TO BE UPDATED)

Share compile job

# IMPORTANT! You must share your compile job with the LPCVC organizers so that we can pull and evaluate it.

compile_job.modify_sharing(add_emails=['lowpowervision@gmail.com'])

Please note: Your models will not be evaluated and ranked unless you complete both steps (Step 1 and Step 2). Each model requires its own submission form because you must specify the Compile Job ID in the form.

____________________________________________________________________________________

4. Evaluation Details

4.1 Data

Training data: We do not restrict the training data for this competition. Participants are free to use any accessible datasets.

Test data: Hidden QEVD (TO BE UPDATED)

Sample data: QEVD (TO BE UPDATED)

4.2 Task

Goal: Classify the exercise action in a video clip

Model Input: An 8-frame video clip

Model Output: Classification logits

* Check the provided [Sample Solution (TO BE UPDATED)] for detailed input and output data format for the evaluation pipeline
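Until the sample solution is published, the sketch below illustrates the expected input/output contract, assuming an R(2+1)D backbone; the number of action classes (NUM_CLASSES) is a hypothetical placeholder.

import torch
import torchvision

NUM_CLASSES = 100  # hypothetical placeholder; use the actual number of QEVD action classes

# R(2+1)D backbone with a replaced classification head (illustrative only).
model = torchvision.models.video.r2plus1d_18(weights="KINETICS400_V1")
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
model.eval()

clip = torch.rand(1, 3, 8, 112, 112)  # one 8-frame clip in (N, C, T, H, W) layout
with torch.no_grad():
    logits = model(clip)     # (1, NUM_CLASSES) classification logits
pred = logits.argmax(dim=1)  # predicted action class index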

4.3 Metrics

The evaluation is conducted in two stages:

4.4 Sample Solution

We adopt [ResNet-2Plus1D] as a sample solution to better support potential participants. Its latency (inference time) on the test data will be used as the reference for determining whether submitted solutions are valid.

4.5 Data Format in Evaluation: All test data will be used to evaluate the submitted solutions online via AI Hub. We therefore prepare the test data in a specific format to meet the requirements of the AI Hub platform and the QNN libraries.

Input: Video

Data Format

  • Video: .mp4 files ranging from 2 to 10 seconds
  • RGB is preferred; audio will be ignored
  • 8 frames are extracted from each video and stacked as a tensor into 1 clip of shape (3, 8, 112, 112)
  • Participants are free to change the spatial resolution (112x112) or the number of clips per video for their submitted model

Explanation

The following preprocessing has been done to adapt QEVD to R(2+1)D:

1) Sort the QEVD data into train and val splits, with videos placed under their respective action category directories within each split

2) QEVD videos vary in duration, frame rate, and spatial resolution, whereas R(2+1)D requires fixed-size spatiotemporal inputs. Each video is therefore converted into a clip-based representation with a fixed number of frames and spatial resolution

3) For each input video, 8 frames are sampled uniformly across the entire video duration

4) For shorter videos that may not contain enough frames, a dynamic frame selection process adjusts the frame rate so that 8 frames can still be sampled (see the sketch after this list)
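One possible realization of the uniform sampling described above is sketched below; the exact preprocessing used in the evaluation pipeline may differ.

import numpy as np

def sample_frame_indices(num_frames: int, clip_len: int = 8) -> np.ndarray:
    """Spread clip_len frame indices uniformly over the video; repeat frames if the video is too short."""
    if num_frames >= clip_len:
        # evenly spaced indices across the entire video duration
        return np.linspace(0, num_frames - 1, clip_len).round().astype(int)
    # short video: reuse existing frames so that clip_len indices are still produced
    return np.minimum(np.arange(clip_len), num_frames - 1)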

Before being passed to the model, each sampled frame undergoes the following preprocessing:

1) Each frame is resized to 128 x 171

2) A center crop of size 112 x 112 is applied to obtain the fixed spatial resolution expected by the R(2+1)D model

3) Frames are converted into PyTorch tensors, stacked, and permuted to the (C, T, H, W) format

4) Pixel values are converted to float32 and normalized using standard RGB channel-wise mean and standard deviation.

Any additional normalization or preprocessing steps should be included inside the submitted model if required.
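If your model needs preprocessing beyond what the evaluation pipeline provides, one way to fold it into the submitted model is a thin wrapper module, sketched below; the extra statistics shown are hypothetical placeholders, not values required by the competition.

import torch

class ModelWithExtraPreproc(torch.nn.Module):
    """Wraps a classifier so that any additional preprocessing runs inside the submitted model."""

    def __init__(self, backbone: torch.nn.Module):
        super().__init__()
        self.backbone = backbone
        # Hypothetical extra per-channel statistics; replace with whatever your model
        # expects on top of the pipeline's standard normalization.
        self.register_buffer("extra_mean", torch.zeros(1, 3, 1, 1, 1))
        self.register_buffer("extra_std", torch.ones(1, 3, 1, 1, 1))

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (N, 3, 8, 112, 112), already preprocessed by the evaluation pipeline
        return self.backbone((clip - self.extra_mean) / self.extra_std)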

Sample data preparation code

import torch, random
import torchvision.transforms as transforms
from torchvision.io import read_video

clip_len = 8
video_path = "exampledata.mp4"  # path to the input video (see "Example data" below)

# read the whole video: (T, H, W, C), uint8
video, _, _ = read_video(video_path, pts_unit="sec")
video = video.float()  # uint8 -> float

# pick a contiguous clip (fallback: pad with the last frame if the video is too short)
num_frames = video.shape[0]
if num_frames >= clip_len:
    start = random.randint(0, num_frames - clip_len)  # or choose the center
    clip = video[start : start + clip_len]
else:
    pad = clip_len - num_frames
    clip = torch.cat([video, video[-1:].repeat(pad, 1, 1, 1)], dim=0)

# spatial preprocessing per frame: resize to 128x171, then center-crop to 112x112
spatial = transforms.Compose([
    transforms.Resize((128, 171)),
    transforms.CenterCrop((112, 112)),
])
clip = torch.stack([spatial(frame.permute(2, 0, 1)) for frame in clip])  # (T, C, H, W)

# permute to (C, T, H, W) and scale to [0, 1]
clip = clip.permute(1, 0, 2, 3) / 255.0

# normalize with the repo's RGB channel-wise statistics
mean = torch.tensor([0.43216, 0.394666, 0.37645]).view(3, 1, 1, 1)
std = torch.tensor([0.22803, 0.22145, 0.216989]).view(3, 1, 1, 1)
clip = (clip - mean) / std

# add batch dimension -> (1, 3, 8, 112, 112)
video_input = clip.unsqueeze(0)

Example data

exampledata.mp4

____________________________________________________________________________________

5. Compile, Profile, Inference via AIHub

Please refer to the provided [Sample Solution] for details of model compilation, profiling, and inference via AI Hub.
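Until the sample solution is available, the sketch below shows the typical qai_hub workflow for profiling and running inference on a compiled model; the device name and the input name ("video") are assumptions and must match your own compile job.

import numpy as np
import qai_hub as hub

device = hub.Device("Samsung Galaxy S24 (Family)")  # illustrative device
# compile_job is the job returned by hub.submit_compile_job(...) in Section 3
target_model = compile_job.get_target_model()

# Profile job: measures on-device latency (the reference metric mentioned in Section 4.4).
profile_job = hub.submit_profile_job(model=target_model, device=device)

# Inference job: run a preprocessed clip (see Section 4.5) on device.
inference_job = hub.submit_inference_job(
    model=target_model,
    device=device,
    inputs=dict(video=[np.random.rand(1, 3, 8, 112, 112).astype(np.float32)]),
)
on_device_logits = inference_job.download_output_data()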

Important! After the submission window closes, the TOP-5 teams on the leaderboard will be contacted to confirm which model is their final solution for evaluation on the whole test data. The converted ONNX model and detailed evaluation scripts will also be requested in addition to the QNN model shared via AI Hub.

____________________________________________________________________________________

6. References