1. Information

____________________________________________________________________________________

2. Prerequisites

Two Registrations:

____________________________________________________________________________________

3. Submission Format

Model Format: Participants can train their models using various libraries, such as PyTorch, ONNX, AI Model Efficiency Toolkit (AIMET) quantized models, and TensorFlow. Qualcomm AI Hub supports models trained with these libraries and can compile them directly for mobile devices. Once the model is compiled on Qualcomm AI Hub, please follow the next two steps to submit it (the compile job on AI Hub) for evaluation and ranking.
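For reference, below is a minimal sketch of compiling a traced PyTorch model on Qualcomm AI Hub with the qai_hub Python client. The device name, input name, and input shape are illustrative assumptions, not competition requirements; consult the [Track 2 Sample Solution] once it is available.

import qai_hub as hub
import torch
import torchvision

# Trace a PyTorch model (R(2+1)D shown only as an example) so AI Hub can compile it.
model = torchvision.models.video.r2plus1d_18(weights="KINETICS400_V1").eval()
example_input = torch.rand(1, 3, 8, 112, 112)  # (N, C, T, H, W)
traced_model = torch.jit.trace(model, example_input)

# Submit a compile job; the device name here is illustrative only.
compile_job = hub.submit_compile_job(
    model=traced_model,
    device=hub.Device("Samsung Galaxy S24 (Family)"),
    input_specs=dict(video=(1, 3, 8, 112, 112)),
)
print(compile_job.job_id)  # this Compile Job ID is what the submission form asks for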

There are two steps to complete the submission.

Step 1: On Qualcomm AI Hub, share access to the model that you want to submit with lowpowervision@gmail.com. This ensures that our evaluation server can access submitted models from Qualcomm AI Hub.

Step 2: Fill out the submission form.

* Please refer to [Track 2 Sample Solution] for more details of submission. (TO BE UPDATED)

Share compile job

# IMPORTANT! You must share your compile job with the LPCVC organizers so that we can pull and evaluate it.

compile_job.modify_sharing(add_emails=['lowpowervision@gmail.com'])

Please note: Your models will not be evaluated and ranked unless you complete both steps (Step 1 and Step 2). Each model requires its own submission form because you must specify the Compile Job ID in the form.

____________________________________________________________________________________

4. Evaluation Details

4.1 Data

Training data: We do not restrict the training data for this competition. Participants are free to use any accessible datasets.

Test data: Hidden QEVD (TO BE UPDATED)

Sample data: QEVD (TO BE UPDATED)

4.2 Task

Goal: Classify the exercise action in a video clip

Model Input: An 8-frame video clip

Model Output: Classification logits

* Check the provided [Sample Solution (TO BE UPDATED)] for detailed input and output data format for the evaluation pipeline
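Until the sample solution is published, the sketch below illustrates the expected input/output contract, assuming an R(2+1)D backbone; the number of action classes (NUM_CLASSES) is a hypothetical placeholder.

import torch
import torchvision

NUM_CLASSES = 100  # hypothetical placeholder; use the actual number of QEVD action classes

# R(2+1)D backbone with a replaced classification head (illustrative only).
model = torchvision.models.video.r2plus1d_18(weights="KINETICS400_V1")
model.fc = torch.nn.Linear(model.fc.in_features, NUM_CLASSES)
model.eval()

clip = torch.rand(1, 3, 8, 112, 112)  # one 8-frame clip in (N, C, T, H, W) layout
with torch.no_grad():
    logits = model(clip)     # (1, NUM_CLASSES) classification logits
pred = logits.argmax(dim=1)  # predicted action class index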

4.3 Metrics

The evaluation is conducted in two stages:

4.4 Sample Solution

We adopt [ResNet-2Plus1D] as a sample solution to better support potential participants. Its latency (inference time) on the test data will be used as the reference for determining whether submitted solutions are valid.

4.5 Data Format in Evaluation: All test data will be used to evaluate the submitted solutions online via AI Hub. We therefore prepare the test data in a specific format to meet the requirements of the AI Hub platform and the QNN libraries.

Input: Video

Data Format

  • Video: .mp4 files ranging from 2 to 10 seconds
  • RGB is preferred; audio will be ignored
  • 8 frames are extracted from each video and stacked as a tensor into 1 clip of shape (3, 8, 112, 112)
  • Participants are free to change the spatial resolution (112x112) or the number of clips per video for their submitted model

Explanation

The following preprocessing has been done to adapt QEVD to R(2+1)D:

1) Sort the QEVD data into train and val splits, with videos placed under their respective action category directories within each split

2) QEVD videos vary in duration, frame rate, and spatial resolution, whereas R(2+1)D requires fixed-size spatiotemporal inputs. Each video is therefore converted into a clip-based representation with a fixed number of frames and spatial resolution

3) For each input video, 8 frames are sampled uniformly across the entire video duration

4) For shorter videos that may not contain enough frames, a dynamic frame selection process adjusts the frame rate so that 8 frames can still be sampled (see the sketch after this list)
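One possible realization of the uniform sampling described above is sketched below; the exact preprocessing used in the evaluation pipeline may differ.

import numpy as np

def sample_frame_indices(num_frames: int, clip_len: int = 8) -> np.ndarray:
    """Spread clip_len frame indices uniformly over the video; repeat frames if the video is too short."""
    if num_frames >= clip_len:
        # evenly spaced indices across the entire video duration
        return np.linspace(0, num_frames - 1, clip_len).round().astype(int)
    # short video: reuse existing frames so that clip_len indices are still produced
    return np.minimum(np.arange(clip_len), num_frames - 1)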

Before being passed to the model, each sampled frame undergoes the following preprocessing:

1) Each frame is resized to 128 x 171

2) A center crop of size 112 x 112 is applied to obtain the fixed spatial resolution expected by the R(2+1)D model

3) Frames are converted into PyTorch tensors, stacked, and permuted to the (C, T, H, W) format

4) Pixel values are converted to float32 and normalized using standard RGB channel-wise mean and standard deviation.

Any additional normalization or preprocessing steps should be included inside the submitted model if required.
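If your model needs preprocessing beyond what the evaluation pipeline provides, one way to fold it into the submitted model is a thin wrapper module, sketched below; the extra statistics shown are hypothetical placeholders, not values required by the competition.

import torch

class ModelWithExtraPreproc(torch.nn.Module):
    """Wraps a classifier so that any additional preprocessing runs inside the submitted model."""

    def __init__(self, backbone: torch.nn.Module):
        super().__init__()
        self.backbone = backbone
        # Hypothetical extra per-channel statistics; replace with whatever your model
        # expects on top of the pipeline's standard normalization.
        self.register_buffer("extra_mean", torch.zeros(1, 3, 1, 1, 1))
        self.register_buffer("extra_std", torch.ones(1, 3, 1, 1, 1))

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (N, 3, 8, 112, 112), already preprocessed by the evaluation pipeline
        return self.backbone((clip - self.extra_mean) / self.extra_std)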

Sample data preparation code

import torch, random
import torchvision.transforms as transforms
from torchvision.io import read_video

clip_len = 8
video_path = "exampledata.mp4"  # path to the input video (see "Example data" below)

# read the whole video: (T, H, W, C), uint8
video, _, _ = read_video(video_path, pts_unit="sec")
video = video.float()  # uint8 -> float

# pick a contiguous clip (fallback: pad with the last frame if the video is too short)
num_frames = video.shape[0]
if num_frames >= clip_len:
    start = random.randint(0, num_frames - clip_len)  # or choose the center
    clip = video[start : start + clip_len]
else:
    pad = clip_len - num_frames
    clip = torch.cat([video, video[-1:].repeat(pad, 1, 1, 1)], dim=0)

# spatial preprocessing per frame: resize to 128x171, then center-crop to 112x112
spatial = transforms.Compose([
    transforms.Resize((128, 171)),
    transforms.CenterCrop((112, 112)),
])
clip = torch.stack([spatial(frame.permute(2, 0, 1)) for frame in clip])  # (T, C, H, W)

# permute to (C, T, H, W) and scale to [0, 1]
clip = clip.permute(1, 0, 2, 3) / 255.0

# normalize with the repo's RGB channel-wise statistics
mean = torch.tensor([0.43216, 0.394666, 0.37645]).view(3, 1, 1, 1)
std = torch.tensor([0.22803, 0.22145, 0.216989]).view(3, 1, 1, 1)
clip = (clip - mean) / std

# add batch dimension -> (1, 3, 8, 112, 112)
video_input = clip.unsqueeze(0)

Example data

exampledata.mp4

____________________________________________________________________________________

5. Compile, Profile, Inference via AIHub

Please refer to the provided [Sample Solution] for details of model compilation, profiling, and inference via AI Hub.
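Until the sample solution is available, the sketch below shows the typical qai_hub workflow for profiling and running inference on a compiled model; the device name and the input name ("video") are assumptions and must match your own compile job.

import numpy as np
import qai_hub as hub

device = hub.Device("Samsung Galaxy S24 (Family)")  # illustrative device
# compile_job is the job returned by hub.submit_compile_job(...) in Section 3
target_model = compile_job.get_target_model()

# Profile job: measures on-device latency (the reference metric mentioned in Section 4.4).
profile_job = hub.submit_profile_job(model=target_model, device=device)

# Inference job: run a preprocessed clip (see Section 4.5) on device.
inference_job = hub.submit_inference_job(
    model=target_model,
    device=device,
    inputs=dict(video=[np.random.rand(1, 3, 8, 112, 112).astype(np.float32)]),
)
on_device_logits = inference_job.download_output_data()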

Important! After the submission window closes, the TOP-5 teams on the leaderboard will be contacted to confirm which model is their final solution for evaluation on the whole test data. The converted ONNX model and detailed evaluation scripts will also be requested in addition to the QNN model shared via AI Hub.

____________________________________________________________________________________

6. References