1. Information

____________________________________________________________________________________

2. Prerequisites

Two Registrations:

____________________________________________________________________________________

3. Submission Format

Model Format: Participants can train and quantize their models using various libraries, such as PyTorch, ONNX, and the AI Model Efficiency Toolkit (AIMET). Qualcomm AI Hub and Qualcomm AI SDK support models trained with these libraries and can compile them for mobile devices. Once the model has been generated, submit it to the organizers for evaluation and ranking.

* Please refer to the [Sample Solution] for details on model quantization and compilation. (TO BE UPDATED)

* For Vision-Language models, the vision module and the language module must be compiled separately.

Please note: Your models will not be evaluated or ranked unless the entire model compilation has been completed.

____________________________________________________________________________________

4. Evaluation Details

4.1 Task Description

Track 3 focuses on detecting AI-generated images and providing structured explanations for the detection results. Unlike a traditional binary classification task, this track also introduces a Multi-Criteria AIGC Image Evaluation pipeline, which requires models to reason about image authenticity across eight criteria:

  1. Lighting & Shadows Consistency
  2. Edges & Boundaries
  3. Texture & Resolution
  4. Perspective & Spatial Relationships
  5. Physical & Common-Sense Logic
  6. Text & Symbols
  7. Human & Biological Structure Integrity
  8. Material & Object Details

Models must output structured content (JSON). Both the ground truth and the predictions use this structure, which decomposes explanations into per-criterion scores, confidence, and evidence.

4.2 Evaluation Metrics

Detection Task: F1-score (Detection score)

Explanation Task:

Final Score: 0.5 × Detection score + 0.5 × Explanation score
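
As a minimal sketch of how the two parts combine, the following computes an F1-score from binary labels and then applies the equal weighting above. It assumes that 1 marks the positive class (AI-generated) and that the explanation score is already on the same 0 to 1 scale as the F1-score; neither convention is stated in this document, and the function names are illustrative only.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from two equal-length label sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, F1 is undefined;
    # returning 0.0 here is a convention chosen for this sketch.
    return 0.0 if denom == 0 else 2 * tp / denom


def final_score(detection_score, explanation_score):
    """Equal-weight combination: 0.5 x detection + 0.5 x explanation."""
    return 0.5 * detection_score + 0.5 * explanation_score


if __name__ == "__main__":
    # Hypothetical labels: 1 = AI-generated, 0 = real (assumed encoding).
    detection = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
    # 0.7 is a placeholder explanation score, not a real result.
    print(final_score(detection, explanation_score=0.7))
```

The explanation score is passed in as a plain number because its metric is not defined here.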

* The model’s output must follow a specific template. Please refer to the provided [Sample Solution] for the output JSON structure.

4.3 Sample Output

AI Generated Image

Sample response:

{
  "aggregate_suspicion_score": 8,
  "overall_likelihood": "AI-Generated",
  "overall_confidence": 100,
  "per_criterion": [
    {
      "criterion": "Lighting & Shadows Consistency",
      "score": 1,
      "confidence": 80,
      "evidence": "The overall lighting is consistent, but the contact shadow where the watch body meets the wooden table is overly soft and lacks a defined edge, suggesting an imperfect simulation of physical contact."
    },
    {
      "criterion": "Edges & Boundaries",
      "score": 0,
      "confidence": 95,
      "evidence": "The edges and boundaries of the watch and surrounding objects appear clean and consistent with photographic depth of field, with no obvious haloing or slicing artifacts."
    },
    {
      "criterion": "Texture & Resolution",
      "score": 1,
      "confidence": 90,
      "evidence": "The wood grain texture on the table is overly smooth and uniform, lacking the natural micro-variations and imperfections of real wood. The entire image possesses a characteristic synthetic smoothness often seen in AI generations."
    },
    {
      "criterion": "Perspective & Spatial Relationships",
      "score": 1,
      "confidence": 75,
      "evidence": "The watch appears to be slightly floating rather than resting firmly on the table surface, an effect exaggerated by the weak contact shadow."
    },
    {
      "criterion": "Physical & Common Sense Logic",
      "score": 1,
      "confidence": 85,
      "evidence": "The reflection on the watch face is a generic, hazy light source that does not correspond to a specific environment, which is common in generated images lacking a coherent world model."
    },
    {
      "criterion": "Text & Symbols",
      "score": 2,
      "confidence": 100,
      "evidence": "The characters displayed on the watch screen are completely incoherent. The main 'time' display is a nonsensical jumble of glyphs, and the smaller icons are equally meaningless, a definitive red flag for AI generation."
    },
    {
      "criterion": "Human & Biological Structure Integrity",
      "score": 0,
      "confidence": 100,
      "evidence": "Not assessable as no human or biological subjects are present in the image."
    },
    {
      "criterion": "Material & Object Details",
      "score": 2,
      "confidence": 95,
      "evidence": "The watch strap appears to merge or fuse directly into the watch lug on the lower left side without a proper pin or attachment mechanism. The pen next to the notepad is overly simplified and lacks realistic details like a brand or seam."
    }
  ]
}
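
A response in the shape shown above can be checked mechanically before submission. The sketch below is one possible sanity check, not the official validator: the key names are taken from the sample, while the rule that aggregate_suspicion_score equals the sum of the per-criterion scores is only inferred from this one sample (1+0+1+1+1+2+0+2 = 8) and should be confirmed against the [Sample Solution]. The function name check_response is hypothetical.

```python
import json

REQUIRED_TOP = ("aggregate_suspicion_score", "overall_likelihood",
                "overall_confidence", "per_criterion")
REQUIRED_ITEM = ("criterion", "score", "confidence", "evidence")
NUM_CRITERIA = 8  # the eight criteria listed in the task description


def check_response(text):
    """Return a list of problems found in a JSON response string (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = [f"missing key: {key}" for key in REQUIRED_TOP if key not in data]
    items = data.get("per_criterion", [])
    if not isinstance(items, list):
        return problems + ["per_criterion must be a list"]
    if len(items) != NUM_CRITERIA:
        problems.append(f"expected {NUM_CRITERIA} criteria, got {len(items)}")

    total = 0
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"per_criterion[{i}] must be an object")
            continue
        missing = [key for key in REQUIRED_ITEM if key not in item]
        if missing:
            problems.append(f"per_criterion[{i}] missing keys: {missing}")
            continue
        if isinstance(item["score"], int):
            total += item["score"]
        else:
            problems.append(f"per_criterion[{i}] score must be an integer")

    # Assumption inferred from the single sample: aggregate == sum of scores.
    aggregate = data.get("aggregate_suspicion_score")
    if aggregate is not None and aggregate != total:
        problems.append(f"aggregate {aggregate} != sum of scores {total}")
    return problems
```

Calling check_response on the raw model output returns an empty list when the structure passes these checks. Criterion names are deliberately not compared against a fixed list here, because the spelling in the sample ("Physical & Common Sense Logic") differs slightly from the list in Section 4.1.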

____________________________________________________________________________________

5. Compile, Profile, Inference via AI Hub

Please refer to the provided [Sample Solution] for details on compiling, profiling, and running inference on a model via AI Hub.

Important! After the submission window closes, the top 5 teams on the leaderboard will be contacted to confirm which model is their final solution for evaluation on the full test set. In addition to the QNN model shared via AI Hub, the converted ONNX model and detailed evaluation scripts will also be requested.

____________________________________________________________________________________

6. References