ENACT Challenge
Evaluating Embodied Cognition with Egocentric Interaction World Modeling
We are organizing the ENACT Challenge, a benchmark designed to evaluate embodied cognition through egocentric interaction world modeling. Participants will be ranked by Pairwise Accuracy and Task Accuracy on a held-out test set.
Quick Links
- Challenge Dataset: huggingface.co/datasets/Inevitablevalor/ENACT-Challenge
- Slack (preferred): Join the ENACT Challenge Slack — we highly encourage participants to communicate here.
- Challenge Contact: qinengw@u.northwestern.edu
- Submission Portal: EvalAI — ENACT Challenge
Challenge Overview
Goal
Given egocentric observations of embodied interactions, predict the correct outcomes for world modeling tasks. Models will be evaluated on their ability to understand and reason about embodied cognition in interactive environments.
What You Do
- Train / fine-tune on the ENACT training set.
- Develop and validate on the ENACT validation set.
- Run inference on the held-out test set (to be released) and submit predictions via EvalAI.
Data Splits
| Split | File / Description |
|---|---|
| Train | ENACT_train.jsonl |
| Validation | ENACT_val.jsonl |
| Test (Held-out) | Final evaluation set (to be released) |
Available data: Dev and test splits are available at huggingface.co/datasets/Inevitablevalor/ENACT-Challenge
Format & loading: Please refer to the official instructions in the ENACT repository.
Evaluation
- Primary Metrics:
- Pairwise Accuracy: Measures the model's ability to correctly compare and rank pairs of interactions.
- Task Accuracy: Measures the model's ability to correctly predict task outcomes.
- Ranking: Teams are ranked by a weighted combination of Pairwise Accuracy and Task Accuracy.
- (Optional) We may additionally report accuracy by task category and interaction type for detailed analysis.
- Tie-break: Higher Task Accuracy, then earlier submission time.
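The authoritative metric definitions live in the official evaluation scripts in the ENACT repository. As an illustration only, a common way to score a predicted ordering against a reference ordering is the fraction of item pairs placed in the same relative order, with task accuracy as exact-match of the full ordering. The function names and this exact formulation are assumptions, not the official implementation:

```python
from itertools import combinations

def pairwise_accuracy(pred, gold):
    """Fraction of item pairs whose relative order in `pred`
    matches their relative order in `gold` (illustrative sketch)."""
    pos_p = {v: i for i, v in enumerate(pred)}
    pos_g = {v: i for i, v in enumerate(gold)}
    pairs = list(combinations(gold, 2))
    agree = sum((pos_p[a] < pos_p[b]) == (pos_g[a] < pos_g[b])
                for a, b in pairs)
    return agree / len(pairs)

def task_accuracy(preds, golds):
    """Fraction of samples whose predicted ordering is exactly correct."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)
```

For example, `pairwise_accuracy([1, 2, 3], [1, 3, 2])` is 2/3: two of the three item pairs agree in relative order. Always defer to the official scripts for the numbers that count toward the leaderboard.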
Challenge Leaderboard
Performance of submitted methods on the held-out test set.
Click on column headers to sort the results
| Rank ↕ | Team / Method ↕ | Overall ↕ | Pairwise Acc. ↕ | Task Acc. ↕ |
|---|---|---|---|---|
| - | Random Baseline | - | - | - |
| - | Challenge submissions coming soon... | - | - | - |
Submission
Submission File Format (JSONL)
Submit a single .jsonl file with one JSON object per line, containing:
- `id` (string) — the sample ID from the test set
- `answer` (string) — the predicted ordering as a stringified list of indices (e.g., "[3, 4, 2, 1]")
{"id": "enact_000001", "answer": "[3, 4, 2, 1]"}
{"id": "enact_000002", "answer": "[1, 2, 4, 3]"}

Requirements
- Provide exactly one prediction for each `id` in the test set.
- Duplicate IDs: only the last occurrence is kept; submissions with duplicates may be treated as invalid.
- Missing IDs: counted as incorrect; submissions with missing IDs may be treated as invalid.
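The requirements above can be satisfied with a short script that writes the predictions file and sanity-checks it for duplicate IDs and malformed answers before upload. This is a minimal sketch, not an official validator; the sample IDs, answers, and file name are illustrative placeholders:

```python
import json

# Illustrative predictions: sample ID -> predicted ordering of indices.
predictions = {
    "enact_000001": [3, 4, 2, 1],
    "enact_000002": [1, 2, 4, 3],
}

# Write one JSON object per line; "answer" must be a stringified
# list of indices, e.g. "[3, 4, 2, 1]".
with open("TeamName_MethodName.jsonl", "w") as f:
    for sample_id, order in predictions.items():
        f.write(json.dumps({"id": sample_id, "answer": str(order)}) + "\n")

# Sanity checks: unique IDs and parsable answers on every line.
seen = set()
with open("TeamName_MethodName.jsonl") as f:
    for line in f:
        obj = json.loads(line)
        assert obj["id"] not in seen, f"duplicate id: {obj['id']}"
        seen.add(obj["id"])
        assert isinstance(json.loads(obj["answer"]), list)
```

Checking that `seen` also covers every `id` in the released test set (exactly one prediction each) is left to the participant, since the test IDs are not available yet.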
How to Submit
1. Download the test set from Hugging Face.
2. Generate your predictions .jsonl file following the required format.
3. Name the file: TeamName_MethodName.jsonl
4. Submit your predictions on EvalAI.

Questions? Contact qinengw@u.northwestern.edu.
Rules
- External data / models / APIs: Open-source models and external data are allowed. Commercial API-only (closed-source) models are disallowed. Please disclose any external resources used in your method description.
- Human-in-the-loop labeling on test: Disallowed
- Participants must not attempt to obtain test labels or manipulate evaluation.
- Verification: Top teams may be asked to provide a brief method description and reproducibility details.
- Team size: No limit on team size, but each team may only submit under one team name.
Baselines & Starter Kit
Baselines, data loaders, and evaluation scripts are available in the official ENACT repository:
github.com/mll-lab-nu/ENACT

Getting Started: Check out our baseline implementations and starter code to quickly get up and running with the ENACT dataset.
Contact
For questions, please reach out via:
- Slack (preferred): the ENACT Challenge Slack
- Email: qinengw@u.northwestern.edu