ENACT Challenge
Evaluating Embodied Cognition with Egocentric Interaction World Modeling
We are organizing the ENACT Challenge, a benchmark designed to evaluate embodied cognition through egocentric interaction world modeling. Participants will be ranked by Pairwise Accuracy and Task Accuracy on a held-out test set.
Quick Links
- Challenge Dataset: huggingface.co/datasets/Inevitablevalor/ENACT-Challenge
- Slack (preferred): Join the ENACT Challenge Slack — we highly encourage participants to communicate here.
- Challenge Contact: qinengw@u.northwestern.edu
- Submission Portal: EvalAI — ENACT Challenge
Challenge Overview
Goal
Given egocentric observations of embodied interactions, predict the correct outcomes for world modeling tasks. Models will be evaluated on their ability to understand and reason about embodied cognition in interactive environments.
What You Do
- Train / fine-tune on the ENACT training set.
- Develop and validate on the ENACT validation set.
- Run inference on the held-out test set (to be released) and submit predictions via EvalAI.
Data Splits
| Split | File / Description |
|---|---|
| Train | ENACT_train.jsonl |
| Validation | ENACT_val.jsonl |
| Test (Held-out) | Final evaluation set (to be released) |
Available data: Dev and test splits are available at huggingface.co/datasets/Inevitablevalor/ENACT-Challenge
Format & loading: Please refer to the official instructions in the ENACT repository.
Evaluation
- Primary Metrics:
- Pairwise Accuracy: Measures the model's ability to correctly compare and rank pairs of interactions.
- Task Accuracy: Measures the model's ability to correctly predict task outcomes.
- Ranking: Teams are ranked by a weighted combination of Pairwise Accuracy and Task Accuracy.
- (Optional) We may additionally report accuracy by task category and interaction type for detailed analysis.
- Tie-break: Higher Task Accuracy, then earlier submission time.
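The authoritative metric definitions live in the official evaluation scripts in the ENACT repository. As an illustration only, a common way to score a predicted ordering against a reference ordering is the fraction of item pairs placed in the same relative order, with task accuracy as exact-match of the full ordering. The function names and this exact formulation are assumptions, not the official implementation:

```python
from itertools import combinations

def pairwise_accuracy(pred, gold):
    """Fraction of item pairs whose relative order in `pred`
    matches their relative order in `gold` (illustrative sketch)."""
    pos_p = {v: i for i, v in enumerate(pred)}
    pos_g = {v: i for i, v in enumerate(gold)}
    pairs = list(combinations(gold, 2))
    agree = sum((pos_p[a] < pos_p[b]) == (pos_g[a] < pos_g[b])
                for a, b in pairs)
    return agree / len(pairs)

def task_accuracy(preds, golds):
    """Fraction of samples whose predicted ordering is exactly correct."""
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)
```

For example, `pairwise_accuracy([1, 2, 3], [1, 3, 2])` is 2/3: two of the three item pairs agree in relative order. Always defer to the official scripts for the numbers that count toward the leaderboard.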
Challenge Leaderboard
Performance of submitted methods on the held-out test set.
Click on column headers to sort the results
| Rank ↕ | Team / Method ↕ | Overall ↕ | Pairwise Acc. ↕ | Task Acc. ↕ |
|---|---|---|---|---|
| - | Random Baseline | - | - | - |
| - | Challenge submissions coming soon... | - | - | - |
Submission
Submission File Format (JSONL)
Submit a single .jsonl file with one JSON object per line, containing:
- `id` (string) — the sample ID from the test set
- `answer` (string) — the predicted ordering as a stringified list of indices (e.g., "[3, 4, 2, 1]")
{"id": "enact_000001", "answer": "[3, 4, 2, 1]"}
{"id": "enact_000002", "answer": "[1, 2, 4, 3]"}

Requirements
- Provide exactly one prediction for each `id` in the test set.
- Duplicate IDs: only the last occurrence is kept; submissions with duplicates may be treated as invalid.
- Missing IDs: counted as incorrect; submissions with missing IDs may be treated as invalid.
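The requirements above can be satisfied with a short script that writes the predictions file and sanity-checks it for duplicate IDs and malformed answers before upload. This is a minimal sketch, not an official validator; the sample IDs, answers, and file name are illustrative placeholders:

```python
import json

# Illustrative predictions: sample ID -> predicted ordering of indices.
predictions = {
    "enact_000001": [3, 4, 2, 1],
    "enact_000002": [1, 2, 4, 3],
}

# Write one JSON object per line; "answer" must be a stringified
# list of indices, e.g. "[3, 4, 2, 1]".
with open("TeamName_MethodName.jsonl", "w") as f:
    for sample_id, order in predictions.items():
        f.write(json.dumps({"id": sample_id, "answer": str(order)}) + "\n")

# Sanity checks: unique IDs and parsable answers on every line.
seen = set()
with open("TeamName_MethodName.jsonl") as f:
    for line in f:
        obj = json.loads(line)
        assert obj["id"] not in seen, f"duplicate id: {obj['id']}"
        seen.add(obj["id"])
        assert isinstance(json.loads(obj["answer"]), list)
```

Checking that `seen` also covers every `id` in the released test set (exactly one prediction each) is left to the participant, since the test IDs are not available yet.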
How to Submit
1. Download the test set from Hugging Face.
2. Generate your predictions .jsonl file following the required format.
3. Name the file: TeamName_MethodName.jsonl
4. Submit your predictions on EvalAI.

Questions? Contact qinengw@u.northwestern.edu.
Rules
- External data / models / APIs: Open-source models and external data are allowed. Commercial API-only (closed-source) models are disallowed. Please disclose any external resources used in your method description.
- Human-in-the-loop labeling on test: Disallowed
- Participants must not attempt to obtain test labels or manipulate evaluation.
- Verification: Top teams may be asked to provide a brief method description and reproducibility details.
- Team size: No limit on team size, but each team may only submit under one team name.
Baselines & Starter Kit
Baselines, data loaders, and evaluation scripts are available in the official ENACT repository:
github.com/mll-lab-nu/ENACT

Getting Started: Check out our baseline implementations and starter code to quickly get up and running with the ENACT dataset.
Contact
For questions, please reach out via:
- Slack (preferred): the ENACT Challenge Slack
- Email: qinengw@u.northwestern.edu