AI vs. Human – Academic Essay Authenticity Challenge task at DAIGenC, collocated with COLING-2025
The aim of this task is to detect whether academic essays were authored by AI or by humans. Please see the Task Description below.
Table of contents:
- Recent Updates
- Contents of the Directory
- Task Description
- Dataset
- Scorer and Official Evaluation Metrics
- Baselines
- Format checker
- Submission
- Licensing
- Contact
- Organizers
Recent Updates
- [15/08/2024] Training and dev data released
Contents of the Directory
- Main folder: baselines
  Contains the scripts provided for the baseline models of the task.
- Main folder: example_scripts
  Contains an example script to run a DistilBERT model for the task.
- Main folder: format_checker
  Contains the scripts provided to check the format of the submission file.
- Main folder: scorer
  Contains the scripts provided to score the output of a model when labels are available (i.e., dev).
- README.md
  This file!
Task Description
The objective is to identify machine-generated essays to safeguard academic integrity and prevent the misuse of large language models in educational settings. The input to the system is an essay; the data include texts authored by both native and non-native speakers, as well as essays generated by various large language models.
The task is defined as follows: "Given an essay, identify whether it is generated by a machine or authored by a human." This is a binary classification task and is offered in English and Arabic.
Dataset
The dataset consists of essays written by humans and essays generated by AI. The human-authored essays were curated from the ETS Corpus of Non-Native Written English. For the AI-generated essays, we used seven different open and closed models: GPT-3.5-Turbo, GPT-4o, GPT-4o-mini, Gemini-1.5, Llama-3.1 (8B), Phi-3.5-mini, and Claude-3.5.
Due to restrictions on distributing the human-authored data, we kindly ask you to fill in the consent form below. Once we receive your consent, we will send you the data as soon as possible.
How to obtain data:
Please fill in the data sharing consent form below.
Data sharing consent form
Input data format
Each file uses the JSONL format. A line within the file adheres to the following JSON structure:
{"id": "b8edfe1634c0d601af8cfa3709d04311c464fcfa0fdbe7bf405c57701c141b3b_20544", "essay": "Essay text", "label": "ai"}
Where:
- id: the id of the text
- essay: the essay text, written by a human or generated by an AI model
- label: either ai or human
Example
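As an illustration, here is a minimal Python sketch for reading the input file (the file path follows the baseline example later in this README; adjust it to your setup):

import json

# Each line is a JSON object with "id", "essay", and "label" ("ai" or "human").
with open("data/academic_essay_train.jsonl", encoding="utf-8") as f:
    examples = [json.loads(line) for line in f]

print(examples[0]["id"], examples[0]["label"])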
Scorer and Official Evaluation Metrics
Scorers
The scorer for the task is located in the scorer module of the project. The scorer reports the official evaluation metric and other metrics for a given prediction file. It invokes the format checker for the task to verify that the output is properly formatted, and it also checks that the prediction file covers all essays in the gold file.
You can install all prerequisites with:
pip install -r requirements.txt
Launch the scorer for the task as follows:
python scorer/task.py --gold_file_path=<path_gold_file> --pred_files_path=<predictions_file>
Example
python scorer/task.py --pred_files_path task_dev_output.txt --gold_file_path data/academic_essay_dev.jsonl
Official Evaluation Metrics
The official evaluation metric for the task is macro-F1. However, the scorer also reports accuracy, precision and recall.
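For reference, a minimal scikit-learn sketch of how these metrics can be computed (the gold and predicted label lists below are hypothetical; the official scorer remains scorer/task.py):

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

gold = ["ai", "human", "human", "ai"]  # hypothetical gold labels
pred = ["ai", "human", "ai", "ai"]     # hypothetical predictions

print("macro-F1: ", f1_score(gold, pred, average="macro"))
print("accuracy: ", accuracy_score(gold, pred))
print("precision:", precision_score(gold, pred, average="macro"))
print("recall:   ", recall_score(gold, pred, average="macro"))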
Baselines
The baselines module currently contains a majority-class baseline, a random baseline, and a simple n-gram baseline.
Example
python baselines/task.py -t data/academic_essay_train.jsonl -d data/academic_essay_dev_test.jsonl
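For illustration, the following sketch shows what a simple n-gram baseline can look like (a TF-IDF plus logistic regression pipeline under our own assumptions; this is not the exact implementation in baselines/task.py):

import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def load(path):
    # Returns the essays and their gold labels from a JSONL file.
    with open(path, encoding="utf-8") as f:
        rows = [json.loads(line) for line in f]
    return [r["essay"] for r in rows], [r["label"] for r in rows]

train_x, train_y = load("data/academic_essay_train.jsonl")
dev_x, dev_y = load("data/academic_essay_dev_test.jsonl")

# Word uni- and bigram TF-IDF features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                      LogisticRegression(max_iter=1000))
model.fit(train_x, train_y)
print("dev accuracy:", model.score(dev_x, dev_y))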
Format checker
The format checker for the task is located in the format_checker module of the project. It verifies that your generated results file complies with the expected format.
Before running the format checker, please install all prerequisites:
pip install -r requirements.txt
To launch it, please run the following command:
python format_checker/task.py -p results_files
Example
python format_checker/task.py -p ./task.txt
Here, results_files can be a single path or a space-separated list of paths.
Note that the checker cannot verify whether the prediction file you submit contains all lines, because it does not have access to the corresponding gold file.
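For illustration, a minimal sketch of the kind of checks such a script performs (this is not the actual format_checker code; it assumes a two-column TSV with no header row):

import sys

def check(path):
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2:
                print(f"{path}:{i}: expected two tab-separated columns")
                return False
            _id, label = parts
            if label not in {"ai", "human"}:
                print(f"{path}:{i}: label must be 'ai' or 'human'")
                return False
    return True

if __name__ == "__main__":
    # One or more prediction files can be passed on the command line.
    sys.exit(0 if all(check(p) for p in sys.argv[1:]) else 1)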
Submission
Guidelines
Evaluation consists of two phases:
- Development phase: This phase involves working on the dev-test set.
- Evaluation phase: This phase involves working on the test set, which will be released during the evaluation cycle.
For each phase, please adhere to the following guidelines:
- We request each team to establish and manage a single account for all submissions; all runs should be submitted through the same account. Submissions made from multiple accounts by the same team may lead to your system being excluded from the final ranking in the overview paper.
- The most recently uploaded file on the leaderboard will serve as your final submission.
- Adhere strictly to the naming convention for the output file, which must be labeled as 'task.tsv'. Deviation from this standard could trigger an error on the leaderboard.
- Submission protocol requires you to compress the '.tsv' file into a '.zip' file (for instance, zip task.zip task.tsv) and submit it through the Codalab page.
- With each submission, ensure to include your team name along with a brief explanation of your methodology.
- Each team is allowed a maximum of 50 submissions per day for the given task. Please adhere to this limit.
Submission Format
The submission file format is TSV (tab-separated values). A row within the file adheres to the following structure:
id <TAB> label
Where:
- id: the id of the text
- label: either ai or human
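As a sketch, predictions can be written in this layout as follows (the ids and labels below are hypothetical):

# Hypothetical predictions mapping essay id -> "ai" or "human".
predictions = {"example_id_1": "ai", "example_id_2": "human"}

with open("task.tsv", "w", encoding="utf-8") as f:
    for essay_id, label in predictions.items():
        f.write(f"{essay_id}\t{label}\n")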
Submission Site
Please use the link below to submit your system.
Licensing
The ETS essays can be used only for this shared task. Please accept the license agreement.
Contact
Slack Channel: join
Email: genai-content-detection@googlegroups.com
Organizers
- Shammur Absar Chowdhury, Qatar Computing Research Institute, Qatar
- Hind AL-Merekhi, Qatar Computing Research Institute, Qatar
- Muhammad Tasnim Mohiuddin, Qatar Computing Research Institute, Qatar
- Mucahid Kutlu, Qatar University, Qatar
- George Mikros, Hamad Bin Khalifa University, Qatar
- Firoj Alam, Qatar Computing Research Institute, Qatar