AI vs. Human – Academic Essay Authenticity Challenge task at DAIGenC, co-located with COLING-2025

The aim of this task is to detect whether academic essays are AI-generated or human-authored. Please see the Task Description below.

Table of contents:

• Recent Updates
• Contents of the Directory
• Task Description
• Dataset
• Input data format
• Scorer and Official Evaluation Metrics
• Baselines
• Format checker
• Submission
• Licensing
• Contact
• Organizers

    Recent Updates

    • [15/08/2024] Training and dev data released

    Contents of the Directory

    • Main folder: baselines
      Contains scripts provided for baseline models of the task.

    • Main folder: example_scripts
Contains an example script provided to run a DistilBERT model for the task.

    • Main folder: format_checker
      Contains scripts provided to check the format of the submission file.

    • Main folder: scorer
Contains scripts provided to score the output of the model when gold labels are available (i.e., on the dev set).

    • README.md
      This file!

    Task Description

The objective is to identify machine-generated essays in order to safeguard academic integrity and prevent the misuse of large language models in educational settings. The input to the system is a collection of essays, including texts authored by both native and non-native speakers as well as essays generated by various large language models.

    The task is defined as follows: "Given an essay, identify whether it is generated by a machine or authored by a human." This is a binary classification task and is offered in English and Arabic.

    Dataset

The dataset consists of essays written by humans and essays generated by AI. The human-authored essays have been curated from the ETS Corpus of Non-Native Written English. For the AI-generated text, we used seven different open and closed models: GPT-3.5-Turbo, GPT-4o, GPT-4o-mini, Gemini-1.5, Llama-3.1 (8B), Phi-3.5-mini, and Claude-3.5.

Due to restrictions on distributing the human-authored data, we kindly request that you fill out the form below. Once we receive your consent, we will send you the data as soon as possible.

How to obtain the data:

Please fill out the data sharing consent form below. We will send the data as soon as possible.
    Data sharing consent form

    Input data format

    Each file uses the JSONL format. A line within the file adheres to the following JSON structure:

    {"id": "b8edfe1634c0d601af8cfa3709d04311c464fcfa0fdbe7bf405c57701c141b3b_20544", "essay": "Essay text", "label": "ai"}

    Where:

• id: the id of the text
• essay: the essay text, written by a human or generated by AI
• label: either ai or human
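
For reference, here is a minimal Python sketch for loading these JSONL files (the train-file path matches the baseline example below; adjust it to your local copy):

import json

def load_jsonl(path):
    # One JSON object per line, as in the task's input files.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

examples = load_jsonl("data/academic_essay_train.jsonl")
print(examples[0]["id"], examples[0]["label"])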

    Scorer and Official Evaluation Metrics

    Scorers

The scorer for the task is located in the scorer module of the project. The scorer reports the official evaluation metric and other metrics for a prediction file. It invokes the format checker to verify that the output is properly formatted, and it also checks that the provided predictions file contains predictions for all essays in the gold file.

You can install all prerequisites with:

    pip install -r requirements.txt

    Launch the scorer for the task as follows:

python scorer/task.py --gold-file-path=<path_gold_file> --pred-file-path=<predictions_file>

Example:

python scorer/task.py --pred-file-path=task_dev_output.txt --gold-file-path=data/academic_essay_dev.jsonl

    Official Evaluation Metrics

The official evaluation metric for the task is macro-F1. In addition, the scorer reports accuracy, precision, and recall.
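
The official metric corresponds to macro-averaged F1 as computed, for example, by scikit-learn. A toy sketch with made-up labels (the official scorer in scorer/task.py remains authoritative):

from sklearn.metrics import accuracy_score, f1_score

gold = ["ai", "human", "ai", "human"]    # illustrative labels only
pred = ["ai", "human", "human", "human"]

print("macro-F1:", f1_score(gold, pred, average="macro"))
print("accuracy:", accuracy_score(gold, pred))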

    Baselines

The baselines module currently contains a majority-class baseline, a random baseline, and a simple n-gram baseline.

    Example

    python baselines/task.py -t data/academic_essay_train.jsonl -d data/academic_essay_dev_test.jsonl
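
For orientation, here is a sketch of one way a simple n-gram baseline can be built, using TF-IDF features and logistic regression. This is an illustration under the data format above, not necessarily what baselines/task.py implements:

import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

def load_jsonl(path):
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

train = load_jsonl("data/academic_essay_train.jsonl")
dev = load_jsonl("data/academic_essay_dev_test.jsonl")

# Word uni- and bi-gram TF-IDF features feeding a linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=50000),
    LogisticRegression(max_iter=1000),
)
model.fit([x["essay"] for x in train], [x["label"] for x in train])
pred = model.predict([x["essay"] for x in dev])
print("macro-F1:", f1_score([x["label"] for x in dev], pred, average="macro"))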

    Format checker

    The format checkers for the task are located in the format_checker module of the project. The format checker verifies that your generated results file complies with the expected format.

Before running the format checker, please install all prerequisites:

    pip install -r requirements.txt

    To launch it, please run the following command:

python format_checker/task.py -p <results_files>

Example:

python format_checker/task.py -p ./task.txt

results_files: one path or a space-separated list of paths

    Note that the checker cannot verify whether the prediction file you submit contains all lines, because it does not have access to the corresponding gold file.
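
If you want a quick local sanity check before invoking the official checker, a minimal stand-alone sketch that mirrors the expected TSV layout (id and label separated by a tab; see Submission Format below) could look like this:

def check_format(path):
    # Expects one "id<TAB>label" row per line, with label in {ai, human}.
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            parts = line.rstrip("\n").split("\t")
            if len(parts) != 2 or parts[1] not in {"ai", "human"}:
                return "line %d: expected id<TAB>label with label ai or human" % i
    return "ok"

print(check_format("task.tsv"))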

    Submission

    Guidelines

    Evaluation consists of two phases:

    1. Development phase: This phase involves working on the dev-test set.
    2. Evaluation phase: This phase involves working on the test set, which will be released during the evaluation cycle.

    For each phase, please adhere to the following guidelines:

• We request each team to establish and manage a single account for all submissions. Hence, all runs should be submitted through the same account. Any submissions made from multiple accounts by the same team may lead to your system being excluded from the final ranking in the overview paper.
    • The most recently uploaded file on the leaderboard will serve as your final submission.
    • Adhere strictly to the naming convention for the output file, which must be labeled as 'task.tsv'. Deviation from this standard could trigger an error on the leaderboard.
    • Submission protocol requires you to compress the '.tsv' file into a '.zip' file (for instance, zip task.zip task.tsv) and submit it through the Codalab page.
    • With each submission, ensure to include your team name along with a brief explanation of your methodology.
    • Each team is allowed a maximum of 50 submissions per day for the given task. Please adhere to this limit.

    Submission Format

The submission file format is TSV (tab-separated values). A row within the TSV file adheres to the following structure:

    id  label

    Where:

• id: the id of the text
    • label: either ai or human
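
Below is a minimal sketch for writing the submission file and compressing it as described in the guidelines. The prediction pairs are placeholders, and whether a header row is expected is best confirmed with the format checker:

import csv
import zipfile

predictions = [("example_id_1", "ai"), ("example_id_2", "human")]  # placeholders

with open("task.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    for essay_id, label in predictions:
        writer.writerow([essay_id, label])

# Mirrors the shell command from the guidelines: zip task.zip task.tsv
with zipfile.ZipFile("task.zip", "w", zipfile.ZIP_DEFLATED) as z:
    z.write("task.tsv")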

    Submission Site

    Please use the link below to submit your system.

Codalab submission system

    Licensing

The ETS essays can be used only for the shared task. Please accept the license agreement.

    Contact

    Slack Channel: join
    Email: genai-content-detection@googlegroups.com


    Organizers