Mutable resources are problematic for daily production evaluation and local evaluation
Problem to solve
Originally raised in https://gitlab.slack.com/archives/C05CJ1T3P0W/p1709605515696909
dev-ai-research-0e2f8974.duo_chat.chat_dataset_2_v1 currently refers directly to the URLs of resources on gitlab.com, e.g. https://gitlab.com/gitlab-org/gitlab/-/issues/371038. However, when I want to clone the seeding data (i.e. issues-371038) into my local GDK, I need to export and import the entire gitlab-org group, which takes a long time because of the tons of resources (mostly unrelated to evaluation) under the group/project.
In addition, we need to reconcile it periodically to make sure that the local evaluation environment is the same as the production evaluation environment. For example, when issue-371038 is updated on gitlab.com, the local evaluation may produce a different result because its copy of the resource is outdated. In fact, gitlab-org&10814 is used in daily production evaluation, but the copy of it in the local dataset is outdated. See https://gitlab.slack.com/archives/C05CJ1T3P0W/p1709606238876829?thread_ts=1709605515.696909&cid=C05CJ1T3P0W for more information.
This also means that the daily evaluation score can be affected by these resource modifications, e.g. gitlab-org&10814 was updated 5 hours ago, which will more or less impact the eval result.
We should lock down and version these resources to consistently reproduce the evaluation score.
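One way to notice when a resource has changed since it was pinned is to store a content fingerprint next to each dataset row and compare it before an evaluation run. The following is only a minimal sketch: the function names and the choice of fields passed in (for example title and description) are assumptions for illustration, not part of the existing dataset or tooling.

```python
import hashlib
import json


def resource_fingerprint(resource: dict) -> str:
    """Return a stable SHA-256 hex digest of the fields that feed the evaluation."""
    # Sorted keys and fixed separators make equal content hash identically,
    # regardless of the order in which the fields were fetched.
    canonical = json.dumps(
        resource, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_drifted(pinned_fingerprint: str, current_resource: dict) -> bool:
    """True when the live resource no longer matches the pinned snapshot."""
    return resource_fingerprint(current_resource) != pinned_fingerprint
```

A mismatch would signal that either the local copy or the production copy needs to be reconciled before scores are compared.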
Proposal
- Create a new group https://gitlab.com/gitlab-ai-evaluation. This is the top-level group. Do NOT put any resources at this level.
- Create a new subgroup https://gitlab.com/gitlab-ai-evaluation/duo-chat. Put group-level resources, such as Epics, for Duo Chat evaluations.
- Create a new project https://gitlab.com/gitlab-ai-evaluation/duo-chat/test. Put project-level resources, such as Issues and Code, for Duo Chat evaluations.
- Update the resource_url column of dev-ai-research-0e2f8974.duo_chat.chat_dataset_2_v1 to point to the new location.
To illustrate:
gitlab-ai-evaluation (Group) / ... Top-level group. Do NOT put any resources at this level.
  duo-chat (Sub-Group) / ... Put Epics for Duo Chat evaluations
    test (Project) ... Put Issues and Code for Duo Chat evaluations
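The last step of the proposal, updating resource_url, could be sketched as a mapping-driven rewrite. This is a hedged illustration only: `remap_resource_urls` and the `url_map` argument are hypothetical, and the new URLs are not known until the resources have actually been recreated under the new group, so the mapping has to be supplied by whoever performs the migration.

```python
def remap_resource_urls(rows: list[dict], url_map: dict[str, str]) -> list[dict]:
    """Return copies of rows with resource_url rewritten to the new location.

    Raises KeyError if any row points at a URL that has no entry in url_map,
    so that a partially migrated dataset is never produced silently.
    """
    missing = sorted({row["resource_url"] for row in rows} - set(url_map))
    if missing:
        raise KeyError(f"no new location for: {missing}")
    # Build new dicts instead of mutating the input rows.
    return [{**row, "resource_url": url_map[row["resource_url"]]} for row in rows]
```

Failing on unmapped URLs is a deliberate choice here: a row left pointing at the old location would keep the mutability problem described above.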
NOTE:
- Do NOT use the https://gitlab.com/gitlab-org group. This contains tons of group-level resources that are unrelated to Duo Chat evaluation.
- Do NOT use the https://gitlab.com/gitlab-org/gitlab project. This contains tons of project-level resources that are unrelated to Duo Chat evaluation.
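The rules above could be checked mechanically against every resource_url value in the dataset. The sketch below is an assumption-laden illustration: `check_resource_url` and its return labels are hypothetical, and it assumes the usual gitlab.com URL shapes, where the namespace path is separated from the resource by "/-/" and group-level resources such as epics live under a "groups/" prefix.

```python
from urllib.parse import urlparse

EVAL_ROOT = "gitlab-ai-evaluation"  # top-level group named in the proposal


def check_resource_url(url: str) -> str:
    """Classify a resource URL against the proposed group layout."""
    parsed = urlparse(url)
    if parsed.netloc != "gitlab.com":
        return "outside-gitlab.com"
    # Everything before "/-/" is the group or project path.
    namespace = parsed.path.strip("/").partition("/-/")[0]
    # Group-level resources (e.g. epics) are served under /groups/<path>.
    if namespace.startswith("groups/"):
        namespace = namespace[len("groups/"):]
    if namespace == EVAL_ROOT:
        return "top-level-group"  # no resources are allowed at this level
    if namespace.startswith(EVAL_ROOT + "/"):
        return "ok"
    return "outside-evaluation-group"
```

For example, the issue URL currently in the dataset, https://gitlab.com/gitlab-org/gitlab/-/issues/371038, would be classified as "outside-evaluation-group".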