Issue to summarize the results of the publishing review meeting
Problem to solve
- Security problems in the publishing process (the .mlreef.yml file lives in the user's repo, exposing URLs, credentials, etc.).
- How to support GPU and CPU execution (Hima's solution for GPU: a new base image).
- How to manage versions and publishing triggers (latest/branches/tags, manual/automatic).
- How to store the versions of a published data processor efficiently (the current overwriting behavior must change).
User experience goal
- The user decides when to publish, manually, and chooses the branch. Automatic publishing will not be allowed because it consumes too many resources.
- Versions will be saved as Docker layers, which saves disk space and keeps all older versions available.
- The .mlreef.yml file that is currently added to the repo when publishing should be private (for MLReef's internal use only). The user should not be able to see or modify its contents, to prevent security issues and commit conflicts. (Possible approach: trigger the pipeline through the API.)
- The user should be able to cancel publishing after it has started. To do this, we will use the commit hash to find the pipeline (Herman will check how to solve this).
- The base images we support for now will be kept in a backend folder.
- We now have to support two types of experiment image, CPU and GPU. The backend will decide which one to use according to the machine type of the model or data operation being executed.
- To ensure Nautilus compatibility, CPU must be the default setting for the experiment image.
- From now on we will have two types of runners, GPU and CPU. We need to decide how this changes the current workflow. (Pending discussion: Hima will explain how the backend can be informed about the number of GPU runners available.)
  - Nautilus can have a variable that defines the number of GPUs available.
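The image-selection rule above (CPU by default, GPU only for GPU operations) could look roughly like the sketch below. The image names are hypothetical placeholders for whatever lives in the backend folder, and falling back to CPU when no GPU runner is available is an assumption, since that workflow decision is still pending.

```python
from enum import Enum
from typing import Optional


class MachineType(Enum):
    CPU = "CPU"
    GPU = "GPU"


# Hypothetical image names; the real base images are kept in a backend folder.
BASE_IMAGES = {
    MachineType.CPU: "mlreef/experiment-base:cpu",
    MachineType.GPU: "mlreef/experiment-base:gpu",
}


def select_experiment_image(machine_type: Optional[MachineType] = None,
                            gpu_runners_available: int = 0) -> str:
    """Pick the experiment base image for a model or data operation.

    CPU is the default (Nautilus compatibility). The GPU image is returned
    only when the operation requests GPU and at least one GPU runner exists;
    otherwise this sketch ASSUMES a fallback to the CPU image.
    """
    if machine_type is MachineType.GPU and gpu_runners_available > 0:
        return BASE_IMAGES[MachineType.GPU]
    return BASE_IMAGES[MachineType.CPU]
```

The `gpu_runners_available` argument stands in for the Nautilus variable mentioned above; how the backend actually reads it is still open.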
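For cancelling a publication by commit hash, one option is to look up pipelines by SHA and cancel the ones still running. The sketch below assumes the publishing pipelines are GitLab CI pipelines and only builds the request URLs for GitLab's "list project pipelines" (filtered by `sha`) and "cancel a pipeline" endpoints; the base URL, project ID, and the set of statuses treated as cancellable are illustrative assumptions.

```python
from urllib.parse import quote, urlencode

# ASSUMPTION: pipeline statuses that are still in progress and worth cancelling.
CANCELLABLE_STATUSES = {"created", "waiting_for_resource", "preparing", "pending", "running"}


def _project_path(project_id) -> str:
    # GitLab accepts a numeric ID or a URL-encoded "namespace/project" path.
    return quote(str(project_id), safe="")


def find_pipelines_url(api_base: str, project_id, commit_sha: str) -> str:
    """URL for GET /projects/:id/pipelines filtered by the commit SHA."""
    query = urlencode({"sha": commit_sha})
    return f"{api_base}/projects/{_project_path(project_id)}/pipelines?{query}"


def cancel_pipeline_url(api_base: str, project_id, pipeline_id: int) -> str:
    """URL for POST /projects/:id/pipelines/:pipeline_id/cancel."""
    return f"{api_base}/projects/{_project_path(project_id)}/pipelines/{pipeline_id}/cancel"


def pick_cancellable(pipelines: list) -> list:
    """Given the JSON list returned by the lookup, return IDs still in progress."""
    return [p["id"] for p in pipelines if p.get("status") in CANCELLABLE_STATUSES]
```

An HTTP client would GET the first URL, pass the response to `pick_cancellable`, and POST to the cancel URL for each returned ID.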
Versioning
@si-ge-st: here is the versioning description.
- The user provides a tag when publishing a new version from master.
- MLReef will keep at most the 10 latest versions of master (so several versions are available to the public).
- Other branches can also be published, but each push overwrites the branch image (one version per branch, visible only to the user who published it; master is the exception).
- The tag for a new image will be pre-filled with a default value that the user can modify (open question: what should the default value be?).
- The tag provided by the user should also be applied to the commit, so the user can recover both the commit and its matching image.
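The retention rules above (keep the 10 latest master versions, one overwritable image per branch) can be sketched as a small bookkeeping class. This is a minimal illustration, assuming tags are tracked in memory in publish order; `PublishedImages` and `publish` are hypothetical names, and the per-user visibility of branch images is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_MASTER_VERSIONS = 10  # keep at most the 10 latest master versions


@dataclass
class PublishedImages:
    master: List[str] = field(default_factory=list)         # tags, oldest first
    branches: Dict[str, str] = field(default_factory=dict)  # branch -> single tag

    def publish(self, branch: str, tag: str) -> List[str]:
        """Record a published image and return the tags that can now be deleted."""
        if branch == "master":
            self.master.append(tag)
            evicted = self.master[:-MAX_MASTER_VERSIONS]
            self.master = self.master[-MAX_MASTER_VERSIONS:]
            return evicted
        # Non-master branches keep one image; a new push replaces the old one.
        previous = self.branches.get(branch)
        self.branches[branch] = tag
        return [previous] if previous is not None and previous != tag else []
```

Because versions are stored as Docker layers, deleting an evicted tag only frees the layers that no remaining version shares.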
The user can modify only the following, through decorators:
- data processor: command, description
- parameters
Attributes that are immutable and belong to the code project:
- visibility
- author
- input_type
- output_type
- ModelType
- MLCategory
@erika.torres please sort these params. @si-ge-st: it's ready.
- Environment: when publishing manually, the user can change the environment.
- Main script path: when publishing manually, the user has to choose the entry point every time.
Please also remember the metadata we don't have yet but need to implement for publish/republish, e.g. possibly the CPU/GPU type. Note: this information comes with the base environment.
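The split between decorator-editable and immutable attributes could be enforced with a check like the one below. It is a sketch only: the snake_case keys (for example `model_type` and `ml_category` standing in for ModelType and MLCategory) are hypothetical, and environment and main script path are treated as publish-time choices that decorators cannot change.

```python
# Editable through decorators.
EDITABLE = {"command", "description", "parameters"}
# Immutable attributes owned by the code project (hypothetical key names).
IMMUTABLE = {"visibility", "author", "input_type", "output_type", "model_type", "ml_category"}


def validate_decorator_changes(current: dict, proposed: dict) -> set:
    """Return the set of changed keys if all are decorator-editable; raise otherwise."""
    changed = {k for k, v in proposed.items() if current.get(k) != v}
    immutable_hits = changed & IMMUTABLE
    if immutable_hits:
        raise ValueError(f"immutable attributes cannot be changed: {sorted(immutable_hits)}")
    # Anything else (e.g. environment, main script path) is not a decorator field.
    not_editable = changed - EDITABLE
    if not_editable:
        raise ValueError(f"not editable through decorators: {sorted(not_editable)}")
    return changed
```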
Proposal for Technical Solution
Permissions and Security
Documentation
Major/Significant changes for FE
/api/v1/data-processors
- Processors currently cannot be searched by input and output types, because these attributes belong to the code project; use the Marketplace search instead.
Availability, Testing & Test Cases
What does success look like, and how can we measure that?
Additional Notes
What is the type of buyer?
Is this a cross-stage feature?
Links / references
Edited by German Sidorenko