Inference Server MVC
What is this MR?
This MR adds a simple FastAPI endpoint that returns suggested code from a model (our fine-tuned model or the original CodeGen).
Currently the endpoint can be called with curl. Log in to the GCE VM via SSH, then run:

```shell
curl -d @request.json -H 'Content-Type: application/json' http://localhost:8000/
```
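The same call can also be scripted from Python. Below is a minimal sketch using only the standard library; the URL and headers mirror the curl example, and `build_request` is a helper name introduced here purely for illustration:

```python
import json
import urllib.request

# Hypothetical helper: wraps a JSON payload in a POST request against the
# locally running inference server (same endpoint the curl example hits).
def build_request(payload: dict, url: str = "http://localhost:8000/") -> urllib.request.Request:
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # Reuse the same request.json file the curl example reads.
    with open("request.json") as f:
        payload = json.load(f)
    with urllib.request.urlopen(build_request(payload)) as resp:
        print(resp.read().decode("utf-8"))
```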
An example request.json looks like this:
```json
{
  "prompt": {
    "content": "\nclass AboutArrays < Neo::Koan\ndef test_creating_arrays\n empty_array = Array.new\n assert_equal __(Array), empty_array.class\n haha this line is not ruby"
  },
  "model_config": {
    "model_name": "Salesforce/codegen-350M-multi",
    "state_dict_path": "model.pth",
    "max_new_tokens": 30,
    "tokenizer_name": "Salesforce/codegen-350M-multi"
  }
}
```
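On the server side, the endpoint has to deserialize this payload. A minimal sketch of the request schema using stdlib dataclasses (an assumption for illustration; the actual FastAPI endpoint presumably uses pydantic models, but the field names below come straight from the example above):

```python
import json
from dataclasses import dataclass

@dataclass
class Prompt:
    content: str  # the code context to complete

@dataclass
class ModelConfig:
    model_name: str        # e.g. "Salesforce/codegen-350M-multi"
    state_dict_path: str   # path to fine-tuned weights, e.g. "model.pth"
    max_new_tokens: int    # generation length cap
    tokenizer_name: str    # usually the same as model_name

@dataclass
class SuggestionRequest:
    prompt: Prompt
    model_config: ModelConfig

    @classmethod
    def from_json(cls, raw: str) -> "SuggestionRequest":
        # Parse the two top-level objects shown in the example request.json.
        data = json.loads(raw)
        return cls(
            prompt=Prompt(**data["prompt"]),
            model_config=ModelConfig(**data["model_config"]),
        )
```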
Why this MR?
Please refer to ai-assist#32 (closed) for the motivation.