
Inference Server MVC

Hongtao Yang requested to merge inference/inference-server-mvc into main

What is this MR?

This MR adds a simple FastAPI endpoint that returns suggested code from a model (our fine-tuned model or the original CodeGen).

Currently we call this endpoint with curl from inside the GCE VM. Log in to the VM via SSH, then run:

curl -d @request.json -H 'Content-Type: application/json' http://localhost:8000/

An example request.json looks like this:

{
    "prompt": {
        "content": "\nclass AboutArrays < Neo::Koan\ndef test_creating_arrays\n    empty_array = Array.new\n    assert_equal __(Array), empty_array.class\n    haha this line is not ruby"
    },
    "model_config": {
        "model_name": "Salesforce/codegen-350M-multi",
        "state_dict_path": "model.pth",
        "max_new_tokens": 30,
        "tokenizer_name": "Salesforce/codegen-350M-multi"
    }
}
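The same request can also be sent from Python using only the standard library. This is a sketch: `build_request` and `post_suggestion` are illustrative helper names, and the response is returned as the raw body since the MR does not specify a response schema.

```python
import json
import urllib.request


def build_request(prompt: str) -> dict:
    """Assemble a request body matching the schema above."""
    return {
        "prompt": {"content": prompt},
        "model_config": {
            "model_name": "Salesforce/codegen-350M-multi",
            "state_dict_path": "model.pth",
            "max_new_tokens": 30,
            "tokenizer_name": "Salesforce/codegen-350M-multi",
        },
    }


def post_suggestion(prompt: str, url: str = "http://localhost:8000/") -> str:
    """POST the request to the endpoint and return the raw response body."""
    body = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode()
```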

Why this MR?

Please refer to ai-assist#32 (closed) for the motivation.


