Get completions on the Triton backend side (!41)

The existing way of obtaining the completion sometimes causes the bug described in #28 (closed):

1. decode the entire model output (including the prompt) on the Triton backend side and return it to model-gateway;
2. use the prompt length to extract the generated completion on the model-gateway side.

With these changes, we obtain the completion on the Triton backend side, before the sequence of token IDs is decoded into the corresponding tokens.
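The difference between the two flows can be illustrated with a small Python sketch. This is not the actual AI Gateway or Triton backend code: `ToyTokenizer`, `completion_old`, and `completion_new` are hypothetical names. The sketch assumes that the model output is the prompt IDs followed by the generated IDs, and that the old gateway-side cut used the prompt's character length. The toy tokenizer rewrites each tab as four spaces, standing in for any tokenizer whose decoded output does not reproduce the original prompt exactly; it is not claimed to be the specific cause of #28.

```python
from typing import List


class ToyTokenizer:
    """Hypothetical character-level tokenizer that normalizes tabs to 4 spaces.

    Because of that normalization, decode(encode(text)) != text whenever
    the text contains a tab.
    """

    def encode(self, text: str) -> List[int]:
        return [ord(ch) for ch in text.replace("\t", "    ")]

    def decode(self, ids: List[int]) -> str:
        return "".join(chr(i) for i in ids)


def completion_old(tokenizer: ToyTokenizer, prompt: str, output_ids: List[int]) -> str:
    # Old flow: decode the whole output (prompt + completion) on the backend,
    # then cut it on the gateway side using the prompt's character length.
    text = tokenizer.decode(output_ids)
    return text[len(prompt):]


def completion_new(tokenizer: ToyTokenizer, prompt_ids: List[int], output_ids: List[int]) -> str:
    # New flow: drop the prompt token IDs first, then decode only the rest.
    return tokenizer.decode(output_ids[len(prompt_ids):])


tok = ToyTokenizer()
prompt = "def f(x):\n\treturn"
prompt_ids = tok.encode(prompt)
# Assumption: the model output echoes the prompt IDs before the generated IDs.
output_ids = prompt_ids + tok.encode(" x + 1")

old = completion_old(tok, prompt, output_ids)          # -> "urn x + 1"
new = completion_new(tok, prompt_ids, output_ids)      # -> " x + 1"
print(repr(old), repr(new))
```

Slicing by token count keeps the cut aligned with what the model actually consumed, so it does not depend on the decoded prompt matching the original prompt string character for character.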

Edited Jan 26, 2023 by Alexander Chueshev
