Get completions on the Triton backend side

Alexander Chueshev requested to merge fix-completions-length into main

The existing way of obtaining the completion sometimes triggers the bug described in #28 (closed). It works in two steps:

  • decode the entire model output (including the prompt) on the Triton backend side and return it to model-gateway
  • use the prompt length to cut the generated completion out of the decoded text on the model-gateway side

With the changes in this merge request, we obtain the completion on the Triton backend side, before decoding the sequence of IDs into the corresponding tokens, so model-gateway no longer needs the prompt length to locate it.
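
To illustrate the difference between the two approaches, here is a minimal, self-contained sketch. It does not reproduce the actual backend code: `ToyTokenizer`, `completion_by_prompt_length`, and `completion_by_token_ids` are hypothetical names, and the toy tokenizer is deliberately lossy (it collapses runs of whitespace) so that the decoded prompt differs in length from the original prompt. Whether #28 is caused by exactly this kind of mismatch is an assumption.

```python
from typing import Dict, List


class ToyTokenizer:
    """Hypothetical lossy tokenizer: whitespace runs collapse to one space on decode."""

    def __init__(self) -> None:
        self.vocab: List[str] = []
        self.index: Dict[str, int] = {}

    def encode(self, text: str) -> List[int]:
        ids = []
        for word in text.split():
            if word not in self.index:
                self.index[word] = len(self.vocab)
                self.vocab.append(word)
            ids.append(self.index[word])
        return ids

    def decode(self, ids: List[int]) -> str:
        return " ".join(self.vocab[i] for i in ids)


def completion_by_prompt_length(tok: ToyTokenizer, prompt: str, output_ids: List[int]) -> str:
    # Old approach: decode prompt + completion, then slice by the character
    # length of the original prompt.
    return tok.decode(output_ids)[len(prompt):]


def completion_by_token_ids(tok: ToyTokenizer, prompt_ids: List[int], output_ids: List[int]) -> str:
    # New approach: drop the prompt IDs first, then decode only what is left.
    return tok.decode(output_ids[len(prompt_ids):])


tok = ToyTokenizer()
prompt = "def  add(a,  b):"  # note the double spaces
prompt_ids = tok.encode(prompt)
# Stand-in for the model output: prompt IDs followed by generated IDs.
output_ids = prompt_ids + tok.encode("return a + b")

old_result = completion_by_prompt_length(tok, prompt, output_ids)
new_result = completion_by_token_ids(tok, prompt_ids, output_ids)
print(repr(old_result))  # 'eturn a + b'
print(repr(new_result))  # 'return a + b'
```

In this sketch the character-based slice drops the first letter of the completion because the decoded prompt is two characters shorter than the original prompt, whereas slicing at the token ID level does not depend on how the prompt decodes.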

Ref: #28 (closed)

Edited by Alexander Chueshev