
Use asyncio to request completions concurrently.

Hongtao Yang requested to merge async_request into more-vertex-models

What does this merge request do and why?

This MR uses asyncio to request completions concurrently. As we add more models, requesting completions from each model sequentially with blocking calls takes too long; switching to non-blocking calls significantly speeds up the pipeline.

This is not meant to replace a dedicated evaluation harness; it is a temporary solution until the harness takes shape.

How to set up and validate locally

To see the speed-up from concurrency, run the following script:

import asyncio
from time import perf_counter

from promptlib.completion.vertex_ai_models import (
    VertexModel,
    get_batch_completions,
    get_completion,
)

batch_prefix = ["def hello_world():"] * 10
batch_suffix = [None] * len(batch_prefix)

# sync calls
before_time = perf_counter()
results = []
for prefix in batch_prefix:
    results.append(
        get_completion(
            model_name=VertexModel.CODE_GECKO,
            prefix=prefix,
        )
    )
print(results)
print(f"Total time (synchronous): {perf_counter() - before_time}")


# async calls
before_time = perf_counter()
results = asyncio.run(
    get_batch_completions(
        model_name=VertexModel.CODE_GECKO,
        batch_prefix=batch_prefix,
        batch_suffix=batch_suffix,
    )
)
print(results)
print(f"Total time (asynchronous): {perf_counter() - before_time}")

On my local machine, the concurrent calls offer roughly a 10x speedup.
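For reference, the concurrency pattern behind a batch-completion helper like this can be sketched as follows. This is a minimal illustration, not the MR's actual implementation: `get_completion_stub` and `get_batch_completions_sketch` are hypothetical names, and the stub simulates a blocking API call with a sleep instead of hitting Vertex AI.

```python
import asyncio
import time
from time import perf_counter


def get_completion_stub(prefix: str, delay: float = 0.2) -> str:
    # Hypothetical stand-in for a blocking completion call; the real
    # get_completion would call the Vertex AI API. Sleeping simulates
    # network latency.
    time.sleep(delay)
    return prefix + " ..."


async def get_batch_completions_sketch(batch_prefix: list[str]) -> list[str]:
    # Run each blocking call in the default thread-pool executor so the
    # calls overlap instead of running back to back. asyncio.gather
    # preserves input order in the returned results.
    loop = asyncio.get_running_loop()
    tasks = [
        loop.run_in_executor(None, get_completion_stub, prefix)
        for prefix in batch_prefix
    ]
    return await asyncio.gather(*tasks)


batch = ["def hello_world():"] * 10
start = perf_counter()
results = asyncio.run(get_batch_completions_sketch(batch))
elapsed = perf_counter() - start
print(f"{len(results)} completions in {elapsed:.2f}s")
```

With a 0.2 s simulated latency, the ten calls complete in well under the 2 s a sequential loop would need, which mirrors the speedup reported above.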

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
