
Use asyncio to request completions concurrently.

Hongtao Yang requested to merge async_request into more-vertex-models

What does this merge request do and why?

This MR uses asyncio to request completions concurrently. As we add more models, requesting completions from each model sequentially with blocking calls takes too long; switching to non-blocking calls significantly speeds up the pipeline.

This is not meant to replace a dedicated evaluation harness; it is a temporary solution until the harness takes shape.

How to set up and validate locally

To see the speed-up from concurrency, run the following script:

import asyncio
from time import perf_counter

from promptlib.completion.vertex_ai_models import (
    VertexModel,
    get_batch_completions,
    get_completion,
)

batch_prefix = ["def hello_world():"] * 10
batch_suffix = [None] * len(batch_prefix)

# sync calls
before_time = perf_counter()
results = []
for prefix in batch_prefix:
    results.append(
        get_completion(
            model_name=VertexModel.CODE_GECKO,
            prefix=prefix,
        )
    )
print(results)
print(f"Total time (synchronous): {perf_counter() - before_time}")


# async calls
before_time = perf_counter()
results = asyncio.run(
    get_batch_completions(
        model_name=VertexModel.CODE_GECKO,
        batch_prefix=batch_prefix,
        batch_suffix=batch_suffix,
    )
)
print(results)
print(f"Total time (asynchronous): {perf_counter() - before_time}")

On my local machine, the concurrent calls offer roughly a 10x speedup.
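For reference, the concurrency pattern behind a batch-completion helper like this can be sketched as follows. This is a minimal illustration, not the MR's actual implementation: `get_completion_stub` and `get_batch_completions_sketch` are hypothetical names, and the stub simulates a blocking API call with a sleep instead of hitting Vertex AI.

```python
import asyncio
import time
from time import perf_counter


def get_completion_stub(prefix: str, delay: float = 0.2) -> str:
    # Hypothetical stand-in for a blocking completion call; the real
    # get_completion would call the Vertex AI API. Sleeping simulates
    # network latency.
    time.sleep(delay)
    return prefix + " ..."


async def get_batch_completions_sketch(batch_prefix: list[str]) -> list[str]:
    # Run each blocking call in the default thread-pool executor so the
    # calls overlap instead of running back to back. asyncio.gather
    # preserves input order in the returned results.
    loop = asyncio.get_running_loop()
    tasks = [
        loop.run_in_executor(None, get_completion_stub, prefix)
        for prefix in batch_prefix
    ]
    return await asyncio.gather(*tasks)


batch = ["def hello_world():"] * 10
start = perf_counter()
results = asyncio.run(get_batch_completions_sketch(batch))
elapsed = perf_counter() - start
print(f"{len(results)} completions in {elapsed:.2f}s")
```

With a 0.2 s simulated latency, the ten calls complete in well under the 2 s a sequential loop would need, which mirrors the speedup reported above.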

Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
