Metric: Performance Metrics
Overview
We have added quality metrics to Prompt Library. This issue is to explore performance metrics.
Performance Metrics for LLM Evaluation
Requests per Second (Throughput)
- Definition: Number of requests processed by the LLM per second.
- Purpose: Measures the capability of the LLM to handle multiple requests simultaneously, indicating its scalability and performance under load.
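As a minimal sketch of how this could be computed, assuming we have collected request timestamps (in seconds) client-side; the function name and sample data are illustrative, not part of Prompt Library:

```python
def requests_per_second(timestamps):
    """Total requests divided by the observed time window in seconds."""
    if len(timestamps) < 2:
        # Degenerate window: report the raw request count.
        return float(len(timestamps))
    window = max(timestamps) - min(timestamps)
    return len(timestamps) / window

# 4 requests observed over a 2-second window -> 2.0 requests/second
print(requests_per_second([0.0, 0.5, 1.5, 2.0]))
```

In practice this would be computed over fixed buckets (e.g. per minute) rather than the whole sample, so spikes under load remain visible.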
Tokens per Second
- Definition: Counts the tokens rendered per second during LLM response streaming.
- Purpose: Assesses the speed and efficiency of the LLM in generating and streaming responses, which is critical for user experience in real-time applications.
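A minimal sketch of the calculation, assuming we know the token count and the wall-clock start and end of the stream; the function and values are hypothetical:

```python
def tokens_per_second(token_count, stream_start, stream_end):
    """Tokens generated divided by the streaming duration in seconds."""
    duration = stream_end - stream_start
    return token_count / duration

# 120 tokens streamed over 4 seconds -> 30.0 tokens/second
print(tokens_per_second(120, 0.0, 4.0))
```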
Time to First Token Render
- Definition: Time to first token render from submission of the user prompt, measured at multiple percentiles.
- Purpose: Evaluates the responsiveness of the LLM, providing insights into how quickly it can begin delivering a response after receiving a user prompt.
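A sketch of reporting time-to-first-token at multiple percentiles, using a simple nearest-rank percentile over collected samples; the sample latencies are invented for illustration:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (p in 0..100) of a list of samples."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

# Hypothetical time-to-first-token samples, in seconds
ttft = [0.2, 0.3, 0.25, 1.1, 0.4, 0.35, 0.5, 0.28, 0.33, 2.0]
for p in (50, 90, 99):
    print(f"p{p}: {percentile(ttft, p)}")
```

Reporting p50 alongside p90/p99 matters here: tail percentiles expose slow outliers that an average would hide.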
Error Rate
- Definition: Error rate broken down by error type, such as 401 (unauthorized) and 429 (rate limited).
- Purpose: Tracks the frequency and types of errors encountered, which helps in diagnosing issues and improving the reliability and stability of the LLM.
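A sketch of a per-type error-rate breakdown, assuming HTTP status codes are logged per request; the sample codes are made up:

```python
from collections import Counter

def error_rates(status_codes):
    """Share of all requests that returned each 4xx/5xx status code."""
    total = len(status_codes)
    errors = Counter(code for code in status_codes if code >= 400)
    return {code: count / total for code, count in errors.items()}

# 8 requests: one 401 and two 429s
print(error_rates([200, 200, 401, 429, 200, 429, 200, 200]))
```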
Reliability
- Definition: The percentage of successful requests out of total requests, where the total includes requests that ended in errors or failures.
- Purpose: Measures the overall reliability of the LLM, indicating its robustness and dependability in handling user requests without failures.
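The reliability definition above reduces to a simple ratio; a sketch with hypothetical counts:

```python
def reliability(successful, total):
    """Percentage of successful requests out of all requests."""
    return 100.0 * successful / total

# 980 successes out of 1000 requests -> 98.0%
print(reliability(980, 1000))
```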
Latency
- Definition: The average time between submission of a request and receipt of the response.
- Purpose: Reflects the efficiency of the LLM in processing and responding to queries, which is crucial for maintaining a smooth and responsive user interaction.
- Status: Done
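A sketch of the average-latency calculation, assuming paired request/response timestamps (in seconds) have been recorded; the names and values are illustrative:

```python
def average_latency(request_times, response_times):
    """Mean of per-request (response - request) durations in seconds."""
    durations = [end - start for start, end in zip(request_times, response_times)]
    return sum(durations) / len(durations)

# Three requests, each answered 0.5 seconds after submission -> 0.5
print(average_latency([0.0, 1.0, 2.0], [0.5, 1.5, 2.5]))
```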
These metrics collectively provide a comprehensive overview of the performance, reliability, and efficiency of the LLM, guiding improvements and ensuring optimal user experience.