vllm.entrypoints.api_server ¶
 NOTE: This API server is used only to demonstrate AsyncEngine usage and for simple performance benchmarks. It is not intended for production use; for production, we recommend our OpenAI-compatible server. We will also not accept PRs that modify this file; please change vllm/entrypoints/openai/api_server.py instead.
_generate async ¶

```python
_generate(
    request_dict: dict, raw_request: Request
) -> Response
```

Source code in vllm/entrypoints/api_server.py
generate async ¶

Generate a completion for the request.

The request should be a JSON object with the following fields:

- prompt: the prompt to use for the generation.
- stream: whether to stream the results or not.
- other fields: the sampling parameters (see SamplingParams for details).

Source code in vllm/entrypoints/api_server.py
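For illustration, a minimal client sketch against this endpoint, assuming the demo server is already running locally on the default port (8000); the model behind it and the prompt are placeholders, and the NUL-delimited streaming format mirrors what this server emits:

```python
import json

import requests

# Non-streaming request: fields other than "prompt" and "stream"
# (here "max_tokens" and "temperature") are treated as sampling parameters.
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "prompt": "San Francisco is a",
        "stream": False,
        "max_tokens": 64,
        "temperature": 0.8,
    },
)
response.raise_for_status()
print(response.json()["text"])

# Streaming variant: the server emits JSON chunks separated by NUL bytes.
with requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "San Francisco is a", "stream": True, "max_tokens": 64},
    stream=True,
) as resp:
    for chunk in resp.iter_lines(delimiter=b"\0"):
        if chunk:
            print(json.loads(chunk)["text"])
```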
health async ¶

Health check.
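A quick liveness check, under the same local-server assumption as above:

```python
import requests

# /health returns an empty 200 response while the server is up.
assert requests.get("http://localhost:8000/health").status_code == 200
```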
init_app async ¶

```python
init_app(
    args: Namespace,
    llm_engine: AsyncLLMEngine | None = None,
) -> FastAPI
```

Source code in vllm/entrypoints/api_server.py
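A sketch of embedding the app directly, e.g. to inject a prebuilt engine in tests. The flag setup is an assumption (engine flags via AsyncEngineArgs.add_cli_args, plus a --root-path flag mirroring the module's CLI); the module's own argument parser is authoritative:

```python
import asyncio

import uvicorn

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.entrypoints.api_server import init_app
from vllm.utils import FlexibleArgumentParser


async def main() -> None:
    # Assumed flag setup: engine flags from AsyncEngineArgs.add_cli_args,
    # plus a --root-path flag mirroring the module's own CLI.
    parser = FlexibleArgumentParser()
    parser.add_argument("--root-path", type=str, default=None)
    parser = AsyncEngineArgs.add_cli_args(parser)
    args = parser.parse_args(["--model", "facebook/opt-125m"])

    # Build the engine up front and inject it, rather than letting the
    # app construct its own engine from the CLI args.
    engine = AsyncLLMEngine.from_engine_args(AsyncEngineArgs.from_cli_args(args))
    app = await init_app(args, engine)

    config = uvicorn.Config(app, host="0.0.0.0", port=8000)
    await uvicorn.Server(config).serve()


asyncio.run(main())
```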
run_server async ¶

```python
run_server(
    args: Namespace,
    llm_engine: AsyncLLMEngine | None = None,
    **uvicorn_kwargs: Any,
) -> None
```
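run_server is the coroutine behind the module's script entry point (python -m vllm.entrypoints.api_server). A sketch that launches it that way and polls /health until the engine is ready; the model name is a placeholder:

```python
import subprocess
import time

import requests

# Launch the demo server as a script; this invocation ends up awaiting
# run_server with the parsed CLI args.
server = subprocess.Popen(
    [
        "python", "-m", "vllm.entrypoints.api_server",
        "--model", "facebook/opt-125m",
        "--port", "8000",
    ]
)
try:
    # Poll /health until the server accepts connections.
    for _ in range(60):
        try:
            if requests.get("http://localhost:8000/health").status_code == 200:
                print("server is up")
                break
        except requests.ConnectionError:
            pass
        time.sleep(1)
finally:
    server.terminate()
```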