vllm.v1.utils ¶
   APIServerProcessManager ¶
 Manages a group of API server processes.
Handles creation, monitoring, and termination of API server worker processes. Also monitors extra processes to check if they are healthy.
Source code in vllm/v1/utils.py
   __init__ ¶
 __init__(
    target_server_fn: Callable,
    listen_address: str,
    sock: Any,
    args: Namespace,
    num_servers: int,
    input_addresses: list[str],
    output_addresses: list[str],
    stats_update_address: str | None = None,
)
Initialize and start API server worker processes.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `target_server_fn` | `Callable` | Function to call for each API server process | required |
| `listen_address` | `str` | Address to listen on for client connections | required |
| `sock` | `Any` | Socket for client connections | required |
| `args` | `Namespace` | Command line arguments | required |
| `num_servers` | `int` | Number of API server processes to start | required |
| `input_addresses` | `list[str]` | Input addresses for each API server | required |
| `output_addresses` | `list[str]` | Output addresses for each API server | required |
| `stats_update_address` | `str \| None` | Optional stats update address | `None` |
Source code in vllm/v1/utils.py
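A minimal construction sketch is shown below. `run_api_server`, the addresses, and the bare socket are hypothetical placeholders, not names from vLLM; in practice the socket would be pre-bound to the listen address before being handed to the manager.

```python
import socket
from argparse import Namespace

from vllm.v1.utils import APIServerProcessManager

def run_api_server(*args, **kwargs):  # hypothetical worker entry point
    ...

sock = socket.socket()  # placeholder; normally a pre-bound listening socket

manager = APIServerProcessManager(
    target_server_fn=run_api_server,
    listen_address="tcp://0.0.0.0:8000",  # placeholder address
    sock=sock,
    args=Namespace(),                     # parsed CLI arguments
    num_servers=2,
    input_addresses=["ipc:///tmp/in0", "ipc:///tmp/in1"],
    output_addresses=["ipc:///tmp/out0", "ipc:///tmp/out1"],
)
```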
   ConstantList ¶
 Source code in vllm/v1/utils.py
   CpuGpuBuffer ¶
 Buffer to easily copy tensors between CPU and GPU.
Source code in vllm/v1/utils.py
   __init__ ¶
 __init__(
    *size: int | SymInt,
    dtype: dtype,
    device: device,
    pin_memory: bool,
    with_numpy: bool = True,
) -> None
Source code in vllm/v1/utils.py
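A hedged construction sketch, assuming a CUDA device is available; the idea of paired host/device tensors follows from the class description above.

```python
import torch
from vllm.v1.utils import CpuGpuBuffer

# Sketch: a (1024,) float32 buffer with a pinned CPU tensor, a GPU
# mirror, and (with_numpy=True) a NumPy view of the CPU side.
buf = CpuGpuBuffer(
    1024,
    dtype=torch.float32,
    device=torch.device("cuda"),
    pin_memory=True,   # pinned host memory enables non-blocking copies
    with_numpy=True,
)
```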
   copy_to_cpu ¶
  NOTE: Because this method is non-blocking, explicit synchronization is needed to ensure the data is copied to CPU.
Source code in vllm/v1/utils.py
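Continuing the sketch above: because the copy is non-blocking, synchronize before reading the host data. The no-argument call and the `cpu`/`gpu` attribute names are assumptions, since the method's signature is not shown in this section.

```python
buf.gpu.fill_(1.0)        # produce some data on the device
buf.copy_to_cpu()         # launches an async device-to-host copy
torch.cuda.synchronize()  # required before the CPU-side data is valid
print(buf.cpu[:4])
```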
   copy_slice ¶
Copy the first `length` elements of a tensor into another tensor in a non-blocking manner.
Used to copy pinned CPU tensor data to pre-allocated GPU tensors.
Returns the sliced target tensor.
Source code in vllm/v1/utils.py
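A hedged sketch of the pattern described above; the argument order (source, target, length) is an assumption, since the signature is not shown in this section.

```python
import torch
from vllm.v1.utils import copy_slice

src = torch.arange(8, dtype=torch.float32).pin_memory()   # pinned CPU data
dst = torch.empty(8, dtype=torch.float32, device="cuda")  # pre-allocated GPU
out = copy_slice(src, dst, 4)   # non-blocking copy of the first 4 elements
torch.cuda.synchronize()        # the copy is async; sync before relying on it
assert out.shape == (4,)        # the sliced target tensor is returned
```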
   get_engine_client_zmq_addr ¶
Assign a new ZMQ socket address.
If `local_only` is True, the participants are colocated, so a unique IPC address is returned.
Otherwise, the provided `host` and `port` are used to construct a TCP address (`port == 0` means an available port is assigned).
Source code in vllm/v1/utils.py
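A sketch assuming the parameters implied by the description (`local_only`, `host`, and an optional `port`); keyword names are assumptions.

```python
from vllm.v1.utils import get_engine_client_zmq_addr

# Colocated participants: expect a unique IPC address, e.g. "ipc:///tmp/..."
ipc_addr = get_engine_client_zmq_addr(local_only=True, host="")

# Remote participants: a TCP address; port=0 picks an available port.
tcp_addr = get_engine_client_zmq_addr(local_only=False, host="10.0.0.5", port=0)
```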
   record_function_or_nullcontext ¶
 record_function_or_nullcontext(
    name: str,
) -> AbstractContextManager
Source code in vllm/v1/utils.py
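No description is given above; judging by the name, this presumably yields a `torch.profiler.record_function` range when profiling support is enabled and a `contextlib.nullcontext` otherwise (an assumption, not documented here).

```python
from vllm.v1.utils import record_function_or_nullcontext

# Annotate a region for profiler traces; a no-op context when
# record_function support is not enabled (assumed behavior).
with record_function_or_nullcontext("prepare_inputs"):
    pass  # code to attribute to the "prepare_inputs" range
```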
   report_usage_stats ¶
 report_usage_stats(
    vllm_config,
    usage_context: UsageContext = ENGINE_CONTEXT,
) -> None
Report usage statistics if enabled.
Source code in vllm/v1/utils.py
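A minimal call sketch; `vllm_config` is a placeholder for an already-constructed engine config, and the import path for `UsageContext` is assumed to be `vllm.usage.usage_lib`.

```python
from vllm.usage.usage_lib import UsageContext
from vllm.v1.utils import report_usage_stats

# Report once at engine startup; a no-op when usage-stats collection
# is disabled (e.g. via the VLLM_NO_USAGE_STATS environment variable).
report_usage_stats(vllm_config, usage_context=UsageContext.ENGINE_CONTEXT)
```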
   shutdown ¶
 shutdown(procs: list[BaseProcess])
Source code in vllm/v1/utils.py
   tensor_data ¶
 tensor_data(tensor: Tensor) -> memoryview
Get the raw data of a tensor as a uint8 memoryview, useful for serializing and hashing.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `tensor` | `Tensor` | The input tensor. | required |
Returns:
| Type | Description | 
|---|---|
| `memoryview` | A memoryview of the tensor data as uint8. |
Source code in vllm/v1/utils.py
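For example, hashing a tensor's raw bytes (a sketch assuming a contiguous CPU tensor):

```python
import hashlib

import torch
from vllm.v1.utils import tensor_data

t = torch.arange(4, dtype=torch.int32)     # contiguous CPU tensor
view = tensor_data(t)                      # uint8 memoryview over the raw bytes
digest = hashlib.sha256(view).hexdigest()  # e.g. for use as a cache key
```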
   wait_for_completion_or_failure ¶
 wait_for_completion_or_failure(
    api_server_manager: APIServerProcessManager,
    engine_manager: Union[
        CoreEngineProcManager, CoreEngineActorManager
    ]
    | None = None,
    coordinator: Optional[DPCoordinator] = None,
) -> None
Wait for all processes to complete or detect if any fail.
Raises an exception if any process exits with a non-zero status.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| `api_server_manager` | `APIServerProcessManager` | The manager for API servers. | required |
| `engine_manager` | `Union[CoreEngineProcManager, CoreEngineActorManager] \| None` | The manager for engine processes. If CoreEngineProcManager, it manages local engines; if CoreEngineActorManager, it manages all engines. | `None` |
| `coordinator` | `Optional[DPCoordinator]` | The coordinator for data parallel. | `None` |
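A hedged sketch of the supervision pattern; `api_server_manager` and `engine_manager` stand for already-constructed managers (see `APIServerProcessManager.__init__` above).

```python
from vllm.v1.utils import wait_for_completion_or_failure

# Blocks until all supervised processes exit; raises an exception as
# soon as any of them exits with a non-zero status.
wait_for_completion_or_failure(
    api_server_manager=api_server_manager,  # an APIServerProcessManager
    engine_manager=engine_manager,          # CoreEngineProcManager or None
    coordinator=None,                       # optional DPCoordinator
)
```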