vllm.v1.kv_offload.spec
   OffloadingSpec
  Bases: ABC
Spec for an offloading connector
Source code in vllm/v1/kv_offload/spec.py
   offloaded_block_size  instance-attribute
 offloaded_block_size = int(
    get("block_size", gpu_block_size)
)
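The fallback behaviour of this attribute can be illustrated with a minimal sketch. The receiver of the `get` call is not shown above, so the `extra_config` dictionary and the helper name `resolve_offloaded_block_size` below are hypothetical stand-ins, not part of the vLLM API. The sketch only shows that a configured `block_size` is coerced to `int`, and that the GPU block size is used when the key is absent.

```python
def resolve_offloaded_block_size(extra_config: dict, gpu_block_size: int) -> int:
    """Return the configured offloaded block size, or the GPU block size.

    `extra_config` is a hypothetical mapping standing in for whatever
    object the real `get(...)` call is made on.
    """
    # dict.get returns the default when "block_size" is missing;
    # int() also accepts numeric strings such as "64".
    return int(extra_config.get("block_size", gpu_block_size))


default_size = resolve_offloaded_block_size({}, gpu_block_size=16)
configured_size = resolve_offloaded_block_size({"block_size": "64"}, gpu_block_size=16)
```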
  __init__
 __init__(vllm_config: VllmConfig)
   get_handlers  abstractmethod
 get_handlers(
    kv_caches: dict[str, Tensor],
) -> Iterator[
    tuple[
        type[LoadStoreSpec],
        type[LoadStoreSpec],
        OffloadingHandler,
    ]
]
Get offloading handlers along with their respective src and dst types.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kv_caches | dict[str, Tensor] | A dictionary of layer_name -> gpu_kv_cache tensor. | required |
Yields:
| Type | Description |
|---|---|
| tuple[type[LoadStoreSpec], type[LoadStoreSpec], OffloadingHandler] | Tuples of (src_type, dst_type, offloading_handler). |
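
One way a caller might consume the yielded tuples is to index each handler by its (src_type, dst_type) pair. The sketch below does not import vLLM: `LoadStoreSpec` and `OffloadingHandler` are local stand-ins for the real classes, and `GPUSpec`, `CPUSpec`, `ToySpec`, and `build_registry` are hypothetical names introduced only for illustration. Plain objects take the place of the KV cache tensors.

```python
from abc import ABC
from collections.abc import Iterator


class LoadStoreSpec(ABC):
    """Stand-in for the real LoadStoreSpec base class."""


class GPUSpec(LoadStoreSpec):
    """Hypothetical medium type for blocks held in GPU memory."""


class CPUSpec(LoadStoreSpec):
    """Hypothetical medium type for blocks held in CPU memory."""


class OffloadingHandler:
    """Stand-in handler that only records its direction and layers."""

    def __init__(self, direction: str, layer_names: list[str]) -> None:
        self.direction = direction
        self.layer_names = layer_names


class ToySpec:
    """Hypothetical spec yielding one handler per transfer direction."""

    def get_handlers(
        self, kv_caches: dict[str, object]
    ) -> Iterator[tuple[type[LoadStoreSpec], type[LoadStoreSpec], OffloadingHandler]]:
        layers = list(kv_caches)
        # GPU -> CPU transfers (offloading) and CPU -> GPU transfers (reloading).
        yield GPUSpec, CPUSpec, OffloadingHandler("store", layers)
        yield CPUSpec, GPUSpec, OffloadingHandler("load", layers)


def build_registry(spec: ToySpec, kv_caches: dict[str, object]) -> dict:
    """Map each (src_type, dst_type) pair to the handler yielded for it."""
    registry = {}
    for src_type, dst_type, handler in spec.get_handlers(kv_caches):
        registry[(src_type, dst_type)] = handler
    return registry


# Placeholder objects stand in for per-layer GPU KV cache tensors.
kv_caches = {"layer.0": object(), "layer.1": object()}
registry = build_registry(ToySpec(), kv_caches)
```

Keying the registry on the type pair lets a transfer request be routed by the media it moves between, without the caller knowing which concrete handler serves it.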
   get_manager  abstractmethod
 get_manager() -> OffloadingManager
Get an OffloadingManager that will be used by the scheduler-side offloading connector to track offloaded blocks and manage evictions.
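To make "track offloaded blocks and manage evictions" concrete, here is a toy tracker with a least-recently-used policy. It is a sketch under stated assumptions: the class `ToyOffloadingManager` and its methods `contains`, `touch`, and `record_store` are hypothetical and do not reproduce the real `OffloadingManager` interface, and LRU is chosen here only as a familiar eviction policy.

```python
from collections import OrderedDict


class ToyOffloadingManager:
    """Hypothetical tracker of offloaded block hashes with LRU eviction."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Keys are block hashes, ordered from least to most recently used.
        self._blocks: OrderedDict[int, None] = OrderedDict()

    def contains(self, block_hash: int) -> bool:
        """Report whether a block is currently tracked as offloaded."""
        return block_hash in self._blocks

    def touch(self, block_hash: int) -> None:
        """Mark a tracked block as recently used."""
        if block_hash in self._blocks:
            self._blocks.move_to_end(block_hash)

    def record_store(self, block_hash: int) -> list[int]:
        """Track a newly offloaded block; return the hashes evicted for it."""
        evicted: list[int] = []
        if block_hash in self._blocks:
            self._blocks.move_to_end(block_hash)
            return evicted
        # Evict least recently used blocks until there is room.
        while len(self._blocks) >= self.capacity:
            oldest, _ = self._blocks.popitem(last=False)
            evicted.append(oldest)
        self._blocks[block_hash] = None
        return evicted


manager = ToyOffloadingManager(capacity=2)
first = manager.record_store(1)
second = manager.record_store(2)
manager.touch(1)  # block 1 becomes the most recently used
third = manager.record_store(3)  # block 2 is now the eviction candidate
```

Because this bookkeeping lives on the scheduler side, the sketch only tracks block identities; moving the data itself is left to the handlers.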