vllm.v1.attention.backends.mamba_attn ¶
   BaseMambaAttentionMetadataBuilder ¶
  Bases: AttentionMetadataBuilder[M], ABC
Source code in vllm/v1/attention/backends/mamba_attn.py
   cudagraph_support  class-attribute  ¶
 cudagraph_support: AttentionCGSupport = (
    UNIFORM_SINGLE_TOKEN_DECODE
)
  decode_cudagraph_max_bs  instance-attribute  ¶
 decode_cudagraph_max_bs = min(
    max_num_seqs, max_cudagraph_capture_size
)
  state_indices_tensor  instance-attribute  ¶
 state_indices_tensor = empty(
    (decode_cudagraph_max_bs,), dtype=int32, device=device
)
  __init__ ¶
 __init__(
    kv_cache_spec: AttentionSpec,
    layer_names: list[str],
    vllm_config: VllmConfig,
    device: device,
)
Source code in vllm/v1/attention/backends/mamba_attn.py
   build_for_cudagraph_capture ¶
 build_for_cudagraph_capture(
    common_attn_metadata: CommonAttentionMetadata,
) -> M
This method builds the attention metadata for full cudagraph capture. Currently, only decode batches are supported for full cudagraphs with Mamba.
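The decode-only restriction matches the `cudagraph_support = UNIFORM_SINGLE_TOKEN_DECODE` class attribute above: a batch is capturable as a full cudagraph only if every request contributes exactly one query token. A hedged stdlib-only sketch of that predicate (the helper name and the `num_scheduled_tokens` representation are hypothetical, not the vLLM API):

```python
def is_uniform_single_token_decode(num_scheduled_tokens: list[int]) -> bool:
    # Pure-decode batches schedule exactly one new token per request;
    # any request with more tokens is a prefill (or chunked prefill)
    # and disqualifies the batch from full-cudagraph capture.
    return len(num_scheduled_tokens) > 0 and all(
        n == 1 for n in num_scheduled_tokens
    )

print(is_uniform_single_token_decode([1, 1, 1]))  # True: capturable
print(is_uniform_single_token_decode([7, 1, 1]))  # False: contains a prefill
```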