vllm.model_executor.layers.quantization.qutlass_utils
ceil_div
to_blocked
Rearrange a large matrix by breaking it into 128x4 blocks and applying the cuBLAS block-scaling-factor rearrangement pattern to each block. See https://docs.nvidia.com/cuda/cublas/index.html#d-block-scaling-factors-layout
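The per-block pattern can be sketched in plain Python. This is a minimal illustration, not the vLLM code: `rearrange_block` is a hypothetical helper, and the mapping from block position (r, c) to tile position (r % 32, (r // 32) * 4 + c) is our reading of the cuBLAS layout linked above, consistent with the 32x16 tile per 128x4 block implied by the return shape.

```python
from typing import List


def rearrange_block(block: List[List[int]]) -> List[List[int]]:
    """Hypothetical helper: map one 128x4 block of scales to a 32x16 tile."""
    assert len(block) == 128 and all(len(row) == 4 for row in block)
    tile = [[0] * 16 for _ in range(32)]
    for r in range(128):
        for c in range(4):
            # Block rows r, r+32, r+64 and r+96 share tile row r % 32;
            # each of them contributes 4 consecutive tile columns.
            tile[r % 32][(r // 32) * 4 + c] = block[r][c]
    return tile


block = [[4 * r + c for c in range(4)] for r in range(128)]
tile = rearrange_block(block)
print(tile[0])  # values from block rows 0, 32, 64 and 96
```

Flattening the 32x16 tile row by row gives the 512 contiguous scales for one block; the full function would apply this to every block of the (padded) input.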
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| input_matrix | Tensor | Input tensor of shape (H, W) | required |
| backend | Literal['torch', 'triton'] | "torch" (PyTorch path) or "triton" (Triton kernel) | 'triton' |
Returns:
| Type | Description |
|---|---|
| Tensor | Rearranged tensor of shape `(32*ceil_div(H,128), 16*ceil_div(W,4))` |
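The return shape follows from padding H up to a multiple of 128 and W up to a multiple of 4. The sketch below assumes `ceil_div` is plain integer ceiling division (this page does not document it); `blocked_shape` is a hypothetical helper used only to illustrate the arithmetic.

```python
from typing import Tuple


def ceil_div(a: int, b: int) -> int:
    # Assumed semantics: integer ceiling division for non-negative a, positive b.
    return (a + b - 1) // b


def blocked_shape(h: int, w: int) -> Tuple[int, int]:
    # H is padded to a multiple of 128 rows and W to a multiple of 4 columns;
    # every 128x4 block (512 scales) becomes one 32x16 tile.
    return 32 * ceil_div(h, 128), 16 * ceil_div(w, 4)


print(blocked_shape(200, 10))  # (64, 48)
```

For a 200x10 scale matrix this gives 2 row blocks and 3 column blocks, i.e. 6 blocks of 512 scales, or 64 * 48 = 3072 elements.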
Source code in vllm/model_executor/layers/quantization/qutlass_utils.py
triton_mx_block_rearrange
Rearranges an E8M0 scale tensor from row-major format to the block-scaled swizzle format.
This layout is suitable for Tensor Memory (Tmem), as described in the NVIDIA documentation: https://docs.nvidia.com/cuda/cublas/index.html#d-block-scaling-factors-layout
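Because each 128x4 block occupies 512 consecutive output elements, the destination of any single scale can be written in closed form. This is a sketch assuming blocks are stored in row-major block order and the in-block pattern of the cuBLAS layout; `swizzled_offset` is a hypothetical name, not a function in this module.

```python
def swizzled_offset(row: int, col: int, n_col_blocks: int) -> int:
    """Hypothetical: flat output index of scale (row, col) in the swizzled layout."""
    row_block, r = divmod(row, 128)  # which 128-row block, and row inside it
    col_block, c = divmod(col, 4)  # which 4-column block, and column inside it
    block_index = row_block * n_col_blocks + col_block  # blocks in row-major order
    # 512 scales per block; inside a block, row r lands at tile row r % 32.
    return block_index * 512 + (r % 32) * 16 + (r // 32) * 4 + c


print(swizzled_offset(130, 5, n_col_blocks=2))  # 1569
```

For a 256x8 scale matrix (`n_col_blocks = 2`), element (130, 5) sits in block 3 and lands at 3 * 512 + 2 * 16 + 1 = 1569.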
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scale_tensor | Tensor | Input tensor in row-major format with 8-bit elements | required |
Returns:
| Type | Description |
|---|---|
| Tensor | Rearranged tensor in block-scaled swizzle format |
triton_scale_swizzle
```python
triton_scale_swizzle(
    scale_ptr: Tensor,
    scale_rows: int,
    scale_cols: int,
    output_ptr: Tensor,
    input_row_stride: int,
    output_block_stride: int,
    BLOCK_ROWS: constexpr,
    BLOCK_COLS: constexpr,
)
```
Rearranges tensor data from row-major to block-scaled swizzle format.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scale_ptr | Tensor | Pointer to the input scale tensor | required |
| scale_rows | int | Number of rows in the scale tensor | required |
| scale_cols | int | Number of columns in the scale tensor | required |
| output_ptr | Tensor | Pointer to the output tensor | required |
| input_row_stride | int | Stride between rows in the input tensor | required |
| output_block_stride | int | Stride between blocks in the output tensor | required |
| BLOCK_ROWS | constexpr | Number of rows in a tile (compile-time constant) | required |
| BLOCK_COLS | constexpr | Number of columns in a tile (compile-time constant) | required |
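The signature suggests a tiled launch in which each program instance handles one BLOCK_ROWS x BLOCK_COLS tile. The pure-Python emulation below is a sketch under explicit assumptions, not the Triton source: a 2D grid of (row-block, column-block) programs, 128x4 tiles, masked loads that read 0 outside the scale_rows x scale_cols region, and output_block_stride equal to the distance between consecutive row-blocks of tiles (512 * n_col_blocks). `emulate_scale_swizzle` is hypothetical.

```python
from typing import List

# Tile shape assumed by the in-tile formula below (one cuBLAS scale block).
BLOCK_ROWS = 128
BLOCK_COLS = 4


def ceil_div(a: int, b: int) -> int:
    # Integer ceiling division for non-negative a and positive b.
    return (a + b - 1) // b


def emulate_scale_swizzle(
    scale: List[int], scale_rows: int, scale_cols: int, input_row_stride: int
) -> List[int]:
    """Hypothetical pure-Python emulation of a tiled scale-swizzle kernel."""
    n_row_blocks = ceil_div(scale_rows, BLOCK_ROWS)
    n_col_blocks = ceil_div(scale_cols, BLOCK_COLS)
    local_numel = BLOCK_ROWS * BLOCK_COLS  # 512 scales per tile
    # Assumption: stride between consecutive row-blocks of tiles in the output.
    output_block_stride = local_numel * n_col_blocks
    output = [0] * (n_row_blocks * output_block_stride)

    # Each (pid_row, pid_col) pair stands in for one program instance.
    for pid_row in range(n_row_blocks):
        for pid_col in range(n_col_blocks):
            block_offset = pid_row * output_block_stride + pid_col * local_numel
            for r in range(BLOCK_ROWS):
                for c in range(BLOCK_COLS):
                    g_row = pid_row * BLOCK_ROWS + r
                    g_col = pid_col * BLOCK_COLS + c
                    # Masked load: positions outside the scale tensor read as 0.
                    in_bounds = g_row < scale_rows and g_col < scale_cols
                    value = scale[g_row * input_row_stride + g_col] if in_bounds else 0
                    # Destination inside the 512-element tile.
                    dest = (r % 32) * 16 + (r // 32) * 4 + c
                    output[block_offset + dest] = value
    return output


# A 2x3 scale matrix stored with row stride 8; entries past column 3 are never read.
buf = [99] * 16
buf[0:3] = [1, 2, 3]
buf[8:11] = [4, 5, 6]
out = emulate_scale_swizzle(buf, scale_rows=2, scale_cols=3, input_row_stride=8)
print(out[0:4], out[16:20])  # [1, 2, 3, 0] [4, 5, 6, 0]
```

Passing input_row_stride separately from scale_cols lets the kernel read a row-major view whose rows are padded or strided; only the first scale_cols entries of each row are loaded.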