vllm.model_executor.models.transformers.utils ¶
 Transformers backend utilities.
  Style  module-attribute  ¶
 Style = Literal[
    "colwise",
    "colwise_rep",
    "rowwise",
    "rowwise_rep",
    "replicate",
]
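For orientation, a minimal sketch of how these style strings could map onto vLLM's tensor parallel linear classes. The mapping below, including the treatment of the `_rep` variants, is an illustrative assumption, not this module's actual dispatch table (see `replace_linear_class` below):

```python
from vllm.model_executor.layers.linear import (
    ColumnParallelLinear,
    ReplicatedLinear,
    RowParallelLinear,
)

# Illustrative mapping only.
STYLE_TO_CLASS = {
    "colwise": ColumnParallelLinear,
    "colwise_rep": ColumnParallelLinear,  # output gathered on every rank (assumed)
    "rowwise": RowParallelLinear,
    "rowwise_rep": RowParallelLinear,     # input replicated, not sharded (assumed)
    "replicate": ReplicatedLinear,
}
```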
  can_enable_torch_compile ¶
 can_enable_torch_compile(vllm_config: VllmConfig) -> bool
Callable to be passed to `@support_torch_compile`'s `enable_if` argument.
Returns `True` by default, but returns `False` (disabling compilation) in the following situations:
- The model uses dynamic rope scaling.
 
Source code in vllm/model_executor/models/transformers/utils.py
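A usage sketch, assuming a hypothetical model class `MyTransformersModel`; the predicate is evaluated against the `VllmConfig`, so compilation is skipped whenever it returns `False`:

```python
import torch.nn as nn

from vllm.compilation.decorators import support_torch_compile
from vllm.config import VllmConfig
from vllm.model_executor.models.transformers.utils import (
    can_enable_torch_compile,
)

# Hypothetical model class: torch.compile is disabled e.g. for models
# using dynamic rope scaling.
@support_torch_compile(enable_if=can_enable_torch_compile)
class MyTransformersModel(nn.Module):
    def __init__(self, *, vllm_config: VllmConfig, prefix: str = "") -> None:
        super().__init__()
```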
   get_feature_request_tip ¶
  Source code in vllm/model_executor/models/transformers/utils.py
   init_on_device_without_buffers ¶
 init_on_device_without_buffers(device: device)
A context manager under which models are initialized with all parameters on the specified device. Buffers, however, are not initialized on the specified device.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `device` | `torch.device` | Device to initialize all parameters on. | *required* |
Source code in vllm/model_executor/models/transformers/utils.py
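A minimal usage sketch: parameters land on the requested device while buffers keep their default placement (`nn.BatchNorm1d` is used here purely because it owns both):

```python
import torch
import torch.nn as nn

from vllm.model_executor.models.transformers.utils import (
    init_on_device_without_buffers,
)

with init_on_device_without_buffers(torch.device("meta")):
    layer = nn.BatchNorm1d(8)

print(layer.weight.device)        # meta -- parameters moved
print(layer.running_mean.device)  # cpu  -- buffers left alone
```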
   log_replacement ¶
     replace_linear_class ¶
 replace_linear_class(
    linear: Linear,
    style: Style = "replicate",
    quant_config: QuantizationConfig | None = None,
    *,
    prefix: str = "",
) -> (
    ColumnParallelLinear
    | RowParallelLinear
    | ReplicatedLinear
)
Replace nn.Linear with one of vLLM's tensor parallel linear classes.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `linear` | `Linear` | `nn.Linear` to be replaced. | *required* |
| `style` | `Style` | Tensor parallel style of the new linear, e.g. `"colwise"`. | `'replicate'` |
| `quant_config` | `QuantizationConfig \| None` | Quantization config for the new linear. | `None` |
Returns: The new linear.
Source code in vllm/model_executor/models/transformers/utils.py
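A usage sketch, assuming it runs inside vLLM's model-loading path where the tensor-parallel process group is already initialized (the `prefix` value is a hypothetical module path):

```python
import torch.nn as nn

from vllm.model_executor.models.transformers.utils import (
    replace_linear_class,
)

linear = nn.Linear(1024, 4096, bias=False)
tp_linear = replace_linear_class(
    linear,
    style="colwise",                        # shard the output dimension
    quant_config=None,                      # no quantization applied
    prefix="model.layers.0.mlp.gate_proj",  # hypothetical module path
)
```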
   replace_rms_norm_class ¶
  Replace a Transformers RMSNorm with vLLM's RMSNorm.
This method assumes:

- Weight is stored as `weight`.
- Epsilon is stored as `eps` or `variance_epsilon`.
- `with_scale` indicates whether the layer has a weight (Gemma3n only).
- `var_hidden_size` is only ever used for the Intern vision encoder in vLLM, and Transformers doesn't appear to have the same concept.
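A minimal sketch of the replacement under the assumptions above (`swap_rms_norm` is a hypothetical helper, not this module's function, and only handles the `weight`/`eps` cases):

```python
import torch.nn as nn

from vllm.model_executor.layers.layernorm import RMSNorm

def swap_rms_norm(hf_norm: nn.Module) -> RMSNorm:
    # Epsilon may be stored under either attribute name (see above).
    eps = getattr(hf_norm, "eps", getattr(hf_norm, "variance_epsilon", 1e-6))
    new_norm = RMSNorm(hf_norm.weight.shape[0], eps=eps)
    # Carry the learned scale over to the vLLM layer.
    new_norm.weight.data.copy_(hf_norm.weight.data)
    return new_norm
```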