vllm.model_executor.warmup.deep_gemm_warmup ¶
 Warm up DeepGEMM kernels. DeepGEMM JIT-compiles its kernels on first use, so this warmup aims to JIT-compile every kernel that will be used during model execution ahead of time.
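The idea can be illustrated with a minimal, hypothetical sketch: a shape-keyed compile cache is populated before serving, so the one-time JIT cost is never paid on a live request. The function and cache below are illustrative stand-ins, not vllm's or DeepGEMM's actual API:

```python
from functools import lru_cache

# Hypothetical stand-in for a JIT compiler: compiling is expensive,
# so results are memoized per kernel configuration (here, the M dim).
@lru_cache(maxsize=None)
def jit_compile_gemm_kernel(m: int) -> str:
    # A real implementation would invoke the DeepGEMM JIT here.
    return f"gemm_kernel_m{m}"

def warmup(m_values: list[int]) -> None:
    """Compile every kernel variant once, before serving traffic."""
    for m in m_values:
        jit_compile_gemm_kernel(m)

warmup([64, 128, 256])
```

After warmup, any request whose shape matches a warmed-up configuration hits the cache instead of triggering a compile.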
  GROUPED_FP8_GEMM_NT_CONTIGUOUS_WARMUP_CACHE  module-attribute  ¶
    _deepgemm_fp8_gemm_nt_warmup ¶
  Source code in vllm/model_executor/warmup/deep_gemm_warmup.py
   _deepgemm_grouped_fp8_gemm_nt_contiguous_warmup ¶
```python
_deepgemm_grouped_fp8_gemm_nt_contiguous_warmup(
    w1: Tensor,
    w2: Tensor,
    w1_scale: Tensor,
    w2_scale: Tensor,
    num_topk: int,
    max_tokens: int,
)
```
   _extract_data_from_fused_moe_module ¶
  Extract weights, weight scales, and num_topk from a FusedMoE module.
   _extract_data_from_linear_base_module ¶
  Extract weights, weight scales and quantization block sizes from the given LinearBase module.
   _fp8_linear_may_use_deep_gemm ¶
  Return True if the given module/layer can be processed with DeepGEMM.
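As a rough illustration, such a predicate typically checks that the layer's quantization scheme matches what DeepGEMM supports: fp8 weights, blockwise scales, and dimensions aligned to the quantization block size. The dataclass fields and block size below are illustrative assumptions, not vllm's actual module attributes:

```python
from dataclasses import dataclass

@dataclass
class LinearLayerInfo:
    # Illustrative layer description; the real check inspects the
    # LinearBase module's quant method and weight tensors.
    weight_dtype: str      # e.g. "fp8_e4m3"
    block_quantized: bool  # DeepGEMM needs blockwise weight scales
    n: int                 # output dimension
    k: int                 # input dimension

def fp8_linear_may_use_deep_gemm(layer: LinearLayerInfo,
                                 block_size: int = 128) -> bool:
    """Hypothetical check: fp8 weights, blockwise scales, aligned dims."""
    return (layer.weight_dtype == "fp8_e4m3"
            and layer.block_quantized
            and layer.n % block_size == 0
            and layer.k % block_size == 0)
```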
   _fused_moe_grouped_gemm_may_use_deep_gemm ¶
   _generate_optimal_warmup_m_values ¶
  Generate M values that cover all possible DeepGEMM kernel configurations. Reference: https://github.com/deepseek-ai/DeepGEMM/blob/79f48ee15a82dd5fad5cd9beaa393c1f755e6b55/csrc/jit_kernels/heuristics/common.hpp
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `max_tokens` | `int` | Maximum number of tokens to warm up for | *required* |
| `n` | `int` | The actual N dimension from the weight tensor | *required* |
| `device` | `device` | The torch device to get properties from | *required* |
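A simplified, hypothetical sketch of the idea: enumerate one M value per tile configuration DeepGEMM might select, so each kernel variant is compiled exactly once. The block sizes below are assumptions for illustration; the real heuristics live in the referenced `common.hpp` and also depend on `n` and the device's SM count:

```python
def generate_warmup_m_values(max_tokens: int,
                             block_ms: tuple[int, ...] = (64, 128, 256)
                             ) -> list[int]:
    """Return a sorted, deduplicated list of M values to warm up.

    Covers every multiple of each assumed block_m up to max_tokens,
    plus max_tokens itself, so each tile configuration is exercised.
    """
    ms: set[int] = {max_tokens}
    for bm in block_ms:
        ms.update(range(bm, max_tokens + 1, bm))
    return sorted(ms)
```

Deduplicating matters because different block sizes share multiples (e.g. 128 is a multiple of both 64 and 128), and warming the same M twice would waste startup time.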