vllm.model_executor.layers.fused_moe.prepare_finalize
   MoEPrepareAndFinalizeNoEP
  Bases: FusedMoEPrepareAndFinalize
Source code in vllm/model_executor/layers/fused_moe/prepare_finalize.py
   finalize
 finalize(
    output: Tensor,
    fused_expert_output: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    apply_router_weight_on_input: bool,
    weight_and_reduce_impl: TopKWeightAndReduce,
) -> None
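
The parameter names suggest that `finalize` combines each token's top-k expert outputs into `output`, and that `apply_router_weight_on_input` controls whether the router weights are applied at this stage. The sketch below shows that weighted reduction on plain Python lists. It assumes this reading of the signature; the helper name `weight_and_reduce` and the nested-list layout are hypothetical and are not part of vLLM.

```python
def weight_and_reduce(fused_expert_output, topk_weights, apply_router_weight_on_input):
    """Hypothetical helper: reduce per-token top-k expert outputs to one row per token.

    fused_expert_output: nested list shaped [num_tokens][topk][hidden]
    topk_weights:        nested list shaped [num_tokens][topk]
    """
    output = []
    for token_rows, weights in zip(fused_expert_output, topk_weights):
        acc = [0.0] * len(token_rows[0])
        for row, weight in zip(token_rows, weights):
            # Assumption: if the router weight was already applied to the
            # input, the expert outputs are summed without rescaling.
            scale = 1.0 if apply_router_weight_on_input else weight
            for j, value in enumerate(row):
                acc[j] += scale * value
        output.append(acc)
    return output


# One token, two selected experts, hidden size 2.
expert_out = [[[1.0, 2.0], [3.0, 4.0]]]
weights = [[0.25, 0.75]]
reduced = weight_and_reduce(expert_out, weights, apply_router_weight_on_input=False)
```

Here `reduced` is a single row holding the weighted sum of the two expert rows. The real method returns `None` and takes an `output` tensor plus a `TopKWeightAndReduce` object, so it presumably writes into `output` and delegates the reduction rather than returning a new value.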
   prepare
 prepare(
    a1: Tensor,
    topk_weights: Tensor,
    topk_ids: Tensor,
    num_experts: int,
    expert_map: Tensor | None,
    apply_router_weight_on_input: bool,
    quant_config: FusedMoEQuantConfig,
) -> PrepareResultType
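
`prepare` receives the same `apply_router_weight_on_input` flag as `finalize`, which suggests the router weight can instead be multiplied into the activations `a1` before the experts run. The sketch below illustrates that input-side scaling on plain Python lists. It is an assumption about the flag's meaning, not vLLM's implementation: the helper name is hypothetical, the restriction to a single selected expert per token is assumed, and quantization via `quant_config` and the structure of `PrepareResultType` are not modeled.

```python
def scale_input_by_router_weight(a1, topk_weights, apply_router_weight_on_input):
    """Hypothetical helper: optionally fold router weights into the activations.

    a1:           nested list shaped [num_tokens][hidden]
    topk_weights: nested list shaped [num_tokens][topk]
    """
    if not apply_router_weight_on_input:
        return a1
    # Assumption: scaling the input is only well defined when each token
    # is routed to exactly one expert (topk == 1).
    if any(len(weights) != 1 for weights in topk_weights):
        raise ValueError("input-side router weighting assumes topk == 1")
    return [
        [weights[0] * value for value in row]
        for row, weights in zip(a1, topk_weights)
    ]


# One token with hidden size 2, routed to a single expert with weight 0.5.
scaled = scale_input_by_router_weight([[2.0, 4.0]], [[0.5]], apply_router_weight_on_input=True)
```

With the flag set to `False` the activations pass through unchanged, which matches the reduction-side weighting being applied later instead.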