vllm.transformers_utils.configs.radio ¶
 Radio vision model configuration
  VIT_TIMM_DIM_BY_NAME  module-attribute  ¶
 VIT_TIMM_DIM_BY_NAME: dict[
    str, tuple[int, int, int, int]
] = {
    "vit_small_patch16_224": (384, 12, 6, 1536),
    "vit_base_patch16_224": (768, 12, 12, 3072),
    "vit_large_patch16_224": (1024, 24, 16, 4096),
    "vit_huge_patch16_224": (1280, 32, 16, 5120),
}
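A minimal sketch of how an entry in this mapping can be unpacked. The tuple ordering used below (hidden size, depth, attention heads, intermediate size) is an assumption inferred from the standard timm ViT variants; it is not stated explicitly on this page.

```python
# Sketch: look up architecture dimensions by timm model name.
# Tuple ordering (hidden_size, num_layers, num_heads, intermediate_size)
# is an assumption inferred from the standard timm ViT variants.
VIT_TIMM_DIM_BY_NAME: dict[str, tuple[int, int, int, int]] = {
    "vit_small_patch16_224": (384, 12, 6, 1536),
    "vit_base_patch16_224": (768, 12, 12, 3072),
    "vit_large_patch16_224": (1024, 24, 16, 4096),
    "vit_huge_patch16_224": (1280, 32, 16, 5120),
}

hidden_size, num_layers, num_heads, intermediate_size = VIT_TIMM_DIM_BY_NAME[
    "vit_base_patch16_224"
]
print(hidden_size, num_layers, num_heads, intermediate_size)  # 768 12 12 3072
```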
  RadioConfig ¶
  Bases: PretrainedConfig
This is the configuration class to store the configuration of a Radio vision model. It is used to instantiate a Radio model according to the specified arguments, defining the model architecture.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model_name | str | Name of the vision transformer model (e.g., "vit_base_patch16_224"). Used to determine the architecture dimensions from VIT_TIMM_DIM_BY_NAME. | required |
| image_size | int | The size (resolution) of each image. | 224 |
| patch_size | int | The size (resolution) of each patch. | 16 |
| qkv_bias | bool | Whether to add a bias to the queries, keys and values. | True |
| qk_normalization | bool | Whether to apply normalization to the queries and keys. | False |
| norm_type | str | The normalization type to use. | 'layer_norm' |
| layer_norm_eps | float | The epsilon used by the layer normalization layers. | 1e-06 |
| initializer_factor | float | A factor for initializing all weight matrices. | 1.0 |
| hidden_act | str | The non-linear activation function in the encoder. | 'gelu' |
| max_img_size | int | Maximum image size for position embeddings. | 2048 |
| norm_mean | tuple[float, float, float] \| list | Mean values for image normalization (RGB channels). Defaults to (0.48145466, 0.4578275, 0.40821073). | OPENAI_CLIP_MEAN |
| norm_std | tuple[float, float, float] \| list | Standard deviation values for image normalization (RGB channels). Defaults to (0.26862954, 0.26130258, 0.27577711). | OPENAI_CLIP_STD |
| reg_tokens | int \| None | Number of register tokens to use. | None |
Source code in vllm/transformers_utils/configs/radio.py
   norm_mean  instance-attribute  ¶
 norm_mean = (
    list(norm_mean)
    if isinstance(norm_mean, (tuple, list))
    else norm_mean
)
  norm_std  instance-attribute  ¶
 norm_std = (
    list(norm_std)
    if isinstance(norm_std, (tuple, list))
    else norm_std
)
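The attribute handling above coerces tuple-valued normalization statistics into lists while leaving any other value unchanged, which keeps the config JSON-serializable. A standalone sketch of that logic (`coerce_norm` is a hypothetical helper name, not part of the class):

```python
# Sketch of the norm_mean/norm_std coercion shown above: tuples and
# lists are normalized to lists (so they serialize cleanly to JSON);
# any other value is stored unchanged. "coerce_norm" is a hypothetical
# name for illustration only.
def coerce_norm(value):
    return list(value) if isinstance(value, (tuple, list)) else value

print(coerce_norm((0.48145466, 0.4578275, 0.40821073)))
# [0.48145466, 0.4578275, 0.40821073]
print(coerce_norm(None))
# None
```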
  __init__ ¶
 __init__(
    model_name: str,
    image_size: int = 224,
    patch_size: int = 16,
    qkv_bias: bool = True,
    qk_normalization: bool = False,
    norm_type: str = "layer_norm",
    layer_norm_eps: float = 1e-06,
    initializer_factor: float = 1.0,
    hidden_act: str = "gelu",
    max_img_size: int = 2048,
    norm_mean: tuple[float, float, float]
    | list = OPENAI_CLIP_MEAN,
    norm_std: tuple[float, float, float]
    | list = OPENAI_CLIP_STD,
    reg_tokens: int | None = None,
    **kwargs,
)