
mmseg.apis

class mmseg.apis.MMSegInferencer(model: Union[dict, mmengine.config.config.Config, mmengine.config.config.ConfigDict, str], weights: Optional[str] = None, classes: Optional[Union[str, List]] = None, palette: Optional[Union[str, List]] = None, dataset_name: Optional[str] = None, device: Optional[str] = None, scope: Optional[str] = 'mmseg')[source]

Semantic segmentation inferencer, which provides inference and visualization interfaces. Note: MMEngine >= 0.5.0 is required.

Parameters
  • model (str, optional) – Path to the config file or the model name defined in a metafile. Taking the mmseg metafile as an example, the model could be “fcn_r50-d8_4xb2-40k_cityscapes-512x1024”, in which case the model weights will be downloaded automatically. If a config file such as “configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py” is used, the weights must be specified.

  • weights (str, optional) – Path to the checkpoint. If it is not specified and model is a model name of metafile, the weights will be loaded from metafile. Defaults to None.

  • classes (list, optional) – Input classes for result rendering. As the prediction of a segmentation model is a segment map with label indices, classes is a list whose items correspond to the label indices. If classes is not defined, the visualizer will use the Cityscapes classes by default. Defaults to None.

  • palette (list, optional) – Input palette for result rendering, which is a list of colors corresponding to the classes. If palette is not defined, the visualizer will use the Cityscapes palette by default. Defaults to None.

  • dataset_name (str, optional) – Dataset name or alias. The visualizer will use the meta information of the dataset, i.e. its classes and palette, but explicitly given classes and palette take higher priority. Defaults to None.

  • device (str, optional) – Device to run inference. If None, the available device will be automatically used. Defaults to None.

  • scope (str, optional) – The scope of the model. Defaults to ‘mmseg’.
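
A minimal usage sketch (the model name follows the metafile convention described above; the image path is a placeholder):

from mmseg.apis import MMSegInferencer

# weights are resolved from the metafile automatically
inferencer = MMSegInferencer(model='fcn_r50-d8_4xb2-40k_cityscapes-512x1024')
# run inference on one image; a list of paths or loaded arrays also works
result = inferencer('demo/demo.png', show=False)
# per postprocess() below, the result dict has 'predictions' and 'visualization'
print(result['predictions'].shape)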

postprocess(preds: Union[mmseg.structures.seg_data_sample.SegDataSample, Sequence[mmseg.structures.seg_data_sample.SegDataSample]], visualization: List[numpy.ndarray], return_datasample: bool = False, pred_out_dir: str = '') → dict[source]

Process the predictions and visualization results from forward and visualize.

This method should be responsible for the following tasks:

  1. Pack the predictions and visualization results and return them.

  2. Save the predictions, if needed.

Parameters
  • preds (List[Dict]) – Predictions of the model.

  • visualization (List[np.ndarray]) – The list of rendered color segmentation masks.

  • return_datasample (bool) – Whether to return results as datasamples. Defaults to False.

  • pred_out_dir (str) – Directory to save the inference results without visualization. If left as empty, no file will be saved. Defaults to ‘’.

Returns

Inference and visualization results with keys ‘predictions’ and ‘visualization’:

  • visualization (Any): Returned by visualize()

  • predictions (List[np.ndarray], np.ndarray): Returned by forward() and processed in postprocess(). If return_datasample=False, it will be the segmentation mask with label indices.

Return type

dict

visualize(inputs: list, preds: List[dict], show: bool = False, wait_time: int = 0, img_out_dir: str = '', opacity: float = 0.8) → List[numpy.ndarray][source]

Visualize predictions.

Parameters
  • inputs (list) – Inputs preprocessed by _inputs_to_list().

  • preds (Any) – Predictions of the model.

  • show (bool) – Whether to display the image in a popup window. Defaults to False.

  • wait_time (float) – The interval of show (s). Defaults to 0.

  • img_out_dir (str) – Output directory of the rendered prediction, i.e. the color segmentation mask. Defaults to ‘’.

  • opacity (int, float) – The transparency of segmentation mask. Defaults to 0.8.

Returns

Visualization results.

Return type

List[np.ndarray]

mmseg.apis.inference_model(model: mmseg.models.segmentors.base.BaseSegmentor, img: Union[str, numpy.ndarray, Sequence[str], Sequence[numpy.ndarray]]) → Union[mmseg.structures.seg_data_sample.SegDataSample, Sequence[mmseg.structures.seg_data_sample.SegDataSample]][source]

Inference image(s) with the segmentor.

Parameters
  • model (nn.Module) – The loaded segmentor.

  • img (str/ndarray or list[str/ndarray]) – Either image files or loaded images.

Returns

If img is a list or tuple, a list of results of the same length will be returned; otherwise the segmentation result is returned directly.

Return type

SegDataSample or list[SegDataSample]

mmseg.apis.init_model(config: Union[str, pathlib.Path, mmengine.config.config.Config], checkpoint: Optional[str] = None, device: str = 'cuda:0', cfg_options: Optional[dict] = None)[source]

Initialize a segmentor from config file.

Parameters
  • config (str, Path, or mmengine.Config) – Config file path, Path, or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.

  • device (str, optional) – CPU/CUDA device option. Defaults to ‘cuda:0’. Use ‘cpu’ for loading the model on CPU.

  • cfg_options (dict, optional) – Options to override some settings in the used config.

Returns

The constructed segmentor.

Return type

nn.Module

mmseg.apis.show_result_pyplot(model: mmseg.models.segmentors.base.BaseSegmentor, img: Union[str, numpy.ndarray], result: mmseg.structures.seg_data_sample.SegDataSample, opacity: float = 0.5, title: str = '', draw_gt: bool = True, draw_pred: bool = True, wait_time: float = 0, show: bool = True, save_dir=None, out_file=None)[source]

Visualize the segmentation results on the image.

Parameters
  • model (nn.Module) – The loaded segmentor.

  • img (str or np.ndarray) – Image filename or loaded image.

  • result (SegDataSample) – The prediction SegDataSample result.

  • opacity (float) – Opacity of painted segmentation map. Default 0.5. Must be in (0, 1] range.

  • title (str) – The title of pyplot figure. Default is ‘’.

  • draw_gt (bool) – Whether to draw GT SegDataSample. Default to True.

  • draw_pred (bool) – Whether to draw Prediction SegDataSample. Defaults to True.

  • wait_time (float) – The interval of show (s). 0 is the special value that means “forever”. Defaults to 0.

  • show (bool) – Whether to display the drawn image. Default to True.

  • save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data.

  • out_file (str, optional) – Path to output file. Default to None.

Returns

The drawn image, with channels in RGB order.

Return type

np.ndarray
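
Taken together, a typical low-level workflow with these three functions looks like the following sketch (config and checkpoint paths are placeholders):

from mmseg.apis import inference_model, init_model, show_result_pyplot

config_file = 'configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py'  # placeholder
checkpoint_file = 'fcn_r50-d8_cityscapes.pth'  # placeholder

# build the segmentor and load weights
model = init_model(config_file, checkpoint_file, device='cuda:0')
# run inference on a single image (a list of images would return a list)
result = inference_model(model, 'demo/demo.png')
# draw the prediction on the image; returns the drawn image in RGB order
vis = show_result_pyplot(model, 'demo/demo.png', result, opacity=0.5, show=False)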

mmseg.datasets

datasets

class mmseg.datasets.ADE20KDataset(img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

ADE20K dataset.

In segmentation map annotation for ADE20K, 0 stands for background, which is not included in 150 categories. reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.AdjustGamma(gamma=1.0)[source]

Using gamma correction to process the image.

Required Keys:

  • img

Modified Keys:

  • img

Parameters

gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.
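
As a sketch, LUT-based gamma correction of a uint8 image (a common formulation; illustrative, not necessarily the exact implementation) can be written as:

import numpy as np

def adjust_gamma(img: np.ndarray, gamma: float = 1.0) -> np.ndarray:
    # build a lookup table mapping every uint8 value through the gamma curve
    inv_gamma = 1.0 / gamma
    table = ((np.arange(256) / 255.0) ** inv_gamma * 255).astype(np.uint8)
    return table[img]  # fancy indexing applies the LUT per pixel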

transform(results: dict) → dict[source]

Call function to process the image with gamma correction.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Processed results.

Return type

dict

class mmseg.datasets.BaseSegDataset(ann_file: str = '', img_suffix='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]

Custom dataset for semantic segmentation. An example of the file structure is as follows.

├── data
│   ├── my_dataset
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── ann_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{seg_map_suffix}
│   │   │   │   ├── yyy{seg_map_suffix}
│   │   │   │   ├── zzz{seg_map_suffix}
│   │   │   ├── val

Each img/gt_semantic_seg pair of BaseSegDataset should share the same prefix, differing only in suffix. A valid img/gt_semantic_seg filename pair should be like xxx{img_suffix} and xxx{seg_map_suffix} (the extension is also included in the suffix). If split is given, then xxx is specified in the txt file. Otherwise, all files in img_dir and ann_dir will be loaded. Please refer to docs/en/tutorials/new_dataset.md for more details.
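
A dataset config matching this layout might look like the following sketch (paths and pipeline contents are illustrative):

dataset = dict(
    type='BaseSegDataset',
    data_root='data/my_dataset',
    data_prefix=dict(img_path='img_dir/train', seg_map_path='ann_dir/train'),
    img_suffix='.jpg',
    seg_map_suffix='.png',
    pipeline=[
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations'),
        dict(type='PackSegInputs'),
    ])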

Parameters
  • ann_file (str) – Annotation file path. Defaults to ‘’.

  • metainfo (dict, optional) – Meta information for the dataset, such as specifying the classes to load. Defaults to None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to None.

  • data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img_path=None, seg_map_path=None).

  • img_suffix (str) – Suffix of images. Default: ‘.jpg’

  • seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’

  • filter_cfg (dict, optional) – Config for filtering data. Defaults to None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Defaults to None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects, when enabled, data loader workers can use shared RAM from master process instead of making a copy. Defaults to True.

  • pipeline (list, optional) – Processing pipeline. Defaults to [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Defaults to False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.

  • max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to fetch a valid image. Defaults to 1000.

  • ignore_index (int) – The label index to be ignored. Default: 255

  • reduce_zero_label (bool) – Whether to mark label zero as ignored. Default to False.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

classmethod get_label_map(new_classes: Optional[Sequence] = None) → Optional[Dict][source]

Require label mapping.

The label_map is a dictionary whose keys are the old label ids and whose values are the new label ids; it is used for changing pixel labels in load_annotations. label_map is not None if and only if the old classes in cls.METAINFO differ from the new classes in self._metainfo and neither of them is None.

Parameters

new_classes (list, tuple, optional) – The new classes name from metainfo. Default to None.

Returns

The mapping from old classes in cls.METAINFO to new classes in self._metainfo.

Return type

dict, optional
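
For illustration, assume a dataset whose cls.METAINFO classes are ('person', 'car', 'bike') and whose metainfo requests only ('car', 'person'); assuming removed classes are mapped to the ignore index 255, the resulting mapping would be:

# hypothetical classes, for illustration only
# old: ('person', 'car', 'bike')  ->  new: ('car', 'person')
label_map = {0: 1, 1: 0, 2: 255}  # 'bike' has no new id, so it is ignored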

load_data_list() → List[dict][source]

Load annotation from directory or annotation file.

Returns

All data info of dataset.

Return type

list[dict]

class mmseg.datasets.BioMedical3DPad(pad_shape: Tuple[int, int, int], pad_val: float = 0.0, seg_pad_val: int = 0)[source]

Pad the biomedical 3d image & biomedical 3d semantic segmentation maps.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Modified Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Added Keys:

  • pad_shape (Tuple[int, int, int]): The padded shape.

Parameters
  • pad_shape (Tuple[int, int, int]) – Fixed padding size. Expected padding shape (Z, Y, X).

  • pad_val (float) – Padding value for biomedical image. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.

  • seg_pad_val (int) – Padding value for biomedical 3d semantic segmentation maps. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.

transform(results: dict) → dict[source]

Call function to pad images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Updated result dict.

Return type

dict

class mmseg.datasets.BioMedical3DRandomCrop(crop_shape: Union[int, Tuple[int, int, int]], keep_foreground: bool = True)[source]

Crop the input patch for medical image & segmentation mask.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

  • gt_seg_map (np.ndarray, optional): Biomedical semantic segmentation mask

    with shape (Z, Y, X).

Modified Keys:

  • img

  • img_shape

  • gt_seg_map (optional)

Parameters
  • crop_shape (Union[int, Tuple[int, int, int]]) – Expected size after cropping with the format of (z, y, x). If set to an integer, then the cropping size along each dimension is equal to this integer.

  • keep_foreground (bool) – If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the crop bounding-box. Default to True.

crop(img: numpy.ndarray, crop_bbox: tuple) → numpy.ndarray[source]

Crop from img

Parameters
  • img (np.ndarray) – Original input image.

  • crop_bbox (tuple) – Coordinates of the cropped image.

Returns

The cropped image.

Return type

np.ndarray

generate_margin(results: dict) → tuple[source]

Generate margin of crop bounding-box.

If keep_foreground is True, it will randomly sample a voxel of the foreground classes, take it as the center of the bounding-box, and return the margin between the bounding-box and the image. If keep_foreground is False, it will return the difference between the crop shape and the image shape.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

The margin for 3 dimensions of crop bounding-box and image.

Return type

tuple

random_generate_crop_bbox(margin_z: int, margin_y: int, margin_x: int) → tuple[source]

Randomly get a crop bounding box.

Parameters

margin_z, margin_y, margin_x (int) – The margins of the crop bounding-box along the three dimensions.

Returns

Coordinates of the cropped image.

Return type

tuple

random_sample_location(seg_map: numpy.ndarray) → dict[source]

Sample a foreground voxel when keep_foreground is True.

Parameters

seg_map (np.ndarray) – gt seg map.

Returns

Coordinates of selected foreground voxel.

Return type

dict

transform(results: dict) → dict[source]

Transform function to randomly crop images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results; the ‘img_shape’ key in the result dict is updated according to the crop size.

Return type

dict

class mmseg.datasets.BioMedical3DRandomFlip(prob: float, axes: Tuple[int, ...], swap_label_pairs: Optional[List[Tuple[int, int]]] = None)[source]

Flip biomedical 3D images and segmentations.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/spatial_transforms.py # noqa:E501

Copyright 2021 Division of Medical Image Computing, German Cancer Research Center (DKFZ) and Applied Computer Vision Lab, Helmholtz Imaging Platform. Licensed under the Apache-2.0 License.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Modified Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Added Keys:

  • do_flip

  • flip_axes

Parameters
  • prob (float) – Flipping probability.

  • axes (Tuple[int, ...]) – Flipping axes with order ‘ZXY’.

  • swap_label_pairs (Optional[List[Tuple[int, int]]]) – The segmentation label pairs that are swapped when flipping.

transform(results: Dict) → Dict[source]

Call function to flip and swap pair labels.

Parameters

results (dict) – Result dict.

Returns

Flipped results; ‘do_flip’ and ‘flip_axes’ keys are added into the result dict.

Return type

dict

class mmseg.datasets.BioMedicalGaussianBlur(sigma_range: Tuple[float, float] = (0.5, 1.0), prob: float = 0.2, prob_per_channel: float = 0.5, different_sigma_per_channel: bool = True, different_sigma_per_axis: bool = True)[source]

Add Gaussian blur with random sigma to image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L81 # noqa:E501

Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • sigma_range (Tuple[float, float]|float) – range to randomly select sigma value. Default to (0.5, 1.0).

  • prob (float) – Probability to apply Gaussian blur for each sample. Default to 0.2.

  • prob_per_channel (float) – Probability to apply Gaussian blur for each channel (axis N of the image). Default to 0.5.

  • different_sigma_per_channel (bool) – whether to use different sigma for each channel (axis N of the image). Default to True.

  • different_sigma_per_axis (bool) – whether to use different sigma for axis Z, X and Y of the image. Default to True.

transform(results: Dict) → Dict[source]

Call function to add random Gaussian blur to image.

Parameters

results (dict) – Result dict.

Returns

Result dict with random Gaussian blur applied.

Return type

dict

class mmseg.datasets.BioMedicalGaussianNoise(prob: float = 0.1, mean: float = 0.0, std: float = 0.1)[source]

Add random Gaussian noise to image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L53 # noqa:E501

Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • prob (float) – Probability to add Gaussian noise for each sample. Default to 0.1.

  • mean (float) – Mean or “centre” of the distribution. Default to 0.0.

  • std (float) – Standard deviation of distribution. Default to 0.1.

transform(results: Dict) → Dict[source]

Call function to add random Gaussian noise to image.

Parameters

results (dict) – Result dict.

Returns

Result dict with random Gaussian noise.

Return type

dict

class mmseg.datasets.BioMedicalRandomGamma(prob: float = 0.5, gamma_range: Tuple[float] = (0.5, 2), invert_image: bool = False, per_channel: bool = False, retain_stats: bool = False)[source]

Using random gamma correction to process the biomedical image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/color_transforms.py#L132 # noqa:E501 With licence: Apache 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • prob (float) – The probability to perform this transform. Default: 0.5.

  • gamma_range (Tuple[float]) – Range of gamma values. Default: (0.5, 2).

  • invert_image (bool) – Whether invert the image before applying gamma augmentation. Default: False.

  • per_channel (bool) – Whether perform the transform each channel individually. Default: False

  • retain_stats (bool) – Gamma transformation will alter the mean and std of the data in the patch. If retain_stats=True, the data will be transformed to match the mean and standard deviation before gamma augmentation. Default: False.

transform(results: dict) → dict[source]

Call function to perform random gamma correction.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with random gamma correction performed.

Return type

dict

class mmseg.datasets.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]

Use CLAHE method to process the image.

See ZUIDERVELD,K. Contrast Limited Adaptive Histogram Equalization[J]. Graphics Gems, 1994:474-485. for more information.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • clip_limit (float) – Threshold for contrast limiting. Default: 40.0.

  • tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).

transform(results: dict) → dict[source]

Call function to process images with the CLAHE method.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Processed results.

Return type

dict

class mmseg.datasets.COCOStuffDataset(img_suffix='.jpg', seg_map_suffix='_labelTrainIds.png', **kwargs)[source]

COCO-Stuff dataset.

In segmentation map annotation for COCO-Stuff, the Train-IDs of the 10k version are from 1 to 171, where 0 is the ignore index, and the Train-IDs of COCO-Stuff 164k are from 0 to 170, where 255 is the ignore index. Both versions therefore cover 171 semantic categories. reduce_zero_label is set to True and False for the 10k and 164k versions, respectively. The img_suffix is fixed to ‘.jpg’, and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.ChaseDB1Dataset(img_suffix='.png', seg_map_suffix='_1stHO.png', reduce_zero_label=False, **kwargs)[source]

Chase_db1 dataset.

In segmentation map annotation for Chase_db1, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_1stHO.png’.

class mmseg.datasets.CityscapesDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtFine_labelTrainIds.png', **kwargs)[source]

Cityscapes dataset.

The img_suffix is fixed to ‘_leftImg8bit.png’ and seg_map_suffix is fixed to ‘_gtFine_labelTrainIds.png’ for Cityscapes dataset.

class mmseg.datasets.DRIVEDataset(img_suffix='.png', seg_map_suffix='_manual1.png', reduce_zero_label=False, **kwargs)[source]

DRIVE dataset.

In segmentation map annotation for DRIVE, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_manual1.png’.

class mmseg.datasets.DarkZurichDataset(img_suffix='_rgb_anon.png', seg_map_suffix='_gt_labelTrainIds.png', **kwargs)[source]

DarkZurichDataset dataset.

class mmseg.datasets.DecathlonDataset(ann_file: str = '', img_suffix='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]

Dataset for the Medical Segmentation Decathlon.

The dataset.json format is shown as follows

{
    "name": "BRATS",
    "tensorImageSize": "4D",
    "modality":
    {
        "0": "FLAIR",
        "1": "T1w",
        "2": "t1gd",
        "3": "T2w"
    },
    "labels": {
        "0": "background",
        "1": "edema",
        "2": "non-enhancing tumor",
        "3": "enhancing tumour"
    },
    "numTraining": 484,
    "numTest": 266,
    "training":
    [
        {
            "image": "./imagesTr/BRATS_306.nii.gz",
            "label": "./labelsTr/BRATS_306.nii.gz",
            ...
        }
    ],
    "test":
    [
        "./imagesTs/BRATS_557.nii.gz",
        ...
    ]
}

load_data_list() → List[dict][source]

Load annotation from directory or annotation file.

Returns

All data info of dataset.

Return type

list[dict]

class mmseg.datasets.GenerateEdge(edge_width: int = 3, ignore_index: int = 255)[source]

Generate Edge for CE2P approach.

Edge will be used to calculate loss of CE2P.

Modified from https://github.com/liutinglt/CE2P/blob/master/dataset/target_generation.py # noqa:E501

Required Keys:

  • img_shape

  • gt_seg_map

Added Keys:

  • gt_edge_map (np.ndarray, uint8): The edge annotation generated from the

    seg map by extracting border between different semantics.

Parameters
  • edge_width (int) – The width of edge. Default to 3.

  • ignore_index (int) – Index that will be ignored. Default to 255.

transform(results: Dict) → Dict[source]

Call function to generate edge from segmentation map.

Parameters

results (dict) – Result dict.

Returns

Result dict with edge mask.

Return type

dict

class mmseg.datasets.HRFDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]

HRF dataset.

In segmentation map annotation for HRF, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.ISPRSDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

ISPRS dataset.

In segmentation map annotation for ISPRS, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

class mmseg.datasets.LIPDataset(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

LIP dataset.

The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.LoadAnnotations(reduce_zero_label=None, backend_args=None, imdecode_backend='pillow')[source]

Load annotations for semantic segmentation provided by dataset.

The annotation format is as follows:

{
    # Filename of semantic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}

After this module, the annotation has been changed to the format below:

{
    # in str
    'seg_fields': List
     # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}

Required Keys:

  • seg_map_path (str): Path of semantic segmentation ground truth file.

Added Keys:

  • seg_fields (List)

  • gt_seg_map (np.uint8)

Parameters
  • reduce_zero_label (bool, optional) – Whether to reduce all label values by 1. Usually used for datasets where 0 is the background label. Defaults to None.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to ‘pillow’.

  • backend_args (dict) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
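
A sketch of how this transform is typically placed in a training pipeline (the surrounding transforms and values are illustrative):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=False),
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='PackSegInputs'),
]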

class mmseg.datasets.LoadBiomedicalAnnotation(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]

Load seg_map annotation provided by biomedical dataset.

The annotation format is as follows:

{
    'gt_seg_map': np.ndarray (X, Y, Z) or (Z, Y, X)
}

Required Keys:

  • seg_map_path

Added Keys:

  • gt_seg_map (np.ndarray): Biomedical seg map with shape (Z, Y, X) by

    default, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • to_float32 (bool) – Whether to convert the loaded seg map to a float32 numpy array. If set to False, the loaded seg map is a float64 array. Defaults to True.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See mmengine.fileio for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict) → Dict[source]

Function to load image.

Parameters

results (dict) – Result dict from mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoadBiomedicalData(with_seg=False, decode_backend: str = 'numpy', to_xyz: bool = False, backend_args: Optional[dict] = None)[source]

Load a biomedical image and annotation from file.

The loading data format is as follows:

{
    'img': np.ndarray data[:-1, X, Y, Z]
    'seg_map': np.ndarray data[-1, X, Y, Z]
}

Required Keys:

  • img_path

Added Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

  • img_shape

  • ori_shape

Parameters
  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.

  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘numpy’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict) → Dict[source]

Function to load image.

Parameters

results (dict) – Result dict from mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoadBiomedicalImageFromFile(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]

Load a biomedical image from file.

Required Keys:

  • img_path

Added Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.

  • img_shape

  • ori_shape

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict) → Dict[source]

Function to load image.

Parameters

results (dict) – Result dict from mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]

Load an image from results['img'].

Similar to LoadImageFromFile, but the image has already been loaded as an np.ndarray in results['img']. Can be used when loading images from a webcam.

Required Keys:

  • img

Modified Keys:

  • img

  • img_path

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.

transform(results: dict) → dict[source]

Transform function to add image meta information.

Parameters

results (dict) – Result dict with Webcam read image in results['img'].

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoveDADataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

LoveDA dataset.

In segmentation map annotation for LoveDA, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

class mmseg.datasets.MapillaryDataset_v1(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

Mapillary Vistas Dataset.

Dataset paper link: http://ieeexplore.ieee.org/document/8237796/

v1.2 contains 66 object classes (37 instance-specific).

v2.0 contains 124 object classes (70 instance-specific, 46 stuff, 8 void or crowd).

The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’ for Mapillary Vistas Dataset.

class mmseg.datasets.MapillaryDataset_v2(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

Mapillary Vistas Dataset.

Dataset paper link: http://ieeexplore.ieee.org/document/8237796/

v1.2 contains 66 object classes (37 instance-specific).

v2.0 contains 124 object classes (70 instance-specific, 46 stuff, 8 void or crowd).

The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’ for Mapillary Vistas Dataset.

class mmseg.datasets.MultiImageMixDataset(dataset: Union[mmengine.dataset.dataset_wrapper.ConcatDataset, dict], pipeline: Sequence[dict], skip_type_keys: Optional[List[str]] = None, lazy_init: bool = False)[source]

A wrapper for a multi-image mixed dataset.

Suitable for training on multiple images mixed data augmentation like mosaic and mixup.

Parameters
  • dataset (ConcatDataset or dict) – The dataset to be mixed.

  • pipeline (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • skip_type_keys (list[str], optional) – Sequence of transform type strings to be skipped in the pipeline. Defaults to None.

full_init()[source]

Loop over the wrapped datasets to call full_init on each of them.

get_data_info(idx: int) → dict[source]

Get annotation by index.

Parameters

idx (int) – Global index of ConcatDataset.

Returns

The idx-th annotation of the datasets.

Return type

dict

property metainfo: dict

Get the meta information of the multi-image-mixed dataset.

Returns

The meta information of multi-image-mixed dataset.

Return type

dict

update_skip_type_keys(skip_type_keys)[source]

Update skip_type_keys.

It is called by an external hook.

Parameters

skip_type_keys (list[str], optional) – Sequence of transform type strings to be skipped in the pipeline.

class mmseg.datasets.NightDrivingDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtCoarse_labelTrainIds.png', **kwargs)[source]

NightDrivingDataset dataset.

class mmseg.datasets.PackSegInputs(meta_keys=('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label'))[source]

Pack the inputs data for the semantic segmentation.

The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default this includes:

  • img_path: filename of the image

  • ori_shape: original shape of the image as a tuple (h, w, c)

  • img_shape: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.

  • pad_shape: shape of padded images

  • scale_factor: a float indicating the preprocessing scale

  • flip: a boolean indicating if image flip transform was used

  • flip_direction: the flipping direction

Parameters

meta_keys (Sequence[str], optional) – Meta keys to be packed from SegDataSample and collected in data[img_metas]. Default: ('img_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction')

transform(results: dict) → dict[source]

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

  • ‘inputs’ (obj:torch.Tensor): The forward data of models.

  • ‘data_sample’ (obj:SegDataSample): The annotation info of the sample.

Return type

dict

class mmseg.datasets.PascalContextDataset(ann_file: str, img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

PascalContext dataset.

In segmentation map annotation for PascalContext, 0 stands for background, which is included in 60 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

Parameters

ann_file (str) – Annotation file path.

class mmseg.datasets.PascalContextDataset59(ann_file: str, img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

PascalContext dataset.

In segmentation map annotation for PascalContext, 0 stands for background, which is excluded from the 59 categories. reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

Parameters

ann_file (str) – Annotation file path.

class mmseg.datasets.PascalVOCDataset(ann_file, img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

Pascal VOC dataset.

Parameters

ann_file (str) – Split txt file for Pascal VOC.

class mmseg.datasets.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[float] = (0.5, 1.5), saturation_range: Sequence[float] = (0.5, 1.5), hue_delta: int = 18)[source]

Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. The random contrast operation is placed either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.

brightness(img: numpy.ndarray) → numpy.ndarray[source]

Brightness distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after brightness change.

Return type

np.ndarray

contrast(img: numpy.ndarray) → numpy.ndarray[source]

Contrast distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after contrast change.

Return type

np.ndarray

convert(img: numpy.ndarray, alpha: int = 1, beta: int = 0) → numpy.ndarray[source]

Multiply with alpha and add beta, with clipping.

Parameters
  • img (np.ndarray) – The input image.

  • alpha (int) – Image weights, change the contrast/saturation of the image. Default: 1

  • beta (int) – Image bias, change the brightness of the image. Default: 0

Returns

The transformed image.

Return type

np.ndarray
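
A minimal numpy sketch of this multiply-add-clip step, assuming uint8 images:

import numpy as np

def convert(img: np.ndarray, alpha: float = 1.0, beta: float = 0.0) -> np.ndarray:
    # scale by alpha (contrast/saturation), shift by beta (brightness),
    # then clip back into the valid uint8 range
    return np.clip(img.astype(np.float32) * alpha + beta, 0, 255).astype(np.uint8)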

hue(img: numpy.ndarray) → numpy.ndarray[source]

Hue distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after hue change.

Return type

np.ndarray

saturation(img: numpy.ndarray) → numpy.ndarray[source]

Saturation distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after saturation change.

Return type

np.ndarray

transform(results: dict) → dict[source]

Transform function to perform photometric distortion on images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with images distorted.

Return type

dict

class mmseg.datasets.PotsdamDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

ISPRS Potsdam dataset.

In segmentation map annotation for Potsdam dataset, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

class mmseg.datasets.REFUGEDataset(**kwargs)[source]

REFUGE dataset.

In segmentation map annotation for REFUGE, 0 stands for background, which is not included in 2 categories. reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]

Convert RGB image to grayscale image.

Required Keys:

  • img

Modified Keys:

  • img

  • img_shape

This transform calculates the weighted mean of the input image channels with weights and then expands the channels to out_channels. When out_channels is None, the number of output channels is the same as the number of input channels.

Parameters
  • out_channels (int) – Expected number of output channels after transforming. Default: None.

  • weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).
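
A numpy sketch of the weighted-mean conversion described above (illustrative, not necessarily the exact implementation):

import numpy as np

def rgb2gray(img: np.ndarray, weights=(0.299, 0.587, 0.114), out_channels=None):
    # weighted mean over the channel axis, keeping a channel dimension
    gray = (img * np.array(weights)).sum(axis=2, keepdims=True)
    # expand back to the requested number of channels
    return np.repeat(gray, out_channels or img.shape[2], axis=2)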

transform(results: dict) → dict[source]

Call function to convert RGB image to grayscale image.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with grayscale image.

Return type

dict

class mmseg.datasets.RandomCrop(crop_size: Union[int, Tuple[int, int]], cat_max_ratio: float = 1.0, ignore_index: int = 255)[source]

Random crop the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • img_shape

  • gt_seg_map

Parameters
  • crop_size (Union[int, Tuple[int, int]]) – Expected size after cropping with the format of (h, w). If set to an integer, then cropping width and height are equal to this integer.

  • cat_max_ratio (float) – The maximum ratio that a single category could occupy.

  • ignore_index (int) – The label index to be ignored. Default: 255

crop(img: numpy.ndarray, crop_bbox: tuple) → numpy.ndarray[source]

Crop from img

Parameters
  • img (np.ndarray) – Original input image.

  • crop_bbox (tuple) – Coordinates of the cropped image.

Returns

The cropped image.

Return type

np.ndarray

transform(results: dict) → dict[source]

Transform function to randomly crop images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results; the ‘img_shape’ key in the result dict is updated according to the crop size.

Return type

dict

class mmseg.datasets.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]

CutOut operation.

Randomly drop some regions of the image, as used in Cutout.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • prob (float) – cutout probability.

  • n_holes (int | tuple[int, int]) – Number of regions to be dropped. If it is given as a list, number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].

  • cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.

  • cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.

  • fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).

  • seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.

transform(results: dict) → dict[source]

Call function to drop some regions of image.

class mmseg.datasets.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]

Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image, which is composed of parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:
    1. Choose the mosaic center as the intersections of 4 images
    2. Get the left top image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. Sub image will be cropped if image is larger than mosaic patch

Required Keys:

  • img

  • gt_seg_map

  • mix_results

Modified Keys:

  • img

  • img_shape

  • ori_shape

  • gt_seg_map

Parameters
  • prob (float) – mosaic probability.

  • img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).

  • pad_val (int) – Pad value. Default: 0.

  • seg_pad_val (int) – Pad value of segmentation map. Default: 255.
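
Because the transform reads the extra images from mix_results, it is used together with MultiImageMixDataset; a config sketch (dataset settings are illustrative):

train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='BaseSegDataset',
        data_root='data/my_dataset',
        data_prefix=dict(img_path='img_dir/train', seg_map_path='ann_dir/train'),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
        ]),
    pipeline=[
        dict(type='RandomMosaic', prob=1.0, img_scale=(512, 512)),
        dict(type='PackSegInputs'),
    ])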

get_indices(dataset: mmseg.datasets.dataset_wrappers.MultiImageMixDataset) → list[source]

Call function to collect indices.

Parameters

dataset (MultiImageMixDataset) – The dataset.

Returns

indices.

Return type

list

transform(results: dict) → dict[source]

Call function to make a mosaic of image.

Parameters

results (dict) – Result dict.

Returns

Result dict with mosaic transformed.

Return type

dict

class mmseg.datasets.RandomRotFlip(rotate_prob=0.5, flip_prob=0.5, degree=(- 20, 20))[source]

Rotate and flip the image & seg or just rotate the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • rotate_prob (float) – The probability of rotating the image.

  • flip_prob (float) – The probability of flipping the image.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

transform(results: dict) → dict[source]

Call function to rotate or rotate & flip image, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Rotated or rotated & flipped results.

Return type

dict

class mmseg.datasets.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]

Rotate the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • prob (float) – The rotation probability.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

  • pad_val (float, optional) – Padding value of image. Default: 0.

  • seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.

  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.

  • auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False

transform(results: dict) → dict[source]

Call function to rotate image, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Rotated results.

Return type

dict

class mmseg.datasets.Rerange(min_value=0, max_value=255)[source]

Rerange the image pixel value.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • min_value (float or int) – Minimum value of the reranged image. Default: 0.

  • max_value (float or int) – Maximum value of the reranged image. Default: 255.

transform(results: dict) → dict[source]

Call function to rerange images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Reranged results.

Return type

dict

class mmseg.datasets.ResizeShortestEdge(scale: Union[int, Tuple[int, int]], max_size: int)[source]

Resize the image and mask while keeping the aspect ratio unchanged.

Modified from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py#L130 # noqa:E501 Copyright (c) Facebook, Inc. and its affiliates. Licensed under the Apache-2.0 License

This transform attempts to scale the shorter edge to the given scale, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.

Required Keys:

  • img

  • gt_seg_map (optional)

Modified Keys:

  • img

  • img_shape

  • gt_seg_map (optional)

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

Parameters
  • scale (Union[int, Tuple[int, int]]) – The target short edge length. If it’s tuple, will select the min value as the short edge length.

  • max_size (int) – The maximum allowed longest edge length.
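
The rule described above can be sketched as the following helper (illustrative only):

def target_size(h: int, w: int, scale: int, max_size: int) -> tuple:
    # scale the shorter edge to `scale` ...
    ratio = scale / min(h, w)
    # ... unless that would push the longer edge past max_size,
    # in which case cap the longer edge instead
    if max(h, w) * ratio > max_size:
        ratio = max_size / max(h, w)
    return round(h * ratio), round(w * ratio)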

transform(results: Dict) → Dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmseg.datasets.ResizeToMultiple(size_divisor=32, interpolation=None)[source]

Resize images & seg to multiple of divisor.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • img_shape

  • pad_shape

Parameters
  • size_divisor (int) – Images and gt seg maps need to be resized to a multiple of size_divisor. Default: 32.

  • interpolation (str, optional) – The interpolation mode of image resize. Default: None

transform(results: dict) → dict[source]

Call function to resize images, semantic segmentation map to multiple of size divisor.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Resized results, ‘img_shape’, ‘pad_shape’ keys are updated.

Return type

dict

class mmseg.datasets.STAREDataset(img_suffix='.png', seg_map_suffix='.ah.png', reduce_zero_label=False, **kwargs)[source]

STARE dataset.

In segmentation map annotation for STARE, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.ah.png’.

class mmseg.datasets.SegRescale(scale_factor=1)[source]

Rescale semantic segmentation maps.

Required Keys:

  • gt_seg_map

Modified Keys:

  • gt_seg_map

Parameters

scale_factor (float) – The scale factor of the final output.

transform(results: dict) → dict[source]

Call function to scale the semantic segmentation map.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with semantic segmentation map scaled.

Return type

dict

class mmseg.datasets.SynapseDataset(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

Synapse dataset.

Before the dataset preprocessing of Synapse, there are in total 13 foreground categories, which do not include background. After preprocessing, 8 foreground categories are kept while the other 5 foreground categories are treated as background. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.iSAIDDataset(img_suffix='.png', seg_map_suffix='_instance_color_RGB.png', ignore_index=255, **kwargs)[source]

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. The segmentation map annotation for the iSAID dataset contains 16 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_instance_color_RGB.png’.

transforms

class mmseg.datasets.transforms.AdjustGamma(gamma=1.0)[source]

Using gamma correction to process the image.

Required Keys:

  • img

Modified Keys:

  • img

Parameters

gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.

transform(results: dict) → dict[source]

Call function to process the image with gamma correction.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Processed results.

Return type

dict

class mmseg.datasets.transforms.BioMedical3DPad(pad_shape: Tuple[int, int, int], pad_val: float = 0.0, seg_pad_val: int = 0)[source]

Pad the biomedical 3d image & biomedical 3d semantic segmentation maps.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Modified Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Added Keys:

  • pad_shape (Tuple[int, int, int]): The padded shape.

Parameters
  • pad_shape (Tuple[int, int, int]) – Fixed padding size. Expected padding shape (Z, Y, X).

  • pad_val (float) – Padding value for biomedical image. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.

  • seg_pad_val (int) – Padding value for biomedical 3d semantic segmentation maps. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.

transform(results: dict) → dict[source]

Call function to pad images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Updated result dict.

Return type

dict

class mmseg.datasets.transforms.BioMedical3DRandomCrop(crop_shape: Union[int, Tuple[int, int, int]], keep_foreground: bool = True)[source]

Crop the input patch for medical image & segmentation mask.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

  • gt_seg_map (np.ndarray, optional): Biomedical semantic segmentation mask

    with shape (Z, Y, X).

Modified Keys:

  • img

  • img_shape

  • gt_seg_map (optional)

Parameters
  • crop_shape (Union[int, Tuple[int, int, int]]) – Expected size after cropping with the format of (z, y, x). If set to an integer, then the cropping size along each dimension is equal to this integer.

  • keep_foreground (bool) – If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the crop bounding-box. Default to True.

crop(img: numpy.ndarray, crop_bbox: tuple) → numpy.ndarray[source]

Crop from img

Parameters
  • img (np.ndarray) – Original input image.

  • crop_bbox (tuple) – Coordinates of the cropped image.

Returns

The cropped image.

Return type

np.ndarray

generate_margin(results: dict) → tuple[source]

Generate margin of crop bounding-box.

If keep_foreground is True, it will randomly sample a voxel of the foreground classes, take it as the center of the bounding-box, and return the margin between the bounding-box and the image. If keep_foreground is False, it will return the difference between the crop shape and the image shape.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

The margin for 3 dimensions of crop bounding-box and image.

Return type

tuple

random_generate_crop_bbox(margin_z: int, margin_y: int, margin_x: int) → tuple[source]

Randomly get a crop bounding box.

Parameters

margin_z, margin_y, margin_x (int) – The margins of the crop bounding-box along the three dimensions.

Returns

Coordinates of the cropped image.

Return type

tuple

random_sample_location(seg_map: numpy.ndarray) → dict[source]

Sample a foreground voxel when keep_foreground is True.

Parameters

seg_map (np.ndarray) – gt seg map.

Returns

Coordinates of selected foreground voxel.

Return type

dict

transform(results: dict) → dict[source]

Transform function to randomly crop images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results, ‘img_shape’ key in result dict is

updated according to crop size.

Return type

dict

class mmseg.datasets.transforms.BioMedical3DRandomFlip(prob: float, axes: Tuple[int, ...], swap_label_pairs: Optional[List[Tuple[int, int]]] = None)[source]

Flip biomedical 3D images and segmentations.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/spatial_transforms.py # noqa:E501

Copyright 2021 Division of Medical Image Computing, German Cancer Research Center (DKFZ) and Applied Computer Vision Lab, Helmholtz Imaging Platform. Licensed under the Apache-2.0 License.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Modified Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Added Keys:

  • do_flip

  • flip_axes

Parameters
  • prob (float) – Flipping probability.

  • axes (Tuple[int, ...]) – Flipping axes with order ‘ZXY’.

  • swap_label_pairs (Optional[List[Tuple[int, int]]]) – The segmentation label pairs that are swapped when flipping.

transform(results: Dict)Dict[source]

Call function to flip and swap pair labels.

Parameters

results (dict) – Result dict.

Returns

Flipped results, ‘do_flip’, ‘flip_axes’ keys are added into

result dict.

Return type

dict
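A hedged config sketch: the axis indices and the label pair (1, 2) below are hypothetical placeholders for a pair of mirrored anatomical structures:

dict(
    type='BioMedical3DRandomFlip',
    prob=0.5,
    # Flip along any of the three spatial axes (indices 0, 1, 2).
    axes=(0, 1, 2),
    # Swap left/right labels when a flip mirrors them (example pair).
    swap_label_pairs=[(1, 2)])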

class mmseg.datasets.transforms.BioMedicalGaussianBlur(sigma_range: Tuple[float, float] = (0.5, 1.0), prob: float = 0.2, prob_per_channel: float = 0.5, different_sigma_per_channel: bool = True, different_sigma_per_axis: bool = True)[source]

Add Gaussian blur with random sigma to image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L81 # noqa:E501

Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • sigma_range (Tuple[float, float]|float) – range to randomly select sigma value. Default to (0.5, 1.0).

  • prob (float) – Probability to apply Gaussian blur for each sample. Default to 0.2.

  • prob_per_channel (float) – Probability to apply Gaussian blur for each channel (axis N of the image). Default to 0.5.

  • different_sigma_per_channel (bool) – whether to use different sigma for each channel (axis N of the image). Default to True.

  • different_sigma_per_axis (bool) – whether to use different sigma for axis Z, X and Y of the image. Default to True.

transform(results: Dict)Dict[source]

Call function to add random Gaussian blur to image.

Parameters

results (dict) – Result dict.

Returns

Result dict with random Gaussian blur applied.

Return type

dict

class mmseg.datasets.transforms.BioMedicalGaussianNoise(prob: float = 0.1, mean: float = 0.0, std: float = 0.1)[source]

Add random Gaussian noise to image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L53 # noqa:E501

Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • prob (float) – Probability to add Gaussian noise for each sample. Default to 0.1.

  • mean (float) – Mean or “centre” of the distribution. Default to 0.0.

  • std (float) – Standard deviation of distribution. Default to 0.1.

transform(results: Dict)Dict[source]

Call function to add random Gaussian noise to image.

Parameters

results (dict) – Result dict.

Returns

Result dict with random Gaussian noise.

Return type

dict
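Noise and blur are often combined as lightweight intensity augmentation; a minimal sketch using the documented defaults:

train_pipeline = [
    # ... loading transforms ...
    dict(type='BioMedicalGaussianNoise', prob=0.1, mean=0.0, std=0.1),
    dict(type='BioMedicalGaussianBlur', sigma_range=(0.5, 1.0), prob=0.2),
    # ... cropping / packing ...
]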

class mmseg.datasets.transforms.BioMedicalRandomGamma(prob: float = 0.5, gamma_range: Tuple[float] = (0.5, 2), invert_image: bool = False, per_channel: bool = False, retain_stats: bool = False)[source]

Apply random gamma correction to the biomedical image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/color_transforms.py#L132 # noqa:E501 Licensed under the Apache-2.0 License.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • prob (float) – The probability to perform this transform. Default: 0.5.

  • gamma_range (Tuple[float]) – Range of gamma values. Default: (0.5, 2).

  • invert_image (bool) – Whether to invert the image before applying gamma augmentation. Default: False.

  • per_channel (bool) – Whether to perform the transform on each channel individually. Default: False.

  • retain_stats (bool) – Gamma transformation will alter the mean and std of the data in the patch. If retain_stats=True, the data will be transformed to match the mean and standard deviation before gamma augmentation. Default: False.

transform(results: dict)dict[source]

Call function to perform random gamma correction.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with random gamma correction performed.

Return type

dict
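To make the retain_stats behavior concrete, here is a minimal NumPy sketch of gamma correction on a normalized patch; it illustrates the idea, not the exact implementation:

import numpy as np

def gamma_correct(img: np.ndarray, gamma: float,
                  retain_stats: bool = False) -> np.ndarray:
    mean, std = img.mean(), img.std()
    lo, rng = img.min(), img.max() - img.min() + 1e-7
    # Rescale to [0, 1], apply the power law, then restore the range.
    out = ((img - lo) / rng) ** gamma * rng + lo
    if retain_stats:
        # Re-match the original mean and std after the transform.
        out = (out - out.mean()) / (out.std() + 1e-7) * std + mean
    return out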

class mmseg.datasets.transforms.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]

Use CLAHE method to process the image.

See ZUIDERVELD,K. Contrast Limited Adaptive Histogram Equalization[J]. Graphics Gems, 1994:474-485. for more information.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • clip_limit (float) – Threshold for contrast limiting. Default: 40.0.

  • tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).

transform(results: dict)dict[source]

Call function to use the CLAHE method to process images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Processed results.

Return type

dict
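For reference, a one-line pipeline entry that spells out the documented defaults:

dict(type='CLAHE', clip_limit=40.0, tile_grid_size=(8, 8))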

class mmseg.datasets.transforms.GenerateEdge(edge_width: int = 3, ignore_index: int = 255)[source]

Generate Edge for CE2P approach.

Edge will be used to calculate loss of CE2P.

Modified from https://github.com/liutinglt/CE2P/blob/master/dataset/target_generation.py # noqa:E501

Required Keys:

  • img_shape

  • gt_seg_map

Added Keys:
  • gt_edge_map (np.ndarray, uint8): The edge annotation generated from the

    seg map by extracting border between different semantics.

Parameters
  • edge_width (int) – The width of edge. Default to 3.

  • ignore_index (int) – Index that will be ignored. Default to 255.

transform(results: Dict)Dict[source]

Call function to generate edge from segmentation map.

Parameters

results (dict) – Result dict.

Returns

Result dict with edge mask.

Return type

dict

class mmseg.datasets.transforms.LoadAnnotations(reduce_zero_label=None, backend_args=None, imdecode_backend='pillow')[source]

Load annotations for semantic segmentation provided by dataset.

The annotation format is as the following:

{
    # Filename of semantic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}

After this module, the annotation has been changed to the format below:

{
    # In list[str] type.
    'seg_fields': List
    # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}

Required Keys:

  • seg_map_path (str): Path of semantic segmentation ground truth file.

Added Keys:

  • seg_fields (List)

  • gt_seg_map (np.uint8)

Parameters
  • reduce_zero_label (bool, optional) – Whether to reduce all label values by 1. Usually used for datasets where 0 is the background label. Defaults to None.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for :func:mmcv.imfrombytes. See :func:mmcv.imfrombytes for details. Defaults to ‘pillow’.

  • backend_args (dict) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
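LoadAnnotations usually follows the image loader at the start of a training pipeline; a minimal sketch (LoadImageFromFile comes from mmcv and is shown for context):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    # reduce_zero_label=True maps label 0 (background) to 255 (ignored)
    # and shifts the remaining labels down by one.
    dict(type='LoadAnnotations', reduce_zero_label=True),
    # ... augmentation and packing transforms ...
]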

class mmseg.datasets.transforms.LoadBiomedicalAnnotation(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]

Load seg_map annotation provided by biomedical dataset.

The annotation format is as the following:

{
    'gt_seg_map': np.ndarray (X, Y, Z) or (Z, Y, X)
}

Required Keys:

  • seg_map_path

Added Keys:

  • gt_seg_map (np.ndarray): Biomedical seg map with shape (Z, Y, X) by

    default, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • to_float32 (bool) – Whether to convert the loaded seg map to a float32 numpy array. If set to False, the loaded seg map is a float64 array. Defaults to True.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See mmengine.fileio for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.LoadBiomedicalData(with_seg=False, decode_backend: str = 'numpy', to_xyz: bool = False, backend_args: Optional[dict] = None)[source]

Load a biomedical image and annotation from file.

The loading data format is as the following:

{
    'img': np.ndarray data[:-1, X, Y, Z]
    'seg_map': np.ndarray data[-1, X, Y, Z]
}

Required Keys:

  • img_path

Added Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

  • img_shape

  • ori_shape

Parameters
  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.

  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘numpy’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.LoadBiomedicalImageFromFile(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]

Load a biomedical image from file.

Required Keys:

  • img_path

Added Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.

  • img_shape

  • ori_shape

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict
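A minimal loading sketch for NIfTI data; with decode_backend='nifti' the loader transposes the XYZ data to ZYX unless to_xyz=True:

train_pipeline = [
    dict(type='LoadBiomedicalImageFromFile', decode_backend='nifti'),
    dict(type='LoadBiomedicalAnnotation', decode_backend='nifti'),
    # ... cropping / padding / packing ...
]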

class mmseg.datasets.transforms.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]

Load an image from results['img'].

Similar with LoadImageFromFile, but the image has been loaded as np.ndarray in results['img']. Can be used when loading image from webcam.

Required Keys:

  • img

Modified Keys:

  • img

  • img_path

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

transform(results: dict)dict[source]

Transform function to add image meta information.

Parameters

results (dict) – Result dict with the webcam-read image in results['img'].

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.PackSegInputs(meta_keys=('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label'))[source]

Pack the inputs data for the semantic segmentation.

The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default it includes:

  • img_path: filename of the image

  • ori_shape: original shape of the image as a tuple (h, w, c)

  • img_shape: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.

  • pad_shape: shape of padded images

  • scale_factor: a float indicating the preprocessing scale

  • flip: a boolean indicating if image flip transform was used

  • flip_direction: the flipping direction

Parameters

meta_keys (Sequence[str], optional) – Meta keys to be packed from SegDataSample and collected in data[img_metas]. Default: ('img_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction')

transform(results: dict)dict[source]

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

  • ‘inputs’ (obj:torch.Tensor): The forward data of models.

  • ’data_sample’ (obj:SegDataSample): The annotation info of the

    sample.

Return type

dict
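PackSegInputs is the conventional last step of a pipeline; a sketch of a test pipeline (the Resize scale is illustrative):

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 512), keep_ratio=True),
    dict(type='LoadAnnotations'),
    # Packs 'inputs' (torch.Tensor) and 'data_samples' (SegDataSample).
    dict(type='PackSegInputs'),
]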

class mmseg.datasets.transforms.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[float] = (0.5, 1.5), saturation_range: Sequence[float] = (0.5, 1.5), hue_delta: int = 18)[source]

Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second-to-last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.

brightness(img: numpy.ndarray)numpy.ndarray[source]

Brightness distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after brightness change.

Return type

np.ndarray

contrast(img: numpy.ndarray)numpy.ndarray[source]

Contrast distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after contrast change.

Return type

np.ndarray

convert(img: numpy.ndarray, alpha: int = 1, beta: int = 0)numpy.ndarray[source]

Multiply with alpha and add beta, with clipping.

Parameters
  • img (np.ndarray) – The input image.

  • alpha (int) – Image weights, change the contrast/saturation of the image. Default: 1

  • beta (int) – Image bias, change the brightness of the image. Default: 0

Returns

The transformed image.

Return type

np.ndarray

hue(img: numpy.ndarray)numpy.ndarray[source]

Hue distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after hue change.

Return type

np.ndarray

saturation(img: numpy.ndarray)numpy.ndarray[source]

Saturation distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after saturation change.

Return type

np.ndarray

transform(results: dict)dict[source]

Transform function to perform photometric distortion on images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with images distorted.

Return type

dict
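For reference, a pipeline entry that spells out the documented defaults:

dict(
    type='PhotoMetricDistortion',
    brightness_delta=32,
    contrast_range=(0.5, 1.5),
    saturation_range=(0.5, 1.5),
    hue_delta=18)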

class mmseg.datasets.transforms.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]

Convert RGB image to grayscale image.

Required Keys:

  • img

Modified Keys:

  • img

  • img_shape

This transform calculates the weighted mean of the input image channels with weights and then expands the result to out_channels channels. When out_channels is None, the number of output channels is the same as the number of input channels.

Parameters
  • out_channels (int) – Expected number of output channels after transforming. Default: None.

  • weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).

transform(results: dict)dict[source]

Call function to convert RGB image to grayscale image.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with grayscale image.

Return type

dict

class mmseg.datasets.transforms.RandomCrop(crop_size: Union[int, Tuple[int, int]], cat_max_ratio: float = 1.0, ignore_index: int = 255)[source]

Random crop the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • img_shape

  • gt_seg_map

Parameters
  • crop_size (Union[int, Tuple[int, int]]) – Expected size after cropping with the format of (h, w). If set to an integer, then cropping width and height are equal to this integer.

  • cat_max_ratio (float) – The maximum ratio that a single category could occupy.

  • ignore_index (int) – The label index to be ignored. Default: 255

crop(img: numpy.ndarray, crop_bbox: tuple)numpy.ndarray[source]

Crop from img.

Parameters
  • img (np.ndarray) – Original input image.

  • crop_bbox (tuple) – Coordinates of the cropped image.

Returns

The cropped image.

Return type

np.ndarray

transform(results: dict)dict[source]

Transform function to randomly crop images and semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results, ‘img_shape’ key in result dict is

updated according to crop size.

Return type

dict
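A typical entry in a training pipeline; the crop size and ratio below are illustrative:

# The crop is re-sampled until no single category occupies more than
# cat_max_ratio of the cropped area.
dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75)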

class mmseg.datasets.transforms.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]

CutOut operation.

Randomly drop some regions of the image, as used in Cutout.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • prob (float) – cutout probability.

  • n_holes (int | tuple[int, int]) – Number of regions to be dropped. If it is given as a list, number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].

  • cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.

  • cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.

  • fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).

  • seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.

transform(results: dict)dict[source]

Call function to drop some regions of image.

class mmseg.datasets.transforms.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]

Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image. The output image is composed of parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:
    1. Choose the mosaic center as the intersections of 4 images
    2. Get the left top image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. The sub-image will be cropped if it is larger than the mosaic patch

Required Keys:

  • img

  • gt_seg_map

  • mix_results

Modified Keys:

  • img

  • img_shape

  • ori_shape

  • gt_seg_map

Parameters
  • prob (float) – mosaic probability.

  • img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).

  • pad_val (int) – Pad value. Default: 0.

  • seg_pad_val (int) – Pad value of segmentation map. Default: 255.

get_indices(dataset: mmseg.datasets.dataset_wrappers.MultiImageMixDataset)list[source]

Call function to collect indices.

Parameters

dataset (MultiImageMixDataset) – The dataset.

Returns

indices.

Return type

list

transform(results: dict)dict[source]

Call function to make a mosaic of images.

Parameters

results (dict) – Result dict.

Returns

Result dict with mosaic transformed.

Return type

dict

class mmseg.datasets.transforms.RandomRotFlip(rotate_prob=0.5, flip_prob=0.5, degree=(- 20, 20))[source]

Rotate and flip the image & seg or just rotate the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • rotate_prob (float) – The probability to rotate the image.

  • flip_prob (float) – The probability to rotate & flip the image.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

transform(results: dict)dict[source]

Call function to rotate, or rotate & flip, images and semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Rotated or rotated & flipped results.

Return type

dict

class mmseg.datasets.transforms.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]

Rotate the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • prob (float) – The rotation probability.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

  • pad_val (float, optional) – Padding value of image. Default: 0.

  • seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.

  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.

  • auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False

transform(results: dict)dict[source]

Call function to rotate images and semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Rotated results.

Return type

dict

class mmseg.datasets.transforms.Rerange(min_value=0, max_value=255)[source]

Rerange the image pixel value.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • min_value (float or int) – Minimum value of the reranged image. Default: 0.

  • max_value (float or int) – Maximum value of the reranged image. Default: 255.

transform(results: dict)dict[source]

Call function to rerange images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Reranged results.

Return type

dict

class mmseg.datasets.transforms.ResizeShortestEdge(scale: Union[int, Tuple[int, int]], max_size: int)[source]

Resize the image and mask while keeping the aspect ratio unchanged.

Modified from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py#L130 # noqa:E501 Copyright (c) Facebook, Inc. and its affiliates. Licensed under the Apache-2.0 License

This transform attempts to scale the shorter edge to the given scale, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.

Required Keys:

  • img

  • gt_seg_map (optional)

Modified Keys:

  • img

  • img_shape

  • gt_seg_map (optional)

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

Parameters
  • scale (Union[int, Tuple[int, int]]) – The target short edge length. If it is a tuple, the min value will be selected as the short edge length.

  • max_size (int) – The maximum allowed longest edge length.

transform(results: Dict)Dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in it. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict
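A small sketch of the scaling rule, assuming scale=512 and max_size=1024 (target_size is a hypothetical helper for illustration, not the class's API):

def target_size(h: int, w: int, scale: int = 512, max_size: int = 1024):
    short, long = min(h, w), max(h, w)
    factor = scale / short
    if long * factor > max_size:
        # Downscale instead so the longer edge is capped at max_size.
        factor = max_size / long
    return round(h * factor), round(w * factor)

# target_size(400, 800)  -> (512, 1024): the short edge reaches scale.
# target_size(400, 1000) -> (410, 1024): the long edge hits max_size first.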

class mmseg.datasets.transforms.ResizeToMultiple(size_divisor=32, interpolation=None)[source]

Resize images & seg to a multiple of the divisor.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • img_shape

  • pad_shape

Parameters
  • size_divisor (int) – Images and gt seg maps are resized to a multiple of size_divisor. Default: 32.

  • interpolation (str, optional) – The interpolation mode of image resize. Default: None

transform(results: dict)dict[source]

Call function to resize images and semantic segmentation maps to a multiple of the size divisor.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Resized results, ‘img_shape’, ‘pad_shape’ keys are updated.

Return type

dict

class mmseg.datasets.transforms.SegRescale(scale_factor=1)[source]

Rescale semantic segmentation maps.

Required Keys:

  • gt_seg_map

Modified Keys:

  • gt_seg_map

Parameters

scale_factor (float) – The scale factor of the final output.

transform(results: dict)dict[source]

Call function to scale the semantic segmentation map.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with semantic segmentation map scaled.

Return type

dict

mmseg.engine

hooks

class mmseg.engine.hooks.SegVisualizationHook(draw: bool = False, interval: int = 50, show: bool = False, wait_time: float = 0.0, backend_args: Optional[dict] = None)[source]

Segmentation Visualization Hook. Used to visualize validation and testing process prediction results.

In the testing phase:

  1. If show is True, it means that only the prediction results are

    visualized without storing data, so vis_backends needs to be excluded.

Parameters
  • draw (bool) – whether to draw prediction results. If it is False, it means that no drawing will be done. Defaults to False.

  • interval (int) – The interval of visualization. Defaults to 50.

  • show (bool) – Whether to display the drawn image. Default to False.

  • wait_time (float) – The interval of show (s). Defaults to 0.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
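A minimal config sketch enabling the hook during validation and testing:

default_hooks = dict(
    # Draw every 50th validation/test sample to the visual backends.
    visualization=dict(type='SegVisualizationHook', draw=True, interval=50))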

optimizers

class mmseg.engine.optimizers.LayerDecayOptimizerConstructor(optim_wrapper_cfg, paramwise_cfg)[source]

Different learning rates are set for different layers of the backbone.

Note: Currently, this optimizer constructor is built for BEiT, and it will be deprecated. Please use LearningRateDecayOptimizerConstructor instead.

class mmseg.engine.optimizers.LearningRateDecayOptimizerConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]

Different learning rates are set for different layers of the backbone.

Note: Currently, this optimizer constructor is built for ConvNeXt, BEiT and MAE.

add_params(params, module, **kwargs)[source]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

Parameters
  • params (list[dict]) – A list of param groups, it will be modified in place.

  • module (nn.Module) – The module to be added.
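A hedged config sketch for a 12-layer backbone; the paramwise_cfg keys follow the layer-decay configs shipped with mmseg, and the exact values are illustrative:

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=0.05),
    constructor='LearningRateDecayOptimizerConstructor',
    # Each layer's lr is scaled by roughly decay_rate ** (depth - layer_id),
    # so deeper (later) layers train with larger learning rates.
    paramwise_cfg=dict(decay_rate=0.9, decay_type='layer_wise',
                       num_layers=12))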

mmseg.evaluation

metrics

class mmseg.evaluation.metrics.CityscapesMetric(output_dir: str, ignore_index: int = 255, format_only: bool = False, keep_results: bool = False, collect_device: str = 'cpu', prefix: Optional[str] = None, **kwargs)[source]

Cityscapes evaluation metric.

Parameters
  • output_dir (str) – The directory for output prediction

  • ignore_index (int) – Index that will be ignored in evaluation. Default: 255.

  • format_only (bool) – Only format results for submission without performing evaluation. It is useful when you want to format the results in a specific format and submit them to the test server. Defaults to False.

  • keep_results (bool) – Whether to keep the results. When format_only is True, keep_results must be True. Defaults to False.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – Testing results of the dataset.

Returns

Cityscapes evaluation results.

Return type

dict[str, float]

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data and data_samples.

The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of outputs from the model.

class mmseg.evaluation.metrics.IoUMetric(ignore_index: int = 255, iou_metrics: List[str] = ['mIoU'], nan_to_num: Optional[int] = None, beta: int = 1, collect_device: str = 'cpu', output_dir: Optional[str] = None, format_only: bool = False, prefix: Optional[str] = None, **kwargs)[source]

IoU evaluation metric.

Parameters
  • ignore_index (int) – Index that will be ignored in evaluation. Default: 255.

  • iou_metrics (list[str] | str) – Metrics to be calculated, the options includes ‘mIoU’, ‘mDice’ and ‘mFscore’.

  • nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

  • beta (int) – Determines the weight of recall in the combined score. Default: 1.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • output_dir (str) – The directory for output prediction. Defaults to None.

  • format_only (bool) – Only format results for submission without performing evaluation. It is useful when you want to save the results in a specific format and submit them to the test server. Defaults to False.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
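The metric is normally wired into a config as the evaluator; a minimal sketch:

val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU'])
test_evaluator = val_evaluator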

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. The keys mainly include aAcc, mIoU, mAcc, mDice, mFscore, mPrecision and mRecall.

Return type

Dict[str, float]

static intersect_and_union(pred_label: torch._VariableFunctionsClass.tensor, label: torch._VariableFunctionsClass.tensor, num_classes: int, ignore_index: int)[source]

Calculate Intersection and Union.

Parameters
  • pred_label (torch.tensor) – Prediction segmentation map or predict result filename. The shape is (H, W).

  • label (torch.tensor) – Ground truth segmentation map or label filename. The shape is (H, W).

  • num_classes (int) – Number of categories.

  • ignore_index (int) – Index that will be ignored in evaluation.

Returns

  • torch.Tensor: The intersection of prediction and ground truth histograms on all classes.

  • torch.Tensor: The union of prediction and ground truth histograms on all classes.

  • torch.Tensor: The prediction histogram on all classes.

  • torch.Tensor: The ground truth histogram on all classes.

Return type

torch.Tensor
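A tiny worked example of the four histograms, assuming the static method is called directly (values checked by hand):

import torch
from mmseg.evaluation.metrics import IoUMetric

pred = torch.tensor([[0, 0], [1, 1]])
label = torch.tensor([[0, 1], [1, 1]])
inter, union, pred_hist, label_hist = IoUMetric.intersect_and_union(
    pred, label, num_classes=2, ignore_index=255)
# inter -> [1, 2], union -> [2, 3]; per-class IoU = inter / union,
# giving 0.5 for class 0 and ~0.67 for class 1.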

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data and data_samples.

The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of outputs from the model.

static total_area_to_metrics(total_area_intersect: numpy.ndarray, total_area_union: numpy.ndarray, total_area_pred_label: numpy.ndarray, total_area_label: numpy.ndarray, metrics: List[str] = ['mIoU'], nan_to_num: Optional[int] = None, beta: int = 1)[source]

Calculate evaluation metrics.

Parameters
  • total_area_intersect (np.ndarray) – The intersection of prediction and ground truth histograms on all classes.

  • total_area_union (np.ndarray) – The union of prediction and ground truth histogram on all classes.

  • total_area_pred_label (np.ndarray) – The prediction histogram on all classes.

  • total_area_label (np.ndarray) – The ground truth histogram on all classes.

  • metrics (List[str] | str) – Metrics to be evaluated; the options include ‘mIoU’, ‘mDice’ and ‘mFscore’.

  • nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

  • beta (int) – Determines the weight of recall in the combined score. Default: 1.

Returns

Per-category evaluation metrics, with shape (num_classes, ).

Return type

Dict[str, np.ndarray]

mmseg.models

backbones

class mmseg.models.backbones.BEiT(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, qv_bias=True, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]

BERT Pre-Training of Image Transformers.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 768.

  • num_layers (int) – Depth of transformer. Default: 12.

  • num_heads (int) – Number of attention heads. Default: 12.

  • mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • qv_bias (bool) – Enable bias for qv if True. Default: True.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – Stochastic depth rate. Default 0.0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_values (float) – Initialize the values of BEiTAttention and FFN with learnable scaling.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

resize_rel_pos_embed(checkpoint)[source]

Resize relative pos_embed weights.

This function is modified from https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_custom/checkpoint.py. # noqa: E501 Copyright (c) Microsoft Corporation, licensed under the MIT License.

Parameters

checkpoint (dict) – Key and value of the pretrained model.

Returns

The relative pos_embed weights of the pre-trained model, interpolated to the current model size.

Return type

state_dict (dict)

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmseg.models.backbones.BiSeNetV1(backbone_cfg, in_channels=3, spatial_channels=(64, 64, 64, 128), context_channels=(128, 256, 512), out_indices=(0, 1, 2), align_corners=False, out_channels=256, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

BiSeNetV1 backbone.

This backbone is the implementation of BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation.

Parameters
  • backbone_cfg (dict) – Config of the backbone of Context Path.

  • in_channels (int) – The number of channels of input image. Default: 3.

  • spatial_channels (Tuple[int]) – Size of channel numbers of various layers in Spatial Path. Default: (64, 64, 64, 128).

  • context_channels (Tuple[int]) – Size of channel numbers of various modules in Context Path. Default: (128, 256, 512).

  • out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2).

  • align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.

  • out_channels (int) – The number of channels of output. It must be the same with in_channels of decode_head. Default: 256.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.BiSeNetV2(in_channels=3, detail_channels=(64, 64, 128), semantic_channels=(16, 32, 64, 128), semantic_expansion_ratio=6, bga_channels=128, out_indices=(0, 1, 2, 3, 4), align_corners=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

BiSeNetV2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation.

This backbone is the implementation of BiSeNetV2.

Parameters
  • in_channels (int) – Number of channel of input image. Default: 3.

  • detail_channels (Tuple[int], optional) – Channels of each stage in Detail Branch. Default: (64, 64, 128).

  • semantic_channels (Tuple[int], optional) – Channels of each stage in Semantic Branch. Default: (16, 32, 64, 128). See Table 1 and Figure 3 of paper for more details.

  • semantic_expansion_ratio (int, optional) – The expansion factor expanding channel number of middle channels in Semantic Branch. Default: 6.

  • bga_channels (int, optional) – Number of middle channels in Bilateral Guided Aggregation Layer. Default: 128.

  • out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2, 3, 4).

  • align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.

  • conv_cfg (dict | None) – Config of conv layers. Default: None.

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’).

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.CGNet(in_channels=3, num_channels=(32, 64, 128), num_blocks=(3, 21), dilations=(2, 4), reductions=(8, 16), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'PReLU'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

CGNet backbone.

This backbone is the implementation of A Light-weight Context Guided Network for Semantic Segmentation.

Parameters
  • in_channels (int) – Number of input image channels. Normally 3.

  • num_channels (tuple[int]) – Numbers of feature channels at each stages. Default: (32, 64, 128).

  • num_blocks (tuple[int]) – Numbers of CG blocks at stage 1 and stage 2. Default: (3, 21).

  • dilations (tuple[int]) – Dilation rate for surrounding context extractors at stage 1 and stage 2. Default: (2, 4).

  • reductions (tuple[int]) – Reductions for global context extractors at stage 1 and stage 2. Default: (8, 16).

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’, requires_grad=True).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’PReLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.ERFNet(in_channels=3, enc_downsample_channels=(16, 64, 128), enc_stage_non_bottlenecks=(5, 8), enc_non_bottleneck_dilations=(2, 4, 8, 16), enc_non_bottleneck_channels=(64, 128), dec_upsample_channels=(64, 16), dec_stages_non_bottleneck=(2, 2), dec_non_bottleneck_channels=(64, 16), dropout_ratio=0.1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

ERFNet backbone.

This backbone is the implementation of ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation.

Parameters
  • in_channels (int) – The number of channels of input image. Default: 3.

  • enc_downsample_channels (Tuple[int]) – Size of channel numbers of various Downsampler block in encoder. Default: (16, 64, 128).

  • enc_stage_non_bottlenecks (Tuple[int]) – Number of stages of Non-bottleneck block in encoder. Default: (5, 8).

  • enc_non_bottleneck_dilations (Tuple[int]) – Dilation rate of each stage of Non-bottleneck block of encoder. Default: (2, 4, 8, 16).

  • enc_non_bottleneck_channels (Tuple[int]) – Size of channel numbers of various Non-bottleneck block in encoder. Default: (64, 128).

  • dec_upsample_channels (Tuple[int]) – Size of channel numbers of various Deconvolution block in decoder. Default: (64, 16).

  • dec_stages_non_bottleneck (Tuple[int]) – Number of stages of Non-bottleneck block in decoder. Default: (2, 2).

  • dec_non_bottleneck_channels (Tuple[int]) – Size of channel numbers of various Non-bottleneck block in decoder. Default: (64, 16).

  • dropout_ratio (float) – Probability of an element to be zeroed. Default: 0.1.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.FastSCNN(in_channels=3, downsample_dw_channels=(32, 48), global_in_channels=64, global_block_channels=(64, 96, 128), global_block_strides=(2, 2, 1), global_out_channels=128, higher_in_channels=64, lower_in_channels=128, fusion_out_channels=128, out_indices=(0, 1, 2), conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, dw_act_cfg=None, init_cfg=None)[source]

Fast-SCNN Backbone.

This backbone is the implementation of Fast-SCNN: Fast Semantic Segmentation Network.

Parameters
  • in_channels (int) – Number of input image channels. Default: 3.

  • downsample_dw_channels (tuple[int]) – Number of output channels after the first conv layer & the second conv layer in Learning-To-Downsample (LTD) module. Default: (32, 48).

  • global_in_channels (int) – Number of input channels of Global Feature Extractor (GFE). Equal to the number of output channels of LTD. Default: 64.

  • global_block_channels (tuple[int]) – Tuple of integers that describe the output channels for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (64, 96, 128).

  • global_block_strides (tuple[int]) – Tuple of integers that describe the strides (downsampling factors) for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (2, 2, 1).

  • global_out_channels (int) – Number of output channels of GFE. Default: 128.

  • higher_in_channels (int) – Number of input channels of the higher resolution branch in FFM. Equal to global_in_channels. Default: 64.

  • lower_in_channels (int) – Number of input channels of the lower resolution branch in FFM. Equal to global_out_channels. Default: 128.

  • fusion_out_channels (int) – Number of output channels of FFM. Default: 128.

  • out_indices (tuple) – Tuple of indices of list [higher_res_features, lower_res_features, fusion_output]. Often set to (0,1,2) to enable aux. heads. Default: (0, 1, 2).

  • conv_cfg (dict | None) – Config of conv layers. Default: None

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’)

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’)

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False

  • dw_act_cfg (dict) – In DepthwiseSeparableConvModule, activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, frozen_stages=- 1, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]

HRNet backbone.

This backbone is the implementation of High-Resolution Representations for Labeling Pixels and Regions.

Parameters
  • extra (dict) –

    Detailed configuration for each stage of HRNet. There must be 4 stages, the configuration for each stage must have 5 keys:

    • num_modules (int): The number of HRModule in this stage.

    • num_branches (int): The number of branches in the HRModule.

    • block (str): The type of convolution block.

    • num_blocks (tuple): The number of blocks in each branch.

      The length must be equal to num_branches.

    • num_channels (tuple): The number of channels in each branch.

      The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Normally 3.

  • conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Use BN by default.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.

  • multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmseg.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.ICNet(backbone_cfg, in_channels=3, layer_channels=(512, 2048), light_branch_middle_channels=32, psp_out_channels=512, out_channels=(64, 256, 256), pool_scales=(1, 2, 3, 6), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, init_cfg=None)[source]

ICNet for Real-Time Semantic Segmentation on High-Resolution Images.

This backbone is the implementation of ICNet.

Parameters
  • backbone_cfg (dict) – Config dict to build backbone. Usually it is ResNet but it can also be other backbones.

  • in_channels (int) – The number of input image channels. Default: 3.

  • layer_channels (Sequence[int]) – The numbers of feature channels at layer 2 and layer 4 in ResNet. It can also be other backbones. Default: (512, 2048).

  • light_branch_middle_channels (int) – The number of channels of the middle layer in light branch. Default: 32.

  • psp_out_channels (int) – The number of channels of the output of PSP module. Default: 512.

  • out_channels (Sequence[int]) – The numbers of output feature channels at each branches. Default: (64, 256, 256).

  • pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).

  • conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Dictionary to construct and config act layer. Default: dict(type=’ReLU’).

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.MAE(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]

Vision Transformer backbone with patch support, used for MAE pre-trained models.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – embedding dimension. Default: 768.

  • num_layers (int) – depth of transformer. Default: 12.

  • num_heads (int) – number of attention heads. Default: 12.

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – stochastic depth rate. Default 0.0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize final feature map. Default: False.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_values (float) – Initialize the values of Attention and FFN with learnable scaling. Defaults to 0.1.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
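Example

A minimal sketch with the ViT-Base defaults above; the output tokens are reshaped to a stride-16 feature map (the shape below is illustrative):

>>> import torch
>>> from mmseg.models.backbones import MAE
>>> self = MAE(img_size=224, patch_size=16, out_indices=-1)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> tuple(outputs[-1].shape)  # (B, embed_dims, H / patch_size, W / patch_size)
(1, 768, 14, 14)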

fix_init_weight()[source]

Rescale the initialization according to layer id.

This function is copied from https://github.com/microsoft/unilm/blob/master/beit/modeling_pretrain.py. # noqa: E501 Copyright (c) Microsoft Corporation Licensed under the MIT License

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmseg.models.backbones.MSCAN(in_channels=3, embed_dims=[64, 128, 256, 512], mlp_ratios=[4, 4, 4, 4], drop_rate=0.0, drop_path_rate=0.0, depths=[3, 4, 6, 3], num_stages=4, attention_kernel_sizes=[5, [1, 7], [1, 11], [1, 21]], attention_kernel_paddings=[2, [0, 3], [0, 5], [0, 10]], act_cfg={'type': 'GELU'}, norm_cfg={'requires_grad': True, 'type': 'SyncBN'}, pretrained=None, init_cfg=None)[source]

SegNeXt Multi-Scale Convolutional Attention Network (MSCAN) backbone.

This backbone is the implementation of SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. Inspiration from https://github.com/visual-attention-network/segnext.

Parameters
  • in_channels (int) – The number of input channels. Defaults: 3.

  • embed_dims (list[int]) – Embedding dimension. Defaults: [64, 128, 256, 512].

  • mlp_ratios (list[int]) – Ratio of mlp hidden dim to embedding dim. Defaults: [4, 4, 4, 4].

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.

  • depths (list[int]) – Depths of each MSCAN stage. Default: [3, 4, 6, 3].

  • num_stages (int) – MSCAN stages. Default: 4.

  • attention_kernel_sizes (list) – Size of attention kernel in Attention Module (Figure 2(b) of original paper). Defaults: [5, [1, 7], [1, 11], [1, 21]].

  • attention_kernel_paddings (list) – Size of attention paddings in Attention Module (Figure 2(b) of original paper). Defaults: [2, [0, 3], [0, 5], [0, 10]].

  • norm_cfg (dict) – Config of norm layers. Defaults: dict(type=’SyncBN’, requires_grad=True).

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
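Example

A minimal sketch with the default architecture; norm_cfg is overridden to plain BN here so the snippet can run on a single device (the default SyncBN requires distributed training). Shapes are illustrative:

>>> import torch
>>> from mmseg.models.backbones import MSCAN
>>> self = MSCAN(norm_cfg=dict(type='BN', requires_grad=True))
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> for level_out in self.forward(inputs):
...     print(tuple(level_out.shape))
(1, 64, 16, 16)
(1, 128, 8, 8)
(1, 256, 4, 4)
(1, 512, 2, 2)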

forward(x)[source]

Forward function.

init_weights()[source]

Initialize modules of MSCAN.

class mmseg.models.backbones.MixVisionTransformer(in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 4, 8], patch_sizes=[7, 3, 3, 3], strides=[4, 2, 2, 2], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratio=4, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, init_cfg=None, with_cp=False)[source]

The backbone of Segformer.

This backbone is the implementation of SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 64.

  • num_stages (int) – The number of stages. Default: 4.

  • num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].

  • num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 4, 8].

  • patch_sizes (Sequence[int]) – The patch_size of each overlapped patch embedding. Default: [7, 3, 3, 3].

  • strides (Sequence[int]) – The stride of each overlapped patch embedding. Default: [4, 2, 2, 2].

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].

  • out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – stochastic depth rate. Default 0.0

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
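Example

A minimal MiT-B0-style sketch; with the default num_heads=[1, 2, 4, 8], the channels of stage i are embed_dims * num_heads[i] (shapes below are illustrative):

>>> import torch
>>> from mmseg.models.backbones import MixVisionTransformer
>>> self = MixVisionTransformer(embed_dims=32, num_layers=[2, 2, 2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 128, 128)
>>> for level_out in self.forward(inputs):
...     print(tuple(level_out.shape))
(1, 32, 32, 32)
(1, 64, 16, 16)
(1, 128, 8, 8)
(1, 256, 4, 4)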

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmseg.models.backbones.MobileNetV2(widen_factor=1.0, strides=(1, 2, 2, 2, 1, 2, 1), dilations=(1, 1, 1, 1, 1, 1, 1), out_indices=(1, 2, 4, 6), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

MobileNetV2 backbone.

This backbone is the implementation of MobileNetV2: Inverted Residuals and Linear Bottlenecks.

Parameters
  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • strides (Sequence[int], optional) – Strides of the first block of each layer. If not specified, default config in arch_setting will be used.

  • dilations (Sequence[int]) – Dilation of each layer.

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (1, 2, 4, 6).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
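Example

A minimal sketch with the defaults; the four outputs follow out_indices=(1, 2, 4, 6) (shapes are illustrative):

>>> import torch
>>> from mmseg.models.backbones import MobileNetV2
>>> self = MobileNetV2()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> for level_out in self.forward(inputs):
...     print(tuple(level_out.shape))
(1, 24, 56, 56)
(1, 32, 28, 28)
(1, 96, 14, 14)
(1, 320, 7, 7)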

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_layer(out_channels, num_blocks, stride, dilation, expand_ratio)[source]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters
  • out_channels (int) – out_channels of block.

  • num_blocks (int) – Number of blocks.

  • stride (int) – Stride of the first block.

  • dilation (int) – Dilation of the first block.

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmseg.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(0, 1, 12), frozen_stages=- 1, reduction_factor=1, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

MobileNetV3 backbone.

This backbone is the improved implementation of Searching for MobileNetV3.

Parameters
  • arch (str) – Architecture of MobileNetV3, from {‘small’, ‘large’}. Default: ‘small’.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • out_indices (tuple[int]) – Output from which layer. Default: (0, 1, 12).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
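Example

A minimal sketch with the defaults (arch=’small’); the number of outputs follows out_indices=(0, 1, 12):

>>> import torch
>>> from mmseg.models.backbones import MobileNetV3
>>> self = MobileNetV3(arch='small')
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> len(outputs)
3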

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmseg.models.backbones.PCPVT(in_channels=3, embed_dims=[64, 128, 256, 512], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], norm_after_stage=False, pretrained=None, init_cfg=None)[source]

The backbone of Twins-PCPVT.

This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (list) – Embedding dimension. Default: [64, 128, 256, 512].

  • patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].

  • strides (list) – The strides. Default: [4, 2, 2, 2].

  • num_heads (list) – Number of attention heads per stage. Default: [1, 2, 4, 8].

  • mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim per stage. Default: [4, 4, 4, 4].

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool) – Enable bias for qkv if True. Default: False.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – Stochastic depth rate. Default 0.0

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • depths (list) – Depths of each stage. Default [3, 4, 6, 3]

  • sr_ratios (list) – Kernel_size of conv in each Attn module in Transformer encoder layer. Default: [8, 4, 2, 1].

  • norm_after_stage (bool) – Whether to add an extra norm after each stage. Default: False.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
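Example

A small-scale sketch; the embedding dims and depths here are illustrative, and each num_heads[i] must divide the matching embed_dims[i]:

>>> import torch
>>> from mmseg.models.backbones import PCPVT
>>> self = PCPVT(embed_dims=[32, 64, 160, 256],
...              num_heads=[1, 2, 5, 8],
...              depths=[2, 2, 2, 2])
>>> self.eval()
>>> inputs = torch.rand(1, 3, 128, 128)
>>> for level_out in self.forward(inputs):
...     print(tuple(level_out.shape))
(1, 32, 32, 32)
(1, 64, 16, 16)
(1, 160, 8, 8)
(1, 256, 4, 4)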

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmseg.models.backbones.PIDNet(in_channels: int = 3, channels: int = 64, ppm_channels: int = 96, num_stem_blocks: int = 2, num_branch_blocks: int = 3, align_corners: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, **kwargs)[source]

PIDNet backbone.

This backbone is the implementation of PIDNet: A Real-time Semantic Segmentation Network Inspired from PID Controller. Modified from https://github.com/XuJiacong/PIDNet.

Licensed under the MIT License.

Parameters
  • in_channels (int) – The number of input channels. Default: 3.

  • channels (int) – The number of channels in the stem layer. Default: 64.

  • ppm_channels (int) – The number of channels in the PPM layer. Default: 96.

  • num_stem_blocks (int) – The number of blocks in the stem layer. Default: 2.

  • num_branch_blocks (int) – The number of blocks in the branch layer. Default: 3.

  • align_corners (bool) – The align_corners argument of F.interpolate. Default: False.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).

  • init_cfg (dict) – Config dict for initialization. Default: None.
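Example

A minimal sketch illustrating the train/eval return types documented in forward below (the input size is illustrative):

>>> import torch
>>> from mmseg.models.backbones import PIDNet
>>> self = PIDNet()
>>> self.eval()
>>> x = torch.rand(1, 3, 256, 256)
>>> isinstance(self.forward(x), torch.Tensor)  # a single Tensor in eval mode
True
>>> self.train()
>>> isinstance(self.forward(x), tuple)  # a tuple of Tensors in training mode
True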

forward(x: torch.Tensor)Union[torch.Tensor, Tuple[torch.Tensor]][source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (B, C, H, W).

Returns

If self.training is True, return tuple[Tensor], else return Tensor.

Return type

Tensor or tuple[Tensor]

init_weights()[source]

Initialize the weights in backbone.

Since the D branch is not initialized by the pre-trained model, we initialize it with the same method as the ResNet.

class mmseg.models.backbones.ResNeSt(groups=1, base_width=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]

ResNeSt backbone.

This backbone is the implementation of ResNeSt: Split-Attention Networks.

Parameters
  • groups (int) – Number of groups of Bottleneck. Default: 1

  • base_width (int) – Base width of Bottleneck. Default: 4

  • radix (int) – Radix of SplitAttentionConv2d. Default: 2

  • reduction_factor (int) – Reduction factor of inter_channels in SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • kwargs (dict) – Keyword arguments for ResNet.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

class mmseg.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]

ResNeXt backbone.

This backbone is the implementation of Aggregated Residual Transformations for Deep Neural Networks.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Normally 3.

  • num_stages (int) – Resnet stages, normally 4.

  • groups (int) – Group of resnext.

  • base_width (int) – Base width of resnext.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

Example

>>> from mmseg.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

class mmseg.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, multi_grid=None, contract_dilation=False, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]

ResNet backbone.

This backbone is the improved implementation of Deep Residual Learning for Image Recognition.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Number of stem channels. Default: 64.

  • base_channels (int) – Number of base channels of res layer. Default: 64.

  • num_stages (int) – Resnet stages, normally 4. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – Dictionary to construct and config conv layer. When conv_cfg is None, cfg will be set to dict(type=’Conv2d’). Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • dcn (dict | None) – Dictionary to construct and config DCN conv layer. When dcn is not None, conv_cfg must be None. Default: None.

  • stage_with_dcn (Sequence[bool]) – Whether to set DCN conv for each stage. The length of stage_with_dcn is equal to num_stages. Default: (False, False, False, False).

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin, options: ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as ‘num_stages’. Default: None.

  • multi_grid (Sequence[int]|None) – Multi grid dilation rates of last stage. Default: None.

  • contract_dilation (bool) – Whether to contract the first dilation of each layer. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmseg.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x)[source]

Forward function.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

make_stage_plugins(plugins, stage_idx)[source]

Make plugins for ResNet’s stage_idx-th stage.

Currently we support inserting ‘context_block’, ‘empirical_attention_block’ and ‘nonlocal_block’ into backbones like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of Bottleneck.

An example of the plugins format could be:

>>> plugins = [
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3

Suppose stage_idx=0, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2

Suppose stage_idx=1, the structure of blocks in the stage would be:

conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2

If stages is missing, the plugin would be applied to all stages.

Parameters
  • plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.

  • stage_idx (int) – Index of stage to build

Returns

Plugins for current stage

Return type

list[dict]

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.ResNetV1c(**kwargs)[source]

ResNetV1c variant.

Compared with default ResNet(ResNetV1b), ResNetV1c replaces the 7x7 conv in the input stem with three 3x3 convs. For more details please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks.

class mmseg.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant.

Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.

class mmseg.models.backbones.STDCContextPathNet(backbone_cfg, last_in_channels=(1024, 512), out_channels=128, ffm_cfg={'in_channels': 512, 'out_channels': 256, 'scale_factor': 4}, upsample_mode='nearest', align_corners=None, norm_cfg={'type': 'BN'}, init_cfg=None)[source]

STDCNet with Context Path. outs is a list of three feature maps from deep to shallow, whose height and width go from small to big, respectively. The biggest feature map of outs is output to STDCHead, where the Detail Loss is calculated against the detail ground truth. The other two feature maps are each fed to an Attention Refinement Module. Besides, the biggest feature map of outs and the last output of the Attention Refinement Module are concatenated for the Feature Fusion Module, and the fused feature map feat_fuse is output to the decode_head. For more details, please refer to Figure 4 of the original paper.

Parameters
  • backbone_cfg (dict) – Config dict for stdc backbone.

  • last_in_channels (tuple(int)) – two feature maps from stdc backbone. Default: (1024, 512).

  • out_channels (int) – The channels of output feature maps. Default: 128.

  • ffm_cfg (dict) – Config dict for Feature Fusion Module. Default: dict(in_channels=512, out_channels=256, scale_factor=4).

  • upsample_mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'.

  • align_corners (bool | None) – align_corners argument of F.interpolate. It must be None if upsample_mode is 'nearest'. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Returns

The tuple of list of output feature maps for auxiliary heads and decoder head.

Return type

outputs (tuple)

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.STDCNet(stdc_type, in_channels, channels, bottleneck_type, norm_cfg, act_cfg, num_convs=4, with_final_conv=False, pretrained=None, init_cfg=None)[source]

This backbone is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.

Parameters
  • stdc_type (str) – The type of backbone structure. STDCNet1 and STDCNet2 denote the two main backbones in the paper, whose FLOPs are 813M and 1446M, respectively.

  • in_channels (int) – The number of input channels.

  • channels (tuple[int]) – The output channels for each stage.

  • bottleneck_type (str) – The type of STDC Module; the value must be ‘add’ or ‘cat’.

  • norm_cfg (dict) – Config dict for normalization layer.

  • act_cfg (dict) – The activation config for conv layers.

  • num_convs (int) – Numbers of conv layer at each STDC Module. Default: 4.

  • with_final_conv (bool) – Whether to add a conv layer at the module output. Default: False.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> import torch
>>> stdc_type = 'STDCNet1'
>>> in_channels = 3
>>> channels = (32, 64, 256, 512, 1024)
>>> bottleneck_type = 'cat'
>>> inputs = torch.rand(1, 3, 1024, 2048)
>>> self = STDCNet(stdc_type, in_channels,
...                 channels, bottleneck_type).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 256, 128, 256])
outputs[1].shape = torch.Size([1, 512, 64, 128])
outputs[2].shape = torch.Size([1, 1024, 32, 64])
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.SVT(in_channels=3, embed_dims=[64, 128, 256], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4], mlp_ratios=[4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, norm_cfg={'type': 'LN'}, depths=[4, 4, 4], sr_ratios=[4, 2, 1], windiow_sizes=[7, 7, 7], norm_after_stage=True, pretrained=None, init_cfg=None)[source]

The backbone of Twins-SVT.

This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (list) – Embedding dimension. Default: [64, 128, 256].

  • patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].

  • strides (list) – The strides. Default: [4, 2, 2, 2].

  • num_heads (list) – Number of attention heads per stage. Default: [1, 2, 4].

  • mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim per stage. Default: [4, 4, 4].

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool) – Enable bias for qkv if True. Default: False.

  • drop_rate (float) – Dropout rate. Default 0.

  • attn_drop_rate (float) – Dropout ratio of attention weight. Default 0.0

  • drop_path_rate (float) – Stochastic depth rate. Default 0.2.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • depths (list) – Depths of each stage. Default [4, 4, 4].

  • sr_ratios (list) – Kernel_size of conv in each Attn module in Transformer encoder layer. Default: [4, 2, 1].

  • windiow_sizes (list) – Window size of LSA. Default: [7, 7, 7].

  • input_features_slice (bool) – Whether the input features need slicing. Default: False.

  • norm_after_stage (bool) – Whether to add an extra norm after each stage. Default: True.

  • strides – Strides in patch-Embedding modules. Default: (2, 2, 2)

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

class mmseg.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, pretrained=None, frozen_stages=- 1, init_cfg=None)[source]

Swin Transformer backbone.

This backbone is the implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Inspiration from https://github.com/microsoft/Swin-Transformer.

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – The num of input channels. Defaults: 3.

  • embed_dims (int) – The feature dimension. Default: 96.

  • patch_size (int | tuple[int]) – Patch size. Default: 4.

  • window_size (int) – Window size. Default: 7.

  • mlp_ratio (int | float) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).

  • num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).

  • strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • patch_norm (bool) – If add a norm layer for patch embed and patch merging. Default: True.

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer at output of backone. Defaults: dict(type=’LN’).

  • with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
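Example

A minimal Swin-T-style sketch using the defaults above (shapes are illustrative):

>>> import torch
>>> from mmseg.models.backbones import SwinTransformer
>>> self = SwinTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> for level_out in self.forward(inputs):
...     print(tuple(level_out.shape))
(1, 96, 56, 56)
(1, 192, 28, 28)
(1, 384, 14, 14)
(1, 768, 7, 7)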

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

train(mode=True)[source]

Convert the model into training mode while keeping the frozen layers frozen.

class mmseg.models.backbones.TIMMBackbone(model_name, features_only=True, pretrained=True, checkpoint_path='', in_channels=3, init_cfg=None, **kwargs)[source]

Wrapper to use backbones from the timm library. More details can be found in the timm documentation.

Parameters
  • model_name (str) – Name of timm model to instantiate.

  • pretrained (bool) – Load pretrained weights if True.

  • checkpoint_path (str) – Path of checkpoint to load after model is initialized.

  • in_channels (int) – Number of input image channels. Default: 3.

  • init_cfg (dict, optional) – Initialization config dict

  • **kwargs – Other timm & model specific arguments.
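Example

A hedged sketch; it assumes the timm package is installed, and pretrained=False avoids downloading weights (extra kwargs are forwarded to timm.create_model):

>>> import torch
>>> from mmseg.models.backbones import TIMMBackbone
>>> self = TIMMBackbone(model_name='resnet18', features_only=True, pretrained=False)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> len(outputs)  # one feature map per timm feature stage for resnet18
5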

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.UNet(in_channels=3, base_channels=64, num_stages=5, strides=(1, 1, 1, 1, 1), enc_num_convs=(2, 2, 2, 2, 2), dec_num_convs=(2, 2, 2, 2), downsamples=(True, True, True, True), enc_dilations=(1, 1, 1, 1, 1), dec_dilations=(1, 1, 1, 1), with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, upsample_cfg={'type': 'InterpConv'}, norm_eval=False, dcn=None, plugins=None, pretrained=None, init_cfg=None)[source]

UNet backbone.

This backbone is the implementation of U-Net: Convolutional Networks for Biomedical Image Segmentation.

Parameters
  • in_channels (int) – Number of input image channels. Default: 3.

  • base_channels (int) – Number of base channels of each stage. The output channels of the first stage. Default: 64.

  • num_stages (int) – Number of stages in encoder, normally 5. Default: 5.

  • strides (Sequence[int 1 | 2]) – Strides of each stage in encoder. len(strides) is equal to num_stages. Normally the stride of the first stage in encoder is 1. If strides[i]=2, it uses stride convolution to downsample in the correspondence encoder stage. Default: (1, 1, 1, 1, 1).

  • enc_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the correspondence encoder stage. Default: (2, 2, 2, 2, 2).

  • dec_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the correspondence decoder stage. Default: (2, 2, 2, 2).

  • downsamples (Sequence[int]) – Whether use MaxPool to downsample the feature map after the first stage of encoder (stages: [1, num_stages)). If the correspondence encoder stage use stride convolution (strides[i]=2), it will never use MaxPool to downsample, even downsamples[i-1]=True. Default: (True, True, True, True).

  • enc_dilations (Sequence[int]) – Dilation rate of each stage in encoder. Default: (1, 1, 1, 1, 1).

  • dec_dilations (Sequence[int]) – Dilation rate of each stage in decoder. Default: (1, 1, 1, 1).

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • conv_cfg (dict | None) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict | None) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict | None) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’).

  • upsample_cfg (dict) – The upsample config of the upsample module in decoder. Default: dict(type=’InterpConv’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • dcn (bool) – Use deformable convolution in convolutional layer or not. Default: None.

  • plugins (dict) – plugins for convolutional layers. Default: None.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Notice:

The input image size should be divisible by the whole downsample rate of the encoder. More detail of the whole downsample rate can be found in UNet._check_input_divisible.
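Example

A minimal sketch; with the default four downsamples the input size must be divisible by 16, and base_channels is reduced here to keep the model small (shapes are illustrative):

>>> import torch
>>> from mmseg.models.backbones import UNet
>>> self = UNet(in_channels=3, base_channels=16)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> for level_out in self.forward(inputs):  # decoder outputs, deep to shallow
...     print(tuple(level_out.shape))
(1, 256, 4, 4)
(1, 128, 8, 8)
(1, 64, 16, 16)
(1, 32, 32, 32)
(1, 16, 64, 64)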

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.VisionTransformer(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, with_cls_token=True, output_cls_token=False, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, interpolate_mode='bicubic', num_fcs=2, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

Vision Transformer.

This backbone is the implementation of An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – embedding dimension. Default: 768.

  • num_layers (int) – depth of transformer. Default: 12.

  • num_heads (int) – number of attention heads. Default: 12.

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • qkv_bias (bool) – enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – stochastic depth rate. Default 0.0

  • with_cls_token (bool) – Whether concatenating class token into image tokens as transformer input. Default: True.

  • output_cls_token (bool) – Whether output the cls_token. If set True, with_cls_token must be True. Default: False.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize final feature map. Default: False.

  • interpolate_mode (str) – Select the interpolate mode for position embedding vector resize. Default: bicubic.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
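Example

A minimal ViT-Base sketch using the defaults above; patch tokens are reshaped into a stride-16 feature map (shape is illustrative):

>>> import torch
>>> from mmseg.models.backbones import VisionTransformer
>>> self = VisionTransformer(img_size=224, patch_size=16)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> tuple(outputs[-1].shape)
(1, 768, 14, 14)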

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

static resize_pos_embed(pos_embed, input_shpae, pos_shape, mode)[source]

Resize pos_embed weights.

Resize pos_embed using bicubic interpolation.

Parameters
  • pos_embed (torch.Tensor) – Position embedding weights.

  • input_shpae (tuple) – Tuple for (downsampled input image height, downsampled input image width).

  • pos_shape (tuple) – The resolution of downsampled origin training image.

  • mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'

Returns

The resized pos_embed of shape [B, L_new, C]

Return type

torch.Tensor

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

decode_heads

class mmseg.models.decode_heads.ANNHead(project_channels, query_scales=(1), key_pool_scales=(1, 3, 6, 8), **kwargs)[source]

Asymmetric Non-local Neural Networks for Semantic Segmentation.

This head is the implementation of ANNNet.

Parameters
  • project_channels (int) – Projection channels for Nonlocal.

  • query_scales (tuple[int]) – The scales of query feature map. Default: (1,)

  • key_pool_scales (tuple[int]) – The pooling scales of key feature map. Default: (1, 3, 6, 8).

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.APCHead(pool_scales=(1, 2, 3, 6), fusion=True, **kwargs)[source]

Adaptive Pyramid Context Network for Semantic Segmentation.

This head is the implementation of APCNet.

Parameters
  • pool_scales (tuple[int]) – Pooling scales used in Adaptive Context Module. Default: (1, 2, 3, 6).

  • fusion (bool) – Add one conv to fuse residual feature.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.ASPPHead(dilations=(1, 6, 12, 18), **kwargs)[source]

Rethinking Atrous Convolution for Semantic Image Segmentation.

This head is the implementation of DeepLabV3.

Parameters

dilations (tuple[int]) – Dilation rates for ASPP module. Default: (1, 6, 12, 18).
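Example

A minimal sketch; in_channels, channels and num_classes come from the base decode head, and the values here are illustrative:

>>> import torch
>>> from mmseg.models.decode_heads import ASPPHead
>>> self = ASPPHead(in_channels=64, channels=32, num_classes=19,
...                 dilations=(1, 6, 12, 18))
>>> self.eval()
>>> inputs = [torch.rand(1, 64, 32, 32)]  # list of features; in_index selects one
>>> output = self.forward(inputs)
>>> tuple(output.shape)  # per-pixel logits with num_classes channels
(1, 19, 32, 32)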

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.CCHead(recurrence=2, **kwargs)[source]

CCNet: Criss-Cross Attention for Semantic Segmentation.

This head is the implementation of CCNet.

Parameters

recurrence (int) – Number of recurrence of Criss Cross Attention module. Default: 2.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DAHead(pam_channels, **kwargs)[source]

Dual Attention Network for Scene Segmentation.

This head is the implementation of DANet.

Parameters

pam_channels (int) – The channels of Position Attention Module(PAM).

cam_cls_seg(feat)[source]

CAM feature classification.

forward(inputs)[source]

Forward function.

loss_by_feat(seg_logit: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs)dict[source]

Compute pam_cam, pam, cam loss.

pam_cls_seg(feat)[source]

PAM feature classification.

predict(inputs, batch_img_metas: List[dict], test_cfg, **kwargs)List[torch.Tensor][source]

Forward function for testing, only pam_cam is used.

class mmseg.models.decode_heads.DMHead(filter_sizes=(1, 3, 5, 7), fusion=False, **kwargs)[source]

Dynamic Multi-scale Filters for Semantic Segmentation.

This head is the implementation of DMNet.

Parameters
  • filter_sizes (tuple[int]) – The size of generated convolutional filters used in Dynamic Convolutional Module. Default: (1, 3, 5, 7).

  • fusion (bool) – Add one conv to fuse DCM output feature.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DNLHead(reduction=2, use_scale=True, mode='embedded_gaussian', temperature=0.05, **kwargs)[source]

Disentangled Non-Local Neural Networks.

This head is the implementation of DNLNet.

Parameters
  • reduction (int) – Reduction factor of projection transform. Default: 2.

  • use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.

  • mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’, ‘dot_product’. Default: ‘embedded_gaussian’.

  • temperature (float) – Temperature to adjust attention. Default: 0.05

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DPTHead(embed_dims=768, post_process_channels=[96, 192, 384, 768], readout_type='ignore', patch_size=16, expand_channels=False, act_cfg={'type': 'ReLU'}, norm_cfg={'type': 'BN'}, **kwargs)[source]

Vision Transformers for Dense Prediction.

This head is the implementation of DPT.

Parameters
  • embed_dims (int) – The embed dimension of the ViT backbone. Default: 768.

  • post_process_channels (List) – Out channels of post process conv layers. Default: [96, 192, 384, 768].

  • readout_type (str) – Type of readout operation. Default: ‘ignore’.

  • patch_size (int) – The patch size. Default: 16.

  • expand_channels (bool) – Whether expand the channels in post process block. Default: False.

  • act_cfg (dict) – The activation config for residual conv unit. Default dict(type=’ReLU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.DepthwiseSeparableASPPHead(c1_in_channels, c1_channels, **kwargs)[source]

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.

This head is the implementation of DeepLabV3+.

Parameters
  • c1_in_channels (int) – The input channels of the c1 decoder. If it is 0, no c1 decoder will be used.

  • c1_channels (int) – The intermediate channels of c1 decoder.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DepthwiseSeparableFCNHead(dw_act_cfg=None, **kwargs)[source]

Depthwise-Separable Fully Convolutional Network for Semantic Segmentation.

This head is implemented according to Fast-SCNN: Fast Semantic Segmentation Network.

Parameters
  • in_channels (int) – Number of output channels of FFM.

  • channels (int) – Number of middle-stage channels in the decode head.

  • concat_input (bool) – Whether to concatenate original decode input into the result of several consecutive convolution layers. Default: True.

  • num_classes (int) – Used to determine the dimension of final prediction tensor.

  • in_index (int) – Correspond with ‘out_indices’ in FastSCNN backbone.

  • norm_cfg (dict | None) – Config of norm layers.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • loss_decode (dict) – Config of loss type and some relevant additional options.

  • dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.

class mmseg.models.decode_heads.EMAHead(ema_channels, num_bases, num_stages, concat_input=True, momentum=0.1, **kwargs)[source]

Expectation Maximization Attention Networks for Semantic Segmentation.

This head is the implementation of EMANet.

Parameters
  • ema_channels (int) – EMA module channels

  • num_bases (int) – Number of bases.

  • num_stages (int) – Number of the EM iterations.

  • concat_input (bool) – Whether concat the input and output of convs before classification layer. Default: True

  • momentum (float) – Momentum to update the base. Default: 0.1.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.EncHead(num_codes=32, use_se_loss=True, add_lateral=False, loss_se_decode={'loss_weight': 0.2, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, **kwargs)[source]

Context Encoding for Semantic Segmentation.

This head is the implementation of EncNet.

Parameters
  • num_codes (int) – Number of code words. Default: 32.

  • use_se_loss (bool) – Whether use Semantic Encoding Loss (SE-loss) to regularize the training. Default: True.

  • add_lateral (bool) – Whether use lateral connection to fuse features. Default: False.

  • loss_se_decode (dict) – Config of decode loss. Default: dict(type=’CrossEntropyLoss’, use_sigmoid=True).

forward(inputs)[source]

Forward function.

loss_by_feat(seg_logit: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs)dict[source]

Compute segmentation and semantic encoding loss.

predict(inputs: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict])[source]

Forward function for testing, ignore se_loss.

class mmseg.models.decode_heads.FCNHead(num_convs=2, kernel_size=3, concat_input=True, dilation=1, **kwargs)[source]

Fully Convolution Networks for Semantic Segmentation.

This head is the implementation of FCN.

Parameters
  • num_convs (int) – Number of convs in the head. Default: 2.

  • kernel_size (int) – The kernel size for convs in the head. Default: 3.

  • concat_input (bool) – Whether concat the input and output of convs before classification layer.

  • dilation (int) – The dilation rate for convs in the head. Default: 1.
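Example

A minimal sketch; in_channels, channels and num_classes are base decode-head arguments, and the values here are illustrative:

>>> import torch
>>> from mmseg.models.decode_heads import FCNHead
>>> self = FCNHead(in_channels=64, channels=32, num_classes=19, num_convs=2)
>>> self.eval()
>>> inputs = [torch.rand(1, 64, 45, 45)]
>>> output = self.forward(inputs)
>>> tuple(output.shape)
(1, 19, 45, 45)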

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.FPNHead(feature_strides, **kwargs)[source]

Panoptic Feature Pyramid Networks.

This head is the implementation of Semantic FPN.

Parameters

feature_strides (tuple[int]) – The strides for input feature maps, corresponding to the stacked lateral features. All strides are supposed to be powers of 2. The first one is of largest resolution.

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.GCHead(ratio=0.25, pooling_type='att', fusion_types=('channel_add'), **kwargs)[source]

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond.

This head is the implementation of GCNet.

Parameters
  • ratio (float) – Multiplier of channels ratio. Default: 1/4.

  • pooling_type (str) – The pooling type of context aggregation. Options are ‘att’, ‘avg’. Default: ‘att’.

  • fusion_types (tuple[str]) – The fusion type for feature fusion. Options are ‘channel_add’, ‘channel_mul’. Default: (‘channel_add’,)

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.ISAHead(isa_channels, down_factor=(8, 8), **kwargs)[source]

Interlaced Sparse Self-Attention for Semantic Segmentation.

This head is the implementation of ISA.

Parameters
  • isa_channels (int) – The channels of ISA Module.

  • down_factor (tuple[int]) – The local group size of ISA.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.IterativeDecodeHead(num_stages, kernel_generate_head, kernel_update_head, **kwargs)[source]

K-Net: Towards Unified Image Segmentation.

This head is the implementation of K-Net (https://arxiv.org/abs/2106.14855).

Parameters
  • num_stages (int) – The number of stages (kernel update heads) in IterativeDecodeHead. Default: 3.

  • kernel_generate_head (dict) – Config of the kernel generate head, which generates mask predictions, dynamic kernels and class predictions for the next kernel update heads.

  • kernel_update_head (dict) – Config of kernel update head which refine dynamic kernels and class predictions iteratively.

forward(inputs)[source]

Forward function.

loss_by_feat(seg_logits: List[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs)dict[source]

Compute segmentation loss.

Parameters
  • seg_logits (Tensor) – The output from decode head forward function.

  • batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

class mmseg.models.decode_heads.KernelUpdateHead(num_classes=150, num_ffn_fcs=2, num_heads=8, num_mask_fcs=3, feedforward_channels=2048, in_channels=256, out_channels=256, dropout=0.0, act_cfg={'inplace': True, 'type': 'ReLU'}, ffn_act_cfg={'inplace': True, 'type': 'ReLU'}, conv_kernel_size=1, feat_transform_cfg=None, kernel_init=False, with_ffn=True, feat_gather_stride=1, mask_transform_stride=1, kernel_updator_cfg={'act_cfg': {'inplace': True, 'type': 'ReLU'}, 'feat_channels': 64, 'in_channels': 256, 'norm_cfg': {'type': 'LN'}, 'out_channels': 256, 'type': 'DynamicConv'})[source]

Kernel Update Head in K-Net.

Parameters
  • num_classes (int) – Number of classes. Default: 150.

  • num_ffn_fcs (int) – The number of fully-connected layers in FFNs. Default: 2.

  • num_heads (int) – The number of parallel attention heads. Default: 8.

  • num_mask_fcs (int) – The number of fully connected layers for mask prediction. Default: 3.

  • feedforward_channels (int) – The hidden dimension of FFNs. Defaults: 2048.

  • in_channels (int) – The number of channels of input feature map. Default: 256.

  • out_channels (int) – The number of output channels. Default: 256.

  • dropout (float) – The Probability of an element to be zeroed in MultiheadAttention and FFN. Default 0.0.

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).

  • ffn_act_cfg (dict) – Config of activation layers in FFN. Default: dict(type=’ReLU’).

  • conv_kernel_size (int) – The kernel size of convolution in Kernel Update Head for dynamic kernel updating. Default: 1.

  • feat_transform_cfg (dict | None) – Config of feature transform. Default: None.

  • kernel_init (bool) – Whether to initialize the mask kernels in the mask head. Default: False.

  • with_ffn (bool) – Whether add FFN in kernel update head. Default: True.

  • feat_gather_stride (int) – Stride of convolution in feature transform. Default: 1.

  • mask_transform_stride (int) – Stride of mask transform. Default: 1.

  • kernel_updator_cfg (dict) – Config of kernel updator. Default: dict(type=’DynamicConv’, in_channels=256, feat_channels=64, out_channels=256, act_cfg=dict(type=’ReLU’, inplace=True), norm_cfg=dict(type=’LN’)).

forward(x, proposal_feat, mask_preds, mask_shape=None)[source]

Forward function of Dynamic Instance Interactive Head.

Parameters
  • x (Tensor) – Feature map from FPN with shape (batch_size, feature_dimensions, H , W).

  • proposal_feat (Tensor) – Intermediate feature get from diihead in last stage, has shape (batch_size, num_proposals, feature_dimensions)

  • mask_preds (Tensor) – mask prediction from the former stage in shape (batch_size, num_proposals, H, W).

Returns

The first tensor is predicted mask with shape (N, num_classes, H, W), the second tensor is dynamic kernel with shape (N, num_classes, channels, K, K).

Return type

Tuple

init_weights()[source]

Use Xavier initialization for all weight parameters and set the classification head bias to a specific value when using focal loss.

class mmseg.models.decode_heads.KernelUpdator(in_channels=256, feat_channels=64, out_channels=None, gate_sigmoid=True, gate_norm_act=False, activate_out=False, norm_cfg={'type': 'LN'}, act_cfg={'inplace': True, 'type': 'ReLU'})[source]

Dynamic Kernel Updator in Kernel Update Head.

Parameters
  • in_channels (int) – The number of channels of input feature map. Default: 256.

  • feat_channels (int) – The number of middle-stage channels in the kernel updator. Default: 64.

  • out_channels (int) – The number of output channels.

  • gate_sigmoid (bool) – Whether use sigmoid function in gate mechanism. Default: True.

  • gate_norm_act (bool) – Whether add normalization and activation layer in gate mechanism. Default: False.

  • activate_out (bool) – Whether to add activation after the gate mechanism. Default: False.

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’LN’).

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).

forward(update_feature, input_feature)[source]

Forward function of KernelUpdator.

Parameters
  • update_feature (torch.Tensor) – Feature map assembled from each group. It will be reshaped so that its last dimension is self.in_channels.

  • input_feature (torch.Tensor) – Intermediate feature with shape: (N, num_classes, conv_kernel_size**2, channels).

Returns

The output tensor of shape (N*C1/C2, K*K, C2), where N is the number of classes, C1 and C2 are the feature map channels of KernelUpdateHead and KernelUpdator, respectively.

Return type

Tensor

class mmseg.models.decode_heads.LRASPPHead(branch_channels=(32, 64), **kwargs)[source]

Lite R-ASPP (LRASPP) head.

This head is the improved segmentation head proposed in Searching for MobileNetV3.

Parameters

branch_channels (tuple[int]) – The number of output channels in each branch. Default: (32, 64).

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.LightHamHead(ham_channels=512, ham_kwargs={}, **kwargs)[source]

SegNeXt decode head.

This decode head is the implementation of SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. Inspiration from https://github.com/visual-attention-network/segnext.

Specifically, LightHamHead is inspired by HamNet from Is Attention Better Than Matrix Decomposition? (https://arxiv.org/abs/2109.04553).

Parameters
  • ham_channels (int) – Input channels for the Hamburger module. Default: 512.

  • ham_kwargs (dict) – Keyword arguments for the Hamburger module. Default: dict().

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.Mask2FormerHead(num_classes, align_corners=False, ignore_index=255, **kwargs)[source]

Implements the Mask2Former head.

See Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation for details.

Parameters
  • num_classes (int) – Number of classes. Default: 150.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • ignore_index (int) – The label index to be ignored. Default: 255.

loss(x: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg: Union[mmengine.config.config.ConfigDict, dict])dict[source]

Perform forward propagation and loss calculation of the decoder head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[SegDataSample]) – The Data Samples. It usually includes information such as gt_sem_seg.

  • train_cfg (ConfigType) – Training config.

Returns

a dictionary of loss components.

Return type

dict[str, Tensor]

predict(x: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict])Tuple[torch.Tensor][source]

Test without augmentation.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_img_metas (List[dict]) – List of image meta information dicts.

  • test_cfg (ConfigType) – Test config.

Returns

A tensor of segmentation mask.

Return type

Tensor

class mmseg.models.decode_heads.MaskFormerHead(num_classes: int = 150, align_corners: bool = False, ignore_index: int = 255, **kwargs)[source]

Implements the MaskFormer head.

See Per-Pixel Classification is Not All You Need for Semantic Segmentation for details.

Parameters
  • num_classes (int) – Number of classes. Default: 150.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • ignore_index (int) – The label index to be ignored. Default: 255.

loss(x: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg: Union[mmengine.config.config.ConfigDict, dict])dict[source]

Perform forward propagation and loss calculation of the decoder head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[SegDataSample]) – The Data Samples. It usually includes information such as gt_sem_seg.

  • train_cfg (ConfigType) – Training config.

Returns

a dictionary of loss components.

Return type

dict[str, Tensor]

predict(x: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict])Tuple[torch.Tensor][source]

Test without augmentation.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_img_metas (List[dict]) – List of image meta information dicts.

  • test_cfg (ConfigType) – Test config.

Returns

A tensor of segmentation mask.

Return type

Tensor

class mmseg.models.decode_heads.NLHead(reduction=2, use_scale=True, mode='embedded_gaussian', **kwargs)[source]

Non-local Neural Networks.

This head is the implementation of NLNet.

Parameters
  • reduction (int) – Reduction factor of projection transform. Default: 2.

  • use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.

  • mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’, ‘dot_product’. Default: ‘embedded_gaussian’.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.OCRHead(ocr_channels, scale=1, **kwargs)[source]

Object-Contextual Representations for Semantic Segmentation.

This head is the implementation of OCRNet.

Parameters
  • ocr_channels (int) – The intermediate channels of OCR block.

  • scale (int) – The scale of the probability map in SpatialGatherModule. Default: 1.

forward(inputs, prev_output)[source]

Forward function.

class mmseg.models.decode_heads.PIDHead(in_channels: int, channels: int, num_classes: int, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, **kwargs)[source]

Decode head for PIDNet.

Parameters
  • in_channels (int) – Number of input channels.

  • channels (int) – Number of output channels.

  • num_classes (int) – Number of classes.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).

forward(inputs: Union[torch.Tensor, Tuple[torch.Tensor]])Union[torch.Tensor, Tuple[torch.Tensor]][source]

Forward function.

Parameters

inputs (Tensor | tuple[Tensor]) – Input tensor or tuple of tensors. In training, the input is a tuple of three tensors (p_feat, i_feat, d_feat), and the output is a tuple of three tensors (p_seg_logit, i_seg_logit, d_seg_logit). In inference, only the head of the integral branch is used: the input is a tensor of the integral feature map and the output is the segmentation logit.

Returns

Output tensor or tuple of tensors.

Return type

Tensor | tuple[Tensor]

init_weights()[source]

Initialize the weights.

loss_by_feat(seg_logits: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Compute segmentation loss.

Parameters
  • seg_logits (Tensor) – The output from decode head forward function.

  • batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

class mmseg.models.decode_heads.PSAHead(mask_size, psa_type='bi-direction', compact=False, shrink_factor=2, normalization_factor=1.0, psa_softmax=True, **kwargs)[source]

Point-wise Spatial Attention Network for Scene Parsing.

This head is the implementation of PSANet.

Parameters
  • mask_size (tuple[int]) – The PSA mask size. It usually equals input size.

  • psa_type (str) – The type of psa module. Options are ‘collect’, ‘distribute’, ‘bi-direction’. Default: ‘bi-direction’

  • compact (bool) – Whether to use a compact map for ‘collect’ mode. Default: False.

  • shrink_factor (int) – The downsample factors of psa mask. Default: 2.

  • normalization_factor (float) – The normalization factor of attention. Default: 1.0.

  • psa_softmax (bool) – Whether to use softmax for attention. Default: True.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.PSPHead(pool_scales=(1, 2, 3, 6), **kwargs)[source]

Pyramid Scene Parsing Network.

This head is the implementation of PSPNet.

Parameters

pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).

forward(inputs)[source]

Forward function.
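A minimal usage sketch (illustrative; the channel sizes and class count are assumptions built on the standard BaseDecodeHead keyword arguments, not values from this page):

>>> import torch
>>> from mmseg.models.decode_heads import PSPHead
>>> head = PSPHead(
...     pool_scales=(1, 2, 3, 6),
...     in_channels=2048,  # channels of the backbone feature map (assumed)
...     channels=512,      # intermediate channels of the head (assumed)
...     num_classes=19)
>>> feats = [torch.rand(1, 2048, 64, 128)]  # single-level input list
>>> seg_logits = head(feats)  # expected shape: (1, 19, 64, 128)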

class mmseg.models.decode_heads.PointHead(num_fcs=3, coarse_pred_each_layer=True, conv_cfg={'type': 'Conv1d'}, norm_cfg=None, act_cfg={'inplace': False, 'type': 'ReLU'}, **kwargs)[source]

A mask point head used in PointRend.

This head is the implementation of PointRend: Image Segmentation as Rendering. PointHead uses a shared multi-layer perceptron (equivalent to nn.Conv1d) to predict the logits of input points. The fine-grained feature and coarse feature are concatenated together for prediction.

Parameters
  • num_fcs (int) – Number of fc layers in the head. Default: 3.

  • in_channels (int) – Number of input channels. Default: 256.

  • fc_channels (int) – Number of fc channels. Default: 256.

  • num_classes (int) – Number of classes for logits. Default: 80.

  • class_agnostic (bool) – Whether to use class-agnostic classification. If so, the output channels of logits will be 1. Default: False.

  • coarse_pred_each_layer (bool) – Whether to concatenate the coarse feature with the output of each fc layer. Default: True.

  • conv_cfg (dict|None) – Dictionary to construct and config conv layer. Default: dict(type=’Conv1d’).

  • norm_cfg (dict|None) – Dictionary to construct and config norm layer. Default: None.

  • loss_point (dict) – Dictionary to construct and config loss layer of point head. Default: dict(type=’CrossEntropyLoss’, use_mask=True, loss_weight=1.0).

cls_seg(feat)[source]

Classify each pixel with fc.

forward(fine_grained_point_feats, coarse_point_feats)[source]

Placeholder of forward function.

get_points_test(seg_logits, uncertainty_func, cfg)[source]

Sample points for testing.

Find num_points most uncertain points from uncertainty_map.

Parameters
  • seg_logits (Tensor) – A tensor of shape (batch_size, num_classes, height, width) for class-specific or class-agnostic prediction.

  • uncertainty_func (func) – uncertainty calculation function.

  • cfg (dict) – Testing config of point head.

Returns

point_indices (Tensor): A tensor of shape (batch_size, num_points) that contains indices from [0, height x width) of the most uncertain points.

point_coords (Tensor): A tensor of shape (batch_size, num_points, 2) that contains [0, 1] x [0, 1] normalized coordinates of the most uncertain points from the height x width grid.

get_points_train(seg_logits, uncertainty_func, cfg)[source]

Sample points for training.

Sample points in [0, 1] x [0, 1] coordinate space based on their uncertainty. The uncertainties are calculated for each point using ‘uncertainty_func’ function that takes point’s logit prediction as input.

Parameters
  • seg_logits (Tensor) – Semantic segmentation logits, shape (batch_size, num_classes, height, width).

  • uncertainty_func (func) – uncertainty calculation function.

  • cfg (dict) – Training config of point head.

Returns

A tensor of shape (batch_size, num_points, 2) that contains the coordinates of num_points sampled points.

Return type

point_coords (Tensor)

loss(inputs, prev_output, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg, **kwargs)[source]

Forward function for training.

Parameters
  • inputs (list[Tensor]) – List of multi-level img features.

  • prev_output (Tensor) – The output of previous decode head.

  • batch_data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as img_metas or gt_semantic_seg.

  • train_cfg (dict) – The training config.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

loss_by_feat(point_logits, points, batch_data_samples, **kwargs)[source]

Compute segmentation loss.

predict(inputs, prev_output, batch_img_metas: List[dict], test_cfg, **kwargs)[source]

Forward function for testing.

Parameters
  • inputs (list[Tensor]) – List of multi-level img features.

  • prev_output (Tensor) – The output of previous decode head.

  • batch_img_metas (list[dict]) – List of image info dicts where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:Collect.

  • test_cfg (dict) – The testing config.

Returns

Output segmentation map.

Return type

Tensor

class mmseg.models.decode_heads.SETRMLAHead(mla_channels=128, up_scale=4, **kwargs)[source]

Multi-level feature aggregation (MLA) head of SETR.

Parameters
  • mla_channels (int) – Channels of conv-conv-4x of multi-level feature aggregation. Default: 128.

  • up_scale (int) – The scale factor of interpolate. Default: 4.

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.SETRUPHead(norm_layer={'eps': 1e-06, 'requires_grad': True, 'type': 'LN'}, num_convs=1, up_scale=4, kernel_size=3, init_cfg=[{'type': 'Constant', 'val': 1.0, 'bias': 0, 'layer': 'LayerNorm'}, {'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}], **kwargs)[source]

Naive upsampling head and Progressive UPsampling (PUP) head of SETR.

Parameters
  • norm_layer (dict) – Config dict for input normalization. Default: norm_layer=dict(type=’LN’, eps=1e-6, requires_grad=True).

  • num_convs (int) – Number of decoder convolutions. Default: 1.

  • up_scale (int) – The scale factor of interpolate. Default: 4.

  • kernel_size (int) – The kernel size of convolution when decoding feature information from backbone. Default: 3.

  • init_cfg (dict | list[dict] | None) – Initialization config dict. Default: dict(type=’Constant’, val=1.0, bias=0, layer=’LayerNorm’).

forward(x)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.STDCHead(boundary_threshold=0.1, **kwargs)[source]

This head is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.

Parameters

boundary_threshold (float) – The threshold of calculating boundary. Default: 0.1.

loss_by_feat(seg_logits: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Compute Detail Aggregation Loss.

class mmseg.models.decode_heads.SegformerHead(interpolate_mode='bilinear', **kwargs)[source]

The all-MLP head of SegFormer.

This head is the implementation of SegFormer (https://arxiv.org/abs/2105.15203).

Parameters

interpolate_mode (str) – The interpolate mode of the MLP head upsample operation. Default: ‘bilinear’.

forward(inputs)[source]

Placeholder of forward function.
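A minimal usage sketch (illustrative; the MiT-B0-like stage channels are assumptions, not values from this page):

>>> import torch
>>> from mmseg.models.decode_heads import SegformerHead
>>> head = SegformerHead(
...     interpolate_mode='bilinear',
...     in_channels=[32, 64, 160, 256],  # assumed backbone stage channels
...     in_index=[0, 1, 2, 3],
...     channels=256,
...     num_classes=19)
>>> feats = [torch.rand(1, c, 128 // 2**i, 128 // 2**i)
...          for i, c in enumerate([32, 64, 160, 256])]
>>> seg_logits = head(feats)  # all levels fused at the finest scale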

class mmseg.models.decode_heads.SegmenterMaskTransformerHead(in_channels, num_layers, num_heads, embed_dims, mlp_ratio=4, drop_path_rate=0.1, drop_rate=0.0, attn_drop_rate=0.0, num_fcs=2, qkv_bias=True, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, init_std=0.02, **kwargs)[source]

Segmenter: Transformer for Semantic Segmentation.

This head is the implementation of Segmenter.

Parameters
  • in_channels (int) – The number of channels of input image.

  • num_layers (int) – The depth of transformer.

  • num_heads (int) – The number of attention heads.

  • embed_dims (int) – The embedding dimension.

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • drop_path_rate (float) – stochastic depth rate. Default 0.1.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • init_std (float) – The value of std in weight initialization. Default: 0.02.

forward(inputs)[source]

Placeholder of forward function.

init_weights()[source]

Initialize the weights.

class mmseg.models.decode_heads.UPerHead(pool_scales=(1, 2, 3, 6), **kwargs)[source]

Unified Perceptual Parsing for Scene Understanding.

This head is the implementation of UPerNet.

Parameters

pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module applied on the last feature. Default: (1, 2, 3, 6).

forward(inputs)[source]

Forward function.

psp_forward(inputs)[source]

Forward function of PSP module.
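A minimal usage sketch (illustrative; the four stage channels mimic a ResNet-50-style backbone and are assumptions):

>>> import torch
>>> from mmseg.models.decode_heads import UPerHead
>>> head = UPerHead(
...     pool_scales=(1, 2, 3, 6),
...     in_channels=[256, 512, 1024, 2048],  # assumed stage channels
...     in_index=[0, 1, 2, 3],
...     channels=512,
...     num_classes=19)
>>> feats = [torch.rand(1, c, 64 // 2**i, 64 // 2**i)
...          for i, c in enumerate([256, 512, 1024, 2048])]
>>> seg_logits = head(feats)  # fused at the finest level: (1, 19, 64, 64)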

segmentors

class mmseg.models.segmentors.BaseSegmentor(data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]

Base class for segmentors.

Parameters

data_preprocessor (dict, optional) – Model preprocessing config for processing the input data. It usually includes to_rgb, pad_size_divisor, pad_val, mean and std. Defaults to None.

abstract encode_decode(inputs: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])[source]

Placeholder for encode images with backbone and decode into a semantic segmentation map of the same size as input.

abstract extract_feat(inputs: torch.Tensor)bool[source]

Placeholder for extracting features from images.

forward(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None, mode: str = 'tensor')Union[Dict[str, torch.Tensor], List[mmseg.structures.seg_data_sample.SegDataSample], Tuple[torch.Tensor], torch.Tensor][source]

The unified entry for a forward process in both training and test.

The method should accept three modes: “tensor”, “predict” and “loss”:

  • “tensor”: Forward the whole network and return a tensor or tuple of tensors without any post-processing, same as a common nn.Module.

  • “predict”: Forward and return the predictions, which are fully processed to a list of SegDataSample.

  • “loss”: Forward and return a dict of losses according to the given inputs and data samples.

Note that this method does not handle back propagation or optimizer updating; these are done in train_step().

Parameters
  • inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg. Default to None.

  • mode (str) – Return what kind of value. Defaults to ‘tensor’.

Returns

The return type depends on mode.

  • If mode="tensor", return a tensor or a tuple of tensors.

  • If mode="predict", return a list of SegDataSample.

  • If mode="loss", return a dict of tensor.
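A hedged sketch of the three modes (assuming model is a built segmentor and inputs/data_samples come from a dataloader batch; the variable names are illustrative):

>>> losses = model(inputs, data_samples, mode='loss')    # dict of tensors
>>> preds = model(inputs, data_samples, mode='predict')  # list[SegDataSample]
>>> outs = model(inputs, mode='tensor')                  # raw network output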

abstract loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Calculate losses from a batch of inputs and data samples.

postprocess_result(seg_logits: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None)Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Convert results list to SegDataSample.

Parameters
  • seg_logits (Tensor) – The segmentation results, seg_logits from the model of each input image.

  • data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg. Defaults to None.

Returns

Segmentation results of the input images. Each SegDataSample usually contains:

  • pred_sem_seg (PixelData): Prediction of semantic segmentation.

  • seg_logits (PixelData): Predicted logits of semantic segmentation before normalization.

Return type

list[SegDataSample]

abstract predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None)Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Predict results from a batch of inputs and data samples with post-processing.

property with_auxiliary_head: bool

whether the segmentor has auxiliary head

Type

bool

property with_decode_head: bool

whether the segmentor has decode head

Type

bool

property with_neck: bool

whether the segmentor has neck

Type

bool

class mmseg.models.segmentors.CascadeEncoderDecoder(num_stages: int, backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]

Cascade Encoder Decoder segmentors.

CascadeEncoderDecoder is almost the same as EncoderDecoder, but its decode heads are cascaded: the output of the previous decoder_head is the input of the next decoder_head.

Parameters
  • num_stages (int) – How many stages will be cascaded.

  • backbone (ConfigType) – The config for the backbone of segmentor.

  • decode_head (ConfigType) – The config for the decode head of segmentor.

  • neck (OptConfigType) – The config for the neck of segmentor. Defaults to None.

  • auxiliary_head (OptConfigType) – The config for the auxiliary head of segmentor. Defaults to None.

  • train_cfg (OptConfigType) – The config for training. Defaults to None.

  • test_cfg (OptConfigType) – The config for testing. Defaults to None.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.

  • pretrained (str, optional) – The path for pretrained model. Defaults to None.

  • init_cfg (dict, optional) – The weight initialized config for BaseModule.

encode_decode(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Encode images with backbone and decode into a semantic segmentation map of the same size as input.

class mmseg.models.segmentors.EncoderDecoder(backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]

Encoder Decoder segmentors.

EncoderDecoder typically consists of backbone, decode_head, auxiliary_head. Note that auxiliary_head is only used for deep supervision during training, and can be dropped during inference.

1. The loss method is used to calculate the loss of the model, which includes two steps: (1) extract features to obtain the feature maps; (2) call the decode head loss function to forward the decode head model and calculate the losses.

loss(): extract_feat() -> _decode_head_forward_train() -> _auxiliary_head_forward_train (optional)
_decode_head_forward_train(): decode_head.loss()
_auxiliary_head_forward_train(): auxiliary_head.loss (optional)

2. The predict method is used to predict segmentation results, which includes two steps: (1) run the inference function to obtain the list of seg_logits; (2) call the post-processing function to obtain a list of SegDataSample including pred_sem_seg and seg_logits.

predict(): inference() -> postprocess_result()
inference(): whole_inference()/slide_inference()
whole_inference()/slide_inference(): encode_decode()
encode_decode(): extract_feat() -> decode_head.predict()

3. The _forward method is used to output the tensor by running the model, which includes two steps: (1) extract features to obtain the feature maps; (2) call the decode head forward function to forward the decode head model.

_forward(): extract_feat() -> _decode_head.forward()
Parameters
  • backbone (ConfigType) – The config for the backbone of segmentor.

  • decode_head (ConfigType) – The config for the decode head of segmentor.

  • neck (OptConfigType) – The config for the neck of segmentor. Defaults to None.

  • auxiliary_head (OptConfigType) – The config for the auxiliary head of segmentor. Defaults to None.

  • train_cfg (OptConfigType) – The config for training. Defaults to None.

  • test_cfg (OptConfigType) – The config for testing. Defaults to None.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.

  • pretrained (str, optional) – The path for pretrained model. Defaults to None.

  • init_cfg (dict, optional) – The weight initialized config for BaseModule.
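A minimal sketch of assembling an EncoderDecoder through the MODELS registry; the backbone and head hyper-parameters below are illustrative assumptions, not a tested recipe:

>>> from mmseg.registry import MODELS
>>> cfg = dict(
...     type='EncoderDecoder',
...     backbone=dict(type='ResNetV1c', depth=50, out_indices=(0, 1, 2, 3)),
...     decode_head=dict(
...         type='PSPHead', in_channels=2048, channels=512, num_classes=19),
...     test_cfg=dict(mode='whole'))
>>> model = MODELS.build(cfg)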

aug_test(inputs, batch_img_metas, rescale=True)[source]

Test with augmentations.

Only rescale=True is supported.

encode_decode(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Encode images with backbone and decode into a semantic segmentation map of the same size as input.

extract_feat(inputs: torch.Tensor)List[torch.Tensor][source]

Extract features from images.

inference(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Inference with slide/whole style.

Parameters
  • inputs (Tensor) – The input image of shape (N, 3, H, W).

  • batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, ‘pad_shape’, and ‘padding_size’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.

Returns

The segmentation results, seg_logits from the model of each input image.

Return type

Tensor

loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Calculate losses from a batch of inputs and data samples.

Parameters
  • inputs (Tensor) – Input images.

  • data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None)Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Predict results from a batch of inputs and data samples with post-processing.

Parameters
  • inputs (Tensor) – Inputs with shape (N, C, H, W).

  • data_samples (List[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

Segmentation results of the input images. Each SegDataSample usually contains:

  • pred_sem_seg (PixelData): Prediction of semantic segmentation.

  • seg_logits (PixelData): Predicted logits of semantic segmentation before normalization.

Return type

list[SegDataSample]

slide_inference(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Inference by sliding-window with overlap.

If h_crop > h_img or w_crop > w_img, the small patch will be used to decode without padding.

Parameters
  • inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.

  • batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.

Returns

The segmentation results, seg_logits from the model of each input image.

Return type

Tensor
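Sliding-window inference is selected through the segmentor’s test_cfg; a hedged example (the crop and stride values are illustrative assumptions):

>>> test_cfg = dict(
...     mode='slide',
...     crop_size=(512, 512),  # window passed to encode_decode at each step
...     stride=(341, 341))     # step between windows; overlap = crop_size - stride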

whole_inference(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Inference with full image.

Parameters
  • inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.

  • batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.

Returns

The segmentation results, seg_logits from the model of each input image.

Return type

Tensor

class mmseg.models.segmentors.SegTTAModel(module: Union[dict, torch.nn.modules.module.Module], data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None)[source]
merge_preds(data_samples_list: List[Sequence[mmseg.structures.seg_data_sample.SegDataSample]])Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Merge predictions of enhanced data to one prediction.

Parameters

data_samples_list (List[SampleList]) – List of predictions of all enhanced data.

Returns

Merged prediction.

Return type

SampleList

losses

class mmseg.models.losses.Accuracy(topk=(1), thresh=None, ignore_index=None)[source]

Accuracy calculation module.

forward(pred, target)[source]

Forward function to calculate accuracy.

Parameters
  • pred (torch.Tensor) – Prediction of models.

  • target (torch.Tensor) – Target for each prediction.

Returns

The accuracies under different topk criteria.

Return type

tuple[float]

class mmseg.models.losses.BoundaryLoss(loss_weight: float = 1.0, loss_name: str = 'loss_boundary')[source]

Boundary loss.

This function is modified from PIDNet, licensed under the MIT License.

Parameters
  • loss_weight (float) – Weight of the loss. Defaults to 1.0.

  • loss_name (str) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_boundary’.

forward(bd_pre: torch.Tensor, bd_gt: torch.Tensor)torch.Tensor[source]

Forward function.

Parameters
  • bd_pre (Tensor) – Predictions of the boundary head.

  • bd_gt (Tensor) – Ground truth of the boundary.

Returns

Loss tensor.

Return type

Tensor

class mmseg.models.losses.CrossEntropyLoss(use_sigmoid=False, use_mask=False, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_ce', avg_non_ignore=False)[source]

CrossEntropyLoss.

Parameters
  • use_sigmoid (bool, optional) – Whether the prediction uses sigmoid instead of softmax. Defaults to False.

  • use_mask (bool, optional) – Whether to use mask cross entropy loss. Defaults to False.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.

  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_weight (float, optional) – Weight of the loss. Defaults to 1.0.

  • loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_ce’.

  • avg_non_ignore (bool) – Whether the loss is only averaged over non-ignored targets. Default: False. New in version 0.23.0.

extra_repr()[source]

Extra repr.

forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, ignore_index=-100, **kwargs)[source]

Forward function.

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str
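A minimal usage sketch (the shapes and the ignore label below are illustrative assumptions):

>>> import torch
>>> from mmseg.models.losses import CrossEntropyLoss
>>> loss_fn = CrossEntropyLoss(loss_weight=1.0, avg_non_ignore=True)
>>> logits = torch.randn(2, 19, 32, 32)         # (N, C, H, W) predictions
>>> labels = torch.randint(0, 19, (2, 32, 32))  # ground-truth class indices
>>> loss = loss_fn(logits, labels, ignore_index=255)  # scalar loss tensor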

class mmseg.models.losses.DiceLoss(smooth=1, exponent=2, reduction='mean', class_weight=None, loss_weight=1.0, ignore_index=255, loss_name='loss_dice', **kwards)[source]

DiceLoss.

This loss is proposed in V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation.

Parameters
  • smooth (float) – A float number to smooth loss, and avoid NaN error. Default: 1

  • exponent (float) – A float number to calculate the denominator value: sum{x^exponent} + sum{y^exponent}. Default: 2.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. This parameter only works when per_image is True. Default: ‘mean’.

  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_weight (float, optional) – Weight of the loss. Default to 1.0.

  • ignore_index (int | None) – The label index to be ignored. Default: 255.

  • loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_dice’.

forward(pred, target, avg_factor=None, reduction_override=None, **kwards)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str
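A minimal usage sketch (shapes are illustrative assumptions; raw logits are passed in):

>>> import torch
>>> from mmseg.models.losses import DiceLoss
>>> loss_fn = DiceLoss(smooth=1, exponent=2, ignore_index=255)
>>> logits = torch.randn(2, 19, 32, 32)         # (N, C, H, W) predictions
>>> labels = torch.randint(0, 19, (2, 32, 32))  # ground-truth class indices
>>> loss = loss_fn(logits, labels)              # scalar dice loss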

class mmseg.models.losses.FocalLoss(use_sigmoid=True, gamma=2.0, alpha=0.5, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_focal')[source]
forward(pred, target, weight=None, avg_factor=None, reduction_override=None, ignore_index=255, **kwargs)[source]

Forward function.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, C) where C = number of classes, or (N, C, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss.

  • target (torch.Tensor) – The ground truth. If containing class indices, shape (N) where each value is 0≤targets[i]≤C−1, or (N, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss. If containing class probabilities, same shape as the input.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.

  • ignore_index (int, optional) – The label index to be ignored. Default: 255

Returns

The calculated loss

Return type

torch.Tensor

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str

class mmseg.models.losses.LovaszLoss(loss_type='multi_class', classes='present', per_image=False, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_lovasz')[source]

LovaszLoss.

This loss is proposed in The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks.

Parameters
  • loss_type (str, optional) – Binary or multi-class loss. Default: ‘multi_class’. Options are “binary” and “multi_class”.

  • classes (str | list[int], optional) – Classes chosen to calculate loss. ‘all’ for all classes, ‘present’ for classes present in labels, or a list of classes to average. Default: ‘present’.

  • per_image (bool, optional) – If per_image is True, compute the loss per image instead of per batch. Default: False.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. This parameter only works when per_image is True. Default: ‘mean’.

  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_weight (float, optional) – Weight of the loss. Defaults to 1.0.

  • loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_lovasz’.

forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]

Forward function.

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str

class mmseg.models.losses.OhemCrossEntropy(ignore_label: int = 255, thres: float = 0.7, min_kept: int = 100000, loss_weight: float = 1.0, class_weight: Optional[Union[List[float], str]] = None, loss_name: str = 'loss_ohem')[source]

OhemCrossEntropy loss.

This function is modified from PIDNet, licensed under the MIT License.

Parameters
  • ignore_label (int) – Labels to ignore when computing the loss. Default: 255

  • thres (float, optional) – The threshold for hard example selection: predictions with confidence below it are regarded as hard examples. If not specified, the hard examples will be the pixels with the top min_kept losses. Default: 0.7.

  • min_kept (int, optional) – The minimum number of predictions to keep. Default: 100000.

  • loss_weight (float) – Weight of the loss. Defaults to 1.0.

  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_name (str) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_ohem’.

forward(score: torch.Tensor, target: torch.Tensor)torch.Tensor[source]

Forward function.

Parameters
  • score (Tensor) – Predictions of the segmentation head.

  • target (Tensor) – Ground truth of the image.

Returns

Loss tensor.

Return type

Tensor

class mmseg.models.losses.TverskyLoss(smooth=1, class_weight=None, loss_weight=1.0, ignore_index=255, alpha=0.3, beta=0.7, loss_name='loss_tversky')[source]

TverskyLoss.

This loss is proposed in Tversky loss function for image segmentation using 3D fully convolutional deep networks (https://arxiv.org/abs/1706.05721).

Parameters
  • smooth (float) – A float number to smooth loss, and avoid NaN error. Default: 1.
  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_weight (float, optional) – Weight of the loss. Default to 1.0.

  • ignore_index (int | None) – The label index to be ignored. Default: 255.

  • alpha (float, in [0, 1]) – The coefficient of false positives. Default: 0.3.

  • beta (float, in [0, 1]) – The coefficient of false negatives. Default: 0.7. Note: alpha + beta = 1.

  • loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_tversky’.

forward(pred, target, **kwargs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str

mmseg.models.losses.accuracy(pred, target, topk=1, thresh=None, ignore_index=None)[source]

Calculate accuracy according to the prediction and target.

Parameters
  • pred (torch.Tensor) – The model prediction, shape (N, num_class, …)

  • target (torch.Tensor) – The target of each prediction, shape (N, …)

  • ignore_index (int | None) – The label index to be ignored. Default: None

  • topk (int | tuple[int], optional) – If the predictions in topk match the target, the predictions are regarded as correct ones. Defaults to 1.

  • thresh (float, optional) – If not None, predictions with scores under this threshold are considered incorrect. Default to None.

Returns

If the input topk is a single integer, the function will return a single float as accuracy. If topk is a tuple containing multiple integers, the function will return a tuple containing accuracies of each topk number.

Return type

float | tuple[float]
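A small worked example with constructed inputs (the values are illustrative, not from this documentation):

>>> import torch
>>> from mmseg.models.losses import accuracy
>>> pred = torch.tensor([[0.2, 0.7, 0.1],
...                      [0.9, 0.05, 0.05]])  # (N, num_class) scores
>>> target = torch.tensor([1, 0])
>>> acc = accuracy(pred, target, topk=1)  # both predictions correct -> 100.0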

mmseg.models.losses.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None, ignore_index=-100, avg_non_ignore=False, **kwargs)[source]

Calculate the binary CrossEntropy loss.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, 1).

  • label (torch.Tensor) – The learning label of the prediction. Note: In bce loss, label < 0 is invalid.

  • weight (torch.Tensor, optional) – Sample-wise loss weight.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • class_weight (list[float], optional) – The weight for each class.

  • ignore_index (int) – The label index to be ignored. Default: -100.

  • avg_non_ignore (bool) – Whether the loss is only averaged over non-ignored targets. Default: False. New in version 0.23.0.

Returns

The calculated loss

Return type

torch.Tensor

mmseg.models.losses.cross_entropy(pred, label, weight=None, class_weight=None, reduction='mean', avg_factor=None, ignore_index=-100, avg_non_ignore=False)[source]

The wrapper function for F.cross_entropy().

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, 1).

  • label (torch.Tensor) – The learning label of the prediction.

  • weight (torch.Tensor, optional) – Sample-wise loss weight. Default: None.

  • class_weight (list[float], optional) – The weight for each class. Default: None.

  • reduction (str, optional) – The method used to reduce the loss. Options are ‘none’, ‘mean’ and ‘sum’. Default: ‘mean’.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Default: None.

  • ignore_index (int) – Specifies a target value that is ignored and does not contribute to the input gradients. When avg_non_ignore is True and the reduction is ‘mean’, the loss is averaged over non-ignored targets. Default: -100.

  • avg_non_ignore (bool) – Whether the loss is only averaged over non-ignored targets. Default: False. New in version 0.23.0.

mmseg.models.losses.mask_cross_entropy(pred, target, label, reduction='mean', avg_factor=None, class_weight=None, ignore_index=None, **kwargs)[source]

Calculate the CrossEntropy loss for masks.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, C), C is the number of classes.

  • target (torch.Tensor) – The learning label of the prediction.

  • label (torch.Tensor) – label indicates the class label of the mask’s corresponding object. It is used to select the mask of the class the object belongs to when the mask prediction is not class-agnostic.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • class_weight (list[float], optional) – The weight for each class.

  • ignore_index (None) – Placeholder, to be consistent with other loss. Default: None.

Returns

The calculated loss

Return type

torch.Tensor

mmseg.models.losses.reduce_loss(loss, reduction)[source]

Reduce loss as specified.

Parameters
  • loss (Tensor) – Elementwise loss tensor.

  • reduction (str) – Options are “none”, “mean” and “sum”.

Returns

Reduced loss tensor.

Return type

Tensor

mmseg.models.losses.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]

Apply element-wise weight and reduce loss.

Parameters
  • loss (Tensor) – Element-wise loss.

  • weight (Tensor) – Element-wise weights.

  • reduction (str) – Same as built-in losses of PyTorch.

  • avg_factor (float) – Average factor when computing the mean of losses.

Returns

Processed loss values.

Return type

Tensor

mmseg.models.losses.weighted_loss(loss_func)[source]

Create a weighted version of a given loss function.

To use this decorator, the loss function must have the signature like loss_func(pred, target, **kwargs). The function only needs to compute element-wise loss without any reduction. This decorator will add weight and reduction arguments to the function. The decorated function will have the signature like loss_func(pred, target, weight=None, reduction=’mean’, avg_factor=None, **kwargs).

Example

>>> import torch
>>> @weighted_loss
>>> def l1_loss(pred, target):
>>>     return (pred - target).abs()
>>> pred = torch.Tensor([0, 2, 3])
>>> target = torch.Tensor([1, 1, 1])
>>> weight = torch.Tensor([1, 0, 1])
>>> l1_loss(pred, target)
tensor(1.3333)
>>> l1_loss(pred, target, weight)
tensor(1.)
>>> l1_loss(pred, target, reduction='none')
tensor([1., 1., 2.])
>>> l1_loss(pred, target, weight, avg_factor=2)
tensor(1.5000)

necks

class mmseg.models.necks.FPN(in_channels, out_channels, num_outs, start_level=0, end_level=-1, add_extra_convs=False, extra_convs_on_inputs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, upsample_cfg={'mode': 'nearest'}, init_cfg={'distribution': 'uniform', 'layer': 'Conv2d', 'type': 'Xavier'})[source]

Feature Pyramid Network.

This neck is the implementation of Feature Pyramid Networks for Object Detection.

Parameters
  • in_channels (list[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • num_outs (int) – Number of output scales.

  • start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.

  • end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.

  • add_extra_convs (bool | str) –

    If bool, it decides whether to add conv layers on top of the original feature maps. Defaults to False. If True, its actual mode is specified by extra_convs_on_inputs. If str, it specifies the source feature map of the extra convs. Only the following options are allowed:

    • ’on_input’: Last feat map of neck inputs (i.e. backbone feature).

    • ’on_lateral’: Last feature map after lateral convs.

    • ’on_output’: The last output feature map after fpn convs.

  • extra_convs_on_inputs (bool, deprecated) – Whether to apply extra convs on the original feature from the backbone. If True, it is equivalent to add_extra_convs=’on_input’. If False, it is equivalent to set add_extra_convs=’on_output’. Defaults to True.

  • relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Default: False.

  • no_norm_on_lateral (bool) – Whether to apply norm on lateral. Default: False.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.

  • upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(mode=’nearest’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict.

Example

>>> import torch
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = FPN(in_channels, 11, len(in_channels)).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])
forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.necks.Feature2Pyramid(embed_dim, rescales=[4, 2, 1, 0.5], norm_cfg={'requires_grad': True, 'type': 'SyncBN'})[source]

Feature2Pyramid.

A neck structure connecting the ViT backbone and decoder heads.

Parameters
  • embed_dim (int) – Embedding dimension.

  • rescales (list[float]) – Resampling ratios used to obtain pyramid features. Default: [4, 2, 1, 0.5].

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’SyncBN’, requires_grad=True).

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.necks.ICNeck(in_channels=(64, 256, 256), out_channels=128, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, init_cfg=None)[source]

ICNet for Real-Time Semantic Segmentation on High-Resolution Images.

This neck is the implementation of ICHead.

Parameters
  • in_channels (tuple[int]) – The numbers of input feature channels. Default: (64, 256, 256).

  • out_channels (int) – The number of output feature channels. Default: 128.

  • conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Dictionary to construct and config act layer. Default: dict(type=’ReLU’).

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.necks.JPU(in_channels=(512, 1024, 2048), mid_channels=512, start_level=0, end_level=-1, dilations=(1, 2, 4, 8), align_corners=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation.

This Joint Pyramid Upsampling (JPU) neck is the implementation of FastFCN.

Parameters
  • in_channels (Tuple[int], optional) – The number of input channels for each convolution operation before upsampling. Default: (512, 1024, 2048).

  • mid_channels (int) – The number of output channels of JPU. Default: 512.

  • start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.

  • end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.

  • dilations (tuple[int]) – Dilation rate of each Depthwise Separable ConvModule. Default: (1, 2, 4, 8).

  • align_corners (bool, optional) – The align_corners argument of resize operation. Default: False.

  • conv_cfg (dict | None) – Config of conv layers. Default: None.

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’).

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(inputs)[source]

Forward function.

class mmseg.models.necks.MLANeck(in_channels, out_channels, norm_layer={'eps': 1e-06, 'requires_grad': True, 'type': 'LN'}, norm_cfg=None, act_cfg=None)[source]

Multi-level Feature Aggregation.

This neck is the Multi-level Feature Aggregation construction of SETR.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • norm_layer (dict) – Config dict for input normalization. Default: norm_layer=dict(type=’LN’, eps=1e-6, requires_grad=True).

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.necks.MultiLevelNeck(in_channels, out_channels, scales=[0.5, 1, 2, 4], norm_cfg=None, act_cfg=None)[source]

MultiLevelNeck.

A neck structure connecting the ViT backbone and decoder heads.

Parameters
  • in_channels (List[int]) – Number of input channels per scale.

  • out_channels (int) – Number of output channels (used at each scale).

  • scales (List[float]) – Scale factors for each input feature map. Default: [0.5, 1, 2, 4]

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
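A minimal usage sketch in the style of the FPN example above (the ViT-like channel sizes are assumptions):

>>> import torch
>>> from mmseg.models.necks import MultiLevelNeck
>>> neck = MultiLevelNeck(
...     in_channels=[768, 768, 768, 768],  # assumed ViT output channels
...     out_channels=256,
...     scales=[4, 2, 1, 0.5])
>>> feats = [torch.rand(1, 768, 32, 32) for _ in range(4)]
>>> outs = neck(feats)  # spatial sizes 128, 64, 32 and 16 respectively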

utils

class mmseg.models.utils.BasicBlock(in_channels: int, channels: int, stride: int = 1, downsample: Optional[torch.nn.modules.module.Module] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, act_cfg_out: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]

Basic block from ResNet.

Parameters
  • in_channels (int) – Input channels.

  • channels (int) – Output channels.

  • stride (int) – Stride of the first block. Default: 1.

  • downsample (nn.Module, optional) – Downsample operation on identity. Default: None.

  • norm_cfg (dict, optional) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict, optional) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’, inplace=True).

  • act_cfg_out (dict, optional) – Config dict for activation layer at the last of the block. Default: dict(type=’ReLU’, inplace=True).

  • init_cfg (dict, optional) – Initialization config dict. Default: None.

forward(x: torch.Tensor)torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.utils.Bottleneck(in_channels: int, channels: int, stride: int = 1, downsample: Optional[torch.nn.modules.module.Module] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, act_cfg_out: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]

Bottleneck block from ResNet.

Parameters
  • in_channels (int) – Input channels.

  • channels (int) – Output channels.

  • stride (int) – Stride of the first block. Default: 1.

  • downsample (nn.Module, optional) – Downsample operation on identity. Default: None.

  • norm_cfg (dict, optional) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict, optional) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’, inplace=True).

  • act_cfg_out (dict, optional) – Config dict for activation layer at the last of the block. Default: None.

  • init_cfg (dict, optional) – Initialization config dict. Default: None.

forward(x: torch.Tensor)torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
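
Example

A minimal sketch, assuming this block expands channels by a factor of 2 (the DDRNet/PIDNet-style setting), so channels=32 yields 64 output channels and the identity shortcut needs no downsample:

>>> import torch
>>> from mmseg.models.utils import Bottleneck
>>> block = Bottleneck(in_channels=64, channels=32)
>>> out = block(torch.rand(1, 64, 56, 56))  # expected shape (1, 64, 56, 56)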

class mmseg.models.utils.DAPPM(in_channels: int, branch_channels: int, out_channels: int, num_scales: int, kernel_sizes: List[int] = [5, 9, 17], strides: List[int] = [2, 4, 8], paddings: List[int] = [2, 4, 8], norm_cfg: Dict = {'momentum': 0.1, 'type': 'BN'}, act_cfg: Dict = {'inplace': True, 'type': 'ReLU'}, conv_cfg: Dict = {'bias': False, 'order': ('norm', 'act', 'conv')}, upsample_mode: str = 'bilinear')[source]

DAPPM module in DDRNet.

Parameters
  • in_channels (int) – Input channels.

  • branch_channels (int) – Branch channels.

  • out_channels (int) – Output channels.

  • num_scales (int) – Number of scales.

  • kernel_sizes (list[int]) – Kernel sizes of each scale.

  • strides (list[int]) – Strides of each scale.

  • paddings (list[int]) – Paddings of each scale.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’, inplace=True).

  • conv_cfg (dict) – Config dict for convolution layer in ConvModule. Default: dict(order=(‘norm’, ‘act’, ‘conv’), bias=False).

  • upsample_mode (str) – Upsample mode. Default: ‘bilinear’.

forward(inputs: torch.Tensor)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
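
Example

A minimal sketch with five scales, in the spirit of DDRNet-style configs (channel sizes are illustrative assumptions):

>>> import torch
>>> from mmseg.models.utils import DAPPM
>>> dappm = DAPPM(in_channels=512, branch_channels=128, out_channels=256,
...               num_scales=5)
>>> out = dappm(torch.rand(1, 512, 16, 16))  # expected shape (1, 256, 16, 16)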

class mmseg.models.utils.Encoding(channels, num_codes)[source]

Encoding Layer: a learnable residual encoder.

Input is of shape (batch_size, channels, height, width). Output is of shape (batch_size, num_codes, channels).

Parameters
  • channels (int) – Dimension of the features or feature channels.

  • num_codes (int) – Number of code words.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
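
Example

A minimal sketch matching the documented input and output shapes (sizes are illustrative):

>>> import torch
>>> from mmseg.models.utils import Encoding
>>> enc = Encoding(channels=16, num_codes=32)
>>> out = enc(torch.rand(2, 16, 8, 8))  # expected shape (2, 32, 16)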

class mmseg.models.utils.InvertedResidual(in_channels, out_channels, stride, expand_ratio, dilation=1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, with_cp=False, **kwargs)[source]

InvertedResidual block for MobileNetV2.

Parameters
  • in_channels (int) – The input channels of the InvertedResidual block.

  • out_channels (int) – The output channels of the InvertedResidual block.

  • stride (int) – Stride of the middle (first) 3x3 convolution.

  • expand_ratio (int) – Adjusts number of channels of the hidden layer in InvertedResidual by this amount.

  • dilation (int) – Dilation rate of depthwise conv. Default: 1

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

Returns

The output tensor.

Return type

Tensor

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
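
Example

A minimal sketch; with stride=1 and in_channels == out_channels the block adds the identity shortcut (sizes are illustrative):

>>> import torch
>>> from mmseg.models.utils import InvertedResidual
>>> block = InvertedResidual(in_channels=32, out_channels=32, stride=1,
...                          expand_ratio=6)
>>> out = block(torch.rand(1, 32, 28, 28))  # expected shape (1, 32, 28, 28)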

class mmseg.models.utils.InvertedResidualV3(in_channels, out_channels, mid_channels, kernel_size=3, stride=1, se_cfg=None, with_expand_conv=True, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False)[source]

Inverted Residual Block for MobileNetV3.

Parameters
  • in_channels (int) – The input channels of this Module.

  • out_channels (int) – The output channels of this Module.

  • mid_channels (int) – The input channels of the depthwise convolution.

  • kernel_size (int) – The kernel size of the depthwise convolution. Default: 3.

  • stride (int) – The stride of the depthwise convolution. Default: 1.

  • se_cfg (dict) – Config dict for se layer. Default: None, which means no se layer.

  • with_expand_conv (bool) – Use expand conv or not. If set to False, mid_channels must be the same as in_channels. Default: True.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

Returns

The output tensor.

Return type

Tensor

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
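
Example

A minimal sketch of a MobileNetV3-style block with expansion from 16 to 64 channels (sizes are illustrative):

>>> import torch
>>> from mmseg.models.utils import InvertedResidualV3
>>> block = InvertedResidualV3(in_channels=16, out_channels=16,
...                            mid_channels=64)
>>> out = block(torch.rand(1, 16, 32, 32))  # expected shape (1, 16, 32, 32)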

class mmseg.models.utils.PAPPM(in_channels: int, branch_channels: int, out_channels: int, num_scales: int, kernel_sizes: List[int] = [5, 9, 17], strides: List[int] = [2, 4, 8], paddings: List[int] = [2, 4, 8], norm_cfg: Dict = {'momentum': 0.1, 'type': 'BN'}, act_cfg: Dict = {'inplace': True, 'type': 'ReLU'}, conv_cfg: Dict = {'bias': False, 'order': ('norm', 'act', 'conv')}, upsample_mode: str = 'bilinear')[source]

PAPPM module in PIDNet.

Parameters
  • in_channels (int) – Input channels.

  • branch_channels (int) – Branch channels.

  • out_channels (int) – Output channels.

  • num_scales (int) – Number of scales.

  • kernel_sizes (list[int]) – Kernel sizes of each scale.

  • strides (list[int]) – Strides of each scale.

  • paddings (list[int]) – Paddings of each scale.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’, momentum=0.1).

  • act_cfg (dict) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’, inplace=True).

  • conv_cfg (dict) – Config dict for convolution layer in ConvModule. Default: dict(order=(‘norm’, ‘act’, ‘conv’), bias=False).

  • upsample_mode (str) – Upsample mode. Default: ‘bilinear’.

forward(inputs: torch.Tensor)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
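
Example

PAPPM shares DAPPM’s interface but fuses its scale branches in parallel; a minimal sketch in the spirit of PIDNet configs (channel sizes are illustrative assumptions):

>>> import torch
>>> from mmseg.models.utils import PAPPM
>>> pappm = PAPPM(in_channels=256, branch_channels=96, out_channels=128,
...               num_scales=5)
>>> out = pappm(torch.rand(1, 256, 16, 16))  # expected shape (1, 128, 16, 16)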

class mmseg.models.utils.PatchEmbed(in_channels=3, embed_dims=768, conv_type='Conv2d', kernel_size=16, stride=None, padding='corner', dilation=1, bias=True, norm_cfg=None, input_size=None, init_cfg=None)[source]

Image to Patch Embedding.

We use a conv layer to implement PatchEmbed.

Parameters
  • in_channels (int) – The num of input channels. Default: 3

  • embed_dims (int) – The dimensions of embedding. Default: 768

  • conv_type (str) – The type of the conv layer used for embedding. Default: “Conv2d”.

  • kernel_size (int) – The kernel size of the embedding conv. Default: 16.

  • stride (int, optional) – The stride of the embedding conv. Default: None (would be set to kernel_size).

  • padding (int | tuple | string) – The padding length of the embedding conv. When it is a string, it means the mode of adaptive padding; “same” and “corner” are currently supported. Default: “corner”.

  • dilation (int) – The dilation rate of embedding conv. Default: 1.

  • bias (bool) – Bias of embed conv. Default: True.

  • norm_cfg (dict, optional) – Config dict for normalization layer. Default: None.

  • input_size (int | tuple | None) – The size of input, which will be used to calculate the out size. Only works when dynamic_size is False. Default: None.

  • init_cfg (mmengine.ConfigDict, optional) – The Config for initialization. Default: None.

forward(x)[source]
Parameters

x (Tensor) – Has shape (B, C, H, W). In most cases, C is 3.

Returns

Contains merged results and its spatial shape.

  • x (Tensor): Has shape (B, out_h * out_w, embed_dims).

  • out_size (tuple[int]): Spatial shape of x, arranged as (out_h, out_w).

Return type

tuple
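
Example

A minimal sketch of ViT-style 16x16 patch embedding; with the default stride equal to kernel_size, a 224x224 image yields a 14x14 token grid:

>>> import torch
>>> from mmseg.models.utils import PatchEmbed
>>> patch_embed = PatchEmbed(in_channels=3, embed_dims=768, kernel_size=16)
>>> tokens, out_size = patch_embed(torch.rand(1, 3, 224, 224))
>>> # expected: tokens of shape (1, 196, 768), out_size == (14, 14)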

class mmseg.models.utils.ResLayer(block, inplanes, planes, num_blocks, stride=1, dilation=1, avg_down=False, conv_cfg=None, norm_cfg={'type': 'BN'}, multi_grid=None, contract_dilation=False, **kwargs)[source]

ResLayer to build ResNet style backbone.

Parameters
  • block (nn.Module) – block used to build ResLayer.

  • inplanes (int) – inplanes of block.

  • planes (int) – planes of block.

  • num_blocks (int) – number of blocks.

  • stride (int) – stride of the first block. Default: 1

  • dilation (int) – dilation of convolutions in the blocks. Default: 1

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False

  • conv_cfg (dict) – dictionary to construct and config conv layer. Default: None

  • norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)

  • multi_grid (int | None) – Multi grid dilation rates of last stage. Default: None

  • contract_dilation (bool) – Whether to contract the first dilation of each layer. Default: False
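
Example

A minimal sketch, assuming the ResNet-style BasicBlock from mmseg.models.backbones.resnet as the block (sizes are illustrative):

>>> import torch
>>> from mmseg.models.backbones.resnet import BasicBlock
>>> from mmseg.models.utils import ResLayer
>>> layer = ResLayer(BasicBlock, inplanes=64, planes=128, num_blocks=2,
...                  stride=2)
>>> out = layer(torch.rand(1, 64, 56, 56))  # expected shape (1, 128, 28, 28)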

class mmseg.models.utils.SELayer(channels, ratio=16, conv_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'HSigmoid', 'bias': 3.0, 'divisor': 6.0}))[source]

Squeeze-and-Excitation Module.

Parameters
  • channels (int) – The input (and output) channels of the SE layer.

  • ratio (int) – Squeeze ratio in SELayer, the intermediate channel will be int(channels/ratio). Default: 16.

  • conv_cfg (None or dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • act_cfg (dict or Sequence[dict]) – Config dict for activation layer. If act_cfg is a dict, two activation layers will be configured by this dict. If act_cfg is a sequence of dicts, the first activation layer will be configured by the first dict and the second activation layer will be configured by the second dict. Default: (dict(type=’ReLU’), dict(type=’HSigmoid’, bias=3.0, divisor=6.0)).

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
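
Example

A minimal sketch; the SE gate reweights channels and preserves the input shape (sizes are illustrative):

>>> import torch
>>> from mmseg.models.utils import SELayer
>>> se = SELayer(channels=64, ratio=16)
>>> out = se(torch.rand(1, 64, 32, 32))  # expected shape (1, 64, 32, 32)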

class mmseg.models.utils.SelfAttentionBlock(key_in_channels, query_in_channels, channels, out_channels, share_key_query, query_downsample, key_downsample, key_query_num_convs, value_out_num_convs, key_query_norm, value_out_norm, matmul_norm, with_out, conv_cfg, norm_cfg, act_cfg)[source]

General self-attention block/non-local block.

Please refer to https://arxiv.org/abs/1706.03762 for details about key, query and value.

Parameters
  • key_in_channels (int) – Input channels of key feature.

  • query_in_channels (int) – Input channels of query feature.

  • channels (int) – Output channels of key/query transform.

  • out_channels (int) – Output channels.

  • share_key_query (bool) – Whether to share the projection weight between key and query projection.

  • query_downsample (nn.Module) – Query downsample module.

  • key_downsample (nn.Module) – Key downsample module.

  • key_query_num_convs (int) – Number of convs for key/query projection.

  • value_out_num_convs (int) – Number of convs for value/out projection.

  • key_query_norm (bool) – Whether to add a norm layer after the key/query projection.

  • value_out_norm (bool) – Whether to add a norm layer after the value/out projection.

  • matmul_norm (bool) – Whether to normalize the attention map with the square root of channels.

  • with_out (bool) – Whether to use the out projection.

  • conv_cfg (dict|None) – Config of conv layers.

  • norm_cfg (dict|None) – Config of norm layers.

  • act_cfg (dict|None) – Config of activation layers.

build_project(in_channels, channels, num_convs, use_conv_module, conv_cfg, norm_cfg, act_cfg)[source]

Build projection layer for key/query/value/out.

forward(query_feats, key_feats)[source]

Forward function.

init_weights()[source]

Initialize the weight of the output projection layer.
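
Example

A minimal sketch of a non-local style setup where the query and key features are the same tensor (all sizes and flags below are illustrative choices, not defaults):

>>> import torch
>>> from mmseg.models.utils import SelfAttentionBlock
>>> attn = SelfAttentionBlock(
...     key_in_channels=64, query_in_channels=64, channels=32,
...     out_channels=64, share_key_query=False, query_downsample=None,
...     key_downsample=None, key_query_num_convs=1, value_out_num_convs=1,
...     key_query_norm=False, value_out_norm=False, matmul_norm=True,
...     with_out=True, conv_cfg=None, norm_cfg=dict(type='BN'),
...     act_cfg=dict(type='ReLU'))
>>> x = torch.rand(1, 64, 16, 16)
>>> out = attn(x, x)  # expected shape (1, 64, 16, 16)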

class mmseg.models.utils.UpConvBlock(conv_block, in_channels, skip_channels, out_channels, num_convs=2, stride=1, dilation=1, with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, upsample_cfg={'type': 'InterpConv'}, dcn=None, plugins=None)[source]

Upsample convolution block in decoder for UNet.

This upsample convolution block consists of one upsample module followed by one convolution block. The upsample module expands the high-level low-resolution feature map and the convolution block fuses the upsampled high-level low-resolution feature map and the low-level high-resolution feature map from encoder.

Parameters
  • conv_block (nn.Sequential) – Sequential of convolutional layers.

  • in_channels (int) – Number of input channels of the high-level (low-resolution) feature map from the decoder.

  • skip_channels (int) – Number of input channels of the low-level (high-resolution) feature map from the encoder.

  • out_channels (int) – Number of output channels.

  • num_convs (int) – Number of convolutional layers in the conv_block. Default: 2.

  • stride (int) – Stride of convolutional layer in conv_block. Default: 1.

  • dilation (int) – Dilation rate of convolutional layer in conv_block. Default: 1.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • conv_cfg (dict | None) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict | None) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict | None) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’).

  • upsample_cfg (dict) – The upsample config of the upsample module in the decoder. Default: dict(type=’InterpConv’). If the size of the high-level feature map is the same as that of the skip feature map (low-level feature map from the encoder), there is no need to upsample the high-level feature map, and upsample_cfg should be None.

  • dcn (bool) – Use deformable convolution in convolutional layer or not. Default: None.

  • plugins (dict) – plugins for convolutional layers. Default: None.

forward(skip, x)[source]

Forward function.
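
Example

A minimal sketch, assuming BasicConvBlock from mmseg.models.backbones.unet as the conv_block; with the default InterpConv, x is upsampled by 2 before fusion with skip (sizes are illustrative):

>>> import torch
>>> from mmseg.models.backbones.unet import BasicConvBlock
>>> from mmseg.models.utils import UpConvBlock
>>> up = UpConvBlock(BasicConvBlock, in_channels=128, skip_channels=64,
...                  out_channels=64)
>>> skip = torch.rand(1, 64, 32, 32)  # high-resolution encoder feature
>>> x = torch.rand(1, 128, 16, 16)    # low-resolution decoder feature
>>> out = up(skip, x)  # expected shape (1, 64, 32, 32)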

class mmseg.models.utils.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
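
Example

A minimal sketch; this wrapper behaves like nn.Upsample, computing the target size from the input at call time (sizes are illustrative):

>>> import torch
>>> from mmseg.models.utils import Upsample
>>> up = Upsample(scale_factor=2, mode='bilinear', align_corners=False)
>>> out = up(torch.rand(1, 8, 16, 16))  # expected shape (1, 8, 32, 32)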

mmseg.models.utils.make_divisible(value, divisor, min_value=None, min_ratio=0.9)[source]

Make divisible function.

This function rounds the channel number to the nearest value divisible by the divisor. It is taken from the original TensorFlow repo and ensures that all layers have a channel number divisible by divisor. See: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py

Parameters
  • value (int) – The original channel number.

  • divisor (int) – The divisor to fully divide the channel number.

  • min_value (int) – The minimum value of the output channel. Default: None, which means the minimum value equals the divisor.

  • min_ratio (float) – The minimum ratio of the rounded channel number to the original channel number. Default: 0.9.

Returns

The modified output channel number.

Return type

int
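
Example

A quick check of the rounding behavior (values are illustrative):

>>> from mmseg.models.utils import make_divisible
>>> make_divisible(31, 8)
32
>>> make_divisible(38, 8)
40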

mmseg.models.utils.nchw2nlc2nchw(module, x, contiguous=False, **kwargs)[source]

Flatten a [N, C, H, W] shape tensor x to [N, L, C] shape, use the reshaped tensor as the input of module, and convert the output of module, whose shape is [N, L, C], back to [N, C, H, W].

Parameters
  • module (Callable) – A callable object that takes a tensor with shape [N, L, C] as input.

  • x (Tensor) – The input tensor of shape [N, C, H, W].

  • contiguous (bool) – Whether to make the tensor contiguous after each shape transform.

Returns

The output tensor of shape [N, C, H, W].

Return type

Tensor

Example

>>> import torch
>>> import torch.nn as nn
>>> norm = nn.LayerNorm(4)
>>> feature_map = torch.rand(4, 4, 5, 5)
>>> output = nchw2nlc2nchw(norm, feature_map)
mmseg.models.utils.nchw_to_nlc(x)[source]

Flatten [N, C, H, W] shape tensor to [N, L, C] shape tensor.

Parameters

x (Tensor) – The input tensor of shape [N, C, H, W] before conversion.

Returns

The output tensor of shape [N, L, C] after conversion.

Return type

Tensor

mmseg.models.utils.nlc2nchw2nlc(module, x, hw_shape, contiguous=False, **kwargs)[source]

Convert a [N, L, C] shape tensor x to [N, C, H, W] shape, use the reshaped tensor as the input of module, and convert the output of module, whose shape is [N, C, H, W], back to [N, L, C].

Parameters
  • module (Callable) – A callable object that takes a tensor with shape [N, C, H, W] as input.

  • x (Tensor) – The input tensor of shape [N, L, C].

  • hw_shape (Sequence[int]) – The height and width of the feature map with shape [N, C, H, W].

  • contiguous (bool) – Whether to make the tensor contiguous after each shape transform.

Returns

The output tensor of shape [N, L, C].

Return type

Tensor

Example

>>> import torch
>>> import torch.nn as nn
>>> conv = nn.Conv2d(16, 16, 3, 1, 1)
>>> feature_map = torch.rand(4, 25, 16)
>>> output = nlc2nchw2nlc(conv, feature_map, (5, 5))
mmseg.models.utils.nlc_to_nchw(x, hw_shape)[source]

Convert [N, L, C] shape tensor to [N, C, H, W] shape tensor.

Parameters
  • x (Tensor) – The input tensor of shape [N, L, C] before conversion.

  • hw_shape (Sequence[int]) – The height and width of output feature map.

Returns

The output tensor of shape [N, C, H, W] after conversion.

Return type

Tensor
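
Example

A round-trip sketch combining nchw_to_nlc and nlc_to_nchw (sizes are illustrative):

>>> import torch
>>> from mmseg.models.utils import nchw_to_nlc, nlc_to_nchw
>>> x = torch.rand(2, 16, 8, 8)
>>> tokens = nchw_to_nlc(x)               # expected shape (2, 64, 16)
>>> x_back = nlc_to_nchw(tokens, (8, 8))  # expected shape (2, 16, 8, 8)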

mmseg.structures

structures

class mmseg.structures.BasePixelSampler(**kwargs)[source]

Base class of pixel sampler.

abstract sample(seg_logit, seg_label)[source]

Placeholder for sample function.

class mmseg.structures.OHEMPixelSampler(context, thresh=None, min_kept=100000)[source]

Online Hard Example Mining Sampler for segmentation.

Parameters
  • context (nn.Module) – The context of sampler, subclass of BaseDecodeHead.

  • thresh (float, optional) – The threshold for hard example selection: predictions with confidence below it are treated as hard examples. If not specified, the hard examples will be the pixels with the top min_kept losses. Default: None.

  • min_kept (int, optional) – The minimum number of predictions to keep. Default: 100000.

sample(seg_logit, seg_label)[source]

Sample pixels that have high loss or low prediction confidence.

Parameters
  • seg_logit (torch.Tensor) – segmentation logits, shape (N, C, H, W)

  • seg_label (torch.Tensor) – segmentation label, shape (N, 1, H, W)

Returns

segmentation weight, shape (N, H, W)

Return type

torch.Tensor
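
Example

The sampler is normally attached through a decode head config rather than built by hand; a hedged config sketch (the head settings below are illustrative):

>>> decode_head = dict(
...     type='FCNHead',
...     in_channels=512,
...     channels=256,
...     num_classes=19,
...     sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=100000))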

class mmseg.structures.SegDataSample(*, metainfo: Optional[dict] = None, **kwargs)[source]

A data structure interface of MMSegmentation. It is used as an interface between different components.

The attributes in SegDataSample are divided into several parts:

  • gt_sem_seg (PixelData) – Ground truth of semantic segmentation.

  • pred_sem_seg (PixelData) – Prediction of semantic segmentation.

  • seg_logits (PixelData) – Predicted logits of semantic segmentation.

Examples

>>> import torch
>>> import numpy as np
>>> from mmengine.structures import PixelData
>>> from mmseg.structures import SegDataSample
>>> data_sample = SegDataSample()
>>> img_meta = dict(img_shape=(4, 4, 3),
...                 pad_shape=(4, 4, 3))
>>> gt_segmentations = PixelData(metainfo=img_meta)
>>> gt_segmentations.data = torch.randint(0, 2, (1, 4, 4))
>>> data_sample.gt_sem_seg = gt_segmentations
>>> assert 'img_shape' in data_sample.gt_sem_seg.metainfo_keys()
>>> data_sample.gt_sem_seg.shape
(4, 4)
>>> print(data_sample)

<SegDataSample(
    META INFORMATION
    DATA FIELDS
    gt_sem_seg: <PixelData(
        META INFORMATION
        img_shape: (4, 4, 3)
        pad_shape: (4, 4, 3)
        DATA FIELDS
        data: tensor([[[1, 1, 1, 0],
                       [1, 0, 1, 1],
                       [1, 1, 1, 1],
                       [0, 1, 0, 1]]])
    ) at 0x1c2b4156460>
) at 0x1c2aae44d60>

>>> data_sample = SegDataSample()
>>> gt_sem_seg_data = dict(sem_seg=torch.rand(1, 4, 4))
>>> gt_sem_seg = PixelData(**gt_sem_seg_data)
>>> data_sample.gt_sem_seg = gt_sem_seg
>>> assert 'gt_sem_seg' in data_sample
>>> assert 'sem_seg' in data_sample.gt_sem_seg
mmseg.structures.build_pixel_sampler(cfg, **default_args)[source]

Build pixel sampler for segmentation map.

sampler

class mmseg.structures.sampler.BasePixelSampler(**kwargs)[source]

Base class of pixel sampler.

abstract sample(seg_logit, seg_label)[source]

Placeholder for sample function.

class mmseg.structures.sampler.OHEMPixelSampler(context, thresh=None, min_kept=100000)[source]

Online Hard Example Mining Sampler for segmentation.

Parameters
  • context (nn.Module) – The context of sampler, subclass of BaseDecodeHead.

  • thresh (float, optional) – The threshold for hard example selection: predictions with confidence below it are treated as hard examples. If not specified, the hard examples will be the pixels with the top min_kept losses. Default: None.

  • min_kept (int, optional) – The minimum number of predictions to keep. Default: 100000.

sample(seg_logit, seg_label)[source]

Sample pixels that have high loss or low prediction confidence.

Parameters
  • seg_logit (torch.Tensor) – segmentation logits, shape (N, C, H, W)

  • seg_label (torch.Tensor) – segmentation label, shape (N, 1, H, W)

Returns

segmentation weight, shape (N, H, W)

Return type

torch.Tensor

mmseg.structures.sampler.build_pixel_sampler(cfg, **default_args)[source]

Build pixel sampler for segmentation map.

mmseg.visualization

class mmseg.visualization.SegLocalVisualizer(name: str = 'visualizer', image: Optional[numpy.ndarray] = None, vis_backends: Optional[Dict] = None, save_dir: Optional[str] = None, classes: Optional[List] = None, palette: Optional[List] = None, dataset_name: Optional[str] = None, alpha: float = 0.8, **kwargs)[source]

Local Visualizer.

Parameters
  • name (str) – Name of the instance. Defaults to ‘visualizer’.

  • image (np.ndarray, optional) – the original image to draw. The format should be RGB. Defaults to None.

  • vis_backends (list, optional) – Visual backend config list. Defaults to None.

  • save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data.

  • classes (list, optional) – Input classes for result rendering; as the prediction of the segmentation model is a segmentation map with label indices, classes is a list whose items correspond to the label indices. If classes is not defined, the visualizer will take cityscapes classes by default. Defaults to None.

  • palette (list, optional) – Input palette for result rendering, which is a list of colors corresponding to the classes. Defaults to None.

  • dataset_name (str, optional) – Dataset name or alias. The visualizer will use the meta information of the dataset, i.e. classes and palette, but explicitly given classes and palette have higher priority. Defaults to None.

  • alpha (int | float) – The transparency of segmentation mask. Defaults to 0.8.

Examples

>>> import numpy as np
>>> import torch
>>> from mmengine.structures import PixelData
>>> from mmseg.structures import SegDataSample
>>> from mmseg.visualization import SegLocalVisualizer
>>> seg_local_visualizer = SegLocalVisualizer()
>>> image = np.random.randint(0, 256,
...                     size=(10, 12, 3)).astype('uint8')
>>> gt_sem_seg_data = dict(data=torch.randint(0, 2, (1, 10, 12)))
>>> gt_sem_seg = PixelData(**gt_sem_seg_data)
>>> gt_seg_data_sample = SegDataSample()
>>> gt_seg_data_sample.gt_sem_seg = gt_sem_seg
>>> seg_local_visualizer.dataset_meta = dict(
...     classes=('background', 'foreground'),
...     palette=[[120, 120, 120], [6, 230, 230]])
>>> seg_local_visualizer.add_datasample('visualizer_example',
...                         image, gt_seg_data_sample)
>>> seg_local_visualizer.add_datasample(
...                        'visualizer_example', image,
...                         gt_seg_data_sample, show=True)
add_datasample(name: str, image: numpy.ndarray, data_sample: Optional[mmseg.structures.seg_data_sample.SegDataSample] = None, draw_gt: bool = True, draw_pred: