Shortcuts

mmseg.apis

class mmseg.apis.MMSegInferencer(model: Union[dict, mmengine.config.config.Config, mmengine.config.config.ConfigDict, str], weights: Optional[str] = None, classes: Optional[Union[str, List]] = None, palette: Optional[Union[str, List]] = None, dataset_name: Optional[str] = None, device: Optional[str] = None, scope: Optional[str] = 'mmseg')[source]

Semantic segmentation inferencer that provides inference and visualization interfaces. Note: MMEngine >= 0.5.0 is required.

Parameters
  • model (str, optional) – Path to the config file or the model name defined in the metafile. Taking the mmseg metafile as an example, the model could be “fcn_r50-d8_4xb2-40k_cityscapes-512x1024”, and the model weights will be downloaded automatically. If a config file such as “configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py” is used, the weights must be specified explicitly.

  • weights (str, optional) – Path to the checkpoint. If it is not specified and model is a model name of metafile, the weights will be loaded from metafile. Defaults to None.

  • classes (list, optional) – Input classes for result rendering. As the prediction of the segmentation model is a segment map with label indices, classes is a list whose items correspond to the label indices. If classes is not defined, the visualizer will use the Cityscapes classes by default. Defaults to None.

  • palette (list, optional) – Input palette for result rendering, which is a list of colors corresponding to the classes. If palette is not defined, the visualizer will use the Cityscapes palette by default. Defaults to None.

  • dataset_name (str, optional) – Dataset name or alias. The visualizer will use the meta information of the dataset, i.e. its classes and palette, but the classes and palette arguments have higher priority. Defaults to None.

  • device (str, optional) – Device to run inference. If None, the available device will be automatically used. Defaults to None.

  • scope (str, optional) – The scope of the model. Defaults to ‘mmseg’.
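
A minimal usage sketch (the model name and image path are illustrative; keyword arguments such as show are forwarded to visualize() described below):

from mmseg.apis import MMSegInferencer

# Load a model by its metafile name; the weights are downloaded automatically.
inferencer = MMSegInferencer(model='fcn_r50-d8_4xb2-40k_cityscapes-512x1024')

# Run inference on one image; the returned dict has 'predictions' and
# 'visualization' keys (see postprocess() below).
result = inferencer('demo/demo.png', show=False)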

postprocess(preds: Union[mmseg.structures.seg_data_sample.SegDataSample, Sequence[mmseg.structures.seg_data_sample.SegDataSample]], visualization: List[numpy.ndarray], return_datasample: bool = False, pred_out_dir: str = '')dict[source]

Process the predictions and visualization results from forward and visualize.

This method should be responsible for the following tasks:

  1. Pack the predictions and visualization results and return them.

  2. Save the predictions, if needed.

Parameters
  • preds (SegDataSample or Sequence[SegDataSample]) – Predictions of the model.

  • visualization (List[np.ndarray]) – The list of rendered color segmentation masks.

  • return_datasample (bool) – Whether to return results as datasamples. Defaults to False.

  • pred_out_dir (str) – Directory to save the inference results without visualization. If left empty, no file will be saved. Defaults to ‘’.

Returns

Inference and visualization results with keys ‘predictions’ and ‘visualization’:

  • visualization (Any): Returned by visualize()

  • predictions (List[np.ndarray], np.ndarray): Returned by forward() and processed in postprocess(). If return_datasample=False, it will be the segmentation masks with label indices.

Return type

dict

visualize(inputs: list, preds: List[dict], return_vis: bool = False, show: bool = False, wait_time: int = 0, img_out_dir: str = '', opacity: float = 0.8)List[numpy.ndarray][source]

Visualize predictions.

Parameters
  • inputs (list) – Inputs preprocessed by _inputs_to_list().

  • preds (Any) – Predictions of the model.

  • show (bool) – Whether to display the image in a popup window. Defaults to False.

  • wait_time (float) – The interval of show (s). Defaults to 0.

  • img_out_dir (str) – Output directory for the rendered predictions, i.e. the color segmentation masks. Defaults to ‘’.

  • opacity (int, float) – The transparency of segmentation mask. Defaults to 0.8.

Returns

Visualization results.

Return type

List[np.ndarray]

class mmseg.apis.RSImage(image)[source]

Remote sensing image class.

Parameters

image (str or gdal.Dataset) – Image file path or gdal.Dataset.

create_grids(window_size: Tuple[int, int], stride: Tuple[int, int] = (0, 0))[source]

Create grids for image inference.

Parameters
  • window_size (Tuple[int, int]) – the size of the sliding window.

  • stride (Tuple[int, int], optional) – the stride of the sliding window. Defaults to (0, 0).

Raises
  • AssertionError – window_size must be a tuple of 2 elements.

  • AssertionError – stride must be a tuple of 2 elements.

read(grid: Optional[List] = None)numpy.ndarray[source]

Read image data. If grid is None, read the whole image.

Parameters

grid (Optional[List], optional) – Grid to read. Defaults to None.

Returns

Image data.

Return type

np.ndarray

write(data: Optional[numpy.ndarray], grid: Optional[List] = None)[source]

Write image data.

Parameters
  • grid (Optional[List], optional) – Grid to write. Defaults to None.

  • data (Optional[np.ndarray], optional) – Data to write. Defaults to None.

Raises

ValueError – Either grid or data must be provided.

class mmseg.apis.RSInferencer(model: mmengine.model.base_model.base_model.BaseModel, batch_size: int = 1, thread: int = 1)[source]

Remote sensing inference class.

Parameters
  • model (BaseModel) – The loaded model.

  • batch_size (int, optional) – Batch size. Defaults to 1.

  • thread (int, optional) – Number of threads. Defaults to 1.

classmethod from_config_path(config_path: str, checkpoint_path: str, batch_size: int = 1, thread: int = 1, device: Optional[str] = 'cpu')[source]

Initialize a segmentor from config file.

Parameters
  • config_path (str) – Config file path.

  • checkpoint_path (str) – Checkpoint path.

  • batch_size (int, optional) – Batch size. Defaults to 1.

classmethod from_model(model: mmengine.model.base_model.base_model.BaseModel, checkpoint_path: Optional[str] = None, batch_size: int = 1, thread: int = 1, device: Optional[str] = 'cpu')[source]

Initialize a segmentor from model.

Parameters
  • model (BaseModel) – The loaded model.

  • checkpoint_path (Optional[str]) – Checkpoint path.

  • batch_size (int, optional) – Batch size. Defaults to 1.

inference()[source]

Run inference on image data from the read buffer and put the result into the write buffer.

read(image: mmseg.apis.remote_sense_inferencer.RSImage, window_size: Tuple[int, int], strides: Tuple[int, int] = (0, 0))[source]

Load image data to read buffer.

Parameters
  • image (RSImage) – The image to read.

  • window_size (Tuple[int, int]) – The size of the sliding window.

  • strides (Tuple[int, int], optional) – The stride of the sliding window. Defaults to (0, 0).

run(image: mmseg.apis.remote_sense_inferencer.RSImage, window_size: Tuple[int, int], strides: Tuple[int, int] = (0, 0), output_path: Optional[str] = None)[source]

Run inference with multi-threading.

Parameters
  • image (RSImage) – The image to run inference on.

  • window_size (Tuple[int, int]) – The size of the sliding window.

  • strides (Tuple[int, int], optional) – The stride of the sliding window. Defaults to (0, 0).

  • output_path (Optional[str], optional) – The path to save the segmentation map. Defaults to None.

write(image: mmseg.apis.remote_sense_inferencer.RSImage, output_path: Optional[str] = None)[source]

Write image data from write buffer.

Parameters
  • image (RSImage) – The image to write.

  • output_path (Optional[str], optional) – The path to save the segmentation map. Defaults to None.
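
A sketch of the typical RSInferencer workflow (the file paths and sliding-window sizes are illustrative):

from mmseg.apis import RSImage, RSInferencer

# Build the inferencer from a config file and checkpoint (illustrative paths).
inferencer = RSInferencer.from_config_path(
    'configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py',
    'checkpoints/fcn_r50-d8_cityscapes.pth',
    batch_size=2,
    thread=2,
    device='cuda:0')

# Wrap a GDAL-readable image and run multi-threaded sliding-window inference.
image = RSImage('scene.tif')
inferencer.run(image, window_size=(512, 512), strides=(256, 256),
               output_path='scene_pred.tif')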

mmseg.apis.inference_model(model: mmseg.models.segmentors.base.BaseSegmentor, img: Union[str, numpy.ndarray, Sequence[str], Sequence[numpy.ndarray]])Union[mmseg.structures.seg_data_sample.SegDataSample, Sequence[mmseg.structures.seg_data_sample.SegDataSample]][source]

Inference image(s) with the segmentor.

Parameters
  • model (nn.Module) – The loaded segmentor.

  • img (str/ndarray or list[str/ndarray]) – Either image files or loaded images.

Returns

If img is a list or tuple, a list of results of the same length will be returned; otherwise, the segmentation result will be returned directly.

Return type

SegDataSample or list[SegDataSample]

mmseg.apis.init_model(config: Union[str, pathlib.Path, mmengine.config.config.Config], checkpoint: Optional[str] = None, device: str = 'cuda:0', cfg_options: Optional[dict] = None)[source]

Initialize a segmentor from config file.

Parameters
  • config (str, Path, or mmengine.Config) – Config file path or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.

  • device (str, optional) – CPU/CUDA device option. Defaults to ‘cuda:0’. Use ‘cpu’ for loading the model on CPU.

  • cfg_options (dict, optional) – Options to override some settings in the used config.

Returns

The constructed segmentor.

Return type

nn.Module

mmseg.apis.show_result_pyplot(model: mmseg.models.segmentors.base.BaseSegmentor, img: Union[str, numpy.ndarray], result: mmseg.structures.seg_data_sample.SegDataSample, opacity: float = 0.5, title: str = '', draw_gt: bool = True, draw_pred: bool = True, wait_time: float = 0, show: bool = True, withLabels: Optional[bool] = True, save_dir=None, out_file=None)[source]

Visualize the segmentation results on the image.

Parameters
  • model (nn.Module) – The loaded segmentor.

  • img (str or np.ndarray) – Image filename or loaded image.

  • result (SegDataSample) – The prediction SegDataSample result.

  • opacity (float) – Opacity of painted segmentation map. Default 0.5. Must be in (0, 1] range.

  • title (str) – The title of pyplot figure. Default is ‘’.

  • draw_gt (bool) – Whether to draw GT SegDataSample. Default to True.

  • draw_pred (bool) – Whether to draw Prediction SegDataSample. Defaults to True.

  • wait_time (float) – The interval of show (s). 0 is the special value that means “forever”. Defaults to 0.

  • show (bool) – Whether to display the drawn image. Default to True.

  • withLabels (bool, optional) – Whether to add semantic labels to the visualization result. Defaults to True.

  • save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data.

  • out_file (str, optional) – Path to output file. Default to None.

Returns

The drawn image, whose channel order is RGB.

Return type

np.ndarray
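
init_model, inference_model and show_result_pyplot are typically used together. A minimal sketch with illustrative paths:

from mmseg.apis import inference_model, init_model, show_result_pyplot

config = 'configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py'  # illustrative path
checkpoint = 'checkpoints/fcn_r50-d8_cityscapes.pth'               # illustrative path

model = init_model(config, checkpoint, device='cuda:0')
result = inference_model(model, 'demo/demo.png')
vis_image = show_result_pyplot(model, 'demo/demo.png', result,
                               show=False, out_file='work_dirs/result.png')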

mmseg.datasets

datasets

class mmseg.datasets.ADE20KDataset(img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

ADE20K dataset.

In segmentation map annotation for ADE20K, 0 stands for background, which is not included in 150 categories. reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.AdjustGamma(gamma=1.0)[source]

Using gamma correction to process the image.

Required Keys:

  • img

Modified Keys:

  • img

Parameters

gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.

transform(results: dict)dict[source]

Call function to process the image with gamma correction.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Processed results.

Return type

dict

class mmseg.datasets.Albu(transforms: List[dict], keymap: Optional[dict] = None, update_pad_shape: bool = False)[source]

Albumentations augmentation. Adds custom transformations from the Albumentations library. Please visit https://albumentations.readthedocs.io for more information. An example of transforms is shown below.
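
For instance, a sketch of a transforms list (the inner entries are standard Albumentations transforms, chosen here only for illustration):

albu_transforms = [
    dict(
        type='RandomBrightnessContrast',
        brightness_limit=0.2,
        contrast_limit=0.2,
        p=0.5),
    dict(type='GaussNoise', p=0.2),
]
albu_step = dict(type='Albu', transforms=albu_transforms)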

Parameters
  • transforms (list[dict]) – A list of albu transformations

  • keymap (dict) – Contains {‘input key’:’albumentation-style key’}

  • update_pad_shape (bool) – Whether to update padding shape according to the output shape of the last transform

albu_builder(cfg: dict)object[source]

Build a callable object from a dict containing albu arguments.

Parameters

cfg (dict) – Config dict. It should at least contain the key “type”.

Returns

A callable object.

Return type

Callable

static mapper(d: dict, keymap: dict)[source]

Dictionary mapper.

Renames keys according to the keymap provided.

Parameters
  • d (dict) – Old dict.

  • keymap (dict) – {‘old_key’: ‘new_key’}.

Returns

new dict.

Return type

dict

transform(results)[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmseg.datasets.BDD100KDataset(img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]
class mmseg.datasets.BaseCDDataset(ann_file: str = '', img_suffix='.jpg', img_suffix2='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'img_path2': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]

Custom dataset for change detection. An example of the file structure is as follows.

├── data
│   ├── my_dataset
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── img_dir2
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── ann_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{seg_map_suffix}
│   │   │   │   ├── yyy{seg_map_suffix}
│   │   │   │   ├── zzz{seg_map_suffix}
│   │   │   ├── val

The image names in img_dir and img_dir2 should be consistent. The img/gt_semantic_seg pair of BaseSegDataset should be the same except for the suffix. A valid img/gt_semantic_seg filename pair should be like xxx{img_suffix} and xxx{seg_map_suffix} (the extension is also included in the suffix). If split is given, then xxx is specified in the txt file. Otherwise, all files in img_dir and ann_dir will be loaded. Please refer to docs/en/tutorials/new_dataset.md for more details.

Parameters
  • ann_file (str) – Annotation file path. Defaults to ‘’.

  • metainfo (dict, optional) – Meta information for dataset, such as specify classes to load. Defaults to None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to None.

  • data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img_path=None, img_path2=None, seg_map_path=None).

  • img_suffix (str) – Suffix of images. Default: ‘.jpg’

  • img_suffix2 (str) – Suffix of images. Default: ‘.jpg’

  • seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’

  • filter_cfg (dict, optional) – Config for filter data. Defaults to None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Defaults to None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.

  • pipeline (list, optional) – Processing pipeline. Defaults to [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Defaults to False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.

  • max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to retry in order to get a valid image. Defaults to 1000.

  • ignore_index (int) – The label index to be ignored. Default: 255

  • reduce_zero_label (bool) – Whether to mark label zero as ignored. Default to False.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

classmethod get_label_map(new_classes: Optional[Sequence] = None)Optional[Dict][source]

Require label mapping.

The label_map is a dictionary whose keys are the old label ids and whose values are the new label ids; it is used for changing pixel labels in load_annotations. label_map is not None if and only if the old classes in cls.METAINFO are not equal to the new classes in self._metainfo and neither of them is None.

Parameters

new_classes (list, tuple, optional) – The new class names from metainfo. Defaults to None.

Returns

The mapping from old classes in cls.METAINFO to new classes in self._metainfo.

Return type

dict, optional

load_data_list()List[dict][source]

Load annotation from directory or annotation file.

Returns

All data info of dataset.

Return type

list[dict]

class mmseg.datasets.BaseSegDataset(ann_file: str = '', img_suffix='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]

Custom dataset for semantic segmentation. An example of the file structure is as follows.

├── data
│   ├── my_dataset
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── ann_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{seg_map_suffix}
│   │   │   │   ├── yyy{seg_map_suffix}
│   │   │   │   ├── zzz{seg_map_suffix}
│   │   │   ├── val

The img/gt_semantic_seg pair of BaseSegDataset should be the same except for the suffix. A valid img/gt_semantic_seg filename pair should be like xxx{img_suffix} and xxx{seg_map_suffix} (the extension is also included in the suffix). If split is given, then xxx is specified in the txt file. Otherwise, all files in img_dir and ann_dir will be loaded. Please refer to docs/en/tutorials/new_dataset.md for more details.

Parameters
  • ann_file (str) – Annotation file path. Defaults to ‘’.

  • metainfo (dict, optional) – Meta information for dataset, such as specify classes to load. Defaults to None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to None.

  • data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img_path=None, seg_map_path=None).

  • img_suffix (str) – Suffix of images. Default: ‘.jpg’

  • seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’

  • filter_cfg (dict, optional) – Config for filter data. Defaults to None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Defaults to None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.

  • pipeline (list, optional) – Processing pipeline. Defaults to [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Defaults to False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.

  • max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to retry in order to get a valid image. Defaults to 1000.

  • ignore_index (int) – The label index to be ignored. Default: 255

  • reduce_zero_label (bool) – Whether to mark label zero as ignored. Default to False.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
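
A new dataset is usually added by subclassing BaseSegDataset and registering it. A minimal sketch; the class name, classes and palette below are illustrative:

from mmseg.datasets import BaseSegDataset
from mmseg.registry import DATASETS


@DATASETS.register_module()
class MyDataset(BaseSegDataset):  # illustrative name
    METAINFO = dict(
        classes=('background', 'foreground'),  # illustrative classes
        palette=[[0, 0, 0], [128, 0, 0]])      # illustrative palette

    def __init__(self, **kwargs) -> None:
        super().__init__(
            img_suffix='.png', seg_map_suffix='.png', **kwargs)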

classmethod get_label_map(new_classes: Optional[Sequence] = None)Optional[Dict][source]

Require label mapping.

The label_map is a dictionary whose keys are the old label ids and whose values are the new label ids; it is used for changing pixel labels in load_annotations. label_map is not None if and only if the old classes in cls.METAINFO are not equal to the new classes in self._metainfo and neither of them is None.

Parameters

new_classes (list, tuple, optional) – The new class names from metainfo. Defaults to None.

Returns

The mapping from old classes in cls.METAINFO to new classes in self._metainfo.

Return type

dict, optional

load_data_list()List[dict][source]

Load annotation from directory or annotation file.

Returns

All data info of dataset.

Return type

list[dict]

class mmseg.datasets.BioMedical3DPad(pad_shape: Tuple[int, int, int], pad_val: float = 0.0, seg_pad_val: int = 0)[source]

Pad the biomedical 3d image & biomedical 3d semantic segmentation maps.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Modified Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Added Keys:

  • pad_shape (Tuple[int, int, int]): The padded shape.

Parameters
  • pad_shape (Tuple[int, int, int]) – Fixed padding size. Expected padding shape (Z, Y, X).

  • pad_val (float) – Padding value for biomedical image. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.

  • seg_pad_val (int) – Padding value for biomedical 3d semantic segmentation maps. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.

transform(results: dict)dict[source]

Call function to pad images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Updated result dict.

Return type

dict

class mmseg.datasets.BioMedical3DRandomCrop(crop_shape: Union[int, Tuple[int, int, int]], keep_foreground: bool = True)[source]

Crop the input patch for medical image & segmentation mask.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

  • gt_seg_map (np.ndarray, optional): Biomedical semantic segmentation mask

    with shape (Z, Y, X).

Modified Keys:

  • img

  • img_shape

  • gt_seg_map (optional)

Parameters
  • crop_shape (Union[int, Tuple[int, int, int]]) – Expected size after cropping with the format of (z, y, x). If set to an integer, then cropping width and height are equal to this integer.

  • keep_foreground (bool) – If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the crop bounding-box. Default to True.

crop(img: numpy.ndarray, crop_bbox: tuple)numpy.ndarray[source]

Crop from img

Parameters
  • img (np.ndarray) – Original input image.

  • crop_bbox (tuple) – Coordinates of the cropped image.

Returns

The cropped image.

Return type

np.ndarray

generate_margin(results: dict)tuple[source]

Generate margin of crop bounding-box.

If keep_foreground is True, it will sample a voxel of the foreground classes randomly, take it as the center of the bounding box, and return the margin between the bounding box and the image. If keep_foreground is False, it will return the difference between the crop shape and the image shape.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

The margin for 3 dimensions of crop bounding-box and image.

Return type

tuple

random_generate_crop_bbox(margin_z: int, margin_y: int, margin_x: int)tuple[source]

Randomly get a crop bounding box.

Parameters

  • margin_z (int) – Margin of the crop bounding box along the Z axis.

  • margin_y (int) – Margin of the crop bounding box along the Y axis.

  • margin_x (int) – Margin of the crop bounding box along the X axis.

Returns

Coordinates of the cropped image.

Return type

tuple

random_sample_location(seg_map: numpy.ndarray)dict[source]

Sample a foreground voxel when keep_foreground is True.

Parameters

seg_map (np.ndarray) – gt seg map.

Returns

Coordinates of selected foreground voxel.

Return type

dict

transform(results: dict)dict[source]

Transform function to randomly crop images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results; the ‘img_shape’ key in the result dict is updated according to the crop size.

Return type

dict

class mmseg.datasets.BioMedical3DRandomFlip(prob: float, axes: Tuple[int, ...], swap_label_pairs: Optional[List[Tuple[int, int]]] = None)[source]

Flip biomedical 3D images and segmentations.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/spatial_transforms.py # noqa:E501

Copyright 2021 Division of Medical Image Computing, German Cancer Research Center (DKFZ) and Applied Computer Vision Lab, Helmholtz Imaging Platform. Licensed under the Apache-2.0 License.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Modified Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Added Keys:

  • do_flip

  • flip_axes

Parameters
  • prob (float) – Flipping probability.

  • axes (Tuple[int, ...]) – Flipping axes with order ‘ZXY’.

  • swap_label_pairs (Optional[List[Tuple[int, int]]]) – The segmentation label pairs that are swapped when flipping.

transform(results: Dict)Dict[source]

Call function to flip and swap pair labels.

Parameters

results (dict) – Result dict.

Returns

Flipped results; the ‘do_flip’ and ‘flip_axes’ keys are added into the result dict.

Return type

dict

class mmseg.datasets.BioMedicalGaussianBlur(sigma_range: Tuple[float, float] = (0.5, 1.0), prob: float = 0.2, prob_per_channel: float = 0.5, different_sigma_per_channel: bool = True, different_sigma_per_axis: bool = True)[source]

Add Gaussian blur with random sigma to image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L81 # noqa:E501

Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • sigma_range (Tuple[float, float] | float) – Range from which the sigma value is randomly selected. Defaults to (0.5, 1.0).

  • prob (float) – Probability to apply Gaussian blur for each sample. Default to 0.2.

  • prob_per_channel (float) – Probability to apply Gaussian blur for each channel (axis N of the image). Default to 0.5.

  • different_sigma_per_channel (bool) – whether to use different sigma for each channel (axis N of the image). Default to True.

  • different_sigma_per_axis (bool) – whether to use different sigma for axis Z, X and Y of the image. Default to True.

transform(results: Dict)Dict[source]

Call function to add random Gaussian blur to image.

Parameters

results (dict) – Result dict.

Returns

Result dict with random Gaussian blur applied.

Return type

dict

class mmseg.datasets.BioMedicalGaussianNoise(prob: float = 0.1, mean: float = 0.0, std: float = 0.1)[source]

Add random Gaussian noise to image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L53 # noqa:E501

Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • prob (float) – Probability to add Gaussian noise for each sample. Default to 0.1.

  • mean (float) – Mean or “centre” of the distribution. Default to 0.0.

  • std (float) – Standard deviation of distribution. Default to 0.1.

transform(results: Dict)Dict[source]

Call function to add random Gaussian noise to image.

Parameters

results (dict) – Result dict.

Returns

Result dict with random Gaussian noise.

Return type

dict

class mmseg.datasets.BioMedicalRandomGamma(prob: float = 0.5, gamma_range: Tuple[float] = (0.5, 2), invert_image: bool = False, per_channel: bool = False, retain_stats: bool = False)[source]

Using random gamma correction to process the biomedical image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/color_transforms.py#L132 # noqa:E501 With licence: Apache 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys: - img

Parameters
  • prob (float) – The probability to perform this transform. Default: 0.5.

  • gamma_range (Tuple[float]) – Range of gamma values. Default: (0.5, 2).

  • invert_image (bool) – Whether to invert the image before applying gamma augmentation. Default: False.

  • per_channel (bool) – Whether to perform the transform on each channel individually. Default: False

  • retain_stats (bool) – Gamma transformation will alter the mean and std of the data in the patch. If retain_stats=True, the data will be transformed to match the mean and standard deviation before gamma augmentation. Default: False.

transform(results: dict)dict[source]

Call function to perform random gamma correction.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with random gamma correction performed.

Return type

dict

class mmseg.datasets.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]

Use CLAHE method to process the image.

See Zuiderveld, K., “Contrast Limited Adaptive Histogram Equalization”, Graphics Gems, 1994: 474-485, for more information.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • clip_limit (float) – Threshold for contrast limiting. Default: 40.0.

  • tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).

transform(results: dict)dict[source]

Call function to process images with the CLAHE method.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Processed results.

Return type

dict

class mmseg.datasets.COCOStuffDataset(img_suffix='.jpg', seg_map_suffix='_labelTrainIds.png', **kwargs)[source]

COCO-Stuff dataset.

In segmentation map annotation for COCO-Stuff, Train-IDs of the 10k version are from 1 to 171, where 0 is the ignore index, and Train-IDs of the 164k version are from 0 to 170, where 255 is the ignore index. So, both versions have 171 semantic categories. reduce_zero_label is set to True and False for the 10k and 164k versions, respectively. The img_suffix is fixed to ‘.jpg’, and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.ChaseDB1Dataset(img_suffix='.png', seg_map_suffix='_1stHO.png', reduce_zero_label=False, **kwargs)[source]

Chase_db1 dataset.

In segmentation map annotation for Chase_db1, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_1stHO.png’.

class mmseg.datasets.CityscapesDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtFine_labelTrainIds.png', **kwargs)[source]

Cityscapes dataset.

The img_suffix is fixed to ‘_leftImg8bit.png’ and seg_map_suffix is fixed to ‘_gtFine_labelTrainIds.png’ for Cityscapes dataset.

class mmseg.datasets.ConcatCDInput(input_keys=('img', 'img2'))[source]

Concat images for change detection.

Required Keys:

  • img

  • img2

Parameters

input_keys (tuple) – Input image keys for change detection. Default: (‘img’, ‘img2’).
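
A sketch of where this transform sits in a change detection pipeline (the surrounding transforms are illustrative):

train_pipeline = [
    dict(type='LoadMultipleRSImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='ConcatCDInput'),
    dict(type='PackSegInputs'),
]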

transform(results: dict)dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmseg.datasets.DRIVEDataset(img_suffix='.png', seg_map_suffix='_manual1.png', reduce_zero_label=False, **kwargs)[source]

DRIVE dataset.

In segmentation map annotation for DRIVE, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_manual1.png’.

class mmseg.datasets.DSDLSegDataset(specific_key_path: Dict = {}, pre_transform: Dict = {}, used_labels: Optional[Sequence] = None, **kwargs)[source]

Dataset for dsdl segmentation.

Parameters
  • specific_key_path (dict) – Path of a specific key which cannot be loaded by its field name.

  • pre_transform (dict) – Pre-transform functions applied before loading.

  • used_labels (sequence) – List of classes actually used in training steps; this must be a subset of the class domain.

get_label_map(new_classes: Optional[Sequence] = None)Optional[Dict][source]

Require label mapping.

The label_map is a dictionary whose keys are the old label ids and whose values are the new label ids; it is used for changing pixel labels in load_annotations. label_map is not None if and only if the old classes in class_dom are not equal to the new classes in the arguments and neither of them is None.

Parameters

new_classes (list, tuple, optional) – The new class names from metainfo. Defaults to None.

Returns

The mapping from old classes to new classes.

Return type

dict, optional

load_data_list()List[Dict][source]

Load data info from a DSDL yaml file named self.ann_file.

Returns

A list of data info dicts.

Return type

List[dict]

class mmseg.datasets.DarkZurichDataset(img_suffix='_rgb_anon.png', seg_map_suffix='_gt_labelTrainIds.png', **kwargs)[source]

DarkZurichDataset dataset.

class mmseg.datasets.DecathlonDataset(ann_file: str = '', img_suffix='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]

Dataset for the Decathlon dataset.

The dataset.json format is shown as follows

{
    "name": "BRATS",
    "tensorImageSize": "4D",
    "modality":
    {
        "0": "FLAIR",
        "1": "T1w",
        "2": "t1gd",
        "3": "T2w"
    },
    "labels": {
        "0": "background",
        "1": "edema",
        "2": "non-enhancing tumor",
        "3": "enhancing tumour"
    },
    "numTraining": 484,
    "numTest": 266,
    "training":
    [
        {
            "image": "./imagesTr/BRATS_306.nii.gz",
            "label": "./labelsTr/BRATS_306.nii.gz",
            ...
        }
    ],
    "test":
    [
        "./imagesTs/BRATS_557.nii.gz",
        ...
    ]
}
load_data_list()List[dict][source]

Load annotation from directory or annotation file.

Returns

All data info of dataset.

Return type

list[dict]

class mmseg.datasets.GenerateEdge(edge_width: int = 3, ignore_index: int = 255)[source]

Generate Edge for CE2P approach.

Edge will be used to calculate loss of CE2P.

Modified from https://github.com/liutinglt/CE2P/blob/master/dataset/target_generation.py # noqa:E501

Required Keys:

  • img_shape

  • gt_seg_map

Added Keys:
  • gt_edge_map (np.ndarray, uint8): The edge annotation generated from the

    seg map by extracting border between different semantics.

Parameters
  • edge_width (int) – The width of edge. Default to 3.

  • ignore_index (int) – Index that will be ignored. Default to 255.

transform(results: Dict)Dict[source]

Call function to generate edge from segmentation map.

Parameters

results (dict) – Result dict.

Returns

Result dict with edge mask.

Return type

dict

class mmseg.datasets.HRFDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]

HRF dataset.

In segmentation map annotation for HRF, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.ISPRSDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

ISPRS dataset.

In segmentation map annotation for ISPRS, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

class mmseg.datasets.LEVIRCDDataset(img_suffix='.png', img_suffix2='.png', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]

LEVIR-CD dataset.

In the segmentation map annotation for LEVIR-CD, 0 stands for background. reduce_zero_label is fixed to False, and img_suffix, img_suffix2 and seg_map_suffix are all fixed to ‘.png’.

class mmseg.datasets.LIPDataset(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

LIP dataset.

The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.LoadAnnotations(reduce_zero_label=None, backend_args=None, imdecode_backend='pillow')[source]

Load annotations for semantic segmentation provided by dataset.

The annotation format is as the following:

{
    # Filename of semantic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}

After this module, the annotation has been changed to the format below:

{
    # in str
    'seg_fields': List
     # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}

Required Keys:

  • seg_map_path (str): Path of semantic segmentation ground truth file.

Added Keys:

  • seg_fields (List)

  • gt_seg_map (np.uint8)

Parameters
  • reduce_zero_label (bool, optional) – Whether to reduce all label values by 1. Usually used for datasets where 0 is the background label. Defaults to None.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to ‘pillow’.

  • backend_args (dict) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
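
For example, a loading fragment of a training pipeline (reduce_zero_label=True is illustrative, e.g. for datasets whose label 0 denotes background):

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
]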

class mmseg.datasets.LoadBiomedicalAnnotation(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]

Load seg_map annotation provided by biomedical dataset.

The annotation format is as the following:

{
    'gt_seg_map': np.ndarray (X, Y, Z) or (Z, Y, X)
}

Required Keys:

  • seg_map_path

Added Keys:

  • gt_seg_map (np.ndarray): Biomedical seg map with shape (Z, Y, X) by

    default, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • to_float32 (bool) – Whether to convert the loaded seg map to a float32 numpy array. If set to False, the loaded seg map is a float64 array. Defaults to True.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See mmengine.fileio for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoadBiomedicalData(with_seg=False, decode_backend: str = 'numpy', to_xyz: bool = False, backend_args: Optional[dict] = None)[source]

Load a biomedical image and annotation from file.

The loading data format is as the following:

{
    'img': np.ndarray data[:-1, X, Y, Z]
    'seg_map': np.ndarray data[-1, X, Y, Z]
}

Required Keys:

  • img_path

Added Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

  • img_shape

  • ori_shape

Parameters
  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.

  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘numpy’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoadBiomedicalImageFromFile(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]

Load a biomedical image from file.

Required Keys:

  • img_path

Added Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.

  • img_shape

  • ori_shape

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
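
A sketch of how the biomedical loading and augmentation transforms in this module may be chained (the shapes and probabilities are illustrative, and the final packing step depends on the project config):

train_pipeline = [
    dict(type='LoadBiomedicalImageFromFile', decode_backend='nifti'),
    dict(type='LoadBiomedicalAnnotation', decode_backend='nifti'),
    dict(type='BioMedical3DRandomCrop', crop_shape=(64, 128, 128)),
    dict(type='BioMedical3DPad', pad_shape=(64, 128, 128)),
    dict(type='BioMedicalGaussianNoise', prob=0.1),
]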

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]

Load an image from results['img'].

Similar with LoadImageFromFile, but the image has been loaded as np.ndarray in results['img']. Can be used when loading image from webcam.

Required Keys:

  • img

Modified Keys:

  • img

  • img_path

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

transform(results: dict)dict[source]

Transform function to add image meta information.

Parameters

results (dict) – Result dict with Webcam read image in results['img'].

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoadMultipleRSImageFromFile(to_float32: bool = True)[source]

Load two remote sensing images from file.

Required Keys:

  • img_path

  • img_path2

Modified Keys:

  • img

  • img2

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoadSingleRSImageFromFile(to_float32: bool = True)[source]

Load a remote sensing image from file.

Required Keys:

  • img_path

Modified Keys:

  • img

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.LoveDADataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

LoveDA dataset.

In segmentation map annotation for LoveDA, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

class mmseg.datasets.MapillaryDataset_v1(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

Mapillary Vistas Dataset.

Dataset paper link: http://ieeexplore.ieee.org/document/8237796/

v1.2 contains 66 object classes (37 instance-specific).

v2.0 contains 124 object classes (70 instance-specific, 46 stuff, 8 void or crowd).

The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’ for Mapillary Vistas Dataset.

class mmseg.datasets.MapillaryDataset_v2(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

Mapillary Vistas Dataset.

Dataset paper link: http://ieeexplore.ieee.org/document/8237796/

v1.2 contains 66 object classes (37 instance-specific).

v2.0 contains 124 object classes (70 instance-specific, 46 stuff, 8 void or crowd).

The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’ for Mapillary Vistas Dataset.

class mmseg.datasets.MultiImageMixDataset(dataset: Union[mmengine.dataset.dataset_wrapper.ConcatDataset, dict], pipeline: Sequence[dict], skip_type_keys: Optional[List[str]] = None, lazy_init: bool = False)[source]

A wrapper of multiple images mixed dataset.

Suitable for training on multiple images mixed data augmentation like mosaic and mixup.

Parameters
  • dataset (ConcatDataset or dict) – The dataset to be mixed.

  • pipeline (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • skip_type_keys (list[str], optional) – Sequence of type strings of transforms to be skipped in the pipeline. Defaults to None.
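
A configuration sketch for mosaic-style training with this wrapper (the dataset paths and pipeline details are illustrative):

train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CityscapesDataset',
        data_root='data/cityscapes/',
        data_prefix=dict(
            img_path='leftImg8bit/train', seg_map_path='gtFine/train'),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
        ]),
    pipeline=[
        dict(type='RandomMosaic', prob=1.0, img_scale=(640, 640)),
        dict(type='PackSegInputs'),
    ])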

full_init()[source]

Loop to full_init each dataset.

get_data_info(idx: int)dict[source]

Get annotation by index.

Parameters

idx (int) – Global index of ConcatDataset.

Returns

The idx-th annotation of the datasets.

Return type

dict

property metainfo: dict

Get the meta information of the multi-image-mixed dataset.

Returns

The meta information of multi-image-mixed dataset.

Return type

dict

update_skip_type_keys(skip_type_keys)[source]

Update skip_type_keys.

It is called by an external hook.

Parameters

skip_type_keys (list[str], optional) – Sequence of type strings of transforms to be skipped in the pipeline.

class mmseg.datasets.NYUDataset(data_prefix={'depth_map_path': 'annotations', 'img_path': 'images'}, img_suffix='.jpg', depth_map_suffix='.png', **kwargs)[source]

NYU depth estimation dataset. The file structure should be as follows.

├── data
│   ├── nyu
│   │   ├── images
│   │   │   ├── train
│   │   │   │   ├── scene_xxx.jpg
│   │   │   │   ├── ...
│   │   │   ├── test
│   │   ├── annotations
│   │   │   ├── train
│   │   │   │   ├── scene_xxx.png
│   │   │   │   ├── ...
│   │   │   ├── test

Parameters
  • ann_file (str) – Annotation file path. Defaults to ‘’.

  • metainfo (dict, optional) – Meta information for dataset, such as specify classes to load. Defaults to None.

  • data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to None.

  • data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img_path=’images’, depth_map_path=’annotations’).

  • img_suffix (str) – Suffix of images. Default: ‘.jpg’

  • seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’

  • filter_cfg (dict, optional) – Config for filter data. Defaults to None.

  • indices (int or Sequence[int], optional) – Support using first few data in annotation file to facilitate training/testing on a smaller dataset. Defaults to None which means using all data_infos.

  • serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.

  • pipeline (list, optional) – Processing pipeline. Defaults to [].

  • test_mode (bool, optional) – test_mode=True means in test phase. Defaults to False.

  • lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.

  • max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to retry in order to get a valid image. Defaults to 1000.

  • ignore_index (int) – The label index to be ignored. Default: 255

  • reduce_zero_label (bool) – Whether to mark label zero as ignored. Default to False.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

load_data_list()List[dict][source]

Load annotation from directory or annotation file.

Returns

All data info of dataset.

Return type

list[dict]

class mmseg.datasets.NightDrivingDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtCoarse_labelTrainIds.png', **kwargs)[source]

NightDrivingDataset dataset.

class mmseg.datasets.PackSegInputs(meta_keys=('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label'))[source]

Pack the inputs data for the semantic segmentation.

The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default this includes:

  • img_path: filename of the image

  • ori_shape: original shape of the image as a tuple (h, w, c)

  • img_shape: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.

  • pad_shape: shape of padded images

  • scale_factor: a float indicating the preprocessing scale

  • flip: a boolean indicating if image flip transform was used

  • flip_direction: the flipping direction

Parameters

meta_keys (Sequence[str], optional) – Meta keys to be packed from SegDataSample and collected in data[img_metas]. Default: ('img_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction')
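
For example, a pipeline typically ends with this transform; meta_keys can be narrowed when fewer meta fields are needed (the selection below is illustrative):

pack_step = dict(
    type='PackSegInputs',
    meta_keys=('img_path', 'ori_shape', 'img_shape'))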

transform(results: dict)dict[source]

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

  • ‘inputs’ (torch.Tensor): The forward data of models.

  • ‘data_sample’ (SegDataSample): The annotation info of the sample.

Return type

dict

class mmseg.datasets.PascalContextDataset(ann_file='', img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]

PascalContext dataset.

In segmentation map annotation for PascalContext, 0 stands for background, which is included in 60 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

Parameters

ann_file (str) – Annotation file path.

class mmseg.datasets.PascalContextDataset59(ann_file='', img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

PascalContext dataset.

In segmentation map annotation for PascalContext, 0 stands for background, which is included in 60 categories. reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’. Note: if the background is 255 and the ids of categories are from 0 to 58, reduce_zero_label needs to be set to False.

Parameters

ann_file (str) – Annotation file path.

class mmseg.datasets.PascalVOCDataset(ann_file, img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

Pascal VOC dataset.

Parameters

split (str) – Split txt file for Pascal VOC.

class mmseg.datasets.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[float] = (0.5, 1.5), saturation_range: Sequence[float] = (0.5, 1.5), hue_delta: int = 18)[source]

Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. The position of random contrast is either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.

brightness(img: numpy.ndarray)numpy.ndarray[source]

Brightness distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after brightness change.

Return type

np.ndarray

contrast(img: numpy.ndarray)numpy.ndarray[source]

Contrast distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after contrast change.

Return type

np.ndarray

convert(img: numpy.ndarray, alpha: int = 1, beta: int = 0)numpy.ndarray[source]

Multiply with alpha and add beta, with clipping.

Parameters
  • img (np.ndarray) – The input image.

  • alpha (int) – Image weights, change the contrast/saturation of the image. Default: 1

  • beta (int) – Image bias, change the brightness of the image. Default: 0

Returns

The transformed image.

Return type

np.ndarray

hue(img: numpy.ndarray)numpy.ndarray[source]

Hue distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after hue change.

Return type

np.ndarray

saturation(img: numpy.ndarray)numpy.ndarray[source]

Saturation distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after saturation change.

Return type

np.ndarray

transform(results: dict)dict[source]

Transform function to perform photometric distortion on images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with images distorted.

Return type

dict

class mmseg.datasets.PotsdamDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]

ISPRS Potsdam dataset.

In segmentation map annotation for Potsdam dataset, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

class mmseg.datasets.REFUGEDataset(**kwargs)[source]

REFUGE dataset.

In segmentation map annotation for REFUGE, 0 stands for background, which is not included in 2 categories. reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]

Convert RGB image to grayscale image.

Required Keys:

  • img

Modified Keys:

  • img

  • img_shape

This transform calculates the weighted mean of the input image channels with weights and then expands the result to out_channels channels. When out_channels is None, the number of output channels is the same as the number of input channels.

Parameters
  • out_channels (int) – Expected number of output channels after transforming. Default: None.

  • weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).

transform(results: dict)dict[source]

Call function to convert RGB image to grayscale image.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with grayscale image.

Return type

dict

class mmseg.datasets.RandomCrop(crop_size: Union[int, Tuple[int, int]], cat_max_ratio: float = 1.0, ignore_index: int = 255)[source]

Random crop the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • img_shape

  • gt_seg_map

Parameters
  • crop_size (Union[int, Tuple[int, int]]) – Expected size after cropping with the format of (h, w). If set to an integer, then cropping width and height are equal to this integer.

  • cat_max_ratio (float) – The maximum ratio that single category could occupy.

  • ignore_index (int) – The label index to be ignored. Default: 255

crop(img: numpy.ndarray, crop_bbox: tuple)numpy.ndarray[source]

Crop from img

Parameters
  • img (np.ndarray) – Original input image.

  • crop_bbox (tuple) – Coordinates of the cropped image.

Returns

The cropped image.

Return type

np.ndarray

transform(results: dict)dict[source]

Transform function to randomly crop images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results; the ‘img_shape’ key in the result dict is updated according to the crop size.

Return type

dict
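
As a rough configuration sketch (the crop size and ratio are arbitrary example values): with cat_max_ratio=0.75 the crop is re-sampled until no single category occupies more than 75% of the cropped area.

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    # Crop to 512x512 (h, w); pixels labelled ignore_index are not counted by cat_max_ratio.
    dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
    dict(type='PackSegInputs')
]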

class mmseg.datasets.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]

CutOut operation.

Randomly drop some regions of image used in Cutout.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • prob (float) – cutout probability.

  • n_holes (int | tuple[int, int]) – Number of regions to be dropped. If it is given as a tuple, the number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].

  • cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.

  • cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.

  • fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).

  • seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.

transform(results: dict)dict[source]

Call function to drop some regions of image.

class mmseg.datasets.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]

Mosaic augmentation. Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:
    1. Choose the mosaic center as the intersections of 4 images
    2. Get the left top image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. Sub image will be cropped if image is larger than mosaic patch

Required Keys:

  • img

  • gt_seg_map

  • mix_results

Modified Keys:

  • img

  • img_shape

  • ori_shape

  • gt_seg_map

Parameters
  • prob (float) – mosaic probability.

  • img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).

  • pad_val (int) – Pad value. Default: 0.

  • seg_pad_val (int) – Pad value of segmentation map. Default: 255.

get_indices(dataset: mmseg.datasets.dataset_wrappers.MultiImageMixDataset)list[source]

Call function to collect indices.

Parameters

dataset (MultiImageMixDataset) – The dataset.

Returns

indices.

Return type

list

transform(results: dict)dict[source]

Call function to make a mosaic of image.

Parameters

results (dict) – Result dict.

Returns

Result dict with mosaic transformed.

Return type

dict
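
Since RandomMosaic consumes mix_results produced from several samples, it is typically wrapped with MultiImageMixDataset. The sketch below shows the general wiring; the dataset type, paths and scales are placeholder assumptions.

train_pipeline = [
    dict(type='RandomMosaic', prob=1.0, img_scale=(640, 640)),
    dict(type='PackSegInputs')
]
train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='CityscapesDataset',
        data_root='data/cityscapes/',
        data_prefix=dict(
            img_path='leftImg8bit/train', seg_map_path='gtFine/train'),
        # The wrapped dataset only loads data; mixing happens in the outer pipeline.
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations')
        ]),
    pipeline=train_pipeline)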

class mmseg.datasets.RandomRotFlip(rotate_prob=0.5, flip_prob=0.5, degree=(- 20, 20))[source]

Rotate and flip the image & seg or just rotate the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • rotate_prob (float) – The probability of rotating the image.

  • flip_prob (float) – The probability of rotating & flipping the image.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

transform(results: dict)dict[source]

Call function to rotate or rotate & flip image, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Rotated or rotated & flipped results.

Return type

dict

class mmseg.datasets.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]

Rotate the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • prob (float) – The rotation probability.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

  • pad_val (float, optional) – Padding value of image. Default: 0.

  • seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.

  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.

  • auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False

transform(results: dict)dict[source]

Call function to rotate image, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Rotated results.

Return type

dict

class mmseg.datasets.Rerange(min_value=0, max_value=255)[source]

Rerange the image pixel value.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • min_value (float or int) – Minimum value of the reranged image. Default: 0.

  • max_value (float or int) – Maximum value of the reranged image. Default: 255.

transform(results: dict)dict[source]

Call function to rerange images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Reranged results.

Return type

dict
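
Conceptually, the reranging is a linear mapping of pixel values into [min_value, max_value]; the NumPy sketch below approximates the operation and is not the library's exact implementation.

import numpy as np

def rerange(img: np.ndarray, min_value=0, max_value=255) -> np.ndarray:
    # Map values from [img.min(), img.max()] linearly to [min_value, max_value].
    img = img.astype(np.float32)
    assert img.max() > img.min()
    img = (img - img.min()) / (img.max() - img.min())
    return img * (max_value - min_value) + min_value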

class mmseg.datasets.ResizeShortestEdge(scale: Union[int, Tuple[int, int]], max_size: int)[source]

Resize the image and mask while keeping the aspect ratio unchanged.

Modified from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py#L130 # noqa:E501 Copyright (c) Facebook, Inc. and its affiliates. Licensed under the Apache-2.0 License

This transform attempts to scale the shorter edge to the given scale, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.

Required Keys:

  • img

  • gt_seg_map (optional)

Modified Keys:

  • img

  • img_shape

  • gt_seg_map (optional)

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

Parameters
  • scale (Union[int, Tuple[int, int]]) – The target short edge length. If it’s tuple, will select the min value as the short edge length.

  • max_size (int) – The maximum allowed longest edge length.

transform(results: Dict)Dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input, and can add new items to the dict or modify existing items in it. The result dict is returned in the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict
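
The short-edge/max-size rule described above can be illustrated with the simplified sketch below (a standalone helper for intuition, not the transform's actual code).

def target_size(h: int, w: int, scale: int, max_size: int):
    # First scale so that the shorter edge equals `scale` ...
    ratio = scale / min(h, w)
    new_h, new_w = h * ratio, w * ratio
    # ... then shrink further if the longer edge would exceed `max_size`.
    if max(new_h, new_w) > max_size:
        ratio = max_size / max(new_h, new_w)
        new_h, new_w = new_h * ratio, new_w * ratio
    return int(round(new_h)), int(round(new_w))

# target_size(512, 2048, scale=512, max_size=1024) -> (256, 1024): capped by max_size.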

class mmseg.datasets.ResizeToMultiple(size_divisor=32, interpolation=None)[source]

Resize images & seg to multiple of divisor.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • img_shape

  • pad_shape

Parameters
  • size_divisor (int) – Images and gt seg maps will be resized to a multiple of size_divisor. Default: 32.

  • interpolation (str, optional) – The interpolation mode of image resize. Default: None

transform(results: dict)dict[source]

Call function to resize images, semantic segmentation map to multiple of size divisor.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Resized results, ‘img_shape’, ‘pad_shape’ keys are updated.

Return type

dict

class mmseg.datasets.STAREDataset(img_suffix='.png', seg_map_suffix='.ah.png', reduce_zero_label=False, **kwargs)[source]

STARE dataset.

In segmentation map annotation for STARE, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.ah.png’.

class mmseg.datasets.SegRescale(scale_factor=1)[source]

Rescale semantic segmentation maps.

Required Keys:

  • gt_seg_map

Modified Keys:

  • gt_seg_map

Parameters

scale_factor (float) – The scale factor of the final output.

transform(results: dict)dict[source]

Call function to scale the semantic segmentation map.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with semantic segmentation map scaled.

Return type

dict

class mmseg.datasets.SynapseDataset(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]

Synapse dataset.

Before the dataset preprocessing of Synapse, there are a total of 13 foreground categories, not including background. After preprocessing, 8 foreground categories are kept while the other 5 foreground categories are treated as background. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.iSAIDDataset(img_suffix='.png', seg_map_suffix='_instance_color_RGB.png', ignore_index=255, **kwargs)[source]

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. The segmentation map annotations for the iSAID dataset include 16 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_instance_color_RGB.png’.

transforms

class mmseg.datasets.transforms.AdjustGamma(gamma=1.0)[source]

Using gamma correction to process the image.

Required Keys:

  • img

Modified Keys:

  • img

Parameters

gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.

transform(results: dict)dict[source]

Call function to process the image with gamma correction.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Processed results.

Return type

dict

class mmseg.datasets.transforms.Albu(transforms: List[dict], keymap: Optional[dict] = None, update_pad_shape: bool = False)[source]

Albumentation augmentation. Adds custom transformations from the Albumentations library. Please visit https://albumentations.readthedocs.io for more information. An example of transforms is shown below.
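
The following configuration is an illustrative sketch only; the specific Albumentations transforms and parameters are assumptions and depend on the installed albumentations version.

albu_train_transforms = [
    dict(type='RandomBrightnessContrast', brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    dict(type='GaussNoise', p=0.2),
    dict(type='HorizontalFlip', p=0.5),
]
albu_transform = dict(
    type='Albu',
    transforms=albu_train_transforms,
    # Map the pipeline's result keys to albumentations argument names.
    keymap={'img': 'image', 'gt_seg_map': 'mask'})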

Parameters
  • transforms (list[dict]) – A list of albu transformations

  • keymap (dict) – Contains {‘input key’:’albumentation-style key’}

  • update_pad_shape (bool) – Whether to update padding shape according to the output shape of the last transform

albu_builder(cfg: dict)object[source]

Build a callable object from a dict containing albu arguments.

Parameters

cfg (dict) – Config dict. It should at least contain the key “type”.

Returns

A callable object.

Return type

Callable

static mapper(d: dict, keymap: dict)[source]

Dictionary mapper.

Renames keys according to the keymap provided.

Parameters
  • d (dict) – old dict

  • keymap (dict) – {‘old_key’: ’new_key’}

Returns

new dict.

Return type

dict

transform(results)[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input, and can add new items to the dict or modify existing items in it. The result dict is returned in the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmseg.datasets.transforms.BioMedical3DPad(pad_shape: Tuple[int, int, int], pad_val: float = 0.0, seg_pad_val: int = 0)[source]

Pad the biomedical 3d image & biomedical 3d semantic segmentation maps.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Modified Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Added Keys:

  • pad_shape (Tuple[int, int, int]): The padded shape.

Parameters
  • pad_shape (Tuple[int, int, int]) – Fixed padding size. Expected padding shape (Z, Y, X).

  • pad_val (float) – Padding value for biomedical image. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.

  • seg_pad_val (int) – Padding value for biomedical 3d semantic segmentation maps. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.

transform(results: dict)dict[source]

Call function to pad images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Updated result dict.

Return type

dict

class mmseg.datasets.transforms.BioMedical3DRandomCrop(crop_shape: Union[int, Tuple[int, int, int]], keep_foreground: bool = True)[source]

Crop the input patch for medical image & segmentation mask.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

  • gt_seg_map (np.ndarray, optional): Biomedical semantic segmentation mask

    with shape (Z, Y, X).

Modified Keys:

  • img

  • img_shape

  • gt_seg_map (optional)

Parameters
  • crop_shape (Union[int, Tuple[int, int, int]]) – Expected size after cropping with the format of (z, y, x). If set to an integer, then cropping width and height are equal to this integer.

  • keep_foreground (bool) – If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the crop bounding-box. Default to True.

crop(img: numpy.ndarray, crop_bbox: tuple)numpy.ndarray[source]

Crop from img

Parameters
  • img (np.ndarray) – Original input image.

  • crop_bbox (tuple) – Coordinates of the cropped image.

Returns

The cropped image.

Return type

np.ndarray

generate_margin(results: dict)tuple[source]

Generate margin of crop bounding-box.

If keep_foreground is True, it will randomly sample a voxel of a foreground class, take it as the center of the bounding-box, and return the margin between the bounding-box and the image. If keep_foreground is False, it will return the difference between the crop shape and the image shape.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

The margin for 3 dimensions of crop bounding-box and image.

Return type

tuple

random_generate_crop_bbox(margin_z: int, margin_y: int, margin_x: int)tuple[source]

Randomly get a crop bounding box.

Parameters
  • margin_z (int) – The sampling margin along the Z axis.

  • margin_y (int) – The sampling margin along the Y axis.

  • margin_x (int) – The sampling margin along the X axis.

Returns

Coordinates of the cropped image.

Return type

tuple

random_sample_location(seg_map: numpy.ndarray)dict[source]

Sample a foreground voxel when keep_foreground is True.

Parameters

seg_map (np.ndarray) – gt seg map.

Returns

Coordinates of selected foreground voxel.

Return type

dict

transform(results: dict)dict[source]

Transform function to randomly crop images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results; the ‘img_shape’ key in the result dict is updated according to the crop size.

Return type

dict

class mmseg.datasets.transforms.BioMedical3DRandomFlip(prob: float, axes: Tuple[int, ...], swap_label_pairs: Optional[List[Tuple[int, int]]] = None)[source]

Flip biomedical 3D images and segmentations.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/spatial_transforms.py # noqa:E501

Copyright 2021 Division of Medical Image Computing, German Cancer Research Center (DKFZ) and Applied Computer Vision Lab, Helmholtz Imaging Platform. Licensed under the Apache-2.0 License.

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Modified Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

Added Keys:

  • do_flip

  • flip_axes

Parameters
  • prob (float) – Flipping probability.

  • axes (Tuple[int, ...]) – Flipping axes with order ‘ZXY’.

  • swap_label_pairs (Optional[List[Tuple[int, int]]]) – The segmentation label pairs that are swapped when flipping.

transform(results: Dict)Dict[source]

Call function to flip and swap pair labels.

Parameters

results (dict) – Result dict.

Returns

Flipped results; ‘do_flip’ and ‘flip_axes’ keys are added into the result dict.

Return type

dict

class mmseg.datasets.transforms.BioMedicalGaussianBlur(sigma_range: Tuple[float, float] = (0.5, 1.0), prob: float = 0.2, prob_per_channel: float = 0.5, different_sigma_per_channel: bool = True, different_sigma_per_axis: bool = True)[source]

Add Gaussian blur with random sigma to image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L81 # noqa:E501

Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • sigma_range (Tuple[float, float]|float) – range to randomly select sigma value. Default to (0.5, 1.0).

  • prob (float) – Probability to apply Gaussian blur for each sample. Default to 0.2.

  • prob_per_channel (float) – Probability to apply Gaussian blur for each channel (axis N of the image). Default to 0.5.

  • different_sigma_per_channel (bool) – whether to use different sigma for each channel (axis N of the image). Default to True.

  • different_sigma_per_axis (bool) – whether to use different sigma for axis Z, X and Y of the image. Default to True.

transform(results: Dict)Dict[source]

Call function to add random Gaussian blur to image.

Parameters

results (dict) – Result dict.

Returns

Result dict with random Gaussian blur applied.

Return type

dict

class mmseg.datasets.transforms.BioMedicalGaussianNoise(prob: float = 0.1, mean: float = 0.0, std: float = 0.1)[source]

Add random Gaussian noise to image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L53 # noqa:E501

Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys:

  • img

Parameters
  • prob (float) – Probability to add Gaussian noise for each sample. Default to 0.1.

  • mean (float) – Mean or “centre” of the distribution. Default to 0.0.

  • std (float) – Standard deviation of distribution. Default to 0.1.

transform(results: Dict)Dict[source]

Call function to add random Gaussian noise to image.

Parameters

results (dict) – Result dict.

Returns

Result dict with random Gaussian noise.

Return type

dict

class mmseg.datasets.transforms.BioMedicalRandomGamma(prob: float = 0.5, gamma_range: Tuple[float] = (0.5, 2), invert_image: bool = False, per_channel: bool = False, retain_stats: bool = False)[source]

Using random gamma correction to process the biomedical image.

Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/color_transforms.py#L132 # noqa:E501 With licence: Apache 2.0

Required Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X),

    N is the number of modalities, and data type is float32.

Modified Keys: - img

Parameters
  • prob (float) – The probability to perform this transform. Default: 0.5.

  • gamma_range (Tuple[float]) – Range of gamma values. Default: (0.5, 2).

  • invert_image (bool) – Whether to invert the image before applying gamma augmentation. Default: False.

  • per_channel (bool) – Whether to perform the transform for each channel individually. Default: False.

  • retain_stats (bool) – Gamma transformation will alter the mean and std of the data in the patch. If retain_stats=True, the data will be transformed to match the mean and standard deviation before gamma augmentation. Default: False.

transform(results: dict)dict[source]

Call function to perform random gamma correction.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with random gamma correction performed.

Return type

dict

class mmseg.datasets.transforms.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]

Use CLAHE method to process the image.

See ZUIDERVELD,K. Contrast Limited Adaptive Histogram Equalization[J]. Graphics Gems, 1994:474-485. for more information.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • clip_limit (float) – Threshold for contrast limiting. Default: 40.0.

  • tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).

transform(results: dict)dict[source]

Call function to use the CLAHE method to process images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Processed results.

Return type

dict

class mmseg.datasets.transforms.ConcatCDInput(input_keys=('img', 'img2'))[source]

Concat images for change detection.

Required Keys:

  • img

  • img2

Parameters

input_keys (tuple) – Input image keys for change detection. Default: (‘img’, ‘img2’).

transform(results: dict)dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input, and can add new items to the dict or modify existing items in it. The result dict is returned in the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmseg.datasets.transforms.GenerateEdge(edge_width: int = 3, ignore_index: int = 255)[source]

Generate Edge for CE2P approach.

Edge will be used to calculate loss of CE2P.

Modified from https://github.com/liutinglt/CE2P/blob/master/dataset/target_generation.py # noqa:E501

Required Keys:

  • img_shape

  • gt_seg_map

Added Keys:
  • gt_edge_map (np.ndarray, uint8): The edge annotation generated from the

    seg map by extracting border between different semantics.

Parameters
  • edge_width (int) – The width of edge. Default to 3.

  • ignore_index (int) – Index that will be ignored. Default to 255.

transform(results: Dict)Dict[source]

Call function to generate edge from segmentation map.

Parameters

results (dict) – Result dict.

Returns

Result dict with edge mask.

Return type

dict
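
For a CE2P-style model, GenerateEdge is placed after annotation loading so that gt_edge_map is available for the edge loss; the pipeline below is a rough sketch with illustrative values.

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    # Adds 'gt_edge_map' derived from 'gt_seg_map'.
    dict(type='GenerateEdge', edge_width=3, ignore_index=255),
    dict(type='PackSegInputs')
]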

class mmseg.datasets.transforms.LoadAnnotations(reduce_zero_label=None, backend_args=None, imdecode_backend='pillow')[source]

Load annotations for semantic segmentation provided by dataset.

The annotation format is as follows:

{
    # Filename of semantic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}

After this module, the annotation has been changed to the format below:

{
    # in str
    'seg_fields': List
    # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}

Required Keys:

  • seg_map_path (str): Path of semantic segmentation ground truth file.

Added Keys:

  • seg_fields (List)

  • gt_seg_map (np.uint8)

Parameters
  • reduce_zero_label (bool, optional) – Whether to reduce all label values by 1. Usually used for datasets where 0 is the background label. Defaults to None.

  • imdecode_backend (str) – The image decoding backend type. The backend argument for :func:mmcv.imfrombytes. See :func:mmcv.imfrombytes for details. Defaults to ‘pillow’.

  • backend_args (dict) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
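
For example, a loading pipeline for a dataset whose original label 0 is background could be sketched as follows (dataset-specific values are assumptions).

train_pipeline = [
    dict(type='LoadImageFromFile'),
    # reduce_zero_label=True shifts all labels by -1, turning the original
    # background label 0 into the ignored value 255.
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='PackSegInputs')
]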

class mmseg.datasets.transforms.LoadBiomedicalAnnotation(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]

Load seg_map annotation provided by biomedical dataset.

The annotation format is as follows:

{
    'gt_seg_map': np.ndarray (X, Y, Z) or (Z, Y, X)
}

Required Keys:

  • seg_map_path

Added Keys:

  • gt_seg_map (np.ndarray): Biomedical seg map with shape (Z, Y, X) by

    default, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axes of the loaded data are XYZ, and when the backend is ‘numpy’ the axes are ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • to_float32 (bool) – Whether to convert the loaded seg map to a float32 numpy array. If set to False, the loaded seg map is a float64 array. Defaults to True.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See mmengine.fileio for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.LoadBiomedicalData(with_seg=False, decode_backend: str = 'numpy', to_xyz: bool = False, backend_args: Optional[dict] = None)[source]

Load a biomedical image and annotation from file.

The loading data format is as follows:

{
    'img': np.ndarray data[:-1, X, Y, Z]
    'seg_map': np.ndarray data[-1, X, Y, Z]
}

Required Keys:

  • img_path

Added Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities.

  • gt_seg_map (np.ndarray, optional): Biomedical seg map with shape

    (Z, Y, X) by default.

  • img_shape

  • ori_shape

Parameters
  • with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.

  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axes of the loaded data are XYZ, and when the backend is ‘numpy’ the axes are ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘numpy’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.LoadBiomedicalImageFromFile(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]

Load a biomedical image from file.

Required Keys:

  • img_path

Added Keys:

  • img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,

    N is the number of modalities, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.

  • img_shape

  • ori_shape

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’. By convention, when the backend is ‘nifti’ the axes of the loaded data are XYZ, and when the backend is ‘numpy’ the axes are ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.

  • to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.

  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.LoadDepthAnnotation(decode_backend: str = 'cv2', to_float32: bool = True, depth_rescale_factor: float = 1.0, backend_args: Optional[dict] = None)[source]

Load depth_map annotation provided by depth estimation dataset.

The annotation format is as follows:

{
    'gt_depth_map': np.ndarray [Y, X]
}

Required Keys:

  • seg_depth_path

Added Keys:

  • gt_depth_map (np.ndarray): Depth map with shape (Y, X) by

    default, and data type is float32 if set to_float32 = True.

  • depth_rescale_factor (float): The rescale factor of depth map, which

    can be used to recover the original value of depth map.

Parameters
  • decode_backend (str) – The data decoding backend type. Options are ‘numpy’, ‘nifti’, and ‘cv2’. Defaults to ‘cv2’.

  • to_float32 (bool) – Whether to convert the loaded depth map to a float32 numpy array. If set to False, the loaded depth map is a uint16 array. Defaults to True.

  • depth_rescale_factor (float) – Factor to rescale the depth value to limit the range. Defaults to 1.0.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See mmengine.fileio for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.

transform(results: Dict)Dict[source]

Functions to load depth map.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded depth map.

Return type

dict

class mmseg.datasets.transforms.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]

Load an image from results['img'].

Similar to LoadImageFromFile, but the image has already been loaded as an np.ndarray in results['img']. Can be used when loading images from a webcam.

Required Keys:

  • img

Modified Keys:

  • img

  • img_path

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

transform(results: dict)dict[source]

Transform function to add image meta information.

Parameters

results (dict) – Result dict with Webcam read image in results['img'].

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.LoadMultipleRSImageFromFile(to_float32: bool = True)[source]

Load two remote sensing images from file.

Required Keys:

  • img_path

  • img_path2

Modified Keys:

  • img

  • img2

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.LoadSingleRSImageFromFile(to_float32: bool = True)[source]

Load a remote sensing image from file.

Required Keys:

  • img_path

Modified Keys:

  • img

  • img_shape

  • ori_shape

Parameters

to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.

transform(results: Dict)Dict[source]

Functions to load image.

Parameters

results (dict) – Result dict from :obj:mmcv.BaseDataset.

Returns

The dict contains loaded image and meta information.

Return type

dict

class mmseg.datasets.transforms.PackSegInputs(meta_keys=('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label'))[source]

Pack the inputs data for the semantic segmentation.

The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default this includes:

  • img_path: filename of the image

  • ori_shape: original shape of the image as a tuple (h, w, c)

  • img_shape: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.

  • pad_shape: shape of padded images

  • scale_factor: a float indicating the preprocessing scale

  • flip: a boolean indicating if image flip transform was used

  • flip_direction: the flipping direction

Parameters

meta_keys (Sequence[str], optional) – Meta keys to be packed from SegDataSample and collected in data[img_metas]. Default: ('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label')

transform(results: dict)dict[source]

Method to pack the input data.

Parameters

results (dict) – Result dict from the data pipeline.

Returns

  • ‘inputs’ (obj:torch.Tensor): The forward data of models.

  • ’data_sample’ (obj:SegDataSample): The annotation info of the

    sample.

Return type

dict
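
PackSegInputs is normally the final step of a pipeline; a minimal test-time sketch is shown below (the resize scale is an arbitrary example).

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 512), keep_ratio=True),
    dict(type='LoadAnnotations'),
    dict(type='PackSegInputs')
]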

class mmseg.datasets.transforms.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[float] = (0.5, 1.5), saturation_range: Sequence[float] = (0.5, 1.5), hue_delta: int = 18)[source]

Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.

brightness(img: numpy.ndarray)numpy.ndarray[source]

Brightness distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after brightness change.

Return type

np.ndarray

contrast(img: numpy.ndarray)numpy.ndarray[source]

Contrast distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after contrast change.

Return type

np.ndarray

convert(img: numpy.ndarray, alpha: int = 1, beta: int = 0)numpy.ndarray[source]

Multiply the image by alpha and add beta, then clip the result to the valid pixel range.

Parameters
  • img (np.ndarray) – The input image.

  • alpha (int) – Image weights, change the contrast/saturation of the image. Default: 1

  • beta (int) – Image bias, change the brightness of the image. Default: 0

Returns

The transformed image.

Return type

np.ndarray

hue(img: numpy.ndarray)numpy.ndarray[source]

Hue distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after hue change.

Return type

np.ndarray

saturation(img: numpy.ndarray)numpy.ndarray[source]

Saturation distortion.

Parameters

img (np.ndarray) – The input image.

Returns

Image after saturation change.

Return type

np.ndarray

transform(results: dict)dict[source]

Transform function to perform photometric distortion on images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with images distorted.

Return type

dict

class mmseg.datasets.transforms.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]

Convert RGB image to grayscale image.

Required Keys:

  • img

Modified Keys:

  • img

  • img_shape

This transform calculates the weighted mean of the input image channels with weights and then expands the result to out_channels channels. When out_channels is None, the number of output channels is the same as the number of input channels.

Parameters
  • out_channels (int) – Expected number of output channels after transforming. Default: None.

  • weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).

transform(results: dict)dict[source]

Call function to convert RGB image to grayscale image.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with grayscale image.

Return type

dict

class mmseg.datasets.transforms.RandomCrop(crop_size: Union[int, Tuple[int, int]], cat_max_ratio: float = 1.0, ignore_index: int = 255)[source]

Random crop the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • img_shape

  • gt_seg_map

Parameters
  • crop_size (Union[int, Tuple[int, int]]) – Expected size after cropping with the format of (h, w). If set to an integer, then cropping width and height are equal to this integer.

  • cat_max_ratio (float) – The maximum ratio that single category could occupy.

  • ignore_index (int) – The label index to be ignored. Default: 255

crop(img: numpy.ndarray, crop_bbox: tuple)numpy.ndarray[source]

Crop from img

Parameters
  • img (np.ndarray) – Original input image.

  • crop_bbox (tuple) – Coordinates of the cropped image.

Returns

The cropped image.

Return type

np.ndarray

transform(results: dict)dict[source]

Transform function to randomly crop images, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Randomly cropped results; the ‘img_shape’ key in the result dict is updated according to the crop size.

Return type

dict

class mmseg.datasets.transforms.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]

CutOut operation.

Randomly drop some regions of image used in Cutout.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • prob (float) – cutout probability.

  • n_holes (int | tuple[int, int]) – Number of regions to be dropped. If it is given as a tuple, the number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].

  • cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.

  • cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.

  • fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).

  • seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.

transform(results: dict)dict[source]

Call function to drop some regions of image.

class mmseg.datasets.transforms.RandomDepthMix(prob: float = 0.25, mix_scale_ratio: float = 0.75)[source]

This class implements the RandomDepthMix transform.

Parameters
  • prob (float) – Probability of applying the transformation. Defaults to 0.25.

  • mix_scale_ratio (float) – Ratio to scale the mix width. Defaults to 0.75.

transform(results: dict)dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input, and can add new items to the dict or modify existing items in it. The result dict is returned in the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmseg.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None)[source]

Flip the image & bbox & segmentation map. Added or Updated keys: flip, flip_direction, img, gt_bboxes, gt_seg_map, and gt_depth_map. There are 3 flip modes:

  • prob is float, direction is string: the image will be flipped along direction with probability prob. E.g., prob=0.5, direction='horizontal', then the image will be horizontally flipped with probability of 0.5.

  • prob is float, direction is list of string: the image will be flipped along direction[i] with probability prob/len(direction). E.g., prob=0.5, direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability of 0.25, vertically with probability of 0.25.

  • prob is list of float, direction is list of string: given len(prob) == len(direction), the image will be flipped along direction[i] with probability prob[i]. E.g., prob=[0.3, 0.5], direction=['horizontal', 'vertical'], then the image will be horizontally flipped with probability of 0.3, vertically with probability of 0.5.

Required Keys:

  • img

  • gt_bboxes (optional)

  • gt_seg_map (optional)

  • gt_depth_map (optional)

Modified Keys:

  • img

  • gt_bboxes (optional)

  • gt_seg_map (optional)

  • gt_depth_map (optional)

Added Keys:

  • flip

  • flip_direction

  • swap_seg_labels (optional)

Parameters
  • prob (float | list[float], optional) – The flipping probability. Defaults to None.

  • direction (str | list[str]) – The flipping direction. Options are ‘horizontal’, ‘vertical’ and ‘diagonal’. If the input is a list, its length must equal that of prob. Each element in prob indicates the flip probability of the corresponding direction. Defaults to ‘horizontal’.

  • swap_seg_labels (list, optional) – The label pairs that need to be swapped in the ground truth, e.g. ‘left arm’ and ‘right arm’ need to be swapped after horizontal flipping. For example, [(1, 5)], where 1/5 is the label of the left/right arm. Defaults to None.
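
The three flip modes above map directly to the following configuration sketches (probability values are arbitrary examples).

# Mode 1: a single direction with probability 0.5.
flip_single = dict(type='RandomFlip', prob=0.5, direction='horizontal')
# Mode 2: one probability shared across several directions (0.25 each here).
flip_shared = dict(type='RandomFlip', prob=0.5, direction=['horizontal', 'vertical'])
# Mode 3: one probability per direction.
flip_per_dir = dict(type='RandomFlip', prob=[0.3, 0.5], direction=['horizontal', 'vertical'])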

class mmseg.datasets.transforms.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]

Mosaic augmentation. Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:
    1. Choose the mosaic center as the intersections of 4 images
    2. Get the left top image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. Sub image will be cropped if image is larger than mosaic patch

Required Keys:

  • img

  • gt_seg_map

  • mix_results

Modified Keys:

  • img

  • img_shape

  • ori_shape

  • gt_seg_map

Parameters
  • prob (float) – mosaic probability.

  • img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).

  • pad_val (int) – Pad value. Default: 0.

  • seg_pad_val (int) – Pad value of segmentation map. Default: 255.

get_indices(dataset: mmseg.datasets.dataset_wrappers.MultiImageMixDataset)list[source]

Call function to collect indices.

Parameters

dataset (MultiImageMixDataset) – The dataset.

Returns

indices.

Return type

list

transform(results: dict)dict[source]

Call function to make a mosaic of image.

Parameters

results (dict) – Result dict.

Returns

Result dict with mosaic transformed.

Return type

dict

class mmseg.datasets.transforms.RandomRotFlip(rotate_prob=0.5, flip_prob=0.5, degree=(- 20, 20))[source]

Rotate and flip the image & seg or just rotate the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • rotate_prob (float) – The probability of rotating the image.

  • flip_prob (float) – The probability of rotating & flipping the image.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

transform(results: dict)dict[source]

Call function to rotate or rotate & flip image, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Rotated or rotated & flipped results.

Return type

dict

class mmseg.datasets.transforms.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]

Rotate the image & seg.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • gt_seg_map

Parameters
  • prob (float) – The rotation probability.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

  • pad_val (float, optional) – Padding value of image. Default: 0.

  • seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.

  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.

  • auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False

transform(results: dict)dict[source]

Call function to rotate image, semantic segmentation maps.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Rotated results.

Return type

dict

class mmseg.datasets.transforms.Rerange(min_value=0, max_value=255)[source]

Rerange the image pixel value.

Required Keys:

  • img

Modified Keys:

  • img

Parameters
  • min_value (float or int) – Minimum value of the reranged image. Default: 0.

  • max_value (float or int) – Maximum value of the reranged image. Default: 255.

transform(results: dict)dict[source]

Call function to rerange images.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Reranged results.

Return type

dict

class mmseg.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear')[source]

Resize images & seg & depth map.

This transform resizes the input image according to scale or scale_factor. The seg map, depth map and other related annotations are then resized with the same scale factor. If scale and scale_factor are both set, scale will be used for resizing.

Required Keys:

  • img

  • gt_seg_map (optional)

  • gt_depth_map (optional)

Modified Keys:

  • img

  • gt_seg_map

  • gt_depth_map

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

Parameters
  • scale (int or tuple) – Images scales for resizing. Defaults to None

  • scale_factor (float or tuple[float]) – Scale factors for resizing. Defaults to None.

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.

  • clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.

  • backend (str) – Image resize backend, choices are ‘cv2’ and ‘pillow’. These two backends generate slightly different results. Defaults to ‘cv2’.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.
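
Two common ways to configure Resize, as a rough sketch (the sizes are arbitrary examples): a fixed target scale with the aspect ratio preserved, or a pure scale factor.

# Resize so the image fits within (2048, 1024) while keeping the aspect ratio.
resize_by_scale = dict(type='Resize', scale=(2048, 1024), keep_ratio=True)
# Halve both image dimensions.
resize_by_factor = dict(type='Resize', scale_factor=0.5, keep_ratio=True)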

class mmseg.datasets.transforms.ResizeShortestEdge(scale: Union[int, Tuple[int, int]], max_size: int)[source]

Resize the image and mask while keeping the aspect ratio unchanged.

Modified from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py#L130 # noqa:E501 Copyright (c) Facebook, Inc. and its affiliates. Licensed under the Apache-2.0 License

This transform attempts to scale the shorter edge to the given scale, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.

Required Keys:

  • img

  • gt_seg_map (optional)

Modified Keys:

  • img

  • img_shape

  • gt_seg_map (optional)

Added Keys:

  • scale

  • scale_factor

  • keep_ratio

Parameters
  • scale (Union[int, Tuple[int, int]]) – The target short edge length. If it’s tuple, will select the min value as the short edge length.

  • max_size (int) – The maximum allowed longest edge length.

transform(results: Dict)Dict[source]

The transform function. All subclasses of BaseTransform should override this method.

This function takes the result dict as input, and can add new items to the dict or modify existing items in it. The result dict is returned in the end, which allows multiple transforms to be concatenated into a pipeline.

Parameters

results (dict) – The result dict.

Returns

The result dict.

Return type

dict

class mmseg.datasets.transforms.ResizeToMultiple(size_divisor=32, interpolation=None)[source]

Resize images & seg to multiple of divisor.

Required Keys:

  • img

  • gt_seg_map

Modified Keys:

  • img

  • img_shape

  • pad_shape

Parameters
  • size_divisor (int) – Images and gt seg maps will be resized to a multiple of size_divisor. Default: 32.

  • interpolation (str, optional) – The interpolation mode of image resize. Default: None

transform(results: dict)dict[source]

Call function to resize images, semantic segmentation map to multiple of size divisor.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Resized results, ‘img_shape’, ‘pad_shape’ keys are updated.

Return type

dict

class mmseg.datasets.transforms.SegRescale(scale_factor=1)[source]

Rescale semantic segmentation maps.

Required Keys:

  • gt_seg_map

Modified Keys:

  • gt_seg_map

Parameters

scale_factor (float) – The scale factor of the final output.

transform(results: dict)dict[source]

Call function to scale the semantic segmentation map.

Parameters

results (dict) – Result dict from loading pipeline.

Returns

Result dict with semantic segmentation map scaled.

Return type

dict

mmseg.engine

hooks

class mmseg.engine.hooks.SegVisualizationHook(draw: bool = False, interval: int = 50, show: bool = False, wait_time: float = 0.0, backend_args: Optional[dict] = None)[source]

Segmentation Visualization Hook. Used to visualize prediction results during validation and testing.

In the testing phase:

  1. If show is True, only the prediction results are visualized without storing data, so vis_backends needs to be excluded.

Parameters
  • draw (bool) – whether to draw prediction results. If it is False, it means that no drawing will be done. Defaults to False.

  • interval (int) – The interval of visualization. Defaults to 50.

  • show (bool) – Whether to display the drawn image. Default to False.

  • wait_time (float) – The interval of show (s). Defaults to 0.

  • backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
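
In a config file the hook is usually registered via default_hooks; the sketch below assumes the ‘visualization’ hook key commonly used in MMSegmentation configs.

default_hooks = dict(
    # Draw every 50th validation/test sample; a visualizer backend must be
    # configured separately for the drawn results to be stored.
    visualization=dict(type='SegVisualizationHook', draw=True, interval=50))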

optimizers

class mmseg.engine.optimizers.ForceDefaultOptimWrapperConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]

Default constructor with forced optimizer settings.

This constructor extends the default constructor to add an option for forcing default optimizer settings. This is useful for ensuring that certain parameters or layers strictly adhere to pre-defined default settings, regardless of any custom settings specified.

By default, each parameter shares the same optimizer settings, and we provide an argument paramwise_cfg to specify parameter-wise settings. It is a dict and may contain various fields like ‘custom_keys’, ‘bias_lr_mult’, etc., as well as the additional field force_default_settings which allows for enforcing default settings on optimizer parameters.

  • custom_keys (dict): Specified parameters-wise settings by keys. If one of the keys in custom_keys is a substring of the name of one parameter, then the setting of the parameter will be specified by custom_keys[key] and other setting like bias_lr_mult etc. will be ignored. It should be noted that the aforementioned key is the longest key that is a substring of the name of the parameter. If there are multiple matched keys with the same length, then the key with lower alphabet order will be chosen. custom_keys[key] should be a dict and may contain fields lr_mult and decay_mult. See Example 2 below.

  • bias_lr_mult (float): It will be multiplied to the learning rate for all bias parameters (except for those in normalization layers and offset layers of DCN).

  • bias_decay_mult (float): It will be multiplied to the weight decay for all bias parameters (except for those in normalization layers, depthwise conv layers, offset layers of DCN).

  • norm_decay_mult (float): It will be multiplied to the weight decay for all weight and bias parameters of normalization layers.

  • flat_decay_mult (float): It will be multiplied to the weight decay for all one-dimensional parameters

  • dwconv_decay_mult (float): It will be multiplied to the weight decay for all weight and bias parameters of depthwise conv layers.

  • dcn_offset_lr_mult (float): It will be multiplied to the learning rate for parameters of offset layer in the deformable convs of a model.

  • bypass_duplicate (bool): If true, the duplicate parameters would not be added into optimizer. Defaults to False.

  • force_default_settings (bool): If true, this will override any custom settings defined by custom_keys and enforce the use of default settings for optimizer parameters like bias_lr_mult. This is particularly useful when you want to ensure that certain layers or parameters adhere strictly to the pre-defined default settings.

Note

1. If the option dcn_offset_lr_mult is used, the constructor will override the effect of bias_lr_mult in the bias of offset layer. So be careful when using both bias_lr_mult and dcn_offset_lr_mult. If you wish to apply both of them to the offset layer in deformable convs, set dcn_offset_lr_mult to the original dcn_offset_lr_mult * bias_lr_mult.

2. If the option dcn_offset_lr_mult is used, the constructor will apply it to all the DCN layers in the model. So be careful when the model contains multiple DCN layers in places other than backbone.

3. When the option force_default_settings is true, it will override any custom settings provided in custom_keys. This ensures that the default settings for the optimizer parameters are used.

Parameters
  • optim_wrapper_cfg (dict) –

    The config dict of the optimizer wrapper.

    Required fields of optim_wrapper_cfg are

    • type: class name of the OptimizerWrapper

    • optimizer: The configuration of optimizer.

    Optional fields of optim_wrapper_cfg are

    • any arguments of the corresponding optimizer wrapper type, e.g., accumulative_counts, clip_grad, etc.

    Required fields of optimizer are

    • type: class name of the optimizer.

    Optional fields of optimizer are

    • any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.

  • paramwise_cfg (dict, optional) – Parameter-wise options.

Example 1:
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optim_wrapper_cfg = dict(
>>>     type='OptimWrapper', optimizer=dict(type='SGD', lr=0.01,
>>>         momentum=0.9, weight_decay=0.0001))
>>> paramwise_cfg = dict(norm_decay_mult=0.)
>>> optim_wrapper_builder = DefaultOptimWrapperConstructor(
>>>     optim_wrapper_cfg, paramwise_cfg)
>>> optim_wrapper = optim_wrapper_builder(model)
Example 2:
>>> # assume model have attribute model.backbone and model.cls_head
>>> optim_wrapper_cfg = dict(type='OptimWrapper', optimizer=dict(
>>>     type='SGD', lr=0.01, weight_decay=0.95))
>>> paramwise_cfg = dict(custom_keys={
>>>     'backbone': dict(lr_mult=0.1, decay_mult=0.9)})
>>> optim_wrapper_builder = DefaultOptimWrapperConstructor(
>>>     optim_wrapper_cfg, paramwise_cfg)
>>> optim_wrapper = optim_wrapper_builder(model)
>>> # Then the `lr` and `weight_decay` for model.backbone is
>>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for
>>> # model.cls_head is (0.01, 0.95).
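
As a hedged sketch of the extra field this class adds (all values are illustrative): with force_default_settings=True, the custom_keys entry below is overridden and the default multipliers such as bias_lr_mult are applied instead.

>>> import torch
>>> model = torch.nn.Conv2d(3, 8, 3)
>>> optim_wrapper_cfg = dict(
>>>     type='OptimWrapper',
>>>     optimizer=dict(type='AdamW', lr=0.0001, weight_decay=0.05))
>>> paramwise_cfg = dict(
>>>     custom_keys={'backbone': dict(lr_mult=0.1)},
>>>     bias_lr_mult=2.0,
>>>     norm_decay_mult=0.0,
>>>     force_default_settings=True)
>>> optim_wrapper_builder = ForceDefaultOptimWrapperConstructor(
>>>     optim_wrapper_cfg, paramwise_cfg)
>>> optim_wrapper = optim_wrapper_builder(model)
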
add_params(params: List[dict], module: torch.nn.modules.module.Module, prefix: str = '', is_dcn_module: Optional[Union[int, float]] = None)None[source]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

Parameters
  • params (list[dict]) – A list of param groups, it will be modified in place.

  • module (nn.Module) – The module to be added.

  • prefix (str) – The prefix of the module

  • is_dcn_module (int|float|None) – If the current module is a submodule of DCN, is_dcn_module will be passed to control conv_offset layer’s learning rate. Defaults to None.

class mmseg.engine.optimizers.LayerDecayOptimizerConstructor(optim_wrapper_cfg, paramwise_cfg)[source]

Different learning rates are set for different layers of backbone.

Note: Currently, this optimizer constructor is built for BEiT, and it will be deprecated. Please use LearningRateDecayOptimizerConstructor instead.

class mmseg.engine.optimizers.LearningRateDecayOptimizerConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]

Different learning rates are set for different layers of backbone.

Note: Currently, this optimizer constructor is built for ConvNeXt, BEiT and MAE.

add_params(params, module, **kwargs)[source]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

Parameters
  • params (list[dict]) – A list of param groups, it will be modified in place.

  • module (nn.Module) – The module to be added.
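
A hedged config sketch of how this constructor is typically wired into an optim_wrapper; the paramwise_cfg field names (decay_rate, decay_type, num_layers) follow the ConvNeXt-style configs and the values are illustrative:

>>> optim_wrapper = dict(
>>>     type='OptimWrapper',
>>>     optimizer=dict(type='AdamW', lr=0.0001, betas=(0.9, 0.999),
>>>                    weight_decay=0.05),
>>>     constructor='LearningRateDecayOptimizerConstructor',
>>>     paramwise_cfg=dict(
>>>         decay_rate=0.9, decay_type='stage_wise', num_layers=12))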

mmseg.evaluation

metrics

class mmseg.evaluation.metrics.CityscapesMetric(output_dir: str, ignore_index: int = 255, format_only: bool = False, keep_results: bool = False, collect_device: str = 'cpu', prefix: Optional[str] = None, **kwargs)[source]

Cityscapes evaluation metric.

Parameters
  • output_dir (str) – The directory for output predictions.

  • ignore_index (int) – Index that will be ignored in evaluation. Default: 255.

  • format_only (bool) – Only format the results for submission without performing evaluation. It is useful when you want to format the results into a specific format and submit them to the test server. Defaults to False.

  • keep_results (bool) – Whether to keep the results. When format_only is True, keep_results must be True. Defaults to False.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – Testing results of the dataset.

Returns

Cityscapes evaluation results.

Return type

Dict[str, float]

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data and data_samples.

The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of outputs from the model.
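
A hedged sketch of a test_evaluator config used to format predictions for the Cityscapes test server (output_dir is an illustrative path):

>>> test_evaluator = dict(
>>>     type='CityscapesMetric',
>>>     output_dir='work_dirs/cityscapes_submit',
>>>     format_only=True,
>>>     keep_results=True)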

class mmseg.evaluation.metrics.DepthMetric(depth_metrics: Optional[List[str]] = None, min_depth_eval: float = 0.0, max_depth_eval: float = inf, crop_type: Optional[str] = None, depth_scale_factor: float = 1.0, collect_device: str = 'cpu', output_dir: Optional[str] = None, format_only: bool = False, prefix: Optional[str] = None, **kwargs)[source]

Depth estimation evaluation metric.

Parameters
  • depth_metrics (List[str], optional) – List of metrics to compute. If not specified, defaults to all metrics in self.METRICS.

  • min_depth_eval (float) – Minimum depth value for evaluation. Defaults to 0.0.

  • max_depth_eval (float) – Maximum depth value for evaluation. Defaults to infinity.

  • crop_type (str, optional) – Specifies the type of cropping to be used during evaluation. This option can affect how the evaluation mask is generated. Currently, ‘nyu_crop’ is supported, but other types can be added in future. Defaults to None if no cropping should be applied.

  • depth_scale_factor (float) – Factor to scale the depth values. Defaults to 1.0.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • output_dir (str) – The directory for output prediction. Defaults to None.

  • format_only (bool) – Only format the results for submission without performing evaluation. It is useful when you want to save the results in a specific format and submit them to the test server. Defaults to False.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. The keys are identical to self.metrics.

Return type

Dict[str, float]

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data and data_samples.

The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of outputs from the model.
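
A hedged sketch of an evaluator config for depth estimation; the depth range and crop type follow the NYU-style setting mentioned above and the values are illustrative:

>>> val_evaluator = dict(
>>>     type='DepthMetric',
>>>     min_depth_eval=0.001,
>>>     max_depth_eval=10.0,
>>>     crop_type='nyu_crop')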

class mmseg.evaluation.metrics.IoUMetric(ignore_index: int = 255, iou_metrics: List[str] = ['mIoU'], nan_to_num: Optional[int] = None, beta: int = 1, collect_device: str = 'cpu', output_dir: Optional[str] = None, format_only: bool = False, prefix: Optional[str] = None, **kwargs)[source]

IoU evaluation metric.

Parameters
  • ignore_index (int) – Index that will be ignored in evaluation. Default: 255.

  • iou_metrics (list[str] | str) – Metrics to be calculated, the options includes ‘mIoU’, ‘mDice’ and ‘mFscore’.

  • nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

  • beta (int) – Determines the weight of recall in the combined score. Default: 1.

  • collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.

  • output_dir (str) – The directory for output prediction. Defaults to None.

  • format_only (bool) – Only format the results for submission without performing evaluation. It is useful when you want to save the results in a specific format and submit them to the test server. Defaults to False.

  • prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.

compute_metrics(results: list)Dict[str, float][source]

Compute the metrics from processed results.

Parameters

results (list) – The processed results of each batch.

Returns

The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. The keys mainly include aAcc, mIoU, mAcc, mDice, mFscore, mPrecision and mRecall.

Return type

Dict[str, float]

static intersect_and_union(pred_label: torch._VariableFunctionsClass.tensor, label: torch._VariableFunctionsClass.tensor, num_classes: int, ignore_index: int)[source]

Calculate Intersection and Union.

Parameters
  • pred_label (torch.tensor) – Prediction segmentation map or predict result filename. The shape is (H, W).

  • label (torch.tensor) – Ground truth segmentation map or label filename. The shape is (H, W).

  • num_classes (int) – Number of categories.

  • ignore_index (int) – Index that will be ignored in evaluation.

Returns

torch.Tensor: The intersection of prediction and ground truth histogram on all classes.

torch.Tensor: The union of prediction and ground truth histogram on all classes.

torch.Tensor: The prediction histogram on all classes.

torch.Tensor: The ground truth histogram on all classes.

Return type

torch.Tensor
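
A minimal sketch of calling this static method directly on dummy tensors (shapes, number of classes and ignore index are illustrative):

>>> import torch
>>> pred = torch.randint(0, 19, (512, 512))
>>> gt = torch.randint(0, 19, (512, 512))
>>> out = IoUMetric.intersect_and_union(pred, gt, num_classes=19,
>>>                                      ignore_index=255)
>>> # out is (area_intersect, area_union, area_pred_label, area_label),
>>> # each expected to be a per-class histogram of shape (num_classes, ).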

process(data_batch: dict, data_samples: Sequence[dict])None[source]

Process one batch of data and data_samples.

The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.

Parameters
  • data_batch (dict) – A batch of data from the dataloader.

  • data_samples (Sequence[dict]) – A batch of outputs from the model.

static total_area_to_metrics(total_area_intersect: numpy.ndarray, total_area_union: numpy.ndarray, total_area_pred_label: numpy.ndarray, total_area_label: numpy.ndarray, metrics: List[str] = ['mIoU'], nan_to_num: Optional[int] = None, beta: int = 1)[source]

Calculate evaluation metrics.

Parameters
  • total_area_intersect (np.ndarray) – The intersection of prediction and ground truth histogram on all classes.

  • total_area_union (np.ndarray) – The union of prediction and ground truth histogram on all classes.

  • total_area_pred_label (np.ndarray) – The prediction histogram on all classes.

  • total_area_label (np.ndarray) – The ground truth histogram on all classes.

  • metrics (List[str] | str) – Metrics to be evaluated, ‘mIoU’ and ‘mDice’.

  • nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

  • beta (int) – Determines the weight of recall in the combined score. Default: 1.

Returns

Per-category evaluation metrics, with shape (num_classes, ).

Return type

Dict[str, np.ndarray]
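
A minimal sketch of using this metric as the evaluator in a config (the metric list is illustrative):

>>> val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mFscore'])
>>> test_evaluator = val_evaluator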

mmseg.models

backbones

class mmseg.models.backbones.BEiT(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, qv_bias=True, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]

BERT Pre-Training of Image Transformers.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 768.

  • num_layers (int) – Depth of transformer. Default: 12.

  • num_heads (int) – Number of attention heads. Default: 12.

  • mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • qv_bias (bool) – Enable bias for qv if True. Default: True.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – Stochastic depth rate. Default 0.0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_values (float) – Initialize the values of BEiTAttention and FFN with learnable scaling.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

resize_rel_pos_embed(checkpoint)[source]

Resize relative pos_embed weights.

This function is modified from https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_custom/checkpoint.py (Copyright (c) Microsoft Corporation, licensed under the MIT License).

Parameters

checkpoint (dict) – Key and value of the pretrained model.

Returns

The relative pos_embed weights interpolated from the pre-trained model to the current model size.

Return type

state_dict (dict)

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module
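
A small usage sketch with randomly initialized weights; the commented shape is an expectation under the default 16x16 patches and 768 embedding dimensions, not a guaranteed output:

>>> import torch
>>> from mmseg.models.backbones import BEiT
>>> self = BEiT(img_size=224, patch_size=16)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> # with the default out_indices=-1 a single feature map is returned,
>>> # expected to have shape (1, 768, 14, 14) for this input.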

class mmseg.models.backbones.BiSeNetV1(backbone_cfg, in_channels=3, spatial_channels=(64, 64, 64, 128), context_channels=(128, 256, 512), out_indices=(0, 1, 2), align_corners=False, out_channels=256, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

BiSeNetV1 backbone.

This backbone is the implementation of BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation.

Parameters
  • backbone_cfg (dict) – Config of backbone of Context Path.

  • in_channels (int) – The number of channels of input image. Default: 3.

  • spatial_channels (Tuple[int]) – Size of channel numbers of various layers in Spatial Path. Default: (64, 64, 64, 128).

  • context_channels (Tuple[int]) – Size of channel numbers of various modules in Context Path. Default: (128, 256, 512).

  • out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2).

  • align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.

  • out_channels (int) – The number of channels of output. It must be the same as the in_channels of decode_head. Default: 256.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.BiSeNetV2(in_channels=3, detail_channels=(64, 64, 128), semantic_channels=(16, 32, 64, 128), semantic_expansion_ratio=6, bga_channels=128, out_indices=(0, 1, 2, 3, 4), align_corners=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

BiSeNetV2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation.

This backbone is the implementation of BiSeNetV2.

Parameters
  • in_channels (int) – Number of channel of input image. Default: 3.

  • detail_channels (Tuple[int], optional) – Channels of each stage in Detail Branch. Default: (64, 64, 128).

  • semantic_channels (Tuple[int], optional) – Channels of each stage in Semantic Branch. Default: (16, 32, 64, 128). See Table 1 and Figure 3 of paper for more details.

  • semantic_expansion_ratio (int, optional) – The expansion factor expanding channel number of middle channels in Semantic Branch. Default: 6.

  • bga_channels (int, optional) – Number of middle channels in Bilateral Guided Aggregation Layer. Default: 128.

  • out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2, 3, 4).

  • align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.

  • conv_cfg (dict | None) – Config of conv layers. Default: None.

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’).

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.CGNet(in_channels=3, num_channels=(32, 64, 128), num_blocks=(3, 21), dilations=(2, 4), reductions=(8, 16), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'PReLU'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

CGNet backbone.

This backbone is the implementation of A Light-weight Context Guided Network for Semantic Segmentation.

Parameters
  • in_channels (int) – Number of input image channels. Normally 3.

  • num_channels (tuple[int]) – Numbers of feature channels at each stages. Default: (32, 64, 128).

  • num_blocks (tuple[int]) – Numbers of CG blocks at stage 1 and stage 2. Default: (3, 21).

  • dilations (tuple[int]) – Dilation rate for surrounding context extractors at stage 1 and stage 2. Default: (2, 4).

  • reductions (tuple[int]) – Reductions for global context extractors at stage 1 and stage 2. Default: (8, 16).

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’, requires_grad=True).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’PReLU’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.DDRNet(in_channels: int = 3, channels: int = 32, ppm_channels: int = 128, align_corners: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'requires_grad': True, 'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]

DDRNet backbone.

This backbone is the implementation of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. Modified from https://github.com/ydhongHIT/DDRNet.

Parameters
  • in_channels (int) – Number of input image channels. Default: 3.

  • channels (int) – The base channels of DDRNet. Default: 32.

  • ppm_channels (int) – The channels of PPM module. Default: 128.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’, requires_grad=True).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).

  • init_cfg (dict, optional) – Initialization config dict. Default: None.

forward(x)[source]

Forward function.

class mmseg.models.backbones.ERFNet(in_channels=3, enc_downsample_channels=(16, 64, 128), enc_stage_non_bottlenecks=(5, 8), enc_non_bottleneck_dilations=(2, 4, 8, 16), enc_non_bottleneck_channels=(64, 128), dec_upsample_channels=(64, 16), dec_stages_non_bottleneck=(2, 2), dec_non_bottleneck_channels=(64, 16), dropout_ratio=0.1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

ERFNet backbone.

This backbone is the implementation of ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation.

Parameters
  • in_channels (int) – The number of channels of input image. Default: 3.

  • enc_downsample_channels (Tuple[int]) – Size of channel numbers of various Downsampler block in encoder. Default: (16, 64, 128).

  • enc_stage_non_bottlenecks (Tuple[int]) – Number of stages of Non-bottleneck block in encoder. Default: (5, 8).

  • enc_non_bottleneck_dilations (Tuple[int]) – Dilation rate of each stage of Non-bottleneck block of encoder. Default: (2, 4, 8, 16).

  • enc_non_bottleneck_channels (Tuple[int]) – Size of channel numbers of various Non-bottleneck block in encoder. Default: (64, 128).

  • dec_upsample_channels (Tuple[int]) – Size of channel numbers of various Deconvolution block in decoder. Default: (64, 16).

  • dec_stages_non_bottleneck (Tuple[int]) – Number of stages of Non-bottleneck block in decoder. Default: (2, 2).

  • dec_non_bottleneck_channels (Tuple[int]) – Size of channel numbers of various Non-bottleneck block in decoder. Default: (64, 16).

  • dropout_ratio (float) – Probability of an element to be zeroed. Default 0.1.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.FastSCNN(in_channels=3, downsample_dw_channels=(32, 48), global_in_channels=64, global_block_channels=(64, 96, 128), global_block_strides=(2, 2, 1), global_out_channels=128, higher_in_channels=64, lower_in_channels=128, fusion_out_channels=128, out_indices=(0, 1, 2), conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, dw_act_cfg=None, init_cfg=None)[source]

Fast-SCNN Backbone.

This backbone is the implementation of Fast-SCNN: Fast Semantic Segmentation Network.

Parameters
  • in_channels (int) – Number of input image channels. Default: 3.

  • downsample_dw_channels (tuple[int]) – Number of output channels after the first conv layer & the second conv layer in Learning-To-Downsample (LTD) module. Default: (32, 48).

  • global_in_channels (int) – Number of input channels of Global Feature Extractor(GFE). Equal to number of output channels of LTD. Default: 64.

  • global_block_channels (tuple[int]) – Tuple of integers that describe the output channels for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (64, 96, 128).

  • global_block_strides (tuple[int]) – Tuple of integers that describe the strides (downsampling factors) for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (2, 2, 1).

  • global_out_channels (int) – Number of output channels of GFE. Default: 128.

  • higher_in_channels (int) – Number of input channels of the higher resolution branch in FFM. Equal to global_in_channels. Default: 64.

  • lower_in_channels (int) – Number of input channels of the lower resolution branch in FFM. Equal to global_out_channels. Default: 128.

  • fusion_out_channels (int) – Number of output channels of FFM. Default: 128.

  • out_indices (tuple) – Tuple of indices of list [higher_res_features, lower_res_features, fusion_output]. Often set to (0,1,2) to enable aux. heads. Default: (0, 1, 2).

  • conv_cfg (dict | None) – Config of conv layers. Default: None

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’)

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’)

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False

  • dw_act_cfg (dict) – In DepthwiseSeparableConvModule, activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, frozen_stages=- 1, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]

HRNet backbone.

This backbone is the implementation of High-Resolution Representations for Labeling Pixels and Regions.

Parameters
  • extra (dict) –

    Detailed configuration for each stage of HRNet. There must be 4 stages, the configuration for each stage must have 5 keys:

    • num_modules (int): The number of HRModule in this stage.

    • num_branches (int): The number of branches in the HRModule.

    • block (str): The type of convolution block.

    • num_blocks (tuple): The number of blocks in each branch.

      The length must be equal to num_branches.

    • num_channels (tuple): The number of channels in each branch.

      The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Normally 3.

  • conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Use BN by default.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.

  • multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmseg.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.ICNet(backbone_cfg, in_channels=3, layer_channels=(512, 2048), light_branch_middle_channels=32, psp_out_channels=512, out_channels=(64, 256, 256), pool_scales=(1, 2, 3, 6), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, init_cfg=None)[source]

ICNet for Real-Time Semantic Segmentation on High-Resolution Images.

This backbone is the implementation of ICNet.

Parameters
  • backbone_cfg (dict) – Config dict to build backbone. Usually it is ResNet but it can also be other backbones.

  • in_channels (int) – The number of input image channels. Default: 3.

  • layer_channels (Sequence[int]) – The numbers of feature channels at layer 2 and layer 4 in ResNet. It can also be other backbones. Default: (512, 2048).

  • light_branch_middle_channels (int) – The number of channels of the middle layer in light branch. Default: 32.

  • psp_out_channels (int) – The number of channels of the output of PSP module. Default: 512.

  • out_channels (Sequence[int]) – The numbers of output feature channels at each branches. Default: (64, 256, 256).

  • pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).

  • conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Dictionary to construct and config act layer. Default: dict(type=’ReLU’).

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.MAE(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]

VisionTransformer with support for patch.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – embedding dimension. Default: 768.

  • num_layers (int) – depth of transformer. Default: 12.

  • num_heads (int) – number of attention heads. Default: 12.

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – stochastic depth rate. Default 0.0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_values (float) – Initialize the values of Attention and FFN with learnable scaling. Defaults to 0.1.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

fix_init_weight()[source]

Rescale the initialization according to layer id.

This function is copied from https://github.com/microsoft/unilm/blob/master/beit/modeling_pretrain.py. # noqa: E501 Copyright (c) Microsoft Corporation Licensed under the MIT License

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmseg.models.backbones.MSCAN(in_channels=3, embed_dims=[64, 128, 256, 512], mlp_ratios=[4, 4, 4, 4], drop_rate=0.0, drop_path_rate=0.0, depths=[3, 4, 6, 3], num_stages=4, attention_kernel_sizes=[5, [1, 7], [1, 11], [1, 21]], attention_kernel_paddings=[2, [0, 3], [0, 5], [0, 10]], act_cfg={'type': 'GELU'}, norm_cfg={'requires_grad': True, 'type': 'SyncBN'}, pretrained=None, init_cfg=None)[source]

SegNeXt Multi-Scale Convolutional Attention Network (MSCAN) backbone.

This backbone is the implementation of SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. Inspiration from https://github.com/visual-attention-network/segnext.

Parameters
  • in_channels (int) – The number of input channels. Defaults: 3.

  • embed_dims (list[int]) – Embedding dimension. Defaults: [64, 128, 256, 512].

  • mlp_ratios (list[int]) – Ratio of mlp hidden dim to embedding dim. Defaults: [4, 4, 4, 4].

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.

  • depths (list[int]) – Depths of each MSCAN stage. Default: [3, 4, 6, 3].

  • num_stages (int) – MSCAN stages. Default: 4.

  • attention_kernel_sizes (list) – Size of attention kernel in Attention Module (Figure 2(b) of original paper). Defaults: [5, [1, 7], [1, 11], [1, 21]].

  • attention_kernel_paddings (list) – Size of attention paddings in Attention Module (Figure 2(b) of original paper). Defaults: [2, [0, 3], [0, 5], [0, 10]].

  • norm_cfg (dict) – Config of norm layers. Defaults: dict(type=’SyncBN’, requires_grad=True).

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Forward function.

init_weights()[source]

Initialize modules of MSCAN.

class mmseg.models.backbones.MixVisionTransformer(in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 4, 8], patch_sizes=[7, 3, 3, 3], strides=[4, 2, 2, 2], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratio=4, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, init_cfg=None, with_cp=False)[source]

The backbone of Segformer.

This backbone is the implementation of SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 64.

  • num_stages (int) – The number of stages. Default: 4.

  • num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].
  • num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 4, 8].

  • patch_sizes (Sequence[int]) – The patch_size of each overlapped patch embedding. Default: [7, 3, 3, 3].

  • strides (Sequence[int]) – The stride of each overlapped patch embedding. Default: [4, 2, 2, 2].

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].

  • out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – stochastic depth rate. Default 0.0

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.
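
A small usage sketch with the default settings; the listed strides and channels are expectations under those defaults, not guarantees:

>>> import torch
>>> from mmseg.models.backbones import MixVisionTransformer
>>> self = MixVisionTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> outputs = self.forward(inputs)
>>> # four multi-scale feature maps are expected, roughly at strides
>>> # 4, 8, 16 and 32 with 64, 128, 256 and 512 channels respectively.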

class mmseg.models.backbones.MobileNetV2(widen_factor=1.0, strides=(1, 2, 2, 2, 1, 2, 1), dilations=(1, 1, 1, 1, 1, 1, 1), out_indices=(1, 2, 4, 6), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

MobileNetV2 backbone.

This backbone is the implementation of MobileNetV2: Inverted Residuals and Linear Bottlenecks.

Parameters
  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • strides (Sequence[int], optional) – Strides of the first block of each layer. If not specified, default config in arch_setting will be used.

  • dilations (Sequence[int]) – Dilation of each layer.

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (1, 2, 4, 6).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_layer(out_channels, num_blocks, stride, dilation, expand_ratio)[source]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters
  • out_channels (int) – out_channels of block.

  • num_blocks (int) – Number of blocks.

  • stride (int) – Stride of the first block.

  • dilation (int) – Dilation of the first block.

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module
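
A small usage sketch under the default settings; the commented channels and strides are expectations for widen_factor=1.0 and out_indices=(1, 2, 4, 6), not guarantees:

>>> import torch
>>> from mmseg.models.backbones import MobileNetV2
>>> self = MobileNetV2()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> outputs = self.forward(inputs)
>>> # four feature maps are expected, roughly at strides 4, 8, 16 and 32
>>> # with 24, 32, 96 and 320 channels respectively.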

class mmseg.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(0, 1, 12), frozen_stages=- 1, reduction_factor=1, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

MobileNetV3 backbone.

This backbone is the improved implementation of Searching for MobileNetV3.

Parameters
  • arch (str) – Architecture of MobileNetV3, from {‘small’, ‘large’}. Default: ‘small’.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • out_indices (tuple[int]) – Output from which layer. Default: (0, 1, 12).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmseg.models.backbones.PCPVT(in_channels=3, embed_dims=[64, 128, 256, 512], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], norm_after_stage=False, pretrained=None, init_cfg=None)[source]

The backbone of Twins-PCPVT.

This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (list) – Embedding dimension. Default: [64, 128, 256, 512].

  • patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].

  • strides (list) – The strides. Default: [4, 2, 2, 2].

  • num_heads (int) – Number of attention heads. Default: [1, 2, 4, 8].

  • mlp_ratios (int) – Ratio of mlp hidden dim to embedding dim. Default: [4, 4, 4, 4].

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool) – Enable bias for qkv if True. Default: False.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – Stochastic depth rate. Default 0.0

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • depths (list) – Depths of each stage. Default [3, 4, 6, 3]

  • sr_ratios (list) – Kernel_size of conv in each Attn module in Transformer encoder layer. Default: [8, 4, 2, 1].

  • norm_after_stage (bool) – Add extra norm. Default False.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmseg.models.backbones.PIDNet(in_channels: int = 3, channels: int = 64, ppm_channels: int = 96, num_stem_blocks: int = 2, num_branch_blocks: int = 3, align_corners: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, **kwargs)[source]

PIDNet backbone.

This backbone is the implementation of PIDNet: A Real-time Semantic Segmentation Network Inspired from PID Controller. Modified from https://github.com/XuJiacong/PIDNet.

Licensed under the MIT License.

Parameters
  • in_channels (int) – The number of input channels. Default: 3.

  • channels (int) – The number of channels in the stem layer. Default: 64.

  • ppm_channels (int) – The number of channels in the PPM layer. Default: 96.

  • num_stem_blocks (int) – The number of blocks in the stem layer. Default: 2.

  • num_branch_blocks (int) – The number of blocks in the branch layer. Default: 3.

  • align_corners (bool) – The align_corners argument of F.interpolate. Default: False.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).

  • init_cfg (dict) – Config dict for initialization. Default: None.

forward(x: torch.Tensor)Union[torch.Tensor, Tuple[torch.Tensor]][source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (B, C, H, W).

Returns

If self.training is True, return

tuple[Tensor], else return Tensor.

Return type

Tensor or tuple[Tensor]

init_weights()[source]

Initialize the weights in backbone.

Since the D branch is not initialized by the pre-trained model, we initialize it with the same method as the ResNet.
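
A small usage sketch; as documented in forward(), a tuple of branch outputs is returned in training mode and a single tensor in eval mode (the input size is illustrative):

>>> import torch
>>> from mmseg.models.backbones import PIDNet
>>> self = PIDNet(channels=32, ppm_channels=96, num_stem_blocks=2)
>>> self.eval()
>>> x = torch.rand(1, 3, 256, 256)
>>> out = self.forward(x)   # single fused feature map in eval mode
>>> self.train()
>>> outs = self.forward(x)  # tuple of branch outputs in training mode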

class mmseg.models.backbones.ResNeSt(groups=1, base_width=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]

ResNeSt backbone.

This backbone is the implementation of ResNeSt: Split-Attention Networks.

Parameters
  • groups (int) – Number of groups of Bottleneck. Default: 1

  • base_width (int) – Base width of Bottleneck. Default: 4

  • radix (int) – Radix of SpltAtConv2d. Default: 2

  • reduction_factor (int) – Reduction factor of inter_channels in SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • kwargs (dict) – Keyword arguments for ResNet.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

class mmseg.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]

ResNeXt backbone.

This backbone is the implementation of Aggregated Residual Transformations for Deep Neural Networks.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Normally 3.

  • num_stages (int) – Resnet stages, normally 4.

  • groups (int) – Group of resnext.

  • base_width (int) – Base width of resnext.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.

Example

>>> from mmseg.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer

class mmseg.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, multi_grid=None, contract_dilation=False, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]

ResNet backbone.

This backbone is the improved implementation of Deep Residual Learning for Image Recognition.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Number of stem channels. Default: 64.

  • base_channels (int) – Number of base channels of res layer. Default: 64.

  • num_stages (int) – Resnet stages, normally 4. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.

  • deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – Dictionary to construct and config conv layer. When conv_cfg is None, cfg will be set to dict(type=’Conv2d’). Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • dcn (dict | None) – Dictionary to construct and config DCN conv layer. When dcn is not None, conv_cfg must be None. Default: None.

  • stage_with_dcn (Sequence[bool]) – Whether to set DCN conv for each stage. The length of stage_with_dcn is equal to num_stages. Default: (False, False, False, False).

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin,

    options: ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.

    • stages (tuple[bool], optional): Stages to apply plugin, length

    should be same as ‘num_stages’. Default: None.

  • multi_grid (Sequence[int]|None) – Multi grid dilation rates of last stage. Default: None.

  • contract_dilation (bool) – Whether contract first dilation of each layer Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmseg.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
forward(x)[source]

Forward function.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

make_stage_plugins(plugins, stage_idx)[source]

Make plugins for ResNet’s stage_idx-th stage.

Currently we support inserting ‘context_block’, ‘empirical_attention_block’ and ‘nonlocal_block’ into backbones like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of Bottleneck.

An example of the plugins format could be:

>>> plugins = [
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3

Suppose ‘stage_idx=0’, the structure of blocks in the stage would be:

conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2

Suppose ‘stage_idx=1’, the structure of blocks in the stage would be:

conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2

If stages is missing, the plugin would be applied to all stages.

Parameters
  • plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.

  • stage_idx (int) – Index of stage to build

Returns

Plugins for current stage

Return type

list[dict]

property norm1

the normalization layer named “norm1”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.ResNetV1c(**kwargs)[source]

ResNetV1c variant.

Compared with the default ResNet (ResNetV1b), ResNetV1c replaces the 7x7 conv in the input stem with three 3x3 convs. For more details please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks.

class mmseg.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant.

Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. In addition, in the downsampling block a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.
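
Example

An illustrative sketch (not part of the upstream docstring); ResNetV1c accepts the same arguments as ResNet, so a randomly initialized depth-50 variant should behave as below:

>>> from mmseg.models.backbones import ResNetV1c
>>> import torch
>>> self = ResNetV1c(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 16, 16)
(1, 512, 8, 8)
(1, 1024, 4, 4)
(1, 2048, 2, 2)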

class mmseg.models.backbones.STDCContextPathNet(backbone_cfg, last_in_channels=(1024, 512), out_channels=128, ffm_cfg={'in_channels': 512, 'out_channels': 256, 'scale_factor': 4}, upsample_mode='nearest', align_corners=None, norm_cfg={'type': 'BN'}, init_cfg=None)[source]

STDCNet with Context Path. The outs below is a list of three feature maps from deep to shallow, whose heights and widths go from small to big, respectively. The biggest feature map of outs is output to STDCHead, where the Detail Loss is calculated against the Detail Ground-truth. The other two feature maps are each used by an Attention Refinement Module. Besides, the biggest feature map of outs and the last output of the Attention Refinement Module are concatenated for the Feature Fusion Module. Then this fused feature map feat_fuse is output to the decode_head. For more details please refer to Figure 4 of the original paper.

Parameters
  • backbone_cfg (dict) – Config dict for stdc backbone.

  • last_in_channels (tuple(int)) – Channels of the two deepest feature maps from the STDC backbone. Default: (1024, 512).

  • out_channels (int) – The channels of output feature maps. Default: 128.

  • ffm_cfg (dict) – Config dict for Feature Fusion Module. Default: dict(in_channels=512, out_channels=256, scale_factor=4).

  • upsample_mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'.

  • align_corners (bool | None) – align_corners argument of F.interpolate. It must be None if upsample_mode is 'nearest'. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Returns

The tuple of lists of output feature maps for the auxiliary heads and the decode head.

Return type

tuple

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.STDCNet(stdc_type, in_channels, channels, bottleneck_type, norm_cfg, act_cfg, num_convs=4, with_final_conv=False, pretrained=None, init_cfg=None)[source]

This backbone is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.

Parameters
  • stdc_type (str) – The type of backbone structure. 'STDCNet1' and 'STDCNet2' denote the two main backbones in the paper, whose FLOPs are 813M and 1446M, respectively.

  • in_channels (int) – The number of input channels.

  • channels (tuple[int]) – The output channels for each stage.

  • bottleneck_type (str) – The type of STDC module, the value must be 'add' or 'cat'.

  • norm_cfg (dict) – Config dict for normalization layer.

  • act_cfg (dict) – The activation config for conv layers.

  • num_convs (int) – Number of conv layers in each STDC module. Default: 4.

  • with_final_conv (bool) – Whether to add a conv layer at the module output. Default: False.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> import torch
>>> stdc_type = 'STDCNet1'
>>> in_channels = 3
>>> channels = (32, 64, 256, 512, 1024)
>>> bottleneck_type = 'cat'
>>> inputs = torch.rand(1, 3, 1024, 2048)
>>> self = STDCNet(stdc_type, in_channels,
...                 channels, bottleneck_type).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 256, 128, 256])
outputs[1].shape = torch.Size([1, 512, 64, 128])
outputs[2].shape = torch.Size([1, 1024, 32, 64])
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.SVT(in_channels=3, embed_dims=[64, 128, 256], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4], mlp_ratios=[4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, norm_cfg={'type': 'LN'}, depths=[4, 4, 4], sr_ratios=[4, 2, 1], windiow_sizes=[7, 7, 7], norm_after_stage=True, pretrained=None, init_cfg=None)[source]

The backbone of Twins-SVT.

This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (list) – Embedding dimension of each stage. Default: [64, 128, 256].

  • patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].

  • strides (list) – The strides. Default: [4, 2, 2, 2].

  • num_heads (list) – Number of attention heads per stage. Default: [1, 2, 4].

  • mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim per stage. Default: [4, 4, 4].

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool) – Enable bias for qkv if True. Default: False.

  • drop_rate (float) – Dropout rate. Default 0.

  • attn_drop_rate (float) – Dropout ratio of attention weight. Default 0.0

  • drop_path_rate (float) – Stochastic depth rate. Default 0.2.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • depths (list) – Depths of each stage. Default: [4, 4, 4].

  • sr_ratios (list) – Kernel size of the conv in each attention module of the Transformer encoder layer. Default: [4, 2, 1].

  • windiow_sizes (list) – Window sizes of LSA. Default: [7, 7, 7].

  • input_features_slice (bool) – Whether the input features need to be sliced. Default: False.

  • norm_after_stage (bool) – Whether to add an extra norm layer after each stage. Default: True.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

class mmseg.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, pretrained=None, frozen_stages=-1, init_cfg=None)[source]

Swin Transformer backbone.

This backbone is the implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Inspiration from https://github.com/microsoft/Swin-Transformer.

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of the input image when pretraining. Defaults: 224.

  • in_channels (int) – The number of input channels. Defaults: 3.

  • embed_dims (int) – The feature dimension. Default: 96.

  • patch_size (int | tuple[int]) – Patch size. Default: 4.

  • window_size (int) – Window size. Default: 7.

  • mlp_ratio (int | float) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).

  • num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).

  • strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • patch_norm (bool) – Whether to add a norm layer for patch embedding and patch merging. Default: True.

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer at output of backbone. Defaults: dict(type=’LN’).

  • with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
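
Example

An illustrative sketch assuming the default (Swin-T style) settings above; the four output maps should have strides 4, 8, 16 and 32 with channels 96, 192, 384 and 768:

>>> from mmseg.models.backbones import SwinTransformer
>>> import torch
>>> self = SwinTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 96, 56, 56)
(1, 192, 28, 28)
(1, 384, 14, 14)
(1, 768, 7, 7)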

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

train(mode=True)[source]

Convert the model into training mode while keeping the layers frozen.

class mmseg.models.backbones.TIMMBackbone(model_name, features_only=True, pretrained=True, checkpoint_path='', in_channels=3, init_cfg=None, **kwargs)[source]

Wrapper to use backbones from the timm library. More details can be found in timm.

Parameters
  • model_name (str) – Name of timm model to instantiate.

  • pretrained (bool) – Load pretrained weights if True.

  • checkpoint_path (str) – Path of checkpoint to load after model is initialized.

  • in_channels (int) – Number of input image channels. Default: 3.

  • init_cfg (dict, optional) – Initialization config dict

  • **kwargs – Other timm & model specific arguments.
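
Example

An illustrative sketch; it assumes the timm package is installed, ‘resnet18’ is only an example model name, and pretrained=False avoids downloading weights. With features_only=True (the default), the forward pass returns a list of multi-scale feature maps:

>>> from mmseg.models.backbones import TIMMBackbone
>>> import torch
>>> self = TIMMBackbone(model_name='resnet18', pretrained=False)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))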

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.UNet(in_channels=3, base_channels=64, num_stages=5, strides=(1, 1, 1, 1, 1), enc_num_convs=(2, 2, 2, 2, 2), dec_num_convs=(2, 2, 2, 2), downsamples=(True, True, True, True), enc_dilations=(1, 1, 1, 1, 1), dec_dilations=(1, 1, 1, 1), with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, upsample_cfg={'type': 'InterpConv'}, norm_eval=False, dcn=None, plugins=None, pretrained=None, init_cfg=None)[source]

UNet backbone.

This backbone is the implementation of U-Net: Convolutional Networks for Biomedical Image Segmentation.

Parameters
  • in_channels (int) – Number of input image channels. Default: 3.

  • base_channels (int) – Number of base channels of each stage. The output channels of the first stage. Default: 64.

  • num_stages (int) – Number of stages in encoder, normally 5. Default: 5.

  • strides (Sequence[int 1 | 2]) – Strides of each stage in encoder. len(strides) is equal to num_stages. Normally the stride of the first stage in encoder is 1. If strides[i]=2, it uses stride convolution to downsample in the corresponding encoder stage. Default: (1, 1, 1, 1, 1).

  • enc_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the corresponding encoder stage. Default: (2, 2, 2, 2, 2).

  • dec_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the corresponding decoder stage. Default: (2, 2, 2, 2).

  • downsamples (Sequence[bool]) – Whether to use MaxPool to downsample the feature map after the first stage of the encoder (stages: [1, num_stages)). If the corresponding encoder stage uses stride convolution (strides[i]=2), it will never use MaxPool to downsample, even if downsamples[i-1]=True. Default: (True, True, True, True).

  • enc_dilations (Sequence[int]) – Dilation rate of each stage in encoder. Default: (1, 1, 1, 1, 1).

  • dec_dilations (Sequence[int]) – Dilation rate of each stage in decoder. Default: (1, 1, 1, 1).

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • conv_cfg (dict | None) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict | None) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict | None) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’).

  • upsample_cfg (dict) – The upsample config of the upsample module in decoder. Default: dict(type=’InterpConv’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • dcn (bool) – Use deformable convolution in convolutional layer or not. Default: None.

  • plugins (dict) – plugins for convolutional layers. Default: None.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Notice:

The input image size should be divisible by the whole downsample rate of the encoder. More detail of the whole downsample rate can be found in UNet._check_input_divisible.
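
Example

With the default settings the whole downsample rate is 2**4 = 16, so the input height and width must be multiples of 16. An illustrative sketch (shapes are printed rather than asserted, since they depend on the configuration):

>>> from mmseg.models.backbones import UNet
>>> import torch
>>> self = UNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)  # 64 is divisible by 16
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))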

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.VPD(diffusion_cfg: Union[mmengine.config.config.ConfigDict, dict], class_embed_path: str, unet_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}, gamma: float = 0.0001, class_embed_select=False, pad_shape: Optional[Union[int, List[int]]] = None, pad_val: Union[int, List[int]] = 0, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]

VPD (Visual Perception Diffusion) model.

Parameters
  • diffusion_cfg (dict) – Configuration for diffusion model.

  • class_embed_path (str) – Path for class embeddings.

  • unet_cfg (dict, optional) – Configuration for U-Net.

  • gamma (float, optional) – Gamma for text adaptation. Defaults to 1e-4.

  • class_embed_select (bool, optional) – If True, enables class embedding selection. Defaults to False.

  • pad_shape (Optional[Union[int, List[int]]], optional) – Padding shape. Defaults to None.

  • pad_val (Union[int, List[int]], optional) – Padding value. Defaults to 0.

  • init_cfg (dict, optional) – Configuration for network initialization.

forward(x)[source]

Extract features from images.

class mmseg.models.backbones.VisionTransformer(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=-1, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, with_cls_token=True, output_cls_token=False, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, interpolate_mode='bicubic', num_fcs=2, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

Vision Transformer.

This backbone is the implementation of An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – embedding dimension. Default: 768.

  • num_layers (int) – depth of transformer. Default: 12.

  • num_heads (int) – number of attention heads. Default: 12.

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • qkv_bias (bool) – enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – stochastic depth rate. Default 0.0

  • with_cls_token (bool) – Whether to concatenate the class token with the image tokens as transformer input. Default: True.

  • output_cls_token (bool) – Whether to output the cls_token. If set True, with_cls_token must be True. Default: False.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.

  • interpolate_mode (str) – Select the interpolation mode for position embedding vector resize. Default: bicubic.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
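
Example

An illustrative sketch with the default ViT-Base settings; with out_indices=-1 only the last layer is returned, reshaped to a feature map of size img_size / patch_size:

>>> from mmseg.models.backbones import VisionTransformer
>>> import torch
>>> self = VisionTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 768, 14, 14)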

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

static resize_pos_embed(pos_embed, input_shpae, pos_shape, mode)[source]

Resize pos_embed weights.

Resize pos_embed using the bicubic interpolation method.

Parameters
  • pos_embed (torch.Tensor) – Position embedding weights.

  • input_shpae (tuple) – Tuple of (downsampled input image height, downsampled input image width).

  • pos_shape (tuple) – The resolution of the downsampled original training image.

  • mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'

Returns

The resized pos_embed of shape [B, L_new, C]

Return type

torch.Tensor
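
Example

An illustrative sketch, assuming the position embedding carries a leading class token as in this backbone; resizing a 14x14 position grid to 16x16 keeps the class token, giving 1 + 16*16 = 257 tokens:

>>> import torch
>>> from mmseg.models.backbones import VisionTransformer
>>> pos_embed = torch.rand(1, 1 + 14 * 14, 768)  # [B, L, C]
>>> resized = VisionTransformer.resize_pos_embed(
...     pos_embed, (16, 16), (14, 14), 'bicubic')
>>> tuple(resized.shape)
(1, 257, 768)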

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

decode_heads

class mmseg.models.decode_heads.ANNHead(project_channels, query_scales=(1), key_pool_scales=(1, 3, 6, 8), **kwargs)[source]

Asymmetric Non-local Neural Networks for Semantic Segmentation.

This head is the implementation of ANNNet.

Parameters
  • project_channels (int) – Projection channels for Nonlocal.

  • query_scales (tuple[int]) – The scales of query feature map. Default: (1,)

  • key_pool_scales (tuple[int]) – The pooling scales of key feature map. Default: (1, 3, 6, 8).

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.APCHead(pool_scales=(1, 2, 3, 6), fusion=True, **kwargs)[source]

Adaptive Pyramid Context Network for Semantic Segmentation.

This head is the implementation of APCNet.

Parameters
  • pool_scales (tuple[int]) – Pooling scales used in Adaptive Context Module. Default: (1, 2, 3, 6).

  • fusion (bool) – Add one conv to fuse residual feature.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.ASPPHead(dilations=(1, 6, 12, 18), **kwargs)[source]

Rethinking Atrous Convolution for Semantic Image Segmentation.

This head is the implementation of DeepLabV3.

Parameters

dilations (tuple[int]) – Dilation rates for ASPP module. Default: (1, 6, 12, 18).

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.CCHead(recurrence=2, **kwargs)[source]

CCNet: Criss-Cross Attention for Semantic Segmentation.

This head is the implementation of CCNet.

Parameters

recurrence (int) – Number of recurrence of Criss Cross Attention module. Default: 2.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DAHead(pam_channels, **kwargs)[source]

Dual Attention Network for Scene Segmentation.

This head is the implementation of DANet.

Parameters

pam_channels (int) – The channels of Position Attention Module(PAM).

cam_cls_seg(feat)[source]

CAM feature classification.

forward(inputs)[source]

Forward function.

loss_by_feat(seg_logit: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs)dict[source]

Compute pam_cam, pam, cam loss.

pam_cls_seg(feat)[source]

PAM feature classification.

predict(inputs, batch_img_metas: List[dict], test_cfg, **kwargs)List[torch.Tensor][source]

Forward function for testing, only pam_cam is used.

class mmseg.models.decode_heads.DDRHead(in_channels: int, channels: int, num_classes: int, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, **kwargs)[source]

Decode head for DDRNet.

Parameters
  • in_channels (int) – Number of input channels.

  • channels (int) – Number of output channels.

  • num_classes (int) – Number of classes.

  • norm_cfg (dict, optional) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict, optional) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).

forward(inputs: Union[torch.Tensor, Tuple[torch.Tensor]])Union[torch.Tensor, Tuple[torch.Tensor]][source]

Placeholder of forward function.

init_weights()[source]

Initialize the weights.

loss_by_feat(seg_logits: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Compute segmentation loss.

Parameters
  • seg_logits (Tensor) – The output from decode head forward function.

  • batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

class mmseg.models.decode_heads.DMHead(filter_sizes=(1, 3, 5, 7), fusion=False, **kwargs)[source]

Dynamic Multi-scale Filters for Semantic Segmentation.

This head is the implementation of DMNet.

Parameters
  • filter_sizes (tuple[int]) – The size of generated convolutional filters used in Dynamic Convolutional Module. Default: (1, 3, 5, 7).

  • fusion (bool) – Add one conv to fuse DCM output feature.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DNLHead(reduction=2, use_scale=True, mode='embedded_gaussian', temperature=0.05, **kwargs)[source]

Disentangled Non-Local Neural Networks.

This head is the implementation of DNLNet.

Parameters
  • reduction (int) – Reduction factor of projection transform. Default: 2.

  • use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.

  • mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’, ‘dot_product’. Default: ‘embedded_gaussian’.

  • temperature (float) – Temperature to adjust attention. Default: 0.05

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DPTHead(embed_dims=768, post_process_channels=[96, 192, 384, 768], readout_type='ignore', patch_size=16, expand_channels=False, act_cfg={'type': 'ReLU'}, norm_cfg={'type': 'BN'}, **kwargs)[source]

Vision Transformers for Dense Prediction.

This head is the implementation of DPT.

Parameters
  • embed_dims (int) – The embed dimension of the ViT backbone. Default: 768.

  • post_process_channels (List) – Out channels of post process conv layers. Default: [96, 192, 384, 768].

  • readout_type (str) – Type of readout operation. Default: ‘ignore’.

  • patch_size (int) – The patch size. Default: 16.

  • expand_channels (bool) – Whether expand the channels in post process block. Default: False.

  • act_cfg (dict) – The activation config for residual conv unit. Default dict(type=’ReLU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.DepthwiseSeparableASPPHead(c1_in_channels, c1_channels, **kwargs)[source]

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.

This head is the implementation of DeepLabV3+.

Parameters
  • c1_in_channels (int) – The input channels of c1 decoder. If it is 0, no decoder will be used.

  • c1_channels (int) – The intermediate channels of c1 decoder.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DepthwiseSeparableFCNHead(dw_act_cfg=None, **kwargs)[source]

Depthwise-Separable Fully Convolutional Network for Semantic Segmentation.

This head is implemented according to Fast-SCNN: Fast Semantic Segmentation Network.

Parameters
  • in_channels (int) – Number of output channels of FFM.

  • channels (int) – Number of middle-stage channels in the decode head.

  • concat_input (bool) – Whether to concatenate original decode input into the result of several consecutive convolution layers. Default: True.

  • num_classes (int) – Used to determine the dimension of final prediction tensor.

  • in_index (int) – Corresponds to ‘out_indices’ in the FastSCNN backbone.

  • norm_cfg (dict | None) – Config of norm layers.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • loss_decode (dict) – Config of loss type and some relevant additional options.

  • dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.

class mmseg.models.decode_heads.EMAHead(ema_channels, num_bases, num_stages, concat_input=True, momentum=0.1, **kwargs)[source]

Expectation Maximization Attention Networks for Semantic Segmentation.

This head is the implementation of EMANet.

Parameters
  • ema_channels (int) – EMA module channels

  • num_bases (int) – Number of bases.

  • num_stages (int) – Number of the EM iterations.

  • concat_input (bool) – Whether concat the input and output of convs before classification layer. Default: True

  • momentum (float) – Momentum to update the base. Default: 0.1.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.EncHead(num_codes=32, use_se_loss=True, add_lateral=False, loss_se_decode={'loss_weight': 0.2, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, **kwargs)[source]

Context Encoding for Semantic Segmentation.

This head is the implementation of EncNet.

Parameters
  • num_codes (int) – Number of code words. Default: 32.

  • use_se_loss (bool) – Whether use Semantic Encoding Loss (SE-loss) to regularize the training. Default: True.

  • add_lateral (bool) – Whether use lateral connection to fuse features. Default: False.

  • loss_se_decode (dict) – Config of decode loss. Default: dict(type=’CrossEntropyLoss’, use_sigmoid=True).

forward(inputs)[source]

Forward function.

loss_by_feat(seg_logit: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs)dict[source]

Compute segmentation and semantic encoding loss.

predict(inputs: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict])[source]

Forward function for testing, ignore se_loss.

class mmseg.models.decode_heads.FCNHead(num_convs=2, kernel_size=3, concat_input=True, dilation=1, **kwargs)[source]

Fully Convolution Networks for Semantic Segmentation.

This head is the implementation of FCNNet.

Parameters
  • num_convs (int) – Number of convs in the head. Default: 2.

  • kernel_size (int) – The kernel size for convs in the head. Default: 3.

  • concat_input (bool) – Whether concat the input and output of convs before classification layer.

  • dilation (int) – The dilation rate for convs in the head. Default: 1.
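
Example

An illustrative sketch; the in_channels, channels and num_classes keyword arguments come from the base decode head, and the values below (ResNet-like 2048 input channels, 19 classes) are placeholders only:

>>> from mmseg.models.decode_heads import FCNHead
>>> import torch
>>> head = FCNHead(in_channels=2048, channels=512, num_classes=19,
...                num_convs=2, kernel_size=3, concat_input=True)
>>> head.eval()
>>> inputs = [torch.rand(1, 2048, 32, 32)]  # selected via in_index (default: -1)
>>> seg_logits = head(inputs)
>>> tuple(seg_logits.shape)
(1, 19, 32, 32)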

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.FPNHead(feature_strides, **kwargs)[source]

Panoptic Feature Pyramid Networks.

This head is the implementation of Semantic FPN.

Parameters

feature_strides (tuple[int]) – The strides for input feature maps (stack_lateral). All strides are supposed to be powers of 2. The first one is of the largest resolution.

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.GCHead(ratio=0.25, pooling_type='att', fusion_types=('channel_add'), **kwargs)[source]

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond.

This head is the implementation of GCNet.

Parameters
  • ratio (float) – Multiplier of channels ratio. Default: 1/4.

  • pooling_type (str) – The pooling type of context aggregation. Options are ‘att’, ‘avg’. Default: ‘att’.

  • fusion_types (tuple[str]) – The fusion type for feature fusion. Options are ‘channel_add’, ‘channel_mul’. Default: (‘channel_add’,)

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.ISAHead(isa_channels, down_factor=(8, 8), **kwargs)[source]

Interlaced Sparse Self-Attention for Semantic Segmentation.

This head is the implementation of ISA.

Parameters
  • isa_channels (int) – The channels of ISA Module.

  • down_factor (tuple[int]) – The local group size of ISA.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.IterativeDecodeHead(num_stages, kernel_generate_head, kernel_update_head, **kwargs)[source]

K-Net: Towards Unified Image Segmentation.

This head is the implementation of K-Net (https://arxiv.org/abs/2106.14855).

Parameters
  • num_stages (int) – The number of stages (kernel update heads) in IterativeDecodeHead. Default: 3.

  • kernel_generate_head (dict) – Config of the kernel generate head, which generates mask predictions, dynamic kernels and class predictions for the next kernel update heads.

  • kernel_update_head (dict) – Config of kernel update head which refine dynamic kernels and class predictions iteratively.

forward(inputs)[source]

Forward function.

loss_by_feat(seg_logits: List[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs)dict[source]

Compute segmentation loss.

Parameters
  • seg_logits (Tensor) – The output from decode head forward function.

  • batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

class mmseg.models.decode_heads.KernelUpdateHead(num_classes=150, num_ffn_fcs=2, num_heads=8, num_mask_fcs=3, feedforward_channels=2048, in_channels=256, out_channels=256, dropout=0.0, act_cfg={'inplace': True, 'type': 'ReLU'}, ffn_act_cfg={'inplace': True, 'type': 'ReLU'}, conv_kernel_size=1, feat_transform_cfg=None, kernel_init=False, with_ffn=True, feat_gather_stride=1, mask_transform_stride=1, kernel_updator_cfg={'act_cfg': {'inplace': True, 'type': 'ReLU'}, 'feat_channels': 64, 'in_channels': 256, 'norm_cfg': {'type': 'LN'}, 'out_channels': 256, 'type': 'DynamicConv'})[source]

Kernel Update Head in K-Net.

Parameters
  • num_classes (int) – Number of classes. Default: 150.

  • num_ffn_fcs (int) – The number of fully-connected layers in FFNs. Default: 2.

  • num_heads (int) – The number of parallel attention heads. Default: 8.

  • num_mask_fcs (int) – The number of fully connected layers for mask prediction. Default: 3.

  • feedforward_channels (int) – The hidden dimension of FFNs. Defaults: 2048.

  • in_channels (int) – The number of channels of input feature map. Default: 256.

  • out_channels (int) – The number of output channels. Default: 256.

  • dropout (float) – The Probability of an element to be zeroed in MultiheadAttention and FFN. Default 0.0.

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).

  • ffn_act_cfg (dict) – Config of activation layers in FFN. Default: dict(type=’ReLU’).

  • conv_kernel_size (int) – The kernel size of convolution in Kernel Update Head for dynamic kernel update. Default: 1.

  • feat_transform_cfg (dict | None) – Config of feature transform. Default: None.

  • kernel_init (bool) – Whether to initialize the mask kernel in the mask head. Default: False.

  • with_ffn (bool) – Whether to add an FFN in the kernel update head. Default: True.

  • feat_gather_stride (int) – Stride of convolution in feature transform. Default: 1.

  • mask_transform_stride (int) – Stride of mask transform. Default: 1.

  • kernel_updator_cfg (dict) – Config of kernel updator. Default: dict(type=’DynamicConv’, in_channels=256, feat_channels=64, out_channels=256, act_cfg=dict(type=’ReLU’, inplace=True), norm_cfg=dict(type=’LN’)).

forward(x, proposal_feat, mask_preds, mask_shape=None)[source]

Forward function of Dynamic Instance Interactive Head.

Parameters
  • x (Tensor) – Feature map from FPN with shape (batch_size, feature_dimensions, H , W).

  • proposal_feat (Tensor) – Intermediate feature obtained from the head in the last stage, with shape (batch_size, num_proposals, feature_dimensions).

  • mask_preds (Tensor) – mask prediction from the former stage in shape (batch_size, num_proposals, H, W).

Returns

The first tensor is predicted mask with shape (N, num_classes, H, W), the second tensor is dynamic kernel with shape (N, num_classes, channels, K, K).

Return type

Tuple

init_weights()[source]

Use Xavier initialization for all weight parameters and set the classification head bias to a specific value when using focal loss.

class mmseg.models.decode_heads.KernelUpdator(in_channels=256, feat_channels=64, out_channels=None, gate_sigmoid=True, gate_norm_act=False, activate_out=False, norm_cfg={'type': 'LN'}, act_cfg={'inplace': True, 'type': 'ReLU'})[source]

Dynamic Kernel Updator in Kernel Update Head.

Parameters
  • in_channels (int) – The number of channels of input feature map. Default: 256.

  • feat_channels (int) – The number of middle-stage channels in the kernel updator. Default: 64.

  • out_channels (int) – The number of output channels.

  • gate_sigmoid (bool) – Whether to use the sigmoid function in the gate mechanism. Default: True.

  • gate_norm_act (bool) – Whether to add normalization and activation layers in the gate mechanism. Default: False.

  • activate_out (bool) – Whether to add activation after the gate mechanism. Default: False.

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’LN’).

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).

forward(update_feature, input_feature)[source]

Forward function of KernelUpdator.

Parameters
  • update_feature (torch.Tensor) – Feature map assembled from each group. It would be reshaped with last dimension shape: self.in_channels.

  • input_feature (torch.Tensor) – Intermediate feature with shape: (N, num_classes, conv_kernel_size**2, channels).

Returns

The output tensor of shape (N*C1/C2, K*K, C2), where N is the number of classes, C1 and C2 are the feature map channels of KernelUpdateHead and KernelUpdator, respectively.

Return type

Tensor

class mmseg.models.decode_heads.LRASPPHead(branch_channels=(32, 64), **kwargs)[source]

Lite R-ASPP (LRASPP) head is proposed in Searching for MobileNetV3.

This head is the improved implementation of Searching for MobileNetV3.

Parameters

branch_channels (tuple[int]) – The number of output channels in each branch. Default: (32, 64).

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.LightHamHead(ham_channels=512, ham_kwargs={}, **kwargs)[source]

SegNeXt decode head.

This decode head is the implementation of SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. Inspiration from https://github.com/visual-attention-network/segnext.

Specifically, LightHamHead is inspired by HamNet from Is Attention Better Than Matrix Decomposition? (https://arxiv.org/abs/2109.04553).

Parameters
  • ham_channels (int) – Input channels for Hamburger. Defaults: 512.

  • ham_kwargs (dict) – Keyword arguments for the Ham module. Defaults: dict().

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.Mask2FormerHead(num_classes, align_corners=False, ignore_index=255, **kwargs)[source]

Implements the Mask2Former head.

See Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation for details.

Parameters
  • num_classes (int) – Number of classes. Default: 150.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • ignore_index (int) – The label index to be ignored. Default: 255.

loss(x: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg: Union[mmengine.config.config.ConfigDict, dict])dict[source]

Perform forward propagation and loss calculation of the decoder head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[SegDataSample]) – The Data Samples. It usually includes information such as gt_sem_seg.

  • train_cfg (ConfigType) – Training config.

Returns

a dictionary of loss components.

Return type

dict[str, Tensor]

predict(x: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict])Tuple[torch.Tensor][source]

Test without augmentation.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_img_metas (List[dict]) – List of image meta information.

  • test_cfg (ConfigType) – Test config.

Returns

A tensor of segmentation mask.

Return type

Tensor

class mmseg.models.decode_heads.MaskFormerHead(num_classes: int = 150, align_corners: bool = False, ignore_index: int = 255, **kwargs)[source]

Implements the MaskFormer head.

See Per-Pixel Classification is Not All You Need for Semantic Segmentation for details.

Parameters
  • num_classes (int) – Number of classes. Default: 150.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • ignore_index (int) – The label index to be ignored. Default: 255.

loss(x: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg: Union[mmengine.config.config.ConfigDict, dict])dict[source]

Perform forward propagation and loss calculation of the decoder head on the features of the upstream network.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_data_samples (List[SegDataSample]) – The Data Samples. It usually includes information such as gt_sem_seg.

  • train_cfg (ConfigType) – Training config.

Returns

a dictionary of loss components.

Return type

dict[str, Tensor]

predict(x: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict])Tuple[torch.Tensor][source]

Test without augmentation.

Parameters
  • x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.

  • batch_img_metas (List[dict]) – List of image meta information.

  • test_cfg (ConfigType) – Test config.

Returns

A tensor of segmentation mask.

Return type

Tensor

class mmseg.models.decode_heads.NLHead(reduction=2, use_scale=True, mode='embedded_gaussian', **kwargs)[source]

Non-local Neural Networks.

This head is the implementation of NLNet.

Parameters
  • reduction (int) – Reduction factor of projection transform. Default: 2.

  • use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.

  • mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’, ‘dot_product’. Default: ‘embedded_gaussian’.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.OCRHead(ocr_channels, scale=1, **kwargs)[source]

Object-Contextual Representations for Semantic Segmentation.

This head is the implementation of OCRNet.

Parameters
  • ocr_channels (int) – The intermediate channels of OCR block.

  • scale (int) – The scale of the probability map in SpatialGatherModule. Default: 1.

forward(inputs, prev_output)[source]

Forward function.

class mmseg.models.decode_heads.PIDHead(in_channels: int, channels: int, num_classes: int, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, **kwargs)[source]

Decode head for PIDNet.

Parameters
  • in_channels (int) – Number of input channels.

  • channels (int) – Number of output channels.

  • num_classes (int) – Number of classes.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).

forward(inputs: Union[torch.Tensor, Tuple[torch.Tensor]])Union[torch.Tensor, Tuple[torch.Tensor]][source]

Forward function.

Parameters

inputs (Tensor | tuple[Tensor]) – Input tensor or tuple of Tensors. When training, the input is a tuple of three tensors, (p_feat, i_feat, d_feat), and the output is a tuple of three tensors, (p_seg_logit, i_seg_logit, d_seg_logit). At inference time, only the head of the integral branch is used; the input is a tensor of the integral feature map and the output is the segmentation logit.

Returns

Output tensor or tuple of tensors.

Return type

Tensor | tuple[Tensor]

init_weights()[source]

Initialize the weights.

loss_by_feat(seg_logits: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Compute segmentation loss.

Parameters
  • seg_logits (Tensor) – The output from decode head forward function.

  • batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

class mmseg.models.decode_heads.PSAHead(mask_size, psa_type='bi-direction', compact=False, shrink_factor=2, normalization_factor=1.0, psa_softmax=True, **kwargs)[source]

Point-wise Spatial Attention Network for Scene Parsing.

This head is the implementation of PSANet.

Parameters
  • mask_size (tuple[int]) – The PSA mask size. It usually equals input size.

  • psa_type (str) – The type of psa module. Options are ‘collect’, ‘distribute’, ‘bi-direction’. Default: ‘bi-direction’

  • compact (bool) – Whether to use a compact map for ‘collect’ mode. Default: False.

  • shrink_factor (int) – The downsample factors of psa mask. Default: 2.

  • normalization_factor (float) – The normalization factor of attention.

  • psa_softmax (bool) – Whether to use softmax for attention.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.PSPHead(pool_scales=(1, 2, 3, 6), **kwargs)[source]

Pyramid Scene Parsing Network.

This head is the implementation of PSPNet.

Parameters

pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.PointHead(num_fcs=3, coarse_pred_each_layer=True, conv_cfg={'type': 'Conv1d'}, norm_cfg=None, act_cfg={'inplace': False, 'type': 'ReLU'}, **kwargs)[source]

A mask point head used in PointRend.

This head is the implementation of PointRend: Image Segmentation as Rendering. PointHead uses a shared multi-layer perceptron (equivalent to nn.Conv1d) to predict the logits of input points. The fine-grained feature and coarse feature will be concatenated together for prediction.

Parameters
  • num_fcs (int) – Number of fc layers in the head. Default: 3.

  • in_channels (int) – Number of input channels. Default: 256.

  • fc_channels (int) – Number of fc channels. Default: 256.

  • num_classes (int) – Number of classes for logits. Default: 80.

  • class_agnostic (bool) – Whether use class agnostic classification. If so, the output channels of logits will be 1. Default: False.

  • coarse_pred_each_layer (bool) – Whether concatenate coarse feature with the output of each fc layer. Default: True.

  • conv_cfg (dict|None) – Dictionary to construct and config conv layer. Default: dict(type=’Conv1d’))

  • norm_cfg (dict|None) – Dictionary to construct and config norm layer. Default: None.

  • loss_point (dict) – Dictionary to construct and config loss layer of point head. Default: dict(type=’CrossEntropyLoss’, use_mask=True, loss_weight=1.0).

cls_seg(feat)[source]

Classify each pixel with fc.

forward(fine_grained_point_feats, coarse_point_feats)[source]

Placeholder of forward function.

get_points_test(seg_logits, uncertainty_func, cfg)[source]

Sample points for testing.

Find num_points most uncertain points from uncertainty_map.

Parameters
  • seg_logits (Tensor) – A tensor of shape (batch_size, num_classes, height, width) for class-specific or class-agnostic prediction.

  • uncertainty_func (func) – uncertainty calculation function.

  • cfg (dict) – Testing config of point head.

Returns

  • point_indices (Tensor): A tensor of shape (batch_size, num_points) that contains indices from [0, height x width) of the most uncertain points.

  • point_coords (Tensor): A tensor of shape (batch_size, num_points, 2) that contains [0, 1] x [0, 1] normalized coordinates of the most uncertain points from the height x width grid.

Return type

tuple(point_indices, point_coords)

get_points_train(seg_logits, uncertainty_func, cfg)[source]

Sample points for training.

Sample points in [0, 1] x [0, 1] coordinate space based on their uncertainty. The uncertainties are calculated for each point using ‘uncertainty_func’ function that takes point’s logit prediction as input.

Parameters
  • seg_logits (Tensor) – Semantic segmentation logits, shape ( batch_size, num_classes, height, width).

  • uncertainty_func (func) – uncertainty calculation function.

  • cfg (dict) – Training config of point head.

Returns

A tensor of shape (batch_size, num_points, 2) that contains the coordinates of num_points sampled points.

Return type

point_coords (Tensor)

loss(inputs, prev_output, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg, **kwargs)[source]

Forward function for training.

Parameters
  • inputs (list[Tensor]) – List of multi-level img features.

  • prev_output (Tensor) – The output of previous decode head.

  • batch_data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as img_metas or gt_semantic_seg.

  • train_cfg (dict) – The training config.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

loss_by_feat(point_logits, points, batch_data_samples, **kwargs)[source]

Compute segmentation loss.

predict(inputs, prev_output, batch_img_metas: List[dict], test_cfg, **kwargs)[source]

Forward function for testing.

Parameters
  • inputs (list[Tensor]) – List of multi-level img features.

  • prev_output (Tensor) – The output of previous decode head.

  • img_metas (list[dict]) – List of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:Collect.

  • test_cfg (dict) – The testing config.

Returns

Output segmentation map.

Return type

Tensor

class mmseg.models.decode_heads.SETRMLAHead(mla_channels=128, up_scale=4, **kwargs)[source]

Multi-level feature aggregation head of SETR.

MLA head of SETR.

Parameters
  • mla_channels (int) – Channels of conv-conv-4x of multi-level feature aggregation. Default: 128.

  • up_scale (int) – The scale factor of interpolate. Default: 4.

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.SETRUPHead(norm_layer={'eps': 1e-06, 'requires_grad': True, 'type': 'LN'}, num_convs=1, up_scale=4, kernel_size=3, init_cfg=[{'type': 'Constant', 'val': 1.0, 'bias': 0, 'layer': 'LayerNorm'}, {'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}], **kwargs)[source]

Naive upsampling head and Progressive upsampling head of SETR.

Naive or PUP head of SETR.

Parameters
  • norm_layer (dict) – Config dict for input normalization. Default: norm_layer=dict(type=’LN’, eps=1e-6, requires_grad=True).

  • num_convs (int) – Number of decoder convolutions. Default: 1.

  • up_scale (int) – The scale factor of interpolate. Default:4.

  • kernel_size (int) – The kernel size of convolution when decoding feature information from backbone. Default: 3.

  • init_cfg (dict | list[dict] | None) – Initialization config dict. Default: dict(type=’Constant’, val=1.0, bias=0, layer=’LayerNorm’).

forward(x)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.STDCHead(boundary_threshold=0.1, **kwargs)[source]

This head is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.

Parameters

boundary_threshold (float) – The threshold of calculating boundary. Default: 0.1.

loss_by_feat(seg_logits: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Compute Detail Aggregation Loss.

class mmseg.models.decode_heads.SegformerHead(interpolate_mode='bilinear', **kwargs)[source]

The all-MLP head of SegFormer.

This head is the implementation of SegFormer (https://arxiv.org/abs/2105.15203).

Parameters

interpolate_mode – The interpolate mode of MLP head upsample operation. Default: ‘bilinear’.

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.SegmenterMaskTransformerHead(in_channels, num_layers, num_heads, embed_dims, mlp_ratio=4, drop_path_rate=0.1, drop_rate=0.0, attn_drop_rate=0.0, num_fcs=2, qkv_bias=True, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, init_std=0.02, **kwargs)[source]

Segmenter: Transformer for Semantic Segmentation.

This head is the implementation of Segmenter.

Parameters
  • backbone_cfg (dict) – Config of backbone of Context Path.

  • in_channels (int) – The number of channels of input image.

  • num_layers (int) – The depth of transformer.

  • num_heads (int) – The number of attention heads.

  • embed_dims (int) – The number of embedding dimension.

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • drop_path_rate (float) – stochastic depth rate. Default 0.1.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • init_std (float) – The value of std in weight initialization. Default: 0.02.

forward(inputs)[source]

Placeholder of forward function.

init_weights()[source]

Initialize the weights.

class mmseg.models.decode_heads.UPerHead(pool_scales=(1, 2, 3, 6), **kwargs)[source]

Unified Perceptual Parsing for Scene Understanding.

This head is the implementation of UPerNet.

Parameters

pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module applied on the last feature. Default: (1, 2, 3, 6).
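
Example

An illustrative sketch; UPerHead selects multiple backbone stages, so in_channels and in_index are lists (the channel numbers below mimic ResNet-50 outputs and are placeholders only):

>>> from mmseg.models.decode_heads import UPerHead
>>> import torch
>>> head = UPerHead(in_channels=[256, 512, 1024, 2048],
...                 in_index=[0, 1, 2, 3],
...                 channels=512, num_classes=19,
...                 pool_scales=(1, 2, 3, 6))
>>> head.eval()
>>> inputs = [torch.rand(1, c, 128 // s, 128 // s)
...           for c, s in zip([256, 512, 1024, 2048], [1, 2, 4, 8])]
>>> seg_logits = head(inputs)
>>> tuple(seg_logits.shape)
(1, 19, 128, 128)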

forward(inputs)[source]

Forward function.

psp_forward(inputs)[source]

Forward function of PSP module.

class mmseg.models.decode_heads.VPDDepthHead(max_depth: float = 10.0, in_channels: Sequence[int] = [320, 640, 1280, 1280], embed_dim: int = 192, feature_dim: int = 1536, num_deconv_layers: int = 3, num_deconv_filters: Sequence[int] = (32, 32, 32), fmap_border: Union[int, Sequence[int]] = 0, align_corners: bool = False, loss_decode: dict = {'type': 'SiLogLoss'}, init_cfg={'layer': ['Conv2d', 'Linear'], 'std': 0.02, 'type': 'TruncNormal'})[source]

Depth Prediction Head for VPD.

Parameters
  • max_depth (float) – Maximum depth value. Defaults to 10.0.

  • in_channels (Sequence[int]) – Number of input channels for each convolutional layer.

  • embed_dim (int) – Dimension of embedding. Defaults to 192.

  • feature_dim (int) – Dimension of aggregated feature. Defaults to 1536.

  • num_deconv_layers (int) – Number of deconvolution layers in the decoder. Defaults to 3.

  • num_deconv_filters (Sequence[int]) – Number of filters for each deconv layer. Defaults to (32, 32, 32).

  • fmap_border (Union[int, Sequence[int]]) – Feature map border for cropping. Defaults to 0.

  • align_corners (bool) – Flag for align_corners in interpolation. Defaults to False.

  • loss_decode (dict) – Configurations for the loss function. Defaults to dict(type=’SiLogLoss’).

  • init_cfg (dict) – Initialization configurations. Defaults to dict(type=’TruncNormal’, std=0.02, layer=[‘Conv2d’, ‘Linear’]).

forward(x)[source]

Placeholder of forward function.

loss_by_feat(pred_depth_map: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Compute depth estimation loss.

Parameters
  • pred_depth_map (Tensor) – The output from decode head forward function.

  • batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_depth_map.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

segmentors

class mmseg.models.segmentors.BaseSegmentor(data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]

Base class for segmentors.

Parameters

data_preprocessor – Model preprocessing config for processing the input data. It usually includes to_rgb, pad_size_divisor, pad_val, mean and std. Defaults to None.

abstract encode_decode(inputs: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])[source]

Placeholder for encode images with backbone and decode into a semantic segmentation map of the same size as input.

abstract extract_feat(inputs: torch.Tensor)bool[source]

Placeholder for extract features from images.

forward(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None, mode: str = 'tensor')Union[Dict[str, torch.Tensor], List[mmseg.structures.seg_data_sample.SegDataSample], Tuple[torch.Tensor], torch.Tensor][source]

The unified entry for a forward process in both training and test.

The method should accept three modes: “tensor”, “predict” and “loss”:

  • “tensor”: Forward the whole network and return a tensor or tuple of tensors without any post-processing, same as a common nn.Module.

  • “predict”: Forward and return the predictions, which are fully processed to a list of SegDataSample.

  • “loss”: Forward and return a dict of losses according to the given inputs and data samples.

Note that this method handles neither back propagation nor optimizer updating; those are done in train_step(). A minimal usage sketch of the three modes follows the return description below.

Parameters
  • inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.

  • data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg. Default to None.

  • mode (str) – Return what kind of value. Defaults to ‘tensor’.

Returns

The return type depends on mode.

  • If mode="tensor", return a tensor or a tuple of tensor.

  • If mode="predict", return a list of SegDataSample.

  • If mode="loss", return a dict of tensor.
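
A minimal usage sketch of the three modes, assuming model is an already-built segmentor (e.g. an EncoderDecoder), inputs is a batched image tensor and data_samples carry gt_sem_seg:

losses = model(inputs, data_samples, mode='loss')      # dict of loss tensors
results = model(inputs, data_samples, mode='predict')  # list of SegDataSample
logits = model(inputs, mode='tensor')                  # raw output, no post-processing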

abstract loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Calculate losses from a batch of inputs and data samples.

postprocess_result(seg_logits: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None)Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Convert results list to SegDataSample.

Parameters
  • seg_logits (Tensor) – The segmentation results, seg_logits from model of each input image.

  • data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg. Defaults to None.

Returns

Segmentation results of the input images. Each SegDataSample usually contain:

  • ``pred_sem_seg`` (PixelData): Prediction of semantic segmentation.

  • ``seg_logits`` (PixelData): Predicted logits of semantic segmentation before normalization.

Return type

list[SegDataSample]

abstract predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None)Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Predict results from a batch of inputs and data samples with post-processing.

property with_auxiliary_head: bool

whether the segmentor has auxiliary head

Type

bool

property with_decode_head: bool

whether the segmentor has decode head

Type

bool

property with_neck: bool

whether the segmentor has neck

Type

bool

class mmseg.models.segmentors.CascadeEncoderDecoder(num_stages: int, backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]

Cascade Encoder Decoder segmentors.

CascadeEncoderDecoder is almost the same as EncoderDecoder, except that its decoders are cascaded: the output of the previous decode_head is the input of the next decode_head.

Parameters
  • num_stages (int) – How many stages will be cascaded.

  • backbone (ConfigType) – The config for the backbone of segmentor.

  • decode_head (ConfigType) – The config for the decode head of segmentor.

  • neck (OptConfigType) – The config for the neck of segmentor. Defaults to None.

  • auxiliary_head (OptConfigType) – The config for the auxiliary head of segmentor. Defaults to None.

  • train_cfg (OptConfigType) – The config for training. Defaults to None.

  • test_cfg (OptConfigType) – The config for testing. Defaults to None.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.

  • pretrained (str, optional) – The path for pretrained model. Defaults to None.

  • init_cfg (dict, optional) – The weight initialized config for BaseModule.

encode_decode(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Encode images with backbone and decode into a semantic segmentation map of the same size as input.
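
The distinguishing configuration detail is that decode_head is a list with num_stages entries, executed in order. A hedged sketch follows; the head parameters are placeholders loosely modelled on an OCRNet-style setup, not a tuned configuration.

model = dict(
    type='CascadeEncoderDecoder',
    num_stages=2,
    backbone=dict(type='ResNetV1c', depth=50, num_stages=4,
                  out_indices=(0, 1, 2, 3)),
    decode_head=[
        dict(type='FCNHead', in_channels=1024, in_index=2, channels=256,
             num_classes=19),                        # stage 0: coarse prediction
        dict(type='OCRHead', in_channels=2048, in_index=3, channels=512,
             ocr_channels=256, num_classes=19),      # stage 1: refines stage 0
    ],
)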

class mmseg.models.segmentors.DepthEstimator(backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]

Encoder Decoder depth estimator.

EncoderDecoder typically consists of backbone, decode_head, auxiliary_head. Note that auxiliary_head is only used for deep supervision during training and can be discarded during inference.

1. The loss method is used to calculate the loss of the model, which includes two steps: (1) extract features to obtain the feature maps; (2) call the decode head loss function to forward the decode head and calculate the losses.

loss(): extract_feat() -> _decode_head_forward_train() -> _auxiliary_head_forward_train (optional)
_decode_head_forward_train(): decode_head.loss()
_auxiliary_head_forward_train(): auxiliary_head.loss (optional)

2. The predict method is used to predict depth estimation results, which includes two steps: (1) run the inference function to obtain the list of depth maps; (2) call the post-processing function to obtain a list of SegDataSample including pred_depth_map.

predict(): inference() -> postprocess_result()
inference(): whole_inference()/slide_inference()
whole_inference()/slide_inference(): encode_decode()
encode_decode(): extract_feat() -> decode_head.predict()

3. The _forward method is used to output the tensor by running the model, which includes two steps: (1) extract features to obtain the feature maps; (2) call the decode head forward function to forward the decode head.

_forward(): extract_feat() -> _decode_head.forward()
Parameters
  • backbone (ConfigType) – The config for the backbone of depth estimator.

  • decode_head (ConfigType) – The config for the decode head of depth estimator.

  • neck (OptConfigType) – The config for the neck of depth estimator. Defaults to None.

  • auxiliary_head (OptConfigType) – The config for the auxiliary head of depth estimator. Defaults to None.

  • train_cfg (OptConfigType) – The config for training. Defaults to None.

  • test_cfg (OptConfigType) – The config for testing. Defaults to None.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.

  • pretrained (str, optional) – The path for pretrained model. Defaults to None.

  • init_cfg (dict, optional) – The weight initialized config for BaseModule.

encode_decode(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Encode images with backbone and decode into a depth map of the same size as input.

extract_feat(inputs: torch.Tensor, batch_img_metas: Optional[List[dict]] = None)torch.Tensor[source]

Extract features from images.

inference(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Inference with slide/whole style.

Parameters
  • inputs (Tensor) – The input image of shape (N, 3, H, W).

  • batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, ‘pad_shape’, and ‘padding_size’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.

Returns

The depth estimation results.

Return type

Tensor

loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Calculate losses from a batch of inputs and data samples.

Parameters
  • inputs (Tensor) – Input images.

  • data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_depth_map.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

postprocess_result(depth: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None)Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Convert results list to SegDataSample.

Parameters
  • depth (Tensor) – The depth estimation results.

  • data_samples (list[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_depth_map. Defaults to None.

Returns

Depth estimation results of the input images. Each SegDataSample usually contain:

  • ``pred_depth_map``(PixelData): Prediction of depth estimation.

Return type

list[SegDataSample]

predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None)Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Predict results from a batch of inputs and data samples with post-processing.

Parameters
  • inputs (Tensor) – Inputs with shape (N, C, H, W).

  • data_samples (List[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_depth_map.

Returns

Depth estimation results of the input images. Each SegDataSample usually contain:

  • ``pred_depth_map`` (PixelData): Prediction of depth estimation.

Return type

list[SegDataSample]
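
A hedged sketch of consuming the result, assuming model is an initialized DepthEstimator and inputs / data_samples come from a test dataloader:

results = model.predict(inputs, data_samples)
depth = results[0].pred_depth_map.data   # predicted depth map, shape (1, H, W)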

slide_flip_inference(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Inference by sliding-window with overlap and flip.

If h_crop > h_img or w_crop > w_img, the small patch will be used to decode without padding.

Parameters
  • inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.

  • batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.

Returns

The depth estimation results.

Return type

Tensor

class mmseg.models.segmentors.EncoderDecoder(backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]

Encoder Decoder segmentors.

EncoderDecoder typically consists of backbone, decode_head, auxiliary_head. Note that auxiliary_head is only used for deep supervision during training and can be discarded during inference.

1. The loss method is used to calculate the loss of the model, which includes two steps: (1) extract features to obtain the feature maps; (2) call the decode head loss function to forward the decode head and calculate the losses.

loss(): extract_feat() -> _decode_head_forward_train() -> _auxiliary_head_forward_train (optional)
_decode_head_forward_train(): decode_head.loss()
_auxiliary_head_forward_train(): auxiliary_head.loss (optional)

2. The predict method is used to predict segmentation results, which includes two steps: (1) run the inference function to obtain the list of seg_logits; (2) call the post-processing function to obtain a list of SegDataSample including pred_sem_seg and seg_logits.

predict(): inference() -> postprocess_result()
inference(): whole_inference()/slide_inference()
whole_inference()/slide_inference(): encode_decode()
encode_decode(): extract_feat() -> decode_head.predict()

3. The _forward method is used to output the tensor by running the model, which includes two steps: (1) extract features to obtain the feature maps; (2) call the decode head forward function to forward the decode head.

_forward(): extract_feat() -> _decode_head.forward()
Parameters
  • backbone (ConfigType) – The config for the backbone of segmentor.

  • decode_head (ConfigType) – The config for the decode head of segmentor.

  • neck (OptConfigType) – The config for the neck of segmentor. Defaults to None.

  • auxiliary_head (OptConfigType) – The config for the auxiliary head of segmentor. Defaults to None.

  • train_cfg (OptConfigType) – The config for training. Defaults to None.

  • test_cfg (OptConfigType) – The config for testing. Defaults to None.

  • data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.

  • pretrained (str, optional) – The path for pretrained model. Defaults to None.

  • init_cfg (dict, optional) – The weight initialized config for BaseModule.

aug_test(inputs, batch_img_metas, rescale=True)[source]

Test with augmentations.

Only rescale=True is supported.

encode_decode(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Encode images with backbone and decode into a semantic segmentation map of the same size as input.

extract_feat(inputs: torch.Tensor)List[torch.Tensor][source]

Extract features from images.

inference(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Inference with slide/whole style.

Parameters
  • inputs (Tensor) – The input image of shape (N, 3, H, W).

  • batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, ‘pad_shape’, and ‘padding_size’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.

Returns

The segmentation results, seg_logits from model of each input image.

Return type

Tensor

loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])dict[source]

Calculate losses from a batch of inputs and data samples.

Parameters
  • inputs (Tensor) – Input images.

  • data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

a dictionary of loss components

Return type

dict[str, Tensor]

predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None)Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Predict results from a batch of inputs and data samples with post-processing.

Parameters
  • inputs (Tensor) – Inputs with shape (N, C, H, W).

  • data_samples (List[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.

Returns

Segmentation results of the input images. Each SegDataSample usually contain:

  • ``pred_sem_seg`` (PixelData): Prediction of semantic segmentation.

  • ``seg_logits`` (PixelData): Predicted logits of semantic segmentation before normalization.

Return type

list[SegDataSample]

slide_inference(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Inference by sliding-window with overlap.

If h_crop > h_img or w_crop > w_img, the small patch will be used to decode without padding.

Parameters
  • inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.

  • batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.

Returns

The segmentation results, seg_logits from model of each input image.

Return type

Tensor

whole_inference(inputs: torch.Tensor, batch_img_metas: List[dict])torch.Tensor[source]

Inference with full image.

Parameters
  • inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.

  • batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.

Returns

The segmentation results, seg_logits from model of each input image.

Return type

Tensor
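
Whether whole_inference or slide_inference is used is selected by the mode key of test_cfg. A hedged sketch with illustrative crop_size/stride values:

# Full-image inference: one encode_decode() call per image.
whole_cfg = dict(mode='whole')
# Sliding-window inference: overlapping crops are decoded and the
# accumulated logits are averaged.
slide_cfg = dict(mode='slide', crop_size=(512, 1024), stride=(341, 768))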

class mmseg.models.segmentors.SegTTAModel(module: Union[dict, torch.nn.modules.module.Module], data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None)[source]
merge_preds(data_samples_list: List[Sequence[mmseg.structures.seg_data_sample.SegDataSample]])Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]

Merge predictions of enhanced data to one prediction.

Parameters

data_samples_list (List[SampleList]) – List of predictions of all enhanced data.

Returns

Merged prediction.

Return type

SampleList
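
A hedged sketch of a test-time-augmentation setup that SegTTAModel consumes; the scales and flip settings are illustrative assumptions.

tta_model = dict(type='SegTTAModel')
tta_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='TestTimeAug',
         transforms=[
             [dict(type='Resize', scale_factor=r, keep_ratio=True)
              for r in (0.5, 1.0, 1.5)],          # multi-scale inputs
             [dict(type='RandomFlip', prob=0.0, direction='horizontal'),
              dict(type='RandomFlip', prob=1.0, direction='horizontal')],
             [dict(type='LoadAnnotations')],
             [dict(type='PackSegInputs')],
         ]),
]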

losses

class mmseg.models.losses.Accuracy(topk=(1, ), thresh=None, ignore_index=None)[source]

Accuracy calculation module.

forward(pred, target)[source]

Forward function to calculate accuracy.

Parameters
  • pred (torch.Tensor) – Prediction of models.

  • target (torch.Tensor) – Target for each prediction.

Returns

The accuracies under different topk criterions.

Return type

tuple[float]
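
A hedged usage sketch with illustrative shapes; label 255 is assumed to be the ignore index:

import torch
from mmseg.models.losses import Accuracy

acc = Accuracy(topk=1, ignore_index=255)
pred = torch.randn(2, 19, 32, 32)            # (N, num_classes, H, W) logits
target = torch.randint(0, 19, (2, 32, 32))   # (N, H, W) label indices
top1 = acc(pred, target)                     # top-1 pixel accuracy in percent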

class mmseg.models.losses.BoundaryLoss(loss_weight: float = 1.0, loss_name: str = 'loss_boundary')[source]

Boundary loss.

This loss is modified from PIDNet and is licensed under the MIT License.

Parameters
  • loss_weight (float) – Weight of the loss. Defaults to 1.0.

  • loss_name (str) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_boundary’.

forward(bd_pre: torch.Tensor, bd_gt: torch.Tensor)torch.Tensor[source]

Forward function.

Parameters
  • bd_pre (Tensor) – Predictions of the boundary head.

  • bd_gt (Tensor) – Ground truth of the boundary.

Returns

Loss tensor.

Return type

Tensor
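
A hedged usage sketch; the shapes are illustrative, with bd_pre being boundary logits of shape (N, 1, H, W) and bd_gt a binary boundary map, as in the PIDNet setup. The loss_weight value is an assumption.

import torch
from mmseg.models.losses import BoundaryLoss

bd_loss = BoundaryLoss(loss_weight=20.0)
bd_pre = torch.randn(2, 1, 64, 64)          # boundary logits
bd_gt = torch.randint(0, 2, (2, 64, 64))    # binary boundary ground truth
loss = bd_loss(bd_pre, bd_gt)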

class mmseg.models.losses.CrossEntropyLoss(use_sigmoid=False, use_mask=False, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_ce', avg_non_ignore=False)[source]

CrossEntropyLoss.

Parameters
  • use_sigmoid (bool, optional) – Whether the prediction uses sigmoid or softmax. Defaults to False.

  • use_mask (bool, optional) – Whether to use mask cross entropy loss. Defaults to False.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.

  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_weight (float, optional) – Weight of the loss. Defaults to 1.0.

  • loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_ce’.

  • avg_non_ignore (bool) – Whether the loss is only averaged over non-ignored targets. Default: False. New in version 0.23.0.

extra_repr()[source]

Extra repr.

forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, ignore_index=-100, **kwargs)[source]

Forward function.

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str
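
A hedged usage sketch with illustrative shapes; the ignore label 255 and avg_non_ignore=True are assumptions rather than defaults.

import torch
from mmseg.models.losses import CrossEntropyLoss

criterion = CrossEntropyLoss(use_sigmoid=False, loss_weight=1.0,
                             avg_non_ignore=True)
logits = torch.randn(2, 19, 64, 64)           # (N, num_classes, H, W)
labels = torch.randint(0, 19, (2, 64, 64))    # (N, H, W) label indices
loss = criterion(logits, labels, ignore_index=255)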

class mmseg.models.losses.DiceLoss(use_sigmoid=True, activate=True, reduction='mean', naive_dice=False, loss_weight=1.0, ignore_index=255, eps=0.001, loss_name='loss_dice')[source]
forward(pred, target, weight=None, avg_factor=None, reduction_override=None, ignore_index=255, **kwargs)[source]

Forward function.

Parameters
  • pred (torch.Tensor) – The prediction, has a shape (n, *).

  • target (torch.Tensor) – The label of the prediction, shape (n, *), same shape of pred.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction, has a shape (n,). Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.

Returns

The calculated loss

Return type

torch.Tensor

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str

class mmseg.models.losses.FocalLoss(use_sigmoid=True, gamma=2.0, alpha=0.5, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_focal')[source]
forward(pred, target, weight=None, avg_factor=None, reduction_override=None, ignore_index=255, **kwargs)[source]

Forward function.

Parameters
  • pred (torch.Tensor) – The prediction with shape (N, C) where C = number of classes, or (N, C, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss.

  • target (torch.Tensor) – The ground truth. If containing class indices, shape (N) where each value is 0≤targets[i]≤C−1, or (N, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss. If containing class probabilities, same shape as the input.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.

  • ignore_index (int, optional) – The label index to be ignored. Default: 255

Returns

The calculated loss

Return type

torch.Tensor

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str

class mmseg.models.losses.HuasdorffDisstanceLoss(reduction='mean', class_weight=None, loss_weight=1.0, ignore_index=255, loss_name='loss_huasdorff_disstance', **kwargs)