mmseg.apis¶
- class mmseg.apis.MMSegInferencer(model: Union[dict, mmengine.config.config.Config, mmengine.config.config.ConfigDict, str], weights: Optional[str] = None, classes: Optional[Union[str, List]] = None, palette: Optional[Union[str, List]] = None, dataset_name: Optional[str] = None, device: Optional[str] = None, scope: Optional[str] = 'mmseg')[source]¶
Semantic segmentation inferencer that provides inference and visualization interfaces. Note: MMEngine >= 0.5.0 is required.
- Parameters
model (str, optional) – Path to the config file or the model name defined in a metafile. Taking the mmseg metafile as an example, the model could be “fcn_r50-d8_4xb2-40k_cityscapes-512x1024”, and the weights of the model will be downloaded automatically. If a config file such as “configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py” is used, the weights must be specified.
weights (str, optional) – Path to the checkpoint. If it is not specified and model is a model name of metafile, the weights will be loaded from metafile. Defaults to None.
classes (list, optional) – Input classes for result rendering. As the prediction of the segmentation model is a segment map with label indices, classes is a list whose items correspond to the label indices. If classes is not defined, the visualizer will use the Cityscapes classes by default. Defaults to None.
palette (list, optional) – Input palette for result rendering, which is a list of colors corresponding to the classes. If palette is not defined, the visualizer will use the Cityscapes palette by default. Defaults to None.
dataset_name (str, optional) – Dataset name or alias. The visualizer will use the meta information of the dataset, i.e. classes and palette, but explicitly given classes and palette have higher priority. Defaults to None.
device (str, optional) – Device to run inference. If None, the available device will be automatically used. Defaults to None.
scope (str, optional) – The scope of the model. Defaults to ‘mmseg’.
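A minimal usage sketch (the model name below is the metafile example from above, and the image path is a placeholder):

```python
from mmseg.apis import MMSegInferencer

# Build the inferencer from a metafile model name; the weights are
# downloaded automatically when only a model name is given.
inferencer = MMSegInferencer(model='fcn_r50-d8_4xb2-40k_cityscapes-512x1024')

# Run inference on a (placeholder) image. The keyword arguments are
# dispatched to the visualize()/postprocess() steps documented below.
inferencer('demo/demo.png', show=False,
           img_out_dir='outputs', pred_out_dir='outputs', opacity=0.8)
```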
- postprocess(preds: Union[mmseg.structures.seg_data_sample.SegDataSample, Sequence[mmseg.structures.seg_data_sample.SegDataSample]], visualization: List[numpy.ndarray], return_datasample: bool = False, pred_out_dir: str = '') → dict[source]¶
Process the predictions and visualization results from `forward` and `visualize`. This method should be responsible for the following tasks:
Pack the predictions and visualization results and return them.
Save the predictions, if needed.
- Parameters
preds (List[Dict]) – Predictions of the model.
visualization (List[np.ndarray]) – The list of rendered color segmentation masks.
return_datasample (bool) – Whether to return results as datasamples. Defaults to False.
pred_out_dir (str) – Directory to save the inference results without visualization. If left as empty, no file will be saved. Defaults to ‘’.
- Returns
Inference and visualization results with keys `predictions` and `visualization`:
- visualization (Any): Returned by `visualize()`.
- predictions (List[np.ndarray], np.ndarray): Returned by `forward()` and processed in `postprocess()`. If `return_datasample=False`, it will be the segmentation masks with label indices.
- Return type
dict
- visualize(inputs: list, preds: List[dict], return_vis: bool = False, show: bool = False, wait_time: int = 0, img_out_dir: str = '', opacity: float = 0.8) → List[numpy.ndarray][source]¶
Visualize predictions.
- Parameters
inputs (list) – Inputs preprocessed by `_inputs_to_list()`.
preds (Any) – Predictions of the model.
show (bool) – Whether to display the image in a popup window. Defaults to False.
wait_time (float) – The interval of show (s). Defaults to 0.
img_out_dir (str) – Output directory of the rendered prediction, i.e. the color segmentation mask. Defaults to ‘’.
opacity (int, float) – The transparency of segmentation mask. Defaults to 0.8.
- Returns
Visualization results.
- Return type
List[np.ndarray]
- class mmseg.apis.RSImage(image)[source]¶
Remote sensing image class.
- Parameters
image (str or gdal.Dataset) – Image file path or gdal.Dataset.
- create_grids(window_size: Tuple[int, int], stride: Tuple[int, int] = (0, 0))[source]¶
Create grids for image inference.
- Parameters
window_size (Tuple[int, int]) – the size of the sliding window.
stride (Tuple[int, int], optional) – the stride of the sliding window. Defaults to (0, 0).
- Raises
AssertionError – window_size must be a tuple of 2 elements.
AssertionError – stride must be a tuple of 2 elements.
- read(grid: Optional[List] = None) → numpy.ndarray[source]¶
Read image data. If grid is None, read the whole image.
- Parameters
grid (Optional[List], optional) – Grid to read. Defaults to None.
- Returns
Image data.
- Return type
np.ndarray
- write(data: Optional[numpy.ndarray], grid: Optional[List] = None)[source]¶
Write image data.
- Parameters
grid (Optional[List], optional) – Grid to write. Defaults to None.
data (Optional[np.ndarray], optional) – Data to write. Defaults to None.
- Raises
ValueError – Either grid or data must be provided.
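A short sketch of the grid workflow (the GeoTIFF path is a placeholder; GDAL must be available):

```python
from mmseg.apis import RSImage

# Open a (placeholder) GeoTIFF, either by path or from an existing
# gdal.Dataset object.
image = RSImage('scene.tif')

# Prepare sliding-window grids for inference, then read the whole image
# into memory as an ndarray.
image.create_grids(window_size=(512, 512), stride=(256, 256))
data = image.read()
```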
- class mmseg.apis.RSInferencer(model: mmengine.model.base_model.base_model.BaseModel, batch_size: int = 1, thread: int = 1)[source]¶
Remote sensing inference class.
- Parameters
model (BaseModel) – The loaded model.
batch_size (int, optional) – Batch size. Defaults to 1.
thread (int, optional) – Number of threads. Defaults to 1.
- classmethod from_config_path(config_path: str, checkpoint_path: str, batch_size: int = 1, thread: int = 1, device: Optional[str] = 'cpu')[source]¶
Initialize a segmentor from config file.
- Parameters
config_path (str) – Config file path.
checkpoint_path (str) – Checkpoint path.
batch_size (int, optional) – Batch size. Defaults to 1.
- classmethod from_model(model: mmengine.model.base_model.base_model.BaseModel, checkpoint_path: Optional[str] = None, batch_size: int = 1, thread: int = 1, device: Optional[str] = 'cpu')[source]¶
Initialize a segmentor from model.
- Parameters
model (BaseModel) – The loaded model.
checkpoint_path (Optional[str]) – Checkpoint path.
batch_size (int, optional) – Batch size. Defaults to 1.
- read(image: mmseg.apis.remote_sense_inferencer.RSImage, window_size: Tuple[int, int], strides: Tuple[int, int] = (0, 0))[source]¶
Load image data to read buffer.
- Parameters
image (RSImage) – The image to read.
window_size (Tuple[int, int]) – The size of the sliding window.
strides (Tuple[int, int], optional) – The stride of the sliding window. Defaults to (0, 0).
- run(image: mmseg.apis.remote_sense_inferencer.RSImage, window_size: Tuple[int, int], strides: Tuple[int, int] = (0, 0), output_path: Optional[str] = None)[source]¶
Run inference with multi-threading.
- Parameters
image (RSImage) – The image to inference.
window_size (Tuple[int, int]) – The size of the sliding window.
strides (Tuple[int, int], optional) – The stride of the sliding window. Defaults to (0, 0).
output_path (Optional[str], optional) – The path to save the segmentation map. Defaults to None.
- write(image: mmseg.apis.remote_sense_inferencer.RSImage, output_path: Optional[str] = None)[source]¶
Write image data from write buffer.
- Parameters
image (RSImage) – The image to write.
output_path (Optional[str], optional) – The path to save the segmentation map. Defaults to None.
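Putting the pieces together, a hedged end-to-end sketch (config, checkpoint, and image paths are placeholders):

```python
from mmseg.apis import RSImage, RSInferencer

# Build the inferencer from a config/checkpoint pair (paths are placeholders).
inferencer = RSInferencer.from_config_path(
    'configs/some_config.py', 'checkpoint.pth',
    batch_size=2, thread=2, device='cuda:0')

# Run sliding-window inference over a large remote sensing image and write
# the resulting segmentation map to disk.
image = RSImage('scene.tif')
inferencer.run(image, window_size=(512, 512), strides=(256, 256),
               output_path='segmentation.tif')
```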
- mmseg.apis.inference_model(model: mmseg.models.segmentors.base.BaseSegmentor, img: Union[str, numpy.ndarray, Sequence[str], Sequence[numpy.ndarray]]) → Union[mmseg.structures.seg_data_sample.SegDataSample, Sequence[mmseg.structures.seg_data_sample.SegDataSample]][source]¶
Inference image(s) with the segmentor.
- Parameters
model (nn.Module) – The loaded segmentor.
imgs (str/ndarray or list[str/ndarray]) – Either image files or loaded images.
- Returns
If imgs is a list or tuple, the same length list type results will be returned, otherwise return the segmentation results directly.
- Return type
SegDataSample or list[SegDataSample]
- mmseg.apis.init_model(config: Union[str, pathlib.Path, mmengine.config.config.Config], checkpoint: Optional[str] = None, device: str = 'cuda:0', cfg_options: Optional[dict] = None)[source]¶
Initialize a segmentor from config file.
- Parameters
config (str, Path, or mmengine.Config) – Config file path, Path, or the config object.
checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.
device (str, optional) – CPU/CUDA device option. Defaults to ‘cuda:0’. Use ‘cpu’ for loading the model on CPU.
cfg_options (dict, optional) – Options to override some settings in the used config.
- Returns
The constructed segmentor.
- Return type
nn.Module
- mmseg.apis.show_result_pyplot(model: mmseg.models.segmentors.base.BaseSegmentor, img: Union[str, numpy.ndarray], result: mmseg.structures.seg_data_sample.SegDataSample, opacity: float = 0.5, title: str = '', draw_gt: bool = True, draw_pred: bool = True, wait_time: float = 0, show: bool = True, withLabels: Optional[bool] = True, save_dir=None, out_file=None)[source]¶
Visualize the segmentation results on the image.
- Parameters
model (nn.Module) – The loaded segmentor.
img (str or np.ndarray) – Image filename or loaded image.
result (SegDataSample) – The prediction SegDataSample result.
opacity (float) – Opacity of painted segmentation map. Default 0.5. Must be in (0, 1] range.
title (str) – The title of pyplot figure. Default is ‘’.
draw_gt (bool) – Whether to draw GT SegDataSample. Default to True.
draw_pred (bool) – Whether to draw Prediction SegDataSample. Defaults to True.
wait_time (float) – The interval of show (s). 0 is the special value that means “forever”. Defaults to 0.
show (bool) – Whether to display the drawn image. Default to True.
withLabels (bool, optional) – Whether to add semantic labels to the visualization result. Defaults to True.
save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data.
out_file (str, optional) – Path to output file. Default to None.
- Returns
The drawn image with RGB channel order.
- Return type
np.ndarray
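The three functions above compose into the usual single-image workflow; a minimal sketch with placeholder config, checkpoint, and image paths:

```python
from mmseg.apis import inference_model, init_model, show_result_pyplot

# Build the segmentor from a config and checkpoint (paths are placeholders).
model = init_model(
    'configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py',
    'checkpoint.pth', device='cuda:0')

# Run inference on one image and render the result to a file.
result = inference_model(model, 'demo/demo.png')
vis = show_result_pyplot(model, 'demo/demo.png', result,
                         opacity=0.5, show=False, out_file='result.png')
```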
mmseg.datasets¶
datasets¶
- class mmseg.datasets.ADE20KDataset(img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
ADE20K dataset.
In segmentation map annotation for ADE20K, 0 stands for background, which is not included in 150 categories.
`reduce_zero_label` is fixed to True. The `img_suffix` is fixed to ‘.jpg’ and `seg_map_suffix` is fixed to ‘.png’.
- class mmseg.datasets.AdjustGamma(gamma=1.0)[source]¶
Using gamma correction to process the image.
Required Keys:
img
Modified Keys:
img
- Parameters
gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.
- class mmseg.datasets.Albu(transforms: List[dict], keymap: Optional[dict] = None, update_pad_shape: bool = False)[source]¶
Albumentation augmentation. Adds custom transformations from the Albumentations library. Please visit https://albumentations.readthedocs.io for more information. An example of `transforms` is shown after the parameter list below.
- Parameters
transforms (list[dict]) – A list of albu transformations
keymap (dict) – Contains {‘input key’:’albumentation-style key’}
update_pad_shape (bool) – Whether to update padding shape according to the output shape of the last transform
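A hedged sketch of such a `transforms` list inside a pipeline (the Albumentations transform names and parameters are illustrative, and the albumentations package must be installed):

```python
# Albumentations-style transforms, passed through to the Albu wrapper.
albu_transforms = [
    dict(type='RandomBrightnessContrast',
         brightness_limit=0.2, contrast_limit=0.2, p=0.5),
    dict(type='GaussNoise', p=0.3),
]

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Albu', transforms=albu_transforms),
    dict(type='PackSegInputs'),
]
```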
- albu_builder(cfg: dict) → object[source]¶
Build a callable object from a dict containing albu arguments.
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
- Returns
A callable object.
- Return type
Callable
- static mapper(d: dict, keymap: dict)[source]¶
Dictionary mapper.
Renames keys according to the keymap provided.
- Parameters
d (dict) – old dict.
keymap (dict) – {‘old_key’: ‘new_key’}.
- Returns
new dict.
- Return type
dict
- transform(results)[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in it. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.BDD100KDataset(img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]¶
- class mmseg.datasets.BaseCDDataset(ann_file: str = '', img_suffix='.jpg', img_suffix2='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'img_path2': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]¶
Custom dataset for change detection. An example of file structure is as followed.
├── data
│   ├── my_dataset
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── img_dir2
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── ann_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{seg_map_suffix}
│   │   │   │   ├── yyy{seg_map_suffix}
│   │   │   │   ├── zzz{seg_map_suffix}
│   │   │   ├── val
The image names in img_dir and img_dir2 should be consistent. The img/gt_semantic_seg pairs should have the same name except for the suffix. A valid img/gt_semantic_seg filename pair should be like `xxx{img_suffix}` and `xxx{seg_map_suffix}` (the extension is also included in the suffix). If split is given, then `xxx` is specified in the txt file. Otherwise, all files in `img_dir/` and `ann_dir` will be loaded. Please refer to `docs/en/tutorials/new_dataset.md` for more details.
- Parameters
ann_file (str) – Annotation file path. Defaults to ‘’.
metainfo (dict, optional) – Meta information for dataset, such as specify classes to load. Defaults to None.
data_root (str, optional) – The root directory for `data_prefix` and `ann_file`. Defaults to None.
data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img_path=None, img_path2=None, seg_map_path=None).
img_suffix (str) – Suffix of images. Default: ‘.jpg’
img_suffix2 (str) – Suffix of images. Default: ‘.jpg’
seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’
filter_cfg (dict, optional) – Config for filter data. Defaults to None.
indices (int or Sequence[int], optional) – Support using the first few data in the annotation file to facilitate training/testing on a smaller dataset. Defaults to None, which means using all `data_infos`.
serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.
pipeline (list, optional) – Processing pipeline. Defaults to [].
test_mode (bool, optional) – `test_mode=True` means in test phase. Defaults to False.
lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. `Basedataset` can skip loading annotations to save time by setting `lazy_init=True`. Defaults to False.
max_refetch (int, optional) – If `Basedataset.prepare_data` gets a None img, the maximum extra number of cycles to retry fetching a valid image. Defaults to 1000.
ignore_index (int) – The label index to be ignored. Default: 255
reduce_zero_label (bool) – Whether to mark label zero as ignored. Default to False.
backend_args (dict, optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.html for details. Defaults to None. Note: mmcv>=2.0.0rc4 and mmengine>=0.2.0 are required.
- classmethod get_label_map(new_classes: Optional[Sequence] = None) → Optional[Dict][source]¶
Require label mapping.
The `label_map` is a dictionary whose keys are the old label ids and whose values are the new label ids; it is used for changing pixel labels in load_annotations. `label_map` is not None if and only if the old classes in cls.METAINFO are not equal to the new classes in self._metainfo and neither of them is None.
- Parameters
new_classes (list, tuple, optional) – The new class names from metainfo. Defaults to None.
- Returns
The mapping from old classes in cls.METAINFO to new classes in self._metainfo.
- Return type
dict, optional
- class mmseg.datasets.BaseSegDataset(ann_file: str = '', img_suffix='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]¶
Custom dataset for semantic segmentation. An example of file structure is as followed.
├── data
│   ├── my_dataset
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── ann_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{seg_map_suffix}
│   │   │   │   ├── yyy{seg_map_suffix}
│   │   │   │   ├── zzz{seg_map_suffix}
│   │   │   ├── val
The img/gt_semantic_seg pair of BaseSegDataset should have the same name except for the suffix. A valid img/gt_semantic_seg filename pair should be like `xxx{img_suffix}` and `xxx{seg_map_suffix}` (the extension is also included in the suffix). If split is given, then `xxx` is specified in the txt file. Otherwise, all files in `img_dir/` and `ann_dir` will be loaded. Please refer to `docs/en/tutorials/new_dataset.md` for more details.
- Parameters
ann_file (str) – Annotation file path. Defaults to ‘’.
metainfo (dict, optional) – Meta information for dataset, such as specify classes to load. Defaults to None.
data_root (str, optional) – The root directory for `data_prefix` and `ann_file`. Defaults to None.
data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img_path=None, seg_map_path=None).
img_suffix (str) – Suffix of images. Default: ‘.jpg’
seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’
filter_cfg (dict, optional) – Config for filter data. Defaults to None.
indices (int or Sequence[int], optional) – Support using the first few data in the annotation file to facilitate training/testing on a smaller dataset. Defaults to None, which means using all `data_infos`.
serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.
pipeline (list, optional) – Processing pipeline. Defaults to [].
test_mode (bool, optional) – `test_mode=True` means in test phase. Defaults to False.
lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. `Basedataset` can skip loading annotations to save time by setting `lazy_init=True`. Defaults to False.
max_refetch (int, optional) – If `Basedataset.prepare_data` gets a None img, the maximum extra number of cycles to retry fetching a valid image. Defaults to 1000.
ignore_index (int) – The label index to be ignored. Default: 255
reduce_zero_label (bool) – Whether to mark label zero as ignored. Default to False.
backend_args (dict, optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.html for details. Defaults to None. Note: mmcv>=2.0.0rc4 and mmengine>=0.2.0 are required.
- classmethod get_label_map(new_classes: Optional[Sequence] = None) → Optional[Dict][source]¶
Require label mapping.
The `label_map` is a dictionary whose keys are the old label ids and whose values are the new label ids; it is used for changing pixel labels in load_annotations. `label_map` is not None if and only if the old classes in cls.METAINFO are not equal to the new classes in self._metainfo and neither of them is None.
- Parameters
new_classes (list, tuple, optional) – The new class names from metainfo. Defaults to None.
- Returns
The mapping from old classes in cls.METAINFO to new classes in self._metainfo.
- Return type
dict, optional
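BaseSegDataset is usually consumed by subclassing and registering it; a minimal sketch (the class name, classes, and palette are hypothetical). Passing a different classes tuple through metainfo later triggers get_label_map (above) to derive the id remapping:

```python
from mmseg.datasets import BaseSegDataset
from mmseg.registry import DATASETS


@DATASETS.register_module()
class MyDataset(BaseSegDataset):
    """Hypothetical two-class dataset, used only for illustration."""
    METAINFO = dict(
        classes=('background', 'foreground'),
        palette=[[0, 0, 0], [255, 0, 0]])

    def __init__(self, **kwargs):
        super().__init__(
            img_suffix='.jpg',
            seg_map_suffix='.png',
            reduce_zero_label=False,
            **kwargs)
```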
- class mmseg.datasets.BioMedical3DPad(pad_shape: Tuple[int, int, int], pad_val: float = 0.0, seg_pad_val: int = 0)[source]¶
Pad the biomedical 3d image & biomedical 3d semantic segmentation maps.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default, N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape (Z, Y, X) by default.
Modified Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default, N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape (Z, Y, X) by default.
Added Keys:
pad_shape (Tuple[int, int, int]): The padded shape.
- Parameters
pad_shape (Tuple[int, int, int]) – Fixed padding size. Expected padding shape (Z, Y, X).
pad_val (float) – Padding value for biomedical image. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.
seg_pad_val (int) – Padding value for biomedical 3d semantic segmentation maps. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.
- class mmseg.datasets.BioMedical3DRandomCrop(crop_shape: Union[int, Tuple[int, int, int]], keep_foreground: bool = True)[source]¶
Crop the input patch for medical image & segmentation mask.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X), N is the number of modalities, and data type is float32.
- gt_seg_map (np.ndarray, optional): Biomedical semantic segmentation mask with shape (Z, Y, X).
Modified Keys:
img
img_shape
gt_seg_map (optional)
- Parameters
crop_shape (Union[int, Tuple[int, int, int]]) – Expected size after cropping with the format of (z, y, x). If set to an integer, then cropping width and height are equal to this integer.
keep_foreground (bool) – If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the crop bounding-box. Default to True.
- crop(img: numpy.ndarray, crop_bbox: tuple) → numpy.ndarray[source]¶
Crop from `img`.
- Parameters
img (np.ndarray) – Original input image.
crop_bbox (tuple) – Coordinates of the cropped image.
- Returns
The cropped image.
- Return type
np.ndarray
- generate_margin(results: dict) → tuple[source]¶
Generate margin of crop bounding-box.
If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the bounding-box, and return the margin between of the bounding-box and image. If keep_foreground is False, it will return the difference from crop shape and image shape.
- Parameters
results (dict) – Result dict from loading pipeline.
- Returns
The margin for 3 dimensions of crop bounding-box and image.
- Return type
tuple
- random_generate_crop_bbox(margin_z: int, margin_y: int, margin_x: int) → tuple[source]¶
Randomly get a crop bounding box.
- Parameters
margin_z (int) – Margin of the crop bounding box along the z axis.
margin_y (int) – Margin of the crop bounding box along the y axis.
margin_x (int) – Margin of the crop bounding box along the x axis.
- Returns
Coordinates of the cropped image.
- Return type
tuple
- class mmseg.datasets.BioMedical3DRandomFlip(prob: float, axes: Tuple[int, ...], swap_label_pairs: Optional[List[Tuple[int, int]]] = None)[source]¶
Flip biomedical 3D images and segmentations.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/spatial_transforms.py # noqa:E501
Copyright 2021 Division of Medical Image Computing, German Cancer Research Center (DKFZ) and Applied Computer Vision Lab, Helmholtz Imaging Platform. Licensed under the Apache-2.0 License.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default, N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape (Z, Y, X) by default.
Modified Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default, N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape (Z, Y, X) by default.
Added Keys:
do_flip
flip_axes
- Parameters
prob (float) – Flipping probability.
axes (Tuple[int, ...]) – Flipping axes with order ‘ZXY’.
swap_label_pairs (Optional[List[Tuple[int, int]]]) – The segmentation label pairs that are swapped when flipping.
- class mmseg.datasets.BioMedicalGaussianBlur(sigma_range: Tuple[float, float] = (0.5, 1.0), prob: float = 0.2, prob_per_channel: float = 0.5, different_sigma_per_channel: bool = True, different_sigma_per_axis: bool = True)[source]¶
Add Gaussian blur with random sigma to image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L81 # noqa:E501
Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X), N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
sigma_range (Tuple[float, float]|float) – Range from which to randomly select the sigma value. Default to (0.5, 1.0).
prob (float) – Probability to apply Gaussian blur for each sample. Default to 0.2.
prob_per_channel (float) – Probability to apply Gaussian blur for each channel (axis N of the image). Default to 0.5.
different_sigma_per_channel (bool) – whether to use different sigma for each channel (axis N of the image). Default to True.
different_sigma_per_axis (bool) – whether to use different sigma for axis Z, X and Y of the image. Default to True.
- class mmseg.datasets.BioMedicalGaussianNoise(prob: float = 0.1, mean: float = 0.0, std: float = 0.1)[source]¶
Add random Gaussian noise to image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L53 # noqa:E501
Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X), N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
prob (float) – Probability to add Gaussian noise for each sample. Default to 0.1.
mean (float) – Mean or “centre” of the distribution. Default to 0.0.
std (float) – Standard deviation of distribution. Default to 0.1.
- class mmseg.datasets.BioMedicalRandomGamma(prob: float = 0.5, gamma_range: Tuple[float] = (0.5, 2), invert_image: bool = False, per_channel: bool = False, retain_stats: bool = False)[source]¶
Using random gamma correction to process the biomedical image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/color_transforms.py#L132 # noqa:E501 With licence: Apache 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X), N is the number of modalities, and data type is float32.
Modified Keys: - img
- Parameters
prob (float) – The probability to perform this transform. Default: 0.5.
gamma_range (Tuple[float]) – Range of gamma values. Default: (0.5, 2).
invert_image (bool) – Whether to invert the image before applying gamma augmentation. Default: False.
per_channel (bool) – Whether to perform the transform on each channel individually. Default: False
retain_stats (bool) – Gamma transformation will alter the mean and std of the data in the patch. If retain_stats=True, the data will be transformed to match the mean and standard deviation before gamma augmentation. Default: False.
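The biomedical transforms above are typically chained in a 3D training pipeline; a hedged sketch (the crop shape and probabilities are illustrative, and whether every step fits a given dataset depends on its format):

```python
train_pipeline = [
    dict(type='LoadBiomedicalImageFromFile'),
    dict(type='LoadBiomedicalAnnotation'),
    # Crop shape in (z, y, x) order; the numbers are only an example.
    dict(type='BioMedical3DRandomCrop', crop_shape=(64, 128, 128)),
    dict(type='BioMedicalGaussianNoise', prob=0.1),
    dict(type='BioMedicalGaussianBlur', prob=0.2),
    dict(type='BioMedicalRandomGamma', prob=0.3),
    dict(type='BioMedical3DRandomFlip', prob=0.5, axes=(0, 1, 2)),
    dict(type='PackSegInputs'),
]
```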
- class mmseg.datasets.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]¶
Use CLAHE method to process the image.
See Zuiderveld, K. Contrast Limited Adaptive Histogram Equalization. Graphics Gems, 1994: 474-485 for more information.
Required Keys:
img
Modified Keys:
img
- Parameters
clip_limit (float) – Threshold for contrast limiting. Default: 40.0.
tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).
- class mmseg.datasets.COCOStuffDataset(img_suffix='.jpg', seg_map_suffix='_labelTrainIds.png', **kwargs)[source]¶
COCO-Stuff dataset.
In segmentation map annotation for COCO-Stuff, Train-IDs of the 10k version are from 1 to 171, where 0 is the ignore index, and Train-ID of COCO Stuff 164k is from 0 to 170, where 255 is the ignore index. So, they are all 171 semantic categories.
`reduce_zero_label` is set to True and False for the 10k and 164k versions, respectively. The `img_suffix` is fixed to ‘.jpg’, and `seg_map_suffix` is fixed to ‘.png’.
- class mmseg.datasets.ChaseDB1Dataset(img_suffix='.png', seg_map_suffix='_1stHO.png', reduce_zero_label=False, **kwargs)[source]¶
Chase_db1 dataset.
In segmentation map annotation for Chase_db1, 0 stands for background, which is included in 2 categories.
`reduce_zero_label` is fixed to False. The `img_suffix` is fixed to ‘.png’ and `seg_map_suffix` is fixed to ‘_1stHO.png’.
- class mmseg.datasets.CityscapesDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtFine_labelTrainIds.png', **kwargs)[source]¶
Cityscapes dataset.
The `img_suffix` is fixed to ‘_leftImg8bit.png’ and `seg_map_suffix` is fixed to ‘_gtFine_labelTrainIds.png’ for the Cityscapes dataset.
- class mmseg.datasets.ConcatCDInput(input_keys=('img', 'img2'))[source]¶
Concat images for change detection.
Required Keys:
img
img2
- Parameters
input_keys (tuple) – Input image keys for change detection. Default: (‘img’, ‘img2’).
- transform(results: dict) → dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in it. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.DRIVEDataset(img_suffix='.png', seg_map_suffix='_manual1.png', reduce_zero_label=False, **kwargs)[source]¶
DRIVE dataset.
In segmentation map annotation for DRIVE, 0 stands for background, which is included in 2 categories.
`reduce_zero_label` is fixed to False. The `img_suffix` is fixed to ‘.png’ and `seg_map_suffix` is fixed to ‘_manual1.png’.
- class mmseg.datasets.DSDLSegDataset(specific_key_path: Dict = {}, pre_transform: Dict = {}, used_labels: Optional[Sequence] = None, **kwargs)[source]¶
Dataset for dsdl segmentation.
- Parameters
specific_key_path (dict) – Path of a specific key which cannot be loaded by its field name.
pre_transform (dict) – Pre-transform functions before loading.
used_labels (sequence) – List of classes actually used in training steps; this must be a subset of the class domain.
- get_label_map(new_classes: Optional[Sequence] = None) → Optional[Dict][source]¶
Require label mapping.
The `label_map` is a dictionary whose keys are the old label ids and whose values are the new label ids; it is used for changing pixel labels in load_annotations. `label_map` is not None if and only if the old classes in class_dom are not equal to the new classes in the arguments and neither of them is None.
- Parameters
new_classes (list, tuple, optional) – The new class names from metainfo. Defaults to None.
- Returns
The mapping from old classes to new classes.
- Return type
dict, optional
- class mmseg.datasets.DarkZurichDataset(img_suffix='_rgb_anon.png', seg_map_suffix='_gt_labelTrainIds.png', **kwargs)[source]¶
DarkZurich dataset.
- class mmseg.datasets.DecathlonDataset(ann_file: str = '', img_suffix='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]¶
Dataset for the Decathlon dataset.
The dataset.json format is shown as follows
{
    "name": "BRATS",
    "tensorImageSize": "4D",
    "modality": {
        "0": "FLAIR",
        "1": "T1w",
        "2": "t1gd",
        "3": "T2w"
    },
    "labels": {
        "0": "background",
        "1": "edema",
        "2": "non-enhancing tumor",
        "3": "enhancing tumour"
    },
    "numTraining": 484,
    "numTest": 266,
    "training": [
        {
            "image": "./imagesTr/BRATS_306.nii.gz",
            "label": "./labelsTr/BRATS_306.nii.gz"
        },
        ...
    ],
    "test": [
        "./imagesTs/BRATS_557.nii.gz",
        ...
    ]
}
- class mmseg.datasets.GenerateEdge(edge_width: int = 3, ignore_index: int = 255)[source]¶
Generate Edge for CE2P approach.
Edge will be used to calculate loss of CE2P.
Modified from https://github.com/liutinglt/CE2P/blob/master/dataset/target_generation.py # noqa:E501
Required Keys:
img_shape
gt_seg_map
- Added Keys:
- gt_edge_map (np.ndarray, uint8): The edge annotation generated from the seg map by extracting the border between different semantics.
- Parameters
edge_width (int) – The width of edge. Default to 3.
ignore_index (int) – Index that will be ignored. Default to 255.
- class mmseg.datasets.HRFDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]¶
HRF dataset.
In segmentation map annotation for HRF, 0 stands for background, which is included in 2 categories.
`reduce_zero_label` is fixed to False. The `img_suffix` is fixed to ‘.png’ and `seg_map_suffix` is fixed to ‘.png’.
- class mmseg.datasets.ISPRSDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
ISPRS dataset.
In segmentation map annotation for ISPRS, 0 is the ignore index.
`reduce_zero_label` should be set to True. The `img_suffix` and `seg_map_suffix` are both fixed to ‘.png’.
- class mmseg.datasets.LEVIRCDDataset(img_suffix='.png', img_suffix2='.png', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]¶
LEVIR-CD dataset for change detection.
`reduce_zero_label` is fixed to False. The `img_suffix`, `img_suffix2` and `seg_map_suffix` are all fixed to ‘.png’.
- class mmseg.datasets.LIPDataset(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
LIP dataset.
The `img_suffix` is fixed to ‘.jpg’ and `seg_map_suffix` is fixed to ‘.png’.
- class mmseg.datasets.LoadAnnotations(reduce_zero_label=None, backend_args=None, imdecode_backend='pillow')[source]¶
Load annotations for semantic segmentation provided by dataset.
The annotation format is as the following:
{
    # Filename of the semantic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}
After this module, the annotation has been changed to the format below:
{
    # In str type.
    'seg_fields': List
    # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}
Required Keys:
seg_map_path (str): Path of semantic segmentation ground truth file.
Added Keys:
seg_fields (List)
gt_seg_map (np.uint8)
- Parameters
reduce_zero_label (bool, optional) – Whether to reduce all label values by 1. Usually used for datasets where 0 is the background label. Defaults to None.
imdecode_backend (str) – The image decoding backend type. The backend argument for `mmcv.imfrombytes`. See `mmcv.imfrombytes` for details. Defaults to ‘pillow’.
backend_args (dict) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.html for details. Defaults to None. Note: mmcv>=2.0.0rc4 and mmengine>=0.2.0 are required.
- class mmseg.datasets.LoadBiomedicalAnnotation(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]¶
Load `seg_map` annotation provided by a biomedical dataset.
The annotation format is as follows:
{
    'gt_seg_map': np.ndarray (X, Y, Z) or (Z, Y, X)
}
Required Keys:
seg_map_path
Added Keys:
- gt_seg_map (np.ndarray): Biomedical seg map with shape (Z, Y, X) by default, and data type is float32 if to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.
- Parameters
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’, the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
to_float32 (bool) – Whether to convert the loaded seg map to a float32 numpy array. If set to False, the loaded seg map is a float64 array. Defaults to True.
backend_args (dict, optional) – Arguments to instantiate a file backend. See `mmengine.fileio` for details. Defaults to None. Note: mmcv>=2.0.0rc4 and mmengine>=0.2.0 are required.
- class mmseg.datasets.LoadBiomedicalData(with_seg=False, decode_backend: str = 'numpy', to_xyz: bool = False, backend_args: Optional[dict] = None)[source]¶
Load a biomedical image and annotation from file.
The loaded data format is as follows:
{
    'img': np.ndarray data[:-1, X, Y, Z]
    'seg_map': np.ndarray data[-1, X, Y, Z]
}
Required Keys:
img_path
Added Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default, N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape (Z, Y, X) by default.
img_shape
ori_shape
- Parameters
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’, the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
backend_args (dict, optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.html for details. Defaults to None. Note: mmcv>=2.0.0rc4 and mmengine>=0.2.0 are required.
- class mmseg.datasets.LoadBiomedicalImageFromFile(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]¶
Load a biomedical image from file.
Required Keys:
img_path
Added Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default, N is the number of modalities, and data type is float32 if to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.
img_shape
ori_shape
- Parameters
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’, the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.
backend_args (dict, optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.html for details. Defaults to None. Note: mmcv>=2.0.0rc4 and mmengine>=0.2.0 are required.
- class mmseg.datasets.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]¶
Load an image from `results['img']`.
Similar to `LoadImageFromFile`, but the image has been loaded as an `np.ndarray` in `results['img']`. Can be used when loading an image from a webcam.
Required Keys:
img
Modified Keys:
img
img_path
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
- class mmseg.datasets.LoadMultipleRSImageFromFile(to_float32: bool = True)[source]¶
Load two remote sensing images from file.
Required Keys:
img_path
img_path2
Modified Keys:
img
img2
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.
- class mmseg.datasets.LoadSingleRSImageFromFile(to_float32: bool = True)[source]¶
Load a remote sensing image from file.
Required Keys:
img_path
Modified Keys:
img
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.
- class mmseg.datasets.LoveDADataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
LoveDA dataset.
In segmentation map annotation for LoveDA, 0 is the ignore index.
`reduce_zero_label` should be set to True. The `img_suffix` and `seg_map_suffix` are both fixed to ‘.png’.
- class mmseg.datasets.MapillaryDataset_v1(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
Mapillary Vistas Dataset.
Dataset paper link: http://ieeexplore.ieee.org/document/8237796/
v1.2 contains 66 object classes (37 instance-specific).
v2.0 contains 124 object classes (70 instance-specific, 46 stuff, 8 void or crowd).
The `img_suffix` is fixed to ‘.jpg’ and `seg_map_suffix` is fixed to ‘.png’ for the Mapillary Vistas Dataset.
- class mmseg.datasets.MapillaryDataset_v2(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
Mapillary Vistas Dataset.
Dataset paper link: http://ieeexplore.ieee.org/document/8237796/
v1.2 contains 66 object classes (37 instance-specific).
v2.0 contains 124 object classes (70 instance-specific, 46 stuff, 8 void or crowd).
The `img_suffix` is fixed to ‘.jpg’ and `seg_map_suffix` is fixed to ‘.png’ for the Mapillary Vistas Dataset.
- class mmseg.datasets.MultiImageMixDataset(dataset: Union[mmengine.dataset.dataset_wrapper.ConcatDataset, dict], pipeline: Sequence[dict], skip_type_keys: Optional[List[str]] = None, lazy_init: bool = False)[source]¶
A wrapper of multiple images mixed dataset.
Suitable for training on multiple images mixed data augmentation like mosaic and mixup.
- Parameters
dataset (ConcatDataset or dict) – The dataset to be mixed.
pipeline (Sequence[dict]) – Sequence of transform object or config dict to be composed.
skip_type_keys (list[str], optional) – Sequence of type strings whose transforms are skipped in the pipeline. Defaults to None.
- get_data_info(idx: int) → dict[source]¶
Get annotation by index.
- Parameters
idx (int) – Global index of `ConcatDataset`.
- Returns
The idx-th annotation of the datasets.
- Return type
dict
- property metainfo: dict¶
Get the meta information of the multi-image-mixed dataset.
- Returns
The meta information of multi-image-mixed dataset.
- Return type
dict
- class mmseg.datasets.NYUDataset(data_prefix={'depth_map_path': 'annotations', 'img_path': 'images'}, img_suffix='.jpg', depth_map_suffix='.png', **kwargs)[source]¶
NYU depth estimation dataset. The file structure should be.
├── data
│   ├── nyu
│   │   ├── images
│   │   │   ├── train
│   │   │   │   ├── scene_xxx.jpg
│   │   │   │   ├── ...
│   │   │   ├── test
│   │   ├── annotations
│   │   │   ├── train
│   │   │   │   ├── scene_xxx.png
│   │   │   │   ├── ...
│   │   │   ├── test
- Parameters
ann_file (str) – Annotation file path. Defaults to ‘’.
metainfo (dict, optional) – Meta information for dataset, such as specify classes to load. Defaults to None.
data_root (str, optional) – The root directory for `data_prefix` and `ann_file`. Defaults to None.
data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img_path=’images’, depth_map_path=’annotations’).
img_suffix (str) – Suffix of images. Default: ‘.jpg’
seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’
filter_cfg (dict, optional) – Config for filter data. Defaults to None.
indices (int or Sequence[int], optional) – Support using the first few data in the annotation file to facilitate training/testing on a smaller dataset. Defaults to None, which means using all `data_infos`.
serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.
pipeline (list, optional) – Processing pipeline. Defaults to [].
test_mode (bool, optional) – `test_mode=True` means in test phase. Defaults to False.
lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. `Basedataset` can skip loading annotations to save time by setting `lazy_init=True`. Defaults to False.
max_refetch (int, optional) – If `Basedataset.prepare_data` gets a None img, the maximum extra number of cycles to retry fetching a valid image. Defaults to 1000.
ignore_index (int) – The label index to be ignored. Default: 255
reduce_zero_label (bool) – Whether to mark label zero as ignored. Default to False.
backend_args (dict, optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.html for details. Defaults to None. Note: mmcv>=2.0.0rc4 and mmengine>=0.2.0 are required.
- class mmseg.datasets.NightDrivingDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtCoarse_labelTrainIds.png', **kwargs)[source]¶
NightDriving dataset.
- class mmseg.datasets.PackSegInputs(meta_keys=('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label'))[source]¶
Pack the inputs data for the semantic segmentation.
The `img_meta` item is always populated. The contents of the `img_meta` dictionary depend on `meta_keys`. By default this includes:
- `img_path`: filename of the image
- `ori_shape`: original shape of the image as a tuple (h, w, c)
- `img_shape`: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
- `pad_shape`: shape of padded images
- `scale_factor`: a float indicating the preprocessing scale
- `flip`: a boolean indicating if image flip transform was used
- `flip_direction`: the flipping direction
- Parameters
meta_keys (Sequence[str], optional) – Meta keys to be packed from `SegDataSample` and collected in `data[img_metas]`. Default: ('img_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction')
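meta_keys can be trimmed or extended when packing; a hedged sketch (the chosen keys and the Resize step are illustrative):

```python
# Pack only a subset of the default meta keys.
packer = dict(
    type='PackSegInputs',
    meta_keys=('img_path', 'ori_shape', 'img_shape'))

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='Resize', scale=(2048, 512), keep_ratio=True),
    packer,
]
```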
- class mmseg.datasets.PascalContextDataset(ann_file='', img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]¶
PascalContext dataset.
In segmentation map annotation for PascalContext, 0 stands for background, which is included in 60 categories.
`reduce_zero_label` is fixed to False. The `img_suffix` is fixed to ‘.jpg’ and `seg_map_suffix` is fixed to ‘.png’.
- Parameters
ann_file (str) – Annotation file path.
- class mmseg.datasets.PascalContextDataset59(ann_file='', img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
PascalContext dataset.
In segmentation map annotation for PascalContext, 0 stands for background, which is included in 60 categories.
`reduce_zero_label` is fixed to True. The `img_suffix` is fixed to ‘.jpg’ and `seg_map_suffix` is fixed to ‘.png’. Note: If the background is 255 and the ids of categories are from 0 to 58, `reduce_zero_label` needs to be set to False.
- Parameters
ann_file (str) – Annotation file path.
- class mmseg.datasets.PascalVOCDataset(ann_file, img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
Pascal VOC dataset.
- Parameters
split (str) – Split txt file for Pascal VOC.
- class mmseg.datasets.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[float] = (0.5, 1.5), saturation_range: Sequence[float] = (0.5, 1.5), hue_delta: int = 18)[source]¶
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. The position of random contrast is second or second-to-last.
random brightness
random contrast (mode 0)
convert color from BGR to HSV
random saturation
random hue
convert color from HSV to BGR
random contrast (mode 1)
Required Keys:
img
Modified Keys:
img
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
- brightness(img: numpy.ndarray) → numpy.ndarray[source]¶
Brightness distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after brightness change.
- Return type
np.ndarray
- contrast(img: numpy.ndarray) → numpy.ndarray[source]¶
Contrast distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after contrast change.
- Return type
np.ndarray
- convert(img: numpy.ndarray, alpha: int = 1, beta: int = 0) → numpy.ndarray[source]¶
Multiply with alpha and add beta, with clipping.
- Parameters
img (np.ndarray) – The input image.
alpha (int) – Image weights, change the contrast/saturation of the image. Default: 1
beta (int) – Image bias, change the brightness of the image. Default: 0
- Returns
The transformed image.
- Return type
np.ndarray
- hue(img: numpy.ndarray) → numpy.ndarray[source]¶
Hue distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after hue change.
- Return type
np.ndarray
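As a building block, PhotoMetricDistortion usually sits between geometric transforms and packing; a sketch using its default parameters (the RandomResize step is illustrative):

```python
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='RandomResize', scale=(2048, 512),
         ratio_range=(0.5, 2.0), keep_ratio=True),
    # Color jitter: brightness, contrast, saturation, and hue, each with
    # probability 0.5; the values below are the documented defaults.
    dict(type='PhotoMetricDistortion',
         brightness_delta=32,
         contrast_range=(0.5, 1.5),
         saturation_range=(0.5, 1.5),
         hue_delta=18),
    dict(type='PackSegInputs'),
]
```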
- class mmseg.datasets.PotsdamDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
ISPRS Potsdam dataset.
In segmentation map annotation for Potsdam dataset, 0 is the ignore index.
`reduce_zero_label` should be set to True. The `img_suffix` and `seg_map_suffix` are both fixed to ‘.png’.
- class mmseg.datasets.REFUGEDataset(**kwargs)[source]¶
REFUGE dataset.
In segmentation map annotation for REFUGE, 0 stands for background, which is not included in 2 categories.
`reduce_zero_label` is fixed to True. The `img_suffix` is fixed to ‘.png’ and `seg_map_suffix` is fixed to ‘.png’.
- class mmseg.datasets.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]¶
Convert RGB image to grayscale image.
Required Keys:
img
Modified Keys:
img
img_shape
This transform calculates the weighted mean of the input image channels with `weights` and then expands the channels to `out_channels`. When `out_channels` is None, the number of output channels is the same as the number of input channels.
- Parameters
out_channels (int) – Expected number of output channels after transforming. Default: None.
weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).
- class mmseg.datasets.RandomCrop(crop_size: Union[int, Tuple[int, int]], cat_max_ratio: float = 1.0, ignore_index: int = 255)[source]¶
Random crop the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
img_shape
gt_seg_map
- Parameters
crop_size (Union[int, Tuple[int, int]]) – Expected size after cropping with the format of (h, w). If set to an integer, then cropping width and height are equal to this integer.
cat_max_ratio (float) – The maximum ratio that single category could occupy.
ignore_index (int) – The label index to be ignored. Default: 255
- class mmseg.datasets.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]¶
CutOut operation.
Randomly drop some regions of image used in Cutout.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
prob (float) – cutout probability.
n_holes (int | tuple[int, int]) – Number of regions to be dropped. If it is given as a list, number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].
cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.
cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.
fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).
seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.
- class mmseg.datasets.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]¶
Mosaic augmentation. Given 4 images, mosaic transform combines them into one output image. The output image is composed of the parts from each sub- image.
                 mosaic transform
                    center_x
         +------------------------------+
         |       pad        |  pad      |
         |      +-----------+           |
         |      |           |           |
         |      |  image1   |--------+  |
         |      |           |        |  |
         |      |           | image2 |  |
center_y |----+-------------+-----------|
         |    |   cropped   |           |
         |pad |   image3    |  image4   |
         |    |             |           |
         +----|-------------+-----------+
              |             |
              +-------------+
The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersection of the 4 images.
2. Get the left-top image according to the index, and randomly sample another 3 images from the custom dataset.
3. Sub images will be cropped if the image is larger than the mosaic patch.
Required Keys:
img
gt_seg_map
mix_results
Modified Keys:
img
img_shape
ori_shape
gt_seg_map
- Parameters
prob (float) – mosaic probability.
img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).
pad_val (int) – Pad value. Default: 0.
seg_pad_val (int) – Pad value of segmentation map. Default: 255.
- get_indices(dataset: mmseg.datasets.dataset_wrappers.MultiImageMixDataset) → list[source]¶
Call function to collect indices.
- Parameters
dataset (`MultiImageMixDataset`) – The dataset.
- Returns
indices.
- Return type
list
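RandomMosaic must be driven by MultiImageMixDataset, which supplies the mix_results key; a hedged config sketch (the dataset fields and the post-mosaic Resize are placeholders):

```python
train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='BaseSegDataset',
        data_root='data/my_dataset',  # placeholder path
        data_prefix=dict(img_path='img_dir/train',
                         seg_map_path='ann_dir/train'),
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
        ]),
    # The mix pipeline sees 4 samples per output image.
    pipeline=[
        dict(type='RandomMosaic', prob=1.0, img_scale=(640, 640)),
        dict(type='Resize', scale=(1280, 1280), keep_ratio=True),
        dict(type='PackSegInputs'),
    ])
```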
- class mmseg.datasets.RandomRotFlip(rotate_prob=0.5, flip_prob=0.5, degree=(- 20, 20))[source]¶
Rotate and flip the image & seg or just rotate the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
rotate_prob (float) – The probability of rotating the image.
flip_prob (float) – The probability of rotating & flipping the image.
degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of a tuple like (min, max), the range of degree will be (-degree, +degree).
- class mmseg.datasets.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]¶
Rotate the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
prob (float) – The rotation probability.
degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of a tuple like (min, max), the range of degree will be (-degree, +degree).
pad_val (float, optional) – Padding value of image. Default: 0.
seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.
auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False
- class mmseg.datasets.Rerange(min_value=0, max_value=255)[source]¶
Rerange the image pixel value.
Required Keys:
img
Modified Keys:
img
- Parameters
min_value (float or int) – Minimum value of the reranged image. Default: 0.
max_value (float or int) – Maximum value of the reranged image. Default: 255.
- class mmseg.datasets.ResizeShortestEdge(scale: Union[int, Tuple[int, int]], max_size: int)[source]¶
Resize the image and mask while keeping the aspect ratio unchanged.
Modified from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py#L130 # noqa:E501 Copyright (c) Facebook, Inc. and its affiliates. Licensed under the Apache-2.0 License
This transform attempts to scale the shorter edge to the given scale, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.
Required Keys:
img
gt_seg_map (optional)
Modified Keys:
img
img_shape
gt_seg_map (optional)
Added Keys:
scale
scale_factor
keep_ratio
- Parameters
scale (Union[int, Tuple[int, int]]) – The target short edge length. If it’s a tuple, the min value will be selected as the short edge length.
max_size (int) – The maximum allowed longest edge length.
- transform(results: Dict) → Dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in it. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.ResizeToMultiple(size_divisor=32, interpolation=None)[source]¶
Resize images & seg to multiple of divisor.
Required Keys:
img
gt_seg_map
Modified Keys:
img
img_shape
pad_shape
- Parameters
size_divisor (int) – Images and gt seg maps will be resized to a multiple of size_divisor. Default: 32.
interpolation (str, optional) – The interpolation mode of image resize. Default: None.
- class mmseg.datasets.STAREDataset(img_suffix='.png', seg_map_suffix='.ah.png', reduce_zero_label=False, **kwargs)[source]¶
STARE dataset.
In the segmentation map annotation for STARE, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to '.png' and seg_map_suffix is fixed to '.ah.png'.
- class mmseg.datasets.SegRescale(scale_factor=1)[source]¶
Rescale semantic segmentation maps.
Required Keys:
gt_seg_map
Modified Keys:
gt_seg_map
- Parameters
scale_factor (float) – The scale factor of the final output.
- class mmseg.datasets.SynapseDataset(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
Synapse dataset.
Before dataset preprocessing of Synapse, there are 13 foreground categories in total, not including background. After preprocessing, 8 foreground categories are kept while the other 5 foreground categories are handled as background. The img_suffix is fixed to '.jpg' and seg_map_suffix is fixed to '.png'.
- class mmseg.datasets.iSAIDDataset(img_suffix='.png', seg_map_suffix='_instance_color_RGB.png', ignore_index=255, **kwargs)[source]¶
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. The segmentation map annotation for the iSAID dataset includes 16 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to '.png' and seg_map_suffix is fixed to '_instance_color_RGB.png'.
transforms¶
- class mmseg.datasets.transforms.AdjustGamma(gamma=1.0)[source]¶
Using gamma correction to process the image.
Required Keys:
img
Modified Keys:
img
- Parameters
gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.
- class mmseg.datasets.transforms.Albu(transforms: List[dict], keymap: Optional[dict] = None, update_pad_shape: bool = False)[source]¶
Albumentation augmentation. Adds custom transformations from the Albumentations library. Please visit https://albumentations.readthedocs.io for more information. An example of transforms is shown after the parameter list below.
- Parameters
transforms (list[dict]) – A list of albu transformations
keymap (dict) – Contains {‘input key’:’albumentation-style key’}
update_pad_shape (bool) – Whether to update padding shape according to the output shape of the last transform
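A minimal sketch of such a transforms list (RandomBrightnessContrast and GaussNoise are standard Albumentations transform names; which transforms are available depends on the installed Albumentations version):
>>> albu_transforms = [
>>>     dict(type='RandomBrightnessContrast', brightness_limit=0.2, p=0.5),
>>>     dict(type='GaussNoise', p=0.3),
>>> ]
>>> pipeline_step = dict(type='Albu', transforms=albu_transforms)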
- albu_builder(cfg: dict) → object[source]¶
Build a callable object from a dict containing albu arguments.
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
- Returns
A callable object.
- Return type
Callable
- static mapper(d: dict, keymap: dict)[source]¶
Dictionary mapper.
Renames keys according to the provided keymap.
- Parameters
d (dict) – The old dict.
keymap (dict) – {'old_key': 'new_key'}.
- Returns
new dict.
- Return type
dict
- transform(results)[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.transforms.BioMedical3DPad(pad_shape: Tuple[int, int, int], pad_val: float = 0.0, seg_pad_val: int = 0)[source]¶
Pad the biomedical 3d image & biomedical 3d semantic segmentation maps.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Modified Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Added Keys:
pad_shape (Tuple[int, int, int]): The padded shape.
- Parameters
pad_shape (Tuple[int, int, int]) – Fixed padding size. Expected padding shape (Z, Y, X).
pad_val (float) – Padding value for biomedical image. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.
seg_pad_val (int) – Padding value for biomedical 3d semantic segmentation maps. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.
- class mmseg.datasets.transforms.BioMedical3DRandomCrop(crop_shape: Union[int, Tuple[int, int, int]], keep_foreground: bool = True)[source]¶
Crop the input patch for medical image & segmentation mask.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
- gt_seg_map (np.ndarray, optional): Biomedical semantic segmentation mask
with shape (Z, Y, X).
Modified Keys:
img
img_shape
gt_seg_map (optional)
- Parameters
crop_shape (Union[int, Tuple[int, int, int]]) – Expected size after cropping with the format of (z, y, x). If set to an integer, then cropping width and height are equal to this integer.
keep_foreground (bool) – If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the crop bounding-box. Default to True.
- crop(img: numpy.ndarray, crop_bbox: tuple) → numpy.ndarray[source]¶
Crop from img.
- Parameters
img (np.ndarray) – Original input image.
crop_bbox (tuple) – Coordinates of the cropped image.
- Returns
The cropped image.
- Return type
np.ndarray
- generate_margin(results: dict) → tuple[source]¶
Generate margin of crop bounding-box.
If keep_foreground is True, it will sample a voxel of foreground classes randomly, take it as the center of the bounding-box, and return the margin between the bounding-box and the image. If keep_foreground is False, it will return the difference between the crop shape and the image shape.
- Parameters
results (dict) – Result dict from loading pipeline.
- Returns
The margin for 3 dimensions of crop bounding-box and image.
- Return type
tuple
- random_generate_crop_bbox(margin_z: int, margin_y: int, margin_x: int) → tuple[source]¶
Randomly get a crop bounding box.
- Parameters
margin_z (int) – Margin of the crop bounding-box along the Z axis.
margin_y (int) – Margin of the crop bounding-box along the Y axis.
margin_x (int) – Margin of the crop bounding-box along the X axis.
- Returns
Coordinates of the cropped image.
- Return type
tuple
- class mmseg.datasets.transforms.BioMedical3DRandomFlip(prob: float, axes: Tuple[int, ...], swap_label_pairs: Optional[List[Tuple[int, int]]] = None)[source]¶
Flip biomedical 3D images and segmentations.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/spatial_transforms.py # noqa:E501
Copyright 2021 Division of Medical Image Computing, German Cancer Research Center (DKFZ) and Applied Computer Vision Lab, Helmholtz Imaging Platform. Licensed under the Apache-2.0 License.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Modified Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Added Keys:
do_flip
flip_axes
- Parameters
prob (float) – Flipping probability.
axes (Tuple[int, ...]) – Flipping axes with order ‘ZXY’.
swap_label_pairs (Optional[List[Tuple[int, int]]]) – The segmentation label pairs that are swapped when flipping.
- class mmseg.datasets.transforms.BioMedicalGaussianBlur(sigma_range: Tuple[float, float] = (0.5, 1.0), prob: float = 0.2, prob_per_channel: float = 0.5, different_sigma_per_channel: bool = True, different_sigma_per_axis: bool = True)[source]¶
Add Gaussian blur with random sigma to image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L81 # noqa:E501
Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
sigma_range (Tuple[float, float]|float) – range to randomly select sigma value. Default to (0.5, 1.0).
prob (float) – Probability to apply Gaussian blur for each sample. Default to 0.2.
prob_per_channel (float) – Probability to apply Gaussian blur for each channel (axis N of the image). Default to 0.5.
different_sigma_per_channel (bool) – whether to use different sigma for each channel (axis N of the image). Default to True.
different_sigma_per_axis (bool) – whether to use different sigma for axis Z, X and Y of the image. Default to True.
- class mmseg.datasets.transforms.BioMedicalGaussianNoise(prob: float = 0.1, mean: float = 0.0, std: float = 0.1)[source]¶
Add random Gaussian noise to image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L53 # noqa:E501
Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
prob (float) – Probability to add Gaussian noise for each sample. Default to 0.1.
mean (float) – Mean or “centre” of the distribution. Default to 0.0.
std (float) – Standard deviation of distribution. Default to 0.1.
- class mmseg.datasets.transforms.BioMedicalRandomGamma(prob: float = 0.5, gamma_range: Tuple[float] = (0.5, 2), invert_image: bool = False, per_channel: bool = False, retain_stats: bool = False)[source]¶
Using random gamma correction to process the biomedical image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/color_transforms.py#L132 # noqa:E501 With licence: Apache 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
prob (float) – The probability to perform this transform. Default: 0.5.
gamma_range (Tuple[float]) – Range of gamma values. Default: (0.5, 2).
invert_image (bool) – Whether to invert the image before applying gamma augmentation. Default: False.
per_channel (bool) – Whether to perform the transform on each channel individually. Default: False.
retain_stats (bool) – Gamma transformation will alter the mean and std of the data in the patch. If retain_stats=True, the data will be transformed to match the mean and standard deviation before gamma augmentation. Default: False.
- class mmseg.datasets.transforms.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]¶
Use CLAHE method to process the image.
See ZUIDERVELD,K. Contrast Limited Adaptive Histogram Equalization[J]. Graphics Gems, 1994:474-485. for more information.
Required Keys:
img
Modified Keys:
img
- Parameters
clip_limit (float) – Threshold for contrast limiting. Default: 40.0.
tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).
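For intuition, the equivalent operation in OpenCV looks as follows (an illustrative assumption; mmseg's own implementation may differ in detail):
>>> import cv2
>>> import numpy as np
>>> gray = np.random.randint(0, 256, (512, 512), dtype=np.uint8)
>>> clahe = cv2.createCLAHE(clipLimit=40.0, tileGridSize=(8, 8))
>>> out = clahe.apply(gray)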
- class mmseg.datasets.transforms.ConcatCDInput(input_keys=('img', 'img2'))[source]¶
Concat images for change detection.
Required Keys:
img
img2
- Parameters
input_keys (tuple) – Input image keys for change detection. Default: (‘img’, ‘img2’).
- transform(results: dict) → dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.transforms.GenerateEdge(edge_width: int = 3, ignore_index: int = 255)[source]¶
Generate Edge for CE2P approach.
Edge will be used to calculate loss of CE2P.
Modified from https://github.com/liutinglt/CE2P/blob/master/dataset/target_generation.py # noqa:E501
Required Keys:
img_shape
gt_seg_map
Added Keys:
- gt_edge_map (np.ndarray, uint8): The edge annotation generated from the
seg map by extracting border between different semantics.
- Parameters
edge_width (int) – The width of edge. Default to 3.
ignore_index (int) – Index that will be ignored. Default to 255.
- class mmseg.datasets.transforms.LoadAnnotations(reduce_zero_label=None, backend_args=None, imdecode_backend='pillow')[source]¶
Load annotations for semantic segmentation provided by dataset.
The annotation format is as the following:
{
    # Filename of semantic segmentation ground truth file.
    'seg_map_path': 'a/b/c'
}
After this module, the annotation has been changed to the format below:
{
    # In str.
    'seg_fields': List
    # In uint8 type.
    'gt_seg_map': np.ndarray (H, W)
}
Required Keys:
seg_map_path (str): Path of semantic segmentation ground truth file.
Added Keys:
seg_fields (List)
gt_seg_map (np.uint8)
- Parameters
reduce_zero_label (bool, optional) – Whether reduce all label value by 1. Usually used for datasets where 0 is background label. Defaults to None.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to 'pillow'.
backend_args (dict) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
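A typical pipeline fragment using this transform (a sketch; dataset-specific settings such as reduce_zero_label will vary):
>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations', reduce_zero_label=True),
>>> ]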
- class mmseg.datasets.transforms.LoadBiomedicalAnnotation(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]¶
Load seg_map annotation provided by a biomedical dataset.
The annotation format is as the following:
{ 'gt_seg_map': np.ndarray (X, Y, Z) or (Z, Y, X) }
Required Keys:
seg_map_path
Added Keys:
- gt_seg_map (np.ndarray): Biomedical seg map with shape (Z, Y, X) by
default, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.
- Parameters
decode_backend (str) – The data decoding backend type. Options are 'numpy' and 'nifti', and there is a convention that when the backend is 'nifti' the axis order of the loaded data is XYZ, and when the backend is 'numpy', the axis order is ZYX. The data will be transposed if the backend is 'nifti'. Defaults to 'nifti'.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
to_float32 (bool) – Whether to convert the loaded seg map to a float32 numpy array. If set to False, the loaded image is an float64 array. Defaults to True.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See mmengine.fileio for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.transforms.LoadBiomedicalData(with_seg=False, decode_backend: str = 'numpy', to_xyz: bool = False, backend_args: Optional[dict] = None)[source]¶
Load a biomedical image and annotation from file.
The loading data format is as the following:
{
    'img': np.ndarray data[:-1, X, Y, Z]
    'seg_map': np.ndarray data[-1, X, Y, Z]
}
Required Keys:
img_path
Added Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
img_shape
ori_shape
- Parameters
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.
decode_backend (str) – The data decoding backend type. Options are 'numpy' and 'nifti', and there is a convention that when the backend is 'nifti' the axis order of the loaded data is XYZ, and when the backend is 'numpy', the axis order is ZYX. The data will be transposed if the backend is 'nifti'. Defaults to 'numpy'.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.transforms.LoadBiomedicalImageFromFile(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]¶
Load a biomedical image from file.
Required Keys:
img_path
Added Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.
img_shape
ori_shape
- Parameters
decode_backend (str) – The data decoding backend type. Options are 'numpy' and 'nifti', and there is a convention that when the backend is 'nifti' the axis order of the loaded data is XYZ, and when the backend is 'numpy', the axis order is ZYX. The data will be transposed if the backend is 'nifti'. Defaults to 'nifti'.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an float64 array. Defaults to True.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.transforms.LoadDepthAnnotation(decode_backend: str = 'cv2', to_float32: bool = True, depth_rescale_factor: float = 1.0, backend_args: Optional[dict] = None)[source]¶
Load depth_map annotation provided by a depth estimation dataset.
The annotation format is as the following:
{ 'gt_depth_map': np.ndarray [Y, X] }
Required Keys:
seg_depth_path
Added Keys:
- gt_depth_map (np.ndarray): Depth map with shape (Y, X) by
default, and data type is float32 if set to_float32 = True.
- depth_rescale_factor (float): The rescale factor of depth map, which
can be used to recover the original value of depth map.
- Parameters
decode_backend (str) – The data decoding backend type. Options are ‘numpy’, ‘nifti’, and ‘cv2’. Defaults to ‘cv2’.
to_float32 (bool) – Whether to convert the loaded depth map to a float32 numpy array. If set to False, the loaded image is an uint16 array. Defaults to True.
depth_rescale_factor (float) – Factor to rescale the depth value to limit the range. Defaults to 1.0.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See
mmengine.fileio
for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.transforms.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]¶
Load an image from results['img'].
Similar with LoadImageFromFile, but the image has been loaded as np.ndarray in results['img']. Can be used when loading an image from a webcam.
Required Keys:
img
Modified Keys:
img
img_path
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
- class mmseg.datasets.transforms.LoadMultipleRSImageFromFile(to_float32: bool = True)[source]¶
Load two remote sensing images from file.
Required Keys:
img_path
img_path2
Modified Keys:
img
img2
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.
- class mmseg.datasets.transforms.LoadSingleRSImageFromFile(to_float32: bool = True)[source]¶
Load a remote sensing image from file.
Required Keys:
img_path
Modified Keys:
img
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.
- class mmseg.datasets.transforms.PackSegInputs(meta_keys=('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label'))[source]¶
Pack the inputs data for the semantic segmentation.
The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default this includes:
img_path: filename of the image.
ori_shape: original shape of the image as a tuple (h, w, c).
img_shape: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
pad_shape: shape of padded images.
scale_factor: a float indicating the preprocessing scale.
flip: a boolean indicating if image flip transform was used.
flip_direction: the flipping direction.
- Parameters
meta_keys (Sequence[str], optional) – Meta keys to be packed from SegDataSample and collected in data[img_metas]. Default: ('img_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction').
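PackSegInputs is usually the final step of a pipeline, packing the image and annotations produced by earlier transforms into a SegDataSample; a minimal sketch:
>>> train_pipeline = [
>>>     dict(type='LoadImageFromFile'),
>>>     dict(type='LoadAnnotations'),
>>>     dict(type='RandomFlip', prob=0.5),
>>>     dict(type='PackSegInputs'),
>>> ]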
- class mmseg.datasets.transforms.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[float] = (0.5, 1.5), saturation_range: Sequence[float] = (0.5, 1.5), hue_delta: int = 18)[source]¶
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. The position of random contrast is second or second-to-last.
random brightness
random contrast (mode 0)
convert color from BGR to HSV
random saturation
random hue
convert color from HSV to BGR
random contrast (mode 1)
Required Keys:
img
Modified Keys:
img
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
- brightness(img: numpy.ndarray) → numpy.ndarray[source]¶
Brightness distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after brightness change.
- Return type
np.ndarray
- contrast(img: numpy.ndarray) → numpy.ndarray[source]¶
Contrast distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after contrast change.
- Return type
np.ndarray
- convert(img: numpy.ndarray, alpha: int = 1, beta: int = 0) → numpy.ndarray[source]¶
Multiply with alpha and add beta, with clip.
- Parameters
img (np.ndarray) – The input image.
alpha (int) – Image weights, change the contrast/saturation of the image. Default: 1
beta (int) – Image bias, change the brightness of the image. Default: 0
- Returns
The transformed image.
- Return type
np.ndarray
- hue(img: numpy.ndarray) → numpy.ndarray[source]¶
Hue distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after hue change.
- Return type
np.ndarray
- class mmseg.datasets.transforms.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]¶
Convert RGB image to grayscale image.
Required Keys:
img
Modified Keys:
img
img_shape
This transform calculates the weighted mean of the input image channels with weights and then expands the channels to out_channels. When out_channels is None, the number of output channels is the same as the number of input channels.
- Parameters
out_channels (int) – Expected number of output channels after transforming. Default: None.
weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).
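A worked sketch of the weighted mean described above (illustration only, not the exact mmseg code):
>>> import numpy as np
>>> img = np.random.rand(4, 4, 3)
>>> weights = np.array((0.299, 0.587, 0.114))
>>> gray = (img * weights).sum(axis=2, keepdims=True)  # weighted mean over channels
>>> out = np.tile(gray, (1, 1, 3))  # expand back to out_channels=3
>>> out.shape
(4, 4, 3)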
- class mmseg.datasets.transforms.RandomCrop(crop_size: Union[int, Tuple[int, int]], cat_max_ratio: float = 1.0, ignore_index: int = 255)[source]¶
Random crop the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
img_shape
gt_seg_map
- Parameters
crop_size (Union[int, Tuple[int, int]]) – Expected size after cropping with the format of (h, w). If set to an integer, then cropping width and height are equal to this integer.
cat_max_ratio (float) – The maximum ratio that a single category may occupy.
ignore_index (int) – The label index to be ignored. Default: 255
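A typical configuration sketch (the values shown are a plausible example, e.g. for Cityscapes-style training, not mandated defaults):
>>> crop = dict(type='RandomCrop', crop_size=(512, 1024), cat_max_ratio=0.75)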
- class mmseg.datasets.transforms.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]¶
CutOut operation.
Randomly drop some regions of the image, as used in Cutout.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
prob (float) – cutout probability.
n_holes (int | tuple[int, int]) – Number of regions to be dropped. If it is given as a list, number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].
cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.
cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.
fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).
seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.
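A sketch illustrating that exactly one of cutout_shape and cutout_ratio may be given:
>>> cutout = dict(type='RandomCutOut', prob=0.5, n_holes=(1, 3),
>>>               cutout_ratio=[(0.1, 0.1), (0.2, 0.2)])  # ratio form
>>> # alternatively pass cutout_shape=[(32, 32), (64, 64)] instead of cutout_ratio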
- class mmseg.datasets.transforms.RandomDepthMix(prob: float = 0.25, mix_scale_ratio: float = 0.75)[source]¶
This class implements the RandomDepthMix transform.
- Parameters
prob (float) – Probability of applying the transformation. Defaults to 0.25.
mix_scale_ratio (float) – Ratio to scale the mix width. Defaults to 0.75.
- transform(results: dict) → dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.transforms.RandomFlip(prob: Optional[Union[float, Iterable[float]]] = None, direction: Union[str, Sequence[Optional[str]]] = 'horizontal', swap_seg_labels: Optional[Sequence] = None)[source]¶
Flip the image & bbox & segmentation map. Added or Updated keys: flip, flip_direction, img, gt_bboxes, gt_seg_map, and gt_depth_map. There are 3 flip modes:
- prob is float, direction is string: the image will be flipped in the given direction with a probability of prob. E.g., prob=0.5, direction='horizontal', then the image will be horizontally flipped with a probability of 0.5.
- prob is float, direction is list of string: the image will be flipped in direction[i] with a probability of prob/len(direction). E.g., prob=0.5, direction=['horizontal', 'vertical'], then the image will be horizontally flipped with a probability of 0.25 and vertically flipped with a probability of 0.25.
- prob is list of float, direction is list of string: given len(prob) == len(direction), the image will be flipped in direction[i] with a probability of prob[i]. E.g., prob=[0.3, 0.5], direction=['horizontal', 'vertical'], then the image will be horizontally flipped with a probability of 0.3 and vertically flipped with a probability of 0.5.
Required Keys:
img
gt_bboxes (optional)
gt_seg_map (optional)
gt_depth_map (optional)
Modified Keys:
img
gt_bboxes (optional)
gt_seg_map (optional)
gt_depth_map (optional)
Added Keys:
flip
flip_direction
swap_seg_labels (optional)
- Parameters
prob (float | list[float], optional) – The flipping probability. Defaults to None.
direction (str | list[str]) – The flipping direction. If the input is a list, the length must equal prob. Each element in prob indicates the flip probability of the corresponding direction. Defaults to 'horizontal'.
swap_seg_labels (list, optional) – The label pairs that need to be swapped for the ground truth, like 'left arm' and 'right arm' which need to be swapped after horizontal flipping. For example, [(1, 5)], where 1/5 is the label of the left/right arm. Defaults to None.
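A sketch of the third flip mode described above, flipping horizontally with probability 0.3 and vertically with probability 0.5:
>>> flip = dict(type='RandomFlip', prob=[0.3, 0.5],
>>>             direction=['horizontal', 'vertical'])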
- class mmseg.datasets.transforms.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]¶
Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image. The output image is composed of parts from each sub-image.

                    mosaic transform
                       center_x
            +------------------------------+
            |       pad        |  pad      |
            |      +-----------+           |
            |      |           |           |
            |      |  image1   |--------+  |
            |      |           |        |  |
            |      |           | image2 |  |
 center_y   |----+-------------+-----------|
            |    |   cropped   |           |
            |pad |   image3    |  image4   |
            |    |             |           |
            +----|-------------+-----------+
                 |             |
                 +-------------+

The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersection of the 4 images.
2. Get the left-top image according to the index, and randomly sample another 3 images from the custom dataset.
3. The sub-image will be cropped if it is larger than the mosaic patch.
Required Keys:
img
gt_seg_map
mix_results
Modified Keys:
img
img_shape
ori_shape
gt_seg_map
- Parameters
prob (float) – mosaic probability.
img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).
pad_val (int) – Pad value. Default: 0.
seg_pad_val (int) – Pad value of segmentation map. Default: 255.
- get_indices(dataset: mmseg.datasets.dataset_wrappers.MultiImageMixDataset) → list[source]¶
Call function to collect indices.
- Parameters
dataset (MultiImageMixDataset) – The dataset.
- Returns
indices.
- Return type
list
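Because get_indices samples the extra images from a MultiImageMixDataset, RandomMosaic is intended to be used through that dataset wrapper. A minimal sketch (the wrapped dataset type and its arguments are placeholders):
>>> train_dataset = dict(
>>>     type='MultiImageMixDataset',
>>>     dataset=dict(type='CityscapesDataset'),  # plus dataset-specific args (placeholder)
>>>     pipeline=[dict(type='RandomMosaic', prob=1.0, img_scale=(640, 640))])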
- class mmseg.datasets.transforms.RandomRotFlip(rotate_prob=0.5, flip_prob=0.5, degree=(- 20, 20))[source]¶
Rotate and flip the image & seg or just rotate the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
rotate_prob (float) – The probability of rotating the image.
flip_prob (float) – The probability of rotating and flipping the image.
degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of a tuple like (min, max), the range of degrees will be (-degree, +degree).
- class mmseg.datasets.transforms.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]¶
Rotate the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
prob (float) – The rotation probability.
degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of a tuple like (min, max), the range of degrees will be (-degree, +degree).
pad_val (float, optional) – Padding value of image. Default: 0.
seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.
auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False
- class mmseg.datasets.transforms.Rerange(min_value=0, max_value=255)[source]¶
Rerange the image pixel value.
Required Keys:
img
Modified Keys:
img
- Parameters
min_value (float or int) – Minimum value of the reranged image. Default: 0.
max_value (float or int) – Maximum value of the reranged image. Default: 255.
- class mmseg.datasets.transforms.Resize(scale: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, clip_object_border: bool = True, backend: str = 'cv2', interpolation='bilinear')[source]¶
Resize images & seg & depth map.
This transform resizes the input image according to scale or scale_factor. Seg map, depth map and other relative annotations are then resized with the same scale factor. If scale and scale_factor are both set, it will use scale to resize.
Required Keys:
img
gt_seg_map (optional)
gt_depth_map (optional)
Modified Keys:
img
gt_seg_map
gt_depth_map
Added Keys:
scale
scale_factor
keep_ratio
- Parameters
scale (int or tuple) – Image scales for resizing. Defaults to None.
scale_factor (float or tuple[float]) – Scale factors for resizing. Defaults to None.
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Defaults to False.
clip_object_border (bool) – Whether to clip the objects outside the border of the image. In some dataset like MOT17, the gt bboxes are allowed to cross the border of images. Therefore, we don’t need to clip the gt bboxes in these cases. Defaults to True.
backend (str) – Image resize backend, choices are 'cv2' and 'pillow'. These two backends generate slightly different results. Defaults to 'cv2'.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend. Defaults to ‘bilinear’.
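A typical configuration sketch (the scale shown is a common choice in mmseg configs, not a required value):
>>> resize = dict(type='Resize', scale=(2048, 512), keep_ratio=True)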
- class mmseg.datasets.transforms.ResizeShortestEdge(scale: Union[int, Tuple[int, int]], max_size: int)[source]¶
Resize the image and mask while keeping the aspect ratio unchanged.
Modified from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py#L130 # noqa:E501 Copyright (c) Facebook, Inc. and its affiliates. Licensed under the Apache-2.0 License
This transform attempts to scale the shorter edge to the given scale, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.
Required Keys:
img
gt_seg_map (optional)
Modified Keys:
img
img_shape
gt_seg_map (optional)
Added Keys:
scale
scale_factor
keep_ratio
- Parameters
scale (Union[int, Tuple[int, int]]) – The target short edge length. If it’s tuple, will select the min value as the short edge length.
max_size (int) – The maximum allowed longest edge length.
- transform(results: Dict) → Dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.transforms.ResizeToMultiple(size_divisor=32, interpolation=None)[source]¶
Resize images & seg to multiple of divisor.
Required Keys:
img
gt_seg_map
Modified Keys:
img
img_shape
pad_shape
- Parameters
size_divisor (int) – Images and gt seg maps will be resized to a multiple of size_divisor. Default: 32.
interpolation (str, optional) – The interpolation mode of image resize. Default: None.
mmseg.engine¶
hooks¶
- class mmseg.engine.hooks.SegVisualizationHook(draw: bool = False, interval: int = 50, show: bool = False, wait_time: float = 0.0, backend_args: Optional[dict] = None)[source]¶
Segmentation Visualization Hook. Used to visualize validation and testing process prediction results.
In the testing phase:
- If show is True, it means that only the prediction results are visualized without storing data, so vis_backends needs to be excluded.
- Parameters
draw (bool) – whether to draw prediction results. If it is False, it means that no drawing will be done. Defaults to False.
interval (int) – The interval of visualization. Defaults to 50.
show (bool) – Whether to display the drawn image. Default to False.
wait_time (float) – The interval of show (s). Defaults to 0.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
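The hook is typically enabled through the default_hooks config; a minimal sketch:
>>> default_hooks = dict(
>>>     visualization=dict(type='SegVisualizationHook', draw=True, interval=50))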
optimizers¶
- class mmseg.engine.optimizers.ForceDefaultOptimWrapperConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]¶
Default constructor with forced optimizer settings.
This constructor extends the default constructor to add an option for forcing default optimizer settings. This is useful for ensuring that certain parameters or layers strictly adhere to pre-defined default settings, regardless of any custom settings specified.
By default, each parameter shares the same optimizer settings, and we provide an argument paramwise_cfg to specify parameter-wise settings. It is a dict and may contain various fields like 'custom_keys', 'bias_lr_mult', etc., as well as the additional field force_default_settings which allows enforcing default settings on optimizer parameters.
- custom_keys (dict): Specifies parameter-wise settings by keys. If one of the keys in custom_keys is a substring of the name of a parameter, then the setting of the parameter will be specified by custom_keys[key] and other settings like bias_lr_mult etc. will be ignored. It should be noted that the aforementioned key is the longest key that is a substring of the name of the parameter. If there are multiple matched keys with the same length, then the key with the lower alphabetical order will be chosen. custom_keys[key] should be a dict and may contain fields lr_mult and decay_mult. See Example 2 below.
- bias_lr_mult (float): It will be multiplied to the learning rate for all bias parameters (except for those in normalization layers and offset layers of DCN).
- bias_decay_mult (float): It will be multiplied to the weight decay for all bias parameters (except for those in normalization layers, depthwise conv layers, and offset layers of DCN).
- norm_decay_mult (float): It will be multiplied to the weight decay for all weight and bias parameters of normalization layers.
- flat_decay_mult (float): It will be multiplied to the weight decay for all one-dimensional parameters.
- dwconv_decay_mult (float): It will be multiplied to the weight decay for all weight and bias parameters of depthwise conv layers.
- dcn_offset_lr_mult (float): It will be multiplied to the learning rate for parameters of the offset layer in the deformable convs of a model.
- bypass_duplicate (bool): If true, duplicate parameters will not be added into the optimizer. Defaults to False.
- force_default_settings (bool): If true, this will override any custom settings defined by custom_keys and enforce the use of default settings for optimizer parameters like bias_lr_mult. This is particularly useful when you want to ensure that certain layers or parameters adhere strictly to the pre-defined default settings.
Note
1. If the option dcn_offset_lr_mult is used, the constructor will override the effect of bias_lr_mult in the bias of the offset layer. So be careful when using both bias_lr_mult and dcn_offset_lr_mult. If you wish to apply both of them to the offset layer in deformable convs, set dcn_offset_lr_mult to the original dcn_offset_lr_mult * bias_lr_mult.
2. If the option dcn_offset_lr_mult is used, the constructor will apply it to all the DCN layers in the model. So be careful when the model contains multiple DCN layers in places other than the backbone.
3. When the option force_default_settings is true, it will override any custom settings provided in custom_keys. This ensures that the default settings for the optimizer parameters are used.
- Parameters
optim_wrapper_cfg (dict) – The config dict of the optimizer wrapper. Required fields of optim_wrapper_cfg are:
- type: class name of the OptimizerWrapper.
- optimizer: the configuration of the optimizer.
Optional fields of optim_wrapper_cfg are any arguments of the corresponding optimizer wrapper type, e.g., accumulative_counts, clip_grad, etc. Required fields of optimizer are type: class name of the optimizer. Optional fields of optimizer are any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.
paramwise_cfg (dict, optional) – Parameter-wise options.
- Example 1:
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optim_wrapper_cfg = dict(
>>>     type='OptimWrapper',
>>>     optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001))
>>> paramwise_cfg = dict(norm_decay_mult=0.)
>>> optim_wrapper_builder = DefaultOptimWrapperConstructor(
>>>     optim_wrapper_cfg, paramwise_cfg)
>>> optim_wrapper = optim_wrapper_builder(model)
- Example 2:
>>> # assume model has attributes model.backbone and model.cls_head
>>> optim_wrapper_cfg = dict(type='OptimWrapper', optimizer=dict(
>>>     type='SGD', lr=0.01, weight_decay=0.95))
>>> paramwise_cfg = dict(custom_keys={
>>>     'backbone': dict(lr_mult=0.1, decay_mult=0.9)})
>>> optim_wrapper_builder = DefaultOptimWrapperConstructor(
>>>     optim_wrapper_cfg, paramwise_cfg)
>>> optim_wrapper = optim_wrapper_builder(model)
>>> # Then the `lr` and `weight_decay` for model.backbone is
>>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for
>>> # model.cls_head is (0.01, 0.95).
- add_params(params: List[dict], module: torch.nn.modules.module.Module, prefix: str = '', is_dcn_module: Optional[Union[int, float]] = None) → None[source]¶
Add all parameters of module to the params list.
The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.
- Parameters
params (list[dict]) – A list of param groups, it will be modified in place.
module (nn.Module) – The module to be added.
prefix (str) – The prefix of the module
is_dcn_module (int|float|None) – If the current module is a submodule of DCN, is_dcn_module will be passed to control conv_offset layer’s learning rate. Defaults to None.
- class mmseg.engine.optimizers.LayerDecayOptimizerConstructor(optim_wrapper_cfg, paramwise_cfg)[source]¶
Different learning rates are set for different layers of backbone.
Note: Currently, this optimizer constructor is built for BEiT, and it will be deprecated. Please use LearningRateDecayOptimizerConstructor instead.
- class mmseg.engine.optimizers.LearningRateDecayOptimizerConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]¶
Different learning rates are set for different layers of backbone.
Note: Currently, this optimizer constructor is built for ConvNeXt, BEiT and MAE.
- add_params(params, module, **kwargs)[source]¶
Add all parameters of module to the params list.
The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.
- Parameters
params (list[dict]) – A list of param groups, it will be modified in place.
module (nn.Module) – The module to be added.
mmseg.evaluation¶
metrics¶
- class mmseg.evaluation.metrics.CityscapesMetric(output_dir: str, ignore_index: int = 255, format_only: bool = False, keep_results: bool = False, collect_device: str = 'cpu', prefix: Optional[str] = None, **kwargs)[source]¶
Cityscapes evaluation metric.
- Parameters
output_dir (str) – The directory for output prediction
ignore_index (int) – Index that will be ignored in evaluation. Default: 255.
format_only (bool) – Only format the results for submission without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.
keep_results (bool) – Whether to keep the results. When format_only is True, keep_results must be True. Defaults to False.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be 'cpu' or 'gpu'. Defaults to 'cpu'.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
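A sketch of a test-server submission setup, where predictions are only formatted and kept on disk (the output path is a placeholder):
>>> test_evaluator = dict(
>>>     type='CityscapesMetric',
>>>     output_dir='work_dirs/format_results',  # placeholder path
>>>     format_only=True,
>>>     keep_results=True)  # must be True when format_only=True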
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – Testing results of the dataset.
- Returns
Cityscapes evaluation results.
- Return type
dict[str, float]
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data and data_samples.
The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
- class mmseg.evaluation.metrics.DepthMetric(depth_metrics: Optional[List[str]] = None, min_depth_eval: float = 0.0, max_depth_eval: float = inf, crop_type: Optional[str] = None, depth_scale_factor: float = 1.0, collect_device: str = 'cpu', output_dir: Optional[str] = None, format_only: bool = False, prefix: Optional[str] = None, **kwargs)[source]¶
Depth estimation evaluation metric.
- Parameters
depth_metrics (List[str], optional) – List of metrics to compute. If not specified, defaults to all metrics in self.METRICS.
min_depth_eval (float) – Minimum depth value for evaluation. Defaults to 0.0.
max_depth_eval (float) – Maximum depth value for evaluation. Defaults to infinity.
crop_type (str, optional) – Specifies the type of cropping to be used during evaluation. This option can affect how the evaluation mask is generated. Currently, ‘nyu_crop’ is supported, but other types can be added in future. Defaults to None if no cropping should be applied.
depth_scale_factor (float) – Factor to scale the depth values. Defaults to 1.0.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
output_dir (str) – The directory for output prediction. Defaults to None.
format_only (bool) – Only format the results for submission without performing evaluation. It is useful when you want to save the result to a specific format and submit it to the test server. Defaults to False.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. The keys are identical with self.metrics.
- Return type
Dict[str, float]
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data and data_samples.
The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
- class mmseg.evaluation.metrics.IoUMetric(ignore_index: int = 255, iou_metrics: List[str] = ['mIoU'], nan_to_num: Optional[int] = None, beta: int = 1, collect_device: str = 'cpu', output_dir: Optional[str] = None, format_only: bool = False, prefix: Optional[str] = None, **kwargs)[source]¶
IoU evaluation metric.
- Parameters
ignore_index (int) – Index that will be ignored in evaluation. Default: 255.
iou_metrics (list[str] | str) – Metrics to be calculated; the options include 'mIoU', 'mDice' and 'mFscore'.
nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.
beta (int) – Determines the weight of recall in the combined score. Default: 1.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
output_dir (str) – The directory for output prediction. Defaults to None.
format_only (bool) – Only format the results for submission without performing evaluation. It is useful when you want to save the result to a specific format and submit it to the test server. Defaults to False.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
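A typical evaluator configuration sketch:
>>> val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mFscore'])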
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. The keys mainly include aAcc, mIoU, mAcc, mDice, mFscore, mPrecision, mRecall.
- Return type
Dict[str, float]
- static intersect_and_union(pred_label: torch._VariableFunctionsClass.tensor, label: torch._VariableFunctionsClass.tensor, num_classes: int, ignore_index: int)[source]¶
Calculate Intersection and Union.
- Parameters
pred_label (torch.tensor) – Prediction segmentation map or predict result filename. The shape is (H, W).
label (torch.tensor) – Ground truth segmentation map or label filename. The shape is (H, W).
num_classes (int) – Number of categories.
ignore_index (int) – Index that will be ignored in evaluation.
- Returns
torch.Tensor: The intersection of prediction and ground truth histograms on all classes.
torch.Tensor: The union of prediction and ground truth histograms on all classes.
torch.Tensor: The prediction histogram on all classes.
torch.Tensor: The ground truth histogram on all classes.
- Return type
torch.Tensor
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data and data_samples.
The processed results should be stored in self.results, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
- static total_area_to_metrics(total_area_intersect: numpy.ndarray, total_area_union: numpy.ndarray, total_area_pred_label: numpy.ndarray, total_area_label: numpy.ndarray, metrics: List[str] = ['mIoU'], nan_to_num: Optional[int] = None, beta: int = 1)[source]¶
Calculate evaluation metrics.
- Parameters
total_area_intersect (np.ndarray) – The intersection of prediction and ground truth histogram on all classes.
total_area_union (np.ndarray) – The union of prediction and ground truth histogram on all classes.
total_area_pred_label (np.ndarray) – The prediction histogram on all classes.
total_area_label (np.ndarray) – The ground truth histogram on all classes.
metrics (List[str] | str) – Metrics to be evaluated, ‘mIoU’ and ‘mDice’.
nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.
beta (int) – Determines the weight of recall in the combined score. Default: 1.
- Returns
Per-category evaluation metrics, with shape (num_classes, ).
- Return type
Dict[str, np.ndarray]
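For intuition, the per-category IoU derived from the accumulated histograms is simply intersection divided by union; a minimal sketch with hypothetical two-class totals:
>>> import numpy as np
>>> total_area_intersect = np.array([80., 45.])
>>> total_area_union = np.array([100., 90.])
>>> iou = total_area_intersect / total_area_union  # per-category IoU
>>> iou
array([0.8, 0.5])
>>> float(iou.mean())  # mIoU
0.65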
mmseg.models¶
backbones¶
- class mmseg.models.backbones.BEiT(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, qv_bias=True, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]¶
BERT Pre-Training of Image Transformers.
- Parameters
img_size (int | tuple) – Input image size. Default: 224.
patch_size (int) – The patch size. Default: 16.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – Embedding dimension. Default: 768.
num_layers (int) – Depth of transformer. Default: 12.
num_heads (int) – Number of attention heads. Default: 12.
mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.
out_indices (list | tuple | int) – Output from which stages. Default: -1.
qv_bias (bool) – Enable bias for qv if True. Default: True.
attn_drop_rate (float) – The dropout rate for the attention layer. Default: 0.0.
drop_path_rate (float) – Stochastic depth rate. Default 0.0.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
pretrained (str, optional) – Model pretrained path. Default: None.
init_values (float) – Initialize the values of BEiTAttention and FFN with learnable scaling.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- resize_rel_pos_embed(checkpoint)[source]¶
Resize relative pos_embed weights.
This function is modified from https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_custom/checkpoint.py. # noqa: E501 Copyright (c) Microsoft Corporation Licensed under the MIT License.
- Parameters
checkpoint (dict) – Key and value of the pretrained model.
- Returns
Interpolate the relative pos_embed weights in the pre-trained model to the current model size.
- Return type
state_dict (dict)
- train(mode=True)[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmseg.models.backbones.BiSeNetV1(backbone_cfg, in_channels=3, spatial_channels=(64, 64, 64, 128), context_channels=(128, 256, 512), out_indices=(0, 1, 2), align_corners=False, out_channels=256, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
BiSeNetV1 backbone.
This backbone is the implementation of BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation.
- Parameters
backbone_cfg (dict) – Config of the backbone of Context Path.
in_channels (int) – The number of channels of input image. Default: 3.
spatial_channels (Tuple[int]) – Size of channel numbers of various layers in Spatial Path. Default: (64, 64, 64, 128).
context_channels (Tuple[int]) – Size of channel numbers of various modules in Context Path. Default: (128, 256, 512).
out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2).
align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.
out_channels (int) – The number of channels of output. It must be the same with in_channels of decode_head. Default: 256.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.BiSeNetV2(in_channels=3, detail_channels=(64, 64, 128), semantic_channels=(16, 32, 64, 128), semantic_expansion_ratio=6, bga_channels=128, out_indices=(0, 1, 2, 3, 4), align_corners=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
BiSeNetV2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation.
This backbone is the implementation of BiSeNetV2.
- Parameters
in_channels (int) – Number of channel of input image. Default: 3.
detail_channels (Tuple[int], optional) – Channels of each stage in Detail Branch. Default: (64, 64, 128).
semantic_channels (Tuple[int], optional) – Channels of each stage in Semantic Branch. Default: (16, 32, 64, 128). See Table 1 and Figure 3 of paper for more details.
semantic_expansion_ratio (int, optional) – The expansion factor expanding channel number of middle channels in Semantic Branch. Default: 6.
bga_channels (int, optional) – Number of middle channels in Bilateral Guided Aggregation Layer. Default: 128.
out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2, 3, 4).
align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.
conv_cfg (dict | None) – Config of conv layers. Default: None.
norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’).
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
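Example
A minimal illustrative sketch, assuming the default configuration; the printed shapes are the ones the defaults should produce for a 512x1024 input, not verified output.
>>> import torch
>>> from mmseg.models.backbones import BiSeNetV2
>>> self = BiSeNetV2()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 512, 1024)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 128, 64, 128)
(1, 16, 128, 256)
(1, 32, 64, 128)
(1, 64, 32, 64)
(1, 128, 16, 32)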
- class mmseg.models.backbones.CGNet(in_channels=3, num_channels=(32, 64, 128), num_blocks=(3, 21), dilations=(2, 4), reductions=(8, 16), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'PReLU'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
CGNet backbone.
This backbone is the implementation of A Light-weight Context Guided Network for Semantic Segmentation.
- Parameters
in_channels (int) – Number of input image channels. Normally 3.
num_channels (tuple[int]) – Numbers of feature channels at each stage. Default: (32, 64, 128).
num_blocks (tuple[int]) – Numbers of CG blocks at stage 1 and stage 2. Default: (3, 21).
dilations (tuple[int]) – Dilation rate for surrounding context extractors at stage 1 and stage 2. Default: (2, 4).
reductions (tuple[int]) – Reductions for global context extractors at stage 1 and stage 2. Default: (8, 16).
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’, requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’PReLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.DDRNet(in_channels: int = 3, channels: int = 32, ppm_channels: int = 128, align_corners: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'requires_grad': True, 'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
DDRNet backbone.
This backbone is the implementation of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. Modified from https://github.com/ydhongHIT/DDRNet.
- Parameters
in_channels (int) – Number of input image channels. Default: 3.
channels (int) – The base channels of DDRNet. Default: 32.
ppm_channels (int) – The channels of PPM module. Default: 128.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’, requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).
init_cfg (dict, optional) – Initialization config dict. Default: None.
- class mmseg.models.backbones.ERFNet(in_channels=3, enc_downsample_channels=(16, 64, 128), enc_stage_non_bottlenecks=(5, 8), enc_non_bottleneck_dilations=(2, 4, 8, 16), enc_non_bottleneck_channels=(64, 128), dec_upsample_channels=(64, 16), dec_stages_non_bottleneck=(2, 2), dec_non_bottleneck_channels=(64, 16), dropout_ratio=0.1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
ERFNet backbone.
This backbone is the implementation of ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation.
- Parameters
in_channels (int) – The number of channels of input image. Default: 3.
enc_downsample_channels (Tuple[int]) – Size of channel numbers of various Downsampler block in encoder. Default: (16, 64, 128).
enc_stage_non_bottlenecks (Tuple[int]) – Number of stages of Non-bottleneck block in encoder. Default: (5, 8).
enc_non_bottleneck_dilations (Tuple[int]) – Dilation rate of each stage of Non-bottleneck block of encoder. Default: (2, 4, 8, 16).
enc_non_bottleneck_channels (Tuple[int]) – Size of channel numbers of various Non-bottleneck block in encoder. Default: (64, 128).
dec_upsample_channels (Tuple[int]) – Size of channel numbers of various Deconvolution block in decoder. Default: (64, 16).
dec_stages_non_bottleneck (Tuple[int]) – Number of stages of Non-bottleneck block in decoder. Default: (2, 2).
dec_non_bottleneck_channels (Tuple[int]) – Size of channel numbers of various Non-bottleneck block in decoder. Default: (64, 16).
dropout_ratio (float) – Probability of an element to be zeroed. Default: 0.1.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.FastSCNN(in_channels=3, downsample_dw_channels=(32, 48), global_in_channels=64, global_block_channels=(64, 96, 128), global_block_strides=(2, 2, 1), global_out_channels=128, higher_in_channels=64, lower_in_channels=128, fusion_out_channels=128, out_indices=(0, 1, 2), conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, dw_act_cfg=None, init_cfg=None)[source]¶
Fast-SCNN Backbone.
This backbone is the implementation of Fast-SCNN: Fast Semantic Segmentation Network.
- Parameters
in_channels (int) – Number of input image channels. Default: 3.
downsample_dw_channels (tuple[int]) – Number of output channels after the first conv layer & the second conv layer in Learning-To-Downsample (LTD) module. Default: (32, 48).
global_in_channels (int) – Number of input channels of Global Feature Extractor(GFE). Equal to number of output channels of LTD. Default: 64.
global_block_channels (tuple[int]) – Tuple of integers that describe the output channels for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (64, 96, 128).
global_block_strides (tuple[int]) – Tuple of integers that describe the strides (downsampling factors) for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (2, 2, 1).
global_out_channels (int) – Number of output channels of GFE. Default: 128.
higher_in_channels (int) – Number of input channels of the higher resolution branch in FFM. Equal to global_in_channels. Default: 64.
lower_in_channels (int) – Number of input channels of the lower resolution branch in FFM. Equal to global_out_channels. Default: 128.
fusion_out_channels (int) – Number of output channels of FFM. Default: 128.
out_indices (tuple) – Tuple of indices of list [higher_res_features, lower_res_features, fusion_output]. Often set to (0,1,2) to enable aux. heads. Default: (0, 1, 2).
conv_cfg (dict | None) – Config of conv layers. Default: None
norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’)
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’)
align_corners (bool) – align_corners argument of F.interpolate. Default: False
dw_act_cfg (dict) – In DepthwiseSeparableConvModule, activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
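Example
A minimal illustrative sketch, assuming the default configuration. The three outputs should be [higher_res_features (1/8), lower_res_features (1/32), fusion_output (1/8)]; the printed shapes are the expected ones, not verified output.
>>> import torch
>>> from mmseg.models.backbones import FastSCNN
>>> self = FastSCNN()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 512, 1024)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 64, 64, 128)
(1, 128, 16, 32)
(1, 128, 64, 128)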
- class mmseg.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, frozen_stages=- 1, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]¶
HRNet backbone.
This backbone is the implementation of High-Resolution Representations for Labeling Pixels and Regions.
- Parameters
extra (dict) –
Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:
num_modules (int): The number of HRModule in this stage.
num_branches (int): The number of branches in the HRModule.
block (str): The type of convolution block.
num_blocks (tuple): The number of blocks in each branch. The length must be equal to num_branches.
num_channels (tuple): The number of channels in each branch. The length must be equal to num_branches.
in_channels (int) – Number of input image channels. Normally 3.
conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Use BN by default.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.
multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.
pretrained (str, optional) – Model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> from mmseg.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- property norm2¶
the normalization layer named “norm2”
- Type
nn.Module
- class mmseg.models.backbones.ICNet(backbone_cfg, in_channels=3, layer_channels=(512, 2048), light_branch_middle_channels=32, psp_out_channels=512, out_channels=(64, 256, 256), pool_scales=(1, 2, 3, 6), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, init_cfg=None)[source]¶
ICNet for Real-Time Semantic Segmentation on High-Resolution Images.
This backbone is the implementation of ICNet.
- Parameters
backbone_cfg (dict) – Config dict to build backbone. Usually it is ResNet but it can also be other backbones.
in_channels (int) – The number of input image channels. Default: 3.
layer_channels (Sequence[int]) – The numbers of feature channels at layer 2 and layer 4 in ResNet. It can also be other backbones. Default: (512, 2048).
light_branch_middle_channels (int) – The number of channels of the middle layer in light branch. Default: 32.
psp_out_channels (int) – The number of channels of the output of PSP module. Default: 512.
out_channels (Sequence[int]) – The numbers of output feature channels at each branches. Default: (64, 256, 256).
pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).
conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’).
act_cfg (dict) – Dictionary to construct and config act layer. Default: dict(type=’ReLU’).
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.MAE(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]¶
VisionTransformer with support for patch embedding, used for MAE (Masked Autoencoder) pre-trained models.
- Parameters
img_size (int | tuple) – Input image size. Default: 224.
patch_size (int) – The patch size. Default: 16.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – embedding dimension. Default: 768.
num_layers (int) – depth of transformer. Default: 12.
num_heads (int) – number of attention heads. Default: 12.
mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.
out_indices (list | tuple | int) – Output from which stages. Default: -1.
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – stochastic depth rate. Default 0.0.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
init_values (float) – Initialize the values of Attention and FFN with learnable scaling. Defaults to 0.1.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- fix_init_weight()[source]¶
Rescale the initialization according to layer id.
This function is copied from https://github.com/microsoft/unilm/blob/master/beit/modeling_pretrain.py. Copyright (c) Microsoft Corporation. Licensed under the MIT License.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.MSCAN(in_channels=3, embed_dims=[64, 128, 256, 512], mlp_ratios=[4, 4, 4, 4], drop_rate=0.0, drop_path_rate=0.0, depths=[3, 4, 6, 3], num_stages=4, attention_kernel_sizes=[5, [1, 7], [1, 11], [1, 21]], attention_kernel_paddings=[2, [0, 3], [0, 5], [0, 10]], act_cfg={'type': 'GELU'}, norm_cfg={'requires_grad': True, 'type': 'SyncBN'}, pretrained=None, init_cfg=None)[source]¶
SegNeXt Multi-Scale Convolutional Attention Network (MSCAN) backbone.
This backbone is the implementation of SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. Inspiration from https://github.com/visual-attention-network/segnext.
- Parameters
in_channels (int) – The number of input channels. Defaults: 3.
embed_dims (list[int]) – Embedding dimension. Defaults: [64, 128, 256, 512].
mlp_ratios (list[int]) – Ratio of mlp hidden dim to embedding dim. Defaults: [4, 4, 4, 4].
drop_rate (float) – Dropout rate. Defaults: 0.
drop_path_rate (float) – Stochastic depth rate. Defaults: 0.
depths (list[int]) – Depths of each MSCAN stage. Default: [3, 4, 6, 3].
num_stages (int) – MSCAN stages. Default: 4.
attention_kernel_sizes (list) – Size of attention kernel in Attention Module (Figure 2(b) of original paper). Defaults: [5, [1, 7], [1, 11], [1, 21]].
attention_kernel_paddings (list) – Size of attention paddings in Attention Module (Figure 2(b) of original paper). Defaults: [2, [0, 3], [0, 5], [0, 10]].
act_cfg (dict) – Config of activation layers. Defaults: dict(type=’GELU’).
norm_cfg (dict) – Config of norm layers. Defaults: dict(type=’SyncBN’, requires_grad=True).
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- class mmseg.models.backbones.MixVisionTransformer(in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 4, 8], patch_sizes=[7, 3, 3, 3], strides=[4, 2, 2, 2], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratio=4, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, init_cfg=None, with_cp=False)[source]¶
The backbone of Segformer.
This backbone is the implementation of SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
- Parameters
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – Embedding dimension. Default: 64.
num_stages (int) – The number of stages. Default: 4.
num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].
num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 4, 8].
patch_sizes (Sequence[int]) – The patch_size of each overlapped patch embedding. Default: [7, 3, 3, 3].
strides (Sequence[int]) – The stride of each overlapped patch embedding. Default: [4, 2, 2, 2].
sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].
out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).
mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.
qkv_bias (bool) – Enable bias for qkv if True. Default: True.
drop_rate (float) – Probability of an element to be zeroed. Default 0.0
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – stochastic depth rate. Default 0.0
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
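Example
A minimal illustrative sketch, assuming the default configuration (embed_dims=64, num_heads=[1, 2, 4, 8], so stage channels should be 64/128/256/512 at strides 4/8/16/32); the printed shapes are the expected ones, not verified output.
>>> import torch
>>> from mmseg.models.backbones import MixVisionTransformer
>>> self = MixVisionTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 256, 14, 14)
(1, 512, 7, 7)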
- class mmseg.models.backbones.MobileNetV2(widen_factor=1.0, strides=(1, 2, 2, 2, 1, 2, 1), dilations=(1, 1, 1, 1, 1, 1, 1), out_indices=(1, 2, 4, 6), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
MobileNetV2 backbone.
This backbone is the implementation of MobileNetV2: Inverted Residuals and Linear Bottlenecks.
- Parameters
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
strides (Sequence[int], optional) – Strides of the first block of each layer. If not specified, the default config in arch_setting will be used.
dilations (Sequence[int]) – Dilation of each layer.
out_indices (Sequence[int] | None) – Output from which stages. Default: (1, 2, 4, 6).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- make_layer(out_channels, num_blocks, stride, dilation, expand_ratio)[source]¶
Stack InvertedResidual blocks to build a layer for MobileNetV2.
- Parameters
out_channels (int) – out_channels of block.
num_blocks (int) – Number of blocks.
stride (int) – Stride of the first block.
dilation (int) – Dilation of the first block.
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio.
- train(mode=True)[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
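Example
A minimal illustrative sketch, assuming the defaults (widen_factor=1.0, out_indices=(1, 2, 4, 6), so the selected layers should output 24/32/96/320 channels at strides 4/8/16/32); the printed shapes are the expected ones, not verified output.
>>> import torch
>>> from mmseg.models.backbones import MobileNetV2
>>> self = MobileNetV2()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 24, 56, 56)
(1, 32, 28, 28)
(1, 96, 14, 14)
(1, 320, 7, 7)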
- class mmseg.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(0, 1, 12), frozen_stages=- 1, reduction_factor=1, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
MobileNetV3 backbone.
This backbone is the improved implementation of Searching for MobileNetV3.
- Parameters
arch (str) – Architecture of MobileNetV3, from {‘small’, ‘large’}. Default: ‘small’.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
out_indices (tuple[int]) – Output from which layer. Default: (0, 1, 12).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode=True)[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
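Example
A minimal illustrative sketch, assuming the defaults (arch=‘small’, out_indices=(0, 1, 12), i.e. the stem, the first block and the final conv layer). Exact spatial strides depend on the dilation adjustments the implementation applies for segmentation, so output shapes are not shown.
>>> import torch
>>> from mmseg.models.backbones import MobileNetV3
>>> self = MobileNetV3(arch='small')
>>> self.eval()
>>> outputs = self.forward(torch.rand(1, 3, 224, 224))
>>> assert len(outputs) == 3  # one tensor per index in out_indices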
- class mmseg.models.backbones.PCPVT(in_channels=3, embed_dims=[64, 128, 256, 512], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], norm_after_stage=False, pretrained=None, init_cfg=None)[source]¶
The backbone of Twins-PCPVT.
This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.
- Parameters
in_channels (int) – Number of input channels. Default: 3.
embed_dims (list) – Embedding dimension. Default: [64, 128, 256, 512].
patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].
strides (list) – The strides. Default: [4, 2, 2, 2].
num_heads (list) – Number of attention heads per stage. Default: [1, 2, 4, 8].
mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim per stage. Default: [4, 4, 4, 4].
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool) – Enable bias for qkv if True. Default: False.
drop_rate (float) – Probability of an element to be zeroed. Default 0.
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – Stochastic depth rate. Default 0.0
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
depths (list) – Depths of each stage. Default [3, 4, 6, 3]
sr_ratios (list) – Kernel_size of conv in each Attn module in Transformer encoder layer. Default: [8, 4, 2, 1].
norm_after_stage (bool) – Add an extra norm after each stage. Default: False.
pretrained (str, optional) – Model pretrained path. Default: None.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
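Example
A minimal illustrative sketch, assuming the default configuration (embed_dims=[64, 128, 256, 512] at strides 4/8/16/32); the printed shapes are the expected ones, not verified output.
>>> import torch
>>> from mmseg.models.backbones import PCPVT
>>> self = PCPVT()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 256, 14, 14)
(1, 512, 7, 7)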
- class mmseg.models.backbones.PIDNet(in_channels: int = 3, channels: int = 64, ppm_channels: int = 96, num_stem_blocks: int = 2, num_branch_blocks: int = 3, align_corners: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, **kwargs)[source]¶
PIDNet backbone.
This backbone is the implementation of PIDNet: A Real-time Semantic Segmentation Network Inspired from PID Controller. Modified from https://github.com/XuJiacong/PIDNet.
Licensed under the MIT License.
- Parameters
in_channels (int) – The number of input channels. Default: 3.
channels (int) – The number of channels in the stem layer. Default: 64.
ppm_channels (int) – The number of channels in the PPM layer. Default: 96.
num_stem_blocks (int) – The number of blocks in the stem layer. Default: 2.
num_branch_blocks (int) – The number of blocks in the branch layer. Default: 3.
align_corners (bool) – The align_corners argument of F.interpolate. Default: False.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).
init_cfg (dict) – Config dict for initialization. Default: None.
- class mmseg.models.backbones.ResNeSt(groups=1, base_width=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]¶
ResNeSt backbone.
This backbone is the implementation of ResNeSt: Split-Attention Networks.
- Parameters
groups (int) – Number of groups of Bottleneck. Default: 1
base_width (int) – Base width of Bottleneck. Default: 4
radix (int) – Radix of SplitAttentionConv2d. Default: 2
reduction_factor (int) – Reduction factor of inter_channels in SplitAttentionConv2d. Default: 4.
avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.
kwargs (dict) – Keyword arguments for ResNet.
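Example
An illustrative sketch mirroring the ResNeXt example below, assuming depth=50 with otherwise default arguments; the printed shapes follow the same stage-channel pattern and are not verified output.
>>> from mmseg.models import ResNeSt
>>> import torch
>>> self = ResNeSt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)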
- class mmseg.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]¶
ResNeXt backbone.
This backbone is the implementation of Aggregated Residual Transformations for Deep Neural Networks.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Normally 3.
num_stages (int) – Resnet stages, normally 4.
groups (int) – Group of resnext.
base_width (int) – Base width of resnext.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
Example
>>> from mmseg.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
- class mmseg.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, multi_grid=None, contract_dilation=False, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]¶
ResNet backbone.
This backbone is the improved implementation of Deep Residual Learning for Image Recognition.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Number of stem channels. Default: 64.
base_channels (int) – Number of base channels of res layer. Default: 64.
num_stages (int) – Resnet stages, normally 4. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. Default: (0, 1, 2, 3).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – Dictionary to construct and config conv layer. When conv_cfg is None, cfg will be set to dict(type=’Conv2d’). Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
dcn (dict | None) – Dictionary to construct and config DCN conv layer. When dcn is not None, conv_cfg must be None. Default: None.
stage_with_dcn (Sequence[bool]) – Whether to set DCN conv for each stage. The length of stage_with_dcn is equal to num_stages. Default: (False, False, False, False).
plugins (list[dict]) – List of plugins for stages. Each dict contains:
cfg (dict, required): Cfg dict to build plugin.
position (str, required): Position inside block to insert plugin, options: ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.
stages (tuple[bool], optional): Stages to apply plugin; length should be same as ‘num_stages’. Default: None.
multi_grid (Sequence[int]|None) – Multi grid dilation rates of last stage. Default: None.
contract_dilation (bool) – Whether contract first dilation of each layer Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> from mmseg.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- make_stage_plugins(plugins, stage_idx)[source]¶
Make plugins for the stage_idx-th ResNet stage.
Currently we support inserting ‘context_block’, ‘empirical_attention_block’ and ‘nonlocal_block’ into backbones like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of Bottleneck.
An example of the plugins format could be:
>>> plugins = [
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3
- Suppose stage_idx=0, the structure of blocks in the stage would be:
conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2
- Suppose stage_idx=1, the structure of blocks in the stage would be:
conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2
If stages is missing, the plugin would be applied to all stages.
- Parameters
plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.
stage_idx (int) – Index of stage to build
- Returns
Plugins for current stage
- Return type
list[dict]
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- class mmseg.models.backbones.ResNetV1c(**kwargs)[source]¶
ResNetV1c variant.
Compared with the default ResNet (ResNetV1b), ResNetV1c replaces the 7x7 conv in the input stem with three 3x3 convs. For more details please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks.
- class mmseg.models.backbones.ResNetV1d(**kwargs)[source]¶
ResNetV1d variant.
Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. In addition, in the downsampling block a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.
- class mmseg.models.backbones.STDCContextPathNet(backbone_cfg, last_in_channels=(1024, 512), out_channels=128, ffm_cfg={'in_channels': 512, 'out_channels': 256, 'scale_factor': 4}, upsample_mode='nearest', align_corners=None, norm_cfg={'type': 'BN'}, init_cfg=None)[source]¶
STDCNet with Context Path. The outputs are a list of three feature maps from deep to shallow, whose heights and widths increase accordingly. The largest feature map is output to STDCHead, where the Detail Loss is computed against the detail ground truth. The other two feature maps each go through an Attention Refinement Module. Besides, the largest feature map and the last output of the Attention Refinement Modules are concatenated by the Feature Fusion Module, and the fused feature map feat_fuse is output to the decode_head. For more details, please refer to Figure 4 of the original paper.
- Parameters
backbone_cfg (dict) – Config dict for stdc backbone.
last_in_channels (tuple(int)) – The channels of the last two feature maps from the stdc backbone. Default: (1024, 512).
out_channels (int) – The channels of output feature maps. Default: 128.
ffm_cfg (dict) – Config dict for Feature Fusion Module. Default: dict(in_channels=512, out_channels=256, scale_factor=4).
upsample_mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'.
align_corners (bool | None) – align_corners argument of F.interpolate. It must be None if upsample_mode is 'nearest'. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- Returns
The tuple of output feature maps for auxiliary heads and the decoder head.
- Return type
outputs (tuple)
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.STDCNet(stdc_type, in_channels, channels, bottleneck_type, norm_cfg, act_cfg, num_convs=4, with_final_conv=False, pretrained=None, init_cfg=None)[source]¶
This backbone is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.
- Parameters
stdc_type (str) – The type of backbone structure. ‘STDCNet1’ and ‘STDCNet2’ denote the two main backbones in the paper, whose FLOPs are 813M and 1446M, respectively.
in_channels (int) – The num of input_channels.
channels (tuple[int]) – The output channels for each stage.
bottleneck_type (str) – The type of STDC Module type, the value must be ‘add’ or ‘cat’.
norm_cfg (dict) – Config dict for normalization layer.
act_cfg (dict) – The activation config for conv layers.
num_convs (int) – Numbers of conv layer at each STDC Module. Default: 4.
with_final_conv (bool) – Whether to add a conv layer at the module output. Default: False.
pretrained (str, optional) – Model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> import torch
>>> from mmseg.models.backbones import STDCNet
>>> stdc_type = 'STDCNet1'
>>> in_channels = 3
>>> channels = (32, 64, 256, 512, 1024)
>>> bottleneck_type = 'cat'
>>> inputs = torch.rand(1, 3, 1024, 2048)
>>> self = STDCNet(stdc_type, in_channels, channels, bottleneck_type,
...                norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU')).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 256, 128, 256])
outputs[1].shape = torch.Size([1, 512, 64, 128])
outputs[2].shape = torch.Size([1, 1024, 32, 64])
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.SVT(in_channels=3, embed_dims=[64, 128, 256], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4], mlp_ratios=[4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, norm_cfg={'type': 'LN'}, depths=[4, 4, 4], sr_ratios=[4, 2, 1], windiow_sizes=[7, 7, 7], norm_after_stage=True, pretrained=None, init_cfg=None)[source]¶
The backbone of Twins-SVT.
This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.
- Parameters
in_channels (int) – Number of input channels. Default: 3.
embed_dims (list) – Embedding dimension. Default: [64, 128, 256].
patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].
strides (list) – The strides. Default: [4, 2, 2, 2].
num_heads (list) – Number of attention heads per stage. Default: [1, 2, 4].
mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim per stage. Default: [4, 4, 4].
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool) – Enable bias for qkv if True. Default: False.
drop_rate (float) – Dropout rate. Default 0.
attn_drop_rate (float) – Dropout ratio of attention weight. Default 0.0
drop_path_rate (float) – Stochastic depth rate. Default 0.2.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
depths (list) – Depths of each stage. Default [4, 4, 4].
sr_ratios (list) – Kernel_size of conv in each Attn module in Transformer encoder layer. Default: [4, 2, 1].
windiow_sizes (list) – Window size of LSA. Default: [7, 7, 7].
input_features_slice (bool) – Whether to slice input features. Default: False.
norm_after_stage (bool) – Add an extra norm after each stage. Default: True.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
- class mmseg.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, pretrained=None, frozen_stages=- 1, init_cfg=None)[source]¶
Swin Transformer backbone.
This backbone is the implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Inspiration from https://github.com/microsoft/Swin-Transformer.
- Parameters
pretrain_img_size (int | tuple[int]) – The size of the input image during pretraining. Defaults: 224.
in_channels (int) – The num of input channels. Defaults: 3.
embed_dims (int) – The feature dimension. Default: 96.
patch_size (int | tuple[int]) – Patch size. Default: 4.
window_size (int) – Window size. Default: 7.
mlp_ratio (int | float) – Ratio of mlp hidden dim to embedding dim. Default: 4.
depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).
num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).
strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.
patch_norm (bool) – If add a norm layer for patch embed and patch merging. Default: True.
drop_rate (float) – Dropout rate. Defaults: 0.
attn_drop_rate (float) – Attention dropout rate. Default: 0.
drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’GELU’).
norm_cfg (dict) – Config dict for normalization layer at output of backone. Defaults: dict(type=’LN’).
with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
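Example
A minimal illustrative sketch, assuming the default configuration (embed_dims=96, depths=(2, 2, 6, 2), so stage channels should be 96/192/384/768 at strides 4/8/16/32); the printed shapes are the expected ones, not verified output.
>>> import torch
>>> from mmseg.models.backbones import SwinTransformer
>>> self = SwinTransformer()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 96, 56, 56)
(1, 192, 28, 28)
(1, 384, 14, 14)
(1, 768, 7, 7)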
- class mmseg.models.backbones.TIMMBackbone(model_name, features_only=True, pretrained=True, checkpoint_path='', in_channels=3, init_cfg=None, **kwargs)[source]¶
Wrapper to use backbones from the timm library. More details can be found in the timm documentation.
- Parameters
model_name (str) – Name of the timm model to instantiate.
features_only (bool) – Whether the model should extract only feature maps (timm’s features_only mode). Default: True.
pretrained (bool) – Load pretrained weights if True.
checkpoint_path (str) – Path of checkpoint to load after model is initialized.
in_channels (int) – Number of input image channels. Default: 3.
init_cfg (dict, optional) – Initialization config dict
**kwargs – Other timm & model specific arguments.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
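Example
A minimal illustrative sketch, assuming the timm package is installed; with features_only=True a 'resnet18' model should return one feature map per stride, and the printed shapes are the expected ones, not verified output.
>>> import torch
>>> from mmseg.models.backbones import TIMMBackbone
>>> self = TIMMBackbone(model_name='resnet18', pretrained=False)
>>> self.eval()
>>> outputs = self.forward(torch.rand(1, 3, 224, 224))
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 64, 112, 112)
(1, 64, 56, 56)
(1, 128, 28, 28)
(1, 256, 14, 14)
(1, 512, 7, 7)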
- class mmseg.models.backbones.UNet(in_channels=3, base_channels=64, num_stages=5, strides=(1, 1, 1, 1, 1), enc_num_convs=(2, 2, 2, 2, 2), dec_num_convs=(2, 2, 2, 2), downsamples=(True, True, True, True), enc_dilations=(1, 1, 1, 1, 1), dec_dilations=(1, 1, 1, 1), with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, upsample_cfg={'type': 'InterpConv'}, norm_eval=False, dcn=None, plugins=None, pretrained=None, init_cfg=None)[source]¶
UNet backbone.
This backbone is the implementation of U-Net: Convolutional Networks for Biomedical Image Segmentation.
- Parameters
in_channels (int) – Number of input image channels. Default: 3.
base_channels (int) – Number of base channels of each stage. The output channels of the first stage. Default: 64.
num_stages (int) – Number of stages in encoder, normally 5. Default: 5.
strides (Sequence[int 1 | 2]) – Strides of each stage in the encoder. len(strides) is equal to num_stages. Normally the stride of the first stage in the encoder is 1. If strides[i]=2, stride convolution is used to downsample in the corresponding encoder stage. Default: (1, 1, 1, 1, 1).
enc_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the corresponding encoder stage. Default: (2, 2, 2, 2, 2).
dec_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the corresponding decoder stage. Default: (2, 2, 2, 2).
downsamples (Sequence[bool]) – Whether to use MaxPool to downsample the feature map after the first stage of the encoder (stages: [1, num_stages)). If the corresponding encoder stage uses stride convolution (strides[i]=2), it will never use MaxPool to downsample, even if downsamples[i-1]=True. Default: (True, True, True, True).
enc_dilations (Sequence[int]) – Dilation rate of each stage in encoder. Default: (1, 1, 1, 1, 1).
dec_dilations (Sequence[int]) – Dilation rate of each stage in decoder. Default: (1, 1, 1, 1).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
conv_cfg (dict | None) – Config dict for convolution layer. Default: None.
norm_cfg (dict | None) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict | None) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’).
upsample_cfg (dict) – The upsample config of the upsample module in decoder. Default: dict(type=’InterpConv’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
dcn (bool) – Use deformable convolution in convolutional layer or not. Default: None.
plugins (dict) – plugins for convolutional layers. Default: None.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- Notice:
The input image size should be divisible by the whole downsample rate of the encoder. More detail of the whole downsample rate can be found in UNet._check_input_divisible.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
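Example
A minimal illustrative sketch, assuming the defaults: the decoder returns num_stages=5 feature maps from the deepest stage up to full resolution, and the input size must be divisible by the whole downsample rate (16 here); the printed shapes are the expected ones, not verified output.
>>> import torch
>>> from mmseg.models.backbones import UNet
>>> self = UNet()
>>> self.eval()
>>> inputs = torch.rand(1, 3, 64, 64)
>>> outputs = self.forward(inputs)
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 1024, 4, 4)
(1, 512, 8, 8)
(1, 256, 16, 16)
(1, 128, 32, 32)
(1, 64, 64, 64)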
- class mmseg.models.backbones.VPD(diffusion_cfg: Union[mmengine.config.config.ConfigDict, dict], class_embed_path: str, unet_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {}, gamma: float = 0.0001, class_embed_select=False, pad_shape: Optional[Union[int, List[int]]] = None, pad_val: Union[int, List[int]] = 0, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
VPD (Visual Perception Diffusion) model.
- Parameters
diffusion_cfg (dict) – Configuration for diffusion model.
class_embed_path (str) – Path for class embeddings.
unet_cfg (dict, optional) – Configuration for U-Net.
gamma (float, optional) – Gamma for text adaptation. Defaults to 1e-4.
class_embed_select (bool, optional) – If True, enables class embedding selection. Defaults to False.
pad_shape (Optional[Union[int, List[int]]], optional) – Padding shape. Defaults to None.
pad_val (Union[int, List[int]], optional) – Padding value. Defaults to 0.
init_cfg (dict, optional) – Configuration for network initialization.
- class mmseg.models.backbones.VisionTransformer(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, with_cls_token=True, output_cls_token=False, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, interpolate_mode='bicubic', num_fcs=2, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
Vision Transformer.
This backbone is the implementation of An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
- Parameters
img_size (int | tuple) – Input image size. Default: 224.
patch_size (int) – The patch size. Default: 16.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – embedding dimension. Default: 768.
num_layers (int) – depth of transformer. Default: 12.
num_heads (int) – number of attention heads. Default: 12.
mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.
out_indices (list | tuple | int) – Output from which stages. Default: -1.
qkv_bias (bool) – enable bias for qkv if True. Default: True.
drop_rate (float) – Probability of an element to be zeroed. Default 0.0
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – stochastic depth rate. Default 0.0
with_cls_token (bool) – Whether concatenating class token into image tokens as transformer input. Default: True.
output_cls_token (bool) – Whether output the cls_token. If set True, with_cls_token must be True. Default: False.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.
interpolate_mode (str) – Select the interpolate mode for position embedding vector resize. Default: bicubic.
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
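Example
A minimal illustrative sketch, assuming the defaults (out_indices=-1, so only the last layer's feature map, reshaped to (B, C, H/16, W/16), should be returned); the printed shape is the expected one, not verified output.
>>> import torch
>>> from mmseg.models.backbones import VisionTransformer
>>> self = VisionTransformer()
>>> self.eval()
>>> outputs = self.forward(torch.rand(1, 3, 224, 224))
>>> for out in outputs:
...     print(tuple(out.shape))
(1, 768, 14, 14)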
- static resize_pos_embed(pos_embed, input_shpae, pos_shape, mode)[source]¶
Resize pos_embed weights.
Resize pos_embed using the bicubic interpolate method.
- Parameters
pos_embed (torch.Tensor) – Position embedding weights.
input_shpae (tuple) – Tuple for (downsampled input image height, downsampled input image width).
pos_shape (tuple) – The resolution of the downsampled origin training image.
mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'.
- Returns
The resized pos_embed of shape [B, L_new, C]
- Return type
torch.Tensor
- train(mode=True)[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
decode_heads¶
- class mmseg.models.decode_heads.ANNHead(project_channels, query_scales=(1), key_pool_scales=(1, 3, 6, 8), **kwargs)[source]¶
Asymmetric Non-local Neural Networks for Semantic Segmentation.
This head is the implementation of ANNNet.
- Parameters
project_channels (int) – Projection channels for Nonlocal.
query_scales (tuple[int]) – The scales of query feature map. Default: (1,)
key_pool_scales (tuple[int]) – The pooling scales of key feature map. Default: (1, 3, 6, 8).
- class mmseg.models.decode_heads.APCHead(pool_scales=(1, 2, 3, 6), fusion=True, **kwargs)[source]¶
Adaptive Pyramid Context Network for Semantic Segmentation.
This head is the implementation of APCNet.
- Parameters
pool_scales (tuple[int]) – Pooling scales used in Adaptive Context Module. Default: (1, 2, 3, 6).
fusion (bool) – Add one conv to fuse residual feature.
- class mmseg.models.decode_heads.ASPPHead(dilations=(1, 6, 12, 18), **kwargs)[source]¶
Rethinking Atrous Convolution for Semantic Image Segmentation.
This head is the implementation of DeepLabV3.
- Parameters
dilations (tuple[int]) – Dilation rates for ASPP module. Default: (1, 6, 12, 18).
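Example
A minimal illustrative sketch; in_channels, channels and num_classes are required arguments inherited from the base decode head, and the values here are arbitrary. The printed shape is the expected one, not verified output.
>>> import torch
>>> from mmseg.models.decode_heads import ASPPHead
>>> head = ASPPHead(dilations=(1, 6, 12, 18), in_channels=2048,
...                 channels=512, num_classes=19)
>>> inputs = [torch.rand(1, 2048, 8, 8)]
>>> seg_logits = head(inputs)
>>> print(tuple(seg_logits.shape))
(1, 19, 8, 8)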
- class mmseg.models.decode_heads.CCHead(recurrence=2, **kwargs)[source]¶
CCNet: Criss-Cross Attention for Semantic Segmentation.
This head is the implementation of CCNet.
- Parameters
recurrence (int) – Number of recurrence of Criss Cross Attention module. Default: 2.
- class mmseg.models.decode_heads.DAHead(pam_channels, **kwargs)[source]¶
Dual Attention Network for Scene Segmentation.
This head is the implementation of DANet.
- Parameters
pam_channels (int) – The channels of Position Attention Module(PAM).
- loss_by_feat(seg_logit: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs) → dict[source]¶
Compute pam_cam, pam, cam losses.
- class mmseg.models.decode_heads.DDRHead(in_channels: int, channels: int, num_classes: int, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, **kwargs)[source]¶
Decode head for DDRNet.
- Parameters
in_channels (int) – Number of input channels.
channels (int) – Number of output channels.
num_classes (int) – Number of classes.
norm_cfg (dict, optional) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict, optional) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).
- forward(inputs: Union[torch.Tensor, Tuple[torch.Tensor]]) → Union[torch.Tensor, Tuple[torch.Tensor]][source]¶
Placeholder of forward function.
- loss_by_feat(seg_logits: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Compute segmentation loss.
- Parameters
seg_logits (Tensor) – The output from decode head forward function.
batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- class mmseg.models.decode_heads.DMHead(filter_sizes=(1, 3, 5, 7), fusion=False, **kwargs)[source]¶
Dynamic Multi-scale Filters for Semantic Segmentation.
This head is the implementation of DMNet.
- Parameters
filter_sizes (tuple[int]) – The size of generated convolutional filters used in Dynamic Convolutional Module. Default: (1, 3, 5, 7).
fusion (bool) – Whether to add one conv to fuse the DCM output feature. Default: False.
- class mmseg.models.decode_heads.DNLHead(reduction=2, use_scale=True, mode='embedded_gaussian', temperature=0.05, **kwargs)[source]¶
Disentangled Non-Local Neural Networks.
This head is the implementation of DNLNet.
- Parameters
reduction (int) – Reduction factor of projection transform. Default: 2.
use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.
mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’, ‘dot_product’. Default: ‘embedded_gaussian’.
temperature (float) – Temperature to adjust attention. Default: 0.05
- class mmseg.models.decode_heads.DPTHead(embed_dims=768, post_process_channels=[96, 192, 384, 768], readout_type='ignore', patch_size=16, expand_channels=False, act_cfg={'type': 'ReLU'}, norm_cfg={'type': 'BN'}, **kwargs)[source]¶
Vision Transformers for Dense Prediction.
This head is the implementation of DPT.
- Parameters
embed_dims (int) – The embed dimension of the ViT backbone. Default: 768.
post_process_channels (List) – Out channels of post process conv layers. Default: [96, 192, 384, 768].
readout_type (str) – Type of readout operation. Default: ‘ignore’.
patch_size (int) – The patch size. Default: 16.
expand_channels (bool) – Whether expand the channels in post process block. Default: False.
act_cfg (dict) – The activation config for the residual conv unit. Default: dict(type=’ReLU’).
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
- class mmseg.models.decode_heads.DepthwiseSeparableASPPHead(c1_in_channels, c1_channels, **kwargs)[source]¶
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.
This head is the implementation of DeepLabV3+.
- Parameters
c1_in_channels (int) – The input channels of the c1 decoder. If it is 0, no c1 decoder will be used.
c1_channels (int) – The intermediate channels of c1 decoder.
- class mmseg.models.decode_heads.DepthwiseSeparableFCNHead(dw_act_cfg=None, **kwargs)[source]¶
Depthwise-Separable Fully Convolutional Network for Semantic Segmentation.
This head is implemented according to Fast-SCNN: Fast Semantic Segmentation Network.
- Parameters
in_channels (int) – Number of output channels of FFM.
channels (int) – Number of middle-stage channels in the decode head.
concat_input (bool) – Whether to concatenate original decode input into the result of several consecutive convolution layers. Default: True.
num_classes (int) – Used to determine the dimension of final prediction tensor.
in_index (int) – Corresponds to ‘out_indices’ in the FastSCNN backbone.
norm_cfg (dict | None) – Config of norm layers.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
loss_decode (dict) – Config of loss type and some relevant additional options.
dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.
- class mmseg.models.decode_heads.EMAHead(ema_channels, num_bases, num_stages, concat_input=True, momentum=0.1, **kwargs)[source]¶
Expectation Maximization Attention Networks for Semantic Segmentation.
This head is the implementation of EMANet.
- Parameters
ema_channels (int) – EMA module channels.
num_bases (int) – Number of bases.
num_stages (int) – Number of the EM iterations.
concat_input (bool) – Whether to concatenate the input and output of convs before the classification layer. Default: True.
momentum (float) – Momentum to update the base. Default: 0.1.
- class mmseg.models.decode_heads.EncHead(num_codes=32, use_se_loss=True, add_lateral=False, loss_se_decode={'loss_weight': 0.2, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, **kwargs)[source]¶
Context Encoding for Semantic Segmentation.
This head is the implementation of EncNet.
- Parameters
num_codes (int) – Number of code words. Default: 32.
use_se_loss (bool) – Whether to use Semantic Encoding Loss (SE-loss) to regularize training. Default: True.
add_lateral (bool) – Whether to use a lateral connection to fuse features. Default: False.
loss_se_decode (dict) – Config of decode loss. Default: dict(type=’CrossEntropyLoss’, use_sigmoid=True).
- loss_by_feat(seg_logit: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs) → dict[source]¶
Compute segmentation and semantic encoding loss.
- class mmseg.models.decode_heads.FCNHead(num_convs=2, kernel_size=3, concat_input=True, dilation=1, **kwargs)[source]¶
Fully Convolutional Networks for Semantic Segmentation.
This head is the implementation of FCN.
- Parameters
num_convs (int) – Number of convs in the head. Default: 2.
kernel_size (int) – The kernel size for convs in the head. Default: 3.
concat_input (bool) – Whether to concatenate the input and output of convs before the classification layer. Default: True.
dilation (int) – The dilation rate for convs in the head. Default: 1.
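Example (a minimal sketch running FCNHead directly on dummy backbone features; all channel counts and sizes are illustrative):
import torch
from mmseg.models.decode_heads import FCNHead

head = FCNHead(in_channels=2048, in_index=3, channels=512,
               num_convs=2, concat_input=True, num_classes=19)
# Four dummy backbone stages; in_index=3 selects the last one.
feats = [torch.randn(1, c, 16, 16) for c in (256, 512, 1024, 2048)]
seg_logits = head(feats)  # (1, 19, 16, 16), logits at feature resolution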
- class mmseg.models.decode_heads.FPNHead(feature_strides, **kwargs)[source]¶
Panoptic Feature Pyramid Networks.
This head is the implementation of Semantic FPN.
- Parameters
feature_strides (tuple[int]) – The strides for input feature maps. All strides are supposed to be powers of 2, and the first one is of the largest resolution.
- class mmseg.models.decode_heads.GCHead(ratio=0.25, pooling_type='att', fusion_types=('channel_add', ), **kwargs)[source]¶
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond.
This head is the implementation of GCNet.
- Parameters
ratio (float) – Multiplier of channels ratio. Default: 1/4.
pooling_type (str) – The pooling type of context aggregation. Options are ‘att’, ‘avg’. Default: ‘att’.
fusion_types (tuple[str]) – The fusion type for feature fusion. Options are ‘channel_add’, ‘channel_mul’. Default: (‘channel_add’,)
- class mmseg.models.decode_heads.ISAHead(isa_channels, down_factor=(8, 8), **kwargs)[source]¶
Interlaced Sparse Self-Attention for Semantic Segmentation.
This head is the implementation of ISA.
- Parameters
isa_channels (int) – The channels of ISA Module.
down_factor (tuple[int]) – The local group size of ISA.
- class mmseg.models.decode_heads.IterativeDecodeHead(num_stages, kernel_generate_head, kernel_update_head, **kwargs)[source]¶
K-Net: Towards Unified Image Segmentation.
This head is the implementation of K-Net (https://arxiv.org/abs/2106.14855).
- Parameters
num_stages (int) – The number of stages (kernel update heads) in IterativeDecodeHead. Default: 3.
kernel_generate_head (dict) – Config of the kernel generate head, which generates mask predictions, dynamic kernels and class predictions for the next kernel update heads.
kernel_update_head (dict) – Config of the kernel update head, which refines dynamic kernels and class predictions iteratively.
- loss_by_feat(seg_logits: List[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs) → dict[source]¶
Compute segmentation loss.
- Parameters
seg_logits (Tensor) – The output from decode head forward function.
batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- class mmseg.models.decode_heads.KernelUpdateHead(num_classes=150, num_ffn_fcs=2, num_heads=8, num_mask_fcs=3, feedforward_channels=2048, in_channels=256, out_channels=256, dropout=0.0, act_cfg={'inplace': True, 'type': 'ReLU'}, ffn_act_cfg={'inplace': True, 'type': 'ReLU'}, conv_kernel_size=1, feat_transform_cfg=None, kernel_init=False, with_ffn=True, feat_gather_stride=1, mask_transform_stride=1, kernel_updator_cfg={'act_cfg': {'inplace': True, 'type': 'ReLU'}, 'feat_channels': 64, 'in_channels': 256, 'norm_cfg': {'type': 'LN'}, 'out_channels': 256, 'type': 'DynamicConv'})[source]¶
Kernel Update Head in K-Net.
- Parameters
num_classes (int) – Number of classes. Default: 150.
num_ffn_fcs (int) – The number of fully-connected layers in FFNs. Default: 2.
num_heads (int) – The number of parallel attention heads. Default: 8.
num_mask_fcs (int) – The number of fully connected layers for mask prediction. Default: 3.
feedforward_channels (int) – The hidden dimension of FFNs. Default: 2048.
in_channels (int) – The number of channels of input feature map. Default: 256.
out_channels (int) – The number of output channels. Default: 256.
dropout (float) – The probability of an element to be zeroed in MultiheadAttention and FFN. Default: 0.0.
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).
ffn_act_cfg (dict) – Config of activation layers in FFN. Default: dict(type=’ReLU’).
conv_kernel_size (int) – The kernel size of convolution in the Kernel Update Head for dynamic kernel updates. Default: 1.
feat_transform_cfg (dict | None) – Config of feature transform. Default: None.
kernel_init (bool) – Whether to initialize the mask kernel in the mask head. Default: False.
with_ffn (bool) – Whether to add an FFN in the kernel update head. Default: True.
feat_gather_stride (int) – Stride of convolution in feature transform. Default: 1.
mask_transform_stride (int) – Stride of mask transform. Default: 1.
kernel_updator_cfg (dict) – Config of the kernel updator. Default: dict(type=’DynamicConv’, in_channels=256, feat_channels=64, out_channels=256, act_cfg=dict(type=’ReLU’, inplace=True), norm_cfg=dict(type=’LN’)).
- forward(x, proposal_feat, mask_preds, mask_shape=None)[source]¶
Forward function of Dynamic Instance Interactive Head.
- Parameters
x (Tensor) – Feature map from FPN with shape (batch_size, feature_dimensions, H, W).
proposal_feat (Tensor) – Intermediate feature from the previous stage, with shape (batch_size, num_proposals, feature_dimensions).
mask_preds (Tensor) – Mask prediction from the former stage, with shape (batch_size, num_proposals, H, W).
- Returns
The first tensor is predicted mask with shape (N, num_classes, H, W), the second tensor is dynamic kernel with shape (N, num_classes, channels, K, K).
- Return type
Tuple
- class mmseg.models.decode_heads.KernelUpdator(in_channels=256, feat_channels=64, out_channels=None, gate_sigmoid=True, gate_norm_act=False, activate_out=False, norm_cfg={'type': 'LN'}, act_cfg={'inplace': True, 'type': 'ReLU'})[source]¶
Dynamic Kernel Updator in Kernel Update Head.
- Parameters
in_channels (int) – The number of channels of input feature map. Default: 256.
feat_channels (int) – The number of middle-stage channels in the kernel updator. Default: 64.
out_channels (int) – The number of output channels. Default: None.
gate_sigmoid (bool) – Whether to use the sigmoid function in the gate mechanism. Default: True.
gate_norm_act (bool) – Whether to add normalization and activation layers in the gate mechanism. Default: False.
activate_out (bool) – Whether to add activation after the gate mechanism. Default: False.
norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’LN’).
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).
- forward(update_feature, input_feature)[source]¶
Forward function of KernelUpdator.
- Parameters
update_feature (torch.Tensor) – Feature map assembled from each group. It is reshaped so that its last dimension equals self.in_channels.
input_feature (torch.Tensor) – Intermediate feature with shape: (N, num_classes, conv_kernel_size**2, channels).
- Returns
The output tensor of shape (N*C1/C2, K*K, C2), where N is the number of classes, C1 and C2 are the feature map channels of KernelUpdateHead and KernelUpdator, respectively.
- Return type
Tensor
- class mmseg.models.decode_heads.LRASPPHead(branch_channels=(32, 64), **kwargs)[source]¶
Lite R-ASPP (LRASPP) head.
This head is the improved implementation of the LR-ASPP proposed in Searching for MobileNetV3.
- Parameters
branch_channels (tuple[int]) – The number of output channels in each branch. Default: (32, 64).
- class mmseg.models.decode_heads.LightHamHead(ham_channels=512, ham_kwargs={}, **kwargs)[source]¶
SegNeXt decode head.
This decode head is the implementation of SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. Inspiration from https://github.com/visual-attention-network/segnext.
Specifically, LightHamHead is inspired by HamNet from Is Attention Better Than Matrix Decomposition? (https://arxiv.org/abs/2109.04553).
- Parameters
ham_channels (int) – Input channels for Hamburger. Default: 512.
ham_kwargs (dict) – Keyword arguments for the Ham module. Default: dict().
- class mmseg.models.decode_heads.Mask2FormerHead(num_classes, align_corners=False, ignore_index=255, **kwargs)[source]¶
Implements the Mask2Former head.
See Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation for details.
- Parameters
num_classes (int) – Number of classes. Default: 150.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
ignore_index (int) – The label index to be ignored. Default: 255.
- loss(x: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg: Union[mmengine.config.config.ConfigDict, dict]) → dict[source]¶
Perform forward propagation and loss calculation of the decoder head on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[SegDataSample]) – The Data Samples. It usually includes information such as gt_sem_seg.
train_cfg (ConfigType) – Training config.
- Returns
a dictionary of loss components.
- Return type
dict[str, Tensor]
- predict(x: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict]) → Tuple[torch.Tensor][source]¶
Test without augmentation.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_img_metas (List[dict]) – List of image meta information.
test_cfg (ConfigType) – Test config.
- Returns
A tensor of segmentation mask.
- Return type
Tensor
- class mmseg.models.decode_heads.MaskFormerHead(num_classes: int = 150, align_corners: bool = False, ignore_index: int = 255, **kwargs)[source]¶
Implements the MaskFormer head.
See Per-Pixel Classification is Not All You Need for Semantic Segmentation for details.
- Parameters
num_classes (int) – Number of classes. Default: 150.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
ignore_index (int) – The label index to be ignored. Default: 255.
- loss(x: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg: Union[mmengine.config.config.ConfigDict, dict]) → dict[source]¶
Perform forward propagation and loss calculation of the decoder head on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[SegDataSample]) – The Data Samples. It usually includes information such as gt_sem_seg.
train_cfg (ConfigType) – Training config.
- Returns
a dictionary of loss components.
- Return type
dict[str, Tensor]
- predict(x: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict]) → Tuple[torch.Tensor][source]¶
Test without augmentation.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_img_metas (List[dict]) – List of image meta information.
test_cfg (ConfigType) – Test config.
- Returns
A tensor of segmentation mask.
- Return type
Tensor
- class mmseg.models.decode_heads.NLHead(reduction=2, use_scale=True, mode='embedded_gaussian', **kwargs)[source]¶
Non-local Neural Networks.
This head is the implementation of NLNet.
- Parameters
reduction (int) – Reduction factor of projection transform. Default: 2.
use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.
mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’, ‘dot_product’. Default: ‘embedded_gaussian’.
- class mmseg.models.decode_heads.OCRHead(ocr_channels, scale=1, **kwargs)[source]¶
Object-Contextual Representations for Semantic Segmentation.
This head is the implementation of OCRNet.
- Parameters
ocr_channels (int) – The intermediate channels of OCR block.
scale (int) – The scale of the probability map in SpatialGatherModule. Default: 1.
- class mmseg.models.decode_heads.PIDHead(in_channels: int, channels: int, num_classes: int, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, **kwargs)[source]¶
Decode head for PIDNet.
- Parameters
in_channels (int) – Number of input channels.
channels (int) – Number of output channels.
num_classes (int) – Number of classes.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).
- forward(inputs: Union[torch.Tensor, Tuple[torch.Tensor]]) → Union[torch.Tensor, Tuple[torch.Tensor]][source]¶
Forward function.
- Parameters
inputs (Tensor | tuple[Tensor]) – Input tensor or tuple of Tensors. When training, the input is a tuple of three tensors (p_feat, i_feat, d_feat) and the output is a tuple of three tensors (p_seg_logit, i_seg_logit, d_seg_logit). At inference time, only the head of the integral branch is used: the input is a tensor of the integral feature map and the output is the segmentation logit.
- Returns
Output tensor or tuple of tensors.
- Return type
Tensor | tuple[Tensor]
- loss_by_feat(seg_logits: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Compute segmentation loss.
- Parameters
seg_logits (Tensor) – The output from decode head forward function.
batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- class mmseg.models.decode_heads.PSAHead(mask_size, psa_type='bi-direction', compact=False, shrink_factor=2, normalization_factor=1.0, psa_softmax=True, **kwargs)[source]¶
Point-wise Spatial Attention Network for Scene Parsing.
This head is the implementation of PSANet.
- Parameters
mask_size (tuple[int]) – The PSA mask size. It usually equals input size.
psa_type (str) – The type of psa module. Options are ‘collect’, ‘distribute’, ‘bi-direction’. Default: ‘bi-direction’
compact (bool) – Whether to use a compact map for ‘collect’ mode. Default: False.
shrink_factor (int) – The downsample factors of psa mask. Default: 2.
normalization_factor (float) – The normalization factor of attention. Default: 1.0.
psa_softmax (bool) – Whether to use softmax for attention. Default: True.
- class mmseg.models.decode_heads.PSPHead(pool_scales=(1, 2, 3, 6), **kwargs)[source]¶
Pyramid Scene Parsing Network.
This head is the implementation of PSPNet.
- Parameters
pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).
- class mmseg.models.decode_heads.PointHead(num_fcs=3, coarse_pred_each_layer=True, conv_cfg={'type': 'Conv1d'}, norm_cfg=None, act_cfg={'inplace': False, 'type': 'ReLU'}, **kwargs)[source]¶
A mask point head used in PointRend.
This head is the implementation of PointRend: Image Segmentation as Rendering.
PointHead uses a shared multi-layer perceptron (equivalent to nn.Conv1d) to predict the logits of input points. The fine-grained features and coarse features are concatenated together for prediction.
- Parameters
num_fcs (int) – Number of fc layers in the head. Default: 3.
in_channels (int) – Number of input channels. Default: 256.
fc_channels (int) – Number of fc channels. Default: 256.
num_classes (int) – Number of classes for logits. Default: 80.
class_agnostic (bool) – Whether to use class-agnostic classification. If so, the output channels of logits will be 1. Default: False.
coarse_pred_each_layer (bool) – Whether to concatenate the coarse feature with the output of each fc layer. Default: True.
conv_cfg (dict|None) – Dictionary to construct and config conv layer. Default: dict(type=’Conv1d’).
norm_cfg (dict|None) – Dictionary to construct and config norm layer. Default: None.
loss_point (dict) – Dictionary to construct and config loss layer of point head. Default: dict(type=’CrossEntropyLoss’, use_mask=True, loss_weight=1.0).
- get_points_test(seg_logits, uncertainty_func, cfg)[source]¶
Sample points for testing.
Find num_points most uncertain points from uncertainty_map.
- Parameters
seg_logits (Tensor) – A tensor of shape (batch_size, num_classes, height, width) for class-specific or class-agnostic prediction.
uncertainty_func (func) – uncertainty calculation function.
cfg (dict) – Testing config of point head.
- Returns
point_indices (Tensor): A tensor of shape (batch_size, num_points) that contains indices from [0, height x width) of the most uncertain points.
point_coords (Tensor): A tensor of shape (batch_size, num_points, 2) that contains [0, 1] x [0, 1] normalized coordinates of the most uncertain points from the height x width grid.
- get_points_train(seg_logits, uncertainty_func, cfg)[source]¶
Sample points for training.
Sample points in [0, 1] x [0, 1] coordinate space based on their uncertainty. The uncertainties are calculated for each point using ‘uncertainty_func’ function that takes point’s logit prediction as input.
- Parameters
seg_logits (Tensor) – Semantic segmentation logits, shape (batch_size, num_classes, height, width).
uncertainty_func (func) – uncertainty calculation function.
cfg (dict) – Training config of point head.
- Returns
A tensor of shape (batch_size, num_points, 2) that contains the coordinates of num_points sampled points.
- Return type
point_coords (Tensor)
- loss(inputs, prev_output, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg, **kwargs)[source]¶
Forward function for training.
- Parameters
inputs (list[Tensor]) – List of multi-level img features.
prev_output (Tensor) – The output of previous decode head.
batch_data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as img_metas or gt_semantic_seg.
train_cfg (dict) – The training config.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- loss_by_feat(point_logits, points, batch_data_samples, **kwargs)[source]¶
Compute segmentation loss.
- predict(inputs, prev_output, batch_img_metas: List[dict], test_cfg, **kwargs)[source]¶
Forward function for testing.
- Parameters
inputs (list[Tensor]) – List of multi-level img features.
prev_output (Tensor) – The output of previous decode head.
batch_img_metas (list[dict]) – List of image info dicts where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:Collect.
test_cfg (dict) – The testing config.
- Returns
Output segmentation map.
- Return type
Tensor
- class mmseg.models.decode_heads.SETRMLAHead(mla_channels=128, up_scale=4, **kwargs)[source]¶
Multi-level feature aggregation head of SETR.
MLA head of SETR.
- Parameters
mla_channels (int) – Channels of conv-conv-4x of multi-level feature aggregation. Default: 128.
up_scale (int) – The scale factor of interpolate. Default: 4.
- class mmseg.models.decode_heads.SETRUPHead(norm_layer={'eps': 1e-06, 'requires_grad': True, 'type': 'LN'}, num_convs=1, up_scale=4, kernel_size=3, init_cfg=[{'type': 'Constant', 'val': 1.0, 'bias': 0, 'layer': 'LayerNorm'}, {'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}], **kwargs)[source]¶
Naive upsampling head and Progressive upsampling head of SETR.
Naive or PUP head of SETR.
- Parameters
norm_layer (dict) – Config dict for input normalization. Default: norm_layer=dict(type=’LN’, eps=1e-6, requires_grad=True).
num_convs (int) – Number of decoder convolutions. Default: 1.
up_scale (int) – The scale factor of interpolate. Default: 4.
kernel_size (int) – The kernel size of convolution when decoding feature information from backbone. Default: 3.
init_cfg (dict | list[dict] | None) – Initialization config dict. Default: dict(type=’Constant’, val=1.0, bias=0, layer=’LayerNorm’).
- class mmseg.models.decode_heads.STDCHead(boundary_threshold=0.1, **kwargs)[source]¶
This head is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.
- Parameters
boundary_threshold (float) – The threshold of calculating boundary. Default: 0.1.
- loss_by_feat(seg_logits: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Compute Detail Aggregation Loss.
- class mmseg.models.decode_heads.SegformerHead(interpolate_mode='bilinear', **kwargs)[source]¶
The all-MLP head of SegFormer.
This head is the implementation of SegFormer (https://arxiv.org/abs/2105.15203).
- Parameters
interpolate_mode – The interpolate mode of MLP head upsample operation. Default: ‘bilinear’.
- class mmseg.models.decode_heads.SegmenterMaskTransformerHead(in_channels, num_layers, num_heads, embed_dims, mlp_ratio=4, drop_path_rate=0.1, drop_rate=0.0, attn_drop_rate=0.0, num_fcs=2, qkv_bias=True, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, init_std=0.02, **kwargs)[source]¶
Segmenter: Transformer for Semantic Segmentation.
This head is the implementation of Segmenter.
- Parameters
in_channels (int) – The number of channels of input image.
num_layers (int) – The depth of transformer.
num_heads (int) – The number of attention heads.
embed_dims (int) – The embedding dimension.
mlp_ratio (int) – Ratio of MLP hidden dim to embedding dim. Default: 4.
drop_path_rate (float) – Stochastic depth rate. Default: 0.1.
drop_rate (float) – Probability of an element to be zeroed. Default: 0.0.
attn_drop_rate (float) – The dropout rate for the attention layer. Default: 0.0.
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
qkv_bias (bool) – Enable bias for qkv if True. Default: True.
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
init_std (float) – The value of std in weight initialization. Default: 0.02.
- class mmseg.models.decode_heads.UPerHead(pool_scales=(1, 2, 3, 6), **kwargs)[source]¶
Unified Perceptual Parsing for Scene Understanding.
This head is the implementation of UPerNet.
- Parameters
pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module applied on the last feature. Default: (1, 2, 3, 6).
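Example (a minimal sketch: UPerHead consumes all four backbone stages and predicts at the resolution of the finest one; all sizes below are illustrative):
import torch
from mmseg.models.decode_heads import UPerHead

head = UPerHead(in_channels=[256, 512, 1024, 2048], in_index=[0, 1, 2, 3],
                channels=512, pool_scales=(1, 2, 3, 6), num_classes=19)
feats = [torch.randn(1, c, s, s) for c, s in
         [(256, 64), (512, 32), (1024, 16), (2048, 8)]]
seg_logits = head(feats)  # (1, 19, 64, 64)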
- class mmseg.models.decode_heads.VPDDepthHead(max_depth: float = 10.0, in_channels: Sequence[int] = [320, 640, 1280, 1280], embed_dim: int = 192, feature_dim: int = 1536, num_deconv_layers: int = 3, num_deconv_filters: Sequence[int] = (32, 32, 32), fmap_border: Union[int, Sequence[int]] = 0, align_corners: bool = False, loss_decode: dict = {'type': 'SiLogLoss'}, init_cfg={'layer': ['Conv2d', 'Linear'], 'std': 0.02, 'type': 'TruncNormal'})[source]¶
Depth Prediction Head for VPD.
- Parameters
max_depth (float) – Maximum depth value. Defaults to 10.0.
in_channels (Sequence[int]) – Number of input channels for each convolutional layer.
embed_dim (int) – Dimension of embedding. Defaults to 192.
feature_dim (int) – Dimension of aggregated feature. Defaults to 1536.
num_deconv_layers (int) – Number of deconvolution layers in the decoder. Defaults to 3.
num_deconv_filters (Sequence[int]) – Number of filters for each deconv layer. Defaults to (32, 32, 32).
fmap_border (Union[int, Sequence[int]]) – Feature map border for cropping. Defaults to 0.
align_corners (bool) – Flag for align_corners in interpolation. Defaults to False.
loss_decode (dict) – Configurations for the loss function. Defaults to dict(type=’SiLogLoss’).
init_cfg (dict) – Initialization configurations. Defaults to dict(type=’TruncNormal’, std=0.02, layer=[‘Conv2d’, ‘Linear’]).
- loss_by_feat(pred_depth_map: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Compute depth estimation loss.
- Parameters
pred_depth_map (Tensor) – The output from decode head forward function.
batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_depth_map.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
segmentors¶
- class mmseg.models.segmentors.BaseSegmentor(data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶
Base class for segmentors.
- Parameters
data_preprocessor (dict, optional) – Model preprocessing config for processing the input data. It usually includes to_rgb, pad_size_divisor, pad_val, mean and std. Defaults to None.
- abstract encode_decode(inputs: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])[source]¶
Placeholder for encode images with backbone and decode into a semantic segmentation map of the same size as input.
- abstract extract_feat(inputs: torch.Tensor) → bool[source]¶
Placeholder for extracting features from images.
- forward(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None, mode: str = 'tensor') → Union[Dict[str, torch.Tensor], List[mmseg.structures.seg_data_sample.SegDataSample], Tuple[torch.Tensor], torch.Tensor][source]¶
The unified entry for a forward process in both training and test.
The method should accept three modes: “tensor”, “predict” and “loss”:
“tensor”: Forward the whole network and return the tensor or tuple of tensors without any post-processing, same as a common nn.Module.
“predict”: Forward and return the predictions, which are fully processed to a list of SegDataSample.
“loss”: Forward and return a dict of losses according to the given inputs and data samples.
Note that this method does not handle back propagation or optimizer updates, which are done in train_step().
inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.
data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg. Defaults to None.
mode (str) – Return what kind of value. Defaults to ‘tensor’.
- Returns
The return type depends on mode.
If mode="tensor", return a tensor or a tuple of tensors.
If mode="predict", return a list of SegDataSample.
If mode="loss", return a dict of tensors.
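Example (a minimal sketch of the three modes on a small, randomly initialized EncoderDecoder; the config values are illustrative, and the ‘loss’/‘predict’ modes additionally require data_samples carrying ground truth and meta information):
import torch
from mmseg.registry import MODELS

cfg = dict(
    type='EncoderDecoder',
    backbone=dict(type='ResNetV1c', depth=18, num_stages=4,
                  out_indices=(0, 1, 2, 3), dilations=(1, 1, 2, 4),
                  strides=(1, 2, 1, 1), norm_cfg=dict(type='BN')),
    decode_head=dict(type='FCNHead', in_channels=512, in_index=3,
                     channels=128, num_classes=19, norm_cfg=dict(type='BN')))
model = MODELS.build(cfg)
x = torch.randn(1, 3, 64, 64)
feats = model(x, mode='tensor')                    # raw seg logits, no post-processing
# losses = model(x, data_samples, mode='loss')     # dict of losses, needs gt_sem_seg
# preds = model(x, data_samples, mode='predict')   # list[SegDataSample]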
- abstract loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Calculate losses from a batch of inputs and data samples.
- postprocess_result(seg_logits: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Convert results list to SegDataSample.
- Parameters
seg_logits (Tensor) – The segmentation results, seg_logits from model of each input image.
data_samples (list[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg. Defaults to None.
- Returns
Segmentation results of the input images. Each SegDataSample usually contains:
- Return type
list[SegDataSample]
- abstract predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Predict results from a batch of inputs and data samples with post-processing.
- property with_auxiliary_head: bool¶
Whether the segmentor has an auxiliary head.
- Type
bool
- property with_decode_head: bool¶
Whether the segmentor has a decode head.
- Type
bool
- property with_neck: bool¶
Whether the segmentor has a neck.
- Type
bool
- class mmseg.models.segmentors.CascadeEncoderDecoder(num_stages: int, backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶
Cascade Encoder Decoder segmentors.
CascadeEncoderDecoder is almost the same as EncoderDecoder, except that the decode heads of CascadeEncoderDecoder are cascaded: the output of the previous decode head is the input of the next decode head.
- Parameters
num_stages (int) – How many stages will be cascaded.
backbone (ConfigType) – The config for the backbone of the segmentor.
decode_head (ConfigType) – The config for the decode head of segmentor.
neck (OptConfigType) – The config for the neck of segmentor. Defaults to None.
auxiliary_head (OptConfigType) – The config for the auxiliary head of segmentor. Defaults to None.
train_cfg (OptConfigType) – The config for training. Defaults to None.
test_cfg (OptConfigType) – The config for testing. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
pretrained (str, optional) – The path for pretrained model. Defaults to None.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- class mmseg.models.segmentors.DepthEstimator(backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶
Encoder Decoder depth estimator.
EncoderDecoder typically consists of a backbone, a decode_head and an auxiliary_head. Note that the auxiliary_head is only used for deep supervision during training and can be discarded during inference.
1. The loss method is used to calculate the loss of the model, which includes two steps: (1) extract features to obtain the feature maps, (2) call the decode head loss function to forward the decode head model and calculate the losses.
loss(): extract_feat() -> _decode_head_forward_train() -> _auxiliary_head_forward_train (optional)
_decode_head_forward_train(): decode_head.loss()
_auxiliary_head_forward_train(): auxiliary_head.loss (optional)
2. The predict method is used to predict depth estimation results, which includes two steps: (1) run the inference function to obtain the list of depth maps, (2) call the post-processing function to obtain a list of SegDataSample including pred_depth_map.
predict(): inference() -> postprocess_result()
inference(): whole_inference()/slide_inference()
whole_inference()/slide_inference(): encode_decode()
encode_decode(): extract_feat() -> decode_head.predict()
3. The _forward method is used to output the tensor by running the model, which includes two steps: (1) extract features to obtain the feature maps, (2) call the decode head forward function to forward the decode head model.
_forward(): extract_feat() -> _decode_head.forward()
- Parameters
backbone (ConfigType) – The config for the backbone of the depth estimator.
decode_head (ConfigType) – The config for the decode head of depth estimator.
neck (OptConfigType) – The config for the neck of depth estimator. Defaults to None.
auxiliary_head (OptConfigType) – The config for the auxiliary head of depth estimator. Defaults to None.
train_cfg (OptConfigType) – The config for training. Defaults to None.
test_cfg (OptConfigType) – The config for testing. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
pretrained (str, optional) – The path for pretrained model. Defaults to None.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- encode_decode(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Encode images with backbone and decode into a depth map of the same size as input.
- extract_feat(inputs: torch.Tensor, batch_img_metas: Optional[List[dict]] = None) → torch.Tensor[source]¶
Extract features from images.
- inference(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Inference with slide/whole style.
- Parameters
inputs (Tensor) – The input image of shape (N, 3, H, W).
batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, ‘pad_shape’, and ‘padding_size’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.
- Returns
The depth estimation results.
- Return type
Tensor
- loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
inputs (Tensor) – Input images.
data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_depth_map.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- postprocess_result(depth: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Convert results list to SegDataSample.
- Parameters
depth (Tensor) – The depth estimation results.
data_samples (list[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_depth_map. Defaults to None.
- Returns
Depth estimation results of the input images. Each SegDataSample usually contains:
pred_depth_map (PixelData): Prediction of depth estimation.
- Return type
list[SegDataSample]
- predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Predict results from a batch of inputs and data samples with post-processing.
- Parameters
inputs (Tensor) – Inputs with shape (N, C, H, W).
data_samples (List[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_depth_map.
- Returns
Depth estimation results of the input images. Each SegDataSample usually contains:
pred_depth_map (PixelData): Prediction of depth estimation.
- Return type
list[SegDataSample]
- slide_flip_inference(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Inference by sliding-window with overlap and flip.
If h_crop > h_img or w_crop > w_img, the small patch will be used to decode without padding.
- Parameters
inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.
batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.
- Returns
The depth estimation results.
- Return type
Tensor
- class mmseg.models.segmentors.EncoderDecoder(backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶
Encoder Decoder segmentors.
EncoderDecoder typically consists of a backbone, a decode_head and an auxiliary_head. Note that the auxiliary_head is only used for deep supervision during training and can be discarded during inference.
1. The loss method is used to calculate the loss of the model, which includes two steps: (1) extract features to obtain the feature maps, (2) call the decode head loss function to forward the decode head model and calculate the losses.
loss(): extract_feat() -> _decode_head_forward_train() -> _auxiliary_head_forward_train (optional)
_decode_head_forward_train(): decode_head.loss()
_auxiliary_head_forward_train(): auxiliary_head.loss (optional)
2. The predict method is used to predict segmentation results, which includes two steps: (1) run the inference function to obtain the list of seg_logits, (2) call the post-processing function to obtain a list of SegDataSample including pred_sem_seg and seg_logits.
predict(): inference() -> postprocess_result()
inference(): whole_inference()/slide_inference()
whole_inference()/slide_inference(): encode_decode()
encode_decode(): extract_feat() -> decode_head.predict()
3. The _forward method is used to output the tensor by running the model, which includes two steps: (1) extract features to obtain the feature maps, (2) call the decode head forward function to forward the decode head model.
_forward(): extract_feat() -> _decode_head.forward()
- Parameters
backbone (ConfigType) – The config for the backbone of the segmentor.
decode_head (ConfigType) – The config for the decode head of segmentor.
neck (OptConfigType) – The config for the neck of segmentor. Defaults to None.
auxiliary_head (OptConfigType) – The config for the auxiliary head of segmentor. Defaults to None.
train_cfg (OptConfigType) – The config for training. Defaults to None.
test_cfg (OptConfigType) – The config for testing. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
pretrained (str, optional) – The path for pretrained model. Defaults to None.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- aug_test(inputs, batch_img_metas, rescale=True)[source]¶
Test with augmentations.
Only rescale=True is supported.
- encode_decode(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Encode images with backbone and decode into a semantic segmentation map of the same size as input.
- inference(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Inference with slide/whole style.
- Parameters
inputs (Tensor) – The input image of shape (N, 3, H, W).
batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, ‘pad_shape’, and ‘padding_size’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.
- Returns
The segmentation results, seg_logits from model of each input image.
- Return type
Tensor
- loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
inputs (Tensor) – Input images.
data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Predict results from a batch of inputs and data samples with post-processing.
- Parameters
inputs (Tensor) – Inputs with shape (N, C, H, W).
data_samples (List[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
Segmentation results of the input images. Each SegDataSample usually contains:
- Return type
list[SegDataSample]
- slide_inference(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Inference by sliding-window with overlap.
If h_crop > h_img or w_crop > w_img, the small patch will be used to decode without padding.
- Parameters
inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.
batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.
- Returns
The segmentation results, seg_logits from model of each input image.
- Return type
Tensor
- whole_inference(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Inference with full image.
- Parameters
inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.
batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.
- Returns
The segmentation results, seg_logits from model of each input image.
- Return type
Tensor
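Whether whole_inference or slide_inference is used is selected by the segmentor's test_cfg. A minimal sketch (the crop size and stride values are illustrative):
# Sliding-window inference with a 512x512 window and 341-pixel stride:
test_cfg = dict(mode='slide', crop_size=(512, 512), stride=(341, 341))
# Whole-image inference instead:
# test_cfg = dict(mode='whole')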
- class mmseg.models.segmentors.SegTTAModel(module: Union[dict, torch.nn.modules.module.Module], data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None)[source]¶
- merge_preds(data_samples_list: List[Sequence[mmseg.structures.seg_data_sample.SegDataSample]]) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Merge predictions of enhanced data to one prediction.
- Parameters
data_samples_list (List[SampleList]) – List of predictions of all enhanced data.
- Returns
Merged prediction.
- Return type
SampleList
losses¶
- class mmseg.models.losses.Accuracy(topk=(1, ), thresh=None, ignore_index=None)[source]¶
Accuracy calculation module.
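Example (a minimal sketch on dummy logits; all shapes are illustrative):
import torch
from mmseg.models.losses import Accuracy

acc = Accuracy(topk=1)
logits = torch.randn(4, 19, 8, 8)          # (N, C, H, W) predictions
labels = torch.randint(0, 19, (4, 8, 8))   # (N, H, W) ground truth
print(acc(logits, labels))                 # top-1 accuracy in percent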
- class mmseg.models.losses.BoundaryLoss(loss_weight: float = 1.0, loss_name: str = 'loss_boundary')[source]¶
Boundary loss.
This loss is modified from PIDNet, licensed under the MIT License.
- Parameters
loss_weight (float) – Weight of the loss. Defaults to 1.0.
loss_name (str) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_boundary’.
- class mmseg.models.losses.CrossEntropyLoss(use_sigmoid=False, use_mask=False, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_ce', avg_non_ignore=False)[source]¶
CrossEntropyLoss.
- Parameters
use_sigmoid (bool, optional) – Whether the prediction uses sigmoid instead of softmax. Defaults to False.
use_mask (bool, optional) – Whether to use mask cross entropy loss. Defaults to False.
reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.
class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.
loss_weight (float, optional) – Weight of the loss. Defaults to 1.0.
loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_ce’.
avg_non_ignore (bool) – Whether the loss is only averaged over non-ignored targets. Default: False. New in version 0.23.0.
- forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, ignore_index=-100, **kwargs)[source]¶
Forward function.
- property loss_name¶
Loss Name.
This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.
- Returns
The name of this loss item.
- Return type
str
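Example (a minimal sketch computing the loss on dummy logits; shapes and settings are illustrative):
import torch
from mmseg.models.losses import CrossEntropyLoss

criterion = CrossEntropyLoss(use_sigmoid=False, loss_weight=1.0)
logits = torch.randn(2, 19, 8, 8)          # (N, C, H, W)
labels = torch.randint(0, 19, (2, 8, 8))   # (N, H, W)
loss = criterion(logits, labels)
print(criterion.loss_name, loss.item())    # 'loss_ce', scalar loss value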
- class mmseg.models.losses.DiceLoss(use_sigmoid=True, activate=True, reduction='mean', naive_dice=False, loss_weight=1.0, ignore_index=255, eps=0.001, loss_name='loss_dice')[source]¶
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, ignore_index=255, **kwargs)[source]¶
Forward function.
- Parameters
pred (torch.Tensor) – The prediction, has a shape (n, *).
target (torch.Tensor) – The label of the prediction, shape (n, *), same shape as pred.
weight (torch.Tensor, optional) – The weight of loss for each prediction, has a shape (n,). Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.
- Returns
The calculated loss
- Return type
torch.Tensor
- property loss_name¶
Loss Name.
This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.
- Returns
The name of this loss item.
- Return type
str
- class mmseg.models.losses.FocalLoss(use_sigmoid=True, gamma=2.0, alpha=0.5, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_focal')[source]¶
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, ignore_index=255, **kwargs)[source]¶
Forward function.
- Parameters
pred (torch.Tensor) – The prediction with shape (N, C) where C = number of classes, or (N, C, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss.
target (torch.Tensor) – The ground truth. If containing class indices, shape (N) where each value is 0≤targets[i]≤C−1, or (N, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss. If containing class probabilities, same shape as the input.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.
ignore_index (int, optional) – The label index to be ignored. Default: 255
- Returns
The calculated loss
- Return type
torch.Tensor
- property loss_name¶
Loss Name.
This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.
- Returns
The name of this loss item.
- Return type
str