mmseg.apis¶
- class mmseg.apis.MMSegInferencer(model: Union[dict, mmengine.config.config.Config, mmengine.config.config.ConfigDict, str], weights: Optional[str] = None, classes: Optional[Union[str, List]] = None, palette: Optional[Union[str, List]] = None, dataset_name: Optional[str] = None, device: Optional[str] = None, scope: Optional[str] = 'mmseg')[source]¶
Semantic segmentation inferencer, provides inference and visualization interfaces. Note: MMEngine >= 0.5.0 is required.
- Parameters
model (str, optional) – Path to the config file or the model name defined in the metafile. Taking the mmseg metafile as an example, the model could be “fcn_r50-d8_4xb2-40k_cityscapes-512x1024”, in which case the model weights will be downloaded automatically. If a config file is used, like “configs/fcn/fcn_r50-d8_4xb2-40k_cityscapes-512x1024.py”, the weights must be specified.
weights (str, optional) – Path to the checkpoint. If it is not specified and model is a model name of metafile, the weights will be loaded from metafile. Defaults to None.
classes (list, optional) – Input classes for result rendering. As the prediction of the segmentation model is a segment map with label indices, classes is a list whose items correspond to the label indices. If classes is not defined, the visualizer will use cityscapes classes by default. Defaults to None.
palette (list, optional) – Input palette for result rendering, which is a list of colors corresponding to the classes. If palette is not defined, the visualizer will use the cityscapes palette by default. Defaults to None.
dataset_name (str, optional) – Dataset name or alias. The visualizer will use the meta information of the dataset, i.e. its classes and palette, but explicitly given classes and palette have higher priority. Defaults to None.
device (str, optional) – Device to run inference. If None, the available device will be automatically used. Defaults to None.
scope (str, optional) – The scope of the model. Defaults to ‘mmseg’.
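A minimal usage sketch (the model name follows the metafile convention described above; the image path is illustrative):

    from mmseg.apis import MMSegInferencer

    # Weights are resolved automatically from the metafile for a known model name.
    inferencer = MMSegInferencer(model='fcn_r50-d8_4xb2-40k_cityscapes-512x1024')

    # Returns a dict with keys 'predictions' and 'visualization' (see postprocess below).
    result = inferencer('demo/demo.png', show=False)
    mask = result['predictions']  # segmentation mask with label indices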
- postprocess(preds: Union[mmseg.structures.seg_data_sample.SegDataSample, Sequence[mmseg.structures.seg_data_sample.SegDataSample]], visualization: List[numpy.ndarray], return_datasample: bool = False, pred_out_dir: str = '') → dict[source]¶
Process the predictions and visualization results from forward and visualize. This method is responsible for the following tasks:
Pack the predictions and visualization results and return them.
Save the predictions, if needed.
- Parameters
preds (List[Dict]) – Predictions of the model.
visualization (List[np.ndarray]) – The list of rendering color segmentation mask.
return_datasample (bool) – Whether to return results as datasamples. Defaults to False.
pred_out_dir – File to save the inference results w/o visualization. If left as empty, no file will be saved. Defaults to ‘’.
- Returns
Inference and visualization results with keys predictions and visualization:
visualization (Any): Returned by visualize().
predictions (List[np.ndarray], np.ndarray): Returned by forward() and processed in postprocess(). If return_datasample=False, it will be the segmentation mask with label indices.
- Return type
dict
- visualize(inputs: list, preds: List[dict], show: bool = False, wait_time: int = 0, img_out_dir: str = '', opacity: float = 0.8) → List[numpy.ndarray][source]¶
Visualize predictions.
- Parameters
inputs (list) – Inputs preprocessed by _inputs_to_list().
preds (Any) – Predictions of the model.
show (bool) – Whether to display the image in a popup window. Defaults to False.
wait_time (float) – The interval of show (s). Defaults to 0.
img_out_dir (str) – Output directory of the rendered prediction, i.e. the color segmentation mask. Defaults to ‘’.
opacity (int, float) – The transparency of segmentation mask. Defaults to 0.8.
- Returns
Visualization results.
- Return type
List[np.ndarray]
- mmseg.apis.inference_model(model: mmseg.models.segmentors.base.BaseSegmentor, img: Union[str, numpy.ndarray, Sequence[str], Sequence[numpy.ndarray]]) → Union[mmseg.structures.seg_data_sample.SegDataSample, Sequence[mmseg.structures.seg_data_sample.SegDataSample]][source]¶
Inference image(s) with the segmentor.
- Parameters
model (nn.Module) – The loaded segmentor.
img (str/ndarray or list[str/ndarray]) – Either image files or loaded images.
- Returns
If imgs is a list or tuple, the same length list type results will be returned, otherwise return the segmentation results directly.
- Return type
SegDataSample or list[SegDataSample]
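A minimal sketch of the procedural API (config and checkpoint paths are illustrative; init_model is documented next):

    from mmseg.apis import init_model, inference_model

    config = 'configs/pspnet/pspnet_r50-d8_4xb2-40k_cityscapes-512x1024.py'  # illustrative
    checkpoint = 'pspnet_r50_cityscapes.pth'  # illustrative
    model = init_model(config, checkpoint, device='cuda:0')

    # A single image returns one SegDataSample; a list of images returns a list.
    result = inference_model(model, 'demo/demo.png')
    seg = result.pred_sem_seg.data  # label-index mask as a tensor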
- mmseg.apis.init_model(config: Union[str, pathlib.Path, mmengine.config.config.Config], checkpoint: Optional[str] = None, device: str = 'cuda:0', cfg_options: Optional[dict] = None)[source]¶
Initialize a segmentor from config file.
- Parameters
config (str, Path, or mmengine.Config) – Config file path, Path, or the config object.
checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.
device (str, optional) – The device to load the model onto. Defaults to ‘cuda:0’. Use ‘cpu’ for loading the model on CPU.
cfg_options (dict, optional) – Options to override some settings in the used config.
- Returns
The constructed segmentor.
- Return type
nn.Module
- mmseg.apis.show_result_pyplot(model: mmseg.models.segmentors.base.BaseSegmentor, img: Union[str, numpy.ndarray], result: mmseg.structures.seg_data_sample.SegDataSample, opacity: float = 0.5, title: str = '', draw_gt: bool = True, draw_pred: bool = True, wait_time: float = 0, show: bool = True, save_dir=None, out_file=None)[source]¶
Visualize the segmentation results on the image.
- Parameters
model (nn.Module) – The loaded segmentor.
img (str or np.ndarray) – Image filename or loaded image.
result (SegDataSample) – The prediction SegDataSample result.
opacity (float) – Opacity of painted segmentation map. Default 0.5. Must be in (0, 1] range.
title (str) – The title of pyplot figure. Default is ‘’.
draw_gt (bool) – Whether to draw GT SegDataSample. Default to True.
draw_pred (bool) – Whether to draw Prediction SegDataSample. Defaults to True.
wait_time (float) – The interval of show (s). 0 is the special value that means “forever”. Defaults to 0.
show (bool) – Whether to display the drawn image. Default to True.
save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data.
out_file (str, optional) – Path to output file. Default to None.
- Returns
The drawn image in RGB channel order.
- Return type
np.ndarray
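Continuing the sketch above, the prediction can be rendered onto the image (the output path is illustrative):

    from mmseg.apis import show_result_pyplot

    vis = show_result_pyplot(
        model,
        'demo/demo.png',
        result,
        opacity=0.5,
        show=False,
        out_file='work_dirs/demo_pred.png')  # illustrative path
    # vis is the drawn image in RGB channel order (np.ndarray).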
mmseg.datasets¶
datasets¶
- class mmseg.datasets.ADE20KDataset(img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
ADE20K dataset.
In segmentation map annotation for ADE20K, 0 stands for background, which is not included in 150 categories.
reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.
- class mmseg.datasets.AdjustGamma(gamma=1.0)[source]¶
Using gamma correction to process the image.
Required Keys:
img
Modified Keys:
img
- Parameters
gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.
- class mmseg.datasets.BaseSegDataset(ann_file: str = '', img_suffix='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]¶
Custom dataset for semantic segmentation. An example of file structure is as followed.
    ├── data
    │   ├── my_dataset
    │   │   ├── img_dir
    │   │   │   ├── train
    │   │   │   │   ├── xxx{img_suffix}
    │   │   │   │   ├── yyy{img_suffix}
    │   │   │   │   ├── zzz{img_suffix}
    │   │   │   ├── val
    │   │   ├── ann_dir
    │   │   │   ├── train
    │   │   │   │   ├── xxx{seg_map_suffix}
    │   │   │   │   ├── yyy{seg_map_suffix}
    │   │   │   │   ├── zzz{seg_map_suffix}
    │   │   │   ├── val
The img/gt_semantic_seg pair of BaseSegDataset should be identical except for the suffix. A valid img/gt_semantic_seg filename pair should be like xxx{img_suffix} and xxx{seg_map_suffix} (the extension is also included in the suffix). If split is given, then xxx is specified in the txt file. Otherwise, all files in img_dir/ and ann_dir will be loaded. Please refer to docs/en/tutorials/new_dataset.md for more details.
- Parameters
ann_file (str) – Annotation file path. Defaults to ‘’.
metainfo (dict, optional) – Meta information for dataset, such as specify classes to load. Defaults to None.
data_root (str, optional) – The root directory for data_prefix and ann_file. Defaults to None.
data_prefix (dict, optional) – Prefix for training data. Defaults to dict(img_path=None, seg_map_path=None).
img_suffix (str) – Suffix of images. Default: ‘.jpg’
seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’
filter_cfg (dict, optional) – Config for filter data. Defaults to None.
indices (int or Sequence[int], optional) – Support using the first few data in the annotation file to facilitate training/testing on a smaller dataset. Defaults to None, which means using all data_infos.
serialize_data (bool, optional) – Whether to hold memory using serialized objects; when enabled, data loader workers can use shared RAM from the master process instead of making a copy. Defaults to True.
pipeline (list, optional) – Processing pipeline. Defaults to [].
test_mode (bool, optional) – test_mode=True means in test phase. Defaults to False.
lazy_init (bool, optional) – Whether to load annotations during instantiation. In some cases, such as visualization, only the meta information of the dataset is needed, so it is not necessary to load the annotation file. BaseDataset can skip loading annotations to save time by setting lazy_init=True. Defaults to False.
max_refetch (int, optional) – If BaseDataset.prepare_data gets a None image, the maximum number of extra cycles to fetch a valid image. Defaults to 1000.
ignore_index (int) – The label index to be ignored. Default: 255
reduce_zero_label (bool) – Whether to mark label zero as ignored. Default to False.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
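A minimal sketch of a custom dataset built on BaseSegDataset (the class name, classes, palette, and suffixes are illustrative):

    from mmseg.datasets import BaseSegDataset
    from mmseg.registry import DATASETS

    @DATASETS.register_module()
    class MyDataset(BaseSegDataset):
        """Hypothetical two-class dataset following the structure above."""
        METAINFO = dict(
            classes=('background', 'foreground'),
            palette=[[0, 0, 0], [255, 255, 255]])

        def __init__(self, **kwargs) -> None:
            super().__init__(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)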
- classmethod get_label_map(new_classes: Optional[Sequence] = None) → Optional[Dict][source]¶
Get the label map.
The label_map is a dictionary whose keys are the old label ids and whose values are the new label ids; it is used for changing pixel labels in load_annotations. label_map is not None if and only if the old classes in cls.METAINFO are not equal to the new classes in self._metainfo and neither of them is None.
- Parameters
new_classes (list, tuple, optional) – The new classes name from metainfo. Default to None.
- Returns
The mapping from old classes in cls.METAINFO to new classes in self._metainfo.
- Return type
dict, optional
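A hedged illustration with the hypothetical MyDataset above, whose METAINFO classes are ('background', 'foreground'):

    # Keep only 'foreground', remapped to index 0; dropped classes map to
    # the ignore index 255.
    label_map = MyDataset.get_label_map(new_classes=('foreground',))
    # -> {0: 255, 1: 0}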
- class mmseg.datasets.BioMedical3DPad(pad_shape: Tuple[int, int, int], pad_val: float = 0.0, seg_pad_val: int = 0)[source]¶
Pad the biomedical 3d image & biomedical 3d semantic segmentation maps.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Modified Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Added Keys:
pad_shape (Tuple[int, int, int]): The padded shape.
- Parameters
pad_shape (Tuple[int, int, int]) – Fixed padding size. Expected padding shape (Z, Y, X).
pad_val (float) – Padding value for biomedical image. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.
seg_pad_val (int) – Padding value for biomedical 3d semantic segmentation maps. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.
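A hedged, illustrative sketch of how these biomedical 3D transforms might compose in a pipeline config (shapes are illustrative; the loaders, and whether PackSegInputs is the right packing step for a given task, depend on the surrounding config):

    train_pipeline = [
        dict(type='LoadBiomedicalImageFromFile'),
        dict(type='LoadBiomedicalAnnotation'),
        dict(type='BioMedical3DRandomCrop', crop_shape=(64, 128, 128)),
        dict(type='BioMedical3DPad', pad_shape=(64, 128, 128)),
        dict(type='BioMedical3DRandomFlip', prob=0.5, axes=(0, 1, 2)),
        dict(type='PackSegInputs'),
    ]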
- class mmseg.datasets.BioMedical3DRandomCrop(crop_shape: Union[int, Tuple[int, int, int]], keep_foreground: bool = True)[source]¶
Crop the input patch for medical image & segmentation mask.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
- gt_seg_map (np.ndarray, optional): Biomedical semantic segmentation mask
with shape (Z, Y, X).
Modified Keys:
img
img_shape
gt_seg_map (optional)
- Parameters
crop_shape (Union[int, Tuple[int, int, int]]) – Expected size after cropping with the format of (z, y, x). If set to an integer, then cropping width and height are equal to this integer.
keep_foreground (bool) – If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the crop bounding-box. Default to True.
- crop(img: numpy.ndarray, crop_bbox: tuple) → numpy.ndarray[source]¶
Crop from img.
- Parameters
img (np.ndarray) – Original input image.
crop_bbox (tuple) – Coordinates of the cropped image.
- Returns
The cropped image.
- Return type
np.ndarray
- generate_margin(results: dict) → tuple[source]¶
Generate margin of crop bounding-box.
If keep_foreground is True, it will sample a voxel of foreground classes randomly, take it as the center of the bounding-box, and return the margin between the bounding-box and the image. If keep_foreground is False, it will return the difference between the crop shape and the image shape.
- Parameters
results (dict) – Result dict from loading pipeline.
- Returns
The margin for 3 dimensions of crop bounding-box and image.
- Return type
tuple
- random_generate_crop_bbox(margin_z: int, margin_y: int, margin_x: int) → tuple[source]¶
Randomly get a crop bounding box.
- Parameters
margin_z (int) – Margin of the crop bounding-box in the Z dimension.
margin_y (int) – Margin of the crop bounding-box in the Y dimension.
margin_x (int) – Margin of the crop bounding-box in the X dimension.
- Returns
Coordinates of the cropped image.
- Return type
tuple
- class mmseg.datasets.BioMedical3DRandomFlip(prob: float, axes: Tuple[int, ...], swap_label_pairs: Optional[List[Tuple[int, int]]] = None)[source]¶
Flip biomedical 3D images and segmentations.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/spatial_transforms.py # noqa:E501
Copyright 2021 Division of Medical Image Computing, German Cancer Research Center (DKFZ) and Applied Computer Vision Lab, Helmholtz Imaging Platform. Licensed under the Apache-2.0 License.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Modified Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Added Keys:
do_flip
flip_axes
- Parameters
prob (float) – Flipping probability.
axes (Tuple[int, ...]) – Flipping axes with order ‘ZXY’.
swap_label_pairs (Optional[List[Tuple[int, int]]]) – The segmentation label pairs that are swapped when flipping.
- class mmseg.datasets.BioMedicalGaussianBlur(sigma_range: Tuple[float, float] = (0.5, 1.0), prob: float = 0.2, prob_per_channel: float = 0.5, different_sigma_per_channel: bool = True, different_sigma_per_axis: bool = True)[source]¶
Add Gaussian blur with random sigma to image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L81 # noqa:E501
Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
sigma_range (Tuple[float, float]|float) – range to randomly select sigma value. Default to (0.5, 1.0).
prob (float) – Probability to apply Gaussian blur for each sample. Default to 0.2.
prob_per_channel (float) – Probability to apply Gaussian blur for each channel (axis N of the image). Default to 0.5.
different_sigma_per_channel (bool) – whether to use different sigma for each channel (axis N of the image). Default to True.
different_sigma_per_axis (bool) – whether to use different sigma for axis Z, X and Y of the image. Default to True.
- class mmseg.datasets.BioMedicalGaussianNoise(prob: float = 0.1, mean: float = 0.0, std: float = 0.1)[source]¶
Add random Gaussian noise to image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L53 # noqa:E501
Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
prob (float) – Probability to add Gaussian noise for each sample. Default to 0.1.
mean (float) – Mean or “centre” of the distribution. Default to 0.0.
std (float) – Standard deviation of distribution. Default to 0.1.
- class mmseg.datasets.BioMedicalRandomGamma(prob: float = 0.5, gamma_range: Tuple[float] = (0.5, 2), invert_image: bool = False, per_channel: bool = False, retain_stats: bool = False)[source]¶
Using random gamma correction to process the biomedical image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/color_transforms.py#L132 # noqa:E501 With licence: Apache 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys: - img
- Parameters
prob (float) – The probability to perform this transform. Default: 0.5.
gamma_range (Tuple[float]) – Range of gamma values. Default: (0.5, 2).
invert_image (bool) – Whether to invert the image before applying gamma augmentation. Default: False.
per_channel (bool) – Whether to perform the transform on each channel individually. Default: False.
retain_stats (bool) – Gamma transformation will alter the mean and std of the data in the patch. If retain_stats=True, the data will be transformed to match the mean and standard deviation before gamma augmentation. Default: False.
- class mmseg.datasets.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]¶
Use CLAHE method to process the image.
See ZUIDERVELD,K. Contrast Limited Adaptive Histogram Equalization[J]. Graphics Gems, 1994:474-485. for more information.
Required Keys:
img
Modified Keys:
img
- Parameters
clip_limit (float) – Threshold for contrast limiting. Default: 40.0.
tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).
- class mmseg.datasets.COCOStuffDataset(img_suffix='.jpg', seg_map_suffix='_labelTrainIds.png', **kwargs)[source]¶
COCO-Stuff dataset.
In segmentation map annotation for COCO-Stuff, Train-IDs of the 10k version are from 1 to 171, where 0 is the ignore index, and Train-ID of COCO Stuff 164k is from 0 to 170, where 255 is the ignore index. So, they are all 171 semantic categories.
reduce_zero_label is set to True and False for the 10k and 164k versions, respectively. The img_suffix is fixed to ‘.jpg’, and seg_map_suffix is fixed to ‘.png’.
- class mmseg.datasets.ChaseDB1Dataset(img_suffix='.png', seg_map_suffix='_1stHO.png', reduce_zero_label=False, **kwargs)[source]¶
Chase_db1 dataset.
In segmentation map annotation for Chase_db1, 0 stands for background, which is included in 2 categories.
reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_1stHO.png’.
- class mmseg.datasets.CityscapesDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtFine_labelTrainIds.png', **kwargs)[source]¶
Cityscapes dataset.
The img_suffix is fixed to ‘_leftImg8bit.png’ and seg_map_suffix is fixed to ‘_gtFine_labelTrainIds.png’ for the Cityscapes dataset.
- class mmseg.datasets.DRIVEDataset(img_suffix='.png', seg_map_suffix='_manual1.png', reduce_zero_label=False, **kwargs)[source]¶
DRIVE dataset.
In segmentation map annotation for DRIVE, 0 stands for background, which is included in 2 categories.
reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_manual1.png’.
- class mmseg.datasets.DarkZurichDataset(img_suffix='_rgb_anon.png', seg_map_suffix='_gt_labelTrainIds.png', **kwargs)[source]¶
DarkZurich dataset.
- class mmseg.datasets.DecathlonDataset(ann_file: str = '', img_suffix='.jpg', seg_map_suffix='.png', metainfo: Optional[dict] = None, data_root: Optional[str] = None, data_prefix: dict = {'img_path': '', 'seg_map_path': ''}, filter_cfg: Optional[dict] = None, indices: Optional[Union[int, Sequence[int]]] = None, serialize_data: bool = True, pipeline: List[Union[dict, Callable]] = [], test_mode: bool = False, lazy_init: bool = False, max_refetch: int = 1000, ignore_index: int = 255, reduce_zero_label: bool = False, backend_args: Optional[dict] = None)[source]¶
Dataset for the Decathlon dataset.
The dataset.json format is shown as follows
{ "name": "BRATS", "tensorImageSize": "4D", "modality": { "0": "FLAIR", "1": "T1w", "2": "t1gd", "3": "T2w" }, "labels": { "0": "background", "1": "edema", "2": "non-enhancing tumor", "3": "enhancing tumour" }, "numTraining": 484, "numTest": 266, "training": [ { "image": "./imagesTr/BRATS_306.nii.gz" "label": "./labelsTr/BRATS_306.nii.gz" ... } ] "test": [ "./imagesTs/BRATS_557.nii.gz" ... ] }
- class mmseg.datasets.GenerateEdge(edge_width: int = 3, ignore_index: int = 255)[source]¶
Generate Edge for CE2P approach.
Edge will be used to calculate loss of CE2P.
Modified from https://github.com/liutinglt/CE2P/blob/master/dataset/target_generation.py # noqa:E501
Required Keys:
img_shape
gt_seg_map
- Added Keys:
- gt_edge_map (np.ndarray, uint8): The edge annotation generated from the
seg map by extracting border between different semantics.
- Parameters
edge_width (int) – The width of edge. Default to 3.
ignore_index (int) – Index that will be ignored. Default to 255.
- class mmseg.datasets.HRFDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=False, **kwargs)[source]¶
HRF dataset.
In segmentation map annotation for HRF, 0 stands for background, which is included in 2 categories.
reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.png’.
- class mmseg.datasets.ISPRSDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
ISPRS dataset.
In segmentation map annotation for ISPRS, 0 is the ignore index.
reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.
- class mmseg.datasets.LIPDataset(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
LIP dataset.
The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.
- class mmseg.datasets.LoadAnnotations(reduce_zero_label=None, backend_args=None, imdecode_backend='pillow')[source]¶
Load annotations for semantic segmentation provided by dataset.
The annotation format is as the following:
    {
        # Filename of semantic segmentation ground truth file.
        'seg_map_path': 'a/b/c'
    }
After this module, the annotation has been changed to the format below:
    {
        # In str type.
        'seg_fields': List
        # In uint8 type.
        'gt_seg_map': np.ndarray (H, W)
    }
Required Keys:
seg_map_path (str): Path of semantic segmentation ground truth file.
Added Keys:
seg_fields (List)
gt_seg_map (np.uint8)
- Parameters
reduce_zero_label (bool, optional) – Whether to reduce all label values by 1. Usually used for datasets where 0 is the background label. Defaults to None.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to ‘pillow’.
backend_args (dict) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.LoadBiomedicalAnnotation(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]¶
Load seg_map annotation provided by biomedical datasets.
The annotation format is as the following:
{ 'gt_seg_map': np.ndarray (X, Y, Z) or (Z, Y, X) }
Required Keys:
seg_map_path
Added Keys:
- gt_seg_map (np.ndarray): Biomedical seg map with shape (Z, Y, X) by
default, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.
- Parameters
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
to_float32 (bool) – Whether to convert the loaded seg map to a float32 numpy array. If set to False, the loaded seg map is a float64 array. Defaults to True.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See mmengine.fileio for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.LoadBiomedicalData(with_seg=False, decode_backend: str = 'numpy', to_xyz: bool = False, backend_args: Optional[dict] = None)[source]¶
Load a biomedical image and annotation from file.
The loading data format is as the following:
    {
        'img': np.ndarray data[:-1, X, Y, Z]
        'seg_map': np.ndarray data[-1, X, Y, Z]
    }
Required Keys:
img_path
Added Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
img_shape
ori_shape
- Parameters
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘numpy’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.LoadBiomedicalImageFromFile(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]¶
Load a biomedical image from file.
Required Keys:
img_path
Added Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.
img_shape
ori_shape
- Parameters
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]¶
Load an image from results['img'].
Similar with LoadImageFromFile, but the image has been loaded as np.ndarray in results['img']. Can be used when loading an image from a webcam.
Required Keys:
img
Modified Keys:
img
img_path
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is an uint8 array. Defaults to False.
- class mmseg.datasets.LoveDADataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
LoveDA dataset.
In segmentation map annotation for LoveDA, 0 is the ignore index.
reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.
- class mmseg.datasets.MapillaryDataset_v1(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
Mapillary Vistas Dataset.
Dataset paper link: http://ieeexplore.ieee.org/document/8237796/
v1.2 contains 66 object classes (37 instance-specific).
v2.0 contains 124 object classes (70 instance-specific, 46 stuff, 8 void or crowd).
The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’ for the Mapillary Vistas Dataset.
- class mmseg.datasets.MapillaryDataset_v2(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
Mapillary Vistas Dataset.
Dataset paper link: http://ieeexplore.ieee.org/document/8237796/
v1.2 contains 66 object classes (37 instance-specific).
v2.0 contains 124 object classes (70 instance-specific, 46 stuff, 8 void or crowd).
The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’ for the Mapillary Vistas Dataset.
- class mmseg.datasets.MultiImageMixDataset(dataset: Union[mmengine.dataset.dataset_wrapper.ConcatDataset, dict], pipeline: Sequence[dict], skip_type_keys: Optional[List[str]] = None, lazy_init: bool = False)[source]¶
A wrapper of multiple images mixed dataset.
Suitable for training on multiple images mixed data augmentation like mosaic and mixup.
- Parameters
dataset (ConcatDataset or dict) – The dataset to be mixed.
pipeline (Sequence[dict]) – Sequence of transform object or config dict to be composed.
skip_type_keys (list[str], optional) – Sequence of transform type strings to be skipped in the pipeline. Default to None.
- get_data_info(idx: int) → dict[source]¶
Get annotation by index.
- Parameters
idx (int) – Global index of ConcatDataset.
- Returns
The idx-th annotation of the datasets.
- Return type
dict
- property metainfo: dict¶
Get the meta information of the multi-image-mixed dataset.
- Returns
The meta information of multi-image-mixed dataset.
- Return type
dict
- class mmseg.datasets.NightDrivingDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtCoarse_labelTrainIds.png', **kwargs)[source]¶
NightDriving dataset.
- class mmseg.datasets.PackSegInputs(meta_keys=('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label'))[source]¶
Pack the inputs data for the semantic segmentation.
The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default this includes:
img_path: filename of the image
ori_shape: original shape of the image as a tuple (h, w, c)
img_shape: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
pad_shape: shape of padded images
scale_factor: a float indicating the preprocessing scale
flip: a boolean indicating if image flip transform was used
flip_direction: the flipping direction
- Parameters
meta_keys (Sequence[str], optional) – Meta keys to be packed from SegDataSample and collected in data[img_metas]. Default: ('img_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction')
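A hedged sketch of a typical 2D training pipeline ending with PackSegInputs (the crop size is illustrative; LoadImageFromFile comes from mmcv):

    train_pipeline = [
        dict(type='LoadImageFromFile'),
        dict(type='LoadAnnotations', reduce_zero_label=False),
        dict(type='RandomCrop', crop_size=(512, 512), cat_max_ratio=0.75),
        dict(type='PhotoMetricDistortion'),
        dict(type='PackSegInputs'),
    ]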
- class mmseg.datasets.PascalContextDataset(ann_file: str, img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
PascalContext dataset.
In segmentation map annotation for PascalContext, 0 stands for background, which is included in 60 categories.
reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.
- Parameters
ann_file (str) – Annotation file path.
- class mmseg.datasets.PascalContextDataset59(ann_file: str, img_suffix='.jpg', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
PascalContext dataset.
In segmentation map annotation for PascalContext59, 0 stands for background, which is not included in the 59 categories. reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.
- Parameters
ann_file (str) – Annotation file path.
- class mmseg.datasets.PascalVOCDataset(ann_file, img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
Pascal VOC dataset.
- Parameters
ann_file (str) – Split txt file for Pascal VOC.
- class mmseg.datasets.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[float] = (0.5, 1.5), saturation_range: Sequence[float] = (0.5, 1.5), hue_delta: int = 18)[source]¶
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. The position of random contrast is second or second to last.
random brightness
random contrast (mode 0)
convert color from BGR to HSV
random saturation
random hue
convert color from HSV to BGR
random contrast (mode 1)
Required Keys:
img
Modified Keys:
img
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
- brightness(img: numpy.ndarray) → numpy.ndarray[source]¶
Brightness distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after brightness change.
- Return type
np.ndarray
- contrast(img: numpy.ndarray) → numpy.ndarray[source]¶
Contrast distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after contrast change.
- Return type
np.ndarray
- convert(img: numpy.ndarray, alpha: int = 1, beta: int = 0) → numpy.ndarray[source]¶
Multiply with alpha and add beta, with clipping.
- Parameters
img (np.ndarray) – The input image.
alpha (int) – Image weights, change the contrast/saturation of the image. Default: 1
beta (int) – Image bias, change the brightness of the image. Default: 0
- Returns
The transformed image.
- Return type
np.ndarray
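A minimal numpy sketch of what convert computes, assuming a uint8 image array:

    import numpy as np

    def convert(img: np.ndarray, alpha: float = 1, beta: float = 0) -> np.ndarray:
        # Multiply by alpha (contrast/saturation), add beta (brightness),
        # then clip back to the valid uint8 range.
        out = img.astype(np.float32) * alpha + beta
        return np.clip(out, 0, 255).astype(np.uint8)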
- hue(img: numpy.ndarray) → numpy.ndarray[source]¶
Hue distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after hue change.
- Return type
np.ndarray
- class mmseg.datasets.PotsdamDataset(img_suffix='.png', seg_map_suffix='.png', reduce_zero_label=True, **kwargs)[source]¶
ISPRS Potsdam dataset.
In segmentation map annotation for Potsdam dataset, 0 is the ignore index.
reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.
- class mmseg.datasets.REFUGEDataset(**kwargs)[source]¶
REFUGE dataset.
In segmentation map annotation for REFUGE, 0 stands for background, which is not included in 2 categories.
reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.png’.
- class mmseg.datasets.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]¶
Convert RGB image to grayscale image.
Required Keys:
img
Modified Keys:
img
img_shape
This transform calculates the weighted mean of the input image channels with weights and then expands the channels to out_channels. When out_channels is None, the number of output channels is the same as the number of input channels.
- Parameters
out_channels (int) – Expected number of output channels after transforming. Default: None.
weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).
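A minimal numpy sketch of the weighted-mean conversion, assuming an (H, W, 3) RGB image and the default weights:

    import numpy as np

    img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)  # illustrative input
    weights = np.array([0.299, 0.587, 0.114])

    gray = (img * weights).sum(axis=2, keepdims=True)  # weighted mean, (H, W, 1)
    gray = np.tile(gray, (1, 1, 3))                    # expand to out_channels=3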
- class mmseg.datasets.RandomCrop(crop_size: Union[int, Tuple[int, int]], cat_max_ratio: float = 1.0, ignore_index: int = 255)[source]¶
Random crop the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
img_shape
gt_seg_map
- Parameters
crop_size (Union[int, Tuple[int, int]]) – Expected size after cropping with the format of (h, w). If set to an integer, then cropping width and height are equal to this integer.
cat_max_ratio (float) – The maximum ratio that a single category could occupy.
ignore_index (int) – The label index to be ignored. Default: 255
- class mmseg.datasets.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]¶
CutOut operation.
Randomly drop some regions of image used in Cutout.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
prob (float) – cutout probability.
n_holes (int | tuple[int, int]) – Number of regions to be dropped. If it is given as a list, number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].
cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.
cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.
fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).
seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.
- class mmseg.datasets.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]¶
Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image, which is composed of parts from each sub-image.
                    mosaic transform
                       center_x
            +------------------------------+
            |       pad        |  pad      |
            |      +-----------+           |
            |      |           |           |
            |      |  image1   |--------+  |
            |      |           |        |  |
            |      |           | image2 |  |
    center_y|----+-------------+-----------|
            |    |   cropped   |           |
            |pad |   image3    |  image4   |
            |    |             |           |
            +----|-------------+-----------+
                 |             |
                 +-------------+

The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersection of the 4 images.
2. Get the left top image according to the index, and randomly sample another 3 images from the custom dataset.
3. The sub image will be cropped if the image is larger than the mosaic patch.
Required Keys:
img
gt_seg_map
mix_results
Modified Keys:
img
img_shape
ori_shape
gt_seg_map
- Parameters
prob (float) – mosaic probability.
img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).
pad_val (int) – Pad value. Default: 0.
seg_pad_val (int) – Pad value of segmentation map. Default: 255.
- get_indices(dataset: mmseg.datasets.dataset_wrappers.MultiImageMixDataset) → list[source]¶
Call function to collect indices.
- Parameters
dataset (MultiImageMixDataset) – The dataset.
- Returns
indices.
- Return type
list
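A hedged config sketch wiring RandomMosaic through MultiImageMixDataset, which supplies the extra images via mix_results (paths are illustrative):

    train_dataset = dict(
        type='MultiImageMixDataset',
        dataset=dict(
            type='CityscapesDataset',
            data_root='data/cityscapes',  # illustrative path
            data_prefix=dict(
                img_path='leftImg8bit/train', seg_map_path='gtFine/train'),
            pipeline=[
                dict(type='LoadImageFromFile'),
                dict(type='LoadAnnotations'),
            ]),
        pipeline=[
            dict(type='RandomMosaic', prob=1.0, img_scale=(640, 640)),
            dict(type='PackSegInputs'),
        ])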
- class mmseg.datasets.RandomRotFlip(rotate_prob=0.5, flip_prob=0.5, degree=(- 20, 20))[source]¶
Rotate and flip the image & seg or just rotate the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
rotate_prob (float) – The probability of rotating the image.
flip_prob (float) – The probability of rotating & flipping the image.
degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of a tuple like (min, max), the range of degrees will be (-degree, +degree).
- class mmseg.datasets.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]¶
Rotate the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
prob (float) – The rotation probability.
degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of a tuple like (min, max), the range of degrees will be (-degree, +degree).
pad_val (float, optional) – Padding value of image. Default: 0.
seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.
auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False
- class mmseg.datasets.Rerange(min_value=0, max_value=255)[source]¶
Rerange the image pixel value.
Required Keys:
img
Modified Keys:
img
- Parameters
min_value (float or int) – Minimum value of the reranged image. Default: 0.
max_value (float or int) – Maximum value of the reranged image. Default: 255.
- class mmseg.datasets.ResizeShortestEdge(scale: Union[int, Tuple[int, int]], max_size: int)[source]¶
Resize the image and mask while keeping the aspect ratio unchanged.
Modified from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py#L130 # noqa:E501 Copyright (c) Facebook, Inc. and its affiliates. Licensed under the Apache-2.0 License
This transform attempts to scale the shorter edge to the given scale, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.
Required Keys:
img
gt_seg_map (optional)
Modified Keys:
img
img_shape
gt_seg_map (optional)
Added Keys:
scale
scale_factor
keep_ratio
- Parameters
scale (Union[int, Tuple[int, int]]) – The target short edge length. If it is a tuple, the minimum value will be selected as the short edge length.
max_size (int) – The maximum allowed longest edge length.
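A small sketch of the resizing rule (the helper name is hypothetical):

    def target_size(h: int, w: int, scale: int = 512, max_size: int = 1024):
        # Scale the shorter edge to `scale`; if the longer edge would then
        # exceed `max_size`, downscale so the longer edge equals `max_size`.
        ratio = scale / min(h, w)
        if max(h, w) * ratio > max_size:
            ratio = max_size / max(h, w)
        return round(h * ratio), round(w * ratio)

    print(target_size(500, 2000))  # -> (256, 1024)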
- transform(results: Dict) → Dict[source]¶
The transform function. All subclass of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.ResizeToMultiple(size_divisor=32, interpolation=None)[source]¶
Resize images & seg maps to a multiple of the divisor.
Required Keys:
img
gt_seg_map
Modified Keys:
img
img_shape
pad_shape
- Parameters
size_divisor (int) – Images and gt seg maps need to be resized to a multiple of size_divisor. Default: 32.
interpolation (str, optional) – The interpolation mode of image resize. Default: None
- class mmseg.datasets.STAREDataset(img_suffix='.png', seg_map_suffix='.ah.png', reduce_zero_label=False, **kwargs)[source]¶
STARE dataset.
In segmentation map annotation for STARE, 0 stands for background, which is included in 2 categories.
reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.ah.png’.
- class mmseg.datasets.SegRescale(scale_factor=1)[source]¶
Rescale semantic segmentation maps.
Required Keys:
gt_seg_map
Modified Keys:
gt_seg_map
- Parameters
scale_factor (float) – The scale factor of the final output.
- class mmseg.datasets.SynapseDataset(img_suffix='.jpg', seg_map_suffix='.png', **kwargs)[source]¶
Synapse dataset.
Before the dataset preprocessing of Synapse, there are 13 foreground categories in total, not including background. After preprocessing, 8 foreground categories are kept while the other 5 foreground categories are handled as background. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.
- class mmseg.datasets.iSAIDDataset(img_suffix='.png', seg_map_suffix='_instance_color_RGB.png', ignore_index=255, **kwargs)[source]¶
iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images. The segmentation map annotation for the iSAID dataset includes 16 categories.
reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_instance_color_RGB.png’.
transforms¶
- class mmseg.datasets.transforms.AdjustGamma(gamma=1.0)[source]¶
Using gamma correction to process the image.
Required Keys:
img
Modified Keys:
img
- Parameters
gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.
- class mmseg.datasets.transforms.BioMedical3DPad(pad_shape: Tuple[int, int, int], pad_val: float = 0.0, seg_pad_val: int = 0)[source]¶
Pad the biomedical 3d image & biomedical 3d semantic segmentation maps.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Modified Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Added Keys:
pad_shape (Tuple[int, int, int]): The padded shape.
- Parameters
pad_shape (Tuple[int, int, int]) – Fixed padding size. Expected padding shape (Z, Y, X).
pad_val (float) – Padding value for biomedical image. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.
seg_pad_val (int) – Padding value for biomedical 3d semantic segmentation maps. The padding mode is set to “constant”. The value to be filled in padding area. Default: 0.
- class mmseg.datasets.transforms.BioMedical3DRandomCrop(crop_shape: Union[int, Tuple[int, int, int]], keep_foreground: bool = True)[source]¶
Crop the input patch for medical image & segmentation mask.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
- gt_seg_map (np.ndarray, optional): Biomedical semantic segmentation mask
with shape (Z, Y, X).
Modified Keys:
img
img_shape
gt_seg_map (optional)
- Parameters
crop_shape (Union[int, Tuple[int, int, int]]) – Expected size after cropping with the format of (z, y, x). If set to an integer, then cropping width and height are equal to this integer.
keep_foreground (bool) – If keep_foreground is True, it will sample a voxel of foreground classes randomly, and will take it as the center of the crop bounding-box. Default to True.
- crop(img: numpy.ndarray, crop_bbox: tuple) → numpy.ndarray[source]¶
Crop from img.
- Parameters
img (np.ndarray) – Original input image.
crop_bbox (tuple) – Coordinates of the cropped image.
- Returns
The cropped image.
- Return type
np.ndarray
- generate_margin(results: dict) → tuple[source]¶
Generate margin of crop bounding-box.
If keep_foreground is True, it will sample a voxel of foreground classes randomly, take it as the center of the bounding-box, and return the margin between the bounding-box and the image. If keep_foreground is False, it will return the difference between the crop shape and the image shape.
- Parameters
results (dict) – Result dict from loading pipeline.
- Returns
The margin for 3 dimensions of crop bounding-box and image.
- Return type
tuple
- random_generate_crop_bbox(margin_z: int, margin_y: int, margin_x: int) → tuple[source]¶
Randomly get a crop bounding box.
- Parameters
margin_z (int) – Margin of the crop bounding-box in the Z dimension.
margin_y (int) – Margin of the crop bounding-box in the Y dimension.
margin_x (int) – Margin of the crop bounding-box in the X dimension.
- Returns
Coordinates of the cropped image.
- Return type
tuple
- class mmseg.datasets.transforms.BioMedical3DRandomFlip(prob: float, axes: Tuple[int, ...], swap_label_pairs: Optional[List[Tuple[int, int]]] = None)[source]¶
Flip biomedical 3D images and segmentations.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/spatial_transforms.py # noqa:E501
Copyright 2021 Division of Medical Image Computing, German Cancer Research Center (DKFZ) and Applied Computer Vision Lab, Helmholtz Imaging Platform. Licensed under the Apache-2.0 License.
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Modified Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
Added Keys:
do_flip
flip_axes
- Parameters
prob (float) – Flipping probability.
axes (Tuple[int, ...]) – Flipping axes with order ‘ZXY’.
swap_label_pairs (Optional[List[Tuple[int, int]]]) – The segmentation label pairs that are swapped when flipping.
- class mmseg.datasets.transforms.BioMedicalGaussianBlur(sigma_range: Tuple[float, float] = (0.5, 1.0), prob: float = 0.2, prob_per_channel: float = 0.5, different_sigma_per_channel: bool = True, different_sigma_per_axis: bool = True)[source]¶
Add Gaussian blur with random sigma to image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L81 # noqa:E501
Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
sigma_range (Tuple[float, float]|float) – range to randomly select sigma value. Default to (0.5, 1.0).
prob (float) – Probability to apply Gaussian blur for each sample. Default to 0.2.
prob_per_channel (float) – Probability to apply Gaussian blur for each channel (axis N of the image). Default to 0.5.
different_sigma_per_channel (bool) – whether to use different sigma for each channel (axis N of the image). Default to True.
different_sigma_per_axis (bool) – whether to use different sigma for axis Z, X and Y of the image. Default to True.
- class mmseg.datasets.transforms.BioMedicalGaussianNoise(prob: float = 0.1, mean: float = 0.0, std: float = 0.1)[source]¶
Add random Gaussian noise to image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/7651ece69faf55263dd582a9f5cbd149ed9c3ad0/batchgenerators/transforms/noise_transforms.py#L53 # noqa:E501
Copyright (c) German Cancer Research Center (DKFZ) Licensed under the Apache License, Version 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys:
img
- Parameters
prob (float) – Probability to add Gaussian noise for each sample. Default to 0.1.
mean (float) – Mean or “centre” of the distribution. Default to 0.0.
std (float) – Standard deviation of distribution. Default to 0.1.
- class mmseg.datasets.transforms.BioMedicalRandomGamma(prob: float = 0.5, gamma_range: Tuple[float] = (0.5, 2), invert_image: bool = False, per_channel: bool = False, retain_stats: bool = False)[source]¶
Using random gamma correction to process the biomedical image.
Modified from https://github.com/MIC-DKFZ/batchgenerators/blob/master/batchgenerators/transforms/color_transforms.py#L132 # noqa:E501 With licence: Apache 2.0
Required Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X),
N is the number of modalities, and data type is float32.
Modified Keys: - img
- Parameters
prob (float) – The probability to perform this transform. Default: 0.5.
gamma_range (Tuple[float]) – Range of gamma values. Default: (0.5, 2).
invert_image (bool) – Whether to invert the image before applying gamma augmentation. Default: False.
per_channel (bool) – Whether to perform the transform on each channel individually. Default: False.
retain_stats (bool) – Gamma transformation will alter the mean and std of the data in the patch. If retain_stats=True, the data will be transformed to match the mean and standard deviation before gamma augmentation. Default: False.
- class mmseg.datasets.transforms.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]¶
Use CLAHE method to process the image.
See ZUIDERVELD,K. Contrast Limited Adaptive Histogram Equalization[J]. Graphics Gems, 1994:474-485. for more information.
Required Keys:
img
Modified Keys:
img
- Parameters
clip_limit (float) – Threshold for contrast limiting. Default: 40.0.
tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).
- class mmseg.datasets.transforms.GenerateEdge(edge_width: int = 3, ignore_index: int = 255)[source]¶
Generate Edge for CE2P approach.
Edge will be used to calculate loss of CE2P.
Modified from https://github.com/liutinglt/CE2P/blob/master/dataset/target_generation.py # noqa:E501
Required Keys:
img_shape
gt_seg_map
- Added Keys:
- gt_edge_map (np.ndarray, uint8): The edge annotation generated from the
seg map by extracting border between different semantics.
- Parameters
edge_width (int) – The width of edge. Default to 3.
ignore_index (int) – Index that will be ignored. Default to 255.
- class mmseg.datasets.transforms.LoadAnnotations(reduce_zero_label=None, backend_args=None, imdecode_backend='pillow')[source]¶
Load annotations for semantic segmentation provided by dataset.
The annotation format is as the following:
    {
        # Filename of semantic segmentation ground truth file.
        'seg_map_path': 'a/b/c'
    }
After this module, the annotation has been changed to the format below:
    {
        # In str type.
        'seg_fields': List
        # In uint8 type.
        'gt_seg_map': np.ndarray (H, W)
    }
Required Keys:
seg_map_path (str): Path of semantic segmentation ground truth file.
Added Keys:
seg_fields (List)
gt_seg_map (np.uint8)
- Parameters
reduce_zero_label (bool, optional) – Whether to reduce all label values by 1. Usually used for datasets where 0 is the background label. Defaults to None.
imdecode_backend (str) – The image decoding backend type. The backend argument for mmcv.imfrombytes. See mmcv.imfrombytes for details. Defaults to ‘pillow’.
backend_args (dict) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.transforms.LoadBiomedicalAnnotation(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]¶
Load seg_map annotation provided by biomedical datasets.
The annotation format is as the following:
{ 'gt_seg_map': np.ndarray (X, Y, Z) or (Z, Y, X) }
Required Keys:
seg_map_path
Added Keys:
- gt_seg_map (np.ndarray): Biomedical seg map with shape (Z, Y, X) by
default, and data type is float32 if set to_float32 = True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.
- Parameters
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
to_float32 (bool) – Whether to convert the loaded seg map to a float32 numpy array. If set to False, the loaded seg map is a float64 array. Defaults to True.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See mmengine.fileio for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.transforms.LoadBiomedicalData(with_seg=False, decode_backend: str = 'numpy', to_xyz: bool = False, backend_args: Optional[dict] = None)[source]¶
Load a biomedical image and annotation from file.
The loading data format is as the following:
    {
        'img': np.ndarray data[:-1, X, Y, Z]
        'seg_map': np.ndarray data[-1, X, Y, Z]
    }
Required Keys:
img_path
Added Keys:
- img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default,
N is the number of modalities.
- gt_seg_map (np.ndarray, optional): Biomedical seg map with shape
(Z, Y, X) by default.
img_shape
ori_shape
- Parameters
with_seg (bool) – Whether to parse and load the semantic segmentation annotation. Defaults to False.
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘numpy’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.htm for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
- class mmseg.datasets.transforms.LoadBiomedicalImageFromFile(decode_backend: str = 'nifti', to_xyz: bool = False, to_float32: bool = True, backend_args: Optional[dict] = None)[source]¶
Load a biomedical image from a file.
Required Keys:
img_path
Added Keys:
img (np.ndarray): Biomedical image with shape (N, Z, Y, X) by default, where N is the number of modalities. The data type is float32 if to_float32=True, or float64 if decode_backend is ‘nifti’ and to_float32 is False.
img_shape
ori_shape
- Parameters
decode_backend (str) – The data decoding backend type. Options are ‘numpy’ and ‘nifti’, and there is a convention that when the backend is ‘nifti’ the axis order of the loaded data is XYZ, and when the backend is ‘numpy’ the axis order is ZYX. The data will be transposed if the backend is ‘nifti’. Defaults to ‘nifti’.
to_xyz (bool) – Whether transpose data from Z, Y, X to X, Y, Z. Defaults to False.
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a float64 array. Defaults to True.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.html for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
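A hedged pipeline sketch combining the biomedical loading transforms above; the decode_backend choice and the companion transforms are illustrative:
train_pipeline = [
    dict(type='LoadBiomedicalImageFromFile', decode_backend='nifti'),
    dict(type='LoadBiomedicalAnnotation', decode_backend='nifti'),
    dict(type='PackSegInputs'),
]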
- class mmseg.datasets.transforms.LoadImageFromNDArray(to_float32: bool = False, color_type: str = 'color', imdecode_backend: str = 'cv2', file_client_args: Optional[dict] = None, ignore_empty: bool = False, *, backend_args: Optional[dict] = None)[source]¶
Load an image from
results['img']
. Similar with
LoadImageFromFile
, but the image has been loaded as np.ndarray in
results['img']
. Can be used when loading images from a webcam.
Required Keys:
img
Modified Keys:
img
img_path
img_shape
ori_shape
- Parameters
to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.
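A minimal sketch of feeding an in-memory frame through this transform; the random array stands in for a webcam frame:
>>> import numpy as np
>>> from mmseg.datasets.transforms import LoadImageFromNDArray
>>> frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
>>> results = LoadImageFromNDArray()(dict(img=frame))
>>> results['img_shape']
(480, 640)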
- class mmseg.datasets.transforms.PackSegInputs(meta_keys=('img_path', 'seg_map_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'reduce_zero_label'))[source]¶
Pack the inputs data for the semantic segmentation.
The img_meta item is always populated. The contents of the img_meta dictionary depend on meta_keys. By default this includes:
img_path: filename of the image
ori_shape: original shape of the image as a tuple (h, w, c)
img_shape: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.
pad_shape: shape of padded images
scale_factor: a float indicating the preprocessing scale
flip: a boolean indicating if image flip transform was used
flip_direction: the flipping direction
- Parameters
meta_keys (Sequence[str], optional) – Meta keys to be packed from SegDataSample and collected in data[img_metas]. Default: ('img_path', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction')
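If only part of the meta information is needed, meta_keys can be narrowed; a config-style sketch (the selection is illustrative):
dict(type='PackSegInputs', meta_keys=('img_path', 'ori_shape', 'img_shape'))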
- class mmseg.datasets.transforms.PhotoMetricDistortion(brightness_delta: int = 32, contrast_range: Sequence[float] = (0.5, 1.5), saturation_range: Sequence[float] = (0.5, 1.5), hue_delta: int = 18)[source]¶
Apply photometric distortion to the image sequentially; every transformation is applied with a probability of 0.5. The random contrast step is placed either second or second to last.
random brightness
random contrast (mode 0)
convert color from BGR to HSV
random saturation
random hue
convert color from HSV to BGR
random contrast (mode 1)
Required Keys:
img
Modified Keys:
img
- Parameters
brightness_delta (int) – delta of brightness.
contrast_range (tuple) – range of contrast.
saturation_range (tuple) – range of saturation.
hue_delta (int) – delta of hue.
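A minimal sketch applying the transform to a toy uint8 image (values arbitrary); shape and dtype are preserved:
>>> import numpy as np
>>> from mmseg.datasets.transforms import PhotoMetricDistortion
>>> results = dict(img=np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8))
>>> results = PhotoMetricDistortion()(results)
>>> results['img'].shape
(64, 64, 3)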
- brightness(img: numpy.ndarray) → numpy.ndarray[source]¶
Brightness distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after brightness change.
- Return type
np.ndarray
- contrast(img: numpy.ndarray) → numpy.ndarray[source]¶
Contrast distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after contrast change.
- Return type
np.ndarray
- convert(img: numpy.ndarray, alpha: int = 1, beta: int = 0) → numpy.ndarray[source]¶
Multiply by alpha and add beta, then clip.
- Parameters
img (np.ndarray) – The input image.
alpha (int) – Image weights, change the contrast/saturation of the image. Default: 1
beta (int) – Image bias, change the brightness of the image. Default: 0
- Returns
The transformed image.
- Return type
np.ndarray
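In effect the operation is clip(img * alpha + beta, 0, 255); a sketch of the equivalent numpy computation:
>>> import numpy as np
>>> def convert(img, alpha=1, beta=0):
...     # multiply by alpha, add beta, then clip back to the uint8 range
...     return np.clip(img.astype(np.float32) * alpha + beta, 0, 255).astype(np.uint8)
>>> convert(np.array([100, 200], dtype=np.uint8), alpha=1.5, beta=10)
array([160, 255], dtype=uint8)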
- hue(img: numpy.ndarray) → numpy.ndarray[source]¶
Hue distortion.
- Parameters
img (np.ndarray) – The input image.
- Returns
Image after hue change.
- Return type
np.ndarray
- class mmseg.datasets.transforms.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]¶
Convert RGB image to grayscale image.
Required Keys:
img
Modified Keys:
img
img_shape
This transform calculates the weighted mean of the input image channels with weights and then expands the channels to out_channels. When out_channels is None, the number of output channels is the same as the number of input channels.
- Parameters
out_channels (int) – Expected number of output channels after transforming. Default: None.
weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).
- class mmseg.datasets.transforms.RandomCrop(crop_size: Union[int, Tuple[int, int]], cat_max_ratio: float = 1.0, ignore_index: int = 255)[source]¶
Random crop the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
img_shape
gt_seg_map
- Parameters
crop_size (Union[int, Tuple[int, int]]) – Expected size after cropping with the format of (h, w). If set to an integer, then cropping width and height are equal to this integer.
cat_max_ratio (float) – The maximum ratio that single category could occupy.
ignore_index (int) – The label index to be ignored. Default: 255
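A hedged sketch calling the transform directly on toy data; in a real pipeline the dict (including the seg_fields key) is produced by the loading transforms:
>>> import numpy as np
>>> from mmseg.datasets.transforms import RandomCrop
>>> results = dict(
...     img=np.zeros((512, 512, 3), dtype=np.uint8),
...     gt_seg_map=np.zeros((512, 512), dtype=np.uint8),
...     seg_fields=['gt_seg_map'])
>>> results = RandomCrop(crop_size=(256, 256))(results)
>>> results['img'].shape, results['gt_seg_map'].shape
((256, 256, 3), (256, 256))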
- class mmseg.datasets.transforms.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]¶
CutOut operation.
Randomly drop some regions of the image, as done in Cutout.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
prob (float) – cutout probability.
n_holes (int | tuple[int, int]) – Number of regions to be dropped. If it is given as a list, number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].
cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.
cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.
fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).
seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.
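A config-style sketch (values illustrative); note that only one of cutout_shape and cutout_ratio may be given:
dict(type='RandomCutOut',
     prob=0.5,
     n_holes=(1, 3),            # randomly drop 1 to 3 regions
     cutout_ratio=(0.1, 0.2),   # region size relative to the image
     fill_in=(0, 0, 0),
     seg_fill_in=255)           # fill dropped label pixels with the ignore index (illustrative)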
- class mmseg.datasets.transforms.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]¶
Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image, which is composed of parts from each sub-image.
                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:
1. Choose the mosaic center as the intersection of the 4 images.
2. Get the left top image according to the index, and randomly sample another 3 images from the custom dataset.
3. The sub-image will be cropped if it is larger than the mosaic patch.
Required Keys:
img
gt_seg_map
mix_results
Modified Keys:
img
img_shape
ori_shape
gt_seg_map
- Parameters
prob (float) – mosaic probability.
img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).
center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).
pad_val (int) – Pad value. Default: 0.
seg_pad_val (int) – Pad value of segmentation map. Default: 255.
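Since the transform needs mix_results from three extra images, it is used through a multi-image dataset wrapper; a heavily hedged config sketch, in which the inner dataset type and paths are placeholders:
train_dataset = dict(
    type='MultiImageMixDataset',
    dataset=dict(
        type='BaseSegDataset',        # placeholder inner dataset
        data_root='data/my_dataset',  # hypothetical path
        pipeline=[
            dict(type='LoadImageFromFile'),
            dict(type='LoadAnnotations'),
        ]),
    pipeline=[
        dict(type='RandomMosaic', prob=1.0, img_scale=(512, 512)),
        dict(type='PackSegInputs'),
    ])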
- get_indices(dataset: mmseg.datasets.dataset_wrappers.MultiImageMixDataset) → list[source]¶
Call function to collect indices.
- Parameters
dataset (
MultiImageMixDataset
) – The dataset.
- Returns
indices.
- Return type
list
- class mmseg.datasets.transforms.RandomRotFlip(rotate_prob=0.5, flip_prob=0.5, degree=(- 20, 20))[source]¶
Rotate and flip the image & seg or just rotate the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
rotate_prob (float) – The probability of rotating the image.
flip_prob (float) – The probability of flipping the image.
degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (
-degree
,+degree
)
- class mmseg.datasets.transforms.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]¶
Rotate the image & seg.
Required Keys:
img
gt_seg_map
Modified Keys:
img
gt_seg_map
- Parameters
prob (float) – The rotation probability.
degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (
-degree
,+degree
)
pad_val (float, optional) – Padding value of image. Default: 0.
seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.
auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False
- class mmseg.datasets.transforms.Rerange(min_value=0, max_value=255)[source]¶
Rerange the image pixel value.
Required Keys:
img
Modified Keys:
img
- Parameters
min_value (float or int) – Minimum value of the reranged image. Default: 0.
max_value (float or int) – Maximum value of the reranged image. Default: 255.
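The reranging is a linear rescale of the current value range onto [min_value, max_value]; a sketch of the equivalent numpy computation with the defaults:
>>> import numpy as np
>>> img = np.array([10., 20., 30.])
>>> min_value, max_value = 0, 255
>>> (img - img.min()) / (img.max() - img.min()) * (max_value - min_value) + min_value
array([  0. , 127.5, 255. ])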
- class mmseg.datasets.transforms.ResizeShortestEdge(scale: Union[int, Tuple[int, int]], max_size: int)[source]¶
Resize the image and mask while keeping the aspect ratio unchanged.
Modified from https://github.com/facebookresearch/detectron2/blob/main/detectron2/data/transforms/augmentation_impl.py#L130 # noqa:E501 Copyright (c) Facebook, Inc. and its affiliates. Licensed under the Apache-2.0 License
This transform attempts to scale the shorter edge to the given scale, as long as the longer edge does not exceed max_size. If max_size is reached, then downscale so that the longer edge does not exceed max_size.
Required Keys:
img
gt_seg_map (optional)
Modified Keys:
img
img_shape
gt_seg_map (optional)
Added Keys:
scale
scale_factor
keep_ratio
- Parameters
scale (Union[int, Tuple[int, int]]) – The target short edge length. If it’s a tuple, the min value will be selected as the short edge length.
max_size (int) – The maximum allowed longest edge length.
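A sketch of the sizing rule with a hypothetical helper (not part of the API), assuming scale=512 and max_size=1024:
>>> def target_size(h, w, scale=512, max_size=1024):
...     # scale the shorter edge to `scale`, then cap the longer edge at `max_size`
...     factor = scale / min(h, w)
...     if max(h, w) * factor > max_size:
...         factor = max_size / max(h, w)
...     return round(h * factor), round(w * factor)
>>> target_size(400, 800)    # longer edge lands exactly on max_size
(512, 1024)
>>> target_size(400, 1000)   # capped: the short edge ends up below 512
(410, 1024)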
- transform(results: Dict) → Dict[source]¶
The transform function. All subclasses of BaseTransform should override this method.
This function takes the result dict as the input, and can add new items to the dict or modify existing items in the dict. The result dict will be returned in the end, which allows concatenating multiple transforms into a pipeline.
- Parameters
results (dict) – The result dict.
- Returns
The result dict.
- Return type
dict
- class mmseg.datasets.transforms.ResizeToMultiple(size_divisor=32, interpolation=None)[source]¶
Resize images & seg to multiple of divisor.
Required Keys:
img
gt_seg_map
Modified Keys:
img
img_shape
pad_shape
- Parameters
size_divisor (int) – images and gt seg maps need to be resized to a multiple of size_divisor. Default: 32.
interpolation (str, optional) – The interpolation mode of image resize. Default: None
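A sketch of the per-edge target computation (hypothetical helper, assuming round-up behavior):
>>> import math
>>> def to_multiple(x, size_divisor=32):
...     # round up to the nearest multiple of size_divisor
...     return int(math.ceil(x / size_divisor)) * size_divisor
>>> to_multiple(500), to_multiple(512)
(512, 512)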
mmseg.engine¶
hooks¶
- class mmseg.engine.hooks.SegVisualizationHook(draw: bool = False, interval: int = 50, show: bool = False, wait_time: float = 0.0, backend_args: Optional[dict] = None)[source]¶
Segmentation Visualization Hook. Used to visualize validation and testing process prediction results.
In the testing phase:
- If
show
is True, it means that only the prediction results are visualized without storing data, so
vis_backends
needs to be excluded.
- Parameters
draw (bool) – whether to draw prediction results. If it is False, it means that no drawing will be done. Defaults to False.
interval (int) – The interval of visualization. Defaults to 50.
show (bool) – Whether to display the drawn image. Default to False.
wait_time (float) – The interval of show (s). Defaults to 0.
backend_args (dict, Optional) – Arguments to instantiate a file backend. See https://mmengine.readthedocs.io/en/latest/api/fileio.html for details. Defaults to None. Notes: mmcv>=2.0.0rc4, mmengine>=0.2.0 required.
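A hedged config sketch enabling the hook among the default hooks of a config file:
default_hooks = dict(
    visualization=dict(type='SegVisualizationHook', draw=True, interval=50))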
optimizers¶
- class mmseg.engine.optimizers.LayerDecayOptimizerConstructor(optim_wrapper_cfg, paramwise_cfg)[source]¶
Different learning rates are set for different layers of backbone.
Note: Currently, this optimizer constructor is built for BEiT, and it will be deprecated. Please use
LearningRateDecayOptimizerConstructor
instead.
- class mmseg.engine.optimizers.LearningRateDecayOptimizerConstructor(optim_wrapper_cfg: dict, paramwise_cfg: Optional[dict] = None)[source]¶
Different learning rates are set for different layers of backbone.
Note: Currently, this optimizer constructor is built for ConvNeXt, BEiT and MAE.
- add_params(params, module, **kwargs)[source]¶
Add all parameters of module to the params list.
The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.
- Parameters
params (list[dict]) – A list of param groups, it will be modified in place.
module (nn.Module) – The module to be added.
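A hedged optimizer-wrapper config sketch in the style of the ConvNeXt configs; the concrete values are illustrative:
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05),
    constructor='LearningRateDecayOptimizerConstructor',
    paramwise_cfg=dict(decay_rate=0.9, decay_type='stage_wise', num_layers=12))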
mmseg.evaluation¶
metrics¶
- class mmseg.evaluation.metrics.CityscapesMetric(output_dir: str, ignore_index: int = 255, format_only: bool = False, keep_results: bool = False, collect_device: str = 'cpu', prefix: Optional[str] = None, **kwargs)[source]¶
Cityscapes evaluation metric.
- Parameters
output_dir (str) – The directory for output prediction.
ignore_index (int) – Index that will be ignored in evaluation. Default: 255.
format_only (bool) – Only format the results for submission without performing evaluation. It is useful when you want to format the result to a specific format and submit it to the test server. Defaults to False.
keep_results (bool) – Whether to keep the results. When
format_only
is True,keep_results
must be True. Defaults to False.collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
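A hedged sketch of a test-time evaluator config that only formats predictions for submission to the evaluation server:
test_evaluator = dict(
    type='CityscapesMetric',
    output_dir='work_dirs/cityscapes_submit',  # hypothetical output path
    format_only=True,
    keep_results=True)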
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – Testing results of the dataset.
- Returns
Cityscapes evaluation results.
- Return type
dict[str, float]
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data and data_samples.
The processed results should be stored in
self.results
, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
- class mmseg.evaluation.metrics.IoUMetric(ignore_index: int = 255, iou_metrics: List[str] = ['mIoU'], nan_to_num: Optional[int] = None, beta: int = 1, collect_device: str = 'cpu', output_dir: Optional[str] = None, format_only: bool = False, prefix: Optional[str] = None, **kwargs)[source]¶
IoU evaluation metric.
- Parameters
ignore_index (int) – Index that will be ignored in evaluation. Default: 255.
iou_metrics (list[str] | str) – Metrics to be calculated, the options includes ‘mIoU’, ‘mDice’ and ‘mFscore’.
nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.
beta (int) – Determines the weight of recall in the combined score. Default: 1.
collect_device (str) – Device name used for collecting results from different ranks during distributed training. Must be ‘cpu’ or ‘gpu’. Defaults to ‘cpu’.
output_dir (str) – The directory for output prediction. Defaults to None.
format_only (bool) – Only format the results for submission without performing evaluation. It is useful when you want to save the result to a specific format and submit it to the test server. Defaults to False.
prefix (str, optional) – The prefix that will be added in the metric names to disambiguate homonymous metrics of different evaluators. If prefix is not provided in the argument, self.default_prefix will be used instead. Defaults to None.
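A minimal evaluator config sketch (metric selection illustrative):
val_evaluator = dict(type='IoUMetric', iou_metrics=['mIoU', 'mFscore'])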
- compute_metrics(results: list) → Dict[str, float][source]¶
Compute the metrics from processed results.
- Parameters
results (list) – The processed results of each batch.
- Returns
The computed metrics. The keys are the names of the metrics, and the values are the corresponding results. The keys mainly include aAcc, mIoU, mAcc, mDice, mFscore, mPrecision, mRecall.
- Return type
Dict[str, float]
- static intersect_and_union(pred_label: torch.Tensor, label: torch.Tensor, num_classes: int, ignore_index: int)[source]¶
Calculate Intersection and Union.
- Parameters
pred_label (torch.tensor) – Prediction segmentation map or predict result filename. The shape is (H, W).
label (torch.tensor) – Ground truth segmentation map or label filename. The shape is (H, W).
num_classes (int) – Number of categories.
ignore_index (int) – Index that will be ignored in evaluation.
- Returns
torch.Tensor: The intersection of prediction and ground truth histograms on all classes.
torch.Tensor: The union of prediction and ground truth histograms on all classes.
torch.Tensor: The prediction histogram on all classes.
torch.Tensor: The ground truth histogram on all classes.
- Return type
torch.Tensor
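A sketch of calling the static helper directly on toy tensors (2 classes, no ignored pixels):
>>> import torch
>>> from mmseg.evaluation.metrics import IoUMetric
>>> pred = torch.tensor([[0, 1], [1, 1]])
>>> label = torch.tensor([[0, 1], [0, 1]])
>>> inter, union, pred_hist, label_hist = IoUMetric.intersect_and_union(
...     pred, label, num_classes=2, ignore_index=255)
>>> inter, union
(tensor([1., 2.]), tensor([2., 3.]))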
- process(data_batch: dict, data_samples: Sequence[dict]) → None[source]¶
Process one batch of data and data_samples.
The processed results should be stored in
self.results
, which will be used to compute the metrics when all batches have been processed.
- Parameters
data_batch (dict) – A batch of data from the dataloader.
data_samples (Sequence[dict]) – A batch of outputs from the model.
- static total_area_to_metrics(total_area_intersect: numpy.ndarray, total_area_union: numpy.ndarray, total_area_pred_label: numpy.ndarray, total_area_label: numpy.ndarray, metrics: List[str] = ['mIoU'], nan_to_num: Optional[int] = None, beta: int = 1)[source]¶
Calculate evaluation metrics.
- Parameters
total_area_intersect (np.ndarray) – The intersection of prediction and ground truth histograms on all classes.
total_area_union (np.ndarray) – The union of prediction and ground truth histogram on all classes.
total_area_pred_label (np.ndarray) – The prediction histogram on all classes.
total_area_label (np.ndarray) – The ground truth histogram on all classes.
metrics (List[str] | str) – Metrics to be evaluated, ‘mIoU’ and ‘mDice’.
nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.
beta (int) – Determines the weight of recall in the combined score. Default: 1.
- Returns
Per-category evaluation metrics, with shape (num_classes, ).
- Return type
Dict[str, np.ndarray]
mmseg.models¶
backbones¶
- class mmseg.models.backbones.BEiT(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, qv_bias=True, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]¶
BERT Pre-Training of Image Transformers.
- Parameters
img_size (int | tuple) – Input image size. Default: 224.
patch_size (int) – The patch size. Default: 16.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – Embedding dimension. Default: 768.
num_layers (int) – Depth of transformer. Default: 12.
num_heads (int) – Number of attention heads. Default: 12.
mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.
out_indices (list | tuple | int) – Output from which stages. Default: -1.
qv_bias (bool) – Enable bias for qv if True. Default: True.
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – Stochastic depth rate. Default 0.0.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.
final_norm (bool) – Whether to add a additional layer to normalize final feature map. Default: False.
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
pretrained (str, optional) – Model pretrained path. Default: None.
init_values (float) – Initialize the values of BEiTAttention and FFN with learnable scaling.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- resize_rel_pos_embed(checkpoint)[source]¶
Resize relative pos_embed weights.
This function is modified from https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_custom/checkpoint.py. # noqa: E501 Copyright (c) Microsoft Corporation Licensed under the MIT License :param checkpoint: Key and value of the pretrain model. :type checkpoint: dict
- Returns
The relative pos_embed weights from the pre-trained model, interpolated to the current model size.
- Return type
state_dict (dict)
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g.
Dropout
,BatchNorm
, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmseg.models.backbones.BiSeNetV1(backbone_cfg, in_channels=3, spatial_channels=(64, 64, 64, 128), context_channels=(128, 256, 512), out_indices=(0, 1, 2), align_corners=False, out_channels=256, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
BiSeNetV1 backbone.
This backbone is the implementation of BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation.
- Parameters
backbone_cfg (dict) – Config of backbone of Context Path.
in_channels (int) – The number of channels of input image. Default: 3.
spatial_channels (Tuple[int]) – Size of channel numbers of various layers in Spatial Path. Default: (64, 64, 64, 128).
context_channels (Tuple[int]) – Size of channel numbers of various modules in Context Path. Default: (128, 256, 512).
out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2).
align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.
out_channels (int) – The number of channels of output. It must be the same with in_channels of decode_head. Default: 256.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.BiSeNetV2(in_channels=3, detail_channels=(64, 64, 128), semantic_channels=(16, 32, 64, 128), semantic_expansion_ratio=6, bga_channels=128, out_indices=(0, 1, 2, 3, 4), align_corners=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
BiSeNetV2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation.
This backbone is the implementation of BiSeNetV2.
- Parameters
in_channels (int) – Number of channel of input image. Default: 3.
detail_channels (Tuple[int], optional) – Channels of each stage in Detail Branch. Default: (64, 64, 128).
semantic_channels (Tuple[int], optional) – Channels of each stage in Semantic Branch. Default: (16, 32, 64, 128). See Table 1 and Figure 3 of paper for more details.
semantic_expansion_ratio (int, optional) – The expansion factor expanding channel number of middle channels in Semantic Branch. Default: 6.
bga_channels (int, optional) – Number of middle channels in Bilateral Guided Aggregation Layer. Default: 128.
out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2, 3, 4).
align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.
conv_cfg (dict | None) – Config of conv layers. Default: None.
norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’).
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.CGNet(in_channels=3, num_channels=(32, 64, 128), num_blocks=(3, 21), dilations=(2, 4), reductions=(8, 16), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'PReLU'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
CGNet backbone.
This backbone is the implementation of A Light-weight Context Guided Network for Semantic Segmentation.
- Parameters
in_channels (int) – Number of input image channels. Normally 3.
num_channels (tuple[int]) – Numbers of feature channels at each stages. Default: (32, 64, 128).
num_blocks (tuple[int]) – Numbers of CG blocks at stage 1 and stage 2. Default: (3, 21).
dilations (tuple[int]) – Dilation rate for surrounding context extractors at stage 1 and stage 2. Default: (2, 4).
reductions (tuple[int]) – Reductions for global context extractors at stage 1 and stage 2. Default: (8, 16).
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’, requires_grad=True).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’PReLU’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.ERFNet(in_channels=3, enc_downsample_channels=(16, 64, 128), enc_stage_non_bottlenecks=(5, 8), enc_non_bottleneck_dilations=(2, 4, 8, 16), enc_non_bottleneck_channels=(64, 128), dec_upsample_channels=(64, 16), dec_stages_non_bottleneck=(2, 2), dec_non_bottleneck_channels=(64, 16), dropout_ratio=0.1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
ERFNet backbone.
This backbone is the implementation of ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation.
- Parameters
in_channels (int) – The number of channels of input image. Default: 3.
enc_downsample_channels (Tuple[int]) – Size of channel numbers of various Downsampler block in encoder. Default: (16, 64, 128).
enc_stage_non_bottlenecks (Tuple[int]) – Number of stages of Non-bottleneck block in encoder. Default: (5, 8).
enc_non_bottleneck_dilations (Tuple[int]) – Dilation rate of each stage of Non-bottleneck block of encoder. Default: (2, 4, 8, 16).
enc_non_bottleneck_channels (Tuple[int]) – Size of channel numbers of various Non-bottleneck block in encoder. Default: (64, 128).
dec_upsample_channels (Tuple[int]) – Size of channel numbers of various Deconvolution block in decoder. Default: (64, 16).
dec_stages_non_bottleneck (Tuple[int]) – Number of stages of Non-bottleneck block in decoder. Default: (2, 2).
dec_non_bottleneck_channels (Tuple[int]) – Size of channel numbers of various Non-bottleneck block in decoder. Default: (64, 16).
dropout_ratio (float) – Probability of an element to be zeroed. Default: 0.1.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.FastSCNN(in_channels=3, downsample_dw_channels=(32, 48), global_in_channels=64, global_block_channels=(64, 96, 128), global_block_strides=(2, 2, 1), global_out_channels=128, higher_in_channels=64, lower_in_channels=128, fusion_out_channels=128, out_indices=(0, 1, 2), conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, dw_act_cfg=None, init_cfg=None)[source]¶
Fast-SCNN Backbone.
This backbone is the implementation of Fast-SCNN: Fast Semantic Segmentation Network.
- Parameters
in_channels (int) – Number of input image channels. Default: 3.
downsample_dw_channels (tuple[int]) – Number of output channels after the first conv layer & the second conv layer in Learning-To-Downsample (LTD) module. Default: (32, 48).
global_in_channels (int) – Number of input channels of Global Feature Extractor(GFE). Equal to number of output channels of LTD. Default: 64.
global_block_channels (tuple[int]) – Tuple of integers that describe the output channels for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (64, 96, 128).
global_block_strides (tuple[int]) – Tuple of integers that describe the strides (downsampling factors) for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (2, 2, 1).
global_out_channels (int) – Number of output channels of GFE. Default: 128.
higher_in_channels (int) – Number of input channels of the higher resolution branch in FFM. Equal to global_in_channels. Default: 64.
lower_in_channels (int) – Number of input channels of the lower resolution branch in FFM. Equal to global_out_channels. Default: 128.
fusion_out_channels (int) – Number of output channels of FFM. Default: 128.
out_indices (tuple) – Tuple of indices of list [higher_res_features, lower_res_features, fusion_output]. Often set to (0,1,2) to enable aux. heads. Default: (0, 1, 2).
conv_cfg (dict | None) – Config of conv layers. Default: None
norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’)
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’)
align_corners (bool) – align_corners argument of F.interpolate. Default: False
dw_act_cfg (dict) – In DepthwiseSeparableConvModule, activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, frozen_stages=- 1, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]¶
HRNet backbone.
This backbone is the implementation of High-Resolution Representations for Labeling Pixels and Regions.
- Parameters
extra (dict) –
Detailed configuration for each stage of HRNet. There must be 4 stages, the configuration for each stage must have 5 keys:
num_modules (int): The number of HRModule in this stage.
num_branches (int): The number of branches in the HRModule.
block (str): The type of convolution block.
num_blocks (tuple): The number of blocks in each branch. The length must be equal to num_branches.
num_channels (tuple): The number of channels in each branch. The length must be equal to num_branches.
in_channels (int) – Number of input image channels. Normally 3.
conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Use BN by default.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.
multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.
pretrained (str, optional) – Model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> from mmseg.models import HRNet
>>> import torch
>>> extra = dict(
>>>     stage1=dict(
>>>         num_modules=1,
>>>         num_branches=1,
>>>         block='BOTTLENECK',
>>>         num_blocks=(4, ),
>>>         num_channels=(64, )),
>>>     stage2=dict(
>>>         num_modules=1,
>>>         num_branches=2,
>>>         block='BASIC',
>>>         num_blocks=(4, 4),
>>>         num_channels=(32, 64)),
>>>     stage3=dict(
>>>         num_modules=4,
>>>         num_branches=3,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4),
>>>         num_channels=(32, 64, 128)),
>>>     stage4=dict(
>>>         num_modules=3,
>>>         num_branches=4,
>>>         block='BASIC',
>>>         num_blocks=(4, 4, 4, 4),
>>>         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- property norm2¶
the normalization layer named “norm2”
- Type
nn.Module
- class mmseg.models.backbones.ICNet(backbone_cfg, in_channels=3, layer_channels=(512, 2048), light_branch_middle_channels=32, psp_out_channels=512, out_channels=(64, 256, 256), pool_scales=(1, 2, 3, 6), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, init_cfg=None)[source]¶
ICNet for Real-Time Semantic Segmentation on High-Resolution Images.
This backbone is the implementation of ICNet.
- Parameters
backbone_cfg (dict) – Config dict to build backbone. Usually it is ResNet but it can also be other backbones.
in_channels (int) – The number of input image channels. Default: 3.
layer_channels (Sequence[int]) – The numbers of feature channels at layer 2 and layer 4 in ResNet. It can also be other backbones. Default: (512, 2048).
light_branch_middle_channels (int) – The number of channels of the middle layer in light branch. Default: 32.
psp_out_channels (int) – The number of channels of the output of PSP module. Default: 512.
out_channels (Sequence[int]) – The numbers of output feature channels at each branches. Default: (64, 256, 256).
pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).
conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’).
act_cfg (dict) – Dictionary to construct and config act layer. Default: dict(type=’ReLU’).
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.MAE(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]¶
VisionTransformer with support for patch.
- Parameters
img_size (int | tuple) – Input image size. Default: 224.
patch_size (int) – The patch size. Default: 16.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – embedding dimension. Default: 768.
num_layers (int) – depth of transformer. Default: 12.
num_heads (int) – number of attention heads. Default: 12.
mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.
out_indices (list | tuple | int) – Output from which stages. Default: -1.
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – stochastic depth rate. Default 0.0.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.
final_norm (bool) – Whether to add a additional layer to normalize final feature map. Default: False.
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
init_values (float) – Initialize the values of Attention and FFN with learnable scaling. Defaults to 0.1.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- fix_init_weight()[source]¶
Rescale the initialization according to layer id.
This function is copied from https://github.com/microsoft/unilm/blob/master/beit/modeling_pretrain.py. # noqa: E501 Copyright (c) Microsoft Corporation Licensed under the MIT License
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.MSCAN(in_channels=3, embed_dims=[64, 128, 256, 512], mlp_ratios=[4, 4, 4, 4], drop_rate=0.0, drop_path_rate=0.0, depths=[3, 4, 6, 3], num_stages=4, attention_kernel_sizes=[5, [1, 7], [1, 11], [1, 21]], attention_kernel_paddings=[2, [0, 3], [0, 5], [0, 10]], act_cfg={'type': 'GELU'}, norm_cfg={'requires_grad': True, 'type': 'SyncBN'}, pretrained=None, init_cfg=None)[source]¶
SegNeXt Multi-Scale Convolutional Attention Network (MSCAN) backbone.
This backbone is the implementation of SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. Inspiration from https://github.com/visual-attention-network/segnext.
- Parameters
in_channels (int) – The number of input channels. Defaults: 3.
embed_dims (list[int]) – Embedding dimension. Defaults: [64, 128, 256, 512].
mlp_ratios (list[int]) – Ratio of mlp hidden dim to embedding dim. Defaults: [4, 4, 4, 4].
drop_rate (float) – Dropout rate. Defaults: 0.
drop_path_rate (float) – Stochastic depth rate. Defaults: 0.
depths (list[int]) – Depths of each MSCAN stage. Default: [3, 4, 6, 3].
num_stages (int) – MSCAN stages. Default: 4.
attention_kernel_sizes (list) – Size of attention kernel in Attention Module (Figure 2(b) of original paper). Defaults: [5, [1, 7], [1, 11], [1, 21]].
attention_kernel_paddings (list) – Size of attention paddings in Attention Module (Figure 2(b) of original paper). Defaults: [2, [0, 3], [0, 5], [0, 10]].
norm_cfg (dict) – Config of norm layers. Defaults: dict(type=’SyncBN’, requires_grad=True).
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- class mmseg.models.backbones.MixVisionTransformer(in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 4, 8], patch_sizes=[7, 3, 3, 3], strides=[4, 2, 2, 2], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratio=4, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, init_cfg=None, with_cp=False)[source]¶
The backbone of Segformer.
This backbone is the implementation of SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.
- Parameters
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – Embedding dimension. Default: 64.
num_stages (int) – The number of stages. Default: 4.
num_layers (Sequence[int]) – The layer number of each transformer encode layer. Default: [3, 4, 6, 3].
num_heads (Sequence[int]) – The attention heads of each transformer encode layer. Default: [1, 2, 4, 8].
patch_sizes (Sequence[int]) – The patch_size of each overlapped patch embedding. Default: [7, 3, 3, 3].
strides (Sequence[int]) – The stride of each overlapped patch embedding. Default: [4, 2, 2, 2].
sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encode layer. Default: [8, 4, 2, 1].
out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).
mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.
qkv_bias (bool) – Enable bias for qkv if True. Default: True.
drop_rate (float) – Probability of an element to be zeroed. Default 0.0
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – stochastic depth rate. Default 0.0
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.MobileNetV2(widen_factor=1.0, strides=(1, 2, 2, 2, 1, 2, 1), dilations=(1, 1, 1, 1, 1, 1, 1), out_indices=(1, 2, 4, 6), frozen_stages=- 1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
MobileNetV2 backbone.
This backbone is the implementation of MobileNetV2: Inverted Residuals and Linear Bottlenecks.
- Parameters
widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.
strides (Sequence[int], optional) – Strides of the first block of each layer. If not specified, default config in
arch_setting
will be used.
dilations (Sequence[int]) – Dilation of each layer.
out_indices (None or Sequence[int]) – Output from which stages. Default: (1, 2, 4, 6).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- make_layer(out_channels, num_blocks, stride, dilation, expand_ratio)[source]¶
Stack InvertedResidual blocks to build a layer for MobileNetV2.
- Parameters
out_channels (int) – out_channels of block.
num_blocks (int) – Number of blocks.
stride (int) – Stride of the first block.
dilation (int) – Dilation of the first block.
expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g.
Dropout
,BatchNorm
, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmseg.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(0, 1, 12), frozen_stages=- 1, reduction_factor=1, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
MobileNetV3 backbone.
This backbone is the improved implementation of Searching for MobileNetV3.
- Parameters
arch (str) – Architecture of mobilnetv3, from {‘small’, ‘large’}. Default: ‘small’.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
out_indices (tuple[int]) – Output from which layer. Default: (0, 1, 12).
frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode=True)[source]¶
Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g.
Dropout
,BatchNorm
, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
- class mmseg.models.backbones.PCPVT(in_channels=3, embed_dims=[64, 128, 256, 512], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], norm_after_stage=False, pretrained=None, init_cfg=None)[source]¶
The backbone of Twins-PCPVT.
This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.
- Parameters
in_channels (int) – Number of input channels. Default: 3.
embed_dims (list) – Embedding dimension. Default: [64, 128, 256, 512].
patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].
strides (list) – The strides. Default: [4, 2, 2, 2].
num_heads (list) – Number of attention heads. Default: [1, 2, 4, 8].
mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim. Default: [4, 4, 4, 4].
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool) – Enable bias for qkv if True. Default: False.
drop_rate (float) – Probability of an element to be zeroed. Default 0.
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – Stochastic depth rate. Default 0.0
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
depths (list) – Depths of each stage. Default [3, 4, 6, 3]
sr_ratios (list) – Kernel_size of conv in each Attn module in Transformer encoder layer. Default: [8, 4, 2, 1].
norm_after_stage (bool) – Add extra norm. Default: False.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.PIDNet(in_channels: int = 3, channels: int = 64, ppm_channels: int = 96, num_stem_blocks: int = 2, num_branch_blocks: int = 3, align_corners: bool = False, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, **kwargs)[source]¶
PIDNet backbone.
This backbone is the implementation of PIDNet: A Real-time Semantic Segmentation Network Inspired from PID Controller. Modified from https://github.com/XuJiacong/PIDNet.
Licensed under the MIT License.
- Parameters
in_channels (int) – The number of input channels. Default: 3.
channels (int) – The number of channels in the stem layer. Default: 64.
ppm_channels (int) – The number of channels in the PPM layer. Default: 96.
num_stem_blocks (int) – The number of blocks in the stem layer. Default: 2.
num_branch_blocks (int) – The number of blocks in the branch layer. Default: 3.
align_corners (bool) – The align_corners argument of F.interpolate. Default: False.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).
init_cfg (dict) – Config dict for initialization. Default: None.
- class mmseg.models.backbones.ResNeSt(groups=1, base_width=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]¶
ResNeSt backbone.
This backbone is the implementation of ResNeSt: Split-Attention Networks.
- Parameters
groups (int) – Number of groups of Bottleneck. Default: 1
base_width (int) – Base width of Bottleneck. Default: 4
radix (int) – Radix of SplAtConv2d. Default: 2.
reduction_factor (int) – Reduction factor of inter_channels in SplitAttentionConv2d. Default: 4.
avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.
kwargs (dict) – Keyword arguments for ResNet.
- class mmseg.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]¶
ResNeXt backbone.
This backbone is the implementation of Aggregated Residual Transformations for Deep Neural Networks.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Normally 3.
num_stages (int) – Resnet stages, normally 4.
groups (int) – Group of resnext.
base_width (int) – Base width of resnext.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
norm_cfg (dict) – dictionary to construct and config norm layer.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
zero_init_residual (bool) – whether to use zero init for last norm layer in resblocks to let them behave as identity.
Example
>>> from mmseg.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
- class mmseg.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=- 1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, multi_grid=None, contract_dilation=False, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]¶
ResNet backbone.
This backbone is the improved implementation of Deep Residual Learning for Image Recognition.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
in_channels (int) – Number of input image channels. Default: 3.
stem_channels (int) – Number of stem channels. Default: 64.
base_channels (int) – Number of base channels of res layer. Default: 64.
num_stages (int) – Resnet stages, normally 4. Default: 4.
strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).
dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).
out_indices (Sequence[int]) – Output from which stages. Default: (0, 1, 2, 3).
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.
deep_stem (bool) – Replace 7x7 conv in input stem with 3 3x3 conv. Default: False.
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
conv_cfg (dict | None) – Dictionary to construct and config conv layer. When conv_cfg is None, cfg will be set to dict(type=’Conv2d’). Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’, requires_grad=True).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
dcn (dict | None) – Dictionary to construct and config DCN conv layer. When dcn is not None, conv_cfg must be None. Default: None.
stage_with_dcn (Sequence[bool]) – Whether to set DCN conv for each stage. The length of stage_with_dcn is equal to num_stages. Default: (False, False, False, False).
plugins (list[dict]) –
List of plugins for stages, each dict contains:
cfg (dict, required): Cfg dict to build plugin.
position (str, required): Position inside block to insert plugin,
options: ‘after_conv1’, ‘after_conv2’, ‘after_conv3’.
stages (tuple[bool], optional): Stages to apply plugin, length
should be same as ‘num_stages’. Default: None.
multi_grid (Sequence[int]|None) – Multi grid dilation rates of last stage. Default: None.
contract_dilation (bool) – Whether contract first dilation of each layer Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> from mmseg.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
- make_stage_plugins(plugins, stage_idx)[source]¶
Make plugins for ResNet's stage_idx-th stage.
Currently we support inserting ‘context_block’, ‘empirical_attention_block’ and ‘nonlocal_block’ into backbones like ResNet/ResNeXt. They could be inserted after conv1/conv2/conv3 of the Bottleneck.
An example of the plugins format could be:
>>> plugins = [
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3
- Suppose stage_idx=0, the structure of blocks in the stage would be:
conv1 -> conv2 -> conv3 -> yyy -> zzz1 -> zzz2
- Suppose stage_idx=1, the structure of blocks in the stage would be:
conv1 -> conv2 -> xxx -> conv3 -> yyy -> zzz1 -> zzz2
If stages is missing, the plugin would be applied to all stages.
- Parameters
plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.
stage_idx (int) – Index of stage to build
- Returns
Plugins for current stage
- Return type
list[dict]
- property norm1¶
the normalization layer named “norm1”
- Type
nn.Module
- class mmseg.models.backbones.ResNetV1c(**kwargs)[source]¶
ResNetV1c variant.
Compared with default ResNet(ResNetV1b), ResNetV1c replaces the 7x7 conv in the input stem with three 3x3 convs. For more details please refer to Bag of Tricks for Image Classification with Convolutional Neural Networks.
- class mmseg.models.backbones.ResNetV1d(**kwargs)[source]¶
ResNetV1d variant.
Compared with default ResNet(ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. And in the downsampling block, a 2x2 avg_pool with stride 2 is added before conv, whose stride is changed to 1.
- class mmseg.models.backbones.STDCContextPathNet(backbone_cfg, last_in_channels=(1024, 512), out_channels=128, ffm_cfg={'in_channels': 512, 'out_channels': 256, 'scale_factor': 4}, upsample_mode='nearest', align_corners=None, norm_cfg={'type': 'BN'}, init_cfg=None)[source]¶
STDCNet with Context Path. The outs below is a list of three feature maps from deep to shallow, whose heights and widths go from small to big, respectively. The largest feature map of outs is output to STDCHead, where the Detail Loss is calculated with the Detail Ground-truth. The other two feature maps are fed to the Attention Refinement Modules, respectively. Besides, the largest feature map of outs and the last output of the Attention Refinement Module are concatenated for the Feature Fusion Module. Then this fused feature map feat_fuse is output to decode_head. For more details, please refer to Figure 4 of the original paper.
- Parameters
backbone_cfg (dict) – Config dict for stdc backbone.
last_in_channels (tuple(int)) – The channels of the last two feature maps from the stdc backbone. Default: (1024, 512).
out_channels (int) – The channels of output feature maps. Default: 128.
ffm_cfg (dict) – Config dict for Feature Fusion Module. Default: dict(in_channels=512, out_channels=256, scale_factor=4).
upsample_mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'.
align_corners (str) – align_corners argument of F.interpolate. It must be None if upsample_mode is 'nearest'. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- Returns
- The tuple of list of output feature map for
auxiliary heads and decoder head.
- Return type
outputs (tuple)
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
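A minimal construction sketch (an illustration under assumptions, not from the original docstring): backbone_cfg is built through the mmseg model registry, so a dict with type='STDCNet' and its required arguments suffices; the number and sizes of the printed feature maps depend on backbone_cfg and ffm_cfg.
>>> import torch
>>> from mmseg.models.backbones import STDCContextPathNet
>>> backbone_cfg = dict(
...     type='STDCNet',
...     stdc_type='STDCNet1',
...     in_channels=3,
...     channels=(32, 64, 256, 512, 1024),
...     bottleneck_type='cat',
...     norm_cfg=dict(type='BN'),
...     act_cfg=dict(type='ReLU'))
>>> self = STDCContextPathNet(backbone_cfg).eval()
>>> outs = self.forward(torch.rand(1, 3, 512, 1024))
>>> for out in outs:
...     print(tuple(out.shape))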
- class mmseg.models.backbones.STDCNet(stdc_type, in_channels, channels, bottleneck_type, norm_cfg, act_cfg, num_convs=4, with_final_conv=False, pretrained=None, init_cfg=None)[source]¶
This backbone is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.
- Parameters
stdc_type (str) – The type of backbone structure. ‘STDCNet1’ and ‘STDCNet2’ denote the two main backbones in the paper, whose FLOPs are 813M and 1446M, respectively.
in_channels (int) – The num of input_channels.
channels (tuple[int]) – The output channels for each stage.
bottleneck_type (str) – The type of STDC Module type, the value must be ‘add’ or ‘cat’.
norm_cfg (dict) – Config dict for normalization layer.
act_cfg (dict) – The activation config for conv layers.
num_convs (int) – Numbers of conv layer at each STDC Module. Default: 4.
with_final_conv (bool) – Whether to add a conv layer at the module output. Default: False.
pretrained (str, optional) – Model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
Example
>>> import torch
>>> stdc_type = 'STDCNet1'
>>> in_channels = 3
>>> channels = (32, 64, 256, 512, 1024)
>>> bottleneck_type = 'cat'
>>> inputs = torch.rand(1, 3, 1024, 2048)
>>> self = STDCNet(stdc_type, in_channels, channels, bottleneck_type,
...                norm_cfg=dict(type='BN'),
...                act_cfg=dict(type='ReLU')).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 256, 128, 256])
outputs[1].shape = torch.Size([1, 512, 64, 128])
outputs[2].shape = torch.Size([1, 1024, 32, 64])
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.SVT(in_channels=3, embed_dims=[64, 128, 256], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4], mlp_ratios=[4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, norm_cfg={'type': 'LN'}, depths=[4, 4, 4], sr_ratios=[4, 2, 1], windiow_sizes=[7, 7, 7], norm_after_stage=True, pretrained=None, init_cfg=None)[source]¶
The backbone of Twins-SVT.
This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.
- Parameters
in_channels (int) – Number of input channels. Default: 3.
embed_dims (list) – Embedding dimension of each stage. Default: [64, 128, 256].
patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].
strides (list) – The strides. Default: [4, 2, 2, 2].
num_heads (list) – Number of attention heads per stage. Default: [1, 2, 4].
mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim per stage. Default: [4, 4, 4].
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool) – Enable bias for qkv if True. Default: False.
drop_rate (float) – Dropout rate. Default 0.
attn_drop_rate (float) – Dropout ratio of attention weight. Default 0.0
drop_path_rate (float) – Stochastic depth rate. Default 0.2.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
depths (list) – Depths of each stage. Default [4, 4, 4].
sr_ratios (list) – Kernel_size of conv in each Attn module in Transformer encoder layer. Default: [4, 2, 1].
windiow_sizes (list) – Window size of LSA. Default: [7, 7, 7].
input_features_slice (bool) – Whether input features need slicing. Default: False.
norm_after_stage (bool) – Whether to add an extra norm after each stage. Default: True.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
- class mmseg.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, pretrained=None, frozen_stages=- 1, init_cfg=None)[source]¶
Swin Transformer backbone.
This backbone is the implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Inspiration from https://github.com/microsoft/Swin-Transformer.
- Parameters
pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.
in_channels (int) – The num of input channels. Defaults: 3.
embed_dims (int) – The feature dimension. Default: 96.
patch_size (int | tuple[int]) – Patch size. Default: 4.
window_size (int) – Window size. Default: 7.
mlp_ratio (int | float) – Ratio of mlp hidden dim to embedding dim. Default: 4.
depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).
num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).
strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).
out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).
qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True
qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.
patch_norm (bool) – Whether to add a norm layer for patch embed and patch merging. Default: True.
drop_rate (float) – Dropout rate. Defaults: 0.
attn_drop_rate (float) – Attention dropout rate. Default: 0.
drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.
use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’GELU’).
norm_cfg (dict) – Config dict for normalization layer at output of backbone. Defaults: dict(type=’LN’).
with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.
init_cfg (dict, optional) – The Config for initialization. Defaults to None.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
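Example (a minimal usage sketch; the shapes assume the default embed_dims=96, depths=(2, 2, 6, 2) and strides=(4, 2, 2, 2), which give stage outputs at strides 4, 8, 16 and 32):
>>> from mmseg.models import SwinTransformer
>>> import torch
>>> self = SwinTransformer().eval()
>>> inputs = torch.rand(1, 3, 224, 224)
>>> outs = self.forward(inputs)
>>> for out in outs:
...     print(tuple(out.shape))
(1, 96, 56, 56)
(1, 192, 28, 28)
(1, 384, 14, 14)
(1, 768, 7, 7)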
- class mmseg.models.backbones.TIMMBackbone(model_name, features_only=True, pretrained=True, checkpoint_path='', in_channels=3, init_cfg=None, **kwargs)[source]¶
Wrapper to use backbones from the timm library. More details can be found in timm.
- Parameters
model_name (str) – Name of timm model to instantiate.
pretrained (bool) – Load pretrained weights if True.
checkpoint_path (str) – Path of checkpoint to load after model is initialized.
in_channels (int) – Number of input image channels. Default: 3.
init_cfg (dict, optional) – Initialization config dict
**kwargs – Other timm & model specific arguments.
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
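Example (a minimal usage sketch; requires timm to be installed, and ‘resnet18’ is just an illustrative model name — the shapes assume timm’s default five feature levels at strides 2 to 32):
>>> import torch
>>> from mmseg.models.backbones import TIMMBackbone
>>> self = TIMMBackbone(model_name='resnet18', pretrained=False).eval()
>>> outs = self.forward(torch.rand(1, 3, 64, 64))
>>> for out in outs:
...     print(tuple(out.shape))
(1, 64, 32, 32)
(1, 64, 16, 16)
(1, 128, 8, 8)
(1, 256, 4, 4)
(1, 512, 2, 2)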
- class mmseg.models.backbones.UNet(in_channels=3, base_channels=64, num_stages=5, strides=(1, 1, 1, 1, 1), enc_num_convs=(2, 2, 2, 2, 2), dec_num_convs=(2, 2, 2, 2), downsamples=(True, True, True, True), enc_dilations=(1, 1, 1, 1, 1), dec_dilations=(1, 1, 1, 1), with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, upsample_cfg={'type': 'InterpConv'}, norm_eval=False, dcn=None, plugins=None, pretrained=None, init_cfg=None)[source]¶
UNet backbone.
This backbone is the implementation of U-Net: Convolutional Networks for Biomedical Image Segmentation.
- Parameters
in_channels (int) – Number of input image channels. Default: 3.
base_channels (int) – Number of base channels of each stage. The output channels of the first stage. Default: 64.
num_stages (int) – Number of stages in encoder, normally 5. Default: 5.
strides (Sequence[int 1 | 2]) – Strides of each stage in encoder. len(strides) is equal to num_stages. Normally the stride of the first stage in encoder is 1. If strides[i]=2, it uses stride convolution to downsample in the correspondence encoder stage. Default: (1, 1, 1, 1, 1).
enc_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the correspondence encoder stage. Default: (2, 2, 2, 2, 2).
dec_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the correspondence decoder stage. Default: (2, 2, 2, 2).
downsamples (Sequence[bool]) – Whether to use MaxPool to downsample the feature map after the first stage of the encoder (stages: [1, num_stages)). If the corresponding encoder stage uses stride convolution (strides[i]=2), it will never use MaxPool to downsample, even if downsamples[i-1]=True. Default: (True, True, True, True).
enc_dilations (Sequence[int]) – Dilation rate of each stage in encoder. Default: (1, 1, 1, 1, 1).
dec_dilations (Sequence[int]) – Dilation rate of each stage in decoder. Default: (1, 1, 1, 1).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
conv_cfg (dict | None) – Config dict for convolution layer. Default: None.
norm_cfg (dict | None) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict | None) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’).
upsample_cfg (dict) – The upsample config of the upsample module in decoder. Default: dict(type=’InterpConv’).
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
dcn (dict | None) – Use deformable convolution in convolutional layers or not. Default: None.
plugins (dict) – plugins for convolutional layers. Default: None.
pretrained (str, optional) – model pretrained path. Default: None
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None
- Notice:
The input image size should be divisible by the whole downsample rate of the encoder. More details of the whole downsample rate can be found in UNet._check_input_divisible.
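For instance (a minimal sketch; with the defaults num_stages=5 and downsamples=(True, True, True, True), the whole downsample rate is 16, and the decoder outputs are returned from deep to shallow):
>>> import torch
>>> from mmseg.models import UNet
>>> self = UNet(in_channels=3, base_channels=4).eval()
>>> inputs = torch.rand(1, 3, 64, 64)  # 64 is divisible by 16
>>> outs = self.forward(inputs)
>>> for out in outs:
...     print(tuple(out.shape))
(1, 64, 4, 4)
(1, 32, 8, 8)
(1, 16, 16, 16)
(1, 8, 32, 32)
(1, 4, 64, 64)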
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.backbones.VisionTransformer(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=- 1, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, with_cls_token=True, output_cls_token=False, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, interpolate_mode='bicubic', num_fcs=2, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]¶
Vision Transformer.
This backbone is the implementation of An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.
- Parameters
img_size (int | tuple) – Input image size. Default: 224.
patch_size (int) – The patch size. Default: 16.
in_channels (int) – Number of input channels. Default: 3.
embed_dims (int) – embedding dimension. Default: 768.
num_layers (int) – depth of transformer. Default: 12.
num_heads (int) – number of attention heads. Default: 12.
mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.
out_indices (list | tuple | int) – Output from which stages. Default: -1.
qkv_bias (bool) – enable bias for qkv if True. Default: True.
drop_rate (float) – Probability of an element to be zeroed. Default 0.0
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
drop_path_rate (float) – stochastic depth rate. Default 0.0
with_cls_token (bool) – Whether to concatenate the class token into the image tokens as transformer input. Default: True.
output_cls_token (bool) – Whether to output the cls_token. If set True, with_cls_token must be True. Default: False.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.
final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.
interpolate_mode (str) – Select the interpolate mode for position embedding vector resize. Default: bicubic.
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
pretrained (str, optional) – model pretrained path. Default: None.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
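Example (a minimal usage sketch; with the defaults out_indices=-1 and with_cls_token=True, a 224x224 input and patch_size=16 give a 14x14 patch grid, and each returned feature is reshaped to (B, C, H, W)):
>>> import torch
>>> from mmseg.models import VisionTransformer
>>> self = VisionTransformer(img_size=224, patch_size=16).eval()
>>> outs = self.forward(torch.rand(1, 3, 224, 224))
>>> tuple(outs[-1].shape)
(1, 768, 14, 14)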
- static resize_pos_embed(pos_embed, input_shpae, pos_shape, mode)[source]¶
Resize pos_embed weights.
Resize pos_embed using the bicubic interpolate method.
- Parameters
pos_embed (torch.Tensor) – Position embedding weights.
input_shpae (tuple) – Tuple of (downsampled input image height, downsampled input image width).
pos_shape (tuple) – The resolution of downsampled origin training image.
mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'.
- Returns
The resized pos_embed of shape [B, L_new, C]
- Return type
torch.Tensor
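An illustrative sketch (assumes a class token is prepended to the patch tokens, so the sequence length is 1 + h * w; the concrete numbers are hypothetical):
>>> import torch
>>> from mmseg.models import VisionTransformer
>>> pos_embed = torch.rand(1, 1 + 14 * 14, 768)  # cls token + 14x14 grid
>>> out = VisionTransformer.resize_pos_embed(
...     pos_embed, (32, 32), (14, 14), 'bicubic')
>>> tuple(out.shape)  # 1 + 32 * 32 = 1025 tokens
(1, 1025, 768)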
- train(mode=True)[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
decode_heads¶
- class mmseg.models.decode_heads.ANNHead(project_channels, query_scales=(1), key_pool_scales=(1, 3, 6, 8), **kwargs)[source]¶
Asymmetric Non-local Neural Networks for Semantic Segmentation.
This head is the implementation of ANNNet.
- Parameters
project_channels (int) – Projection channels for Nonlocal.
query_scales (tuple[int]) – The scales of query feature map. Default: (1,)
key_pool_scales (tuple[int]) – The pooling scales of key feature map. Default: (1, 3, 6, 8).
- class mmseg.models.decode_heads.APCHead(pool_scales=(1, 2, 3, 6), fusion=True, **kwargs)[source]¶
Adaptive Pyramid Context Network for Semantic Segmentation.
This head is the implementation of APCNet.
- Parameters
pool_scales (tuple[int]) – Pooling scales used in Adaptive Context Module. Default: (1, 2, 3, 6).
fusion (bool) – Add one conv to fuse residual feature.
- class mmseg.models.decode_heads.ASPPHead(dilations=(1, 6, 12, 18), **kwargs)[source]¶
Rethinking Atrous Convolution for Semantic Image Segmentation.
This head is the implementation of DeepLabV3.
- Parameters
dilations (tuple[int]) – Dilation rates for ASPP module. Default: (1, 6, 12, 18).
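Example (a minimal usage sketch with illustrative channel numbers; BaseDecodeHead picks the feature at in_index, -1 by default, so a single-element list of features is enough):
>>> import torch
>>> from mmseg.models.decode_heads import ASPPHead
>>> head = ASPPHead(dilations=(1, 6, 12, 18), in_channels=2048,
...                 channels=512, num_classes=19).eval()
>>> logits = head([torch.rand(1, 2048, 32, 32)])
>>> tuple(logits.shape)
(1, 19, 32, 32)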
- class mmseg.models.decode_heads.CCHead(recurrence=2, **kwargs)[source]¶
CCNet: Criss-Cross Attention for Semantic Segmentation.
This head is the implementation of CCNet.
- Parameters
recurrence (int) – Number of recurrence of Criss Cross Attention module. Default: 2.
- class mmseg.models.decode_heads.DAHead(pam_channels, **kwargs)[source]¶
Dual Attention Network for Scene Segmentation.
This head is the implementation of DANet.
- Parameters
pam_channels (int) – The channels of Position Attention Module(PAM).
- loss_by_feat(seg_logit: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs) → dict[source]¶
Compute pam_cam, pam and cam loss.
- class mmseg.models.decode_heads.DMHead(filter_sizes=(1, 3, 5, 7), fusion=False, **kwargs)[source]¶
Dynamic Multi-scale Filters for Semantic Segmentation.
This head is the implementation of DMNet.
- Parameters
filter_sizes (tuple[int]) – The size of generated convolutional filters used in Dynamic Convolutional Module. Default: (1, 3, 5, 7).
fusion (bool) – Add one conv to fuse DCM output feature.
- class mmseg.models.decode_heads.DNLHead(reduction=2, use_scale=True, mode='embedded_gaussian', temperature=0.05, **kwargs)[source]¶
Disentangled Non-Local Neural Networks.
This head is the implementation of DNLNet.
- Parameters
reduction (int) – Reduction factor of projection transform. Default: 2.
use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.
mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’, ‘dot_product’. Default: ‘embedded_gaussian’.
temperature (float) – Temperature to adjust attention. Default: 0.05
- class mmseg.models.decode_heads.DPTHead(embed_dims=768, post_process_channels=[96, 192, 384, 768], readout_type='ignore', patch_size=16, expand_channels=False, act_cfg={'type': 'ReLU'}, norm_cfg={'type': 'BN'}, **kwargs)[source]¶
Vision Transformers for Dense Prediction.
This head is the implementation of DPT.
- Parameters
embed_dims (int) – The embed dimension of the ViT backbone. Default: 768.
post_process_channels (List) – Out channels of post process conv layers. Default: [96, 192, 384, 768].
readout_type (str) – Type of readout operation. Default: ‘ignore’.
patch_size (int) – The patch size. Default: 16.
expand_channels (bool) – Whether expand the channels in post process block. Default: False.
act_cfg (dict) – The activation config for residual conv unit. Default dict(type=’ReLU’).
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
- class mmseg.models.decode_heads.DepthwiseSeparableASPPHead(c1_in_channels, c1_channels, **kwargs)[source]¶
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.
This head is the implementation of DeepLabV3+.
- Parameters
c1_in_channels (int) – The input channels of c1 decoder. If it is 0, no c1 decoder will be used.
c1_channels (int) – The intermediate channels of c1 decoder.
- class mmseg.models.decode_heads.DepthwiseSeparableFCNHead(dw_act_cfg=None, **kwargs)[source]¶
Depthwise-Separable Fully Convolutional Network for Semantic Segmentation.
This head is implemented according to Fast-SCNN: Fast Semantic Segmentation Network.
- Parameters
in_channels (int) – Number of output channels of FFM.
channels (int) – Number of middle-stage channels in the decode head.
concat_input (bool) – Whether to concatenate original decode input into the result of several consecutive convolution layers. Default: True.
num_classes (int) – Used to determine the dimension of final prediction tensor.
in_index (int) – Correspond with ‘out_indices’ in FastSCNN backbone.
norm_cfg (dict | None) – Config of norm layers.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
loss_decode (dict) – Config of loss type and some relevant additional options.
dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.
- class mmseg.models.decode_heads.EMAHead(ema_channels, num_bases, num_stages, concat_input=True, momentum=0.1, **kwargs)[source]¶
Expectation Maximization Attention Networks for Semantic Segmentation.
This head is the implementation of EMANet.
- Parameters
ema_channels (int) – EMA module channels
num_bases (int) – Number of bases.
num_stages (int) – Number of the EM iterations.
concat_input (bool) – Whether to concat the input and output of convs before the classification layer. Default: True
momentum (float) – Momentum to update the base. Default: 0.1.
- class mmseg.models.decode_heads.EncHead(num_codes=32, use_se_loss=True, add_lateral=False, loss_se_decode={'loss_weight': 0.2, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, **kwargs)[source]¶
Context Encoding for Semantic Segmentation.
This head is the implementation of EncNet.
- Parameters
num_codes (int) – Number of code words. Default: 32.
use_se_loss (bool) – Whether to use Semantic Encoding Loss (SE-loss) to regularize the training. Default: True.
add_lateral (bool) – Whether to use lateral connections to fuse features. Default: False.
loss_se_decode (dict) – Config of decode loss. Default: dict(type=’CrossEntropyLoss’, use_sigmoid=True).
- loss_by_feat(seg_logit: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs) → dict[source]¶
Compute segmentation and semantic encoding loss.
- class mmseg.models.decode_heads.FCNHead(num_convs=2, kernel_size=3, concat_input=True, dilation=1, **kwargs)[source]¶
Fully Convolution Networks for Semantic Segmentation.
This head is the implementation of FCNNet.
- Parameters
num_convs (int) – Number of convs in the head. Default: 2.
kernel_size (int) – The kernel size for convs in the head. Default: 3.
concat_input (bool) – Whether to concat the input and output of convs before the classification layer. Default: True.
dilation (int) – The dilation rate for convs in the head. Default: 1.
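Example (a minimal usage sketch with illustrative channel numbers; the convs use same-padding, so the spatial size is preserved):
>>> import torch
>>> from mmseg.models.decode_heads import FCNHead
>>> head = FCNHead(in_channels=64, channels=32, num_classes=19,
...                num_convs=2).eval()
>>> logits = head([torch.rand(1, 64, 45, 45)])
>>> tuple(logits.shape)
(1, 19, 45, 45)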
- class mmseg.models.decode_heads.FPNHead(feature_strides, **kwargs)[source]¶
Panoptic Feature Pyramid Networks.
This head is the implementation of Semantic FPN.
- Parameters
feature_strides (tuple[int]) – The strides for input feature maps, and all strides are supposed to be powers of 2. The first one is of the largest resolution.
- class mmseg.models.decode_heads.GCHead(ratio=0.25, pooling_type='att', fusion_types=('channel_add'), **kwargs)[source]¶
GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond.
This head is the implementation of GCNet.
- Parameters
ratio (float) – Multiplier of channels ratio. Default: 1/4.
pooling_type (str) – The pooling type of context aggregation. Options are ‘att’, ‘avg’. Default: ‘att’.
fusion_types (tuple[str]) – The fusion type for feature fusion. Options are ‘channel_add’, ‘channel_mul’. Default: (‘channel_add’,)
- class mmseg.models.decode_heads.ISAHead(isa_channels, down_factor=(8, 8), **kwargs)[source]¶
Interlaced Sparse Self-Attention for Semantic Segmentation.
This head is the implementation of ISA.
- Parameters
isa_channels (int) – The channels of ISA Module.
down_factor (tuple[int]) – The local group size of ISA.
- class mmseg.models.decode_heads.IterativeDecodeHead(num_stages, kernel_generate_head, kernel_update_head, **kwargs)[source]¶
K-Net: Towards Unified Image Segmentation.
This head is the implementation of K-Net (https://arxiv.org/abs/2106.14855).
- Parameters
num_stages (int) – The number of stages (kernel update heads) in IterativeDecodeHead. Default: 3.
kernel_generate_head (dict) – Config of kernel generate head, which generates mask predictions, dynamic kernels and class predictions for the next kernel update heads.
kernel_update_head (dict) – Config of kernel update head which refine dynamic kernels and class predictions iteratively.
- loss_by_feat(seg_logits: List[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], **kwargs) → dict[source]¶
Compute segmentation loss.
- Parameters
seg_logits (Tensor) – The output from decode head forward function.
batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- class mmseg.models.decode_heads.KernelUpdateHead(num_classes=150, num_ffn_fcs=2, num_heads=8, num_mask_fcs=3, feedforward_channels=2048, in_channels=256, out_channels=256, dropout=0.0, act_cfg={'inplace': True, 'type': 'ReLU'}, ffn_act_cfg={'inplace': True, 'type': 'ReLU'}, conv_kernel_size=1, feat_transform_cfg=None, kernel_init=False, with_ffn=True, feat_gather_stride=1, mask_transform_stride=1, kernel_updator_cfg={'act_cfg': {'inplace': True, 'type': 'ReLU'}, 'feat_channels': 64, 'in_channels': 256, 'norm_cfg': {'type': 'LN'}, 'out_channels': 256, 'type': 'DynamicConv'})[source]¶
Kernel Update Head in K-Net.
- Parameters
num_classes (int) – Number of classes. Default: 150.
num_ffn_fcs (int) – The number of fully-connected layers in FFNs. Default: 2.
num_heads (int) – The number of parallel attention heads. Default: 8.
num_mask_fcs (int) – The number of fully connected layers for mask prediction. Default: 3.
feedforward_channels (int) – The hidden dimension of FFNs. Defaults: 2048.
in_channels (int) – The number of channels of input feature map. Default: 256.
out_channels (int) – The number of output channels. Default: 256.
dropout (float) – The Probability of an element to be zeroed in MultiheadAttention and FFN. Default 0.0.
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).
ffn_act_cfg (dict) – Config of activation layers in FFN. Default: dict(type=’ReLU’).
conv_kernel_size (int) – The kernel size of convolution in Kernel Update Head for dynamic kernel update. Default: 1.
feat_transform_cfg (dict | None) – Config of feature transform. Default: None.
kernel_init (bool) – Whether to initiate the mask kernel in the mask head. Default: False.
with_ffn (bool) – Whether to add FFN in kernel update head. Default: True.
feat_gather_stride (int) – Stride of convolution in feature transform. Default: 1.
mask_transform_stride (int) – Stride of mask transform. Default: 1.
kernel_updator_cfg (dict) –
Config of kernel updator. Default: dict(type=’DynamicConv’, in_channels=256, feat_channels=64, out_channels=256, act_cfg=dict(type=’ReLU’, inplace=True), norm_cfg=dict(type=’LN’)).
- forward(x, proposal_feat, mask_preds, mask_shape=None)[source]¶
Forward function of Dynamic Instance Interactive Head.
- Parameters
x (Tensor) – Feature map from FPN with shape (batch_size, feature_dimensions, H , W).
proposal_feat (Tensor) – Intermediate feature obtained from diihead in the last stage, with shape (batch_size, num_proposals, feature_dimensions).
mask_preds (Tensor) – mask prediction from the former stage in shape (batch_size, num_proposals, H, W).
- Returns
The first tensor is predicted mask with shape (N, num_classes, H, W), the second tensor is dynamic kernel with shape (N, num_classes, channels, K, K).
- Return type
Tuple
- class mmseg.models.decode_heads.KernelUpdator(in_channels=256, feat_channels=64, out_channels=None, gate_sigmoid=True, gate_norm_act=False, activate_out=False, norm_cfg={'type': 'LN'}, act_cfg={'inplace': True, 'type': 'ReLU'})[source]¶
Dynamic Kernel Updator in Kernel Update Head.
- Parameters
in_channels (int) – The number of channels of input feature map. Default: 256.
feat_channels (int) – The number of middle-stage channels in the kernel updator. Default: 64.
out_channels (int) – The number of output channels.
gate_sigmoid (bool) – Whether to use the sigmoid function in the gate mechanism. Default: True.
gate_norm_act (bool) – Whether to add normalization and activation layers in the gate mechanism. Default: False.
activate_out (bool) – Whether to add activation after the gate mechanism. Default: False.
norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’LN’).
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).
- forward(update_feature, input_feature)[source]¶
Forward function of KernelUpdator.
- Parameters
update_feature (torch.Tensor) – Feature map assembled from each group. It would be reshaped with last dimension shape: self.in_channels.
input_feature (torch.Tensor) – Intermediate feature with shape: (N, num_classes, conv_kernel_size**2, channels).
- Returns
The output tensor of shape (N*C1/C2, K*K, C2), where N is the number of classes, C1 and C2 are the feature map channels of KernelUpdateHead and KernelUpdator, respectively.
- Return type
Tensor
- class mmseg.models.decode_heads.LRASPPHead(branch_channels=(32, 64), **kwargs)[source]¶
Lite R-ASPP (LRASPP) head is proposed in Searching for MobileNetV3.
This head is the improved implementation of Searching for MobileNetV3.
- Parameters
branch_channels (tuple[int]) – The number of output channels in each branch. Default: (32, 64).
- class mmseg.models.decode_heads.LightHamHead(ham_channels=512, ham_kwargs={}, **kwargs)[source]¶
SegNeXt decode head.
This decode head is the implementation of SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. Inspiration from https://github.com/visual-attention-network/segnext.
Specifically, LightHamHead is inspired by HamNet from Is Attention Better Than Matrix Decomposition? (https://arxiv.org/abs/2109.04553).
- Parameters
ham_channels (int) – Input channels for Hamburger. Defaults: 512.
ham_kwargs (dict) – Keyword arguments for Ham. Defaults: dict().
- class mmseg.models.decode_heads.Mask2FormerHead(num_classes, align_corners=False, ignore_index=255, **kwargs)[source]¶
Implements the Mask2Former head.
See Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation for details.
- Parameters
num_classes (int) – Number of classes. Default: 150.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
ignore_index (int) – The label index to be ignored. Default: 255.
- loss(x: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg: Union[mmengine.config.config.ConfigDict, dict]) → dict[source]¶
Perform forward propagation and loss calculation of the decoder head on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[SegDataSample]) – The Data Samples. It usually includes information such as gt_sem_seg.
train_cfg (ConfigType) – Training config.
- Returns
a dictionary of loss components.
- Return type
dict[str, Tensor]
- predict(x: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict]) → Tuple[torch.Tensor][source]¶
Test without augmentation.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_img_metas (List[dict]) – List of image meta information.
test_cfg (ConfigType) – Test config.
- Returns
A tensor of segmentation mask.
- Return type
Tensor
- class mmseg.models.decode_heads.MaskFormerHead(num_classes: int = 150, align_corners: bool = False, ignore_index: int = 255, **kwargs)[source]¶
Implements the MaskFormer head.
See Per-Pixel Classification is Not All You Need for Semantic Segmentation for details.
- Parameters
num_classes (int) – Number of classes. Default: 150.
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
ignore_index (int) – The label index to be ignored. Default: 255.
- loss(x: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg: Union[mmengine.config.config.ConfigDict, dict]) → dict[source]¶
Perform forward propagation and loss calculation of the decoder head on the features of the upstream network.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_data_samples (List[SegDataSample]) – The Data Samples. It usually includes information such as gt_sem_seg.
train_cfg (ConfigType) – Training config.
- Returns
a dictionary of loss components.
- Return type
dict[str, Tensor]
- predict(x: Tuple[torch.Tensor], batch_img_metas: List[dict], test_cfg: Union[mmengine.config.config.ConfigDict, dict]) → Tuple[torch.Tensor][source]¶
Test without augmentation.
- Parameters
x (tuple[Tensor]) – Multi-level features from the upstream network, each is a 4D-tensor.
batch_img_metas (List[dict]) – List of image meta information.
test_cfg (ConfigType) – Test config.
- Returns
A tensor of segmentation mask.
- Return type
Tensor
- class mmseg.models.decode_heads.NLHead(reduction=2, use_scale=True, mode='embedded_gaussian', **kwargs)[source]¶
Non-local Neural Networks.
This head is the implementation of NLNet.
- Parameters
reduction (int) – Reduction factor of projection transform. Default: 2.
use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.
mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’, ‘dot_product’. Default: ‘embedded_gaussian’.
- class mmseg.models.decode_heads.OCRHead(ocr_channels, scale=1, **kwargs)[source]¶
Object-Contextual Representations for Semantic Segmentation.
This head is the implementation of OCRNet.
- Parameters
ocr_channels (int) – The intermediate channels of OCR block.
scale (int) – The scale of probability map in SpatialGatherModule. Default: 1.
- class mmseg.models.decode_heads.PIDHead(in_channels: int, channels: int, num_classes: int, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, **kwargs)[source]¶
Decode head for PIDNet.
- Parameters
in_channels (int) – Number of input channels.
channels (int) – Number of output channels.
num_classes (int) – Number of classes.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’, inplace=True).
- forward(inputs: Union[torch.Tensor, Tuple[torch.Tensor]]) → Union[torch.Tensor, Tuple[torch.Tensor]][source]¶
Forward function.
- Parameters
inputs (Tensor | tuple[Tensor]) – Input tensor or tuple of Tensors. When training, the input is a tuple of three tensors (p_feat, i_feat, d_feat), and the output is a tuple of three tensors (p_seg_logit, i_seg_logit, d_seg_logit). At inference, only the head of the integral branch is used; the input is a tensor of the integral feature map, and the output is the segmentation logit.
- Returns
Output tensor or tuple of tensors.
- Return type
Tensor | tuple[Tensor]
- loss_by_feat(seg_logits: Tuple[torch.Tensor], batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Compute segmentation loss.
- Parameters
seg_logits (Tensor) – The output from decode head forward function.
batch_data_samples (List[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- class mmseg.models.decode_heads.PSAHead(mask_size, psa_type='bi-direction', compact=False, shrink_factor=2, normalization_factor=1.0, psa_softmax=True, **kwargs)[source]¶
Point-wise Spatial Attention Network for Scene Parsing.
This head is the implementation of PSANet.
- Parameters
mask_size (tuple[int]) – The PSA mask size. It usually equals input size.
psa_type (str) – The type of psa module. Options are ‘collect’, ‘distribute’, ‘bi-direction’. Default: ‘bi-direction’
compact (bool) – Whether to use compact map for ‘collect’ mode. Default: False.
shrink_factor (int) – The downsample factors of psa mask. Default: 2.
normalization_factor (float) – The normalize factor of attention. Default: 1.0.
psa_softmax (bool) – Whether to use softmax for attention. Default: True.
- class mmseg.models.decode_heads.PSPHead(pool_scales=(1, 2, 3, 6), **kwargs)[source]¶
Pyramid Scene Parsing Network.
This head is the implementation of PSPNet.
- Parameters
pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).
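Example (a minimal usage sketch with illustrative channel numbers; the pyramid pooling results are upsampled back to the input feature size before classification):
>>> import torch
>>> from mmseg.models.decode_heads import PSPHead
>>> head = PSPHead(pool_scales=(1, 2, 3, 6), in_channels=512,
...                channels=128, num_classes=19).eval()
>>> logits = head([torch.rand(1, 512, 32, 32)])
>>> tuple(logits.shape)
(1, 19, 32, 32)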
- class mmseg.models.decode_heads.PointHead(num_fcs=3, coarse_pred_each_layer=True, conv_cfg={'type': 'Conv1d'}, norm_cfg=None, act_cfg={'inplace': False, 'type': 'ReLU'}, **kwargs)[source]¶
A mask point head used in PointRend.
This head is the implementation of PointRend: Image Segmentation as Rendering.
PointHead uses a shared multi-layer perceptron (equivalent to nn.Conv1d) to predict the logits of input points. The fine-grained feature and coarse feature will be concatenated together for prediction.
- Parameters
num_fcs (int) – Number of fc layers in the head. Default: 3.
in_channels (int) – Number of input channels. Default: 256.
fc_channels (int) – Number of fc channels. Default: 256.
num_classes (int) – Number of classes for logits. Default: 80.
class_agnostic (bool) – Whether to use class-agnostic classification. If so, the output channels of logits will be 1. Default: False.
coarse_pred_each_layer (bool) – Whether to concatenate the coarse feature with the output of each fc layer. Default: True.
conv_cfg (dict|None) – Dictionary to construct and config conv layer. Default: dict(type=’Conv1d’).
norm_cfg (dict|None) – Dictionary to construct and config norm layer. Default: None.
loss_point (dict) – Dictionary to construct and config loss layer of point head. Default: dict(type=’CrossEntropyLoss’, use_mask=True, loss_weight=1.0).
- get_points_test(seg_logits, uncertainty_func, cfg)[source]¶
Sample points for testing.
Find num_points most uncertain points from uncertainty_map.
- Parameters
seg_logits (Tensor) – A tensor of shape (batch_size, num_classes, height, width) for class-specific or class-agnostic prediction.
uncertainty_func (func) – uncertainty calculation function.
cfg (dict) – Testing config of point head.
- Returns
point_indices (Tensor): A tensor of shape (batch_size, num_points) that contains indices from [0, height x width) of the most uncertain points.
point_coords (Tensor): A tensor of shape (batch_size, num_points, 2) that contains [0, 1] x [0, 1] normalized coordinates of the most uncertain points from the height x width grid.
- get_points_train(seg_logits, uncertainty_func, cfg)[source]¶
Sample points for training.
Sample points in [0, 1] x [0, 1] coordinate space based on their uncertainty. The uncertainties are calculated for each point using ‘uncertainty_func’ function that takes point’s logit prediction as input.
- Parameters
seg_logits (Tensor) – Semantic segmentation logits, shape ( batch_size, num_classes, height, width).
uncertainty_func (func) – uncertainty calculation function.
cfg (dict) – Training config of point head.
- Returns
A tensor of shape (batch_size, num_points, 2) that contains the coordinates of num_points sampled points.
- Return type
point_coords (Tensor)
- loss(inputs, prev_output, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample], train_cfg, **kwargs)[source]¶
Forward function for training.
- Parameters
inputs (list[Tensor]) – List of multi-level img features.
prev_output (Tensor) – The output of previous decode head.
batch_data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as img_metas or gt_semantic_seg.
train_cfg (dict) – The training config.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- loss_by_feat(point_logits, points, batch_data_samples, **kwargs)[source]¶
Compute segmentation loss.
- predict(inputs, prev_output, batch_img_metas: List[dict], test_cfg, **kwargs)[source]¶
Forward function for testing.
- Parameters
inputs (list[Tensor]) – List of multi-level img features.
prev_output (Tensor) – The output of previous decode head.
batch_img_metas (list[dict]) – List of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:Collect.
test_cfg (dict) – The testing config.
- Returns
Output segmentation map.
- Return type
Tensor
- class mmseg.models.decode_heads.SETRMLAHead(mla_channels=128, up_scale=4, **kwargs)[source]¶
Multi-level feature aggregation head of SETR.
MLA head of SETR.
- Parameters
mla_channels (int) – Channels of conv-conv-4x of multi-level feature aggregation. Default: 128.
up_scale (int) – The scale factor of interpolate. Default: 4.
- class mmseg.models.decode_heads.SETRUPHead(norm_layer={'eps': 1e-06, 'requires_grad': True, 'type': 'LN'}, num_convs=1, up_scale=4, kernel_size=3, init_cfg=[{'type': 'Constant', 'val': 1.0, 'bias': 0, 'layer': 'LayerNorm'}, {'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}], **kwargs)[source]¶
Naive upsampling head and Progressive upsampling head of SETR.
Naive or PUP head of SETR.
- Parameters
norm_layer (dict) – Config dict for input normalization. Default: norm_layer=dict(type=’LN’, eps=1e-6, requires_grad=True).
num_convs (int) – Number of decoder convolutions. Default: 1.
up_scale (int) – The scale factor of interpolate. Default: 4.
kernel_size (int) – The kernel size of convolution when decoding feature information from backbone. Default: 3.
init_cfg (dict | list[dict] | None) – Initialization config dict. Default: dict(type=’Constant’, val=1.0, bias=0, layer=’LayerNorm’).
- class mmseg.models.decode_heads.STDCHead(boundary_threshold=0.1, **kwargs)[source]¶
This head is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.
- Parameters
boundary_threshold (float) – The threshold of calculating boundary. Default: 0.1.
- loss_by_feat(seg_logits: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Compute Detail Aggregation Loss.
- class mmseg.models.decode_heads.SegformerHead(interpolate_mode='bilinear', **kwargs)[source]¶
The all mlp Head of segformer.
This head is the implementation of Segformer (https://arxiv.org/abs/2105.15203).
- Parameters
interpolate_mode (str) – The interpolate mode of MLP head upsample operation. Default: ‘bilinear’.
- class mmseg.models.decode_heads.SegmenterMaskTransformerHead(in_channels, num_layers, num_heads, embed_dims, mlp_ratio=4, drop_path_rate=0.1, drop_rate=0.0, attn_drop_rate=0.0, num_fcs=2, qkv_bias=True, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, init_std=0.02, **kwargs)[source]¶
Segmenter: Transformer for Semantic Segmentation.
This head is the implementation of Segmenter.
- Parameters
in_channels (int) – The number of channels of input image.
num_layers (int) – The depth of transformer.
num_heads (int) – The number of attention heads.
embed_dims (int) – The number of embedding dimension.
mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.
drop_path_rate (float) – stochastic depth rate. Default 0.1.
drop_rate (float) – Probability of an element to be zeroed. Default 0.0
attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0
num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.
qkv_bias (bool) – Enable bias for qkv if True. Default: True.
act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)
init_std (float) – The value of std in weight initialization. Default: 0.02.
- class mmseg.models.decode_heads.UPerHead(pool_scales=(1, 2, 3, 6), **kwargs)[source]¶
Unified Perceptual Parsing for Scene Understanding.
This head is the implementation of UPerNet.
- Parameters
pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module applied on the last feature. Default: (1, 2, 3, 6).
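Example (a minimal usage sketch with illustrative values; UPerHead consumes multiple feature levels, so in_index must list all selected stages, and the output is at the resolution of the largest input feature):
>>> import torch
>>> from mmseg.models.decode_heads import UPerHead
>>> head = UPerHead(in_channels=[64, 128, 256, 512], channels=128,
...                 num_classes=19, in_index=[0, 1, 2, 3]).eval()
>>> feats = [torch.rand(1, c, 64 // 2 ** i, 64 // 2 ** i)
...          for i, c in enumerate([64, 128, 256, 512])]
>>> tuple(head(feats).shape)
(1, 19, 64, 64)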
segmentors¶
- class mmseg.models.segmentors.BaseSegmentor(data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶
Base class for segmentors.
- Parameters
data_preprocessor – Model preprocessing config for processing the input data. It usually includes to_rgb, pad_size_divisor, pad_val, mean and std. Default to None.
- abstract encode_decode(inputs: torch.Tensor, batch_data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample])[source]¶
Placeholder for encode images with backbone and decode into a semantic segmentation map of the same size as input.
- abstract extract_feat(inputs: torch.Tensor) → bool[source]¶
Placeholder for extract features from images.
- forward(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None, mode: str = 'tensor') → Union[Dict[str, torch.Tensor], List[mmseg.structures.seg_data_sample.SegDataSample], Tuple[torch.Tensor], torch.Tensor][source]¶
The unified entry for a forward process in both training and test.
The method should accept three modes: “tensor”, “predict” and “loss”:
- “tensor”: Forward the whole network and return a tensor or tuple of tensors without any post-processing, same as a common nn.Module.
- “predict”: Forward and return the predictions, which are fully processed to a list of SegDataSample.
- “loss”: Forward and return a dict of losses according to the given inputs and data samples.
Note that this method doesn’t handle back propagation or optimizer updating, which are done in train_step().
- Parameters
inputs (torch.Tensor) – The input tensor with shape (N, C, …) in general.
data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg. Default to None.
mode (str) – Return what kind of value. Defaults to ‘tensor’.
- Returns
The return type depends on mode.
If mode="tensor", return a tensor or a tuple of tensors.
If mode="predict", return a list of SegDataSample.
If mode="loss", return a dict of tensors.
- abstract loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Calculate losses from a batch of inputs and data samples.
- postprocess_result(seg_logits: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Convert results list to SegDataSample.
- Parameters
seg_logits (Tensor) – The segmentation results, seg_logits from model of each input image.
data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg. Default to None.
- Returns
Segmentation results of the input images. Each SegDataSample usually contains:
- Return type
list[SegDataSample]
- abstract predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Predict results from a batch of inputs and data samples with post-processing.
- property with_auxiliary_head: bool¶
whether the segmentor has auxiliary head
- Type
bool
- property with_decode_head: bool¶
whether the segmentor has decode head
- Type
bool
- property with_neck: bool¶
whether the segmentor has neck
- Type
bool
- class mmseg.models.segmentors.CascadeEncoderDecoder(num_stages: int, backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶
Cascade Encoder Decoder segmentors.
CascadeEncoderDecoder is almost the same as EncoderDecoder, except that the decoders of CascadeEncoderDecoder are cascaded: the output of the previous decode head is the input of the next decode head.
- Parameters
num_stages (int) – How many stages will be cascaded.
backbone (ConfigType) – The config for the backbone of segmentor.
decode_head (ConfigType) – The config for the decode head of segmentor.
neck (OptConfigType) – The config for the neck of segmentor. Defaults to None.
auxiliary_head (OptConfigType) – The config for the auxiliary head of segmentor. Defaults to None.
train_cfg (OptConfigType) – The config for training. Defaults to None.
test_cfg (OptConfigType) – The config for testing. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
pretrained (str, optional) – The path for pretrained model. Defaults to None.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- class mmseg.models.segmentors.EncoderDecoder(backbone: Union[mmengine.config.config.ConfigDict, dict], decode_head: Union[mmengine.config.config.ConfigDict, dict], neck: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, auxiliary_head: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, train_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, test_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, data_preprocessor: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, pretrained: Optional[str] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict, Sequence[Union[mmengine.config.config.ConfigDict, dict]]]] = None)[source]¶
Encoder Decoder segmentors.
EncoderDecoder typically consists of backbone, decode_head, auxiliary_head. Note that auxiliary_head is only used for deep supervision during training, and can be discarded during inference.
1. The loss method is used to calculate the loss of the model, which includes two steps: (1) extract features to obtain the feature maps; (2) call the decode head loss function to forward the decode head model and calculate the losses.
loss(): extract_feat() -> _decode_head_forward_train() -> _auxiliary_head_forward_train() (optional)
_decode_head_forward_train(): decode_head.loss()
_auxiliary_head_forward_train(): auxiliary_head.loss() (optional)
2. The predict method is used to predict segmentation results, which includes two steps: (1) run the inference function to obtain the list of seg_logits; (2) call the post-processing function to obtain the list of SegDataSample including pred_sem_seg and seg_logits.
predict(): inference() -> postprocess_result()
inference(): whole_inference()/slide_inference()
whole_inference()/slide_inference(): encode_decode()
encode_decode(): extract_feat() -> decode_head.predict()
3. The _forward method is used to output the tensor by running the model, which includes two steps: (1) extract features to obtain the feature maps; (2) call the decode head forward function to forward the decode head model.
_forward(): extract_feat() -> _decode_head.forward()
- Parameters
backbone (ConfigType) – The config for the backbone of segmentor.
decode_head (ConfigType) – The config for the decode head of segmentor.
neck (OptConfigType) – The config for the neck of segmentor. Defaults to None.
auxiliary_head (OptConfigType) – The config for the auxiliary head of segmentor. Defaults to None.
train_cfg (OptConfigType) – The config for training. Defaults to None.
test_cfg (OptConfigType) – The config for testing. Defaults to None.
data_preprocessor (dict, optional) – The pre-process config of BaseDataPreprocessor.
pretrained (str, optional) – The path for the pretrained model. Defaults to None.
init_cfg (dict, optional) – The weight initialization config for BaseModule.
- aug_test(inputs, batch_img_metas, rescale=True)[source]¶
Test with augmentations.
Only rescale=True is supported.
- encode_decode(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Encode images with backbone and decode into a semantic segmentation map of the same size as input.
- inference(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Inference with slide/whole style.
- Parameters
inputs (Tensor) – The input image of shape (N, 3, H, W).
batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, ‘pad_shape’, and ‘padding_size’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.
- Returns
The segmentation results, seg_logits from the model for each input image.
- Return type
Tensor
- loss(inputs: torch.Tensor, data_samples: Sequence[mmseg.structures.seg_data_sample.SegDataSample]) → dict[source]¶
Calculate losses from a batch of inputs and data samples.
- Parameters
inputs (Tensor) – Input images.
data_samples (list[SegDataSample]) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
a dictionary of loss components
- Return type
dict[str, Tensor]
- predict(inputs: torch.Tensor, data_samples: Optional[Sequence[mmseg.structures.seg_data_sample.SegDataSample]] = None) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Predict results from a batch of inputs and data samples with post-processing.
- Parameters
inputs (Tensor) – Inputs with shape (N, C, H, W).
data_samples (List[SegDataSample], optional) – The seg data samples. It usually includes information such as metainfo and gt_sem_seg.
- Returns
Segmentation results of the input images. Each SegDataSample usually contains pred_sem_seg and seg_logits.
- Return type
list[SegDataSample]
- slide_inference(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Inference by sliding-window with overlap.
If h_crop > h_img or w_crop > w_img, the small patch will be used to decode without padding.
- Parameters
inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.
batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.
- Returns
The segmentation results, seg_logits from the model for each input image.
- Return type
Tensor
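Example
A hedged, config-level sketch: sliding-window inference is selected through the segmentor's test_cfg; the crop and stride values below are illustrative, and a stride smaller than the crop size yields overlapping windows.
>>> test_cfg = dict(mode='slide', crop_size=(512, 512), stride=(341, 341))
>>> # mode='whole' would select whole_inference() instead.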
- whole_inference(inputs: torch.Tensor, batch_img_metas: List[dict]) → torch.Tensor[source]¶
Inference with full image.
- Parameters
inputs (Tensor) – The tensor should have a shape NxCxHxW, which contains all images in the batch.
batch_img_metas (List[dict]) – List of image metainfo where each may also contain: ‘img_shape’, ‘scale_factor’, ‘flip’, ‘img_path’, ‘ori_shape’, and ‘pad_shape’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:PackSegInputs.
- Returns
The segmentation results, seg_logits from the model for each input image.
- Return type
Tensor
- class mmseg.models.segmentors.SegTTAModel(module: Union[dict, torch.nn.modules.module.Module], data_preprocessor: Optional[Union[dict, torch.nn.modules.module.Module]] = None)[source]¶
- merge_preds(data_samples_list: List[Sequence[mmseg.structures.seg_data_sample.SegDataSample]]) → Sequence[mmseg.structures.seg_data_sample.SegDataSample][source]¶
Merge predictions of enhanced data to one prediction.
- Parameters
data_samples_list (List[SampleList]) – List of predictions of all enhanced data.
- Returns
Merged prediction.
- Return type
SampleList
losses¶
- class mmseg.models.losses.Accuracy(topk=(1, ), thresh=None, ignore_index=None)[source]¶
Accuracy calculation module.
- class mmseg.models.losses.BoundaryLoss(loss_weight: float = 1.0, loss_name: str = 'loss_boundary')[source]¶
Boundary loss.
This function is modified from PIDNet, licensed under the MIT License.
- Parameters
loss_weight (float) – Weight of the loss. Defaults to 1.0.
loss_name (str) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_boundary’.
- class mmseg.models.losses.CrossEntropyLoss(use_sigmoid=False, use_mask=False, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_ce', avg_non_ignore=False)[source]¶
CrossEntropyLoss.
- Parameters
use_sigmoid (bool, optional) – Whether the prediction uses sigmoid instead of softmax. Defaults to False.
use_mask (bool, optional) – Whether to use mask cross entropy loss. Defaults to False.
reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Defaults to ‘mean’.
class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.
loss_weight (float, optional) – Weight of the loss. Defaults to 1.0.
loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_ce’.
avg_non_ignore (bool) – The flag that decides whether the loss is only averaged over non-ignored targets. Default: False. New in version 0.23.0.
- forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, ignore_index=-100, **kwargs)[source]¶
Forward function.
- property loss_name¶
Loss Name.
This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.
- Returns
The name of this loss item.
- Return type
str
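Example
A minimal usage sketch with random tensors; the shapes and class count are illustrative.
>>> import torch
>>> from mmseg.models.losses import CrossEntropyLoss
>>> loss_fn = CrossEntropyLoss(loss_weight=1.0)
>>> cls_score = torch.randn(2, 19, 8, 8)       # (N, C, H, W) logits
>>> label = torch.randint(0, 19, (2, 8, 8))    # (N, H, W) class indices
>>> loss = loss_fn(cls_score, label)           # scalar loss tensor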
- class mmseg.models.losses.DiceLoss(smooth=1, exponent=2, reduction='mean', class_weight=None, loss_weight=1.0, ignore_index=255, loss_name='loss_dice', **kwargs)[source]¶
DiceLoss.
This loss is proposed in V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation.
- Parameters
smooth (float) – A float number to smooth the loss and avoid NaN errors. Default: 1.
exponent (float) – A float number to calculate the denominator value: sum{x^exponent} + sum{y^exponent}. Default: 2.
reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. Default: ‘mean’.
class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.
loss_weight (float, optional) – Weight of the loss. Default to 1.0.
ignore_index (int | None) – The label index to be ignored. Default: 255.
loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_dice’.
- forward(pred, target, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- property loss_name¶
Loss Name.
This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.
- Returns
The name of this loss item.
- Return type
str
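Example
A minimal usage sketch; the prediction is softmax-normalized internally, so raw logits can be passed (shapes and class count are illustrative).
>>> import torch
>>> from mmseg.models.losses import DiceLoss
>>> loss_fn = DiceLoss(smooth=1, exponent=2)
>>> pred = torch.randn(2, 4, 8, 8)             # (N, C, H, W) logits
>>> target = torch.randint(0, 4, (2, 8, 8))    # (N, H, W) class indices
>>> loss = loss_fn(pred, target)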
- class mmseg.models.losses.FocalLoss(use_sigmoid=True, gamma=2.0, alpha=0.5, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_focal')[source]¶
- forward(pred, target, weight=None, avg_factor=None, reduction_override=None, ignore_index=255, **kwargs)[source]¶
Forward function.
- Parameters
pred (torch.Tensor) – The prediction with shape (N, C) where C = number of classes, or (N, C, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss.
target (torch.Tensor) – The ground truth. If containing class indices, shape (N) where each value is 0≤targets[i]≤C−1, or (N, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss. If containing class probabilities, same shape as the input.
weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.
ignore_index (int, optional) – The label index to be ignored. Default: 255
- Returns
The calculated loss
- Return type
torch.Tensor
- property loss_name¶
Loss Name.
This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.
- Returns
The name of this loss item.
- Return type
str
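Example
A minimal usage sketch; only the sigmoid variant is implemented, so use_sigmoid is left at its default True (shapes and class count are illustrative).
>>> import torch
>>> from mmseg.models.losses import FocalLoss
>>> loss_fn = FocalLoss(gamma=2.0, alpha=0.5)
>>> pred = torch.randn(2, 4, 8, 8)             # (N, C, H, W) logits
>>> target = torch.randint(0, 4, (2, 8, 8))    # (N, H, W) class indices
>>> loss = loss_fn(pred, target)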
- class mmseg.models.losses.LovaszLoss(loss_type='multi_class', classes='present', per_image=False, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_lovasz')[source]¶
LovaszLoss.
This loss is proposed in The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks.
- Parameters
loss_type (str, optional) – Binary or multi-class loss. Default: ‘multi_class’. Options are “binary” and “multi_class”.
classes (str | list[int], optional) – Classes chosen to calculate loss. ‘all’ for all classes, ‘present’ for classes present in labels, or a list of classes to average. Default: ‘present’.
per_image (bool, optional) – If per_image is True, compute the loss per image instead of per batch. Default: False.
reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. This parameter only works when per_image is True. Default: ‘mean’.
class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.
loss_weight (float, optional) – Weight of the loss. Defaults to 1.0.
loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_lovasz’.
- forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[source]¶
Forward function.
- property loss_name¶
Loss Name.
This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.
- Returns
The name of this loss item.
- Return type
str
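Example
A minimal usage sketch. With per_image=False the constructor requires reduction='none', so per_image=True is set here to keep the default 'mean' reduction (shapes and class count are illustrative).
>>> import torch
>>> from mmseg.models.losses import LovaszLoss
>>> loss_fn = LovaszLoss(loss_type='multi_class', per_image=True)
>>> cls_score = torch.randn(2, 4, 8, 8)
>>> label = torch.randint(0, 4, (2, 8, 8))
>>> loss = loss_fn(cls_score, label)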
- class mmseg.models.losses.OhemCrossEntropy(ignore_label: int = 255, thres: float = 0.7, min_kept: int = 100000, loss_weight: float = 1.0, class_weight: Optional[Union[List[float], str]] = None, loss_name: str = 'loss_ohem')[source]¶
OhemCrossEntropy loss.
This function is modified from PIDNet, licensed under the MIT License.
- Parameters
ignore_label (int) – Labels to ignore when computing the loss. Default: 255
thres (float, optional) – The threshold for hard example selection; predictions below it are considered to have low confidence. If not specified, the hard examples will be the pixels with the top min_kept losses. Default: 0.7.
min_kept (int, optional) – The minimum number of predictions to keep. Default: 100000.
loss_weight (float) – Weight of the loss. Defaults to 1.0.
class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.
loss_name (str) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_ohem’.
- class mmseg.models.losses.TverskyLoss(smooth=1, class_weight=None, loss_weight=1.0, ignore_index=255, alpha=0.3, beta=0.7, loss_name='loss_tversky')[source]¶
TverskyLoss.
This loss is proposed in Tversky loss function for image segmentation using 3D fully convolutional deep networks <https://arxiv.org/abs/1706.05721>.
- Parameters
smooth (float) – A float number to smooth the loss and avoid NaN errors. Default: 1.
class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.
loss_weight (float, optional) – Weight of the loss. Default to 1.0.
ignore_index (int | None) – The label index to be ignored. Default: 255.
alpha (float, in [0, 1]) – The coefficient of false positives. Default: 0.3.
beta (float, in [0, 1]) – The coefficient of false negatives. Default: 0.7. Note: alpha + beta = 1.
loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_tversky’.
- forward(pred, target, **kwargs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- property loss_name¶
Loss Name.
This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.
- Returns
The name of this loss item.
- Return type
str
- mmseg.models.losses.accuracy(pred, target, topk=1, thresh=None, ignore_index=None)[source]¶
Calculate accuracy according to the prediction and target.
- Parameters
pred (torch.Tensor) – The model prediction, shape (N, num_class, …)
target (torch.Tensor) – The target of each prediction, shape (N, …)
ignore_index (int | None) – The label index to be ignored. Default: None
topk (int | tuple[int], optional) – If the predictions in topk match the target, the predictions will be regarded as correct. Defaults to 1.
thresh (float, optional) – If not None, predictions with scores under this threshold are considered incorrect. Defaults to None.
- Returns
If the input topk is a single integer, the function will return a single float as accuracy. If topk is a tuple containing multiple integers, the function will return a tuple containing the accuracy for each topk number.
- Return type
float | tuple[float]
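Example
A small worked sketch: both top-1 predictions match their targets, so the result is 100 (the function returns a percentage).
>>> import torch
>>> from mmseg.models.losses import accuracy
>>> pred = torch.tensor([[0.2, 0.5, 0.3],
...                      [0.7, 0.2, 0.1]])   # (N=2, num_class=3)
>>> target = torch.tensor([1, 0])
>>> accuracy(pred, target, topk=1)           # -> tensor(100.)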
- mmseg.models.losses.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None, ignore_index=-100, avg_non_ignore=False, **kwargs)[source]¶
Calculate the binary CrossEntropy loss.
- Parameters
pred (torch.Tensor) – The prediction with shape (N, 1).
label (torch.Tensor) – The learning label of the prediction. Note: In bce loss, label < 0 is invalid.
weight (torch.Tensor, optional) – Sample-wise loss weight.
reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
class_weight (list[float], optional) – The weight for each class.
ignore_index (int) – The label index to be ignored. Default: -100.
avg_non_ignore (bool) – The flag that decides whether the loss is only averaged over non-ignored targets. Default: False. New in version 0.23.0.
- Returns
The calculated loss
- Return type
torch.Tensor
- mmseg.models.losses.cross_entropy(pred, label, weight=None, class_weight=None, reduction='mean', avg_factor=None, ignore_index=-100, avg_non_ignore=False)[source]¶
cross_entropy. The wrapper function for F.cross_entropy().
- Parameters
pred (torch.Tensor) – The prediction with shape (N, 1).
label (torch.Tensor) – The learning label of the prediction.
weight (torch.Tensor, optional) – Sample-wise loss weight. Default: None.
class_weight (list[float], optional) – The weight for each class. Default: None.
reduction (str, optional) – The method used to reduce the loss. Options are ‘none’, ‘mean’ and ‘sum’. Default: ‘mean’.
avg_factor (int, optional) – Average factor that is used to average the loss. Default: None.
ignore_index (int) – Specifies a target value that is ignored and does not contribute to the input gradients. When avg_non_ignore is True and the reduction is 'mean', the loss is averaged over non-ignored targets. Defaults: -100.
avg_non_ignore (bool) – The flag that decides whether the loss is only averaged over non-ignored targets. Default: False. New in version 0.23.0.
- mmseg.models.losses.mask_cross_entropy(pred, target, label, reduction='mean', avg_factor=None, class_weight=None, ignore_index=None, **kwargs)[source]¶
Calculate the CrossEntropy loss for masks.
- Parameters
pred (torch.Tensor) – The prediction with shape (N, C), C is the number of classes.
target (torch.Tensor) – The learning label of the prediction.
label (torch.Tensor) – label indicates the class label of the mask’s corresponding object. This will be used to select the mask of the class to which the object belongs when the mask prediction is not class-agnostic.
reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”.
avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.
class_weight (list[float], optional) – The weight for each class.
ignore_index (None) – Placeholder, to be consistent with other losses. Default: None.
- Returns
The calculated loss
- Return type
torch.Tensor
- mmseg.models.losses.reduce_loss(loss, reduction)[source]¶
Reduce loss as specified.
- Parameters
loss (Tensor) – Elementwise loss tensor.
reduction (str) – Options are “none”, “mean” and “sum”.
- Returns
Reduced loss tensor.
- Return type
Tensor
- mmseg.models.losses.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[source]¶
Apply element-wise weight and reduce loss.
- Parameters
loss (Tensor) – Element-wise loss.
weight (Tensor) – Element-wise weights.
reduction (str) – Same as built-in losses of PyTorch.
avg_factor (float) – Average factor when computing the mean of losses.
- Returns
Processed loss values.
- Return type
Tensor
- mmseg.models.losses.weighted_loss(loss_func)[source]¶
Create a weighted version of a given loss function.
To use this decorator, the loss function must have the signature like loss_func(pred, target, **kwargs). The function only needs to compute element-wise loss without any reduction. This decorator will add weight and reduction arguments to the function. The decorated function will have the signature like loss_func(pred, target, weight=None, reduction=’mean’, avg_factor=None, **kwargs).
- Example
>>> import torch
>>> @weighted_loss
>>> def l1_loss(pred, target):
>>>     return (pred - target).abs()
>>> pred = torch.Tensor([0, 2, 3])
>>> target = torch.Tensor([1, 1, 1])
>>> weight = torch.Tensor([1, 0, 1])
>>> l1_loss(pred, target)
tensor(1.3333)
>>> l1_loss(pred, target, weight)
tensor(1.)
>>> l1_loss(pred, target, reduction='none')
tensor([1., 1., 2.])
>>> l1_loss(pred, target, weight, avg_factor=2)
tensor(1.5000)
necks¶
- class mmseg.models.necks.FPN(in_channels, out_channels, num_outs, start_level=0, end_level=-1, add_extra_convs=False, extra_convs_on_inputs=False, relu_before_extra_convs=False, no_norm_on_lateral=False, conv_cfg=None, norm_cfg=None, act_cfg=None, upsample_cfg={'mode': 'nearest'}, init_cfg={'distribution': 'uniform', 'layer': 'Conv2d', 'type': 'Xavier'})[source]¶
Feature Pyramid Network.
This neck is the implementation of Feature Pyramid Networks for Object Detection.
- Parameters
in_channels (list[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
num_outs (int) – Number of output scales.
start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.
end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.
add_extra_convs (bool | str) –
If bool, it decides whether to add conv layers on top of the original feature maps. Default to False. If True, its actual mode is specified by extra_convs_on_inputs. If str, it specifies the source feature map of the extra convs. Only the following options are allowed
’on_input’: Last feat map of neck inputs (i.e. backbone feature).
’on_lateral’: Last feature map after lateral convs.
’on_output’: The last output feature map after fpn convs.
extra_convs_on_inputs (bool, deprecated) – Whether to apply extra convs on the original feature from the backbone. If True, it is equivalent to add_extra_convs=’on_input’. If False, it is equivalent to set add_extra_convs=’on_output’. Default to True.
relu_before_extra_convs (bool) – Whether to apply relu before the extra conv. Default: False.
no_norm_on_lateral (bool) – Whether to apply norm on lateral. Default: False.
conv_cfg (dict) – Config dict for convolution layer. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.
upsample_cfg (dict) – Config dict for interpolate layer. Default: dict(mode=’nearest’).
init_cfg (dict or list[dict], optional) – Initialization config dict.
Example
>>> import torch
>>> in_channels = [2, 3, 5, 7]
>>> scales = [340, 170, 84, 43]
>>> inputs = [torch.rand(1, c, s, s)
...           for c, s in zip(in_channels, scales)]
>>> self = FPN(in_channels, 11, len(in_channels)).eval()
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 11, 340, 340])
outputs[1].shape = torch.Size([1, 11, 170, 170])
outputs[2].shape = torch.Size([1, 11, 84, 84])
outputs[3].shape = torch.Size([1, 11, 43, 43])
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.necks.Feature2Pyramid(embed_dim, rescales=[4, 2, 1, 0.5], norm_cfg={'requires_grad': True, 'type': 'SyncBN'})[source]¶
Feature2Pyramid.
A neck structure that connects a ViT backbone and decoder heads.
- Parameters
embed_dim (int) – Embedding dimension.
rescales (list[float]) – Different sampling multiples were used to obtain pyramid features. Default: [4, 2, 1, 0.5].
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’SyncBN’, requires_grad=True).
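Example
A minimal sketch; the default SyncBN requires a distributed setup, so plain BN is substituted to keep the sketch runnable on a single device, and the embedding dimension is illustrative.
>>> import torch
>>> from mmseg.models.necks import Feature2Pyramid
>>> neck = Feature2Pyramid(embed_dim=64, rescales=[4, 2, 1, 0.5],
...                        norm_cfg=dict(type='BN'))
>>> inputs = [torch.rand(1, 64, 16, 16) for _ in range(4)]
>>> outputs = neck(inputs)   # spatial sizes 64, 32, 16 and 8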
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.necks.ICNeck(in_channels=(64, 256, 256), out_channels=128, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, init_cfg=None)[source]¶
ICNet for Real-Time Semantic Segmentation on High-Resolution Images.
This head is the implementation of ICHead.
- Parameters
in_channels (tuple[int]) – The numbers of input feature channels of the three branches. Default: (64, 256, 256).
out_channels (int) – The number of output feature channels. Default: 128.
conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.
norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type=’BN’).
act_cfg (dict) – Dictionary to construct and config act layer. Default: dict(type=’ReLU’).
align_corners (bool) – align_corners argument of F.interpolate. Default: False.
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.necks.JPU(in_channels=(512, 1024, 2048), mid_channels=512, start_level=0, end_level=-1, dilations=(1, 2, 4, 8), align_corners=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]¶
FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation.
This Joint Pyramid Upsampling (JPU) neck is the implementation of FastFCN.
- Parameters
in_channels (Tuple[int], optional) – The number of input channels for each convolution operation before upsampling. Default: (512, 1024, 2048).
mid_channels (int) – The number of output channels of JPU. Default: 512.
start_level (int) – Index of the start input backbone level used to build the feature pyramid. Default: 0.
end_level (int) – Index of the end input backbone level (exclusive) to build the feature pyramid. Default: -1, which means the last level.
dilations (tuple[int]) – Dilation rate of each Depthwise Separable ConvModule. Default: (1, 2, 4, 8).
align_corners (bool, optional) – The align_corners argument of resize operation. Default: False.
conv_cfg (dict | None) – Config of conv layers. Default: None.
norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’BN’).
act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).
init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
- class mmseg.models.necks.MLANeck(in_channels, out_channels, norm_layer={'eps': 1e-06, 'requires_grad': True, 'type': 'LN'}, norm_cfg=None, act_cfg=None)[source]¶
Multi-level Feature Aggregation.
This neck is the Multi-level Feature Aggregation construction of SETR.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
norm_layer (dict) – Config dict for input normalization. Default: norm_layer=dict(type=’LN’, eps=1e-6, requires_grad=True).
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.necks.MultiLevelNeck(in_channels, out_channels, scales=[0.5, 1, 2, 4], norm_cfg=None, act_cfg=None)[source]¶
MultiLevelNeck.
A neck structure that connects a ViT backbone and decoder heads.
- Parameters
in_channels (List[int]) – Number of input channels per scale.
out_channels (int) – Number of output channels (used at each scale).
scales (List[float]) – Scale factors for each input feature map. Default: [0.5, 1, 2, 4]
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer in ConvModule. Default: None.
- forward(inputs)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
utils¶
- class mmseg.models.utils.BasicBlock(in_channels: int, channels: int, stride: int = 1, downsample: Optional[torch.nn.modules.module.Module] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, act_cfg_out: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Basic block from ResNet.
- Parameters
in_channels (int) – Input channels.
channels (int) – Output channels.
stride (int) – Stride of the first block. Default: 1.
downsample (nn.Module, optional) – Downsample operation on identity. Default: None.
norm_cfg (dict, optional) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict, optional) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’, inplace=True).
act_cfg_out (dict, optional) – Config dict for activation layer at the end of the block. Default: dict(type=’ReLU’, inplace=True).
init_cfg (dict, optional) – Initialization config dict. Default: None.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
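Example
A minimal sketch: with stride=1 and matching channels no downsample module is needed, so the identity is added directly and the shape is preserved (shapes are illustrative).
>>> import torch
>>> from mmseg.models.utils import BasicBlock
>>> block = BasicBlock(in_channels=64, channels=64)
>>> out = block(torch.rand(1, 64, 56, 56))   # (1, 64, 56, 56)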
- class mmseg.models.utils.Bottleneck(in_channels: int, channels: int, stride: int = 1, downsample: Optional[torch.nn.modules.module.Module] = None, norm_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'type': 'BN'}, act_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = {'inplace': True, 'type': 'ReLU'}, act_cfg_out: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None, init_cfg: Optional[Union[mmengine.config.config.ConfigDict, dict]] = None)[source]¶
Bottleneck block from ResNet.
- Parameters
in_channels (int) – Input channels.
channels (int) – Output channels.
stride (int) – Stride of the first block. Default: 1.
downsample (nn.Module, optional) – Downsample operation on identity. Default: None.
norm_cfg (dict, optional) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict, optional) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’, inplace=True).
act_cfg_out (dict, optional) – Config dict for activation layer at the last of the block. Default: None.
init_cfg (dict, optional) – Initialization config dict. Default: None.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.utils.DAPPM(in_channels: int, branch_channels: int, out_channels: int, num_scales: int, kernel_sizes: List[int] = [5, 9, 17], strides: List[int] = [2, 4, 8], paddings: List[int] = [2, 4, 8], norm_cfg: Dict = {'momentum': 0.1, 'type': 'BN'}, act_cfg: Dict = {'inplace': True, 'type': 'ReLU'}, conv_cfg: Dict = {'bias': False, 'order': ('norm', 'act', 'conv')}, upsample_mode: str = 'bilinear')[source]¶
DAPPM module in DDRNet.
- Parameters
in_channels (int) – Input channels.
branch_channels (int) – Branch channels.
out_channels (int) – Output channels.
num_scales (int) – Number of scales.
kernel_sizes (list[int]) – Kernel sizes of each scale.
strides (list[int]) – Strides of each scale.
paddings (list[int]) – Paddings of each scale.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’, momentum=0.1).
act_cfg (dict) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’, inplace=True).
conv_cfg (dict) – Config dict for convolution layer in ConvModule. Default: dict(order=(‘norm’, ‘act’, ‘conv’), bias=False).
upsample_mode (str) – Upsample mode. Default: ‘bilinear’.
- forward(inputs: torch.Tensor)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.utils.Encoding(channels, num_codes)[source]¶
Encoding Layer: a learnable residual encoder.
Input is of shape (batch_size, channels, height, width). Output is of shape (batch_size, num_codes, channels).
- Parameters
channels – dimension of the features or feature channels
num_codes – number of code words
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.utils.InvertedResidual(in_channels, out_channels, stride, expand_ratio, dilation=1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, with_cp=False, **kwargs)[source]¶
InvertedResidual block for MobileNetV2.
- Parameters
in_channels (int) – The input channels of the InvertedResidual block.
out_channels (int) – The output channels of the InvertedResidual block.
stride (int) – Stride of the middle (first) 3x3 convolution.
expand_ratio (int) – Adjusts number of channels of the hidden layer in InvertedResidual by this amount.
dilation (int) – Dilation rate of depthwise conv. Default: 1
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU6’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- Returns
The output tensor.
- Return type
Tensor
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
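Example
A minimal sketch: the residual shortcut is only used when stride == 1 and in_channels == out_channels, as below (values are illustrative).
>>> import torch
>>> from mmseg.models.utils import InvertedResidual
>>> block = InvertedResidual(32, 32, stride=1, expand_ratio=6)
>>> out = block(torch.rand(1, 32, 28, 28))   # (1, 32, 28, 28)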
- class mmseg.models.utils.InvertedResidualV3(in_channels, out_channels, mid_channels, kernel_size=3, stride=1, se_cfg=None, with_expand_conv=True, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_cp=False)[source]¶
Inverted Residual Block for MobileNetV3.
- Parameters
in_channels (int) – The input channels of this Module.
out_channels (int) – The output channels of this Module.
mid_channels (int) – The input channels of the depthwise convolution.
kernel_size (int) – The kernel size of the depthwise convolution. Default: 3.
stride (int) – The stride of the depthwise convolution. Default: 1.
se_cfg (dict) – Config dict for se layer. Default: None, which means no se layer.
with_expand_conv (bool) – Use expand conv or not. If set to False, mid_channels must be the same as in_channels. Default: True.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
- Returns
The output tensor.
- Return type
Tensor
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.utils.PAPPM(in_channels: int, branch_channels: int, out_channels: int, num_scales: int, kernel_sizes: List[int] = [5, 9, 17], strides: List[int] = [2, 4, 8], paddings: List[int] = [2, 4, 8], norm_cfg: Dict = {'momentum': 0.1, 'type': 'BN'}, act_cfg: Dict = {'inplace': True, 'type': 'ReLU'}, conv_cfg: Dict = {'bias': False, 'order': ('norm', 'act', 'conv')}, upsample_mode: str = 'bilinear')[source]¶
PAPPM module in PIDNet.
- Parameters
in_channels (int) – Input channels.
branch_channels (int) – Branch channels.
out_channels (int) – Output channels.
num_scales (int) – Number of scales.
kernel_sizes (list[int]) – Kernel sizes of each scale.
strides (list[int]) – Strides of each scale.
paddings (list[int]) – Paddings of each scale.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’, momentum=0.1).
act_cfg (dict) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’, inplace=True).
conv_cfg (dict) – Config dict for convolution layer in ConvModule. Default: dict(order=(‘norm’, ‘act’, ‘conv’), bias=False).
upsample_mode (str) – Upsample mode. Default: ‘bilinear’.
- forward(inputs: torch.Tensor)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.utils.PatchEmbed(in_channels=3, embed_dims=768, conv_type='Conv2d', kernel_size=16, stride=None, padding='corner', dilation=1, bias=True, norm_cfg=None, input_size=None, init_cfg=None)[source]¶
Image to Patch Embedding.
We use a conv layer to implement PatchEmbed.
- Parameters
in_channels (int) – The num of input channels. Default: 3
embed_dims (int) – The dimensions of embedding. Default: 768
conv_type (str) – The config dict for embedding conv layer type selection. Default: “Conv2d”.
kernel_size (int) – The kernel_size of embedding conv. Default: 16.
stride (int, optional) – The slide stride of embedding conv. Default: None (Would be set as kernel_size).
padding (int | tuple | str) – The padding length of the embedding conv. When it is a string, it specifies the mode of adaptive padding; “same” and “corner” are currently supported. Default: “corner”.
dilation (int) – The dilation rate of embedding conv. Default: 1.
bias (bool) – Bias of embed conv. Default: True.
norm_cfg (dict, optional) – Config dict for normalization layer. Default: None.
input_size (int | tuple | None) – The size of input, which will be used to calculate the out size. Only works when dynamic_size is False. Default: None.
init_cfg (mmengine.ConfigDict, optional) – The Config for initialization. Default: None.
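Example
A minimal sketch: a 224x224 input with 16x16 patches yields 14x14 = 196 tokens, and forward returns both the tokens and the output spatial size (values are illustrative).
>>> import torch
>>> from mmseg.models.utils import PatchEmbed
>>> patch_embed = PatchEmbed(in_channels=3, embed_dims=768, kernel_size=16)
>>> tokens, out_size = patch_embed(torch.rand(1, 3, 224, 224))
>>> tokens.shape, out_size   # torch.Size([1, 196, 768]), (14, 14)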
- class mmseg.models.utils.ResLayer(block, inplanes, planes, num_blocks, stride=1, dilation=1, avg_down=False, conv_cfg=None, norm_cfg={'type': 'BN'}, multi_grid=None, contract_dilation=False, **kwargs)[source]¶
ResLayer to build ResNet style backbone.
- Parameters
block (nn.Module) – block used to build ResLayer.
inplanes (int) – inplanes of block.
planes (int) – planes of block.
num_blocks (int) – number of blocks.
stride (int) – stride of the first block. Default: 1
avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False
conv_cfg (dict) – dictionary to construct and config conv layer. Default: None
norm_cfg (dict) – dictionary to construct and config norm layer. Default: dict(type=’BN’)
multi_grid (int | None) – Multi grid dilation rates of last stage. Default: None
contract_dilation (bool) – Whether to contract the first dilation of each layer. Default: False.
- class mmseg.models.utils.SELayer(channels, ratio=16, conv_cfg=None, act_cfg=({'type': 'ReLU'}, {'type': 'HSigmoid', 'bias': 3.0, 'divisor': 6.0}))[source]¶
Squeeze-and-Excitation Module.
- Parameters
channels (int) – The input (and output) channels of the SE layer.
ratio (int) – Squeeze ratio in SELayer, the intermediate channel will be int(channels/ratio). Default: 16.
conv_cfg (None or dict) – Config dict for convolution layer. Default: None, which means using conv2d.
act_cfg (dict or Sequence[dict]) – Config dict for activation layer. If act_cfg is a dict, two activation layers will be configured by this dict. If act_cfg is a sequence of dicts, the first activation layer will be configured by the first dict and the second activation layer will be configured by the second dict. Default: (dict(type=’ReLU’), dict(type=’HSigmoid’, bias=3.0, divisor=6.0)).
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmseg.models.utils.SelfAttentionBlock(key_in_channels, query_in_channels, channels, out_channels, share_key_query, query_downsample, key_downsample, key_query_num_convs, value_out_num_convs, key_query_norm, value_out_norm, matmul_norm, with_out, conv_cfg, norm_cfg, act_cfg)[source]¶
General self-attention block/non-local block.
Please refer to https://arxiv.org/abs/1706.03762 for details about key, query and value.
- Parameters
key_in_channels (int) – Input channels of key feature.
query_in_channels (int) – Input channels of query feature.
channels (int) – Output channels of key/query transform.
out_channels (int) – Output channels.
share_key_query (bool) – Whether to share projection weight between key and query projection.
query_downsample (nn.Module) – Query downsample module.
key_downsample (nn.Module) – Key downsample module.
key_query_num_convs (int) – Number of convs for key/query projection.
value_out_num_convs (int) – Number of convs for value projection.
matmul_norm (bool) – Whether to normalize the attention map with the sqrt of channels.
with_out (bool) – Whether to use the out projection.
conv_cfg (dict|None) – Config of conv layers.
norm_cfg (dict|None) – Config of norm layers.
act_cfg (dict|None) – Config of activation layers.
- class mmseg.models.utils.UpConvBlock(conv_block, in_channels, skip_channels, out_channels, num_convs=2, stride=1, dilation=1, with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, upsample_cfg={'type': 'InterpConv'}, dcn=None, plugins=None)[source]¶
Upsample convolution block in decoder for UNet.
This upsample convolution block consists of one upsample module followed by one convolution block. The upsample module expands the high-level low-resolution feature map and the convolution block fuses the upsampled high-level low-resolution feature map and the low-level high-resolution feature map from encoder.
- Parameters
conv_block (nn.Sequential) – Sequential of convolutional layers.
in_channels (int) – Number of input channels of the high-level (low-resolution) feature map.
skip_channels (int) – Number of input channels of the low-level (high-resolution) feature map from encoder.
out_channels (int) – Number of output channels.
num_convs (int) – Number of convolutional layers in the conv_block. Default: 2.
stride (int) – Stride of convolutional layer in conv_block. Default: 1.
dilation (int) – Dilation rate of convolutional layer in conv_block. Default: 1.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
conv_cfg (dict | None) – Config dict for convolution layer. Default: None.
norm_cfg (dict | None) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict | None) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’).
upsample_cfg (dict) – The upsample config of the upsample module in decoder. Default: dict(type=’InterpConv’). If the size of the high-level feature map is the same as that of the skip feature map (the low-level feature map from encoder), the high-level feature map does not need to be upsampled and upsample_cfg should be None.
dcn (bool) – Use deformable convolution in convolutional layer or not. Default: None.
plugins (dict) – plugins for convolutional layers. Default: None.
- class mmseg.models.utils.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)[source]¶
- forward(x)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- mmseg.models.utils.make_divisible(value, divisor, min_value=None, min_ratio=0.9)[source]¶
Make divisible function.
This function rounds the channel number to the nearest value that can be divisible by the divisor. It is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by divisor. It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
- Parameters
value (int) – The original channel number.
divisor (int) – The divisor to fully divide the channel number.
min_value (int) – The minimum value of the output channel. Default: None, which means the minimum value equals the divisor.
min_ratio (float) – The minimum ratio of the rounded channel number to the original channel number. Default: 0.9.
- Returns
The modified output channel number.
- Return type
int
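Example
A small worked sketch of the rounding rule: the value rounds to the nearest multiple of divisor, but never below min_ratio times the original value.
>>> from mmseg.models.utils import make_divisible
>>> make_divisible(33, 8)   # nearest multiple of 8 is 32, and 32 >= 0.9 * 33
32
>>> make_divisible(30, 8)   # rounds up to 32, since 24 < 0.9 * 30
32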
- mmseg.models.utils.nchw2nlc2nchw(module, x, contiguous=False, **kwargs)[source]¶
Flatten a [N, C, H, W] shape tensor x to a [N, L, C] shape tensor, use the reshaped tensor as the input of module, and convert the output of module, whose shape is [N, L, C], back to [N, C, H, W].
- Parameters
module (Callable) – A callable object that takes a tensor with shape [N, L, C] as input.
x (Tensor) – The input tensor of shape [N, C, H, W].
contiguous (bool) – Whether to make the tensor contiguous after each shape transform.
- Returns
The output tensor of shape [N, C, H, W].
- Return type
Tensor
Example
>>> import torch
>>> import torch.nn as nn
>>> norm = nn.LayerNorm(4)
>>> feature_map = torch.rand(4, 4, 5, 5)
>>> output = nchw2nlc2nchw(norm, feature_map)
- mmseg.models.utils.nchw_to_nlc(x)[source]¶
Flatten [N, C, H, W] shape tensor to [N, L, C] shape tensor.
- Parameters
x (Tensor) – The input tensor of shape [N, C, H, W] before conversion.
- Returns
The output tensor of shape [N, L, C] after conversion.
- Return type
Tensor
- mmseg.models.utils.nlc2nchw2nlc(module, x, hw_shape, contiguous=False, **kwargs)[source]¶
Convert a [N, L, C] shape tensor x to a [N, C, H, W] shape tensor, use the reshaped tensor as the input of module, and convert the output of module, whose shape is [N, C, H, W], back to [N, L, C].
- Parameters
module (Callable) – A callable object that takes a tensor with shape [N, C, H, W] as input.
x (Tensor) – The input tensor of shape [N, L, C].
hw_shape (Sequence[int]) – The height and width of the feature map with shape [N, C, H, W].
contiguous (bool) – Whether to make the tensor contiguous after each shape transform.
- Returns
The output tensor of shape [N, L, C].
- Return type
Tensor
Example
>>> import torch
>>> import torch.nn as nn
>>> conv = nn.Conv2d(16, 16, 3, 1, 1)
>>> feature_map = torch.rand(4, 25, 16)
>>> output = nlc2nchw2nlc(conv, feature_map, (5, 5))
- mmseg.models.utils.nlc_to_nchw(x, hw_shape)[source]¶
Convert [N, L, C] shape tensor to [N, C, H, W] shape tensor.
- Parameters
x (Tensor) – The input tensor of shape [N, L, C] before conversion.
hw_shape (Sequence[int]) – The height and width of output feature map.
- Returns
The output tensor of shape [N, C, H, W] after conversion.
- Return type
Tensor
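Example
A round-trip sketch: nchw_to_nlc and nlc_to_nchw are exact inverses as long as the original spatial shape is passed back in.
>>> import torch
>>> from mmseg.models.utils import nchw_to_nlc, nlc_to_nchw
>>> x = torch.rand(2, 16, 5, 5)
>>> tokens = nchw_to_nlc(x)                  # (2, 25, 16)
>>> restored = nlc_to_nchw(tokens, (5, 5))   # (2, 16, 5, 5)
>>> assert torch.equal(x, restored)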
mmseg.structures¶
structures¶
- class mmseg.structures.OHEMPixelSampler(context, thresh=None, min_kept=100000)[source]¶
Online Hard Example Mining Sampler for segmentation.
- Parameters
context (nn.Module) – The context of the sampler, a subclass of BaseDecodeHead.
thresh (float, optional) – The threshold for hard example selection; predictions below it are considered to have low confidence. If not specified, the hard examples will be the pixels with the top min_kept losses. Default: None.
min_kept (int, optional) – The minimum number of predictions to keep. Default: 100000.
- sample(seg_logit, seg_label)[source]¶
Sample pixels that have high loss or with low prediction confidence.
- Parameters
seg_logit (torch.Tensor) – segmentation logits, shape (N, C, H, W)
seg_label (torch.Tensor) – segmentation label, shape (N, 1, H, W)
- Returns
segmentation weight, shape (N, H, W)
- Return type
torch.Tensor
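Example
A hedged, config-level sketch: in practice the sampler is attached through the sampler field of a decode head config rather than constructed by hand; the threshold and min_kept values below are illustrative.
>>> sampler_cfg = dict(type='OHEMPixelSampler', thresh=0.7, min_kept=100000)
>>> # e.g. decode_head=dict(..., sampler=sampler_cfg) in a model config.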
- class mmseg.structures.SegDataSample(*, metainfo: Optional[dict] = None, **kwargs)[source]¶
A data structure interface of MMSegmentation. They are used as interfaces between different components.
The attributes in SegDataSample are divided into several parts: gt_sem_seg (PixelData), the ground truth of semantic segmentation; pred_sem_seg (PixelData), the prediction of semantic segmentation; and seg_logits (PixelData), the predicted logits of semantic segmentation.
Examples
>>> import torch
>>> import numpy as np
>>> from mmengine.structures import PixelData
>>> from mmseg.structures import SegDataSample
>>> data_sample = SegDataSample()
>>> img_meta = dict(img_shape=(4, 4, 3),
...                 pad_shape=(4, 4, 3))
>>> gt_segmentations = PixelData(metainfo=img_meta)
>>> gt_segmentations.data = torch.randint(0, 2, (1, 4, 4))
>>> data_sample.gt_sem_seg = gt_segmentations
>>> assert 'img_shape' in data_sample.gt_sem_seg.metainfo_keys()
>>> data_sample.gt_sem_seg.shape
(4, 4)
>>> print(data_sample)
<SegDataSample(
    META INFORMATION
    DATA FIELDS
    gt_sem_seg: <PixelData(
            META INFORMATION
            img_shape: (4, 4, 3)
            pad_shape: (4, 4, 3)
            DATA FIELDS
            data: tensor([[[1, 1, 1, 0],
                           [1, 0, 1, 1],
                           [1, 1, 1, 1],
                           [0, 1, 0, 1]]])
        ) at 0x1c2b4156460>
) at 0x1c2aae44d60>
>>> data_sample = SegDataSample()
>>> gt_sem_seg_data = dict(sem_seg=torch.rand(1, 4, 4))
>>> gt_sem_seg = PixelData(**gt_sem_seg_data)
>>> data_sample.gt_sem_seg = gt_sem_seg
>>> assert 'gt_sem_seg' in data_sample
>>> assert 'sem_seg' in data_sample.gt_sem_seg
sampler¶
- class mmseg.structures.sampler.OHEMPixelSampler(context, thresh=None, min_kept=100000)[source]¶
Online Hard Example Mining Sampler for segmentation.
- Parameters
context (nn.Module) – The context of the sampler, a subclass of BaseDecodeHead.
thresh (float, optional) – The threshold for hard example selection; predictions below it are considered to have low confidence. If not specified, the hard examples will be the pixels with the top min_kept losses. Default: None.
min_kept (int, optional) – The minimum number of predictions to keep. Default: 100000.
- sample(seg_logit, seg_label)[source]¶
Sample pixels that have high loss or with low prediction confidence.
- Parameters
seg_logit (torch.Tensor) – segmentation logits, shape (N, C, H, W)
seg_label (torch.Tensor) – segmentation label, shape (N, 1, H, W)
- Returns
segmentation weight, shape (N, H, W)
- Return type
torch.Tensor
mmseg.visualization¶
- class mmseg.visualization.SegLocalVisualizer(name: str = 'visualizer', image: Optional[numpy.ndarray] = None, vis_backends: Optional[Dict] = None, save_dir: Optional[str] = None, classes: Optional[List] = None, palette: Optional[List] = None, dataset_name: Optional[str] = None, alpha: float = 0.8, **kwargs)[source]¶
Local Visualizer.
- Parameters
name (str) – Name of the instance. Defaults to ‘visualizer’.
image (np.ndarray, optional) – The original image to draw. The format should be RGB. Defaults to None.
vis_backends (list, optional) – Visual backend config list. Defaults to None.
save_dir (str, optional) – Save file dir for all storage backends. If it is None, the backend storage will not save any data.
classes (list, optional) – Input classes for result rendering, as the prediction of segmentation model is a segment map with label indices, classes is a list which includes items responding to the label indices. If classes is not defined, visualizer will take cityscapes classes by default. Defaults to None.
palette (list, optional) – Input palette for result rendering, which is a list of color palette responding to the classes. Defaults to None.
dataset_name (str, optional) – Dataset name or alias. The visualizer will use the meta information of the dataset, i.e. classes and palette, but explicitly passed classes and palette have higher priority. Defaults to None.
alpha (int, float) – The transparency of segmentation mask. Defaults to 0.8.
Examples
>>> import numpy as np
>>> import torch
>>> from mmengine.structures import PixelData
>>> from mmseg.data import SegDataSample
>>> from mmseg.engine.visualization import SegLocalVisualizer
>>> seg_local_visualizer = SegLocalVisualizer()
>>> image = np.random.randint(0, 256,
...                           size=(10, 12, 3)).astype('uint8')
>>> gt_sem_seg_data = dict(data=torch.randint(0, 2, (1, 10, 12)))
>>> gt_sem_seg = PixelData(**gt_sem_seg_data)
>>> gt_seg_data_sample = SegDataSample()
>>> gt_seg_data_sample.gt_sem_seg = gt_sem_seg
>>> seg_local_visualizer.dataset_meta = dict(
>>>     classes=('background', 'foreground'),
>>>     palette=[[120, 120, 120], [6, 230, 230]])
>>> seg_local_visualizer.add_datasample('visualizer_example',
...                                     image, gt_seg_data_sample)
>>> seg_local_visualizer.add_datasample(
...     'visualizer_example', image,
...     gt_seg_data_sample, show=True)
- add_datasample(name: str, image: numpy.ndarray, data_sample: Optional[mmseg.structures.seg_data_sample.SegDataSample] = None, draw_gt: bool = True, draw_pred: