mmseg.apis

mmseg.apis.get_root_logger(log_file=None, log_level=20)[source]

Get the root logger.

The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmseg”.

Parameters
  • log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.

  • log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “Error” and be silent most of the time.

Returns

The root logger.

Return type

logging.Logger
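
The behaviour described for get_root_logger can be sketched with the standard logging module alone. This is a minimal illustration, not the mmseg implementation: the name get_logger_sketch and the rank argument (a stand-in for the distributed process rank) are hypothetical.

```python
import logging


def get_logger_sketch(name="mmseg", log_file=None, log_level=logging.INFO, rank=0):
    """Return a named logger, attaching handlers only on first use."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already initialized: reuse it instead of stacking duplicate handlers.
        return logger

    # A StreamHandler is always added; a FileHandler only if log_file is given.
    handlers = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file, mode="w"))

    formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    # Only rank 0 uses the requested level; other ranks are set to ERROR.
    logger.setLevel(log_level if rank == 0 else logging.ERROR)
    return logger


logger = get_logger_sketch()
logger.info("logger ready")
```

The default log_level=20 in the signature above is the numeric value of logging.INFO.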

mmseg.apis.inference_segmentor(model, img)[source]

Run inference on image(s) with the segmentor.

Parameters
  • model (nn.Module) – The loaded segmentor.

  • img (str/ndarray or list[str/ndarray]) – Either image files or loaded images.

Returns

The segmentation result.

Return type

list[Tensor]

mmseg.apis.init_random_seed(seed=None, device='cuda')[source]

Initialize random seed.

If the seed is not set, a random seed is generated automatically and then broadcast to all processes to prevent potential bugs.

Parameters
  • seed (int, optional) – The seed. Default: None.

  • device (str) – The device where the seed will be put on. Default: ‘cuda’.

Returns

Seed to be used.

Return type

int

mmseg.apis.init_segmentor(config, checkpoint=None, device='cuda:0')[source]

Initialize a segmentor from config file.

Parameters
  • config (str or mmcv.Config) – Config file path or the config object.

  • checkpoint (str, optional) – Checkpoint path. If left as None, the model will not load any weights.

  • device (str, optional) – The device to load the model on. Default: ‘cuda:0’. Use ‘cpu’ for loading model on CPU.

Returns

The constructed segmentor.

Return type

nn.Module

mmseg.apis.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False, efficient_test=False, pre_eval=False, format_only=False, format_args={})[source]

Test model with multiple GPUs in progressive mode.

This method tests the model with multiple GPUs and collects the results under two different modes: GPU and CPU. With ‘gpu_collect=True’, it encodes the results as GPU tensors and uses GPU communication for results collection. In CPU mode, it saves the results from different GPUs to ‘tmpdir’ and the rank 0 worker collects them.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (utils.data.Dataloader) – Pytorch data loader.

  • tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode. The same path is used for efficient test. Default: None.

  • gpu_collect (bool) – Option to use either gpu or cpu to collect results. Default: False.

  • efficient_test (bool) – Whether to save the results as local numpy files to save CPU memory during evaluation. Mutually exclusive with pre_eval and format_only. Default: False.

  • pre_eval (bool) – Use dataset.pre_eval() function to generate pre_results for metric evaluation. Mutually exclusive with efficient_test and format_only. Default: False.

  • format_only (bool) – Only format result for results commit. Mutually exclusive with pre_eval and efficient_test. Default: False.

  • format_args (dict) – The args for format_results. Default: {}.

Returns

List of evaluation pre-results or list of saved file names.

Return type

list

mmseg.apis.set_random_seed(seed, deterministic=False)[source]

Set random seed.

Parameters
  • seed (int) – Seed to be used.

  • deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.

mmseg.apis.show_result_pyplot(model, img, result, palette=None, fig_size=(15, 10), opacity=0.5, title='', block=True, out_file=None)[source]

Visualize the segmentation results on the image.

Parameters
  • model (nn.Module) – The loaded segmentor.

  • img (str or np.ndarray) – Image filename or loaded image.

  • result (list) – The segmentation result.

  • palette (list[list[int]] | None) – The palette of segmentation map. If None is given, a random palette will be generated. Default: None.

  • fig_size (tuple) – Figure size of the pyplot figure.

  • opacity (float) – Opacity of painted segmentation map. Default 0.5. Must be in (0, 1] range.

  • title (str) – The title of pyplot figure. Default is ‘’.

  • block (bool) – Whether to block the pyplot figure. Default is True.

  • out_file (str or None) – The path to write the image. Default: None.

mmseg.apis.single_gpu_test(model, data_loader, show=False, out_dir=None, efficient_test=False, opacity=0.5, pre_eval=False, format_only=False, format_args={})[source]

Test with a single GPU in progressive mode.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (utils.data.Dataloader) – Pytorch data loader.

  • show (bool) – Whether to show results during inference. Default: False.

  • out_dir (str, optional) – If specified, the output results will be saved to this directory.

  • efficient_test (bool) – Whether to save the results as local numpy files to save CPU memory during evaluation. Mutually exclusive with pre_eval and format_only. Default: False.

  • opacity (float) – Opacity of painted segmentation map. Default 0.5. Must be in (0, 1] range.

  • pre_eval (bool) – Use dataset.pre_eval() function to generate pre_results for metric evaluation. Mutually exclusive with efficient_test and format_only. Default: False.

  • format_only (bool) – Only format result for results commit. Mutually exclusive with pre_eval and efficient_test. Default: False.

  • format_args (dict) – The args for format_results. Default: {}.

Returns

List of evaluation pre-results or list of saved file names.

Return type

list

mmseg.apis.train_segmentor(model, dataset, cfg, distributed=False, validate=False, timestamp=None, meta=None)[source]

Launch segmentor training.

mmseg.core

seg

class mmseg.core.seg.BasePixelSampler(**kwargs)[source]

Base class of pixel sampler.

abstract sample(seg_logit, seg_label)[source]

Placeholder for sample function.

class mmseg.core.seg.OHEMPixelSampler(context, thresh=None, min_kept=100000)[source]

Online Hard Example Mining Sampler for segmentation.

Parameters
  • context (nn.Module) – The context of sampler, subclass of BaseDecodeHead.

  • thresh (float, optional) – The threshold for hard example selection. Predictions below it are treated as low-confidence. If not specified, the hard examples will be the pixels with the top min_kept losses. Default: None.

  • min_kept (int, optional) – The minimum number of predictions to keep. Default: 100000.

sample(seg_logit, seg_label)[source]

Sample pixels that have high loss or low prediction confidence.

Parameters
  • seg_logit (torch.Tensor) – Segmentation logits, shape (N, C, H, W).

  • seg_label (torch.Tensor) – Segmentation label, shape (N, 1, H, W).

Returns

Segmentation weight, shape (N, H, W).

Return type

torch.Tensor

mmseg.core.seg.build_pixel_sampler(cfg, **default_args)[source]

Build pixel sampler for segmentation map.

evaluation

class mmseg.core.evaluation.DistEvalHook(*args, by_epoch=False, efficient_test=False, pre_eval=False, **kwargs)[source]

Distributed EvalHook, with efficient test support.

Parameters
  • by_epoch (bool) – Whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: False.

  • efficient_test (bool) – Whether to save the results as local numpy files to save CPU memory during evaluation. Default: False.

  • pre_eval (bool) – Whether to use progressive mode to evaluate model. Default: False.

Returns

The prediction results.

Return type

list

class mmseg.core.evaluation.EvalHook(*args, by_epoch=False, efficient_test=False, pre_eval=False, **kwargs)[source]

Single GPU EvalHook, with efficient test support.

Parameters
  • by_epoch (bool) – Whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: False.

  • efficient_test (bool) – Whether to save the results as local numpy files to save CPU memory during evaluation. Default: False.

  • pre_eval (bool) – Whether to use progressive mode to evaluate model. Default: False.

Returns

The prediction results.

Return type

list

mmseg.core.evaluation.eval_metrics(results, gt_seg_maps, num_classes, ignore_index, metrics=['mIoU'], nan_to_num=None, label_map={}, reduce_zero_label=False, beta=1)[source]

Calculate evaluation metrics.

Parameters
  • results – List of prediction segmentation maps or list of prediction result filenames.

  • gt_seg_maps (list[ndarray] | list[str] | Iterables) – list of ground truth segmentation maps or list of label filenames.

  • num_classes (int) – Number of categories.

  • ignore_index (int) – Index that will be ignored in evaluation.

  • metrics (list[str] | str) – Metrics to be evaluated, ‘mIoU’ and ‘mDice’.

  • nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

  • label_map (dict) – Mapping old labels to new labels. Default: dict().

  • reduce_zero_label (bool) – Whether to ignore the zero label. Default: False.

mmseg.core.evaluation.get_classes(dataset)[source]

Get class names of a dataset.

mmseg.core.evaluation.get_palette(dataset)[source]

Get class palette (RGB) of a dataset.

mmseg.core.evaluation.intersect_and_union(pred_label, label, num_classes, ignore_index, label_map={}, reduce_zero_label=False)[source]

Calculate intersection and union.

Parameters
  • pred_label (ndarray | str) – Prediction segmentation map or predict result filename.

  • label (ndarray | str) – Ground truth segmentation map or label filename.

  • num_classes (int) – Number of categories.

  • ignore_index (int) – Index that will be ignored in evaluation.

  • label_map (dict) – Mapping old labels to new labels. The parameter will work only when label is str. Default: dict().

  • reduce_zero_label (bool) – Whether to ignore the zero label. The parameter will work only when label is str. Default: False.
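
As a rough illustration of what the intersection and union counts mean, the sketch below computes them per class over flat Python lists of class ids. It is a simplified stand-in, not the mmseg implementation: the name intersect_and_union_sketch is hypothetical, label_map and reduce_zero_label are left out, and predictions are assumed to lie in the range [0, num_classes).

```python
def intersect_and_union_sketch(pred_label, label, num_classes, ignore_index):
    """Per-class pixel counts for flat sequences of class ids."""
    area_intersect = [0] * num_classes
    area_pred = [0] * num_classes
    area_label = [0] * num_classes
    for p, g in zip(pred_label, label):
        if g == ignore_index:
            continue  # ignored pixels contribute to no count
        area_pred[p] += 1
        area_label[g] += 1
        if p == g:
            area_intersect[p] += 1
    # Union = prediction area + ground-truth area - intersection.
    area_union = [p + g - i for p, g, i in zip(area_pred, area_label, area_intersect)]
    return area_intersect, area_union, area_pred, area_label


pred = [0, 0, 1, 1, 2, 2]
gt = [0, 1, 1, 1, 2, 255]
inter, union, n_pred, n_gt = intersect_and_union_sketch(
    pred, gt, num_classes=3, ignore_index=255
)
iou_per_class = [i / u for i, u in zip(inter, union)]
print(inter, union, iou_per_class)
```

Dividing the per-class intersection by the per-class union gives the per-class IoU, and averaging those values gives mIoU.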

mmseg.core.evaluation.mean_dice(results, gt_seg_maps, num_classes, ignore_index, nan_to_num=None, label_map={}, reduce_zero_label=False)[source]

Calculate Mean Dice (mDice).

Parameters
  • results (list[ndarray] | list[str]) – List of prediction segmentation maps or list of prediction result filenames.

  • gt_seg_maps (list[ndarray] | list[str]) – list of ground truth segmentation maps or list of label filenames.

  • num_classes (int) – Number of categories.

  • ignore_index (int) – Index that will be ignored in evaluation.

  • nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

  • label_map (dict) – Mapping old labels to new labels. Default: dict().

  • reduce_zero_label (bool) – Whether to ignore the zero label. Default: False.

mmseg.core.evaluation.mean_fscore(results, gt_seg_maps, num_classes, ignore_index, nan_to_num=None, label_map={}, reduce_zero_label=False, beta=1)[source]

Calculate Mean F-Score (mFscore).

Parameters
  • results (list[ndarray] | list[str]) – List of prediction segmentation maps or list of prediction result filenames.

  • gt_seg_maps (list[ndarray] | list[str]) – list of ground truth segmentation maps or list of label filenames.

  • num_classes (int) – Number of categories.

  • ignore_index (int) – Index that will be ignored in evaluation.

  • nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

  • label_map (dict) – Mapping old labels to new labels. Default: dict().

  • reduce_zero_label (bool) – Whether to ignore the zero label. Default: False.

  • beta – Determines the weight of recall in the combined score. Default: 1.
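
The role of beta can be seen from the standard F-beta formula, sketched below for a single precision/recall pair. The function name f_score_sketch is hypothetical, and returning 0.0 when both inputs are zero is an assumption of this sketch rather than documented mmseg behaviour.

```python
def f_score_sketch(precision, recall, beta=1.0):
    """Standard F-beta score: beta > 1 weights recall more, beta < 1 weights precision more."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # assumption: treat the undefined 0/0 case as 0
    return (1 + beta ** 2) * precision * recall / denominator


# With high recall and low precision, a larger beta yields a higher score.
print(f_score_sketch(0.5, 1.0, beta=1))
print(f_score_sketch(0.5, 1.0, beta=2))
```

With beta=1 the score reduces to the harmonic mean of precision and recall.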

mmseg.core.evaluation.mean_iou(results, gt_seg_maps, num_classes, ignore_index, nan_to_num=None, label_map={}, reduce_zero_label=False)[source]

Calculate Mean Intersection over Union (mIoU).

Parameters
  • results (list[ndarray] | list[str]) – List of prediction segmentation maps or list of prediction result filenames.

  • gt_seg_maps (list[ndarray] | list[str]) – list of ground truth segmentation maps or list of label filenames.

  • num_classes (int) – Number of categories.

  • ignore_index (int) – Index that will be ignored in evaluation.

  • nan_to_num (int, optional) – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

  • label_map (dict) – Mapping old labels to new labels. Default: dict().

  • reduce_zero_label (bool) – Whether to ignore the zero label. Default: False.

mmseg.core.evaluation.pre_eval_to_metrics(pre_eval_results, metrics=['mIoU'], nan_to_num=None, beta=1)[source]

Convert pre-eval results to metrics.

Parameters
  • pre_eval_results (list[tuple[torch.Tensor]]) – Per-image eval results for computing evaluation metrics.

  • metrics (list[str] | str) – Metrics to be evaluated, ‘mIoU’ and ‘mDice’.

  • nan_to_num – If specified, NaN values will be replaced by the numbers defined by the user. Default: None.

utils

mmseg.core.utils.add_prefix(inputs, prefix)[source]

Add prefix for dict.

Parameters
  • inputs (dict) – The input dict with str keys.

  • prefix (str) – The prefix to add.

Returns

The dict with keys updated with prefix.

Return type

dict
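
A prefixing helper of this kind is essentially a dict comprehension. The sketch below is illustrative only: the name add_prefix_sketch is hypothetical, and joining prefix and key with a dot is an assumption, since the text above does not state the separator.

```python
def add_prefix_sketch(inputs, prefix):
    """Return a new dict whose keys are '<prefix>.<original key>'."""
    # Assumption: a dot separates the prefix from the original key.
    return {f"{prefix}.{name}": value for name, value in inputs.items()}


losses = {"loss_seg": 0.3, "acc_seg": 91.0}
print(add_prefix_sketch(losses, "decode"))
```

Building a new dict leaves the input unchanged, which keeps the helper safe to call on dicts that are reused elsewhere.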

mmseg.core.utils.sync_random_seed(seed=None, device='cuda')[source]

Make sure different ranks share the same seed. All workers must call this function, otherwise it will deadlock. This method is generally used in DistributedSampler, because the seed should be identical across all processes in the distributed group.

In distributed sampling, different ranks should sample non-overlapped data in the dataset. Therefore, this function is used to make sure that each rank shuffles the data indices in the same order based on the same seed. Then different ranks could use different indices to select non-overlapped data from the same data list.

Parameters
  • seed (int, Optional) – The seed. Default to None.

  • device (str) – The device where the seed will be put on. Default to ‘cuda’.

Returns

Seed to be used.

Return type

int

mmseg.datasets

datasets

class mmseg.datasets.ADE20KDataset(**kwargs)[source]

ADE20K dataset.

In segmentation map annotation for ADE20K, 0 stands for background, which is not included in 150 categories. reduce_zero_label is fixed to True. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

format_results(results, imgfile_prefix, to_label_id=True, indices=None)[source]

Format the results into dir (standard format for ade20k evaluation).

Parameters
  • results (list) – Testing results of the dataset.

  • imgfile_prefix (str | None) – The prefix of images files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”.

  • to_label_id (bool) – Whether to convert output to label_id for submission. Default: True.

  • indices (list[int], optional) – Indices of input results, if not set, all the indices of the dataset will be used. Default: None.

Returns

(result_files, tmp_dir), where result_files is a list containing the image paths and tmp_dir is the temporary directory created for saving json/png files when imgfile_prefix is not specified.

Return type

tuple

results2img(results, imgfile_prefix, to_label_id, indices=None)[source]

Write the segmentation results to images.

Parameters
  • results (list[ndarray]) – Testing results of the dataset.

  • imgfile_prefix (str) – The filename prefix of the png files. If the prefix is “somepath/xxx”, the png files will be named “somepath/xxx.png”.

  • to_label_id (bool) – whether convert output to label_id for submission.

  • indices (list[int], optional) – Indices of input results, if not set, all the indices of the dataset will be used. Default: None.

Returns

Result txt files which contain the corresponding semantic segmentation images.

Return type

list[str]

class mmseg.datasets.COCOStuffDataset(**kwargs)[source]

COCO-Stuff dataset.

In segmentation map annotation for COCO-Stuff, Train-IDs of the 10k version are from 1 to 171, where 0 is the ignore index, and Train-IDs of COCO Stuff 164k are from 0 to 170, where 255 is the ignore index. Both versions therefore have 171 semantic categories. reduce_zero_label is set to True and False for the 10k and 164k versions, respectively. The img_suffix is fixed to ‘.jpg’, and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.ChaseDB1Dataset(**kwargs)[source]

Chase_db1 dataset.

In segmentation map annotation for Chase_db1, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_1stHO.png’.

class mmseg.datasets.CityscapesDataset(img_suffix='_leftImg8bit.png', seg_map_suffix='_gtFine_labelTrainIds.png', **kwargs)[source]

Cityscapes dataset.

The img_suffix is fixed to ‘_leftImg8bit.png’ and seg_map_suffix is fixed to ‘_gtFine_labelTrainIds.png’ for Cityscapes dataset.

evaluate(results, metric='mIoU', logger=None, imgfile_prefix=None)[source]

Evaluation in Cityscapes/default protocol.

Parameters
  • results (list) – Testing results of the dataset.

  • metric (str | list[str]) – Metrics to be evaluated.

  • logger (logging.Logger | None | str) – Logger used for printing related information during evaluation. Default: None.

  • imgfile_prefix (str | None) – The prefix of output image file, for cityscapes evaluation only. It includes the file path and the prefix of filename, e.g., “a/b/prefix”. If results are evaluated with cityscapes protocol, it would be the prefix of output png files. The output files would be png images under folder “a/b/prefix/xxx.png”, where “xxx” is the image name of cityscapes. If not specified, a temp file will be created for evaluation. Default: None.

Returns

Cityscapes/default metrics.

Return type

dict[str, float]

format_results(results, imgfile_prefix, to_label_id=True, indices=None)[source]

Format the results into dir (standard format for Cityscapes evaluation).

Parameters
  • results (list) – Testing results of the dataset.

  • imgfile_prefix (str) – The prefix of images files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”.

  • to_label_id (bool) – Whether to convert output to label_id for submission. Default: True.

  • indices (list[int], optional) – Indices of input results, if not set, all the indices of the dataset will be used. Default: None.

Returns

(result_files, tmp_dir), where result_files is a list containing the image paths and tmp_dir is the temporary directory created for saving json/png files when imgfile_prefix is not specified.

Return type

tuple

results2img(results, imgfile_prefix, to_label_id, indices=None)[source]

Write the segmentation results to images.

Parameters
  • results (list[ndarray]) – Testing results of the dataset.

  • imgfile_prefix (str) – The filename prefix of the png files. If the prefix is “somepath/xxx”, the png files will be named “somepath/xxx.png”.

  • to_label_id (bool) – whether convert output to label_id for submission.

  • indices (list[int], optional) – Indices of input results, if not set, all the indices of the dataset will be used. Default: None.

Returns

Result txt files which contain the corresponding semantic segmentation images.

Return type

list[str]

class mmseg.datasets.ConcatDataset(datasets, separate_eval=True)[source]

A wrapper of concatenated dataset.

Same as torch.utils.data.dataset.ConcatDataset, but supports evaluating and formatting results.

Parameters
  • datasets (list[Dataset]) – A list of datasets.

  • separate_eval (bool) – Whether to evaluate the concatenated dataset results separately. Default: True.

evaluate(results, logger=None, **kwargs)[source]

Evaluate the results.

Parameters
  • results (list[tuple[torch.Tensor]] | list[str]) – Per-image pre_eval results or predicted segmentation maps for computing evaluation metrics.

  • logger (logging.Logger | str | None) – Logger used for printing related information during evaluation. Default: None.

Returns

Evaluation results of the total dataset, or of each separate dataset if self.separate_eval=True.

Return type

dict[str, float]

format_results(results, imgfile_prefix, indices=None, **kwargs)[source]

Format results for every sample of ConcatDataset.

get_dataset_idx_and_sample_idx(indice)[source]

Return the dataset index and sample index for a given index into ConcatDataset.

Parameters

indice (int) – Index of the sample in ConcatDataset.

Returns

The index of the sub-dataset the sample belongs to, and the index of the sample within that sub-dataset.

Return type

(int, int)
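
Mapping a global index to a (dataset index, sample index) pair is commonly done with a binary search over cumulative dataset sizes. The sketch below shows that idea on plain lists. It is an assumption about the technique, not the mmseg code: the function name and the dataset_sizes argument are hypothetical.

```python
import bisect
from itertools import accumulate


def dataset_and_sample_idx_sketch(index, dataset_sizes):
    """Map a global index to (dataset index, index within that dataset)."""
    cumulative = list(accumulate(dataset_sizes))  # e.g. [3, 5, 2] -> [3, 8, 10]
    total = cumulative[-1]
    if index < 0:
        index += total  # support negative indexing like a list
    if not 0 <= index < total:
        raise IndexError("index out of range for the concatenated dataset")
    # First dataset whose cumulative size exceeds the global index.
    dataset_idx = bisect.bisect_right(cumulative, index)
    offset = 0 if dataset_idx == 0 else cumulative[dataset_idx - 1]
    return dataset_idx, index - offset


sizes = [3, 5, 2]
print(dataset_and_sample_idx_sketch(0, sizes))
print(dataset_and_sample_idx_sketch(7, sizes))
print(dataset_and_sample_idx_sketch(-1, sizes))
```

The binary search keeps the lookup logarithmic in the number of sub-datasets, which matters little for a handful of datasets but costs nothing.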

pre_eval(preds, indices)[source]

Do pre eval for every sample of ConcatDataset.

class mmseg.datasets.CustomDataset(pipeline, img_dir, img_suffix='.jpg', ann_dir=None, seg_map_suffix='.png', split=None, data_root=None, test_mode=False, ignore_index=255, reduce_zero_label=False, classes=None, palette=None, gt_seg_map_loader_cfg=None, file_client_args={'backend': 'disk'})[source]

Custom dataset for semantic segmentation. An example of file structure is as follows.

├── data
│   ├── my_dataset
│   │   ├── img_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{img_suffix}
│   │   │   │   ├── yyy{img_suffix}
│   │   │   │   ├── zzz{img_suffix}
│   │   │   ├── val
│   │   ├── ann_dir
│   │   │   ├── train
│   │   │   │   ├── xxx{seg_map_suffix}
│   │   │   │   ├── yyy{seg_map_suffix}
│   │   │   │   ├── zzz{seg_map_suffix}
│   │   │   ├── val

The img/gt_semantic_seg pair of CustomDataset should have the same filename except for the suffix. A valid img/gt_semantic_seg filename pair looks like xxx{img_suffix} and xxx{seg_map_suffix} (the extension is also included in the suffix). If split is given, then xxx is specified in the txt file. Otherwise, all files in img_dir and ann_dir will be loaded. Please refer to docs/en/tutorials/new_dataset.md for more details.
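
The filename pairing rule can be illustrated on plain lists of filenames. This is a minimal sketch under stated assumptions, not the loader itself: the name pair_filenames_sketch is hypothetical, no directory is scanned, and images without a matching annotation are simply skipped.

```python
def pair_filenames_sketch(img_files, ann_files, img_suffix=".jpg", seg_map_suffix=".png"):
    """Pair xxx{img_suffix} with xxx{seg_map_suffix} by their shared stem xxx."""
    ann_set = set(ann_files)
    pairs = []
    for img_name in sorted(img_files):
        if not img_name.endswith(img_suffix):
            continue  # not an image of this dataset
        # Strip the whole suffix (extension included) to recover the stem.
        stem = img_name[: len(img_name) - len(img_suffix)]
        seg_name = stem + seg_map_suffix
        if seg_name in ann_set:
            pairs.append((img_name, seg_name))
    return pairs


imgs = ["b_leftImg8bit.png", "a_leftImg8bit.png", "notes.txt"]
anns = ["a_gtFine_labelTrainIds.png", "b_gtFine_labelTrainIds.png"]
print(pair_filenames_sketch(imgs, anns, "_leftImg8bit.png", "_gtFine_labelTrainIds.png"))
```

Because the suffix includes more than the extension, a pair such as a_leftImg8bit.png and a_gtFine_labelTrainIds.png shares only the stem "a".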

Parameters
  • pipeline (list[dict]) – Processing pipeline

  • img_dir (str) – Path to image directory

  • img_suffix (str) – Suffix of images. Default: ‘.jpg’

  • ann_dir (str, optional) – Path to annotation directory. Default: None

  • seg_map_suffix (str) – Suffix of segmentation maps. Default: ‘.png’

  • split (str, optional) – Split txt file. If split is specified, only file with suffix in the splits will be loaded. Otherwise, all images in img_dir/ann_dir will be loaded. Default: None

  • data_root (str, optional) – Data root for img_dir/ann_dir. Default: None.

  • test_mode (bool) – If test_mode=True, the ground truth will not be loaded.

  • ignore_index (int) – The label index to be ignored. Default: 255

  • reduce_zero_label (bool) – Whether to mark label zero as ignored. Default: False

  • classes (str | Sequence[str], optional) – Specify classes to load. If None, cls.CLASSES will be used. Default: None.

  • palette (Sequence[Sequence[int]] | np.ndarray | None) – The palette of segmentation map. If None is given, and self.PALETTE is None, a random palette will be generated. Default: None.

  • gt_seg_map_loader_cfg (dict, optional) – Config used to build LoadAnnotations to load the ground truth for evaluation; loads from disk by default. Default: None.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

evaluate(results, metric='mIoU', logger=None, gt_seg_maps=None, **kwargs)[source]

Evaluate the dataset.

Parameters
  • results (list[tuple[torch.Tensor]] | list[str]) – per image pre_eval results or predict segmentation map for computing evaluation metric.

  • metric (str | list[str]) – Metrics to be evaluated. ‘mIoU’, ‘mDice’ and ‘mFscore’ are supported.

  • logger (logging.Logger | None | str) – Logger used for printing related information during evaluation. Default: None.

  • gt_seg_maps (generator[ndarray]) – Custom gt seg maps as input, used in ConcatDataset

Returns

Default metrics.

Return type

dict[str, float]

format_results(results, imgfile_prefix, indices=None, **kwargs)[source]

Placeholder to format results into dataset-specific output.

get_ann_info(idx)[source]

Get annotation by index.

Parameters

idx (int) – Index of data.

Returns

Annotation info of specified index.

Return type

dict

get_classes_and_palette(classes=None, palette=None)[source]

Get class names of current dataset.

Parameters
  • classes (Sequence[str] | str | None) – If classes is None, use default CLASSES defined by builtin dataset. If classes is a string, take it as a file name. The file contains the name of classes where each line contains one class name. If classes is a tuple or list, override the CLASSES defined by the dataset.

  • palette (Sequence[Sequence[int]] | np.ndarray | None) – The palette of segmentation map. If None is given, a random palette will be generated. Default: None.

get_gt_seg_map_by_idx(index)[source]

Get one ground truth segmentation map for evaluation.

get_gt_seg_maps(efficient_test=None)[source]

Get ground truth segmentation maps for evaluation.

load_annotations(img_dir, img_suffix, ann_dir, seg_map_suffix, split)[source]

Load annotation from directory.

Parameters
  • img_dir (str) – Path to image directory

  • img_suffix (str) – Suffix of images.

  • ann_dir (str|None) – Path to annotation directory.

  • seg_map_suffix (str|None) – Suffix of segmentation maps.

  • split (str|None) – Split txt file. If split is specified, only file with suffix in the splits will be loaded. Otherwise, all images in img_dir/ann_dir will be loaded. Default: None

Returns

All image info of dataset.

Return type

list[dict]

pre_eval(preds, indices)[source]

Collect eval result from each iteration.

Parameters
  • preds (list[torch.Tensor] | torch.Tensor) – The segmentation logit after argmax, shape (N, H, W).

  • indices (list[int] | int) – The ground truth indices related to the predictions.

Returns

(area_intersect, area_union, area_prediction, area_ground_truth).

Return type

list[torch.Tensor]

pre_pipeline(results)[source]

Prepare results dict for pipeline.

prepare_test_img(idx)[source]

Get testing data after pipeline.

Parameters

idx (int) – Index of data.

Returns

Testing data after pipeline with new keys introduced by pipeline.

Return type

dict

prepare_train_img(idx)[source]

Get training data and annotations after pipeline.

Parameters

idx (int) – Index of data.

Returns

Training data and annotation after pipeline with new keys introduced by pipeline.

Return type

dict

class mmseg.datasets.DRIVEDataset(**kwargs)[source]

DRIVE dataset.

In segmentation map annotation for DRIVE, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_manual1.png’.

class mmseg.datasets.DarkZurichDataset(**kwargs)[source]

DarkZurichDataset dataset.

class mmseg.datasets.HRFDataset(**kwargs)[source]

HRF dataset.

In segmentation map annotation for HRF, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.png’.

class mmseg.datasets.ISPRSDataset(**kwargs)[source]

ISPRS dataset.

In segmentation map annotation for ISPRS, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

class mmseg.datasets.LoveDADataset(**kwargs)[source]

LoveDA dataset.

In segmentation map annotation for LoveDA, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

format_results(results, imgfile_prefix, indices=None)[source]

Format the results into dir (standard format for LoveDA evaluation).

Parameters
  • results (list) – Testing results of the dataset.

  • imgfile_prefix (str) – The prefix of images files. It includes the file path and the prefix of filename, e.g., “a/b/prefix”.

  • indices (list[int], optional) – Indices of input results, if not set, all the indices of the dataset will be used. Default: None.

Returns

(result_files, tmp_dir), where result_files is a list containing the image paths and tmp_dir is the temporary directory created for saving json/png files when imgfile_prefix is not specified.

Return type

tuple

results2img(results, imgfile_prefix, indices=None)[source]

Write the segmentation results to images.

Parameters
  • results (list[ndarray]) – Testing results of the dataset.

  • imgfile_prefix (str) – The filename prefix of the png files. If the prefix is “somepath/xxx”, the png files will be named “somepath/xxx.png”.

  • indices (list[int], optional) – Indices of input results, if not set, all the indices of the dataset will be used. Default: None.

Returns

Result txt files which contain the corresponding semantic segmentation images.

Return type

list[str]

class mmseg.datasets.MultiImageMixDataset(dataset, pipeline, skip_type_keys=None)[source]

A wrapper of multiple images mixed dataset.

Suitable for training on multiple images mixed data augmentation like mosaic and mixup. For the augmentation pipeline of mixed image data, the get_indexes method needs to be provided to obtain the image indexes, and you can set skip_flags to change the pipeline running process.

Parameters
  • dataset (CustomDataset) – The dataset to be mixed.

  • pipeline (Sequence[dict]) – Sequence of transform object or config dict to be composed.

  • skip_type_keys (list[str], optional) – Sequence of type strings of the pipeline steps to skip. Default: None.

update_skip_type_keys(skip_type_keys)[source]

Update skip_type_keys.

It is called by an external hook.

Parameters

skip_type_keys (list[str], optional) – Sequence of type strings of the pipeline steps to skip.

class mmseg.datasets.NightDrivingDataset(**kwargs)[source]

NightDrivingDataset dataset.

class mmseg.datasets.PascalContextDataset(split, **kwargs)[source]

PascalContext dataset.

In segmentation map annotation for PascalContext, 0 stands for background, which is included in 60 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

Parameters

split (str) – Split txt file for PascalContext.

class mmseg.datasets.PascalContextDataset59(split, **kwargs)[source]

PascalContext dataset.

In segmentation map annotation for PascalContext, 0 stands for background, which is included in 60 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.jpg’ and seg_map_suffix is fixed to ‘.png’.

Parameters

split (str) – Split txt file for PascalContext.

class mmseg.datasets.PascalVOCDataset(split, **kwargs)[source]

Pascal VOC dataset.

Parameters

split (str) – Split txt file for Pascal VOC.

class mmseg.datasets.PotsdamDataset(**kwargs)[source]

ISPRS Potsdam dataset.

In segmentation map annotation for Potsdam dataset, 0 is the ignore index. reduce_zero_label should be set to True. The img_suffix and seg_map_suffix are both fixed to ‘.png’.

class mmseg.datasets.RepeatDataset(dataset, times)[source]

A wrapper of repeated dataset.

The length of repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.

Parameters
  • dataset (Dataset) – The dataset to be repeated.

  • times (int) – Repeat times.
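
A repeated-dataset wrapper of this kind can be sketched with modular indexing. This is an illustration of the idea only: the class name RepeatSketch is hypothetical, and a plain list stands in for a real dataset.

```python
class RepeatSketch:
    """Present a dataset as `times` back-to-back copies without duplicating data."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __getitem__(self, idx):
        # Wrap the index back into the range of the original dataset.
        return self.dataset[idx % self._ori_len]

    def __len__(self):
        return self.times * self._ori_len


repeated = RepeatSketch(["a", "b", "c"], times=3)
print(len(repeated), repeated[4])
```

One epoch over the wrapper then covers the underlying data `times` times, so per-epoch overhead is paid less often.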

class mmseg.datasets.STAREDataset(**kwargs)[source]

STARE dataset.

In segmentation map annotation for STARE, 0 stands for background, which is included in 2 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘.ah.png’.

mmseg.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=False, pin_memory=True, persistent_workers=True, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • samples_per_gpu (int) – Number of training samples on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • seed (int | None) – Seed to be used. Default: None.

  • drop_last (bool) – Whether to drop the last incomplete batch in epoch. Default: False

  • pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. The argument only takes effect in PyTorch>=1.7.0. Default: True.

  • kwargs – any keyword argument to be used to initialize DataLoader

Returns

A PyTorch dataloader.

Return type

DataLoader
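
The relation between the per-GPU arguments and the loader that is actually built can be sketched as below. This is a sketch under an assumption, not the library's code: loader_sizes is a hypothetical helper, and it assumes that the single non-distributed loader scales the per-GPU values by num_gpus, in line with the note that num_gpus is only used in non-distributed training.

```python
def loader_sizes(samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True):
    """Return (batch_size, num_workers) for one dataloader (illustrative)."""
    if dist:
        # One dataloader per process/GPU: the per-GPU values apply directly.
        return samples_per_gpu, workers_per_gpu
    # A single dataloader feeds all GPUs, so the values scale with num_gpus
    # (assumed behaviour).
    return num_gpus * samples_per_gpu, num_gpus * workers_per_gpu


print(loader_sizes(4, 2, num_gpus=2, dist=True))
print(loader_sizes(4, 2, num_gpus=2, dist=False))
```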

mmseg.datasets.build_dataset(cfg, default_args=None)[source]

Build datasets.

class mmseg.datasets.iSAIDDataset(**kwargs)[source]

iSAID: A Large-scale Dataset for Instance Segmentation in Aerial Images.

The segmentation map annotation for the iSAID dataset covers 16 categories. reduce_zero_label is fixed to False. The img_suffix is fixed to ‘.png’ and seg_map_suffix is fixed to ‘_manual1.png’.

load_annotations(img_dir, img_suffix, ann_dir, seg_map_suffix=None, split=None)[source]

Load annotation from directory.

Parameters
  • img_dir (str) – Path to image directory

  • img_suffix (str) – Suffix of images.

  • ann_dir (str|None) – Path to annotation directory.

  • seg_map_suffix (str|None) – Suffix of segmentation maps.

  • split (str|None) – Split txt file. If split is specified, only file with suffix in the splits will be loaded. Otherwise, all images in img_dir/ann_dir will be loaded. Default: None

Returns

All image info of dataset.

Return type

list[dict]

pipelines

class mmseg.datasets.pipelines.AdjustGamma(gamma=1.0)[source]

Using gamma correction to process the image.

Parameters

gamma (float or int) – Gamma value used in gamma correction. Default: 1.0.
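
Gamma correction on 8-bit images is commonly done with a 256-entry lookup table. The sketch below shows that idea in plain Python. It is an assumption-laden illustration rather than the transform's actual code: gamma_table and adjust_gamma are hypothetical helpers, values are rounded to the nearest integer, and the exponent is taken to be 1 / gamma.

```python
def gamma_table(gamma=1.0):
    """Build a 256-entry lookup table for 8-bit gamma correction."""
    inv_gamma = 1.0 / gamma
    # Each input level i maps to 255 * (i / 255) ** (1 / gamma), rounded.
    return [int(round((i / 255.0) ** inv_gamma * 255)) for i in range(256)]


def adjust_gamma(pixels, gamma=1.0):
    """Apply the table to a flat list of 8-bit pixel values."""
    table = gamma_table(gamma)
    return [table[p] for p in pixels]


print(adjust_gamma([0, 64, 128, 255], gamma=2.0))
```

With gamma=1.0 the table is the identity, which matches the default leaving the image unchanged.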

class mmseg.datasets.pipelines.CLAHE(clip_limit=40.0, tile_grid_size=(8, 8))[source]

Use CLAHE method to process the image.

See Zuiderveld, K. Contrast Limited Adaptive Histogram Equalization. Graphics Gems, 1994: 474-485, for more information.

Parameters
  • clip_limit (float) – Threshold for contrast limiting. Default: 40.0.

  • tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).

class mmseg.datasets.pipelines.Collect(keys, meta_keys=('filename', 'ori_filename', 'ori_shape', 'img_shape', 'pad_shape', 'scale_factor', 'flip', 'flip_direction', 'img_norm_cfg'))[source]

Collect data from the loader relevant to the specific task.

This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img”, “gt_semantic_seg”.

The “img_meta” item is always populated. The contents of the “img_meta” dictionary depend on “meta_keys”. By default this includes:

  • “img_shape”: shape of the image input to the network as a tuple (h, w, c). Note that images may be zero padded on the bottom/right if the batch tensor is larger than this shape.

  • “scale_factor”: a float indicating the preprocessing scale

  • “flip”: a boolean indicating if image flip transform was used

  • “filename”: path to the image file

  • “ori_shape”: original shape of the image as a tuple (h, w, c)

  • “pad_shape”: image shape after padding

  • “img_norm_cfg”: a dict of normalization information:
    • mean - per channel mean subtraction

    • std - per channel std divisor

    • to_rgb - bool indicating if bgr was converted to rgb

Parameters
  • keys (Sequence[str]) – Keys of results to be collected in data.

  • meta_keys (Sequence[str], optional) – Meta keys to be converted to mmcv.DataContainer and collected in data[img_metas]. Default: (filename, ori_filename, ori_shape, img_shape, pad_shape, scale_factor, flip, flip_direction, img_norm_cfg)
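
The split between keys and meta_keys can be illustrated with plain dictionaries. This is a simplified sketch: collect is a hypothetical function, the sample values are made up for illustration, and a plain dict stands in for the mmcv.DataContainer that wraps the metadata in the real pipeline.

```python
def collect(results, keys, meta_keys):
    """Keep `keys` as data and gather `meta_keys` into one metadata dict."""
    data = {key: results[key] for key in keys}
    # The real pipeline wraps this in a DataContainer; a dict stands in here.
    data["img_metas"] = {key: results[key] for key in meta_keys}
    return data


results = {
    "img": "<image tensor>",
    "gt_semantic_seg": "<label tensor>",
    "filename": "demo.png",
    "ori_shape": (512, 1024, 3),
    "flip": False,
    "scratch": "not requested, so dropped",
}
out = collect(results, keys=["img"], meta_keys=("filename", "ori_shape", "flip"))
print(sorted(out))
```

Anything not named in either sequence is left behind, which is why this step usually comes last.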

class mmseg.datasets.pipelines.Compose(transforms)[source]

Compose multiple transforms sequentially.

Parameters

transforms (Sequence[dict | callable]) – Sequence of transform objects or config dicts to be composed.
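
Sequential composition amounts to feeding each transform's output into the next one. The sketch below covers only the callable case; compose is a hypothetical helper, and the early exit on None is an assumed convention for transforms that reject a sample.

```python
def compose(transforms):
    """Chain callables; each receives and returns a results dict."""

    def run(data):
        for transform in transforms:
            data = transform(data)
            if data is None:
                # Assumed convention: None means "skip this sample".
                return None
        return data

    return run


pipeline = compose([
    lambda d: {**d, "a": 1},
    lambda d: {**d, "b": d["a"] + 1},
])
print(pipeline({}))
```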

class mmseg.datasets.pipelines.ImageToTensor(keys)[source]

Convert image to torch.Tensor by given keys.

The dimension order of the input image is (H, W, C). The pipeline will convert it to (C, H, W). If only 2 dimensions (H, W) are given, the output will be (1, H, W).

Parameters

keys (Sequence[str]) – Key of images to be converted to Tensor.

class mmseg.datasets.pipelines.LoadAnnotations(reduce_zero_label=False, file_client_args={'backend': 'disk'}, imdecode_backend='pillow')[source]

Load annotations for semantic segmentation.

Parameters
  • reduce_zero_label (bool) – Whether to reduce all label values by 1. Usually used for datasets where 0 is the background label. Default: False.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

  • imdecode_backend (str) – Backend for mmcv.imdecode(). Default: ‘pillow’
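
What reducing the zero label means for a label map can be sketched on nested lists. This is an illustration under an assumption: reduce_zero_label here is a hypothetical free function, and the old label 0 is assumed to map to an ignore index of 255.

```python
def reduce_zero_label(seg_map, ignore_index=255):
    """Shift labels down by 1, sending the old label 0 to ignore_index."""
    out = []
    for row in seg_map:
        new_row = []
        for label in row:
            if label == 0 or label == ignore_index:
                # Background (0) and already-ignored pixels stay ignored.
                new_row.append(ignore_index)
            else:
                new_row.append(label - 1)
        out.append(new_row)
    return out


print(reduce_zero_label([[0, 1, 2], [3, 255, 1]]))
```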

class mmseg.datasets.pipelines.LoadImageFromFile(to_float32=False, color_type='color', file_client_args={'backend': 'disk'}, imdecode_backend='cv2')[source]

Load an image from file.

Required keys are “img_prefix” and “img_info” (a dict that must contain the key “filename”). Added or updated keys are “filename”, “img”, “img_shape”, “ori_shape” (same as img_shape), “pad_shape” (same as img_shape), “scale_factor” (1.0) and “img_norm_cfg” (means=0 and stds=1).

Parameters
  • to_float32 (bool) – Whether to convert the loaded image to a float32 numpy array. If set to False, the loaded image is a uint8 array. Defaults to False.

  • color_type (str) – The flag argument for mmcv.imfrombytes(). Defaults to ‘color’.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Defaults to dict(backend='disk').

  • imdecode_backend (str) – Backend for mmcv.imdecode(). Default: ‘cv2’

class mmseg.datasets.pipelines.MultiScaleFlipAug(transforms, img_scale, img_ratios=None, flip=False, flip_direction='horizontal')[source]

Test-time augmentation with multiple scales and flipping.

An example configuration is as follows:

img_scale=(2048, 1024),
img_ratios=[0.5, 1.0],
flip=True,
transforms=[
    dict(type='Resize', keep_ratio=True),
    dict(type='RandomFlip'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='ImageToTensor', keys=['img']),
    dict(type='Collect', keys=['img']),
]

After MultiScaleFlipAug with the above configuration, the results are wrapped into lists of the same length as follows:

dict(
    img=[...],
    img_shape=[...],
    scale=[(1024, 512), (1024, 512), (2048, 1024), (2048, 1024)],
    flip=[False, True, False, True],
    ...
)

Parameters
  • transforms (list[dict]) – Transforms to apply in each augmentation.

  • img_scale (None | tuple | list[tuple]) – Image scales for resizing.

  • img_ratios (float | list[float]) – Image ratios for resizing.

  • flip (bool) – Whether to apply flip augmentation. Default: False.

  • flip_direction (str | list[str]) – Flip augmentation directions, options are “horizontal” and “vertical”. If flip_direction is list, multiple flip augmentations will be applied. It has no effect when flip == False. Default: “horizontal”.
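
The scale and flip lists in the example output above follow from pairing every scaled size with every flip setting. The sketch below reproduces just that enumeration. tta_plan is a hypothetical helper, and it handles only a single img_scale tuple with a list of ratios and one flip direction.

```python
def tta_plan(img_scale, img_ratios, flip):
    """List the (scale, flip) pair of each test-time augmentation."""
    w, h = img_scale
    scales = [(int(w * r), int(h * r)) for r in img_ratios]
    flips = [False, True] if flip else [False]
    # Every scale is paired with every flip setting, scale-major.
    return [(s, f) for s in scales for f in flips]


plan = tta_plan((2048, 1024), [0.5, 1.0], flip=True)
print([s for s, _ in plan])
print([f for _, f in plan])
```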

class mmseg.datasets.pipelines.Normalize(mean, std, to_rgb=True)[source]

Normalize the image.

Added key is “img_norm_cfg”.

Parameters
  • mean (sequence) – Mean values of 3 channels.

  • std (sequence) – Std values of 3 channels.

  • to_rgb (bool) – Whether to convert the image from BGR to RGB. Default: True.
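
Per-channel normalization subtracts the mean and divides by the std for each channel, optionally after reversing the channel order. The sketch below works on a single pixel to keep it dependency-free. normalize_pixel is a hypothetical helper, and the mean and std values are only illustrative numbers in RGB order.

```python
def normalize_pixel(pixel, mean, std, to_rgb=True):
    """Normalize one 3-channel pixel given in BGR order."""
    if to_rgb:
        # Reverse the channel order before applying the statistics.
        pixel = pixel[::-1]
    return tuple((v - m) / s for v, m, s in zip(pixel, mean, std))


mean = (123.675, 116.28, 103.53)  # illustrative values, RGB order
std = (58.395, 57.12, 57.375)     # illustrative values, RGB order
print(normalize_pixel((103.53, 116.28, 123.675), mean, std))
```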

class mmseg.datasets.pipelines.Pad(size=None, size_divisor=None, pad_val=0, seg_pad_val=255)[source]

Pad the image & mask.

There are two padding modes: (1) pad to a fixed size and (2) pad to the minimum size that is divisible by some number. Added keys are “pad_shape”, “pad_fixed_size” and “pad_size_divisor”.

Parameters
  • size (tuple, optional) – Fixed padding size.

  • size_divisor (int, optional) – The divisor of padded size.

  • pad_val (float, optional) – Padding value. Default: 0.

  • seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.
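
The two padding modes differ only in how the target shape is chosen. A minimal sketch of that choice, with padded_shape as a hypothetical helper that works on (h, w) alone:

```python
import math


def padded_shape(h, w, size=None, size_divisor=None):
    """Return the (h, w) an image would have after padding."""
    if size is not None:
        # Mode 1: pad to a fixed size.
        return tuple(size)
    if size_divisor is not None:
        # Mode 2: round each side up to the next multiple of the divisor.
        pad_h = math.ceil(h / size_divisor) * size_divisor
        pad_w = math.ceil(w / size_divisor) * size_divisor
        return pad_h, pad_w
    return h, w


print(padded_shape(500, 750, size_divisor=32))
print(padded_shape(500, 750, size=(512, 1024)))
```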

class mmseg.datasets.pipelines.PhotoMetricDistortion(brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18)[source]

Apply photometric distortion to the image sequentially. Every transformation is applied with a probability of 0.5. Random contrast is applied either second or second to last.

  1. random brightness

  2. random contrast (mode 0)

  3. convert color from BGR to HSV

  4. random saturation

  5. random hue

  6. convert color from HSV to BGR

  7. random contrast (mode 1)

Parameters
  • brightness_delta (int) – delta of brightness.

  • contrast_range (tuple) – range of contrast.

  • saturation_range (tuple) – range of saturation.

  • hue_delta (int) – delta of hue.

brightness(img)[source]

Brightness distortion.

contrast(img)[source]

Contrast distortion.

convert(img, alpha=1, beta=0)[source]

Multiply by alpha and add beta, with clipping.
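
The scale, shift and clip operation can be sketched on a flat list of pixel values. This is a simplified stand-in: the convert function below is hypothetical and assumes an 8-bit range of [0, 255] for the clip.

```python
def convert(pixels, alpha=1, beta=0):
    """Scale by alpha, shift by beta, then clip to the 8-bit range."""
    out = []
    for p in pixels:
        value = p * alpha + beta
        # Clip to [0, 255] (assumed 8-bit image) and drop the fraction.
        out.append(int(min(max(value, 0), 255)))
    return out


print(convert([0, 100, 200], alpha=1.5, beta=-20))
```

Brightness corresponds to varying beta with alpha fixed at 1, and contrast to varying alpha with beta fixed at 0.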

hue(img)[source]

Hue distortion.

saturation(img)[source]

Saturation distortion.

class mmseg.datasets.pipelines.RGB2Gray(out_channels=None, weights=(0.299, 0.587, 0.114))[source]

Convert RGB image to grayscale image.

This transform calculates the weighted mean of the input image channels with weights and then expands the channels to out_channels. When out_channels is None, the number of output channels is the same as the number of input channels.

Parameters
  • out_channels (int) – Expected number of output channels after transforming. Default: None.

  • weights (tuple[float]) – The weights to calculate the weighted mean. Default: (0.299, 0.587, 0.114).
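
For a single pixel, the weighted mean and channel expansion look as follows. rgb_to_gray is a hypothetical helper written for illustration; it is not the transform itself.

```python
def rgb_to_gray(pixel, out_channels=None, weights=(0.299, 0.587, 0.114)):
    """Weighted mean of the channels, repeated out_channels times."""
    gray = sum(v * w for v, w in zip(pixel, weights))
    # With out_channels=None the channel count of the input is kept.
    n = len(pixel) if out_channels is None else out_channels
    return (gray,) * n


print(rgb_to_gray((255, 0, 0)))
print(rgb_to_gray((255, 0, 0), out_channels=1))
```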

class mmseg.datasets.pipelines.RandomCrop(crop_size, cat_max_ratio=1.0, ignore_index=255)[source]

Randomly crop the image & seg.

Parameters
  • crop_size (tuple) – Expected size after cropping, (h, w).

  • cat_max_ratio (float) – The maximum ratio that a single category could occupy.

crop(img, crop_bbox)[source]

Crop from img.

get_crop_bbox(img)[source]

Randomly get a crop bounding box.
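
One way to pick a random crop box that always stays inside the image is to sample the top-left offset from the available margin. The sketch below assumes that approach and takes the image height and width directly; this get_crop_bbox is a hypothetical free function, and the (y1, y2, x1, x2) return order is an assumption.

```python
import random


def get_crop_bbox(img_h, img_w, crop_size):
    """Return (y1, y2, x1, x2) of a random crop inside the image."""
    crop_h, crop_w = crop_size
    # Room left over once the crop is placed; zero if the crop is larger.
    margin_h = max(img_h - crop_h, 0)
    margin_w = max(img_w - crop_w, 0)
    offset_h = random.randint(0, margin_h)  # inclusive on both ends
    offset_w = random.randint(0, margin_w)
    return offset_h, offset_h + crop_h, offset_w, offset_w + crop_w


print(get_crop_bbox(512, 1024, (256, 256)))
```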

class mmseg.datasets.pipelines.RandomCutOut(prob, n_holes, cutout_shape=None, cutout_ratio=None, fill_in=(0, 0, 0), seg_fill_in=None)[source]

CutOut operation.

Randomly drop some regions of the image, as used in Cutout.

Parameters
  • prob (float) – Cutout probability.

  • n_holes – Number of regions to be dropped. If it is given as a list, the number of holes will be randomly selected from the closed interval [n_holes[0], n_holes[1]].

  • cutout_shape (tuple[int, int] | list[tuple[int, int]]) – The candidate shape of dropped regions. It can be tuple[int, int] to use a fixed cutout shape, or list[tuple[int, int]] to randomly choose shape from the list.

  • cutout_ratio (tuple[float, float] | list[tuple[float, float]]) – The candidate ratio of dropped regions. It can be tuple[float, float] to use a fixed ratio or list[tuple[float, float]] to randomly choose ratio from the list. Please note that cutout_shape and cutout_ratio cannot be both given at the same time.

  • fill_in (tuple[float, float, float] | tuple[int, int, int]) – The value of pixel to fill in the dropped regions. Default: (0, 0, 0).

  • seg_fill_in (int) – The labels of pixel to fill in the dropped regions. If seg_fill_in is None, skip. Default: None.

class mmseg.datasets.pipelines.RandomFlip(prob=None, direction='horizontal')[source]

Flip the image & seg.

If the input dict contains the key “flip”, then the flag will be used, otherwise it will be randomly decided by a ratio specified in the init method.

Parameters
  • prob (float, optional) – The flipping probability. Default: None.

  • direction (str, optional) – The flipping direction. Options are ‘horizontal’ and ‘vertical’. Default: ‘horizontal’.

class mmseg.datasets.pipelines.RandomMosaic(prob, img_scale=(640, 640), center_ratio_range=(0.5, 1.5), pad_val=0, seg_pad_val=255)[source]

Mosaic augmentation. Given 4 images, the mosaic transform combines them into one output image. The output image is composed of parts from each sub-image.

                   mosaic transform
                      center_x
           +------------------------------+
           |       pad        |  pad      |
           |      +-----------+           |
           |      |           |           |
           |      |  image1   |--------+  |
           |      |           |        |  |
           |      |           | image2 |  |
center_y   |----+-------------+-----------|
           |    |   cropped   |           |
           |pad |   image3    |  image4   |
           |    |             |           |
           +----|-------------+-----------+
                |             |
                +-------------+

The mosaic transform steps are as follows:
    1. Choose the mosaic center as the intersections of 4 images
    2. Get the left top image according to the index, and randomly
       sample another 3 images from the custom dataset.
    3. Sub image will be cropped if image is larger than mosaic patch

Parameters
  • prob (float) – mosaic probability.

  • img_scale (Sequence[int]) – Image size after mosaic pipeline of a single image. The size of the output image is four times that of a single image. The output image comprises 4 single images. Default: (640, 640).

  • center_ratio_range (Sequence[float]) – Center ratio range of mosaic output. Default: (0.5, 1.5).

  • pad_val (int) – Pad value. Default: 0.

  • seg_pad_val (int) – Pad value of segmentation map. Default: 255.
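
The relation between img_scale, the output canvas and the mosaic center can be sketched numerically. The helper below is hypothetical and rests on two assumptions: img_scale is read as (h, w), and the center is drawn by scaling each side with a ratio sampled uniformly from center_ratio_range.

```python
import random


def mosaic_canvas_and_center(img_scale=(640, 640), center_ratio_range=(0.5, 1.5)):
    """Return the canvas (h, w) and a randomly placed mosaic center (x, y)."""
    h, w = img_scale  # assumed (h, w) order
    # Twice the height and twice the width: four times the area.
    canvas = (2 * h, 2 * w)
    center_x = int(random.uniform(*center_ratio_range) * w)
    center_y = int(random.uniform(*center_ratio_range) * h)
    return canvas, (center_x, center_y)


print(mosaic_canvas_and_center())
```

With the default range, the center stays within the middle half of the canvas along each axis.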

get_indexes(dataset)[source]

Call function to collect indexes.

Parameters

dataset (MultiImageMixDataset) – The dataset.

Returns

indexes.

Return type

list

class mmseg.datasets.pipelines.RandomRotate(prob, degree, pad_val=0, seg_pad_val=255, center=None, auto_bound=False)[source]

Rotate the image & seg.

Parameters
  • prob (float) – The rotation probability.

  • degree (float, tuple[float]) – Range of degrees to select from. If degree is a number instead of tuple like (min, max), the range of degree will be (-degree, +degree)

  • pad_val (float, optional) – Padding value of image. Default: 0.

  • seg_pad_val (float, optional) – Padding value of segmentation map. Default: 255.

  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used. Default: None.

  • auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image. Default: False

class mmseg.datasets.pipelines.Rerange(min_value=0, max_value=255)[source]

Rerange the image pixel value.

Parameters
  • min_value (float or int) – Minimum value of the reranged image. Default: 0.

  • max_value (float or int) – Maximum value of the reranged image. Default: 255.

class mmseg.datasets.pipelines.Resize(img_scale=None, multiscale_mode='range', ratio_range=None, keep_ratio=True, min_size=None)[source]

Resize images & seg.

This transform resizes the input image to some scale. If the input dict contains the key “scale”, then the scale in the input dict is used, otherwise the specified scale in the init method is used.

img_scale can be None, a tuple (single-scale) or a list of tuple (multi-scale). There are 4 multiscale modes:

  • ratio_range is not None:

    1. When img_scale is None, img_scale is the shape of the image in results (img_scale = results['img'].shape[:2]) and the image is resized based on the original size. (mode 1)

    2. When img_scale is a tuple (single-scale), randomly sample a ratio from the ratio range and multiply it with the image scale. (mode 2)

  • ratio_range is None and multiscale_mode == "range": randomly sample a scale from a range. (mode 3)

  • ratio_range is None and multiscale_mode == "value": randomly sample a scale from multiple scales. (mode 4)

Parameters
  • img_scale (tuple or list[tuple]) – Image scales for resizing. Default: None.

  • multiscale_mode (str) – Either “range” or “value”. Default: ‘range’

  • ratio_range (tuple[float]) – (min_ratio, max_ratio). Default: None

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Default: True

  • min_size (int, optional) – The minimum size for the input; the shape of the image and seg map will not be less than min_size. Some models, such as ‘SETR’ and ‘BEiT’, have a fixed input shape. Following the setting in these models, resized images must be bigger than the crop size in slide_inference. Default: None.

static random_sample(img_scales)[source]

Randomly sample an img_scale when multiscale_mode=='range'.

Parameters

img_scales (list[tuple]) – Image scale range for sampling. There must be two tuples in img_scales, which specify the lower and upper bounds of image scales.

Returns

A tuple (img_scale, None), where img_scale is the sampled scale and None is just a placeholder to be consistent with random_select().

Return type

(tuple, None)
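
Sampling a scale between two bounds can be sketched as below. This version is a hypothetical free function rather than the static method itself, and it assumes that the long and short edges are drawn independently with inclusive bounds.

```python
import random


def random_sample(img_scales):
    """Sample (long_edge, short_edge) between two bounding scales."""
    assert len(img_scales) == 2
    long_edges = [max(s) for s in img_scales]
    short_edges = [min(s) for s in img_scales]
    # Each edge is drawn independently between its lower and upper bound.
    long_edge = random.randint(min(long_edges), max(long_edges))
    short_edge = random.randint(min(short_edges), max(short_edges))
    return (long_edge, short_edge), None


print(random_sample([(1333, 640), (1333, 800)]))
```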

static random_sample_ratio(img_scale, ratio_range)[source]

Randomly sample an img_scale when ratio_range is specified.

A ratio will be randomly sampled from the range specified by ratio_range. Then it would be multiplied with img_scale to generate sampled scale.

Parameters
  • img_scale (tuple) – Base image scale to multiply with the ratio.

  • ratio_range (tuple[float]) – The minimum and maximum ratio to scale the img_scale.

Returns

A tuple (scale, None), where scale is the sampled ratio multiplied with img_scale and None is just a placeholder to be consistent with random_select().

Return type

(tuple, None)
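
The ratio-based variant can be sketched the same way. This is again a hypothetical free function; truncating the scaled sides to integers is an assumption.

```python
import random


def random_sample_ratio(img_scale, ratio_range):
    """Scale img_scale by a ratio drawn uniformly from ratio_range."""
    min_ratio, max_ratio = ratio_range
    assert min_ratio <= max_ratio
    ratio = random.uniform(min_ratio, max_ratio)
    # Both sides share the same ratio, so the aspect ratio is preserved
    # up to integer truncation.
    scale = (int(img_scale[0] * ratio), int(img_scale[1] * ratio))
    return scale, None


print(random_sample_ratio((2048, 1024), (0.5, 2.0)))
```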

static random_select(img_scales)[source]

Randomly select an img_scale from given candidates.

Parameters

img_scales (list[tuple]) – Image scales for selection.

Returns

A tuple (img_scale, scale_idx), where img_scale is the selected image scale and scale_idx is the selected index in the given candidates.

Return type

(tuple, int)

class mmseg.datasets.pipelines.SegRescale(scale_factor=1)[source]

Rescale semantic segmentation maps.

Parameters

scale_factor (float) – The scale factor of the final output.

class mmseg.datasets.pipelines.ToDataContainer(fields=({'key': 'img', 'stack': True}, {'key': 'gt_semantic_seg'}))[source]

Convert results to mmcv.DataContainer by given fields.

Parameters

fields (Sequence[dict]) – Each field is a dict like dict(key='xxx', **kwargs). The key in result will be converted to mmcv.DataContainer with **kwargs. Default: (dict(key='img', stack=True), dict(key='gt_semantic_seg')).

class mmseg.datasets.pipelines.ToTensor(keys)[source]

Convert some results to torch.Tensor by given keys.

Parameters

keys (Sequence[str]) – Keys that need to be converted to Tensor.

class mmseg.datasets.pipelines.Transpose(keys, order)[source]

Transpose some results by given keys.

Parameters
  • keys (Sequence[str]) – Keys of results to be transposed.

  • order (Sequence[int]) – Order of transpose.

mmseg.datasets.pipelines.to_tensor(data)[source]

Convert objects of various python types to torch.Tensor.

Supported types are: numpy.ndarray, torch.Tensor, Sequence, int and float.

Parameters

data (torch.Tensor | numpy.ndarray | Sequence | int | float) – Data to be converted.

mmseg.models

segmentors

class mmseg.models.segmentors.BaseSegmentor(init_cfg=None)[source]

Base class for segmentors.

abstract aug_test(imgs, img_metas, **kwargs)[source]

Placeholder for augmentation test.

abstract encode_decode(img, img_metas)[source]

Placeholder for encoding images with the backbone and decoding into a semantic segmentation map of the same size as the input.

abstract extract_feat(imgs)[source]

Placeholder for extracting features from images.

forward(img, img_metas, return_loss=True, **kwargs)[source]

Calls either forward_train() or forward_test() depending on whether return_loss is True.

Note that this setting changes the expected inputs. When return_loss=True, img and img_metas are single-nested (i.e. Tensor and List[dict]), and when return_loss=False, img and img_metas should be double-nested (i.e. List[Tensor], List[List[dict]]), with the outer list indicating test-time augmentations.
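
The dispatch and the two nesting levels can be illustrated with a toy class. DispatchSketch is hypothetical, strings stand in for tensors, and the stub methods only report what they received.

```python
class DispatchSketch:
    """Toy model of the return_loss dispatch; not a real segmentor."""

    def forward_train(self, img, img_metas, **kwargs):
        # Single-nested inputs: one batch and one list of meta dicts.
        return "train", len(img_metas)

    def forward_test(self, imgs, img_metas, **kwargs):
        # Double-nested inputs: the outer list has one entry per augmentation.
        return "test", len(imgs)

    def forward(self, img, img_metas, return_loss=True, **kwargs):
        if return_loss:
            return self.forward_train(img, img_metas, **kwargs)
        return self.forward_test(img, img_metas, **kwargs)


model = DispatchSketch()
metas = [{"flip": False}, {"flip": False}]           # a batch of two images
print(model.forward("batch", metas))                  # training call
print(model.forward(["aug0", "aug1"], [metas, metas], return_loss=False))
```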

forward_test(imgs, img_metas, **kwargs)[source]

Parameters
  • imgs (List[Tensor]) – the outer list indicates test-time augmentations and inner Tensor should have a shape NxCxHxW, which contains all images in the batch.

  • img_metas (List[List[dict]]) – the outer list indicates test-time augs (multiscale, flip, etc.) and the inner list indicates images in a batch.

abstract forward_train(imgs, img_metas, **kwargs)[source]

Placeholder for the forward function for training.

show_result(img, result, palette=None, win_name='', show=False, wait_time=0, out_file=None, opacity=0.5)[source]

Draw result over img.

Parameters
  • img (str or Tensor) – The image to be displayed.

  • result (Tensor) – The semantic segmentation results to draw over img.

  • palette (list[list[int]] | np.ndarray | None) – The palette of the segmentation map. If None is given, a random palette will be generated. Default: None.

  • win_name (str) – The window name.

  • wait_time (int) – Value of waitKey param. Default: 0.

  • show (bool) – Whether to show the image. Default: False.

  • out_file (str or None) – The filename to write the image. Default: None.

  • opacity (float) – Opacity of the painted segmentation map. Must be in the range (0, 1]. Default: 0.5.

Returns

img, returned only if neither show nor out_file is set.

Return type

Tensor

abstract simple_test(img, img_meta, **kwargs)[source]

Placeholder for single image test.

train_step(data_batch, optimizer, **kwargs)[source]

The iteration step during training.

This method defines an iteration step during training, except for the back propagation and optimizer updating, which are done in an optimizer hook. Note that in some complicated cases or models, the whole process including back propagation and optimizer updating is also defined in this method, such as GAN.

Parameters
  • data_batch (dict) – The output of dataloader.

  • optimizer (torch.optim.Optimizer | dict) – The optimizer of runner is passed to train_step(). This argument is unused and reserved.

Returns

It should contain at least 3 keys: loss, log_vars and num_samples. loss is a tensor for back propagation, which can be a weighted sum of multiple losses. log_vars contains all the variables to be sent to the logger. num_samples indicates the batch size (when the model is DDP, it means the batch size on each GPU), which is used for averaging the logs.

Return type

dict
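
The shape of the returned dict can be sketched with plain floats in place of tensors. summarize_losses is a hypothetical helper, the loss names are made up for illustration, and summing only the entries whose name contains "loss" is an assumed convention.

```python
def summarize_losses(losses, num_samples):
    """Build a train_step-style output dict from named scalar losses."""
    log_vars = dict(losses)
    # Assumed convention: only entries named "...loss..." enter the total;
    # other entries (e.g. accuracies) are logged but not optimized.
    total = sum(v for k, v in losses.items() if "loss" in k)
    log_vars["loss"] = total
    return dict(loss=total, log_vars=log_vars, num_samples=num_samples)


outputs = summarize_losses(
    {"decode.loss_seg": 0.75, "aux.loss_seg": 0.25, "decode.acc_seg": 91.0},
    num_samples=4,
)
print(outputs["loss"], outputs["num_samples"])
```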

val_step(data_batch, optimizer=None, **kwargs)[source]

The iteration step during validation.

This method shares the same signature as train_step(), but is used during val epochs. Note that the evaluation after training epochs is not implemented with this method, but with an evaluation hook.

property with_auxiliary_head

Whether the segmentor has an auxiliary head.

Type

bool

property with_decode_head

Whether the segmentor has a decode head.

Type

bool

property with_neck

Whether the segmentor has a neck.

Type

bool

class mmseg.models.segmentors.CascadeEncoderDecoder(num_stages, backbone, decode_head, neck=None, auxiliary_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

Cascade Encoder Decoder segmentors.

CascadeEncoderDecoder is almost the same as EncoderDecoder, except that the decoders of CascadeEncoderDecoder are cascaded. The output of the previous decode head is the input of the next decode head.

encode_decode(img, img_metas)[source]

Encode images with backbone and decode into a semantic segmentation map of the same size as input.

class mmseg.models.segmentors.EncoderDecoder(backbone, decode_head, neck=None, auxiliary_head=None, train_cfg=None, test_cfg=None, pretrained=None, init_cfg=None)[source]

Encoder Decoder segmentors.

EncoderDecoder typically consists of backbone, decode_head and auxiliary_head. Note that auxiliary_head is only used for deep supervision during training and can be dropped during inference.

aug_test(imgs, img_metas, rescale=True)[source]

Test with augmentations.

Only rescale=True is supported.

encode_decode(img, img_metas)[source]

Encode images with backbone and decode into a semantic segmentation map of the same size as input.

extract_feat(img)[source]

Extract features from images.

forward_dummy(img)[source]

Dummy forward function.

forward_train(img, img_metas, gt_semantic_seg)[source]

Forward function for training.

Parameters
  • img (Tensor) – Input images.

  • img_metas (list[dict]) – List of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:Collect.

  • gt_semantic_seg (Tensor) – Semantic segmentation masks used if the architecture supports semantic segmentation task.

Returns

A dictionary of loss components.

Return type

dict[str, Tensor]

inference(img, img_meta, rescale)[source]

Inference with slide/whole style.

Parameters
  • img (Tensor) – The input image of shape (N, 3, H, W).

  • img_meta (dict) – Image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:Collect.

  • rescale (bool) – Whether to rescale back to the original shape.

Returns

The output segmentation map.

Return type

Tensor

simple_test(img, img_meta, rescale=True)[source]

Simple test with single image.

slide_inference(img, img_meta, rescale)[source]

Inference by sliding-window with overlap.

If h_crop > h_img or w_crop > w_img, the small patch will be used to decode without padding.
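
How overlapping windows can tile an image, including the case where the crop is larger than the image, is sketched below. sliding_windows is a hypothetical helper. The stride argument and the rule that the last window is shifted back to stay inside the image are assumptions about a typical sliding-window scheme; they are not taken from this page.

```python
def sliding_windows(h_img, w_img, crop_size, stride):
    """Return (y1, y2, x1, x2) boxes that cover the image with overlap."""
    h_crop, w_crop = crop_size
    h_stride, w_stride = stride
    # Number of window positions needed along each axis (at least one).
    h_grids = max(h_img - h_crop + h_stride - 1, 0) // h_stride + 1
    w_grids = max(w_img - w_crop + w_stride - 1, 0) // w_stride + 1
    windows = []
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y2 = min(h_idx * h_stride + h_crop, h_img)
            x2 = min(w_idx * w_stride + w_crop, w_img)
            # Shift the window back so it keeps the full crop size when
            # possible; a smaller image yields a smaller patch instead.
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            windows.append((y1, y2, x1, x2))
    return windows


print(sliding_windows(512, 1024, (512, 512), (341, 341)))
print(sliding_windows(300, 300, (512, 512), (341, 341)))
```

The second call shows the small-image case: a single window equal to the whole image, with no padding added.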

whole_inference(img, img_meta, rescale)[source]

Inference with full image.

backbones

class mmseg.models.backbones.BEiT(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=-1, qv_bias=True, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]

BERT Pre-Training of Image Transformers.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 768.

  • num_layers (int) – Depth of transformer. Default: 12.

  • num_heads (int) – Number of attention heads. Default: 12.

  • mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • qv_bias (bool) – Enable bias for qv if True. Default: True.

  • attn_drop_rate (float) – The dropout rate for the attention layer. Default: 0.0.

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='LN').

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type='GELU').

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_values (float) – Initialize the values of BEiTAttention and FFN with learnable scaling.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
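
The default sizes above imply a few derived quantities, worked out in the sketch below. patch_grid is a hypothetical helper; it assumes non-overlapping patches and an image size divisible by the patch size, as is usual for ViT-style patch embedding.

```python
def patch_grid(img_size=224, patch_size=16):
    """Return (rows, cols, num_patches) for non-overlapping patches."""
    if isinstance(img_size, int):
        img_size = (img_size, img_size)
    rows = img_size[0] // patch_size
    cols = img_size[1] // patch_size
    return rows, cols, rows * cols


embed_dims, num_heads, mlp_ratio = 768, 12, 4
print(patch_grid())                # patch grid for the defaults
print(embed_dims // num_heads)     # channels per attention head
print(mlp_ratio * embed_dims)      # hidden width of each FFN
```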

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

resize_rel_pos_embed(checkpoint)[source]

Resize relative pos_embed weights.

This function is modified from https://github.com/microsoft/unilm/blob/master/beit/semantic_segmentation/mmcv_custom/checkpoint.py. Copyright (c) Microsoft Corporation. Licensed under the MIT License.

Parameters

checkpoint (dict) – Key and value of the pretrain model.

Returns

state_dict, with the relative pos_embed weights of the pre-trained model interpolated to the current model size.

Return type

dict

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmseg.models.backbones.BiSeNetV1(backbone_cfg, in_channels=3, spatial_channels=(64, 64, 64, 128), context_channels=(128, 256, 512), out_indices=(0, 1, 2), align_corners=False, out_channels=256, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

BiSeNetV1 backbone.

This backbone is the implementation of BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation.

Parameters
  • backbone_cfg (dict) – Config of the backbone of the Context Path.

  • in_channels (int) – The number of channels of input image. Default: 3.

  • spatial_channels (Tuple[int]) – Size of channel numbers of various layers in Spatial Path. Default: (64, 64, 64, 128).

  • context_channels (Tuple[int]) – Size of channel numbers of various modules in Context Path. Default: (128, 256, 512).

  • out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2).

  • align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.

  • out_channels (int) – The number of channels of output. It must be the same as in_channels of decode_head. Default: 256.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.BiSeNetV2(in_channels=3, detail_channels=(64, 64, 128), semantic_channels=(16, 32, 64, 128), semantic_expansion_ratio=6, bga_channels=128, out_indices=(0, 1, 2, 3, 4), align_corners=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

BiSeNetV2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation.

This backbone is the implementation of BiSeNetV2.

Parameters
  • in_channels (int) – Number of channels of the input image. Default: 3.

  • detail_channels (Tuple[int], optional) – Channels of each stage in Detail Branch. Default: (64, 64, 128).

  • semantic_channels (Tuple[int], optional) – Channels of each stage in Semantic Branch. Default: (16, 32, 64, 128). See Table 1 and Figure 3 of paper for more details.

  • semantic_expansion_ratio (int, optional) – The expansion factor expanding channel number of middle channels in Semantic Branch. Default: 6.

  • bga_channels (int, optional) – Number of middle channels in Bilateral Guided Aggregation Layer. Default: 128.

  • out_indices (Tuple[int] | int, optional) – Output from which stages. Default: (0, 1, 2, 3, 4).

  • align_corners (bool, optional) – The align_corners argument of resize operation in Bilateral Guided Aggregation Layer. Default: False.

  • conv_cfg (dict | None) – Config of conv layers. Default: None.

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type='BN').

  • act_cfg (dict) – Config of activation layers. Default: dict(type='ReLU').

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.CGNet(in_channels=3, num_channels=(32, 64, 128), num_blocks=(3, 21), dilations=(2, 4), reductions=(8, 16), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'PReLU'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

CGNet backbone.

This backbone is the implementation of A Light-weight Context Guided Network for Semantic Segmentation.

Parameters
  • in_channels (int) – Number of input image channels. Normally 3.

  • num_channels (tuple[int]) – Number of feature channels at each stage. Default: (32, 64, 128).

  • num_blocks (tuple[int]) – Number of CG blocks at stage 1 and stage 2. Default: (3, 21).

  • dilations (tuple[int]) – Dilation rate for surrounding context extractors at stage 1 and stage 2. Default: (2, 4).

  • reductions (tuple[int]) – Reductions for global context extractors at stage 1 and stage 2. Default: (8, 16).

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN', requires_grad=True).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type='PReLU').

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – Path to the pretrained model. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.ERFNet(in_channels=3, enc_downsample_channels=(16, 64, 128), enc_stage_non_bottlenecks=(5, 8), enc_non_bottleneck_dilations=(2, 4, 8, 16), enc_non_bottleneck_channels=(64, 128), dec_upsample_channels=(64, 16), dec_stages_non_bottleneck=(2, 2), dec_non_bottleneck_channels=(64, 16), dropout_ratio=0.1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, init_cfg=None)[source]

ERFNet backbone.

This backbone is the implementation of ERFNet: Efficient Residual Factorized ConvNet for Real-time Semantic Segmentation.

Parameters
  • in_channels (int) – The number of channels of the input image. Default: 3.

  • enc_downsample_channels (Tuple[int]) – Number of channels of each Downsampler block in the encoder. Default: (16, 64, 128).

  • enc_stage_non_bottlenecks (Tuple[int]) – Number of stages of Non-bottleneck block in encoder. Default: (5, 8).

  • enc_non_bottleneck_dilations (Tuple[int]) – Dilation rate of each stage of Non-bottleneck block of encoder. Default: (2, 4, 8, 16).

  • enc_non_bottleneck_channels (Tuple[int]) – Number of channels of each Non-bottleneck block in the encoder. Default: (64, 128).

  • dec_upsample_channels (Tuple[int]) – Number of channels of each Deconvolution block in the decoder. Default: (64, 16).

  • dec_stages_non_bottleneck (Tuple[int]) – Number of stages of Non-bottleneck block in decoder. Default: (2, 2).

  • dec_non_bottleneck_channels (Tuple[int]) – Number of channels of each Non-bottleneck block in the decoder. Default: (64, 16).

  • dropout_ratio (float) – Probability of an element being zeroed. Default: 0.1.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.FastSCNN(in_channels=3, downsample_dw_channels=(32, 48), global_in_channels=64, global_block_channels=(64, 96, 128), global_block_strides=(2, 2, 1), global_out_channels=128, higher_in_channels=64, lower_in_channels=128, fusion_out_channels=128, out_indices=(0, 1, 2), conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, dw_act_cfg=None, init_cfg=None)[source]

Fast-SCNN Backbone.

This backbone is the implementation of Fast-SCNN: Fast Semantic Segmentation Network.

Parameters
  • in_channels (int) – Number of input image channels. Default: 3.

  • downsample_dw_channels (tuple[int]) – Number of output channels after the first conv layer & the second conv layer in Learning-To-Downsample (LTD) module. Default: (32, 48).

  • global_in_channels (int) – Number of input channels of Global Feature Extractor(GFE). Equal to number of output channels of LTD. Default: 64.

  • global_block_channels (tuple[int]) – Tuple of integers that describe the output channels for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (64, 96, 128).

  • global_block_strides (tuple[int]) – Tuple of integers that describe the strides (downsampling factors) for each of the MobileNet-v2 bottleneck residual blocks in GFE. Default: (2, 2, 1).

  • global_out_channels (int) – Number of output channels of GFE. Default: 128.

  • higher_in_channels (int) – Number of input channels of the higher resolution branch in FFM. Equal to global_in_channels. Default: 64.

  • lower_in_channels (int) – Number of input channels of the lower resolution branch in FFM. Equal to global_out_channels. Default: 128.

  • fusion_out_channels (int) – Number of output channels of FFM. Default: 128.

  • out_indices (tuple) – Tuple of indices of list [higher_res_features, lower_res_features, fusion_output]. Often set to (0,1,2) to enable aux. heads. Default: (0, 1, 2).

  • conv_cfg (dict | None) – Config of conv layers. Default: None.

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type='BN').

  • act_cfg (dict) – Config of activation layers. Default: dict(type='ReLU').

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • dw_act_cfg (dict) – In DepthwiseSeparableConvModule, activation config of depthwise ConvModule. If it is 'default', it will be the same as act_cfg. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
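
Several FastSCNN parameters are documented as being equal to one another (the FFM branches take the LTD and GFE outputs). The sketch below checks those relations on a plain config dict before it is handed to a model builder. `check_fast_scnn_channels` is a hypothetical helper, not part of mmseg, and the length check on the two `global_block_*` tuples is an assumption inferred from the "for each of the ... blocks" wording.

```python
def check_fast_scnn_channels(cfg):
    """Return a list of violated channel constraints (empty if none).

    Hypothetical helper, not part of mmseg. It only encodes the relations
    stated in the parameter descriptions.
    """
    problems = []
    # The higher-resolution FFM branch takes the LTD output, whose width is
    # documented as equal to global_in_channels.
    if cfg['higher_in_channels'] != cfg['global_in_channels']:
        problems.append('higher_in_channels must equal global_in_channels')
    # The lower-resolution FFM branch takes the GFE output.
    if cfg['lower_in_channels'] != cfg['global_out_channels']:
        problems.append('lower_in_channels must equal global_out_channels')
    # Assumption: one stride per group of bottleneck blocks in GFE.
    if len(cfg['global_block_channels']) != len(cfg['global_block_strides']):
        problems.append('global_block_channels and global_block_strides '
                        'must have the same length')
    return problems


# Defaults copied from the class signature.
default_cfg = dict(
    global_in_channels=64,
    global_block_channels=(64, 96, 128),
    global_block_strides=(2, 2, 1),
    global_out_channels=128,
    higher_in_channels=64,
    lower_in_channels=128,
)

print(check_fast_scnn_channels(default_cfg))  # []
```

Running the check on a modified copy, for example `dict(default_cfg, lower_in_channels=64)`, reports the mismatch instead of failing later inside the network.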

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.HRNet(extra, in_channels=3, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, with_cp=False, frozen_stages=-1, zero_init_residual=False, multiscale_output=True, pretrained=None, init_cfg=None)[source]

HRNet backbone.

This backbone is the implementation of High-Resolution Representations for Labeling Pixels and Regions.

Parameters
  • extra (dict) –

    Detailed configuration for each stage of HRNet. There must be 4 stages, and the configuration for each stage must have 5 keys:

    • num_modules (int): The number of HRModule in this stage.

    • num_branches (int): The number of branches in the HRModule.

    • block (str): The type of convolution block.

    • num_blocks (tuple): The number of blocks in each branch.

      The length must be equal to num_branches.

    • num_channels (tuple): The number of channels in each branch.

      The length must be equal to num_branches.

  • in_channels (int) – Number of input image channels. Normally 3.

  • conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Use BN by default.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: False.

  • multiscale_output (bool) – Whether to output multi-level features produced by multiple branches. If False, only the first level feature will be output. Default: True.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
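
The rules for `extra` (4 stages, 5 keys per stage, tuple lengths equal to `num_branches`) are easy to get wrong when editing a config by hand. The following is a minimal sketch of a pre-flight check in plain Python. `validate_hrnet_extra` is a hypothetical helper, not an mmseg function, and the stage key names `stage1` to `stage4` are taken from the example in this section.

```python
REQUIRED_KEYS = ('num_modules', 'num_branches', 'block',
                 'num_blocks', 'num_channels')


def validate_hrnet_extra(extra):
    """Raise ValueError if `extra` breaks the documented HRNet rules.

    Hypothetical helper for illustration only.
    """
    for name in ('stage1', 'stage2', 'stage3', 'stage4'):
        if name not in extra:
            raise ValueError(f'missing configuration for {name}')
        stage = extra[name]
        missing = [key for key in REQUIRED_KEYS if key not in stage]
        if missing:
            raise ValueError(f'{name} is missing keys: {missing}')
        # Both tuples need one entry per branch.
        for key in ('num_blocks', 'num_channels'):
            if len(stage[key]) != stage['num_branches']:
                raise ValueError(
                    f'{name}: len({key}) must equal num_branches')


extra = dict(
    stage1=dict(num_modules=1, num_branches=1, block='BOTTLENECK',
                num_blocks=(4, ), num_channels=(64, )),
    stage2=dict(num_modules=1, num_branches=2, block='BASIC',
                num_blocks=(4, 4), num_channels=(32, 64)),
    stage3=dict(num_modules=4, num_branches=3, block='BASIC',
                num_blocks=(4, 4, 4), num_channels=(32, 64, 128)),
    stage4=dict(num_modules=3, num_branches=4, block='BASIC',
                num_blocks=(4, 4, 4, 4), num_channels=(32, 64, 128, 256)),
)

validate_hrnet_extra(extra)  # passes silently for a well-formed config
```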

Example

>>> from mmseg.models import HRNet
>>> import torch
>>> extra = dict(
...     stage1=dict(
...         num_modules=1,
...         num_branches=1,
...         block='BOTTLENECK',
...         num_blocks=(4, ),
...         num_channels=(64, )),
...     stage2=dict(
...         num_modules=1,
...         num_branches=2,
...         block='BASIC',
...         num_blocks=(4, 4),
...         num_channels=(32, 64)),
...     stage3=dict(
...         num_modules=4,
...         num_branches=3,
...         block='BASIC',
...         num_blocks=(4, 4, 4),
...         num_channels=(32, 64, 128)),
...     stage4=dict(
...         num_modules=3,
...         num_branches=4,
...         block='BASIC',
...         num_blocks=(4, 4, 4, 4),
...         num_channels=(32, 64, 128, 256)))
>>> self = HRNet(extra, in_channels=1)
>>> self.eval()
>>> inputs = torch.rand(1, 1, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 32, 8, 8)
(1, 64, 4, 4)
(1, 128, 2, 2)
(1, 256, 1, 1)
forward(x)[source]

Forward function.

property norm1

the normalization layer named “norm1”

Type

nn.Module

property norm2

the normalization layer named “norm2”

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.ICNet(backbone_cfg, in_channels=3, layer_channels=(512, 2048), light_branch_middle_channels=32, psp_out_channels=512, out_channels=(64, 256, 256), pool_scales=(1, 2, 3, 6), conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, act_cfg={'type': 'ReLU'}, align_corners=False, init_cfg=None)[source]

ICNet for Real-Time Semantic Segmentation on High-Resolution Images.

This backbone is the implementation of ICNet.

Parameters
  • backbone_cfg (dict) – Config dict to build backbone. Usually it is ResNet but it can also be other backbones.

  • in_channels (int) – The number of input image channels. Default: 3.

  • layer_channels (Sequence[int]) – The numbers of feature channels at layer 2 and layer 4 in ResNet. It can also be other backbones. Default: (512, 2048).

  • light_branch_middle_channels (int) – The number of channels of the middle layer in light branch. Default: 32.

  • psp_out_channels (int) – The number of channels of the output of PSP module. Default: 512.

  • out_channels (Sequence[int]) – The number of output feature channels of each branch. Default: (64, 256, 256).

  • pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).

  • conv_cfg (dict) – Dictionary to construct and config conv layer. Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN').

  • act_cfg (dict) – Dictionary to construct and config act layer. Default: dict(type='ReLU').

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.MAE(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=-1, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, num_fcs=2, norm_eval=False, pretrained=None, init_values=0.1, init_cfg=None)[source]

VisionTransformer with support for patch.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – embedding dimension. Default: 768.

  • num_layers (int) – depth of transformer. Default: 12.

  • num_heads (int) – number of attention heads. Default: 12.

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • attn_drop_rate (float) – The dropout rate for attention layer. Default: 0.0.

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='LN').

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type='GELU').

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize final feature map. Default: False.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • pretrained (str, optional) – Path to the pretrained model. Default: None.

  • init_values (float) – Initialize the values of Attention and FFN with learnable scaling. Defaults to 0.1.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
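
The default MAE settings imply a few derived quantities that are useful when sizing decode heads. The sketch below works them out in plain Python. Both helper names are hypothetical, and treating `out_indices=-1` as "the last layer" is an assumption about how the default is resolved, not something this section states.

```python
def patch_grid(img_size, patch_size):
    """Return (rows, cols) of the patch grid for square patches."""
    if isinstance(img_size, int):
        img_size = (img_size, img_size)
    height, width = img_size
    return height // patch_size, width // patch_size


def resolve_out_indices(out_indices, num_layers):
    """Normalise out_indices to a list.

    Assumption: -1 is shorthand for the index of the last layer.
    """
    if isinstance(out_indices, int):
        if out_indices == -1:
            out_indices = num_layers - 1
        return [out_indices]
    return list(out_indices)


rows, cols = patch_grid(224, 16)
print(rows, cols, rows * cols)        # 14 14 196
print(4 * 768)                        # mlp_ratio * embed_dims = 3072
print(resolve_out_indices(-1, 12))    # [11]
```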

fix_init_weight()[source]

Rescale the initialization according to layer id.

This function is copied from https://github.com/microsoft/unilm/blob/master/beit/modeling_pretrain.py. Copyright (c) Microsoft Corporation. Licensed under the MIT License.

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmseg.models.backbones.MixVisionTransformer(in_channels=3, embed_dims=64, num_stages=4, num_layers=[3, 4, 6, 3], num_heads=[1, 2, 4, 8], patch_sizes=[7, 3, 3, 3], strides=[4, 2, 2, 2], sr_ratios=[8, 4, 2, 1], out_indices=(0, 1, 2, 3), mlp_ratio=4, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, act_cfg={'type': 'GELU'}, norm_cfg={'eps': 1e-06, 'type': 'LN'}, pretrained=None, init_cfg=None, with_cp=False)[source]

The backbone of Segformer.

This backbone is the implementation of SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – Embedding dimension. Default: 64.

  • num_stages (int) – The number of stages. Default: 4.

  • num_layers (Sequence[int]) – The number of transformer encoder layers in each stage. Default: [3, 4, 6, 3].

  • num_heads (Sequence[int]) – The attention heads of each transformer encoder layer. Default: [1, 2, 4, 8].

  • patch_sizes (Sequence[int]) – The patch_size of each overlapped patch embedding. Default: [7, 3, 3, 3].

  • strides (Sequence[int]) – The stride of each overlapped patch embedding. Default: [4, 2, 2, 2].

  • sr_ratios (Sequence[int]) – The spatial reduction rate of each transformer encoder layer. Default: [8, 4, 2, 1].

  • out_indices (Sequence[int] | int) – Output from which stages. Default: (0, 1, 2, 3).

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element being zeroed. Default: 0.0.

  • attn_drop_rate (float) – The dropout rate for attention layer. Default: 0.0.

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='LN', eps=1e-6).

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type='GELU').

  • pretrained (str, optional) – Path to the pretrained model. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.
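
The `patch_sizes` and `strides` lists determine how quickly the feature maps shrink from stage to stage. The sketch below computes the per-stage output shapes with the standard convolution size formula. Two things are assumptions rather than statements from this section: each overlapped patch embedding is taken to pad by `patch_size // 2`, and the channel width of stage `i` is taken to be `embed_dims * num_heads[i]`. `mit_stage_shapes` is a hypothetical helper name.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def mit_stage_shapes(height, width, embed_dims=64,
                     num_heads=(1, 2, 4, 8),
                     patch_sizes=(7, 3, 3, 3),
                     strides=(4, 2, 2, 2)):
    """Return a (channels, height, width) tuple for every stage."""
    shapes = []
    for heads, kernel, stride in zip(num_heads, patch_sizes, strides):
        padding = kernel // 2  # assumed padding of the patch embedding
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        # Assumed width rule: embed_dims scaled by the head count.
        shapes.append((embed_dims * heads, height, width))
    return shapes


for shape in mit_stage_shapes(224, 224):
    print(shape)
# (64, 56, 56)
# (128, 28, 28)
# (256, 14, 14)
# (512, 7, 7)
```

With the default strides the cumulative downsampling factors are 4, 8, 16, and 32, which is what a decode head consuming all four outputs should expect.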

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmseg.models.backbones.MobileNetV2(widen_factor=1.0, strides=(1, 2, 2, 2, 1, 2, 1), dilations=(1, 1, 1, 1, 1, 1, 1), out_indices=(1, 2, 4, 6), frozen_stages=-1, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU6'}, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

MobileNetV2 backbone.

This backbone is the implementation of MobileNetV2: Inverted Residuals and Linear Bottlenecks.

Parameters
  • widen_factor (float) – Width multiplier, multiply number of channels in each layer by this amount. Default: 1.0.

  • strides (Sequence[int], optional) – Strides of the first block of each layer. If not specified, default config in arch_setting will be used.

  • dilations (Sequence[int]) – Dilation of each layer.

  • out_indices (None or Sequence[int]) – Output from which stages. Default: (1, 2, 4, 6).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type='ReLU6').

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – Path to the pretrained model. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.
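
To see what `widen_factor` does to layer widths, the sketch below scales the seven layer widths from the MobileNetV2 paper and rounds each to a multiple of 8. The rounding rule shown is a common one for MobileNet-style networks and is an assumption here. This section does not say which rule the backbone applies, and `make_divisible` and `scaled_channels` are illustrative names.

```python
# Output channels of the seven layers in the original MobileNetV2 paper.
BASE_CHANNELS = (16, 24, 32, 64, 96, 160, 320)


def make_divisible(value, divisor=8, min_ratio=0.9):
    """Round `value` to a multiple of `divisor` (assumed rounding rule).

    The result is never below `divisor`, and it is bumped up by one step
    if rounding would shrink the value by more than 10%.
    """
    new_value = max(divisor, int(value + divisor / 2) // divisor * divisor)
    if new_value < min_ratio * value:
        new_value += divisor
    return new_value


def scaled_channels(widen_factor):
    """Per-layer channel counts after applying the width multiplier."""
    return [make_divisible(c * widen_factor) for c in BASE_CHANNELS]


print(scaled_channels(1.0))  # [16, 24, 32, 64, 96, 160, 320]
print(scaled_channels(0.5))  # [8, 16, 16, 32, 48, 80, 160]
```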

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

make_layer(out_channels, num_blocks, stride, dilation, expand_ratio)[source]

Stack InvertedResidual blocks to build a layer for MobileNetV2.

Parameters
  • out_channels (int) – out_channels of block.

  • num_blocks (int) – Number of blocks.

  • stride (int) – Stride of the first block.

  • dilation (int) – Dilation of the first block.

  • expand_ratio (int) – Expand the number of channels of the hidden layer in InvertedResidual by this ratio.

train(mode=True)[source]

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmseg.models.backbones.MobileNetV3(arch='small', conv_cfg=None, norm_cfg={'type': 'BN'}, out_indices=(0, 1, 12), frozen_stages=-1, reduction_factor=1, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

MobileNetV3 backbone.

This backbone is the improved implementation of Searching for MobileNetV3.

Parameters
  • arch (str) – Architecture of MobileNetV3, from {'small', 'large'}. Default: 'small'.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').

  • out_indices (tuple[int]) – Output from which layer. Default: (0, 1, 12).

  • frozen_stages (int) – Stages to be frozen (all param fixed). Default: -1, which means not freezing any parameters.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – Path to the pretrained model. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – Whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmseg.models.backbones.PCPVT(in_channels=3, embed_dims=[64, 128, 256, 512], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, norm_cfg={'type': 'LN'}, depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1], norm_after_stage=False, pretrained=None, init_cfg=None)[source]

The backbone of Twins-PCPVT.

This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (list) – Embedding dimension. Default: [64, 128, 256, 512].

  • patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].

  • strides (list) – The strides. Default: [4, 2, 2, 2].

  • num_heads (list) – Number of attention heads in each stage. Default: [1, 2, 4, 8].

  • mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim in each stage. Default: [4, 4, 4, 4].

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool) – Enable bias for qkv if True. Default: False.

  • drop_rate (float) – Probability of an element being zeroed. Default: 0.0.

  • attn_drop_rate (float) – The dropout rate for attention layer. Default: 0.0.

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='LN').

  • depths (list) – Depths of each stage. Default: [3, 4, 6, 3].

  • sr_ratios (list) – Kernel_size of conv in each Attn module in Transformer encoder layer. Default: [8, 4, 2, 1].

  • norm_after_stage (bool) – Add extra norm. Default: False.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

class mmseg.models.backbones.ResNeSt(groups=1, base_width=4, radix=2, reduction_factor=4, avg_down_stride=True, **kwargs)[source]

ResNeSt backbone.

This backbone is the implementation of ResNeSt: Split-Attention Networks.

Parameters
  • groups (int) – Number of groups of Bottleneck. Default: 1.

  • base_width (int) – Base width of Bottleneck. Default: 4.

  • radix (int) – Radix of SplitAttentionConv2d. Default: 2.

  • reduction_factor (int) – Reduction factor of inter_channels in SplitAttentionConv2d. Default: 4.

  • avg_down_stride (bool) – Whether to use average pool for stride in Bottleneck. Default: True.

  • kwargs (dict) – Keyword arguments for ResNet.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

class mmseg.models.backbones.ResNeXt(groups=1, base_width=4, **kwargs)[source]

ResNeXt backbone.

This backbone is the implementation of Aggregated Residual Transformations for Deep Neural Networks.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Normally 3.

  • num_stages (int) – Resnet stages, normally 4.

  • groups (int) – Group of resnext.

  • base_width (int) – Base width of resnext.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • norm_cfg (dict) – Dictionary to construct and config norm layer.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity.
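
The `groups` and `base_width` parameters together set the width of the grouped 3x3 convolution inside each bottleneck. The sketch below uses the usual ResNeXt width rule. It is assumed, not confirmed by this section, that the backbone follows the same rule, and `bottleneck_width` and `base_channels` are illustrative names.

```python
import math


def bottleneck_width(planes, groups=1, base_width=4, base_channels=64):
    """Channel width of the grouped 3x3 conv in a ResNeXt bottleneck.

    Assumed rule: with a single group the block is a plain ResNet
    bottleneck; otherwise each group gets base_width channels, scaled by
    how wide the stage is relative to base_channels.
    """
    if groups == 1:
        return planes
    return math.floor(planes * (base_width / base_channels)) * groups


# ResNeXt "32x4d": 32 groups, 4 channels per group at the first stage.
for planes in (64, 128, 256, 512):
    print(planes, bottleneck_width(planes, groups=32, base_width=4))
# 64 128
# 128 256
# 256 512
# 512 1024
```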

Example

>>> from mmseg.models import ResNeXt
>>> import torch
>>> self = ResNeXt(depth=50)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 256, 8, 8)
(1, 512, 4, 4)
(1, 1024, 2, 2)
(1, 2048, 1, 1)
make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

class mmseg.models.backbones.ResNet(depth, in_channels=3, stem_channels=64, base_channels=64, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', deep_stem=False, avg_down=False, frozen_stages=-1, conv_cfg=None, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False, dcn=None, stage_with_dcn=(False, False, False, False), plugins=None, multi_grid=None, contract_dilation=False, with_cp=False, zero_init_residual=True, pretrained=None, init_cfg=None)[source]

ResNet backbone.

This backbone is the improved implementation of Deep Residual Learning for Image Recognition.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • in_channels (int) – Number of input image channels. Default: 3.

  • stem_channels (int) – Number of stem channels. Default: 64.

  • base_channels (int) – Number of base channels of res layer. Default: 64.

  • num_stages (int) – Resnet stages, normally 4. Default: 4.

  • strides (Sequence[int]) – Strides of the first block of each stage. Default: (1, 2, 2, 2).

  • dilations (Sequence[int]) – Dilation of each stage. Default: (1, 1, 1, 1).

  • out_indices (Sequence[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer. Default: ‘pytorch’.

  • deep_stem (bool) – Replace the 7x7 conv in the input stem with three 3x3 convs. Default: False.

  • avg_down (bool) – Use AvgPool instead of stride conv when downsampling in the bottleneck. Default: False.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • conv_cfg (dict | None) – Dictionary to construct and config conv layer. When conv_cfg is None, cfg will be set to dict(type='Conv2d'). Default: None.

  • norm_cfg (dict) – Dictionary to construct and config norm layer. Default: dict(type='BN', requires_grad=True).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • dcn (dict | None) – Dictionary to construct and config DCN conv layer. When dcn is not None, conv_cfg must be None. Default: None.

  • stage_with_dcn (Sequence[bool]) – Whether to set DCN conv for each stage. The length of stage_with_dcn is equal to num_stages. Default: (False, False, False, False).

  • plugins (list[dict]) –

    List of plugins for stages, each dict contains:

    • cfg (dict, required): Cfg dict to build plugin.

    • position (str, required): Position inside block to insert plugin, options: 'after_conv1', 'after_conv2', 'after_conv3'.

    • stages (tuple[bool], optional): Stages to apply plugin, length should be same as 'num_stages'. Default: None.

  • multi_grid (Sequence[int]|None) – Multi grid dilation rates of last stage. Default: None.

  • contract_dilation (bool) – Whether to contract the first dilation of each layer. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • zero_init_residual (bool) – Whether to use zero init for last norm layer in resblocks to let them behave as identity. Default: True.

  • pretrained (str, optional) – Path to the pretrained model. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Example

>>> from mmseg.models import ResNet
>>> import torch
>>> self = ResNet(depth=18)
>>> self.eval()
>>> inputs = torch.rand(1, 3, 32, 32)
>>> level_outputs = self.forward(inputs)
>>> for level_out in level_outputs:
...     print(tuple(level_out.shape))
(1, 64, 8, 8)
(1, 128, 4, 4)
(1, 256, 2, 2)
(1, 512, 1, 1)
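
The spatial sizes in the ResNet example output follow from the stem and the per-stage `strides`. The sketch below reproduces them arithmetically. It assumes the standard ResNet stem (a 7x7 stride-2 convolution with padding 3, then a 3x3 stride-2 max pool with padding 1) and that each stage downsamples with a padded 3x3 convolution of the given stride; `resnet_feature_sizes` is a hypothetical helper.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution or pooling op along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet_feature_sizes(size, strides=(1, 2, 2, 2)):
    """Spatial size of every stage output for a square input of `size`."""
    size = conv_out(size, kernel=7, stride=2, padding=3)  # assumed stem conv
    size = conv_out(size, kernel=3, stride=2, padding=1)  # assumed max pool
    sizes = []
    for stride in strides:
        size = conv_out(size, kernel=3, stride=stride, padding=1)
        sizes.append(size)
    return sizes


print(resnet_feature_sizes(32))   # [8, 4, 2, 1]
print(resnet_feature_sizes(224))  # [56, 28, 14, 7]
```
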
forward(x)[source]

Forward function.

make_res_layer(**kwargs)[source]

Pack all blocks in a stage into a ResLayer.

make_stage_plugins(plugins, stage_idx)[source]

Make plugins for the stage_idx-th stage of ResNet.

Currently, 'context_block', 'empirical_attention_block', and 'nonlocal_block' can be inserted into backbones such as ResNet/ResNeXt. They can be inserted after conv1/conv2/conv3 of Bottleneck.

An example of the plugins format:

>>> plugins=[
...     dict(cfg=dict(type='xxx', arg1='xxx'),
...          stages=(False, True, True, True),
...          position='after_conv2'),
...     dict(cfg=dict(type='yyy'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='1'),
...          stages=(True, True, True, True),
...          position='after_conv3'),
...     dict(cfg=dict(type='zzz', postfix='2'),
...          stages=(True, True, True, True),
...          position='after_conv3')
... ]
>>> self = ResNet(depth=18)
>>> stage_plugins = self.make_stage_plugins(plugins, 0)
>>> assert len(stage_plugins) == 3

Suppose 'stage_idx=0', the structure of blocks in the stage would be:

conv1->conv2->conv3->yyy->zzz1->zzz2

Suppose 'stage_idx=1', the structure of blocks in the stage would be:

conv1->conv2->xxx->conv3->yyy->zzz1->zzz2

If stages is missing, the plugin would be applied to all stages.

参数
  • plugins (list[dict]) – List of plugins cfg to build. The postfix is required if multiple same type plugins are inserted.

  • stage_idx (int) – Index of stage to build

返回

Plugins for current stage

返回类型

list[dict]
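The stage filtering described above can be sketched in plain Python. This is an illustration only, not the library code: `select_stage_plugins` is a hypothetical helper, and it assumes each plugin's `stages` tuple holds one boolean per stage.

```python
def select_stage_plugins(plugins, stage_idx):
    """Return the plugin configs that apply to stage `stage_idx`.

    Hypothetical helper mirroring the documented behaviour.
    """
    selected = []
    for plugin in plugins:
        plugin = dict(plugin)  # shallow copy, the caller's config stays intact
        stages = plugin.pop('stages', None)
        # A missing `stages` entry means "apply to every stage".
        if stages is None or stages[stage_idx]:
            selected.append(plugin)
    return selected


plugins = [
    dict(cfg=dict(type='xxx', arg1='xxx'),
         stages=(False, True, True, True), position='after_conv2'),
    dict(cfg=dict(type='yyy'),
         stages=(True, True, True, True), position='after_conv3'),
    dict(cfg=dict(type='zzz', postfix='1'),
         stages=(True, True, True, True), position='after_conv3'),
    dict(cfg=dict(type='zzz', postfix='2'),
         stages=(True, True, True, True), position='after_conv3'),
]

stage0 = select_stage_plugins(plugins, 0)  # 'xxx' is skipped in stage 0
stage1 = select_stage_plugins(plugins, 1)  # all four plugins apply
print(len(stage0), len(stage1))
```

The 'xxx' plugin is disabled for the first stage, which is why the doctest above expects three plugins for stage_idx=0.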

property norm1

The normalization layer named “norm1”.

Type

nn.Module

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.ResNetV1c(**kwargs)[source]

ResNetV1c variant described in Bag of Tricks for Image Classification with Convolutional Neural Networks.

Compared with the default ResNet (ResNetV1b), ResNetV1c replaces the 7x7 conv in the input stem with three 3x3 convs.

class mmseg.models.backbones.ResNetV1d(**kwargs)[source]

ResNetV1d variant described in Bag of Tricks for Image Classification with Convolutional Neural Networks.

Compared with the default ResNet (ResNetV1b), ResNetV1d replaces the 7x7 conv in the input stem with three 3x3 convs. In addition, in the downsampling block, a 2x2 avg_pool with stride 2 is added before the conv, whose stride is changed to 1.

class mmseg.models.backbones.STDCContextPathNet(backbone_cfg, last_in_channels=(1024, 512), out_channels=128, ffm_cfg={'in_channels': 512, 'out_channels': 256, 'scale_factor': 4}, upsample_mode='nearest', align_corners=None, norm_cfg={'type': 'BN'}, init_cfg=None)[source]

STDCNet with Context Path. The backbone output, outs, is a list of three feature maps ordered from deep to shallow, so their height and width go from small to large. The largest feature map of outs is passed to STDCHead, where the Detail Loss is computed against the Detail Ground-truth. The other two feature maps are each fed to an Attention Refinement Module. The largest feature map of outs and the last output of the Attention Refinement Module are then concatenated for the Feature Fusion Module, and the fused feature map, feat_fuse, is passed to decode_head. For more details, please refer to Figure 4 of the original paper.

Parameters
  • backbone_cfg (dict) – Config dict for the STDC backbone.

  • last_in_channels (tuple(int)) – The number of channels of the last two feature maps from the STDC backbone. Default: (1024, 512).

  • out_channels (int) – The channels of output feature maps. Default: 128.

  • ffm_cfg (dict) – Config dict for Feature Fusion Module. Default: dict(in_channels=512, out_channels=256, scale_factor=4).

  • upsample_mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'.

  • align_corners (bool | None) – align_corners argument of F.interpolate. It must be None if upsample_mode is 'nearest'. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='BN').

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

Returns

The tuple of lists of output feature maps for the auxiliary heads and the decoder head.

Return type

tuple

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.STDCNet(stdc_type, in_channels, channels, bottleneck_type, norm_cfg, act_cfg, num_convs=4, with_final_conv=False, pretrained=None, init_cfg=None)[source]

This backbone is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.

Parameters
  • stdc_type (str) – The type of backbone structure. 'STDCNet1' and 'STDCNet2' denote the two main backbones in the paper, whose FLOPs are 813M and 1446M, respectively.

  • in_channels (int) – The number of input channels.

  • channels (tuple[int]) – The output channels for each stage.

  • bottleneck_type (str) – The type of STDC Module. The value must be 'add' or 'cat'.

  • norm_cfg (dict) – Config dict for normalization layer.

  • act_cfg (dict) – The activation config for conv layers.

  • num_convs (int) – Number of conv layers in each STDC Module. Default: 4.

  • with_final_conv (bool) – Whether to add a conv layer at the Module output. Default: False.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

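Example

>>> import torch
>>> from mmseg.models.backbones import STDCNet
>>> stdc_type = 'STDCNet1'
>>> in_channels = 3
>>> channels = (32, 64, 256, 512, 1024)
>>> bottleneck_type = 'cat'
>>> # norm_cfg and act_cfg have no defaults, so they must be passed.
>>> norm_cfg = dict(type='BN')
>>> act_cfg = dict(type='ReLU')
>>> inputs = torch.rand(1, 3, 1024, 2048)
>>> self = STDCNet(stdc_type, in_channels, channels,
...                bottleneck_type, norm_cfg, act_cfg).eval()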
>>> outputs = self.forward(inputs)
>>> for i in range(len(outputs)):
...     print(f'outputs[{i}].shape = {outputs[i].shape}')
outputs[0].shape = torch.Size([1, 256, 128, 256])
outputs[1].shape = torch.Size([1, 512, 64, 128])
outputs[2].shape = torch.Size([1, 1024, 32, 64])
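The shapes printed above follow a simple pattern, sketched below in plain Python. The helper `stdc_output_shapes` is hypothetical, and the sketch assumes that each of the five stages halves the resolution and that only the last three stages are returned, which is what the printed shapes suggest.

```python
def stdc_output_shapes(height, width, channels=(32, 64, 256, 512, 1024)):
    """Estimate (channels, H, W) of the returned feature maps.

    Assumption: stage k (1-based) has an overall stride of 2**k and the
    backbone returns the outputs of the last three stages.
    """
    shapes = []
    for stage, ch in enumerate(channels, start=1):
        stride = 2 ** stage
        shapes.append((ch, height // stride, width // stride))
    return shapes[-3:]


shapes = stdc_output_shapes(1024, 2048)
for i, shape in enumerate(shapes):
    print(f'outputs[{i}] ~ {shape}')
```

For a 1024x2048 input this reproduces the strides 8, 16 and 32 seen in the example output.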
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.SVT(in_channels=3, embed_dims=[64, 128, 256], patch_sizes=[4, 2, 2, 2], strides=[4, 2, 2, 2], num_heads=[1, 2, 4], mlp_ratios=[4, 4, 4], out_indices=(0, 1, 2, 3), qkv_bias=False, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.2, norm_cfg={'type': 'LN'}, depths=[4, 4, 4], sr_ratios=[4, 2, 1], windiow_sizes=[7, 7, 7], norm_after_stage=True, pretrained=None, init_cfg=None)[source]

The backbone of Twins-SVT.

This backbone is the implementation of Twins: Revisiting the Design of Spatial Attention in Vision Transformers.

Parameters
  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (list) – Embedding dimensions. Default: [64, 128, 256].

  • patch_sizes (list) – The patch sizes. Default: [4, 2, 2, 2].

  • strides (list) – The strides of the patch-embedding modules. Default: [4, 2, 2, 2].

  • num_heads (list) – Number of attention heads. Default: [1, 2, 4].

  • mlp_ratios (list) – Ratio of mlp hidden dim to embedding dim. Default: [4, 4, 4].

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool) – Enable bias for qkv if True. Default: False.

  • drop_rate (float) – Dropout rate. Default: 0.

  • attn_drop_rate (float) – Dropout ratio of attention weight. Default: 0.0.

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.2.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type='LN').

  • depths (list) – Depths of each stage. Default: [4, 4, 4].

  • sr_ratios (list) – Kernel size of conv in each Attn module in Transformer encoder layer. Default: [4, 2, 1].

  • windiow_sizes (list) – Window sizes of LSA. Default: [7, 7, 7].

  • input_features_slice (bool) – Input features need slice. Default: False.

  • norm_after_stage (bool) – Add extra norm. Default: True.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.

class mmseg.models.backbones.SwinTransformer(pretrain_img_size=224, in_channels=3, embed_dims=96, patch_size=4, window_size=7, mlp_ratio=4, depths=(2, 2, 6, 2), num_heads=(3, 6, 12, 24), strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3), qkv_bias=True, qk_scale=None, patch_norm=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.1, use_abs_pos_embed=False, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, with_cp=False, pretrained=None, frozen_stages=-1, init_cfg=None)[source]

Swin Transformer backbone.

This backbone is the implementation of Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. Inspired by https://github.com/microsoft/Swin-Transformer.

Parameters
  • pretrain_img_size (int | tuple[int]) – The size of input image when pretrain. Defaults: 224.

  • in_channels (int) – The num of input channels. Defaults: 3.

  • embed_dims (int) – The feature dimension. Default: 96.

  • patch_size (int | tuple[int]) – Patch size. Default: 4.

  • window_size (int) – Window size. Default: 7.

  • mlp_ratio (int | float) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • depths (tuple[int]) – Depths of each Swin Transformer stage. Default: (2, 2, 6, 2).

  • num_heads (tuple[int]) – Parallel attention heads of each Swin Transformer stage. Default: (3, 6, 12, 24).

  • strides (tuple[int]) – The patch merging or patch embedding stride of each Swin Transformer stage. (In swin, we set kernel size equal to stride.) Default: (4, 2, 2, 2).

  • out_indices (tuple[int]) – Output from which stages. Default: (0, 1, 2, 3).

  • qkv_bias (bool, optional) – If True, add a learnable bias to query, key, value. Default: True

  • qk_scale (float | None, optional) – Override default qk scale of head_dim ** -0.5 if set. Default: None.

  • patch_norm (bool) – If add a norm layer for patch embed and patch merging. Default: True.

  • drop_rate (float) – Dropout rate. Defaults: 0.

  • attn_drop_rate (float) – Attention dropout rate. Default: 0.

  • drop_path_rate (float) – Stochastic depth rate. Defaults: 0.1.

  • use_abs_pos_embed (bool) – If True, add absolute position embedding to the patch embedding. Defaults: False.

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type='GELU').

  • norm_cfg (dict) – Config dict for normalization layer at output of backbone. Default: dict(type='LN').

  • with_cp (bool, optional) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – Model pretrained path. Default: None.

  • frozen_stages (int) – Stages to be frozen (stop grad and set eval mode). -1 means not freezing any parameters. Default: -1.

  • init_cfg (dict, optional) – The Config for initialization. Defaults to None.
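To see how embed_dims, strides and out_indices interact, the following plain-Python sketch estimates the shape of each stage output. It is an assumption-laden illustration: `swin_stage_shapes` is a hypothetical helper, the input is taken to be square and divisible by the total stride, and the channel count is assumed to double at every stage as in the original Swin design.

```python
def swin_stage_shapes(img_size=224, embed_dims=96,
                      strides=(4, 2, 2, 2), out_indices=(0, 1, 2, 3)):
    """Estimate (channels, H, W) for the selected stage outputs."""
    shapes = []
    size = img_size
    for i, stride in enumerate(strides):
        # strides[0] is the patch embedding, later entries are patch merging.
        size //= stride
        channels = embed_dims * 2 ** i  # assumed: channels double per stage
        if i in out_indices:
            shapes.append((channels, size, size))
    return shapes


default_shapes = swin_stage_shapes()
for shape in default_shapes:
    print(shape)
```

With the defaults, the cumulative strides are 4, 8, 16 and 32, so a 224x224 image yields feature maps of side 56, 28, 14 and 7.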

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

train(mode=True)[source]

Convert the model into training mode while keeping layers frozen.

class mmseg.models.backbones.TIMMBackbone(model_name, features_only=True, pretrained=True, checkpoint_path='', in_channels=3, init_cfg=None, **kwargs)[source]

Wrapper to use backbones from the timm library. More details can be found in timm.

Parameters
  • model_name (str) – Name of timm model to instantiate.

  • pretrained (bool) – Load pretrained weights if True.

  • checkpoint_path (str) – Path of checkpoint to load after model is initialized.

  • in_channels (int) – Number of input image channels. Default: 3.

  • init_cfg (dict, optional) – Initialization config dict

  • **kwargs – Other timm & model specific arguments.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmseg.models.backbones.UNet(in_channels=3, base_channels=64, num_stages=5, strides=(1, 1, 1, 1, 1), enc_num_convs=(2, 2, 2, 2, 2), dec_num_convs=(2, 2, 2, 2), downsamples=(True, True, True, True), enc_dilations=(1, 1, 1, 1, 1), dec_dilations=(1, 1, 1, 1), with_cp=False, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, upsample_cfg={'type': 'InterpConv'}, norm_eval=False, dcn=None, plugins=None, pretrained=None, init_cfg=None)[source]

UNet backbone.

This backbone is the implementation of U-Net: Convolutional Networks for Biomedical Image Segmentation.

Parameters
  • in_channels (int) – Number of input image channels. Default: 3.

  • base_channels (int) – Number of base channels of each stage. The output channels of the first stage. Default: 64.

  • num_stages (int) – Number of stages in encoder, normally 5. Default: 5.

  • strides (Sequence[int 1 | 2]) – Strides of each stage in encoder. len(strides) is equal to num_stages. Normally the stride of the first stage in encoder is 1. If strides[i]=2, it uses strided convolution to downsample in the corresponding encoder stage. Default: (1, 1, 1, 1, 1).

  • enc_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the corresponding encoder stage. Default: (2, 2, 2, 2, 2).

  • dec_num_convs (Sequence[int]) – Number of convolutional layers in the convolution block of the corresponding decoder stage. Default: (2, 2, 2, 2).

  • downsamples (Sequence[bool]) – Whether to use MaxPool to downsample the feature map after the first stage of encoder (stages: [1, num_stages)). If the corresponding encoder stage uses strided convolution (strides[i]=2), it will never use MaxPool to downsample, even if downsamples[i-1]=True. Default: (True, True, True, True).

  • enc_dilations (Sequence[int]) – Dilation rate of each stage in encoder. Default: (1, 1, 1, 1, 1).

  • dec_dilations (Sequence[int]) – Dilation rate of each stage in decoder. Default: (1, 1, 1, 1).

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • conv_cfg (dict | None) – Config dict for convolution layer. Default: None.

  • norm_cfg (dict | None) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict | None) – Config dict for activation layer in ConvModule. Default: dict(type=’ReLU’).

  • upsample_cfg (dict) – The upsample config of the upsample module in decoder. Default: dict(type=’InterpConv’).

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • dcn (bool) – Use deformable convolution in convolutional layer or not. Default: None.

  • plugins (dict) – plugins for convolutional layers. Default: None.

  • pretrained (str, optional) – model pretrained path. Default: None

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None

Notice:

The input image size should be divisible by the whole downsample rate of the encoder. More detail of the whole downsample rate can be found in UNet._check_input_divisible.
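The divisibility requirement can be checked ahead of time with a small sketch. The helpers below are hypothetical stand-ins for the idea behind UNet._check_input_divisible, written under the assumption that every encoder stage after the first halves the resolution when it uses either a stride of 2 or MaxPool downsampling.

```python
def whole_downsample_rate(strides=(1, 1, 1, 1, 1),
                          downsamples=(True, True, True, True)):
    """Total downsample factor of the encoder (assumed rule, see above)."""
    rate = 1
    for i in range(1, len(strides)):
        # Stage i downsamples via strided conv or via MaxPool, never both.
        if strides[i] == 2 or downsamples[i - 1]:
            rate *= 2
    return rate


def is_valid_input(height, width, **unet_kwargs):
    """Return True if both sides are divisible by the whole downsample rate."""
    rate = whole_downsample_rate(**unet_kwargs)
    return height % rate == 0 and width % rate == 0


print(whole_downsample_rate())   # rate for the default arguments
print(is_valid_input(512, 512))
print(is_valid_input(500, 512))
```

With the default arguments the rate is 16, so inputs such as 512x512 pass the check while 500x512 does not.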

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Convert the model into training mode while keeping the normalization layers frozen.

class mmseg.models.backbones.VisionTransformer(img_size=224, patch_size=16, in_channels=3, embed_dims=768, num_layers=12, num_heads=12, mlp_ratio=4, out_indices=-1, qkv_bias=True, drop_rate=0.0, attn_drop_rate=0.0, drop_path_rate=0.0, with_cls_token=True, output_cls_token=False, norm_cfg={'type': 'LN'}, act_cfg={'type': 'GELU'}, patch_norm=False, final_norm=False, interpolate_mode='bicubic', num_fcs=2, norm_eval=False, with_cp=False, pretrained=None, init_cfg=None)[source]

Vision Transformer.

This backbone is the implementation of An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale.

Parameters
  • img_size (int | tuple) – Input image size. Default: 224.

  • patch_size (int) – The patch size. Default: 16.

  • in_channels (int) – Number of input channels. Default: 3.

  • embed_dims (int) – embedding dimension. Default: 768.

  • num_layers (int) – depth of transformer. Default: 12.

  • num_heads (int) – number of attention heads. Default: 12.

  • mlp_ratio (int) – ratio of mlp hidden dim to embedding dim. Default: 4.

  • out_indices (list | tuple | int) – Output from which stages. Default: -1.

  • qkv_bias (bool) – enable bias for qkv if True. Default: True.

  • drop_rate (float) – Probability of an element to be zeroed. Default 0.0

  • attn_drop_rate (float) – The drop out rate for attention layer. Default 0.0

  • drop_path_rate (float) – stochastic depth rate. Default 0.0

  • with_cls_token (bool) – Whether concatenating class token into image tokens as transformer input. Default: True.

  • output_cls_token (bool) – Whether output the cls_token. If set True, with_cls_token must be True. Default: False.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • patch_norm (bool) – Whether to add a norm in PatchEmbed Block. Default: False.

  • final_norm (bool) – Whether to add an additional layer to normalize the final feature map. Default: False.

  • interpolate_mode (str) – Select the interpolate mode for resizing the position embedding vector. Default: bicubic.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effect on Batch Norm and its variants only. Default: False.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed. Default: False.

  • pretrained (str, optional) – model pretrained path. Default: None.

  • init_cfg (dict or list[dict], optional) – Initialization config dict. Default: None.

forward(inputs)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

init_weights()[source]

Initialize the weights.

static resize_pos_embed(pos_embed, input_shpae, pos_shape, mode)[source]

Resize pos_embed weights.

Resize pos_embed using the bicubic interpolation method.

Parameters
  • pos_embed (torch.Tensor) – Position embedding weights.

  • input_shpae (tuple) – Tuple for (downsampled input image height, downsampled input image width).

  • pos_shape (tuple) – The resolution of the downsampled original training image.

  • mode (str) – Algorithm used for upsampling: 'nearest' | 'linear' | 'bilinear' | 'bicubic' | 'trilinear'. Default: 'nearest'

Returns

The resized pos_embed of shape [B, L_new, C]

Return type

torch.Tensor
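To make L_new concrete, the sketch below counts the position-embedding tokens before and after resizing. It is a rough illustration in plain Python: `num_pos_tokens` is a hypothetical helper, and it assumes the class token is kept as one extra entry in front of the h * w patch positions.

```python
def num_pos_tokens(grid_shape, with_cls_token=True):
    """Number of position-embedding entries for a (height, width) patch grid."""
    height, width = grid_shape
    return height * width + (1 if with_cls_token else 0)


patch_size = 16
pos_shape = (224 // patch_size, 224 // patch_size)    # pretraining grid
input_shape = (512 // patch_size, 512 // patch_size)  # grid of a larger input

l_old = num_pos_tokens(pos_shape)
l_new = num_pos_tokens(input_shape)
print(l_old, l_new)
```

Under these assumptions, a model pretrained at 224x224 with 16x16 patches stores 197 entries, and a 512x512 input needs them interpolated to 1025.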

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

decode_heads

class mmseg.models.decode_heads.ANNHead(project_channels, query_scales=(1), key_pool_scales=(1, 3, 6, 8), **kwargs)[source]

Asymmetric Non-local Neural Networks for Semantic Segmentation.

This head is the implementation of ANNNet.

Parameters
  • project_channels (int) – Projection channels for Nonlocal.

  • query_scales (tuple[int]) – The scales of query feature map. Default: (1,)

  • key_pool_scales (tuple[int]) – The pooling scales of key feature map. Default: (1, 3, 6, 8).

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.APCHead(pool_scales=(1, 2, 3, 6), fusion=True, **kwargs)[source]

Adaptive Pyramid Context Network for Semantic Segmentation.

This head is the implementation of APCNet.

Parameters
  • pool_scales (tuple[int]) – Pooling scales used in Adaptive Context Module. Default: (1, 2, 3, 6).

  • fusion (bool) – Add one conv to fuse residual feature. Default: True.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.ASPPHead(dilations=(1, 6, 12, 18), **kwargs)[source]

Rethinking Atrous Convolution for Semantic Image Segmentation.

This head is the implementation of DeepLabV3.

Parameters

dilations (tuple[int]) – Dilation rates for ASPP module. Default: (1, 6, 12, 18).
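In practice, decode heads are usually described by a config dict rather than instantiated by hand. The fragment below is a sketch of what such a config might look like for ASPPHead. Only type and dilations come from this section; the remaining keys (in_channels, in_index, channels, dropout_ratio, num_classes, norm_cfg, align_corners, loss_decode) are assumed to be forwarded through **kwargs to the base decode head, and all values are illustrative.

```python
# Illustrative config fragment; every value here is an assumption.
norm_cfg = dict(type='BN', requires_grad=True)

decode_head = dict(
    type='ASPPHead',
    in_channels=2048,            # channels of the selected backbone feature map
    in_index=3,                  # which backbone output to use
    channels=512,                # intermediate channels inside the head
    dilations=(1, 12, 24, 36),   # overrides the default (1, 6, 12, 18)
    dropout_ratio=0.1,
    num_classes=19,
    norm_cfg=norm_cfg,
    align_corners=False,
    loss_decode=dict(
        type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
)

print(decode_head['type'], decode_head['dilations'])
```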

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.CCHead(recurrence=2, **kwargs)[source]

CCNet: Criss-Cross Attention for Semantic Segmentation.

This head is the implementation of CCNet.

Parameters

recurrence (int) – Number of recurrence of Criss Cross Attention module. Default: 2.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DAHead(pam_channels, **kwargs)[source]

Dual Attention Network for Scene Segmentation.

This head is the implementation of DANet.

Parameters

pam_channels (int) – The channels of Position Attention Module(PAM).

cam_cls_seg(feat)[source]

CAM feature classification.

forward(inputs)[source]

Forward function.

forward_test(inputs, img_metas, test_cfg)[source]

Forward function for testing, only pam_cam is used.

losses(seg_logit, seg_label)[source]

Compute pam_cam, pam, cam loss.

pam_cls_seg(feat)[source]

PAM feature classification.

class mmseg.models.decode_heads.DMHead(filter_sizes=(1, 3, 5, 7), fusion=False, **kwargs)[source]

Dynamic Multi-scale Filters for Semantic Segmentation.

This head is the implementation of DMNet.

Parameters
  • filter_sizes (tuple[int]) – The size of generated convolutional filters used in Dynamic Convolutional Module. Default: (1, 3, 5, 7).

  • fusion (bool) – Add one conv to fuse DCM output feature. Default: False.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DNLHead(reduction=2, use_scale=True, mode='embedded_gaussian', temperature=0.05, **kwargs)[source]

Disentangled Non-Local Neural Networks.

This head is the implementation of DNLNet.

Parameters
  • reduction (int) – Reduction factor of projection transform. Default: 2.

  • use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.

  • mode (str) – The nonlocal mode. Options are 'embedded_gaussian', 'dot_product'. Default: 'embedded_gaussian'.

  • temperature (float) – Temperature to adjust attention. Default: 0.05
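The effect of temperature can be illustrated with a plain-Python softmax. This is a sketch of the general technique rather than the head's actual code: `attention_weights` is a hypothetical helper, and it assumes the pairwise scores are divided by the temperature before the softmax.

```python
import math


def attention_weights(scores, temperature=0.05):
    """Softmax over `scores` after dividing them by `temperature`."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [0.10, 0.12, 0.08]
sharp = attention_weights(scores, temperature=0.05)
flat = attention_weights(scores, temperature=1.0)
print([round(w, 3) for w in sharp])
print([round(w, 3) for w in flat])
```

A small temperature such as 0.05 magnifies differences between scores, so the attention concentrates on the best-matching positions, while a temperature of 1.0 leaves the distribution close to uniform.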

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DPTHead(embed_dims=768, post_process_channels=[96, 192, 384, 768], readout_type='ignore', patch_size=16, expand_channels=False, act_cfg={'type': 'ReLU'}, norm_cfg={'type': 'BN'}, **kwargs)[source]

Vision Transformers for Dense Prediction.

This head is the implementation of DPT.

Parameters
  • embed_dims (int) – The embed dimension of the ViT backbone. Default: 768.

  • post_process_channels (List) – Out channels of post process conv layers. Default: [96, 192, 384, 768].

  • readout_type (str) – Type of readout operation. Default: ‘ignore’.

  • patch_size (int) – The patch size. Default: 16.

  • expand_channels (bool) – Whether expand the channels in post process block. Default: False.

  • act_cfg (dict) – The activation config for residual conv unit. Default dict(type=’ReLU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.DepthwiseSeparableASPPHead(c1_in_channels, c1_channels, **kwargs)[source]

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation.

This head is the implementation of DeepLabV3+.

Parameters
  • c1_in_channels (int) – The input channels of c1 decoder. If it is 0, no decoder will be used.

  • c1_channels (int) – The intermediate channels of c1 decoder.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.DepthwiseSeparableFCNHead(dw_act_cfg=None, **kwargs)[source]

Depthwise-Separable Fully Convolutional Network for Semantic Segmentation.

This head is implemented according to Fast-SCNN: Fast Semantic Segmentation Network.

Parameters
  • in_channels (int) – Number of output channels of FFM.

  • channels (int) – Number of middle-stage channels in the decode head.

  • concat_input (bool) – Whether to concatenate original decode input into the result of several consecutive convolution layers. Default: True.

  • num_classes (int) – Used to determine the dimension of final prediction tensor.

  • in_index (int) – Correspond with ‘out_indices’ in FastSCNN backbone.

  • norm_cfg (dict | None) – Config of norm layers.

  • align_corners (bool) – align_corners argument of F.interpolate. Default: False.

  • loss_decode (dict) – Config of loss type and some relevant additional options.

  • dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: None.

class mmseg.models.decode_heads.EMAHead(ema_channels, num_bases, num_stages, concat_input=True, momentum=0.1, **kwargs)[source]

Expectation Maximization Attention Networks for Semantic Segmentation.

This head is the implementation of EMANet.

Parameters
  • ema_channels (int) – EMA module channels.

  • num_bases (int) – Number of bases.

  • num_stages (int) – Number of the EM iterations.

  • concat_input (bool) – Whether concat the input and output of convs before classification layer. Default: True

  • momentum (float) – Momentum to update the base. Default: 0.1.
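The role of momentum can be sketched as an exponential moving average over the bases. The snippet is an illustration with plain lists rather than tensors: `update_bases` is a hypothetical helper, and the update rule (keep 1 - momentum of the old bases, mix in momentum of the newly estimated ones) is an assumption about how the parameter is applied.

```python
def update_bases(bases, new_bases, momentum=0.1):
    """Blend newly estimated bases into the running bases."""
    return [(1 - momentum) * old + momentum * new
            for old, new in zip(bases, new_bases)]


running = [1.0, 0.0]
estimated = [0.0, 1.0]
updated = update_bases(running, estimated, momentum=0.1)
print([round(v, 3) for v in updated])
```

With momentum=0.1 the running bases move only a tenth of the way toward the new estimate per update, which keeps them stable across training iterations.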

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.EncHead(num_codes=32, use_se_loss=True, add_lateral=False, loss_se_decode={'loss_weight': 0.2, 'type': 'CrossEntropyLoss', 'use_sigmoid': True}, **kwargs)[source]

Context Encoding for Semantic Segmentation.

This head is the implementation of EncNet.

Parameters
  • num_codes (int) – Number of code words. Default: 32.

  • use_se_loss (bool) – Whether to use Semantic Encoding Loss (SE-loss) to regularize the training. Default: True.

  • add_lateral (bool) – Whether to use lateral connections to fuse features. Default: False.

  • loss_se_decode (dict) – Config of decode loss. Default: dict(type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.2).

forward(inputs)[source]

Forward function.

forward_test(inputs, img_metas, test_cfg)[source]

Forward function for testing, ignore se_loss.

losses(seg_logit, seg_label)[source]

Compute segmentation and semantic encoding loss.

class mmseg.models.decode_heads.FCNHead(num_convs=2, kernel_size=3, concat_input=True, dilation=1, **kwargs)[source]

Fully Convolution Networks for Semantic Segmentation.

This head is the implementation of FCNNet.

Parameters
  • num_convs (int) – Number of convs in the head. Default: 2.

  • kernel_size (int) – The kernel size for convs in the head. Default: 3.

  • concat_input (bool) – Whether to concatenate the input and output of convs before the classification layer. Default: True.

  • dilation (int) – The dilation rate for convs in the head. Default: 1.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.FPNHead(feature_strides, **kwargs)[source]

Panoptic Feature Pyramid Networks.

This head is the implementation of Semantic FPN.

Parameters

feature_strides (tuple[int]) – The strides for input feature maps. All strides are supposed to be powers of 2. The first one is of the largest resolution.
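The power-of-2 requirement exists because each input level is brought up to the resolution of the first level by repeated 2x upsampling. A minimal sketch of that bookkeeping follows; `scale_head_lengths` is a hypothetical helper, and the rule of at least one block per level is an assumption about how the head is built.

```python
import math


def scale_head_lengths(feature_strides):
    """Number of conv (+ 2x upsample) blocks assumed for each input level."""
    for stride in feature_strides:
        # A positive power of 2 has exactly one bit set.
        assert stride > 0 and stride & (stride - 1) == 0, 'stride must be a power of 2'
    base = math.log2(feature_strides[0])
    return [max(1, int(math.log2(stride) - base)) for stride in feature_strides]


lengths = scale_head_lengths((4, 8, 16, 32))
print(lengths)
```

For strides (4, 8, 16, 32), the deepest level needs three 2x upsampling steps to reach the stride-4 resolution, while the first two levels each get a single block.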

forward(inputs)[source]

Placeholder of forward function.

class mmseg.models.decode_heads.GCHead(ratio=0.25, pooling_type='att', fusion_types=('channel_add'), **kwargs)[source]

GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond.

This head is the implementation of GCNet.

Parameters
  • ratio (float) – Multiplier of channels ratio. Default: 1/4.

  • pooling_type (str) – The pooling type of context aggregation. Options are 'att', 'avg'. Default: 'att'.

  • fusion_types (tuple[str]) – The fusion type for feature fusion. Options are 'channel_add', 'channel_mul'. Default: ('channel_add',).

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.ISAHead(isa_channels, down_factor=(8, 8), **kwargs)[source]

Interlaced Sparse Self-Attention for Semantic Segmentation.

This head is the implementation of ISA.

Parameters
  • isa_channels (int) – The channels of ISA Module.

  • down_factor (tuple[int]) – The local group size of ISA.

forward(inputs)[source]

Forward function.

class mmseg.models.decode_heads.IterativeDecodeHead(num_stages, kernel_generate_head, kernel_update_head, **kwargs)[source]

K-Net: Towards Unified Image Segmentation.

This head is the implementation of K-Net (https://arxiv.org/abs/2106.14855).

Parameters
  • num_stages (int) – The number of stages (kernel update heads) in IterativeDecodeHead. Default: 3.

  • kernel_generate_head (dict) – Config of the kernel generate head, which generates mask predictions, dynamic kernels and class predictions for the following kernel update heads.

  • kernel_update_head (dict) – Config of the kernel update head, which refines dynamic kernels and class predictions iteratively.

forward(inputs)[source]

Forward function.

losses(seg_logit, seg_label)[source]

Compute segmentation loss.

class mmseg.models.decode_heads.KernelUpdateHead(num_classes=150, num_ffn_fcs=2, num_heads=8, num_mask_fcs=3, feedforward_channels=2048, in_channels=256, out_channels=256, dropout=0.0, act_cfg={'inplace': True, 'type': 'ReLU'}, ffn_act_cfg={'inplace': True, 'type': 'ReLU'}, conv_kernel_size=1, feat_transform_cfg=None, kernel_init=False, with_ffn=True, feat_gather_stride=1, mask_transform_stride=1, kernel_updator_cfg={'act_cfg': {'inplace': True, 'type': 'ReLU'}, 'feat_channels': 64, 'in_channels': 256, 'norm_cfg': {'type': 'LN'}, 'out_channels': 256, 'type': 'DynamicConv'})[source]

Kernel Update Head in K-Net.

Parameters
  • num_classes (int) – Number of classes. Default: 150.

  • num_ffn_fcs (int) – The number of fully-connected layers in FFNs. Default: 2.

  • num_heads (int) – The number of parallel attention heads. Default: 8.

  • num_mask_fcs (int) – The number of fully connected layers for mask prediction. Default: 3.

  • feedforward_channels (int) – The hidden dimension of FFNs. Defaults: 2048.

  • in_channels (int) – The number of channels of input feature map. Default: 256.

  • out_channels (int) – The number of output channels. Default: 256.

  • dropout (float) – The probability of an element to be zeroed in MultiheadAttention and FFN. Default: 0.0.

  • act_cfg (dict) – Config of activation layers. Default: dict(type='ReLU', inplace=True).

  • ffn_act_cfg (dict) – Config of activation layers in FFN. Default: dict(type='ReLU', inplace=True).

  • conv_kernel_size (int) – The kernel size of convolution in Kernel Update Head for dynamic kernel update. Default: 1.

  • feat_transform_cfg (dict | None) – Config of feature transform. Default: None.

  • kernel_init (bool) – Whether to initialize the mask kernel in the mask head. Default: False.

  • with_ffn (bool) – Whether to add FFN in the kernel update head. Default: True.

  • feat_gather_stride (int) – Stride of convolution in feature transform. Default: 1.

  • mask_transform_stride (int) – Stride of mask transform. Default: 1.

  • kernel_updator_cfg (dict) – Config of kernel updator. Default: dict(type='DynamicConv', in_channels=256, feat_channels=64, out_channels=256, act_cfg=dict(type='ReLU', inplace=True), norm_cfg=dict(type='LN')).

forward(x, proposal_feat, mask_preds, mask_shape=None)[source]

Forward function of Dynamic Instance Interactive Head.

Parameters
  • x (Tensor) – Feature map from FPN with shape (batch_size, feature_dimensions, H, W).

  • proposal_feat (Tensor) – Intermediate feature obtained from diihead in the last stage, with shape (batch_size, num_proposals, feature_dimensions).

  • mask_preds (Tensor) – Mask prediction from the former stage, with shape (batch_size, num_proposals, H, W).

Returns

The first tensor is the predicted mask with shape (N, num_classes, H, W), the second tensor is the dynamic kernel with shape (N, num_classes, channels, K, K).

Return type

Tuple

init_weights()[source]

Use Xavier initialization for all weight parameters and set the classification head bias to a specific value when focal loss is used.

class mmseg.models.decode_heads.KernelUpdator(in_channels=256, feat_channels=64, out_channels=None, gate_sigmoid=True, gate_norm_act=False, activate_out=False, norm_cfg={'type': 'LN'}, act_cfg={'inplace': True, 'type': 'ReLU'})[source]

Dynamic Kernel Updator in Kernel Update Head.

Parameters
  • in_channels (int) – The number of channels of input feature map. Default: 256.

  • feat_channels (int) – The number of middle-stage channels in the kernel updator. Default: 64.

  • out_channels (int) – The number of output channels.

  • gate_sigmoid (bool) – Whether to use the sigmoid function in the gate mechanism. Default: True.

  • gate_norm_act (bool) – Whether to add normalization and activation layers in the gate mechanism. Default: False.

  • activate_out (bool) – Whether to add an activation after the gate mechanism. Default: False.

  • norm_cfg (dict | None) – Config of norm layers. Default: dict(type=’LN’).

  • act_cfg (dict) – Config of activation layers. Default: dict(type=’ReLU’).

forward(update_feature, input_feature)[源代码]

Forward function of KernelUpdator.

参数
  • update_feature (torch.Tensor) – Feature map assembled from each group. It would be reshaped with last dimension shape: self.in_channels.

  • input_feature (torch.Tensor) – Intermediate feature with shape: (N, num_classes, conv_kernel_size**2, channels).

返回

The output tensor of shape (N*C1/C2, K*K, C2), where N is the number of classes, C1 and C2 are the feature map channels of KernelUpdateHead and KernelUpdator, respectively.

返回类型

Tensor

class mmseg.models.decode_heads.LRASPPHead(branch_channels=(32, 64), **kwargs)[源代码]

Lite R-ASPP (LRASPP) head is proposed in Searching for MobileNetV3.

This head is an improved implementation of the head described in Searching for MobileNetV3.

参数

branch_channels (tuple[int]) – The number of output channels in each branch. Default: (32, 64).

forward(inputs)[源代码]

Forward function.

class mmseg.models.decode_heads.NLHead(reduction=2, use_scale=True, mode='embedded_gaussian', **kwargs)[源代码]

Non-local Neural Networks.

This head is the implementation of NLNet.

参数
  • reduction (int) – Reduction factor of projection transform. Default: 2.

  • use_scale (bool) – Whether to scale pairwise_weight by sqrt(1/inter_channels). Default: True.

  • mode (str) – The nonlocal mode. Options are ‘embedded_gaussian’ and ‘dot_product’. Default: ‘embedded_gaussian’.

forward(inputs)[源代码]

Forward function.
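To make the use_scale option concrete, here is a minimal pure-Python sketch of the ‘embedded_gaussian’ pairwise weighting as it is usually formulated for non-local blocks: dot products between query and key embeddings, optionally scaled by sqrt(1/inter_channels), then a softmax over positions. The function name and the list-based inputs are illustrative assumptions, not the library's code.

```python
import math

def embedded_gaussian_weights(theta, phi, use_scale=True):
    """Return softmax-normalised pairwise weights.

    theta, phi: lists of per-position embeddings, each of length inter_channels.
    This is an illustrative sketch, not the NLHead implementation.
    """
    inter_channels = len(theta[0])
    weights = []
    for query in theta:
        # Dot product between this query and every key position.
        scores = [sum(q * k for q, k in zip(query, key)) for key in phi]
        if use_scale:
            # Scale by sqrt(1 / inter_channels), as described for use_scale.
            scores = [s / math.sqrt(inter_channels) for s in scores]
        # Numerically stable softmax over key positions.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

theta = [[1.0, 0.0], [0.0, 2.0]]
phi = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
pairwise = embedded_gaussian_weights(theta, phi)
```

Because all three keys in this toy input are identical, every query attends uniformly, which makes the normalisation easy to check by eye.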

class mmseg.models.decode_heads.OCRHead(ocr_channels, scale=1, **kwargs)[源代码]

Object-Contextual Representations for Semantic Segmentation.

This head is the implementation of OCRNet.

参数
  • ocr_channels (int) – The intermediate channels of OCR block.

  • scale (int) – The scale of the probability map in SpatialGatherModule. Default: 1.

forward(inputs, prev_output)[源代码]

Forward function.

class mmseg.models.decode_heads.PSAHead(mask_size, psa_type='bi-direction', compact=False, shrink_factor=2, normalization_factor=1.0, psa_softmax=True, **kwargs)[源代码]

Point-wise Spatial Attention Network for Scene Parsing.

This head is the implementation of PSANet.

参数
  • mask_size (tuple[int]) – The PSA mask size. It usually equals input size.

  • psa_type (str) – The type of psa module. Options are ‘collect’, ‘distribute’, ‘bi-direction’. Default: ‘bi-direction’.

  • compact (bool) – Whether to use a compact map for ‘collect’ mode. Default: False.

  • shrink_factor (int) – The downsample factor of the psa mask. Default: 2.

  • normalization_factor (float) – The normalization factor of attention. Default: 1.0.

  • psa_softmax (bool) – Whether to use softmax for attention. Default: True.

forward(inputs)[源代码]

Forward function.

class mmseg.models.decode_heads.PSPHead(pool_scales=(1, 2, 3, 6), **kwargs)[源代码]

Pyramid Scene Parsing Network.

This head is the implementation of PSPNet.

参数

pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module. Default: (1, 2, 3, 6).

forward(inputs)[源代码]

Forward function.
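As a rough illustration of what pool_scales controls, the sketch below average-pools one 2D feature map to each scale in (1, 2, 3, 6). It is pure Python on nested lists and assumes the bin boundaries used by adaptive average pooling (start = floor(i * size / scale), end = ceil((i + 1) * size / scale)); the helper names are hypothetical, and the real head additionally applies convolutions and upsampling to each branch.

```python
import math

def adaptive_avg_pool2d(feature, out_size):
    """Average-pool an H x W nested list down to out_size x out_size."""
    height, width = len(feature), len(feature[0])
    pooled = []
    for i in range(out_size):
        # Row range covered by output bin i.
        r0 = (i * height) // out_size
        r1 = math.ceil((i + 1) * height / out_size)
        row = []
        for j in range(out_size):
            # Column range covered by output bin j.
            c0 = (j * width) // out_size
            c1 = math.ceil((j + 1) * width / out_size)
            cells = [feature[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

def pyramid_pool(feature, pool_scales=(1, 2, 3, 6)):
    """Return one pooled map per scale, keyed by the scale."""
    return {scale: adaptive_avg_pool2d(feature, scale) for scale in pool_scales}

# A 6 x 6 toy feature map holding the values 0..35.
feature = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
pyramid = pyramid_pool(feature)
```

Scale 1 yields a single global average, while scale 6 on a 6 x 6 input reproduces the input itself.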

class mmseg.models.decode_heads.PointHead(num_fcs=3, coarse_pred_each_layer=True, conv_cfg={'type': 'Conv1d'}, norm_cfg=None, act_cfg={'inplace': False, 'type': 'ReLU'}, **kwargs)[源代码]

A mask point head used in PointRend.

This head is the implementation of PointRend: Image Segmentation as Rendering. PointHead uses a shared multi-layer perceptron (equivalent to nn.Conv1d) to predict the logits of input points. The fine-grained features and coarse features are concatenated for prediction.

参数
  • num_fcs (int) – Number of fc layers in the head. Default: 3.

  • in_channels (int) – Number of input channels. Default: 256.

  • fc_channels (int) – Number of fc channels. Default: 256.

  • num_classes (int) – Number of classes for logits. Default: 80.

  • class_agnostic (bool) – Whether use class agnostic classification. If so, the output channels of logits will be 1. Default: False.

  • coarse_pred_each_layer (bool) – Whether concatenate coarse feature with the output of each fc layer. Default: True.

  • conv_cfg (dict|None) – Dictionary to construct and config conv layer. Default: dict(type='Conv1d').

  • norm_cfg (dict|None) – Dictionary to construct and config norm layer. Default: None.

  • loss_point (dict) – Dictionary to construct and config loss layer of point head. Default: dict(type=’CrossEntropyLoss’, use_mask=True, loss_weight=1.0).

cls_seg(feat)[源代码]

Classify each pixel with fc.

forward(fine_grained_point_feats, coarse_point_feats)[源代码]

Placeholder of forward function.

forward_test(inputs, prev_output, img_metas, test_cfg)[源代码]

Forward function for testing.

参数
  • inputs (list[Tensor]) – List of multi-level img features.

  • prev_output (Tensor) – The output of previous decode head.

  • img_metas (list[dict]) – List of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:Collect.

  • test_cfg (dict) – The testing config.

返回

Output segmentation map.

返回类型

Tensor

forward_train(inputs, prev_output, img_metas, gt_semantic_seg, train_cfg)[源代码]

Forward function for training.

Parameters
  • inputs (list[Tensor]) – List of multi-level img features.

  • prev_output (Tensor) – The output of previous decode head.

  • img_metas (list[dict]) – List of image info dict where each dict has: ‘img_shape’, ‘scale_factor’, ‘flip’, and may also contain ‘filename’, ‘ori_shape’, ‘pad_shape’, and ‘img_norm_cfg’. For details on the values of these keys see mmseg/datasets/pipelines/formatting.py:Collect.

  • gt_semantic_seg (Tensor) – Semantic segmentation masks used if the architecture supports semantic segmentation task.

  • train_cfg (dict) – The training config.

返回

a dictionary of loss components

返回类型

dict[str, Tensor]

get_points_test(seg_logits, uncertainty_func, cfg)[源代码]

Sample points for testing.

Find num_points most uncertain points from uncertainty_map.

参数
  • seg_logits (Tensor) – A tensor of shape (batch_size, num_classes, height, width) for class-specific or class-agnostic prediction.

  • uncertainty_func (func) – uncertainty calculation function.

  • cfg (dict) – Testing config of point head.

Returns

A tuple (point_indices, point_coords). point_indices (Tensor) is a tensor of shape (batch_size, num_points) that contains indices from [0, height x width) of the most uncertain points. point_coords (Tensor) is a tensor of shape (batch_size, num_points, 2) that contains [0, 1] x [0, 1] normalized coordinates of the most uncertain points from the height x width grid.

Return type

tuple[Tensor, Tensor]
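The relationship between the flat indices and the normalized coordinates can be sketched for a single image as follows. This is a pure-Python illustration, not the library code: the function name is hypothetical, and the (x, y) ordering with pixel-centre offsets is an assumption borrowed from the usual PointRend convention.

```python
def most_uncertain_points(uncertainty_map, num_points):
    """Pick the num_points highest-uncertainty cells of an H x W nested list.

    Returns (point_indices, point_coords): flat indices in [0, H * W) and
    the matching (x, y) pixel-centre coordinates normalised to [0, 1].
    """
    height, width = len(uncertainty_map), len(uncertainty_map[0])
    flat = [value for row in uncertainty_map for value in row]
    num_points = min(height * width, num_points)
    # Indices of the largest uncertainty values, most uncertain first.
    point_indices = sorted(
        range(len(flat)), key=lambda i: flat[i], reverse=True
    )[:num_points]
    # Convert each flat index to the centre of its cell in [0, 1] x [0, 1].
    point_coords = [
        ((idx % width + 0.5) / width, (idx // width + 0.5) / height)
        for idx in point_indices
    ]
    return point_indices, point_coords

uncertainty_map = [[0.1, 0.9],
                   [0.5, 0.2]]
indices, coords = most_uncertain_points(uncertainty_map, num_points=2)
```

In the toy map the two most uncertain cells are the top-right and bottom-left ones, so the coordinates land at the centres of those cells.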

get_points_train(seg_logits, uncertainty_func, cfg)[源代码]

Sample points for training.

Sample points in [0, 1] x [0, 1] coordinate space based on their uncertainty. The uncertainties are calculated for each point using ‘uncertainty_func’ function that takes point’s logit prediction as input.

参数
  • seg_logits (Tensor) – Semantic segmentation logits, shape ( batch_size, num_classes, height, width).

  • uncertainty_func (func) – uncertainty calculation function.

  • cfg (dict) – Training config of point head.

Returns

point_coords, a tensor of shape (batch_size, num_points, 2) that contains the coordinates of num_points sampled points.

Return type

Tensor

losses(point_logits, point_label)[源代码]

Compute segmentation loss.

class mmseg.models.decode_heads.SETRMLAHead(mla_channels=128, up_scale=4, **kwargs)[源代码]

Multi-level feature aggregation head of SETR.

MLA head of SETR.

参数
  • mla_channels (int) – Channels of conv-conv-4x of multi-level feature aggregation. Default: 128.

  • up_scale (int) – The scale factor of interpolate. Default: 4.

forward(inputs)[源代码]

Placeholder of forward function.

class mmseg.models.decode_heads.SETRUPHead(norm_layer={'eps': 1e-06, 'requires_grad': True, 'type': 'LN'}, num_convs=1, up_scale=4, kernel_size=3, init_cfg=[{'type': 'Constant', 'val': 1.0, 'bias': 0, 'layer': 'LayerNorm'}, {'type': 'Normal', 'std': 0.01, 'override': {'name': 'conv_seg'}}], **kwargs)[源代码]

Naive upsampling head and Progressive upsampling head of SETR.

Naive or PUP head of SETR.

参数
  • norm_layer (dict) – Config dict for input normalization. Default: norm_layer=dict(type=’LN’, eps=1e-6, requires_grad=True).

  • num_convs (int) – Number of decoder convolutions. Default: 1.

  • up_scale (int) – The scale factor of interpolate. Default: 4.

  • kernel_size (int) – The kernel size of convolution when decoding feature information from backbone. Default: 3.

  • init_cfg (dict | list[dict] | None) – Initialization config dict. Default: [dict(type='Constant', val=1.0, bias=0, layer='LayerNorm'), dict(type='Normal', std=0.01, override=dict(name='conv_seg'))].

forward(x)[源代码]

Placeholder of forward function.

class mmseg.models.decode_heads.STDCHead(boundary_threshold=0.1, **kwargs)[源代码]

This head is the implementation of Rethinking BiSeNet For Real-time Semantic Segmentation.

参数

boundary_threshold (float) – The threshold of calculating boundary. Default: 0.1.

losses(seg_logit, seg_label)[源代码]

Compute Detail Aggregation Loss.

class mmseg.models.decode_heads.SegformerHead(interpolate_mode='bilinear', **kwargs)[源代码]

The all-MLP head of SegFormer.

This head is the implementation of SegFormer <https://arxiv.org/abs/2105.15203>.

参数

interpolate_mode (str) – The interpolate mode of the MLP head upsample operation. Default: ‘bilinear’.

forward(inputs)[源代码]

Placeholder of forward function.

class mmseg.models.decode_heads.SegmenterMaskTransformerHead(in_channels, num_layers, num_heads, embed_dims, mlp_ratio=4, drop_path_rate=0.1, drop_rate=0.0, attn_drop_rate=0.0, num_fcs=2, qkv_bias=True, act_cfg={'type': 'GELU'}, norm_cfg={'type': 'LN'}, init_std=0.02, **kwargs)[源代码]

Segmenter: Transformer for Semantic Segmentation.

This head is the implementation of Segmenter <https://arxiv.org/abs/2105.05633>.

参数
  • backbone_cfg (dict) – Config of backbone of Context Path.

  • in_channels (int) – The number of channels of input image.

  • num_layers (int) – The depth of transformer.

  • num_heads (int) – The number of attention heads.

  • embed_dims (int) – The number of embedding dimension.

  • mlp_ratio (int) – Ratio of mlp hidden dim to embedding dim. Default: 4.

  • drop_path_rate (float) – Stochastic depth rate. Default: 0.1.

  • drop_rate (float) – Probability of an element to be zeroed. Default: 0.0.

  • attn_drop_rate (float) – The dropout rate for the attention layer. Default: 0.0.

  • num_fcs (int) – The number of fully-connected layers for FFNs. Default: 2.

  • qkv_bias (bool) – Enable bias for qkv if True. Default: True.

  • act_cfg (dict) – The activation config for FFNs. Default: dict(type=’GELU’).

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’LN’)

  • init_std (float) – The value of std in weight initialization. Default: 0.02.

forward(inputs)[源代码]

Placeholder of forward function.

init_weights()[源代码]

Initialize the weights.

class mmseg.models.decode_heads.UPerHead(pool_scales=(1, 2, 3, 6), **kwargs)[源代码]

Unified Perceptual Parsing for Scene Understanding.

This head is the implementation of UPerNet.

参数

pool_scales (tuple[int]) – Pooling scales used in Pooling Pyramid Module applied on the last feature. Default: (1, 2, 3, 6).

forward(inputs)[源代码]

Forward function.

psp_forward(inputs)[源代码]

Forward function of PSP module.

losses

class mmseg.models.losses.Accuracy(topk=(1), thresh=None, ignore_index=None)[源代码]

Accuracy calculation module.

forward(pred, target)[源代码]

Forward function to calculate accuracy.

参数
  • pred (torch.Tensor) – Prediction of models.

  • target (torch.Tensor) – Target for each prediction.

返回

The accuracies under different topk criteria.

返回类型

tuple[float]

class mmseg.models.losses.CrossEntropyLoss(use_sigmoid=False, use_mask=False, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_ce', avg_non_ignore=False)[源代码]

CrossEntropyLoss.

参数
  • use_sigmoid (bool, optional) – Whether the prediction uses sigmoid instead of softmax. Defaults to False.

  • use_mask (bool, optional) – Whether to use mask cross entropy loss. Defaults to False.

  • reduction (str, optional) – The method used to reduce the loss. Defaults to ‘mean’. Options are “none”, “mean” and “sum”.

  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_weight (float, optional) – Weight of the loss. Defaults to 1.0.

  • loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_ce’.

  • avg_non_ignore (bool) – Whether the loss is averaged only over non-ignored targets. Default: False. New in version 0.23.0.

extra_repr()[源代码]

Extra repr.

forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, ignore_index=-100, **kwargs)[source]

Forward function.

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

返回

The name of this loss item.

返回类型

str
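The naming rule above can be pictured with a small sketch: items whose name starts with loss_ are summed into the value that is back-propagated, while other items are only logged. The helper below is hypothetical and simply follows the rule as this page states it; the framework's actual aggregation code may match names differently.

```python
def total_loss(outputs):
    """Sum every item whose name starts with 'loss_' (hypothetical helper)."""
    return sum(
        value for name, value in outputs.items() if name.startswith('loss_')
    )

# 'acc_seg' is a metric, so it is left out of the summed loss.
outputs = {'loss_ce': 0.7, 'loss_dice': 0.2, 'acc_seg': 91.5}
total = total_loss(outputs)
```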

class mmseg.models.losses.DiceLoss(smooth=1, exponent=2, reduction='mean', class_weight=None, loss_weight=1.0, ignore_index=255, loss_name='loss_dice', **kwargs)[源代码]

DiceLoss.

This loss is proposed in V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation.

参数
  • smooth (float) – A float number to smooth loss, and avoid NaN error. Default: 1

  • exponent (float) – A float number used to calculate the denominator value: sum{x^exponent} + sum{y^exponent}. Default: 2.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. This parameter only works when per_image is True. Default: ‘mean’.

  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_weight (float, optional) – Weight of the loss. Default to 1.0.

  • ignore_index (int | None) – The label index to be ignored. Default: 255.

  • loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_dice’.

forward(pred, target, avg_factor=None, reduction_override=None, **kwargs)[源代码]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str
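The roles of smooth and exponent in DiceLoss can be seen in a minimal sketch for one class. It assumes flattened probability and one-hot target lists and uses the denominator given in the parameter description, sum{x^exponent} + sum{y^exponent}; class weighting, ignore_index handling and reduction are left out, and the function name is illustrative.

```python
def binary_dice_loss(pred, target, smooth=1.0, exponent=2):
    """Dice loss for one class on flattened lists of equal length."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = (
        sum(p ** exponent for p in pred) + sum(t ** exponent for t in target)
    )
    # `smooth` keeps the ratio defined when both inputs are all zeros.
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)

perfect = binary_dice_loss([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0])
empty = binary_dice_loss([0.0, 0.0], [0.0, 0.0])
wrong = binary_dice_loss([1.0, 1.0], [0.0, 0.0])
```

With smooth=1 an empty prediction on an empty target gives a loss of 0 rather than a division by zero, which is the NaN case the parameter is meant to avoid.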

class mmseg.models.losses.FocalLoss(use_sigmoid=True, gamma=2.0, alpha=0.5, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_focal')[source]

forward(pred, target, weight=None, avg_factor=None, reduction_override=None, ignore_index=255, **kwargs)[source]

Forward function.

参数
  • pred (torch.Tensor) – The prediction with shape (N, C) where C = number of classes, or (N, C, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss.

  • target (torch.Tensor) – The ground truth. If containing class indices, shape (N) where each value is 0≤targets[i]≤C−1, or (N, d_1, d_2, …, d_K) with K≥1 in the case of K-dimensional loss. If containing class probabilities, same shape as the input.

  • weight (torch.Tensor, optional) – The weight of loss for each prediction. Defaults to None.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • reduction_override (str, optional) – The reduction method used to override the original reduction method of the loss. Options are “none”, “mean” and “sum”.

  • ignore_index (int, optional) – The label index to be ignored. Default: 255

返回

The calculated loss

返回类型

torch.Tensor

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str
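The FocalLoss constructor arguments gamma and alpha are not described on this page, so the sketch below shows how they enter the standard sigmoid focal loss for one binary target. It follows the textbook formula, -alpha_t * (1 - p_t)^gamma * log(p_t), and should be read as an assumption about the general technique rather than as this class's exact implementation.

```python
import math

def sigmoid_focal_loss(logit, target, gamma=2.0, alpha=0.5):
    """Focal loss for a single logit and a binary target (0 or 1)."""
    prob = 1.0 / (1.0 + math.exp(-logit))
    # p_t is the probability assigned to the true class.
    p_t = prob if target == 1 else 1.0 - prob
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = sigmoid_focal_loss(4.0, 1)   # confident and correct
hard = sigmoid_focal_loss(0.0, 1)   # undecided
plain = sigmoid_focal_loss(0.0, 1, gamma=0.0)
```

Setting gamma to 0 removes the modulating factor, leaving binary cross entropy weighted by alpha.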

class mmseg.models.losses.LovaszLoss(loss_type='multi_class', classes='present', per_image=False, reduction='mean', class_weight=None, loss_weight=1.0, loss_name='loss_lovasz')[源代码]

LovaszLoss.

This loss is proposed in The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks.

参数
  • loss_type (str, optional) – Binary or multi-class loss. Default: ‘multi_class’. Options are “binary” and “multi_class”.

  • classes (str | list[int], optional) – Classes chosen to calculate loss. ‘all’ for all classes, ‘present’ for classes present in labels, or a list of classes to average. Default: ‘present’.

  • per_image (bool, optional) – If per_image is True, compute the loss per image instead of per batch. Default: False.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”. This parameter only works when per_image is True. Default: ‘mean’.

  • class_weight (list[float] | str, optional) – Weight of each class. If in str format, read them from a file. Defaults to None.

  • loss_weight (float, optional) – Weight of the loss. Defaults to 1.0.

  • loss_name (str, optional) – Name of the loss item. If you want this loss item to be included into the backward graph, loss_ must be the prefix of the name. Defaults to ‘loss_lovasz’.

forward(cls_score, label, weight=None, avg_factor=None, reduction_override=None, **kwargs)[源代码]

Forward function.

property loss_name

Loss Name.

This function must be implemented and will return the name of this loss function. This name will be used to combine different loss items by simple sum operation. In addition, if you want this loss item to be included into the backward graph, loss_ must be the prefix of the name.

Returns

The name of this loss item.

Return type

str
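For readers unfamiliar with the Lovasz extension, the following pure-Python sketch computes the Lovasz-Softmax term for a single class, following the algorithm in the paper cited above: sort the per-pixel errors in decreasing order and weight them by the gradient of the Jaccard loss. Function names and the list-based inputs are illustrative assumptions; the class itself also handles the classes, per_image and class_weight options.

```python
def lovasz_grad(gt_sorted):
    """Gradient of the Lovasz extension w.r.t. sorted errors.

    gt_sorted: 0/1 ground-truth values ordered by decreasing error.
    """
    total = sum(gt_sorted)
    grad, cum_pos, previous = [], 0, 0.0
    for i, gt in enumerate(gt_sorted):
        cum_pos += gt
        intersection = total - cum_pos
        union = total + (i + 1 - cum_pos)
        jaccard = 1.0 - intersection / union
        # Each entry is the increase in Jaccard loss at this position.
        grad.append(jaccard - previous)
        previous = jaccard
    return grad

def lovasz_softmax_single_class(probs, labels):
    """Lovasz-Softmax term for one class on flattened lists."""
    errors = [abs(label - prob) for prob, label in zip(probs, labels)]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    grad = lovasz_grad([labels[i] for i in order])
    return sum(errors[i] * g for i, g in zip(order, grad))

perfect = lovasz_softmax_single_class([1.0, 0.0, 1.0], [1, 0, 1])
wrong = lovasz_softmax_single_class([0.0, 1.0], [1, 0])
```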

mmseg.models.losses.accuracy(pred, target, topk=1, thresh=None, ignore_index=None)[源代码]

Calculate accuracy according to the prediction and target.

参数
  • pred (torch.Tensor) – The model prediction, shape (N, num_class, …)

  • target (torch.Tensor) – The target of each prediction, shape (N, …)

  • ignore_index (int | None) – The label index to be ignored. Default: None

  • topk (int | tuple[int], optional) – If the predictions in topk matches the target, the predictions will be regarded as correct ones. Defaults to 1.

  • thresh (float, optional) – If not None, predictions with scores under this threshold are considered incorrect. Defaults to None.

Returns

If the input topk is a single integer, the function will return a single float as accuracy. If topk is a tuple containing multiple integers, the function will return a tuple containing accuracies of each topk number.

返回类型

float | tuple[float]
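How topk, thresh and ignore_index interact is sketched below for a single integer topk on plain Python lists. The function name is hypothetical, and returning the result as a percentage is an assumption; the sketch only mirrors the rules stated in the parameter list.

```python
def topk_accuracy(pred, target, topk=1, thresh=None, ignore_index=None):
    """Top-k accuracy in percent.

    pred: list of N score lists (one score per class).
    target: list of N integer labels.
    """
    correct = 0
    counted = 0
    for scores, label in zip(pred, target):
        if ignore_index is not None and label == ignore_index:
            continue  # ignored labels do not count towards the total
        counted += 1
        ranked = sorted(
            range(len(scores)), key=lambda c: scores[c], reverse=True
        )[:topk]
        # Correct if the label is in the top-k and passes the threshold.
        if label in ranked and (thresh is None or scores[label] > thresh):
            correct += 1
    return 100.0 * correct / counted if counted else 0.0

pred = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
target = [1, 1, 255]
top1 = topk_accuracy(pred, target, topk=1, ignore_index=255)
top2 = topk_accuracy(pred, target, topk=2, ignore_index=255)
top2_thresh = topk_accuracy(pred, target, topk=2, thresh=0.6, ignore_index=255)
```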

mmseg.models.losses.binary_cross_entropy(pred, label, weight=None, reduction='mean', avg_factor=None, class_weight=None, ignore_index=-100, avg_non_ignore=False, **kwargs)[source]

Calculate the binary CrossEntropy loss.

参数
  • pred (torch.Tensor) – The prediction with shape (N, 1).

  • label (torch.Tensor) – The learning label of the prediction. Note: In bce loss, label < 0 is invalid.

  • weight (torch.Tensor, optional) – Sample-wise loss weight.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • class_weight (list[float], optional) – The weight for each class.

  • ignore_index (int) – The label index to be ignored. Default: -100.

  • avg_non_ignore (bool) – Whether the loss is averaged only over non-ignored targets. Default: False. New in version 0.23.0.

返回

The calculated loss

返回类型

torch.Tensor

mmseg.models.losses.cross_entropy(pred, label, weight=None, class_weight=None, reduction='mean', avg_factor=None, ignore_index=-100, avg_non_ignore=False)[source]

The wrapper function for F.cross_entropy().

参数
  • pred (torch.Tensor) – The prediction with shape (N, C, …), where C is the number of classes.

  • label (torch.Tensor) – The learning label of the prediction.

  • weight (torch.Tensor, optional) – Sample-wise loss weight. Default: None.

  • class_weight (list[float], optional) – The weight for each class. Default: None.

  • reduction (str, optional) – The method used to reduce the loss. Options are ‘none’, ‘mean’ and ‘sum’. Default: ‘mean’.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Default: None.

  • ignore_index (int) – Specifies a target value that is ignored and does not contribute to the input gradients. When avg_non_ignore is True and the reduction is ‘mean’, the loss is averaged over non-ignored targets. Default: -100.

  • avg_non_ignore (bool) – Whether the loss is averaged only over non-ignored targets. Default: False. New in version 0.23.0.
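The effect of avg_non_ignore is easiest to see on a tiny example. The sketch below is a pure-Python softmax cross entropy with ‘mean’ reduction only; it is not the wrapper itself, and sample weights, class weights and avg_factor are omitted.

```python
import math

def cross_entropy_mean(logits, labels, ignore_index=-100, avg_non_ignore=False):
    """Mean softmax cross entropy over a list of samples."""
    losses = []
    for scores, label in zip(logits, labels):
        if label == ignore_index:
            losses.append(0.0)  # ignored targets contribute no loss
            continue
        top = max(scores)
        log_sum_exp = top + math.log(sum(math.exp(s - top) for s in scores))
        losses.append(log_sum_exp - scores[label])
    valid = sum(1 for label in labels if label != ignore_index)
    # Divide by all samples, or only by the non-ignored ones.
    denominator = valid if avg_non_ignore else len(labels)
    return sum(losses) / denominator if denominator else 0.0

logits = [[0.0, 0.0], [0.0, 0.0]]
labels = [0, -100]
over_all = cross_entropy_mean(logits, labels)
over_valid = cross_entropy_mean(logits, labels, avg_non_ignore=True)
```

With one of two targets ignored, averaging over all targets halves the reported loss compared with averaging over the non-ignored ones.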

mmseg.models.losses.mask_cross_entropy(pred, target, label, reduction='mean', avg_factor=None, class_weight=None, ignore_index=None, **kwargs)[源代码]

Calculate the CrossEntropy loss for masks.

参数
  • pred (torch.Tensor) – The prediction with shape (N, C), C is the number of classes.

  • target (torch.Tensor) – The learning label of the prediction.

  • label (torch.Tensor) – The class label of the object corresponding to each mask. It is used to select the mask of the class the object belongs to when the mask prediction is not class-agnostic.

  • reduction (str, optional) – The method used to reduce the loss. Options are “none”, “mean” and “sum”.

  • avg_factor (int, optional) – Average factor that is used to average the loss. Defaults to None.

  • class_weight (list[float], optional) – The weight for each class.

  • ignore_index (None) – Placeholder, to be consistent with other loss. Default: None.

返回

The calculated loss

返回类型

torch.Tensor

mmseg.models.losses.reduce_loss(loss, reduction)[源代码]

Reduce loss as specified.

参数
  • loss (Tensor) – Elementwise loss tensor.

  • reduction (str) – Options are “none”, “mean” and “sum”.

返回

Reduced loss tensor.

返回类型

Tensor

mmseg.models.losses.weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None)[源代码]

Apply element-wise weight and reduce loss.

参数
  • loss (Tensor) – Element-wise loss.

  • weight (Tensor) – Element-wise weights.

  • reduction (str) – Same as built-in losses of PyTorch.

  • avg_factor (float) – Average factor when computing the mean of losses.

返回

Processed loss values.

返回类型

Tensor
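A list-based sketch of the weighting and reduction steps is given below. It is an illustrative re-implementation under the assumption that avg_factor replaces the element count in ‘mean’ reduction and is not allowed with ‘sum’; the real function operates on tensors.

```python
def weight_reduce_loss(loss, weight=None, reduction='mean', avg_factor=None):
    """Apply element-wise weights to a list of losses, then reduce."""
    if weight is not None:
        loss = [value * w for value, w in zip(loss, weight)]
    if avg_factor is None:
        if reduction == 'mean':
            return sum(loss) / len(loss)
        if reduction == 'sum':
            return sum(loss)
        return loss  # reduction == 'none'
    if reduction == 'mean':
        # avg_factor replaces the number of elements as the divisor.
        return sum(loss) / avg_factor
    if reduction == 'none':
        return loss
    raise ValueError('avg_factor can not be used with reduction="sum"')

elementwise = [1.0, 1.0, 2.0]
weights = [1.0, 0.0, 1.0]
plain_mean = weight_reduce_loss(elementwise)
weighted_mean = weight_reduce_loss(elementwise, weights)
with_factor = weight_reduce_loss(elementwise, weights, avg_factor=2)
```

The inputs are the element-wise L1 values from the weighted_loss example on this page, so the three results line up with the 1.3333, 1. and 1.5000 shown there.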

mmseg.models.losses.weighted_loss(loss_func)[源代码]

Create a weighted version of a given loss function.

To use this decorator, the loss function must have the signature like loss_func(pred, target, **kwargs). The function only needs to compute element-wise loss without any reduction. This decorator will add weight and reduction arguments to the function. The decorated function will have the signature like loss_func(pred, target, weight=None, reduction=’mean’, avg_factor=None, **kwargs).

Example

>>> import torch
>>> @weighted_loss
... def l1_loss(pred, target):
...     return (pred - target).abs()
>>> pred = torch.Tensor([0, 2, 3])
>>> target = torch.Tensor([1, 1, 1])
>>> weight = torch.Tensor([1, 0, 1])
>>> l1_loss(pred, target)
tensor(1.3333)
>>> l1_loss(pred, target, weight)
tensor(1.)
>>> l1_loss(pred, target, reduction='none')
tensor([1., 1., 2.])
>>> l1_loss(pred, target, weight, avg_factor=2)
tensor(1.5000)