Training Engine

MMEngine defines several basic loop controllers, such as the epoch-based training loop (EpochBasedTrainLoop), the iteration-based training loop (IterBasedTrainLoop), the standard validation loop (ValLoop), and the standard testing loop (TestLoop).

OpenMMLab’s algorithm libraries, such as MMSegmentation, abstract model training, testing, and inference into a Runner. Users can use the default Runner in MMEngine directly or modify it to meet customized needs. This document mainly introduces how users can configure existing runtime settings, hooks, and optimizers, covering their basic concepts and usage.

Configuring Runtime Settings

Configuring Training Iterations

Loop controllers refer to the execution process during training, validation, and testing. train_cfg, val_cfg, and test_cfg are used to build these processes in the configuration file. MMSegmentation sets commonly used training iterations in train_cfg under the configs/_base_/schedules folder. For example, to train for 80,000 iterations using the iteration-based training loop (IterBasedTrainLoop) and perform validation every 8,000 iterations, you can set it as follows:

train_cfg = dict(type='IterBasedTrainLoop', max_iters=80000, val_interval=8000)
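
If you prefer epoch-based training instead, the epoch-based loop can be configured in the same way. A minimal sketch, with illustrative values rather than numbers shipped with MMSegmentation:

train_cfg = dict(type='EpochBasedTrainLoop', max_epochs=100, val_interval=10)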

Configuring Training Optimizers

Here is an example of an SGD optimizer:

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005),
    clip_grad=None)

OpenMMLab supports all optimizers in PyTorch. For more details, please refer to the MMEngine optimizer documentation.

It is worth emphasizing that optim_wrapper is a variable of runner, so when configuring the optimizer, the field to configure is the optim_wrapper field. For more information on using optimizers, see the Optimizer section below.

Configuring Training Parameter Schedulers

Before configuring the training parameter scheduler, it is recommended to first understand the basic concepts of parameter schedulers in the MMEngine documentation.

Here is an example of a parameter scheduler. During training, a linearly changing learning rate is used for warm-up in the first 1,000 iterations. From the 1,000th iteration until the final 160,000th iteration, the default polynomial learning rate decay (PolyLR) is used:

param_scheduler = [
    dict(type='LinearLR', by_epoch=False, start_factor=0.1, begin=0, end=1000),
    dict(
        type='PolyLR',
        eta_min=1e-4,
        power=0.9,
        begin=1000,
        end=160000,
        by_epoch=False,
    )
]

Note: When modifying the max_iters in train_cfg, make sure the parameters in the parameter scheduler param_scheduler are also modified accordingly.
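
For example, if max_iters is reduced to 40,000 (an illustrative value), the end fields of the schedulers should shrink with it; a sketch:

train_cfg = dict(type='IterBasedTrainLoop', max_iters=40000, val_interval=4000)
param_scheduler = [
    dict(type='LinearLR', by_epoch=False, start_factor=0.1, begin=0, end=1000),
    dict(
        type='PolyLR',
        eta_min=1e-4,
        power=0.9,
        begin=1000,
        end=40000,
        by_epoch=False,
    )
]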

Hook

Introduction

OpenMMLab abstracts the model training and testing process into a Runner. Inserting hooks into the Runner allows you to implement the functionality needed at different training and testing stages (such as “before and after each training iteration”, “before and after each validation iteration”, etc.). For a more detailed introduction to the hook mechanism, please refer to here.

Hooks used in Runner are divided into two categories:

  • Default hooks:

They implement essential functions during training and are defined in the configuration file by default_hooks and passed to Runner. Runner registers them through the register_default_hooks method.

Hooks have corresponding priorities; the higher the priority, the earlier the runner calls them. If the priorities are the same, the calling order is consistent with the hook registration order.

It is not recommended for users to modify the default hook priorities. Please refer to the MMEngine hooks documentation to understand the hook priority definitions.

The following are the default hooks used in MMSegmentation:

Hook | Function | Priority
IterTimerHook | Record the time spent on each iteration. | NORMAL (50)
LoggerHook | Collect log records from different components in Runner and write them to the terminal, JSON files, TensorBoard, wandb, etc. | BELOW_NORMAL (60)
ParamSchedulerHook | Update some hyperparameters in the optimizer, such as the learning rate and momentum. | LOW (70)
CheckpointHook | Save checkpoint files periodically. | VERY_LOW (90)
DistSamplerSeedHook | Ensure the distributed sampler shuffle is enabled. | NORMAL (50)
SegVisualizationHook | Visualize prediction results during validation and testing. | NORMAL (50)

MMSegmentation registers some hooks with essential training functions in default_hooks:

default_hooks = dict(
    timer=dict(type='IterTimerHook'),
    logger=dict(type='LoggerHook', interval=50, log_metric_by_epoch=False),
    param_scheduler=dict(type='ParamSchedulerHook'),
    checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=32000),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    visualization=dict(type='SegVisualizationHook'))

All the default hooks mentioned above, except for SegVisualizationHook, are implemented in MMEngine. The SegVisualizationHook is a hook implemented in MMSegmentation, which will be introduced later.

  • Modifying default hooks

We will use logger and checkpoint as examples to demonstrate how to modify the default hooks in default_hooks.

(1) Model saving configuration

default_hooks uses the checkpoint field to initialize the model saving hook (CheckpointHook).

checkpoint = dict(type='CheckpointHook', interval=1)

Users can set max_keep_ckpts to save only a small number of checkpoints or use save_optimizer to determine whether to save optimizer information. More details on related parameters can be found here.
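
For example, the following sketch keeps only the three most recent checkpoints and skips saving the optimizer state (the interval here is illustrative):

default_hooks = dict(
    checkpoint=dict(
        type='CheckpointHook',
        by_epoch=False,
        interval=4000,
        max_keep_ckpts=3,
        save_optimizer=False))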

(2) Logging configuration

The LoggerHook is used to collect log information from different components in Runner and write it to terminal, JSON files, tensorboard, wandb, etc.

logger=dict(type='LoggerHook', interval=10)

In the 1.x version of MMSegmentation, logger hooks such as TextLoggerHook, WandbLoggerHook, and TensorboardLoggerHook are no longer used. Instead, MMEngine uses LogProcessor to handle the information previously processed by these hooks, and the corresponding functionality now lives in MessageHub, WandbVisBackend, and TensorboardVisBackend.
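
If you need to adjust how the collected logs are smoothed and formatted, LogProcessor can be configured separately. A minimal sketch, where window_size controls the smoothing window of the printed losses:

log_processor = dict(type='LogProcessor', window_size=50, by_epoch=False)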

The detailed usage is as follows: configure the visualizer and specify the visualization backend at the same time, here using TensorBoard as the visualizer’s backend:

# TensorboardVisBackend
visualizer = dict(
    type='SegLocalVisualizer',
    vis_backends=[dict(type='TensorboardVisBackend')],
    name='visualizer')
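
Multiple backends can also be combined, for example writing to local files, TensorBoard, and Weights & Biases at the same time. A sketch; each backend can take additional initialization arguments:

visualizer = dict(
    type='SegLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
        dict(type='TensorboardVisBackend'),
        dict(type='WandbVisBackend')
    ],
    name='visualizer')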

For more related usage, please refer to MMEngine Visualization Backend User Tutorial.

  • Custom hooks

Custom hooks are defined in the configuration through custom_hooks, and Runner registers them using the register_custom_hooks method.

The priority of custom hooks needs to be set in the configuration file; if not, it will be set to NORMAL by default. The following are some custom hooks implemented in MMEngine:

Hook | Usage
EMAHook | Apply Exponential Moving Average (EMA) to the model weights during training.
EmptyCacheHook | Release unoccupied cached GPU memory during training.
SyncBuffersHook | Synchronize the parameters in the model buffers, such as running_mean and running_var in BN, at the end of each training epoch.

The following is a use case of EMAHook: the config file adds the configuration of the implemented custom hook as a member of the custom_hooks list.

custom_hooks = [
    dict(type='EMAHook', start_iters=500, priority='NORMAL')
]
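
Multiple custom hooks can be registered in the same list. For instance, a sketch that also synchronizes model buffers across processes (SyncBuffersHook is implemented in MMEngine; the EMAHook arguments are kept from the example above):

custom_hooks = [
    dict(type='EMAHook', start_iters=500, priority='NORMAL'),
    dict(type='SyncBuffersHook')
]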

SegVisualizationHook

MMSegmentation implements SegVisualizationHook, which is used to visualize prediction results during validation and testing. SegVisualizationHook overrides the _after_iter method of the base class Hook. During validation or testing, it calls the add_datasample method of the visualizer to draw semantic segmentation results at the specified iteration interval. The specific implementation is as follows:

...
@HOOKS.register_module()
class SegVisualizationHook(Hook):
...
    def _after_iter(self,
                    runner: Runner,
                    batch_idx: int,
                    data_batch: dict,
                    outputs: Sequence[SegDataSample],
                    mode: str = 'val') -> None:
...
        # If it's a training phase or self.draw is False, then skip it
        if self.draw is False or mode == 'train':
            return
...
        if self.every_n_inner_iters(batch_idx, self.interval):
            for output in outputs:
                img_path = output.img_path
                img_bytes = self.file_client.get(img_path)
                img = mmcv.imfrombytes(img_bytes, channel_order='rgb')
                window_name = f'{mode}_{osp.basename(img_path)}'

                self._visualizer.add_datasample(
                    window_name,
                    img,
                    data_sample=output,
                    show=self.show,
                    wait_time=self.wait_time,
                    step=runner.iter)
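
Drawing is disabled by default (self.draw is False). To enable it during validation, you can adjust the visualization entry of default_hooks; a sketch, with an illustrative interval:

default_hooks = dict(
    visualization=dict(type='SegVisualizationHook', draw=True, interval=1))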

For more details about visualization, you can check here.

Optimizer

In the runtime settings above, we provided a simple example of configuring the training optimizer. This section introduces in more detail how to configure optimizers in MMSegmentation.

Optimizer Wrapper

OpenMMLab 2.0 introduces an optimizer wrapper that supports different training strategies, including mixed-precision training, gradient accumulation, and gradient clipping. Users can choose the appropriate training strategy according to their needs. The optimizer wrapper also defines a standard parameter update process, allowing users to switch between different training strategies within the same code. For more information, please refer to the MMEngine optimizer wrapper documentation.

Here are some common usage methods in MMSegmentation:

Configuring PyTorch Supported Optimizers

OpenMMLab 2.0 supports all native PyTorch optimizers, as referenced here.

To set the optimizer used by the Runner during training in the configuration file, you need to define optim_wrapper instead of optimizer. Below is an example of configuring an optimizer during training:

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005),
    clip_grad=None)
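
Any other optimizer from torch.optim can be configured in the same way by changing the type and its arguments. For example, a sketch with AdamW (the hyperparameter values are illustrative):

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05),
    clip_grad=None)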

Configuring Gradient Clipping

When the model training requires gradient clipping, you can configure it as shown in the following example:

optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=optimizer,
    clip_grad=dict(max_norm=0.01, norm_type=2))

Here, max_norm is the maximum allowed norm of the gradients, and norm_type is the type of norm used when clipping them. The related method can be found in torch.nn.utils.clip_grad_norm_.

Configuring Mixed Precision Training

When mixed precision training is needed to reduce memory usage, you can use AmpOptimWrapper. The specific configuration is as follows:

optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optim_wrapper = dict(type='AmpOptimWrapper', optimizer=optimizer)

The default setting for loss_scale in AmpOptimWrapper is dynamic.
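
If you prefer a fixed loss scale instead of dynamic scaling, you can pass a numeric value; a sketch, where 512.0 is an illustrative choice:

optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001)
optim_wrapper = dict(type='AmpOptimWrapper', optimizer=optimizer, loss_scale=512.0)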

Configuring Hyperparameters for Different Layers of the Model Network

In model training, if you want to set different optimization strategies for different parameters in the optimizer, such as setting different learning rates, weight decay, and other hyperparameters, you can achieve this by setting paramwise_cfg in the optim_wrapper of the configuration file.

The following config file uses the ViT optim_wrapper as an example to introduce the use of paramwise_cfg. During training, the weight decay coefficients of the pos_embed, cls_token, and norm modules are set to 0; that is, the weight decay applied to these modules becomes weight_decay * decay_mult = 0.

optimizer = dict(
    type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01)
optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=optimizer,
    paramwise_cfg=dict(
        custom_keys={
            'pos_embed': dict(decay_mult=0.),
            'cls_token': dict(decay_mult=0.),
            'norm': dict(decay_mult=0.)
        }))

Here, decay_mult refers to the weight decay coefficient for the corresponding parameters. For more information on the usage of paramwise_cfg, please refer to the MMEngine optimizer wrapper documentation.
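
custom_keys can also scale the learning rate of specific submodules via lr_mult. A minimal sketch that trains the backbone with one tenth of the base learning rate (the key 'backbone' and the factor are illustrative):

optim_wrapper = dict(
    type='OptimWrapper',
    optimizer=dict(type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01),
    paramwise_cfg=dict(
        custom_keys={
            'backbone': dict(lr_mult=0.1)
        }))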

Optimizer Wrapper Constructor

The default optimizer wrapper constructor, DefaultOptimWrapperConstructor, builds the optimizer wrapper used in training based on the optimizer and paramwise_cfg defined in optim_wrapper. When the functionality of DefaultOptimWrapperConstructor does not meet the requirements, you can customize an optimizer wrapper constructor to implement the configuration of hyperparameters.
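
As a rough sketch of what a customized constructor could look like (the class name is a hypothetical placeholder; MMSegmentation exposes an OPTIM_WRAPPER_CONSTRUCTORS registry in mmseg.registry, and DefaultOptimWrapperConstructor groups parameters in its add_params method):

from mmengine.optim import DefaultOptimWrapperConstructor
from mmseg.registry import OPTIM_WRAPPER_CONSTRUCTORS


@OPTIM_WRAPPER_CONSTRUCTORS.register_module()
class MyOptimWrapperConstructor(DefaultOptimWrapperConstructor):
    """Hypothetical constructor that tweaks per-parameter options."""

    def add_params(self, params, module, **kwargs):
        # Adjust lr / weight decay of individual parameters here, then
        # fall back to the default grouping logic.
        super().add_params(params, module, **kwargs)

It would then be selected with constructor='MyOptimWrapperConstructor' in optim_wrapper.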

MMSegmentation has implemented the LearningRateDecayOptimizerConstructor, which can decay the learning rate of model parameters in the backbone networks of ConvNeXt, BEiT, and MAE models during training according to the defined decay ratio (decay_rate). The configuration in the configuration file is as follows:

optim_wrapper = dict(
    _delete_=True,
    type='AmpOptimWrapper',
    optimizer=dict(
        type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.05),
    paramwise_cfg={
        'decay_rate': 0.9,
        'decay_type': 'stage_wise',
        'num_layers': 12
    },
    constructor='LearningRateDecayOptimizerConstructor',
    loss_scale='dynamic')

The purpose of _delete_=True is to ignore the inherited configuration in the OpenMMLab Config. In this code snippet, the inherited optim_wrapper configuration is ignored. For more information on _delete_ fields, please refer to the MMEngine documentation.
