Normalization

class tllib.normalization.afn.AdaptiveFeatureNorm(delta)[source]

Instead of matching feature norms to a restrictive fixed scalar R, Stepwise Adaptive Feature Norm learns task-specific features with large norms in a progressive manner. We denote the parameters of the backbone $$G$$ as $$\theta_g$$, the parameters of the bottleneck $$F_f$$ as $$\theta_f$$, the parameters of the classifier head $$F_y$$ as $$\theta_y$$, and the features extracted from sample $$x_i$$ as $$h(x_i;\theta)$$. The full loss is calculated as follows:

$\begin{split}L(\theta_g,\theta_f,\theta_y)=\frac{1}{n_s}\sum_{(x_i,y_i)\in D_s}L_y(x_i,y_i)+\frac{\lambda}{n_s+n_t} \sum_{x_i\in D_s\cup D_t}L_d(h(x_i;\theta_0)+\Delta_r,h(x_i;\theta))\\\end{split}$

where $$L_y$$ denotes classification loss, $$L_d$$ denotes norm loss, $$\theta_0$$ and $$\theta$$ represent the updated and updating model parameters in the last and current iterations respectively.

Parameters

delta (float) – positive residual scalar ($$\Delta_r$$ in the loss above) that controls the feature norm enlargement.

Inputs:
• f (tensor): feature representations on source or target domain.

Shape:
• f: $$(N, F)$$ where F means the dimension of input features.

• Outputs: scalar.

Examples:

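>>> import torch
>>> from tllib.normalization.afn import AdaptiveFeatureNorm
>>> adaptive_feature_norm = AdaptiveFeatureNorm(delta=1)
>>> f_s = torch.randn(32, 1000)
>>> f_t = torch.randn(32, 1000)
>>> norm_loss = adaptive_feature_norm(f_s) + adaptive_feature_norm(f_t)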
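
To make the stepwise term of the loss more concrete, the following is a minimal sketch in plain Python (no torch or tllib dependency). It assumes that $$L_d$$ is the squared difference between the current feature norm and the target norm, which the text above does not specify, and that the target $$\|h(x_i;\theta_0)\|+\Delta_r$$ is treated as a constant. The function name `stepwise_afn_loss` is hypothetical and is not part of tllib. Under these assumptions the loss value is always $$\Delta_r^2$$, and only its gradient matters: it pushes every feature vector outwards along its own direction.

```python
import math

def stepwise_afn_loss(features, delta):
    """Return (loss, grads) for a batch of non-zero feature vectors.

    Assumption: L_d is the squared difference of norms, and the target
    radius ||f|| + delta is a constant (no gradient flows through it).
    """
    n = len(features)
    loss = 0.0
    grads = []
    for f in features:
        norm = math.sqrt(sum(v * v for v in f))
        radius = norm + delta      # constant target for this iteration
        diff = norm - radius       # always equals -delta
        loss += diff * diff / n
        # d/df (||f|| - radius)^2 = 2 * diff * f / ||f||, averaged over the batch
        grads.append([2.0 * diff * v / norm / n for v in f])
    return loss, grads

features = [[3.0, 4.0], [0.0, 2.0]]
loss, grads = stepwise_afn_loss(features, delta=1.0)

# One gradient-descent step enlarges every feature norm.
stepped = [[v - 0.1 * g for v, g in zip(f, gr)] for f, gr in zip(features, grads)]
```

Because each sample's target is its own previous norm plus `delta`, no single global radius has to be chosen in advance.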

class tllib.normalization.afn.Block(in_features, bottleneck_dim=1000, dropout_p=0.5)[source]

Basic building block for Image Classifier with structure: FC-BN-ReLU-Dropout. We use $$L_2$$ preserved dropout layers. Given mask probability $$p$$, input $$x_k$$, generated mask $$a_k$$, vanilla dropout layers calculate

$\begin{split}\hat{x}_k = a_k\frac{1}{1-p}x_k\\\end{split}$

whereas $$L_2$$ preserved dropout layers calculate

$\begin{split}\hat{x}_k = a_k\frac{1}{\sqrt{1-p}}x_k\\\end{split}$

Parameters
• in_features (int) – Dimension of input features

• bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: 1000

• dropout_p (float, optional) – dropout probability. Default: 0.5
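
The difference between the two dropout scalings can be checked with a small worked example. The sketch below assumes that $$p$$ is the probability of dropping a unit, so that the mask $$a_k$$ equals 1 with probability $$1-p$$. It computes expectations in closed form instead of sampling. The helper `dropout_moments` is hypothetical and only illustrates the arithmetic: vanilla dropout preserves the mean of the activation, while the $$L_2$$ preserved variant preserves its second moment, and hence the expected squared feature norm.

```python
import math

def dropout_moments(x, p, l2_preserved):
    """Exact E[x_hat] and E[x_hat^2] for a single activation x.

    Assumption: the mask is 1 with probability (1 - p) and 0 otherwise.
    """
    if l2_preserved:
        scale = 1.0 / math.sqrt(1.0 - p)
    else:
        scale = 1.0 / (1.0 - p)
    kept = scale * x                         # value when the unit is kept
    mean = (1.0 - p) * kept                  # E[x_hat]
    second_moment = (1.0 - p) * kept ** 2    # E[x_hat^2]
    return mean, second_moment

x, p = 2.0, 0.5
vanilla_mean, vanilla_sq = dropout_moments(x, p, l2_preserved=False)
l2_mean, l2_sq = dropout_moments(x, p, l2_preserved=True)
```

With `x = 2.0` and `p = 0.5`, vanilla dropout keeps the mean at `x` but doubles the second moment, whereas the $$L_2$$ preserved scaling keeps the second moment at `x ** 2`.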

class tllib.normalization.afn.ImageClassifier(backbone, num_classes, num_blocks=1, bottleneck_dim=1000, dropout_p=0.5, **kwargs)[source]

ImageClassifier for AFN.

Parameters
• backbone (torch.nn.Module) – Any backbone to extract 2-d features from data

• num_classes (int) – Number of classes

• num_blocks (int, optional) – Number of basic blocks. Default: 1

• bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: 1000

• dropout_p (float, optional) – dropout probability. Default: 0.5

StochNorm: Stochastic Normalization

class tllib.normalization.stochnorm.StochNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, p=0.5)[source]

Applies Stochastic Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension)

Stochastic Normalization was proposed in the paper Stochastic Normalization (NeurIPS 2020).

\begin{align}\begin{aligned}\hat{x}_{i,0} = \frac{x_i - \tilde{\mu}}{ \sqrt{\tilde{\sigma} + \epsilon}}\\\hat{x}_{i,1} = \frac{x_i - \mu}{ \sqrt{\sigma + \epsilon}}\\\hat{x}_i = (1-s)\cdot \hat{x}_{i,0} + s\cdot \hat{x}_{i,1}\\ y_i = \gamma \hat{x}_i + \beta\end{aligned}\end{align}

where $$\mu$$ and $$\sigma$$ are mean and variance of current mini-batch data.

$$\tilde{\mu}$$ and $$\tilde{\sigma}$$ are current moving statistics of training data.

$$s$$ is a branch-selection variable generated from a Bernoulli distribution, where $$P(s=1)=p$$.

During training, there are two normalization branches: one uses the mean and variance of the current mini-batch, while the other uses the current moving statistics of the training data, as batch normalization does during evaluation.

During evaluation, the moving statistics are used for normalization.

Parameters
• num_features (int) – $$c$$ from an expected input of size $$(b, c, l)$$ or $$l$$ from an expected input of size $$(b, l)$$.

• eps (float) – A value added to the denominator for numerical stability. Default: 1e-5

• momentum (float) – The value used for the running_mean and running_var computation. Default: 0.1

• affine (bool) – A boolean value that when set to True, gives the layer learnable affine parameters. Default: True

• track_running_stats (bool) – If True, this module tracks the running mean and variance. If False, it does not track these statistics and initializes the buffers running_mean and running_var as None; the module then always uses batch statistics in both training and eval modes. Default: True

• p (float) – The probability to choose the second branch (usual BN). Default: 0.5

Shape:
• Input: $$(b, l)$$ or $$(b, c, l)$$

• Output: $$(b, l)$$ or $$(b, c, l)$$ (same shape as input)
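
The branch selection in the equations above can be sketched in plain Python for a $$(b, l)$$ input. This is an illustration under stated assumptions, not the tllib implementation: one selection variable $$s$$ is drawn per feature, $$\gamma=1$$ and $$\beta=0$$, the batch variance is the biased estimate, and the running statistics are passed in rather than updated. The function name `stoch_norm_1d` is hypothetical.

```python
import math
import random

def stoch_norm_1d(x, running_mean, running_var, p=0.5, eps=1e-5,
                  training=True, rng=random):
    """Normalize a (b, l) batch given as a list of rows of floats."""
    b, l = len(x), len(x[0])
    out = [[0.0] * l for _ in range(b)]
    for j in range(l):
        col = [row[j] for row in x]
        mu = sum(col) / b
        var = sum((v - mu) ** 2 for v in col) / b   # biased batch variance
        # s = 1 selects the mini-batch branch, s = 0 the moving-statistics
        # branch; in evaluation mode s is always 0.
        s = 1 if (training and rng.random() < p) else 0
        for i in range(b):
            x0 = (col[i] - running_mean[j]) / math.sqrt(running_var[j] + eps)
            x1 = (col[i] - mu) / math.sqrt(var + eps)
            out[i][j] = (1 - s) * x0 + s * x1
    return out

x = [[1.0, 10.0], [3.0, 30.0]]
y_eval = stoch_norm_1d(x, [0.0, 0.0], [1.0, 1.0], training=False)
y_batch = stoch_norm_1d(x, [0.0, 0.0], [1.0, 1.0], p=1.0)
```

Setting `p=1.0` always selects the mini-batch branch, so each output feature has zero mean over the batch; with `training=False` only the moving statistics are used.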

class tllib.normalization.stochnorm.StochNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, p=0.5)[source]

Applies Stochastic Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension)

Stochastic Normalization was proposed in the paper Stochastic Normalization (NeurIPS 2020).

\begin{align}\begin{aligned}\hat{x}_{i,0} = \frac{x_i - \tilde{\mu}}{ \sqrt{\tilde{\sigma} + \epsilon}}\\\hat{x}_{i,1} = \frac{x_i - \mu}{ \sqrt{\sigma + \epsilon}}\\\hat{x}_i = (1-s)\cdot \hat{x}_{i,0} + s\cdot \hat{x}_{i,1}\\ y_i = \gamma \hat{x}_i + \beta\end{aligned}\end{align}

where $$\mu$$ and $$\sigma$$ are mean and variance of current mini-batch data.

$$\tilde{\mu}$$ and $$\tilde{\sigma}$$ are current moving statistics of training data.

$$s$$ is a branch-selection variable generated from a Bernoulli distribution, where $$P(s=1)=p$$.

During training, there are two normalization branches: one uses the mean and variance of the current mini-batch, while the other uses the current moving statistics of the training data, as batch normalization does during evaluation.

During evaluation, the moving statistics are used for normalization.

Parameters
• num_features (int) – $$c$$ from an expected input of size $$(b, c, h, w)$$.

• eps (float) – A value added to the denominator for numerical stability. Default: 1e-5

• momentum (float) – The value used for the running_mean and running_var computation. Default: 0.1

• affine (bool) – A boolean value that when set to True, gives the layer learnable affine parameters. Default: True

• track_running_stats (bool) – If True, this module tracks the running mean and variance. If False, it does not track these statistics and initializes the buffers running_mean and running_var as None; the module then always uses batch statistics in both training and eval modes. Default: True

• p (float) – The probability to choose the second branch (usual BN). Default: 0.5

Shape:
• Input: $$(b, c, h, w)$$

• Output: $$(b, c, h, w)$$ (same shape as input)

class tllib.normalization.stochnorm.StochNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, p=0.5)[source]

Applies Stochastic Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension)

Stochastic Normalization was proposed in the paper Stochastic Normalization (NeurIPS 2020).

\begin{align}\begin{aligned}\hat{x}_{i,0} = \frac{x_i - \tilde{\mu}}{ \sqrt{\tilde{\sigma} + \epsilon}}\\\hat{x}_{i,1} = \frac{x_i - \mu}{ \sqrt{\sigma + \epsilon}}\\\hat{x}_i = (1-s)\cdot \hat{x}_{i,0} + s\cdot \hat{x}_{i,1}\\ y_i = \gamma \hat{x}_i + \beta\end{aligned}\end{align}

where $$\mu$$ and $$\sigma$$ are mean and variance of current mini-batch data.

$$\tilde{\mu}$$ and $$\tilde{\sigma}$$ are current moving statistics of training data.

$$s$$ is a branch-selection variable generated from a Bernoulli distribution, where $$P(s=1)=p$$.

During training, there are two normalization branches: one uses the mean and variance of the current mini-batch, while the other uses the current moving statistics of the training data, as batch normalization does during evaluation.

During evaluation, the moving statistics are used for normalization.

Parameters
• num_features (int) – $$c$$ from an expected input of size $$(b, c, d, h, w)$$

• eps (float) – A value added to the denominator for numerical stability. Default: 1e-5

• momentum (float) – The value used for the running_mean and running_var computation. Default: 0.1

• affine (bool) – A boolean value that when set to True, gives the layer learnable affine parameters. Default: True

• track_running_stats (bool) – If True, this module tracks the running mean and variance. If False, it does not track these statistics and initializes the buffers running_mean and running_var as None; the module then always uses batch statistics in both training and eval modes. Default: True

• p (float) – The probability to choose the second branch (usual BN). Default: 0.5

Shape:
• Input: $$(b, c, d, h, w)$$

• Output: $$(b, c, d, h, w)$$ (same shape as input)

tllib.normalization.stochnorm.convert_model(module, p)[source]

Traverses the input module and its children recursively and replaces all instances of BatchNorm with StochNorm.

Parameters
• module (torch.nn.Module) – The input module to be converted to a StochNorm model.

• p (float) – The hyper-parameter for StochNorm layer.

Returns

The module converted to StochNorm version.

IBN-Net: Instance-Batch Normalization Network

class tllib.normalization.ibn.InstanceBatchNorm2d(planes, ratio=0.5)[source]

Instance-Batch Normalization layer from Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net (ECCV 2018).

Given an input feature map $$f\_input$$ of dimension $$(C,H,W)$$, we first split $$f\_input$$ into two parts along the channel dimension. They are denoted as $$f_1$$ of dimension $$(C_1,H,W)$$ and $$f_2$$ of dimension $$(C_2,H,W)$$, where $$C_1+C_2=C$$. Then we pass $$f_1$$ through an instance normalization (IN) layer and $$f_2$$ through a batch normalization (BN) layer to get $$IN(f_1)$$ and $$BN(f_2)$$. Finally, we concatenate them along the channel dimension to create $$f\_output=concat(IN(f_1), BN(f_2))$$.

Parameters
• planes (int) – Number of channels for the input tensor

• ratio (float) – Ratio of instance normalization in the IBN layer
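
The split, normalize, and concatenate steps described above can be sketched in plain Python. The sketch assumes that the first $$C_1=\lfloor C\cdot ratio\rfloor$$ channels go through IN and the rest through BN, that BN runs in training mode on batch statistics, and that both layers omit their affine parameters. The representation `x[n][c]` (a flattened $$H\times W$$ map per sample and channel) and the function names are hypothetical choices made for illustration.

```python
import math

def _normalize(values, eps=1e-5):
    """Zero-mean, unit-variance normalization of a flat list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def instance_batch_norm(x, ratio=0.5):
    """x[n][c] is the flattened (H*W) map of sample n, channel c."""
    n_samples, n_channels = len(x), len(x[0])
    c1 = int(n_channels * ratio)   # assumed number of channels sent through IN
    out = [[None] * n_channels for _ in range(n_samples)]
    # IN: statistics are computed per sample and per channel.
    for n in range(n_samples):
        for c in range(c1):
            out[n][c] = _normalize(x[n][c])
    # BN: statistics are computed per channel, pooled over the whole batch.
    for c in range(c1, n_channels):
        size = len(x[0][c])
        pooled = _normalize([v for n in range(n_samples) for v in x[n][c]])
        for n in range(n_samples):
            out[n][c] = pooled[n * size:(n + 1) * size]
    return out

x = [
    [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]],
    [[10.0, 20.0, 30.0, 40.0], [10.0, 20.0, 30.0, 40.0]],
]
out = instance_batch_norm(x, ratio=0.5)
```

In this example channel 0 is normalized per sample, so the scale difference between the two samples disappears, while channel 1 is normalized over the batch, so the second sample remains larger than the first.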

class tllib.normalization.ibn.IBNNet(block, layers, ibn_cfg=('a', 'a', 'a', None))[source]

IBNNet without fully connected layer

property out_features

The dimension of output features

Modified from https://github.com/XingangPan/IBN-Net @author: Baixu Chen @contact: cbx_99_hasta@outlook.com

tllib.normalization.ibn.resnet18_ibn_a(pretrained=False)[source]

Constructs a ResNet-18-IBN-a model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

tllib.normalization.ibn.resnet18_ibn_b(pretrained=False)[source]

Constructs a ResNet-18-IBN-b model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

tllib.normalization.ibn.resnet34_ibn_a(pretrained=False)[source]

Constructs a ResNet-34-IBN-a model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

tllib.normalization.ibn.resnet34_ibn_b(pretrained=False)[source]

Constructs a ResNet-34-IBN-b model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

tllib.normalization.ibn.resnet50_ibn_a(pretrained=False)[source]

Constructs a ResNet-50-IBN-a model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

tllib.normalization.ibn.resnet50_ibn_b(pretrained=False)[source]

Constructs a ResNet-50-IBN-b model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

tllib.normalization.ibn.resnet101_ibn_a(pretrained=False)[source]

Constructs a ResNet-101-IBN-a model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

tllib.normalization.ibn.resnet101_ibn_b(pretrained=False)[source]

Constructs a ResNet-101-IBN-b model.

Parameters

pretrained (bool) – If True, returns a model pre-trained on ImageNet

MixStyle: Domain Generalization with MixStyle

class tllib.normalization.mixstyle.MixStyle(p=0.5, alpha=0.1, eps=1e-06)[source]

MixStyle module from Domain Generalization with MixStyle (ICLR 2021). Given input $$x$$, we first compute the mean $$\mu(x)$$ and standard deviation $$\sigma(x)$$ across the spatial dimensions. Then we permute $$x$$ along the batch dimension to get $$\tilde{x}$$, with corresponding mean $$\mu(\tilde{x})$$ and standard deviation $$\sigma(\tilde{x})$$. The two sets of statistics are then mixed in the manner of MixUp:

$\gamma_{mix} = \lambda\sigma(x) + (1-\lambda)\sigma(\tilde{x})$
$\beta_{mix} = \lambda\mu(x) + (1-\lambda)\mu(\tilde{x})$

where $$\lambda$$ is an instance-wise weight sampled from a Beta distribution. The MixStyle output is then

$MixStyle(x) = \gamma_{mix}\frac{x-\mu(x)}{\sigma(x)} + \beta_{mix}$

Parameters
• p (float) – probability of using MixStyle.

• alpha (float) – parameter of the Beta distribution.

• eps (float) – small value used to avoid numerical issues (for example, division by zero when normalizing by $$\sigma(x)$$).

Note

MixStyle is only activated during the training stage, with probability $$p$$.
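
The mixing formulas above can be traced with a short plain-Python sketch. It is an illustration under assumptions rather than the tllib implementation: $$\lambda$$ is drawn from $$Beta(\alpha,\alpha)$$, the variance is the biased estimate, `eps` is added to the variance before the square root, and the layer is assumed to be active (the probability $$p$$ and the training/eval switch are left out). The representation `x[n][c]` (a flattened spatial map per sample and channel) and the name `mix_style` are hypothetical.

```python
import math
import random

def _mean_std(values, eps=1e-6):
    """Mean and standard deviation of a flat list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var + eps)

def mix_style(x, perm, lam, eps=1e-6):
    """Mix the style of sample perm[n] into sample n with weight lam[n]."""
    out = []
    for n, sample in enumerate(x):
        mixed = []
        for c, values in enumerate(sample):
            mu, sigma = _mean_std(values, eps)
            mu_t, sigma_t = _mean_std(x[perm[n]][c], eps)
            gamma_mix = lam[n] * sigma + (1 - lam[n]) * sigma_t
            beta_mix = lam[n] * mu + (1 - lam[n]) * mu_t
            mixed.append([gamma_mix * (v - mu) / sigma + beta_mix for v in values])
        out.append(mixed)
    return out

rng = random.Random(0)
x = [[[1.0, 2.0, 3.0]], [[10.0, 30.0, 50.0]]]   # 2 samples, 1 channel each
perm = [1, 0]                                   # swap the two samples
lam = [rng.betavariate(0.1, 0.1) for _ in x]    # assumed Beta(alpha, alpha)
y = mix_style(x, perm, lam)
```

With `lam[n] = 1` a sample keeps its own statistics and is returned unchanged, and with `lam[n] = 0` it takes on the mean and standard deviation of its permuted partner while keeping its normalized content.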

@author: Baixu Chen @contact: cbx_99_hasta@outlook.com

tllib.normalization.mixstyle.resnet.resnet18(pretrained=False, progress=True, **kwargs)[source]

Constructs a ResNet-18 model with MixStyle.

Parameters
• pretrained (bool) – If True, returns a model pre-trained on ImageNet

• progress (bool) – If True, displays a progress bar of the download to stderr

tllib.normalization.mixstyle.resnet.resnet34(pretrained=False, progress=True, **kwargs)[source]

Constructs a ResNet-34 model with MixStyle.

Parameters
• pretrained (bool) – If True, returns a model pre-trained on ImageNet

• progress (bool) – If True, displays a progress bar of the download to stderr

tllib.normalization.mixstyle.resnet.resnet50(pretrained=False, progress=True, **kwargs)[source]

Constructs a ResNet-50 model with MixStyle.

Parameters
• pretrained (bool) – If True, returns a model pre-trained on ImageNet

• progress (bool) – If True, displays a progress bar of the download to stderr

tllib.normalization.mixstyle.resnet.resnet101(pretrained=False, progress=True, **kwargs)[source]

Constructs a ResNet-101 model with MixStyle.

Parameters
• pretrained (bool) – If True, returns a model pre-trained on ImageNet

• progress (bool) – If True, displays a progress bar of the download to stderr