# Regularization

## L2

class tllib.regularization.delta.L2Regularization(model)[source]

The L2 regularization of parameters $$w$$ can be described as:

${\Omega} (w) = \dfrac{1}{2} \Vert w\Vert_2^2 .$

Parameters

model (torch.nn.Module) – The model to apply the L2 penalty to.

Shape:
• Output: scalar.
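As a rough sketch in plain PyTorch (not the tllib implementation), the penalty is half the squared L2 norm of all model parameters:

```python
import torch
import torch.nn as nn

def l2_penalty(model: nn.Module) -> torch.Tensor:
    """Compute 0.5 * ||w||_2^2 over all parameters of the model."""
    return 0.5 * sum(p.pow(2).sum() for p in model.parameters())

# tiny example: a linear layer whose two weights are both 2.0
model = nn.Linear(2, 1, bias=False)
with torch.no_grad():
    model.weight.fill_(2.0)
penalty = l2_penalty(model)  # 0.5 * (2^2 + 2^2) = 4.0
```

The scalar result is typically added to the task loss, scaled by a trade-off coefficient.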

## L2-SP

class tllib.regularization.delta.SPRegularization(source_model, target_model)[source]

The SP (Starting Point) regularization from Explicit inductive bias for transfer learning with convolutional networks (ICML 2018).

The SP regularization of parameters $$w$$ can be described as:

${\Omega} (w) = \dfrac{1}{2} \Vert w-w_0\Vert_2^2 ,$

where $$w_0$$ is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in fine-tuning.

Parameters

source_model (torch.nn.Module) – The pretrained source model providing the starting point $$w_0$$.

target_model (torch.nn.Module) – The model being fine-tuned, to which the SP penalty is applied.

Shape:
• Output: scalar.
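A minimal sketch of the penalty in plain PyTorch (not the tllib implementation), matching parameters by name and treating the source weights as a fixed starting point:

```python
import copy
import torch
import torch.nn as nn

def sp_penalty(source_model: nn.Module, target_model: nn.Module) -> torch.Tensor:
    """Compute 0.5 * ||w - w0||_2^2, where w0 are the (frozen) source parameters."""
    source_params = dict(source_model.named_parameters())
    penalty = 0.0
    for name, p in target_model.named_parameters():
        w0 = source_params[name].detach()  # starting point: no gradient flows to it
        penalty = penalty + 0.5 * (p - w0).pow(2).sum()
    return penalty

# example: fine-tuned weights drift by exactly 1.0 from the starting point
source = nn.Linear(2, 1, bias=False)
target = copy.deepcopy(source)
with torch.no_grad():
    target.weight += 1.0
penalty = sp_penalty(source, target)  # 0.5 * (1^2 + 1^2) = 1.0
```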

## DELTA: DEep Learning Transfer using Feature Map with Attention

class tllib.regularization.delta.BehavioralRegularization[source]

The behavioral regularization from DELTA: DEep Learning Transfer using Feature Map with Attention for convolutional networks (ICLR 2019).

It can be described as:

${\Omega} (w) = \sum_{j=1}^{N} \Vert FM_j(w, \boldsymbol x)-FM_j(w^0, \boldsymbol x)\Vert_2^2 ,$

where $$w^0$$ is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in fine-tuning, and $$FM_j(w, \boldsymbol x)$$ are the feature maps generated by the $$j$$-th layer of the model parameterized with $$w$$, given the input $$\boldsymbol x$$.

Inputs:

layer_outputs_source (OrderedDict): The dictionary for the source model, where the keys are layer names and the values are the corresponding feature maps.

layer_outputs_target (OrderedDict): The dictionary for the target model, where the keys are layer names and the values are the corresponding feature maps.

Shape:
• Output: scalar.
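A rough sketch of the distance being summed (plain PyTorch, not the tllib implementation), iterating over matching entries of the two OrderedDicts:

```python
from collections import OrderedDict
import torch

def behavioral_penalty(layer_outputs_source: OrderedDict,
                       layer_outputs_target: OrderedDict) -> torch.Tensor:
    """Sum of squared L2 distances between source and target feature maps."""
    penalty = 0.0
    for name, fm_source in layer_outputs_source.items():
        fm_target = layer_outputs_target[name]
        # source maps act as fixed targets: no gradient flows through them
        penalty = penalty + (fm_target - fm_source.detach()).pow(2).sum()
    return penalty

# example: one layer, 8 feature-map elements each differing by 1.0
fm_s = OrderedDict(layer1=torch.zeros(1, 2, 2, 2))
fm_t = OrderedDict(layer1=torch.ones(1, 2, 2, 2))
penalty = behavioral_penalty(fm_s, fm_t)  # 8 * 1^2 = 8.0
```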

class tllib.regularization.delta.AttentionBehavioralRegularization(channel_attention)[source]

The behavioral regularization with attention from DELTA: DEep Learning Transfer using Feature Map with Attention for convolutional networks (ICLR 2019).

It can be described as:

${\Omega} (w) = \sum_{j=1}^{N} W_j(w) \Vert FM_j(w, \boldsymbol x)-FM_j(w^0, \boldsymbol x)\Vert_2^2 ,$

where $$w^0$$ is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in fine-tuning. $$FM_j(w, \boldsymbol x)$$ are the feature maps generated by the $$j$$-th layer of the model parameterized with $$w$$, given the input $$\boldsymbol x$$. $$W_j(w)$$ is the channel attention of the $$j$$-th layer of the model parameterized with $$w$$.

Parameters

channel_attention (list) – The channel attentions of feature maps generated by each selected layer. For the layer with C channels, the channel attention is a tensor of shape [C].

Inputs:

layer_outputs_source (OrderedDict): The dictionary for the source model, where the keys are layer names and the values are the corresponding feature maps.

layer_outputs_target (OrderedDict): The dictionary for the target model, where the keys are layer names and the values are the corresponding feature maps.

Shape:
• Output: scalar.
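A sketch of the attention-weighted variant (an illustrative assumption about how the per-channel weighting is applied, not the exact tllib implementation): each layer's squared feature-map distance is reduced per channel and scaled by that channel's attention weight.

```python
from collections import OrderedDict
import torch

def attention_behavioral_penalty(channel_attention, layer_outputs_source,
                                 layer_outputs_target) -> torch.Tensor:
    """Channel-attention-weighted squared distances between feature maps.

    channel_attention[j] is a tensor of shape [C_j] for the j-th selected layer;
    feature maps are assumed to have shape (b, C_j, H, W).
    """
    penalty = 0.0
    for attention, (name, fm_source) in zip(channel_attention,
                                            layer_outputs_source.items()):
        fm_target = layer_outputs_target[name]
        # per-channel squared L2 distance over spatial dims, shape (b, C_j)
        diff = (fm_target - fm_source.detach()).pow(2).sum(dim=(2, 3))
        penalty = penalty + (attention * diff).sum()
    return penalty

# example: 2 channels, attention down-weights the second channel by half
fm_s = OrderedDict(layer1=torch.zeros(1, 2, 2, 2))
fm_t = OrderedDict(layer1=torch.ones(1, 2, 2, 2))
attn = [torch.tensor([1.0, 0.5])]
penalty = attention_behavioral_penalty(attn, fm_s, fm_t)  # 1.0*4 + 0.5*4 = 6.0
```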

class tllib.regularization.delta.IntermediateLayerGetter(model, return_layers, keep_output=True)[source]

Wraps a model to get intermediate output values of selected layers.

Parameters
• model (torch.nn.Module) – The model to collect intermediate layer feature maps.

• return_layers (list) – The names of selected modules to return the output.

• keep_output (bool) – If True, the model's final output is also returned; otherwise None is returned in its place. Default: True

Returns

• An OrderedDict of intermediate outputs. The keys are selected layer names in return_layers and the values are the feature map outputs. The order is the same as return_layers.

• The model’s final output. If keep_output is False, return None.
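The same behavior can be sketched in plain PyTorch with forward hooks (an illustrative equivalent, not the tllib implementation):

```python
from collections import OrderedDict
import torch
import torch.nn as nn

def get_intermediate_outputs(model: nn.Module, x: torch.Tensor, return_layers):
    """Collect the outputs of the named submodules via forward hooks."""
    outputs = OrderedDict()
    hooks = []
    modules = dict(model.named_modules())
    for name in return_layers:
        # bind `name` per iteration via the default argument
        def hook(module, inputs, output, name=name):
            outputs[name] = output
        hooks.append(modules[name].register_forward_hook(hook))
    final = model(x)
    for h in hooks:
        h.remove()
    # reorder to match return_layers, mirroring IntermediateLayerGetter
    return OrderedDict((n, outputs[n]) for n in return_layers), final

model = nn.Sequential(OrderedDict(
    fc1=nn.Linear(4, 3), relu=nn.ReLU(), fc2=nn.Linear(3, 2)))
mids, final = get_intermediate_outputs(model, torch.randn(5, 4), ["fc1", "relu"])
# mids["fc1"] has shape (5, 3); final has shape (5, 2)
```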

## LWF: Learning without Forgetting

class tllib.regularization.lwf.Classifier(backbone, num_classes, head_source, head_target=None, bottleneck=None, bottleneck_dim=-1, finetune=True, pool_layer=None)[source]

A Classifier used in Learning Without Forgetting (ECCV 2016).

Parameters
Inputs:
• x (tensor): input data fed to backbone

Outputs:
• y_s: predictions of source classifier head

• y_t: predictions of target classifier head

Shape:
• Inputs: (b, *) where b is the batch size and * means any number of additional dimensions

• y_s: (b, N), where b is the batch size and N is the number of classes

• y_t: (b, N), where b is the batch size and N is the number of classes
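A minimal sketch of the dual-head pattern behind this classifier (shared backbone, separate source and target heads; an illustration, not the tllib implementation, and `features_dim` is an assumed constructor argument):

```python
import torch
import torch.nn as nn

class DualHeadClassifier(nn.Module):
    """Shared backbone with a source head and a target head, LwF-style."""
    def __init__(self, backbone: nn.Module, features_dim: int,
                 num_source_classes: int, num_target_classes: int):
        super().__init__()
        self.backbone = backbone
        self.head_source = nn.Linear(features_dim, num_source_classes)
        self.head_target = nn.Linear(features_dim, num_target_classes)

    def forward(self, x):
        f = self.backbone(x)
        # both heads share the same features, so distilling the source head's
        # outputs constrains the backbone not to forget the source task
        return self.head_source(f), self.head_target(f)

model = DualHeadClassifier(nn.Linear(4, 8), 8,
                           num_source_classes=10, num_target_classes=3)
y_s, y_t = model(torch.randn(5, 4))  # shapes (5, 10) and (5, 3)
```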

## Co-Tuning

class tllib.regularization.co_tuning.CoTuningLoss[source]

The Co-Tuning loss in Co-Tuning for Transfer Learning (NeurIPS 2020).

Inputs:
• input: p(y_s) predicted by the source classifier.

• target: p(y_s|y_t), where y_t is the ground-truth class label in the target dataset.

Shape:
• input: (b, N_p), where b is the batch size and N_p is the number of classes in the source dataset

• target: (b, N_p), where b is the batch size and N_p is the number of classes in the source dataset

• Outputs: scalar.
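In effect this is a soft cross-entropy between the relationship-derived soft labels and the source head's predictions; a sketch in plain PyTorch (not necessarily the exact tllib code):

```python
import torch
import torch.nn.functional as F

def co_tuning_loss(input: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Soft cross-entropy: mean over the batch of -sum_y target(y) * log p(y)."""
    log_probs = F.log_softmax(input, dim=-1)
    return -(target * log_probs).sum(dim=-1).mean()

# example: uniform logits against uniform soft labels over 4 source classes
logits = torch.zeros(2, 4)
target = torch.full((2, 4), 0.25)
loss = co_tuning_loss(logits, target)  # equals log(4) ~= 1.3863
```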

class tllib.regularization.co_tuning.Relationship(data_loader, classifier, device, cache=None)[source]

Learns the category relationship p(y_s|y_t) between the source and target datasets.

Parameters

data_loader (torch.utils.data.DataLoader) – The data loader of the target dataset.

classifier (torch.nn.Module) – The source classifier used to predict p(y_s).

device (torch.device) – The device to run the classifier on.

cache (str, optional) – Path used to cache the computed relationship. Default: None
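Conceptually, under the assumption that the relationship is estimated from source-head predictions on the target data (an illustration, not necessarily the exact tllib estimator), p(y_s|y_t) can be obtained by averaging the source-class probabilities of all target samples sharing the label y_t:

```python
import torch

def category_relationship(source_probs: torch.Tensor,
                          target_labels: torch.Tensor) -> torch.Tensor:
    """Estimate p(y_s | y_t) by averaging source probabilities per target class.

    source_probs: (n, N_s) softmax outputs of the source head on target data
    target_labels: (n,) ground-truth target labels in [0, N_t)
    Returns a (N_t, N_s) matrix whose rows each sum to 1.
    """
    num_target_classes = int(target_labels.max().item()) + 1
    rows = []
    for t in range(num_target_classes):
        probs_t = source_probs[target_labels == t].mean(dim=0)
        rows.append(probs_t / probs_t.sum())  # renormalize each row
    return torch.stack(rows)

probs = torch.tensor([[0.8, 0.2], [0.6, 0.4], [0.1, 0.9]])
labels = torch.tensor([0, 0, 1])
rel = category_relationship(probs, labels)  # rel[0] = [0.7, 0.3]
```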

## Bi-Tuning

class tllib.regularization.bi_tuning.BiTuning(encoder_q, encoder_k, num_classes, K=40, m=0.999, T=0.07)[source]

Bi-Tuning Module in Bi-tuning of Pre-trained Representations.

Parameters
• encoder_q (Classifier) – Query encoder.

• encoder_k (Classifier) – Key encoder.

• num_classes (int) – Number of classes

• K (int) – Queue size. Default: 40

• m (float) – Momentum coefficient. Default: 0.999

• T (float) – Temperature. Default: 0.07

Inputs:
• im_q (tensor): input data fed to encoder_q

• im_k (tensor): input data fed to encoder_k

• labels (tensor): classification labels of input data

Outputs: y_q, logits_z, logits_y, labels_c
• y_q: query classifier’s predictions

• logits_z: projector’s predictions on both positive and negative samples

• logits_y: classifier’s predictions on both positive and negative samples

• labels_c: contrastive labels

Shape:
• im_q, im_k: (minibatch, *) where * means any number of additional dimensions

• labels: (minibatch, )

• y_q: (minibatch, num_classes)

• logits_z: (minibatch, 1 + num_classes x K, projection_dim)

• logits_y: (minibatch, 1 + num_classes x K, num_classes)

• labels_c: (minibatch, 1 + num_classes x K)
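The momentum coefficient m governs the MoCo-style update of the key encoder from the query encoder; a sketch of just that update step (not the full Bi-Tuning module):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def momentum_update(encoder_q: nn.Module, encoder_k: nn.Module, m: float = 0.999):
    """Exponential moving average of parameters: k <- m * k + (1 - m) * q."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.mul_(m).add_(p_q, alpha=1.0 - m)

# example with m = 0.9: key weights move 10% of the way toward the query weights
q = nn.Linear(2, 2, bias=False)
k = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    q.weight.fill_(1.0)
    k.weight.fill_(0.0)
momentum_update(q, k, m=0.9)  # every key weight becomes 0.1
```

With m close to 1 (the default 0.999), the key encoder evolves slowly, which keeps the queued key representations consistent across iterations.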

## BSS: Batch Spectral Shrinkage

class tllib.regularization.bss.BatchSpectralShrinkage(k=1)[source]

The BSS regularization of feature matrix $$F$$ can be described as:

$L_{bss}(F) = \sum_{i=1}^{k} \sigma_{-i}^2 ,$

where $$k$$ is the number of singular values to be penalized, and $$\sigma_{-i}$$ is the $$i$$-th smallest singular value of the feature matrix $$F$$.

All the singular values of feature matrix $$F$$ are computed by SVD:

$F = U\Sigma V^T,$

where the main diagonal elements of the singular value matrix $$\Sigma$$ are $$[\sigma_1, \sigma_2, ..., \sigma_b]$$.

Parameters

k (int) – The number of singular values to be penalized. Default: 1

Shape:
• Input: $$(b, |\mathcal{f}|)$$ where $$b$$ is the batch size and $$|\mathcal{f}|$$ is the feature dimension.

• Output: scalar.
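A sketch of the computation in plain PyTorch (not the tllib implementation): take the SVD of the batch feature matrix and penalize the squares of the k smallest singular values.

```python
import torch

def bss_penalty(features: torch.Tensor, k: int = 1) -> torch.Tensor:
    """Sum of squares of the k smallest singular values of the feature matrix."""
    singular_values = torch.linalg.svdvals(features)  # descending order
    return singular_values[-k:].pow(2).sum()

# example: diag(3, 2, 1) has singular values 3, 2, 1
features = torch.diag(torch.tensor([3.0, 2.0, 1.0]))
penalty = bss_penalty(features, k=2)  # 2^2 + 1^2 = 5.0
```

Penalizing the smallest singular values shrinks the least-transferable spectral components of the batch features without constraining the dominant ones.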