Regularization
L2

class tllib.regularization.delta.L2Regularization(model)
The L2 regularization of parameters \(w\) can be described as:
\[{\Omega} (w) = \dfrac{1}{2} \Vert w\Vert_2^2 ,\]
 Parameters
model (torch.nn.Module) – The model to apply the L2 penalty to.
 Shape:
Output: scalar.
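Example (a minimal sketch, assuming tllib is installed; the torchvision backbone and the 0.01 trade-off coefficient are illustrative):
>>> import torch
>>> import torch.nn.functional as F
>>> from torchvision.models import resnet18
>>> from tllib.regularization.delta import L2Regularization
>>> model = resnet18(num_classes=10)
>>> l2_reg = L2Regularization(model)
>>> x, labels = torch.randn(4, 3, 224, 224), torch.randint(0, 10, (4,))
>>> # add the penalty to the task loss during finetuning
>>> loss = F.cross_entropy(model(x), labels) + 0.01 * l2_reg()
>>> loss.backward()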
L2SP

class tllib.regularization.delta.SPRegularization(source_model, target_model)
The SP (Starting Point) regularization from Explicit inductive bias for transfer learning with convolutional networks (ICML 2018).
The SP regularization of parameters \(w\) can be described as:
\[{\Omega} (w) = \dfrac{1}{2} \Vert w - w_0\Vert_2^2 ,\]
where \(w_0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in finetuning.
 Parameters
source_model (torch.nn.Module) – The source (starting point) model.
target_model (torch.nn.Module) – The target (finetuning) model.
 Shape:
Output: scalar.
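Example (a minimal sketch; in practice source_model would hold the actual pretrained weights rather than a copy of a fresh network):
>>> import copy
>>> import torch
>>> import torch.nn.functional as F
>>> from torchvision.models import resnet18
>>> from tllib.regularization.delta import SPRegularization
>>> target_model = resnet18(num_classes=10)
>>> source_model = copy.deepcopy(target_model)  # frozen starting point w_0
>>> sp_reg = SPRegularization(source_model, target_model)
>>> x, labels = torch.randn(4, 3, 224, 224), torch.randint(0, 10, (4,))
>>> loss = F.cross_entropy(target_model(x), labels) + 0.01 * sp_reg()
>>> loss.backward()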
DELTA: DEep Learning Transfer using Feature Map with Attention

class tllib.regularization.delta.BehavioralRegularization
The behavioral regularization from DELTA: DEep Learning Transfer using Feature Map with Attention for convolutional networks (ICLR 2019).
It can be described as:
\[{\Omega} (w) = \sum_{j=1}^{N} \Vert FM_j(w, \boldsymbol x) - FM_j(w^0, \boldsymbol x)\Vert_2^2 ,\]
where \(w^0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in finetuning, and \(FM_j(w, \boldsymbol x)\) is the feature map generated from the \(j\)-th layer of the model parameterized with \(w\), given the input \(\boldsymbol x\).
 Inputs:
layer_outputs_source (OrderedDict): The dictionary for the source model, where the keys are layer names and the values are the corresponding feature maps.
layer_outputs_target (OrderedDict): The dictionary for the target model, where the keys are layer names and the values are the corresponding feature maps.
 Shape:
Output: scalar.
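Example (a minimal sketch of the expected inputs; in practice the feature maps are collected with IntermediateLayerGetter, documented below, and the dummy tensors here are purely illustrative):
>>> from collections import OrderedDict
>>> import torch
>>> from tllib.regularization.delta import BehavioralRegularization
>>> behavioral_reg = BehavioralRegularization()
>>> fm_source = OrderedDict(layer4=torch.randn(4, 512, 7, 7))  # pretrained model
>>> fm_target = OrderedDict(layer4=torch.randn(4, 512, 7, 7))  # finetuned model
>>> loss = behavioral_reg(fm_source, fm_target)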

class tllib.regularization.delta.AttentionBehavioralRegularization(channel_attention)
The behavioral regularization with attention from DELTA: DEep Learning Transfer using Feature Map with Attention for convolutional networks (ICLR 2019).
It can be described as:
\[{\Omega} (w) = \sum_{j=1}^{N} W_j(w) \Vert FM_j(w, \boldsymbol x) - FM_j(w^0, \boldsymbol x)\Vert_2^2 ,\]
where \(w^0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in finetuning, \(FM_j(w, \boldsymbol x)\) is the feature map generated from the \(j\)-th layer of the model parameterized with \(w\), given the input \(\boldsymbol x\), and \(W_j(w)\) is the channel attention of the \(j\)-th layer of the model parameterized with \(w\).
 Parameters
channel_attention (list) – The channel attentions of the feature maps generated by each selected layer. For a layer with C channels, the channel attention is a tensor of shape [C].
 Inputs:
layer_outputs_source (OrderedDict): The dictionary for the source model, where the keys are layer names and the values are the corresponding feature maps.
layer_outputs_target (OrderedDict): The dictionary for the target model, where the keys are layer names and the values are the corresponding feature maps.
 Shape:
Output: scalar.
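Example (a minimal sketch; the uniform attention vector is illustrative, whereas DELTA computes channel attention from the source task):
>>> from collections import OrderedDict
>>> import torch
>>> from tllib.regularization.delta import AttentionBehavioralRegularization
>>> channel_attention = [torch.ones(512) / 512]  # one [C] tensor per selected layer
>>> attention_reg = AttentionBehavioralRegularization(channel_attention)
>>> fm_source = OrderedDict(layer4=torch.randn(4, 512, 7, 7))
>>> fm_target = OrderedDict(layer4=torch.randn(4, 512, 7, 7))
>>> loss = attention_reg(fm_source, fm_target)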

class tllib.regularization.delta.IntermediateLayerGetter(model, return_layers, keep_output=True)
Wraps a model to get the intermediate output values of selected layers.
 Parameters
model (torch.nn.Module) – The model from which to collect intermediate layer feature maps.
return_layers (list) – The names of the selected modules whose outputs are returned.
keep_output (bool) – If True, model_output contains the final model output; otherwise it is None. Default: True
 Returns
An OrderedDict of intermediate outputs. The keys are the selected layer names in return_layers and the values are the feature map outputs. The order is the same as in return_layers.
The model's final output. If keep_output is False, this is None.
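Example (a minimal sketch combining the getter with BehavioralRegularization above; the deep copy merely stands in for a genuinely pretrained source model):
>>> import copy
>>> import torch
>>> from torchvision.models import resnet18
>>> from tllib.regularization.delta import IntermediateLayerGetter, BehavioralRegularization
>>> finetuned = resnet18(num_classes=10)
>>> pretrained = copy.deepcopy(finetuned)
>>> source_getter = IntermediateLayerGetter(pretrained, return_layers=['layer4'])
>>> target_getter = IntermediateLayerGetter(finetuned, return_layers=['layer4'])
>>> x = torch.randn(4, 3, 224, 224)
>>> fm_source, _ = source_getter(x)       # OrderedDict of feature maps, final output
>>> fm_target, output = target_getter(x)
>>> loss = BehavioralRegularization()(fm_source, fm_target)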
LWF: Learning without Forgetting

class tllib.regularization.lwf.Classifier(backbone, num_classes, head_source, head_target=None, bottleneck=None, bottleneck_dim=-1, finetune=True, pool_layer=None)
A Classifier used in Learning Without Forgetting (ECCV 2016).
 Parameters
backbone (torch.nn.Module) – Any backbone to extract 2-d features from data.
num_classes (int) – Number of classes.
head_source (torch.nn.Module) – Classifier head of the source model.
head_target (torch.nn.Module, optional) – Any classifier head. Uses torch.nn.Linear by default.
finetune (bool) – Whether to finetune the classifier or train it from scratch. Default: True
 Inputs:
x (tensor): input data fed to backbone
 Outputs:
y_s: predictions of source classifier head
y_t: predictions of target classifier head
 Shape:
Inputs: (b, *) where b is the batch size and * means any number of additional dimensions
y_s: (b, N), where b is the batch size and N is the number of classes
y_t: (b, N), where b is the batch size and N is the number of classes
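Example (a minimal sketch, assuming a backbone that exposes out_features as tllib's own backbones do; the freshly initialized Linear head merely stands in for the real pretrained source head):
>>> import torch
>>> from tllib.vision.models import resnet18
>>> from tllib.regularization.lwf import Classifier
>>> backbone = resnet18(pretrained=True)
>>> head_source = torch.nn.Linear(backbone.out_features, 1000)
>>> classifier = Classifier(backbone, num_classes=10, head_source=head_source)
>>> x = torch.randn(4, 3, 224, 224)
>>> y_s, y_t = classifier(x)  # keep y_s close to the pretrained predictions, train y_t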
CoTuning

class tllib.regularization.co_tuning.CoTuningLoss
The Co-Tuning loss from Co-Tuning for Transfer Learning (NIPS 2020).
 Inputs:
input: p(y_s) predicted by the source classifier.
target: p(y_s|y_t), where y_t is the ground-truth class label in the target dataset.
 Shape:
input: (b, N_p), where b is the batch size and N_p is the number of classes in source dataset
target: (b, N_p), where b is the batch size and N_p is the number of classes in source dataset
Outputs: scalar.
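Example (a minimal sketch; the random tensors stand in for real source-head predictions and for the p(y_s|y_t) targets obtained from the Relationship class below):
>>> import torch
>>> from tllib.regularization.co_tuning import CoTuningLoss
>>> co_tuning_loss = CoTuningLoss()
>>> y_s = torch.randn(4, 1000)                           # source-head predictions
>>> target = torch.softmax(torch.randn(4, 1000), dim=1)  # soft source labels
>>> loss = co_tuning_loss(y_s, target)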

class tllib.regularization.co_tuning.Relationship(data_loader, classifier, device, cache=None)
Learns the category relationship p(y_s|y_t) between the source dataset and the target dataset.
 Parameters
data_loader (torch.utils.data.DataLoader) – A data loader of target dataset.
classifier (torch.nn.Module) – A classifier for CoTuning.
device (torch.device) – The device on which to run the classifier.
cache (str, optional) – Path to find and save the relationship file.
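Example (a sketch only: train_loader and classifier are hypothetical placeholders for a target-domain DataLoader and a CoTuning classifier, and the label-indexing step is an assumption based on the library's co-tuning example):
>>> import torch
>>> from tllib.regularization.co_tuning import CoTuningLoss, Relationship
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> relationship = Relationship(train_loader, classifier, device, cache='relationship.npy')
>>> for images, labels in train_loader:  # placeholder training loop
...     target = torch.from_numpy(relationship[labels.numpy()]).to(device)  # soft source labels
...     break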
BiTuning

class tllib.regularization.bi_tuning.BiTuning(encoder_q, encoder_k, num_classes, K=40, m=0.999, T=0.07)
The Bi-Tuning module from Bi-tuning of Pre-trained Representations.
 Parameters
encoder_q (Classifier) – Query encoder.
encoder_k (Classifier) – Key encoder.
num_classes (int) – Number of classes.
K (int) – Queue size. Default: 40
m (float) – Momentum coefficient. Default: 0.999
T (float) – Temperature. Default: 0.07
 Inputs:
im_q (tensor): input data fed to encoder_q
im_k (tensor): input data fed to encoder_k
labels (tensor): classification labels of input data
 Outputs: y_q, logits_z, logits_y, labels_c
y_q: query classifier’s predictions
logits_z: projector’s predictions on both positive and negative samples
logits_y: classifier’s predictions on both positive and negative samples
labels_c: contrastive labels
 Shape:
im_q, im_k: (minibatch, *) where * means any number of additional dimensions
labels: (minibatch, )
y_q: (minibatch, num_classes)
logits_z: (minibatch, 1 + num_classes x K, projection_dim)
logits_y: (minibatch, 1 + num_classes x K, num_classes)
labels_c: (minibatch, 1 + num_classes x K)
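Example (a sketch only: encoder_q, encoder_k, im_q, im_k, and labels are hypothetical placeholders, and the cross-entropy term mirrors a typical contrastive recipe rather than the library's exact training script):
>>> import torch.nn.functional as F
>>> from tllib.regularization.bi_tuning import BiTuning
>>> bi_tuning = BiTuning(encoder_q, encoder_k, num_classes=10, K=40)
>>> y_q, logits_z, logits_y, labels_c = bi_tuning(im_q, im_k, labels)
>>> loss = F.cross_entropy(y_q, labels)  # plus contrastive losses on logits_z / logits_y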
BSS: Batch Spectral Shrinkage

class tllib.regularization.bss.BatchSpectralShrinkage(k=1)
The regularization term from Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning (NIPS 2019).
The BSS regularization of feature matrix \(F\) can be described as:
\[L_{bss}(F) = \sum_{i=1}^{k} \sigma_{i}^2 ,\]where \(k\) is the number of singular values to be penalized, \(\sigma_{i}\) is the \(i\)th smallest singular value of feature matrix \(F\).
All the singular values of feature matrix \(F\) are computed by SVD:
\[F = U\Sigma V^T,\]where the main diagonal elements of the singular value matrix \(\Sigma\) are \([\sigma_1, \sigma_2, ..., \sigma_b]\).
 Parameters
k (int) – The number of singular values to be penalized. Default: 1
 Shape:
Input: \((b, \mathcal{f})\) where \(b\) is the batch size and \(\mathcal{f}\) is feature dimension.
Output: scalar.
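Example (a minimal sketch; the random feature matrix is illustrative and would normally be a batch of backbone features during finetuning):
>>> import torch
>>> from tllib.regularization.bss import BatchSpectralShrinkage
>>> bss = BatchSpectralShrinkage(k=1)
>>> features = torch.randn(32, 256)  # (batch size, feature dimension)
>>> loss = bss(features)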