Regularization

L2

class tllib.regularization.delta.L2Regularization(model)[source]

The L2 regularization of parameters \(w\) can be described as:

\[{\Omega} (w) = \dfrac{1}{2} \Vert w\Vert_2^2 ,\]

Parameters

model (torch.nn.Module) – The model to which the L2 penalty is applied.

Shape:
  • Output: scalar.
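A minimal usage sketch (the no-argument call, the toy model, and the trade-off value are assumptions for illustration; the regularizer only needs the wrapped model's parameters):

    import torch
    import torch.nn as nn
    from tllib.regularization.delta import L2Regularization

    # placeholder model and data; any torch.nn.Module works
    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    l2_reg = L2Regularization(model)

    x, labels = torch.randn(32, 128), torch.randint(0, 10, (32,))
    # total loss = task loss + trade-off * Omega(w); trade-off of 0.01 is illustrative
    loss = nn.functional.cross_entropy(model(x), labels) + 0.01 * l2_reg()
    loss.backward()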

L2-SP

class tllib.regularization.delta.SPRegularization(source_model, target_model)[source]

The SP (Starting Point) regularization from Explicit inductive bias for transfer learning with convolutional networks (ICML 2018)

The SP regularization of parameters \(w\) can be described as:

\[{\Omega} (w) = \dfrac{1}{2} \Vert w-w_0\Vert_2^2 ,\]

where \(w_0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in fine-tuning.

Parameters
  • source_model (torch.nn.Module) – The model pretrained on the source problem, providing the starting point \(w_0\).

  • target_model (torch.nn.Module) – The model being fine-tuned on the target problem, whose parameters \(w\) are penalized.

Shape:
  • Output: scalar.
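A minimal usage sketch along the same lines (the frozen copy standing in for the pretrained source model, and the no-argument call, are assumptions for illustration):

    import copy
    import torch
    import torch.nn as nn
    from tllib.regularization.delta import SPRegularization

    # the model being fine-tuned, and a frozen copy acting as the starting point w_0
    target_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
    source_model = copy.deepcopy(target_model)

    sp_reg = SPRegularization(source_model, target_model)

    x, labels = torch.randn(32, 128), torch.randint(0, 10, (32,))
    # penalize drift away from w_0 while fitting the target task
    loss = nn.functional.cross_entropy(target_model(x), labels) + 0.01 * sp_reg()
    loss.backward()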

DELTA: DEep Learning Transfer using Feature Map with Attention

class tllib.regularization.delta.BehavioralRegularization[source]

The behavioral regularization from DELTA: DEep Learning Transfer using Feature Map with Attention for convolutional networks (ICLR 2019)

It can be described as:

\[{\Omega} (w) = \sum_{j=1}^{N} \Vert FM_j(w, \boldsymbol x)-FM_j(w^0, \boldsymbol x)\Vert_2^2 ,\]

where \(w^0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in fine-tuning, and \(FM_j(w, \boldsymbol x)\) is the feature map generated by the \(j\)-th layer of the model parameterized with \(w\), given the input \(\boldsymbol x\).

Inputs:

layer_outputs_source (OrderedDict): The dictionary for the source model, where the keys are layer names and the values are the corresponding feature maps.

layer_outputs_target (OrderedDict): The dictionary for the target model, where the keys are layer names and the values are the corresponding feature maps.

Shape:
  • Output: scalar.
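A sketch of how the two OrderedDict inputs can be produced with IntermediateLayerGetter (documented below); the toy convolutional models and the module names '0' and '2' are placeholders:

    import torch
    import torch.nn as nn
    from tllib.regularization.delta import (BehavioralRegularization,
                                            IntermediateLayerGetter)

    def make_model():
        return nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))

    # the frozen source model and the model being fine-tuned share one architecture
    source_model, target_model = make_model(), make_model()
    return_layers = ['0', '2']  # names of the selected sub-modules

    source_getter = IntermediateLayerGetter(source_model, return_layers=return_layers)
    target_getter = IntermediateLayerGetter(target_model, return_layers=return_layers)
    behavioral_reg = BehavioralRegularization()

    x = torch.randn(4, 3, 32, 32)
    with torch.no_grad():
        layer_outputs_source, _ = source_getter(x)
    layer_outputs_target, _ = target_getter(x)

    penalty = behavioral_reg(layer_outputs_source, layer_outputs_target)  # scalar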

class tllib.regularization.delta.AttentionBehavioralRegularization(channel_attention)[source]

The behavioral regularization with attention from DELTA: DEep Learning Transfer using Feature Map with Attention for convolutional networks (ICLR 2019)

It can be described as:

\[{\Omega} (w) = \sum_{j=1}^{N} W_j(w) \Vert FM_j(w, \boldsymbol x)-FM_j(w^0, \boldsymbol x)\Vert_2^2 ,\]

where \(w^0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in fine-tuning. \(FM_j(w, \boldsymbol x)\) is the feature map generated by the \(j\)-th layer of the model parameterized with \(w\), given the input \(\boldsymbol x\). \(W_j(w)\) is the channel attention of the \(j\)-th layer of the model parameterized with \(w\).

Parameters

channel_attention (list) – The channel attentions of feature maps generated by each selected layer. For the layer with C channels, the channel attention is a tensor of shape [C].

Inputs:

layer_outputs_source (OrderedDict): The dictionary for the source model, where the keys are layer names and the values are the corresponding feature maps.

layer_outputs_target (OrderedDict): The dictionary for the target model, where the keys are layer names and the values are the corresponding feature maps.

Shape:
  • Output: scalar.
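A sketch with hand-built inputs; the uniform channel attentions and the random feature maps are purely illustrative (DELTA computes the attentions from data, and the OrderedDicts would normally come from IntermediateLayerGetter):

    from collections import OrderedDict
    import torch
    from tllib.regularization.delta import AttentionBehavioralRegularization

    # one attention vector per selected layer, shape [C] for a layer with C channels
    channel_attention = [torch.ones(8) / 8, torch.ones(16) / 16]
    attention_reg = AttentionBehavioralRegularization(channel_attention)

    # feature maps of two selected layers for a batch of 4 inputs (placeholder shapes)
    layer_outputs_source = OrderedDict(
        conv1=torch.randn(4, 8, 30, 30), conv2=torch.randn(4, 16, 28, 28))
    layer_outputs_target = OrderedDict(
        conv1=torch.randn(4, 8, 30, 30), conv2=torch.randn(4, 16, 28, 28))

    penalty = attention_reg(layer_outputs_source, layer_outputs_target)  # scalar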

class tllib.regularization.delta.IntermediateLayerGetter(model, return_layers, keep_output=True)[source]

Wraps a model to get intermediate output values of selected layers.

Parameters
  • model (torch.nn.Module) – The model to collect intermediate layer feature maps.

  • return_layers (list) – The names of selected modules to return the output.

  • keep_output (bool) – If True, the model's final output is returned as well; otherwise None is returned in its place. Default: True

Returns

  • An OrderedDict of intermediate outputs. The keys are selected layer names in return_layers and the values are the feature map outputs. The order is the same as return_layers.

  • The model's final output. If keep_output is False, None is returned.
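A usage sketch; the torchvision ResNet-18 backbone and the layer names 'layer3' and 'layer4' are illustrative choices, and the wrapper is assumed to be called like the wrapped model:

    import torch
    import torchvision.models as models
    from tllib.regularization.delta import IntermediateLayerGetter

    backbone = models.resnet18()  # randomly initialized, for illustration only
    getter = IntermediateLayerGetter(backbone, return_layers=['layer3', 'layer4'])

    x = torch.randn(2, 3, 224, 224)
    intermediates, output = getter(x)
    print(list(intermediates.keys()))  # ['layer3', 'layer4'], same order as return_layers
    print(output.shape)                # final model output; None if keep_output=False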

LWF: Learning without Forgetting

class tllib.regularization.lwf.Classifier(backbone, num_classes, head_source, head_target=None, bottleneck=None, bottleneck_dim=-1, finetune=True, pool_layer=None)[source]

A Classifier used in Learning Without Forgetting (ECCV 2016).

Parameters
  • backbone (torch.nn.Module) – Any backbone to extract 2-d features from data.

  • num_classes (int) – Number of classes.

  • head_source (torch.nn.Module) – Classifier head of source model.

  • head_target (torch.nn.Module, optional) – Any classifier head. Uses torch.nn.Linear by default.

  • finetune (bool) – Whether to fine-tune the classifier or train it from scratch. Default: True

Inputs:
  • x (tensor): input data fed to backbone

Outputs:
  • y_s: predictions of source classifier head

  • y_t: predictions of target classifier head

Shape:
  • Inputs: (b, *) where b is the batch size and * means any number of additional dimensions

  • y_s: (b, N), where b is the batch size and N is the number of classes

  • y_t: (b, N), where b is the batch size and N is the number of classes
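A construction sketch under stated assumptions: the toy backbone (returning a 4-d feature map and exposing out_features, mimicking the library's backbones), the 1000-class source head, and the reliance on the default bottleneck and pooling behavior are all illustrative:

    import torch
    import torch.nn as nn
    from tllib.regularization.lwf import Classifier

    class ToyBackbone(nn.Module):
        """Placeholder backbone: returns a feature map and exposes `out_features`."""
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 64, kernel_size=3)
            self.out_features = 64

        def forward(self, x):
            return self.conv(x)

    backbone = ToyBackbone()
    head_source = nn.Linear(backbone.out_features, 1000)  # frozen source-task head
    classifier = Classifier(backbone, num_classes=10, head_source=head_source)

    x = torch.randn(8, 3, 32, 32)
    y_s, y_t = classifier(x)      # source predictions and target predictions
    print(y_s.shape, y_t.shape)   # (8, 1000) and (8, 10)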

Co-Tuning

class tllib.regularization.co_tuning.CoTuningLoss[source]

The Co-Tuning loss in Co-Tuning for Transfer Learning (NIPS 2020).

Inputs:
  • input: p(y_s) predicted by the source classifier.

  • target: p(y_s|y_t), where y_t is the ground-truth class label in the target dataset.

Shape:
  • input: (b, N_p), where b is the batch size and N_p is the number of classes in source dataset

  • target: (b, N_p), where b is the batch size and N_p is the number of classes in source dataset

  • Outputs: scalar.
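A minimal sketch; whether the first input is raw logits or probabilities over the source classes is treated here as an assumption, and the target rows would normally come from the Relationship class documented below:

    import torch
    from tllib.regularization.co_tuning import CoTuningLoss

    co_tuning_loss = CoTuningLoss()

    batch_size, num_source_classes = 32, 1000
    # p(y_s): source-class predictions for the current target batch (placeholder values)
    y_s = torch.randn(batch_size, num_source_classes)
    # p(y_s | y_t): category-relationship rows looked up with the target labels
    target = torch.softmax(torch.randn(batch_size, num_source_classes), dim=1)

    loss = co_tuning_loss(y_s, target)  # scalar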

class tllib.regularization.co_tuning.Relationship(data_loader, classifier, device, cache=None)[source]

Learns the category relationship p(y_s|y_t) between the source dataset and the target dataset.

Parameters
  • data_loader (torch.utils.data.DataLoader) – Data loader over the target dataset, used to estimate the relationship.

  • classifier (torch.nn.Module) – The classifier pretrained on the source dataset, used to predict p(y_s) on target data.

  • device (torch.device) – Device on which the classifier runs.

  • cache (str, optional) – Path used to cache the computed relationship. Default: None
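A construction sketch kept deliberately small; the toy target dataset, the linear stand-in for the source classifier, and the assumption that the resulting object can be indexed with target labels to obtain rows of p(y_s|y_t) are illustrative, not documented API guarantees:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from tllib.regularization.co_tuning import Relationship

    # toy target dataset: 64 samples with 10 target classes (placeholder data)
    dataset = TensorDataset(torch.randn(64, 128), torch.randint(0, 10, (64,)))
    data_loader = DataLoader(dataset, batch_size=16)

    # stand-in for a model pretrained on 1000 source classes; assumed to map
    # inputs to source-class predictions
    source_classifier = torch.nn.Linear(128, 1000)

    relationship = Relationship(data_loader, source_classifier,
                                device=torch.device('cpu'), cache=None)

    # assumed access pattern: fetch p(y_s | y_t) rows for a batch of target labels
    labels = torch.randint(0, 10, (16,))
    target = torch.from_numpy(relationship[labels.numpy()])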

Bi-Tuning

class tllib.regularization.bi_tuning.BiTuning(encoder_q, encoder_k, num_classes, K=40, m=0.999, T=0.07)[source]

Bi-Tuning Module in Bi-tuning of Pre-trained Representations.

Parameters
  • encoder_q (Classifier) – Query encoder.

  • encoder_k (Classifier) – Key encoder.

  • num_classes (int) – Number of classes

  • K (int) – Queue size. Default: 40

  • m (float) – Momentum coefficient. Default: 0.999

  • T (float) – Temperature. Default: 0.07

Inputs:
  • im_q (tensor): input data fed to encoder_q

  • im_k (tensor): input data fed to encoder_k

  • labels (tensor): classification labels of input data

Outputs: y_q, logits_z, logits_y, labels_c
  • y_q: query classifier’s predictions

  • logits_z: projector’s predictions on both positive and negative samples

  • logits_y: classifier’s predictions on both positive and negative samples

  • labels_c: contrastive labels

Shape:
  • im_q, im_k: (minibatch, *) where * means any number of additional dimensions

  • labels: (minibatch, )

  • y_q: (minibatch, num_classes)

  • logits_z: (minibatch, 1 + num_classes x K, projection_dim)

  • logits_y: (minibatch, 1 + num_classes x K, num_classes)

  • labels_c: (minibatch, 1 + num_classes x K)
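A forward-pass sketch; build_classifier is a hypothetical helper standing in for the project's own model construction code, and the loss combination at the end only indicates how the outputs are typically used:

    import torch
    import torch.nn.functional as F
    from tllib.regularization.bi_tuning import BiTuning

    # encoder_q / encoder_k: two Classifier instances with the same architecture
    # (build_classifier is hypothetical; it is not part of tllib)
    encoder_q = build_classifier(num_classes=10)
    encoder_k = build_classifier(num_classes=10)
    bi_tuning = BiTuning(encoder_q, encoder_k, num_classes=10, K=40)

    im_q = torch.randn(16, 3, 224, 224)    # query view of the batch
    im_k = torch.randn(16, 3, 224, 224)    # key view, e.g. another augmentation
    labels = torch.randint(0, 10, (16,))

    y_q, logits_z, logits_y, labels_c = bi_tuning(im_q, im_k, labels)

    # classification loss on the query predictions; contrastive terms on logits_z /
    # logits_y with labels_c are added with trade-off weights in practice
    cls_loss = F.cross_entropy(y_q, labels)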

BSS: Batch Spectral Shrinkage

class tllib.regularization.bss.BatchSpectralShrinkage(k=1)[source]

The regularization term in Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning (NIPS 2019).

The BSS regularization of feature matrix \(F\) can be described as:

\[L_{bss}(F) = \sum_{i=1}^{k} \sigma_{-i}^2 ,\]

where \(k\) is the number of singular values to be penalized, \(\sigma_{-i}\) is the \(i\)-th smallest singular value of feature matrix \(F\).

All the singular values of feature matrix \(F\) are computed by SVD:

\[F = U\Sigma V^T,\]

where the main diagonal elements of the singular value matrix \(\Sigma\) are \([\sigma_1, \sigma_2, ..., \sigma_b]\).

Parameters

k (int) – The number of singular values to be penalized. Default: 1

Shape:
  • Input: \((b, |\mathcal{f}|)\) where \(b\) is the batch size and \(|\mathcal{f}|\) is the feature dimension.

  • Output: scalar.
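A minimal usage sketch; the feature dimension, batch size, and trade-off weight are illustrative:

    import torch
    from tllib.regularization.bss import BatchSpectralShrinkage

    bss = BatchSpectralShrinkage(k=1)

    # feature matrix F of shape (batch_size, feature_dim), e.g. backbone outputs
    features = torch.randn(32, 256)
    penalty = bss(features)        # sum of squares of the k smallest singular values

    # typically added to the task loss with a trade-off weight
    total_penalty = 0.001 * penalty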
