Regularization
L2

class tllib.regularization.delta.L2Regularization(model)
The L2 regularization of parameters \(w\) can be described as:
\[{\Omega} (w) = \dfrac{1}{2} \Vert w\Vert_2^2 ,\]
 Parameters
model (torch.nn.Module) – The model to apply the L2 penalty to.
 Shape:
Output: scalar.
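Example (a minimal sketch, assuming tllib is installed; the torchvision backbone and the 0.01 trade-off coefficient are illustrative):
>>> import torch
>>> import torch.nn.functional as F
>>> from torchvision.models import resnet18
>>> from tllib.regularization.delta import L2Regularization
>>> model = resnet18(num_classes=10)
>>> l2_reg = L2Regularization(model)
>>> x, labels = torch.randn(4, 3, 224, 224), torch.randint(0, 10, (4,))
>>> # add the penalty to the task loss during finetuning
>>> loss = F.cross_entropy(model(x), labels) + 0.01 * l2_reg()
>>> loss.backward()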
L2SP

class tllib.regularization.delta.SPRegularization(source_model, target_model)
The SP (Starting Point) regularization from Explicit inductive bias for transfer learning with convolutional networks (ICML 2018).
The SP regularization of parameters \(w\) can be described as:
\[{\Omega} (w) = \dfrac{1}{2} \Vert w - w_0\Vert_2^2 ,\]
where \(w_0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in finetuning.
 Parameters
source_model (torch.nn.Module) – The source (starting point) model.
target_model (torch.nn.Module) – The target (finetuning) model.
 Shape:
Output: scalar.
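Example (a minimal sketch; in practice source_model would hold the actual pretrained weights rather than a copy of a fresh network):
>>> import copy
>>> import torch
>>> import torch.nn.functional as F
>>> from torchvision.models import resnet18
>>> from tllib.regularization.delta import SPRegularization
>>> target_model = resnet18(num_classes=10)
>>> source_model = copy.deepcopy(target_model)  # frozen starting point w_0
>>> sp_reg = SPRegularization(source_model, target_model)
>>> x, labels = torch.randn(4, 3, 224, 224), torch.randint(0, 10, (4,))
>>> loss = F.cross_entropy(target_model(x), labels) + 0.01 * sp_reg()
>>> loss.backward()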
DELTA: DEep Learning Transfer using Feature Map with Attention

class tllib.regularization.delta.BehavioralRegularization
The behavioral regularization from DELTA: DEep Learning Transfer using Feature Map with Attention for convolutional networks (ICLR 2019).
It can be described as:
\[{\Omega} (w) = \sum_{j=1}^{N} \Vert FM_j(w, \boldsymbol x) - FM_j(w^0, \boldsymbol x)\Vert_2^2 ,\]
where \(w^0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in finetuning, and \(FM_j(w, \boldsymbol x)\) is the feature map generated from the \(j\)-th layer of the model parameterized with \(w\), given the input \(\boldsymbol x\).
 Inputs:
layer_outputs_source (OrderedDict): The dictionary for the source model, where the keys are layer names and the values are the corresponding feature maps.
layer_outputs_target (OrderedDict): The dictionary for the target model, where the keys are layer names and the values are the corresponding feature maps.
 Shape:
Output: scalar.
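Example (a minimal sketch of the expected inputs; in practice the feature maps are collected with IntermediateLayerGetter, documented below, and the dummy tensors here are purely illustrative):
>>> from collections import OrderedDict
>>> import torch
>>> from tllib.regularization.delta import BehavioralRegularization
>>> behavioral_reg = BehavioralRegularization()
>>> fm_source = OrderedDict(layer4=torch.randn(4, 512, 7, 7))  # pretrained model
>>> fm_target = OrderedDict(layer4=torch.randn(4, 512, 7, 7))  # finetuned model
>>> loss = behavioral_reg(fm_source, fm_target)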

class tllib.regularization.delta.AttentionBehavioralRegularization(channel_attention)
The behavioral regularization with attention from DELTA: DEep Learning Transfer using Feature Map with Attention for convolutional networks (ICLR 2019).
It can be described as:
\[{\Omega} (w) = \sum_{j=1}^{N} W_j(w) \Vert FM_j(w, \boldsymbol x) - FM_j(w^0, \boldsymbol x)\Vert_2^2 ,\]
where \(w^0\) is the parameter vector of the model pretrained on the source problem, acting as the starting point (SP) in finetuning, \(FM_j(w, \boldsymbol x)\) is the feature map generated from the \(j\)-th layer of the model parameterized with \(w\), given the input \(\boldsymbol x\), and \(W_j(w)\) is the channel attention of the \(j\)-th layer of the model parameterized with \(w\).
 Parameters
channel_attention (list) – The channel attentions of the feature maps generated by each selected layer. For a layer with C channels, the channel attention is a tensor of shape [C].
 Inputs:
layer_outputs_source (OrderedDict): The dictionary for the source model, where the keys are layer names and the values are the corresponding feature maps.
layer_outputs_target (OrderedDict): The dictionary for the target model, where the keys are layer names and the values are the corresponding feature maps.
 Shape:
Output: scalar.
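Example (a minimal sketch; the uniform attention vector is illustrative, whereas DELTA computes channel attention from the source task):
>>> from collections import OrderedDict
>>> import torch
>>> from tllib.regularization.delta import AttentionBehavioralRegularization
>>> channel_attention = [torch.ones(512) / 512]  # one [C] tensor per selected layer
>>> attention_reg = AttentionBehavioralRegularization(channel_attention)
>>> fm_source = OrderedDict(layer4=torch.randn(4, 512, 7, 7))
>>> fm_target = OrderedDict(layer4=torch.randn(4, 512, 7, 7))
>>> loss = attention_reg(fm_source, fm_target)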

class tllib.regularization.delta.IntermediateLayerGetter(model, return_layers, keep_output=True)
Wraps a model to get the intermediate output values of selected layers.
 Parameters
model (torch.nn.Module) – The model from which to collect intermediate layer feature maps.
return_layers (list) – The names of the selected modules whose outputs are returned.
keep_output (bool) – If True, model_output contains the final model output; otherwise it is None. Default: True
 Returns
An OrderedDict of intermediate outputs. The keys are the selected layer names in return_layers and the values are the feature map outputs. The order is the same as in return_layers.
The model's final output. If keep_output is False, this is None.
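Example (a minimal sketch combining the getter with BehavioralRegularization above; the deep copy merely stands in for a genuinely pretrained source model):
>>> import copy
>>> import torch
>>> from torchvision.models import resnet18
>>> from tllib.regularization.delta import IntermediateLayerGetter, BehavioralRegularization
>>> finetuned = resnet18(num_classes=10)
>>> pretrained = copy.deepcopy(finetuned)
>>> source_getter = IntermediateLayerGetter(pretrained, return_layers=['layer4'])
>>> target_getter = IntermediateLayerGetter(finetuned, return_layers=['layer4'])
>>> x = torch.randn(4, 3, 224, 224)
>>> fm_source, _ = source_getter(x)       # OrderedDict of feature maps, final output
>>> fm_target, output = target_getter(x)
>>> loss = BehavioralRegularization()(fm_source, fm_target)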
LWF: Learning without Forgetting

class tllib.regularization.lwf.Classifier(backbone, num_classes, head_source, head_target=None, bottleneck=None, bottleneck_dim=-1, finetune=True, pool_layer=None)
A Classifier used in Learning Without Forgetting (ECCV 2016).
 Parameters
backbone (torch.nn.Module) – Any backbone to extract 2-d features from data.
num_classes (int) – Number of classes.
head_source (torch.nn.Module) – Classifier head of the source model.
head_target (torch.nn.Module, optional) – Any classifier head. Uses torch.nn.Linear by default.
finetune (bool) – Whether to finetune the classifier or train it from scratch. Default: True
 Inputs:
x (tensor): input data fed to backbone
 Outputs:
y_s: predictions of source classifier head
y_t: predictions of target classifier head
 Shape:
Inputs: (b, *) where b is the batch size and * means any number of additional dimensions
y_s: (b, N), where b is the batch size and N is the number of classes
y_t: (b, N), where b is the batch size and N is the number of classes
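Example (a minimal sketch, assuming a backbone that exposes out_features as tllib's own backbones do; the freshly initialized Linear head merely stands in for the real pretrained source head):
>>> import torch
>>> from tllib.vision.models import resnet18
>>> from tllib.regularization.lwf import Classifier
>>> backbone = resnet18(pretrained=True)
>>> head_source = torch.nn.Linear(backbone.out_features, 1000)
>>> classifier = Classifier(backbone, num_classes=10, head_source=head_source)
>>> x = torch.randn(4, 3, 224, 224)
>>> y_s, y_t = classifier(x)  # keep y_s close to the pretrained predictions, train y_t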
CoTuning

class tllib.regularization.co_tuning.CoTuningLoss
The Co-Tuning loss from Co-Tuning for Transfer Learning (NIPS 2020).
 Inputs:
input: p(y_s) predicted by the source classifier.
target: p(y_s|y_t), where y_t is the ground-truth class label in the target dataset.
 Shape:
input: (b, N_p), where b is the batch size and N_p is the number of classes in source dataset
target: (b, N_p), where b is the batch size and N_p is the number of classes in source dataset
Outputs: scalar.
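Example (a minimal sketch; the random tensors stand in for real source-head predictions and for the p(y_s|y_t) targets obtained from the Relationship class below):
>>> import torch
>>> from tllib.regularization.co_tuning import CoTuningLoss
>>> co_tuning_loss = CoTuningLoss()
>>> y_s = torch.randn(4, 1000)                           # source-head predictions
>>> target = torch.softmax(torch.randn(4, 1000), dim=1)  # soft source labels
>>> loss = co_tuning_loss(y_s, target)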

class tllib.regularization.co_tuning.Relationship(data_loader, classifier, device, cache=None)
Learns the category relationship p(y_s|y_t) between the source dataset and the target dataset.
 Parameters
data_loader (torch.utils.data.DataLoader) – A data loader of target dataset.
classifier (torch.nn.Module) – A classifier for CoTuning.
device (torch.device) – The device on which to run the classifier.
cache (str, optional) – Path to find and save the relationship file.
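Example (a sketch only: train_loader and classifier are hypothetical placeholders for a target-domain DataLoader and a CoTuning classifier, and the label-indexing step is an assumption based on the library's co-tuning example):
>>> import torch
>>> from tllib.regularization.co_tuning import CoTuningLoss, Relationship
>>> device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
>>> relationship = Relationship(train_loader, classifier, device, cache='relationship.npy')
>>> for images, labels in train_loader:  # placeholder training loop
...     target = torch.from_numpy(relationship[labels.numpy()]).to(device)  # soft source labels
...     break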
BiTuning

class tllib.regularization.bi_tuning.BiTuning(encoder_q, encoder_k, num_classes, K=40, m=0.999, T=0.07)
The Bi-Tuning module from Bi-tuning of Pre-trained Representations.
 Parameters
encoder_q (Classifier) – Query encoder.
encoder_k (Classifier) – Key encoder.
num_classes (int) – Number of classes.
K (int) – Queue size. Default: 40
m (float) – Momentum coefficient. Default: 0.999
T (float) – Temperature. Default: 0.07
 Inputs:
im_q (tensor): input data fed to encoder_q
im_k (tensor): input data fed to encoder_k
labels (tensor): classification labels of input data
 Outputs: y_q, logits_z, logits_y, labels_c
y_q: query classifier’s predictions
logits_z: projector’s predictions on both positive and negative samples
logits_y: classifier’s predictions on both positive and negative samples
labels_c: contrastive labels
 Shape:
im_q, im_k: (minibatch, *) where * means any number of additional dimensions
labels: (minibatch, )
y_q: (minibatch, num_classes)
logits_z: (minibatch, 1 + num_classes x K, projection_dim)
logits_y: (minibatch, 1 + num_classes x K, num_classes)
labels_c: (minibatch, 1 + num_classes x K)
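Example (a sketch only: encoder_q, encoder_k, im_q, im_k, and labels are hypothetical placeholders, and the cross-entropy term mirrors a typical contrastive recipe rather than the library's exact training script):
>>> import torch.nn.functional as F
>>> from tllib.regularization.bi_tuning import BiTuning
>>> bi_tuning = BiTuning(encoder_q, encoder_k, num_classes=10, K=40)
>>> y_q, logits_z, logits_y, labels_c = bi_tuning(im_q, im_k, labels)
>>> loss = F.cross_entropy(y_q, labels)  # plus contrastive losses on logits_z / logits_y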
BSS: Batch Spectral Shrinkage

class tllib.regularization.bss.BatchSpectralShrinkage(k=1)
The regularization term from Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning (NIPS 2019).
The BSS regularization of feature matrix \(F\) can be described as:
\[L_{bss}(F) = \sum_{i=1}^{k} \sigma_{i}^2 ,\]where \(k\) is the number of singular values to be penalized, \(\sigma_{i}\) is the \(i\)th smallest singular value of feature matrix \(F\).
All the singular values of feature matrix \(F\) are computed by SVD:
\[F = U\Sigma V^T,\]where the main diagonal elements of the singular value matrix \(\Sigma\) are \([\sigma_1, \sigma_2, ..., \sigma_b]\).
 Parameters
k (int) – The number of singular values to be penalized. Default: 1
 Shape:
Input: \((b, \mathcal{f})\) where \(b\) is the batch size and \(\mathcal{f}\) is feature dimension.
Output: scalar.
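Example (a minimal sketch; the random feature matrix is illustrative and would normally be a batch of backbone features during finetuning):
>>> import torch
>>> from tllib.regularization.bss import BatchSpectralShrinkage
>>> bss = BatchSpectralShrinkage(k=1)
>>> features = torch.randn(32, 256)  # (batch size, feature dimension)
>>> loss = bss(features)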