Modules
Classifier
class tllib.modules.classifier.Classifier(backbone, num_classes, bottleneck=None, bottleneck_dim=-1, head=None, finetune=True, pool_layer=None)

A generic Classifier class for domain adaptation.
Parameters:
backbone (torch.nn.Module) – Any backbone to extract 2-d features from data
num_classes (int) – Number of classes
bottleneck (torch.nn.Module, optional) – Any bottleneck layer. Use no bottleneck by default
bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: -1
head (torch.nn.Module, optional) – Any classifier head. Use torch.nn.Linear by default
finetune (bool) – Whether to finetune the classifier or train it from scratch. Default: True
pool_layer (torch.nn.Module, optional) – Pooling layer between the backbone and the bottleneck. Use adaptive average pooling followed by flattening by default
Note
Different domain adaptation algorithms use different classifiers to achieve better accuracy, and we provide a suggested Classifier for each algorithm. Remember that the classifier is not the core of an algorithm: you can implement your own Classifier and combine it with any domain adaptation algorithm in this library.
Note
The learning rate of this classifier is set to 10 times that of the feature extractor by default for better accuracy. If you have other optimization strategies, please override get_parameters().

Inputs:
x (tensor): input data fed to backbone
Outputs:
predictions: classifier’s predictions
features: features after bottleneck layer and before head layer
Shape:
Inputs: (minibatch, *) where * means any number of additional dimensions
predictions: (minibatch, num_classes)
features: (minibatch, features_dim)
property features_dim

The dimension of features before the final head layer.
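A minimal usage sketch (the toy backbone below is an illustrative assumption; the backbones in tllib.vision.models expose an out_features attribute in the same way):

    import torch
    import torch.nn as nn
    from tllib.modules.classifier import Classifier

    # Toy backbone that already pools and flattens to 2-d features.
    # Classifier reads `out_features` to size its default head.
    backbone = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )
    backbone.out_features = 64

    # pool_layer=nn.Identity() because the backbone output is already 2-d
    classifier = Classifier(backbone, num_classes=10, pool_layer=nn.Identity())

    classifier.train()
    predictions, features = classifier(torch.randn(8, 3, 32, 32))
    print(predictions.shape)  # torch.Size([8, 10])
    print(features.shape)     # torch.Size([8, 64])

    # get_parameters() returns parameter groups that encode the 10x
    # learning-rate rule from the Note above
    optimizer = torch.optim.SGD(classifier.get_parameters(), lr=0.01, momentum=0.9)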
Regressor
class tllib.modules.regressor.Regressor(backbone, num_factors, bottleneck=None, bottleneck_dim=-1, head=None, finetune=True)

A generic Regressor class for domain adaptation.
Parameters:
backbone (torch.nn.Module) – Any backbone to extract 2-d features from data
num_factors (int) – Number of factors
bottleneck (torch.nn.Module, optional) – Any bottleneck layer. Use no bottleneck by default
bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: -1
head (torch.nn.Module, optional) – Any regressor head. Use torch.nn.Linear by default
finetune (bool) – Whether to finetune the regressor or train it from scratch. Default: True
Note
The learning rate of this regressor is set to 10 times that of the feature extractor by default for better accuracy. If you have other optimization strategies, please override get_parameters().

Inputs:
x (tensor): input data fed to backbone
Outputs:
predictions: regressor’s predictions
features: features after bottleneck layer and before head layer
Shape:
Inputs: (minibatch, *) where * means any number of additional dimensions
predictions: (minibatch, num_factors)
features: (minibatch, features_dim)
property features_dim

The dimension of features before the final head layer.
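Usage mirrors Classifier; a brief sketch (the toy backbone is an illustrative assumption, and we assume the Regressor reads the backbone's out_features attribute the same way Classifier does):

    import torch
    import torch.nn as nn
    from tllib.modules.regressor import Regressor

    # Toy backbone producing 2-d features (Regressor has no pool_layer argument)
    backbone = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=3, padding=1),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
    )
    backbone.out_features = 64

    regressor = Regressor(backbone, num_factors=3)
    regressor.train()
    predictions, features = regressor(torch.randn(8, 3, 32, 32))
    print(predictions.shape)  # torch.Size([8, 3])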
Domain Discriminator
class tllib.modules.domain_discriminator.DomainDiscriminator(in_feature, hidden_size, batch_norm=True, sigmoid=True)

Domain discriminator model from Domain-Adversarial Training of Neural Networks (ICML 2015).
Distinguish whether the input features come from the source domain or the target domain. The source domain label is 1 and the target domain label is 0.
Parameters:
in_feature (int) – dimension of the input feature
hidden_size (int) – dimension of the hidden features
batch_norm (bool) – Whether to use BatchNorm1d. Use Dropout if batch_norm is False. Default: True
sigmoid (bool) – Whether to apply a final Sigmoid so that the output is a probability. Default: True
Shape:
Inputs: (minibatch, in_feature)
Outputs: (minibatch, 1)
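A minimal sketch of how this module fits into adversarial training (the feature dimension and batch sizes are illustrative assumptions):

    import torch
    import torch.nn.functional as F
    from tllib.modules.domain_discriminator import DomainDiscriminator

    discriminator = DomainDiscriminator(in_feature=256, hidden_size=1024)

    f_s = torch.randn(16, 256)  # features of source-domain samples
    f_t = torch.randn(16, 256)  # features of target-domain samples
    d = discriminator(torch.cat([f_s, f_t]))  # (32, 1); probabilities since sigmoid=True

    # source domain label is 1, target domain label is 0 (as stated above)
    d_label = torch.cat([torch.ones(16, 1), torch.zeros(16, 1)])
    loss = F.binary_cross_entropy(d, d_label)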
GRL: Gradient Reverse Layer
class tllib.modules.grl.WarmStartGradientReverseLayer(alpha=1.0, lo=0.0, hi=1.0, max_iters=1000.0, auto_step=False)

Gradient Reverse Layer \(\mathcal{R}(x)\) with warm start.
The forward and backward behaviours are:
\[\mathcal{R}(x) = x,\]

\[\dfrac{d\mathcal{R}}{dx} = - \lambda I.\]

\(\lambda\) is initiated at \(lo\) and is gradually changed to \(hi\) using the following schedule:

\[\lambda = \dfrac{2(hi-lo)}{1+\exp(-\alpha \dfrac{i}{N})} - (hi-lo) + lo\]

where \(i\) is the iteration step.
Parameters:
alpha (float, optional) – \(\alpha\). Default: 1.0
lo (float, optional) – Initial value of \(\lambda\). Default: 0.0
hi (float, optional) – Final value of \(\lambda\). Default: 1.0
max_iters (int, optional) – \(N\). Default: 1000
auto_step (bool, optional) – If True, increase \(i\) each time forward is called. Otherwise use the function step() to increase \(i\). Default: False
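A small sketch of the warm-start behaviour (the values below are illustrative):

    import torch
    from tllib.modules.grl import WarmStartGradientReverseLayer

    grl = WarmStartGradientReverseLayer(alpha=1.0, lo=0.0, hi=1.0,
                                        max_iters=1000., auto_step=True)

    x = torch.ones(2, 3, requires_grad=True)
    for _ in range(100):
        y = grl(x)  # the forward pass is the identity; auto_step=True advances i
    y.sum().backward()
    # Gradients are reversed and scaled by the current lambda: near 0 right
    # after initialization (lambda starts at lo) and approaching -hi as i -> N
    print(x.grad)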
Gaussian Kernels
class tllib.modules.kernels.GaussianKernel(sigma=None, track_running_stats=True, alpha=1.0)

Gaussian Kernel Matrix
Gaussian Kernel k is defined by
\[k(x_1, x_2) = \exp \left( - \dfrac{\| x_1 - x_2 \|^2}{2\sigma^2} \right)\]

where \(x_1, x_2 \in \mathbb{R}^d\) are 1-d tensors.

Gaussian Kernel Matrix K is defined on an input group \(X=(x_1, x_2, \ldots, x_m)\) by

\[K(X)_{i,j} = k(x_i, x_j)\]

By default, during training this layer keeps running estimates of the mean of squared L2 distances, which are then used to set the hyperparameter \(\sigma\). Mathematically, the estimate is \(\sigma^2 = \dfrac{\alpha}{n^2}\sum_{i,j} \| x_i - x_j \|^2\). If track_running_stats is set to False, this layer does not keep running estimates and uses a fixed \(\sigma\) instead.

Parameters:
sigma (float, optional) – bandwidth \(\sigma\). Default: None
track_running_stats (bool, optional) – If True, this module tracks the running mean of \(\sigma^2\). Otherwise, it does not track such statistics and always uses a fixed \(\sigma^2\). Default: True
alpha (float, optional) – \(\alpha\), which decides the magnitude of \(\sigma^2\) when track_running_stats is set to True. Default: 1.0

Inputs:
X (tensor): input group \(X\)
Shape:
Inputs: \((minibatch, F)\) where F is the dimension of the input features.
Outputs: \((minibatch, minibatch)\)
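A brief sketch; with sigma=None the bandwidth is estimated from the batch, and MMD-style algorithms typically combine several kernels with different \(\alpha\) values (the specific values below are illustrative):

    import torch
    from tllib.modules.kernels import GaussianKernel

    kernel = GaussianKernel(alpha=1.0)  # sigma=None: estimated from the data
    X = torch.randn(16, 256)
    K = kernel(X)
    print(K.shape)       # torch.Size([16, 16])
    print(K.diagonal())  # all ones, since k(x, x) = exp(0) = 1

    # a family of kernels at multiple bandwidths, as commonly paired with MMD
    kernels = [GaussianKernel(alpha=2 ** k) for k in range(-3, 2)]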
Entropy
tllib.modules.entropy.entropy(predictions, reduction='none')

Entropy of prediction. The definition is:

\[entropy(p) = - \sum_{c=1}^{C} p_c \log p_c\]

where C is the number of classes.
Parameters:
predictions (tensor) – Classifier predictions. Expected to contain normalized probabilities for each class
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'none'
Shape:
predictions: \((minibatch, C)\) where C means the number of classes.
Output: \((minibatch, )\) by default. If reduction is 'mean', then a scalar.
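A short sketch (softmax is applied first because the function expects normalized probabilities):

    import torch
    import torch.nn.functional as F
    from tllib.modules.entropy import entropy

    logits = torch.randn(4, 10)
    p = F.softmax(logits, dim=1)           # normalized class probabilities

    h = entropy(p)                         # shape (4,): one entropy per sample
    h_mean = entropy(p, reduction='mean')  # scalar
    print(h.shape, h_mean.item())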
Knowledge Distillation Loss
class tllib.modules.loss.KnowledgeDistillationLoss(T=1.0, reduction='batchmean')

Knowledge Distillation Loss.
Parameters:
T (float) – Temperature. Default: 1.0
reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'batchmean' | 'mean' | 'sum'. 'none': no reduction will be applied; 'batchmean': the sum of the output will be divided by the batch size; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Default: 'batchmean'
Inputs:
y_student (tensor): logits output of the student
y_teacher (tensor): logits output of the teacher
Shape:
y_student: (minibatch, num_classes)
y_teacher: (minibatch, num_classes)
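A brief sketch (the temperature value is illustrative; in the standard formulation both logits are softened by T before the distributions are compared):

    import torch
    from tllib.modules.loss import KnowledgeDistillationLoss

    kd_loss = KnowledgeDistillationLoss(T=4.0)

    y_student = torch.randn(8, 10, requires_grad=True)  # stand-in student logits
    y_teacher = torch.randn(8, 10)                      # stand-in teacher logits

    loss = kd_loss(y_student, y_teacher)
    loss.backward()  # gradients flow to the student only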