# Modules

## Classifier

class tllib.modules.classifier.Classifier(backbone, num_classes, bottleneck=None, bottleneck_dim=-1, head=None, finetune=True, pool_layer=None)

A generic Classifier class for domain adaptation.

Parameters
• backbone (torch.nn.Module) – Any backbone to extract 2-d features from data

• num_classes (int) – Number of classes

• bottleneck (torch.nn.Module, optional) – Any bottleneck layer. Use no bottleneck by default

• bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: -1

• head (torch.nn.Module, optional) – Any classifier head. Use torch.nn.Linear by default

• finetune (bool) – Whether to fine-tune the classifier or train it from scratch. Default: True

Note

Different domain adaptation algorithms use different classifiers to reach better accuracy, and a suggested Classifier is provided for each algorithm. These classifiers are not the core of the algorithms: you can implement your own Classifier and combine it with any domain adaptation algorithm in this library.

Note

By default, the learning rate of this classifier is set to 10 times that of the feature extractor for better accuracy. If you use a different optimization strategy, override get_parameters().

Inputs:
• x (tensor): input data fed to backbone

Outputs:
• predictions: classifier’s predictions

• features: features after bottleneck layer and before head layer

Shape:
• Inputs: (minibatch, *), where * means any number of additional dimensions

• predictions: (minibatch, num_classes)

• features: (minibatch, features_dim)

property features_dim

The dimension of features before the final head layer

get_parameters(base_lr=1.0)

A parameter list which decides optimization hyper-parameters, such as the relative learning rate of each layer
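
As a rough illustration of the parameter list described above, the sketch below builds per-group learning rates in which the backbone runs at one tenth of the rate used for the bottleneck and head. The helper `build_param_groups` and its arguments are hypothetical names chosen for this example, not part of tllib, and tying the 10x ratio to `finetune=True` is an assumption. The dictionaries follow the `{"params": ..., "lr": ...}` per-group layout that `torch.optim` optimizers accept.

```python
def build_param_groups(backbone_params, bottleneck_params, head_params,
                       base_lr=1.0, finetune=True):
    """Return per-group learning rates (hypothetical helper, not tllib's code)."""
    # Assumption: a pretrained backbone is updated 10x more slowly than the
    # newly initialized layers; when training from scratch all groups use base_lr.
    backbone_lr = 0.1 * base_lr if finetune else base_lr
    return [
        {"params": backbone_params, "lr": backbone_lr},
        {"params": bottleneck_params, "lr": base_lr},
        {"params": head_params, "lr": base_lr},
    ]


# Placeholder strings stand in for real parameter tensors.
groups = build_param_groups(["backbone.w"], ["bottleneck.w"], ["head.w"], base_lr=1.0)
for group in groups:
    print(group["params"], group["lr"])
```

With real modules, the placeholder lists would be replaced by calls such as `backbone.parameters()`, and the resulting list would be passed to an optimizer.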

## Regressor

class tllib.modules.regressor.Regressor(backbone, num_factors, bottleneck=None, bottleneck_dim=-1, head=None, finetune=True)

A generic Regressor class for domain adaptation.

Parameters
• backbone (torch.nn.Module) – Any backbone to extract 2-d features from data

• num_factors (int) – Number of factors

• bottleneck (torch.nn.Module, optional) – Any bottleneck layer. Use no bottleneck by default

• bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: -1

• head (torch.nn.Module, optional) – Any regressor head. Use nn.Linear by default

• finetune (bool) – Whether to fine-tune the regressor or train it from scratch. Default: True

Note

By default, the learning rate of this regressor is set to 10 times that of the feature extractor for better accuracy. If you use a different optimization strategy, override get_parameters().

Inputs:
• x (tensor): input data fed to backbone

Outputs:
• predictions: regressor’s predictions

• features: features after bottleneck layer and before head layer

Shape:
• Inputs: (minibatch, *), where * means any number of additional dimensions

• predictions: (minibatch, num_factors)

• features: (minibatch, features_dim)

property features_dim

The dimension of features before the final head layer

get_parameters(base_lr=1.0)

A parameter list which decides optimization hyper-parameters, such as the relative learning rate of each layer

## Domain Discriminator

class tllib.modules.domain_discriminator.DomainDiscriminator(in_feature, hidden_size, batch_norm=True, sigmoid=True)

Domain discriminator model from Domain-Adversarial Training of Neural Networks (ICML 2015)

Distinguish whether the input features come from the source domain or the target domain. The source domain label is 1 and the target domain label is 0.

Parameters
Shape:
• Inputs: (minibatch, in_feature)

• Outputs: $$(minibatch, 1)$$

## Gradient Reverse Layer

class tllib.modules.grl.WarmStartGradientReverseLayer(alpha=1.0, lo=0.0, hi=1.0, max_iters=1000.0, auto_step=False)

Gradient Reverse Layer $$\mathcal{R}(x)$$ with warm start

The forward and backward behaviours are:

$\begin{aligned}\mathcal{R}(x) &= x,\\ \dfrac{d\mathcal{R}}{dx} &= -\lambda I.\end{aligned}$

$$\lambda$$ is initialized at $$lo$$ and gradually changes toward $$hi$$ according to the following schedule:

$\lambda = \dfrac{2(hi-lo)}{1+\exp(-\alpha \dfrac{i}{N})} - (hi-lo) + lo$

where $$i$$ is the iteration step and $$N$$ is max_iters.

Parameters
• alpha (float, optional) – $$\alpha$$. Default: 1.0

• lo (float, optional) – Initial value of $$\lambda$$. Default: 0.0

• hi (float, optional) – Final value of $$\lambda$$. Default: 1.0

• max_iters (int, optional) – $$N$$. Default: 1000

• auto_step (bool, optional) – If True, increase $$i$$ each time forward is called. Otherwise, call step() to increase $$i$$. Default: False

step()

Increase iteration number $$i$$ by 1
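
To make the warm-start schedule above concrete, the following plain-Python sketch evaluates $$\lambda$$ for a few iteration counts. The function name `warm_start_lambda` is hypothetical and the code only mirrors the documented formula; it is not the library's implementation and does not perform the gradient reversal itself.

```python
import math


def warm_start_lambda(i, alpha=1.0, lo=0.0, hi=1.0, max_iters=1000.0):
    """Evaluate the documented warm-start schedule for lambda at iteration i."""
    return 2.0 * (hi - lo) / (1.0 + math.exp(-alpha * i / max_iters)) - (hi - lo) + lo


# lambda equals lo at i = 0 and moves toward hi as alpha * i / max_iters grows.
for step in (0, 250, 500, 1000):
    print(step, warm_start_lambda(step, alpha=10.0))
```

Under this formula, $$\lambda$$ equals $$lo$$ at $$i = 0$$ and approaches $$hi$$ only as $$\alpha i / N$$ becomes large, so the choice of alpha controls how close to $$hi$$ the coefficient gets within max_iters steps.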

## Gaussian Kernels

class tllib.modules.kernels.GaussianKernel(sigma=None, track_running_stats=True, alpha=1.0)

Gaussian Kernel Matrix

Gaussian Kernel k is defined by

$k(x_1, x_2) = \exp \left( - \dfrac{\| x_1 - x_2 \|^2}{2\sigma^2} \right)$

where $$x_1, x_2 \in R^d$$ are 1-d tensors.

Gaussian Kernel Matrix K is defined on input group $$X=(x_1, x_2, ..., x_m),$$

$K(X)_{i,j} = k(x_i, x_j)$

By default, during training this layer keeps a running estimate of the mean squared L2 distance, which is then used to set the hyperparameter $$\sigma$$. Mathematically, the estimate is $$\sigma^2 = \dfrac{\alpha}{n^2}\sum_{i,j} \| x_i - x_j \|^2$$. If track_running_stats is set to False, the layer does not keep a running estimate and uses a fixed $$\sigma$$ instead.

Parameters
• sigma (float, optional) – bandwidth $$\sigma$$. Default: None

• track_running_stats (bool, optional) – If True, this module tracks the running mean of $$\sigma^2$$. Otherwise, it does not track such statistics and always uses a fixed $$\sigma^2$$. Default: True

• alpha (float, optional) – $$\alpha$$, which decides the magnitude of $$\sigma^2$$ when track_running_stats is set to True. Default: 1.0

Inputs:
• X (tensor): input group $$X$$

Shape:
• Inputs: $$(minibatch, F)$$ where F means the dimension of input features.

• Outputs: $$(minibatch, minibatch)$$
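
The definitions above can be sketched in plain Python as follows. The function `gaussian_kernel_matrix` is a hypothetical name for this illustration; it estimates $$\sigma^2$$ from the current batch only when no fixed sigma is given, and it does not reproduce the running-statistics behaviour or the tensor operations of the actual layer.

```python
import math


def gaussian_kernel_matrix(X, sigma=None, alpha=1.0):
    """Build K with K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))."""
    n = len(X)
    # Pairwise squared L2 distances between all rows of X.
    sq_dist = [
        [sum((a - b) ** 2 for a, b in zip(X[i], X[j])) for j in range(n)]
        for i in range(n)
    ]
    if sigma is None:
        # sigma^2 = alpha / n^2 * sum_{i,j} ||x_i - x_j||^2
        # (this is zero, and the division below fails, if all rows are identical).
        sigma_sq = alpha * sum(sum(row) for row in sq_dist) / (n * n)
    else:
        sigma_sq = sigma ** 2
    return [[math.exp(-d / (2.0 * sigma_sq)) for d in row] for row in sq_dist]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]  # minibatch of 3 samples, F = 2
K_fixed = gaussian_kernel_matrix(X, sigma=1.0)
K_auto = gaussian_kernel_matrix(X)
print(K_fixed)
print(K_auto)
```

The result is a symmetric (minibatch, minibatch) matrix with ones on the diagonal, matching the output shape listed above.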

## Entropy

tllib.modules.entropy.entropy(predictions, reduction='none')

Entropy of prediction. The definition is:

$entropy(p) = - \sum_{c=1}^C p_c \log p_c$

where C is the number of classes.

Parameters
• predictions (tensor) – Classifier predictions. Expected to contain normalized scores (probabilities) for each class

• reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'none'

Shape:
• predictions: $$(minibatch, C)$$ where C means the number of classes.

• Output: $$(minibatch, )$$ by default. If reduction is 'mean', then scalar.
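
As a small worked example of the entropy definition above, the sketch below computes the per-sample entropy for rows of class probabilities in plain Python. The name `entropy_sketch` is hypothetical, and skipping zero-probability terms is an assumption of this illustration; the library function may handle them differently.

```python
import math


def entropy_sketch(predictions, reduction="none"):
    """Per-sample entropy of rows of class probabilities (illustrative only)."""
    # Terms with p = 0 are skipped, using the convention 0 * log 0 = 0.
    per_sample = [
        -sum(p * math.log(p) for p in row if p > 0.0)
        for row in predictions
    ]
    if reduction == "mean":
        return sum(per_sample) / len(per_sample)
    return per_sample


probs = [
    [0.25, 0.25, 0.25, 0.25],  # uniform: maximal entropy, log(4)
    [1.0, 0.0, 0.0, 0.0],      # one-hot: zero entropy
]
per_sample = entropy_sketch(probs)
mean_entropy = entropy_sketch(probs, reduction="mean")
print(per_sample, mean_entropy)
```

A uniform prediction over C classes gives the maximum value $$\log C$$, while a one-hot prediction gives zero.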

## Knowledge Distillation Loss

class tllib.modules.loss.KnowledgeDistillationLoss(T=1.0, reduction='batchmean')

Knowledge Distillation Loss.

Parameters
• T (double) – Temperature. Default: 1.

• reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'batchmean' | 'mean' | 'sum'. 'none': no reduction will be applied, 'batchmean': the sum of the output will be divided by the batch size, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'batchmean'

Inputs:
• y_student (tensor): logits output of the student

• y_teacher (tensor): logits output of the teacher

Shape:
• y_student: (minibatch, num_classes)

• y_teacher: (minibatch, num_classes)
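
The reference above does not spell out the loss formula. The sketch below assumes the common formulation: the KL divergence from the temperature-softened teacher distribution to the temperature-softened student distribution, averaged over the batch. The names `softmax_with_temperature` and `kd_loss_sketch` are hypothetical, and the code is a plain-Python illustration of that assumption rather than the library's implementation.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Softmax of logits / T, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss_sketch(y_student, y_teacher, T=1.0):
    """Batch-mean KL(teacher || student) on softened distributions (assumed form)."""
    total = 0.0
    for s_logits, t_logits in zip(y_student, y_teacher):
        p_s = softmax_with_temperature(s_logits, T)
        p_t = softmax_with_temperature(t_logits, T)
        total += sum(
            pt * (math.log(pt) - math.log(ps))
            for pt, ps in zip(p_t, p_s)
            if pt > 0.0
        )
    return total / len(y_student)


teacher = [[1.0, 2.0, 3.0], [0.5, 0.5, 2.0]]
student = [[1.5, 1.0, 2.0], [0.0, 1.0, 1.0]]
print(kd_loss_sketch(student, teacher, T=2.0))
```

Under this assumed form, the loss is zero when student and teacher logits agree and positive otherwise.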