## MCD: Maximum Classifier Discrepancy

tllib.alignment.mcd.classifier_discrepancy(predictions1, predictions2)

The Classifier Discrepancy in Maximum Classifier Discrepancy for Unsupervised Domain Adaptation (CVPR 2018).

The classifier discrepancy between predictions $$p_1$$ and $$p_2$$ can be described as:

$d(p_1, p_2) = \dfrac{1}{K} \sum_{k=1}^K | p_{1k} - p_{2k} |,$

where K is the number of classes.

Parameters
• predictions1 (torch.Tensor) – Classifier predictions $$p_1$$. Expected to contain normalized scores (for example, softmax probabilities) for each class

• predictions2 (torch.Tensor) – Classifier predictions $$p_2$$
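The formula above can be sketched for a single sample in plain Python. This is an illustration only: it uses lists instead of `torch.Tensor` inputs and does not reproduce how the library reduces over a batch; the helper below is a stand-in written for this example, not the library code.

```python
def classifier_discrepancy(p1, p2):
    """Mean absolute difference between two probability vectors of length K."""
    if len(p1) != len(p2):
        raise ValueError("p1 and p2 must have the same number of classes")
    return sum(abs(a - b) for a, b in zip(p1, p2)) / len(p1)


# Two classifiers that mostly agree on a 3-class problem.
p1 = [0.7, 0.2, 0.1]
p2 = [0.5, 0.3, 0.2]
d = classifier_discrepancy(p1, p2)  # (0.2 + 0.1 + 0.1) / 3
```

Identical predictions give a discrepancy of 0, and the value grows as the two classifiers disagree more.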

tllib.alignment.mcd.entropy(predictions)

Entropy of N predictions $$(p_1, p_2, ..., p_N)$$. The definition is:

$d(p_1, p_2, ..., p_N) = -\dfrac{1}{K} \sum_{k=1}^K \log \left( \dfrac{1}{N} \sum_{i=1}^N p_{ik} \right)$

where K is the number of classes.

Note

This entropy function is specific to MCD and differs from the usual entropy() function.

Parameters

predictions (torch.Tensor) – Classifier predictions. Expected to contain normalized scores (for example, softmax probabilities) for each class
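A minimal sketch of the definition above, written with plain Python lists rather than tensors. The small `eps` added inside the logarithm is an assumption made here to avoid `log(0)`; it is not part of the formula.

```python
import math


def mcd_entropy(predictions, eps=1e-6):
    """predictions: list of N probability vectors, each of length K."""
    n = len(predictions)
    k = len(predictions[0])
    # Average the N predictions class by class.
    mean_per_class = [sum(p[j] for p in predictions) / n for j in range(k)]
    # Negative mean log of the averaged probabilities; eps (assumed) guards log(0).
    return -sum(math.log(m + eps) for m in mean_per_class) / k


balanced = mcd_entropy([[1.0, 0.0], [0.0, 1.0]])   # class usage is even
collapsed = mcd_entropy([[1.0, 0.0], [1.0, 0.0]])  # everything goes to class 0
```

The value is small when the averaged predictions are spread evenly over the classes and large when they collapse onto a few classes.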

class tllib.alignment.mcd.ImageClassifierHead(in_features, num_classes, bottleneck_dim=1024, pool_layer=None)

Parameters
• in_features (int) – Dimension of input features

• num_classes (int) – Number of classes

• bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: 1024

Shape:
• Inputs: $$(minibatch, F)$$ where F = in_features.

• Output: $$(minibatch, C)$$ where C = num_classes.

## MDD: Margin Disparity Discrepancy

class tllib.alignment.mdd.MarginDisparityDiscrepancy(source_disparity, target_disparity, margin=4, reduction='mean')

The margin disparity discrepancy (MDD) proposed in Bridging Theory and Algorithm for Domain Adaptation (ICML 2019).

MDD can measure the distribution discrepancy in domain adaptation.

The $$y^s$$ and $$y^t$$ are logits output by the main head on the source and target domain respectively. The $$y_{adv}^s$$ and $$y_{adv}^t$$ are logits output by the adversarial head.

The definition can be described as:

$\mathcal{D}_{\gamma}(\hat{\mathcal{S}}, \hat{\mathcal{T}}) = -\gamma \mathbb{E}_{y^s, y_{adv}^s \sim\hat{\mathcal{S}}} L_s (y^s, y_{adv}^s) + \mathbb{E}_{y^t, y_{adv}^t \sim\hat{\mathcal{T}}} L_t (y^t, y_{adv}^t),$

where $$\gamma$$ is a margin hyper-parameter, $$L_s$$ refers to the disparity function defined on the source domain and $$L_t$$ refers to the disparity function defined on the target domain.
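The definition can be illustrated with a small plain-Python sketch. The expectations are replaced by averages over a list of (main output, adversarial output) pairs, and `l1_disparity` is a hypothetical disparity function chosen for this example only.

```python
def l1_disparity(y, y_adv):
    """Illustrative disparity: mean absolute difference of two output vectors."""
    return sum(abs(a - b) for a, b in zip(y, y_adv)) / len(y)


def mean(values):
    return sum(values) / len(values)


def margin_disparity_discrepancy(source_pairs, target_pairs,
                                 source_disparity, target_disparity, margin=4.0):
    # Empirical expectation of L_s over the source pairs.
    source_term = mean([source_disparity(y, y_adv) for y, y_adv in source_pairs])
    # Empirical expectation of L_t over the target pairs.
    target_term = mean([target_disparity(y, y_adv) for y, y_adv in target_pairs])
    return -margin * source_term + target_term


# Heads agree on the source domain and disagree on the target domain.
source_pairs = [([1.0, 0.0], [1.0, 0.0]), ([0.0, 1.0], [0.0, 1.0])]
target_pairs = [([1.0, 0.0], [0.0, 1.0])]
mdd = margin_disparity_discrepancy(source_pairs, target_pairs,
                                   l1_disparity, l1_disparity)
```

With these inputs the source term is 0 and the target term is 1, so the discrepancy is 1. Disagreement on the source domain is penalized $$\gamma$$ times more strongly than disagreement on the target domain is rewarded.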

Parameters
• source_disparity (callable) – The disparity function defined on the source domain, $$L_s$$.

• target_disparity (callable) – The disparity function defined on the target domain, $$L_t$$.

• margin (float) – margin $$\gamma$$. Default: 4

• reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'

Inputs:
• y_s: output $$y^s$$ by the main head on the source domain

• y_s_adv: output $$y_{adv}^s$$ by the adversarial head on the source domain

• y_t: output $$y^t$$ by the main head on the target domain

• y_t_adv: output $$y_{adv}^t$$ by the adversarial head on the target domain

• w_s (optional): instance weights for source domain

• w_t (optional): instance weights for target domain

Examples:

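>>> import torch
>>> import torch.nn.functional as F
>>> from tllib.alignment.mdd import MarginDisparityDiscrepancy
>>> num_outputs = 2
>>> batch_size = 10
>>> loss = MarginDisparityDiscrepancy(margin=4., source_disparity=F.l1_loss, target_disparity=F.l1_loss)
>>> # output from source domain and target domain
>>> y_s, y_t = torch.randn(batch_size, num_outputs), torch.randn(batch_size, num_outputs)
>>> # adversarial output from source domain and target domain
>>> y_s_adv, y_t_adv = torch.randn(batch_size, num_outputs), torch.randn(batch_size, num_outputs)
>>> output = loss(y_s, y_s_adv, y_t, y_t_adv)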


### MDD for Classification

class tllib.alignment.mdd.ClassificationMarginDisparityDiscrepancy(margin=4, **kwargs)

The margin disparity discrepancy (MDD) proposed in Bridging Theory and Algorithm for Domain Adaptation (ICML 2019).

It measures the distribution discrepancy in domain adaptation for classification.

When margin is equal to 1, it’s also called disparity discrepancy (DD).

The $$y^s$$ and $$y^t$$ are logits output by the main classifier on the source and target domain respectively. The $$y_{adv}^s$$ and $$y_{adv}^t$$ are logits output by the adversarial classifier. They are expected to contain raw, unnormalized scores for each class.

The definition can be described as:

$\mathcal{D}_{\gamma}(\hat{\mathcal{S}}, \hat{\mathcal{T}}) = \gamma \mathbb{E}_{y^s, y_{adv}^s \sim\hat{\mathcal{S}}} \log\left(\frac{\exp(y_{adv}^s[h_{y^s}])}{\sum_j \exp(y_{adv}^s[j])}\right) + \mathbb{E}_{y^t, y_{adv}^t \sim\hat{\mathcal{T}}} \log\left(1-\frac{\exp(y_{adv}^t[h_{y^t}])}{\sum_j \exp(y_{adv}^t[j])}\right),$

where $$\gamma$$ is a margin hyper-parameter and $$h_y$$ refers to the predicted label when the logits output is $$y$$. You can see more details in Bridging Theory and Algorithm for Domain Adaptation.
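As a worked illustration of the formula, the sketch below evaluates it on lists of logits in plain Python. The expectations become averages over (main logits, adversarial logits) pairs, and the small `eps` inside the logarithms is an assumption added here for numerical safety, not part of the definition.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def classification_mdd(source_pairs, target_pairs, margin=4.0, eps=1e-6):
    source_terms = []
    for y, y_adv in source_pairs:
        h = argmax(y)  # label predicted by the main classifier
        source_terms.append(math.log(softmax(y_adv)[h] + eps))
    target_terms = []
    for y, y_adv in target_pairs:
        h = argmax(y)
        target_terms.append(math.log(1.0 - softmax(y_adv)[h] + eps))
    return (margin * sum(source_terms) / len(source_terms)
            + sum(target_terms) / len(target_terms))


# Adversarial head is undecided on both domains.
undecided = classification_mdd([([2.0, 0.0], [0.0, 0.0])],
                               [([2.0, 0.0], [0.0, 0.0])])
# Adversarial head agrees on the source and disagrees on the target.
separated = classification_mdd([([2.0, 0.0], [10.0, 0.0])],
                               [([2.0, 0.0], [0.0, 10.0])])
```

Both terms are logarithms of probabilities, so the value is essentially never positive. It approaches 0 when the adversarial classifier matches the main classifier on the source domain and contradicts it on the target domain.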

Parameters
• margin (float) – margin $$\gamma$$. Default: 4

• reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'

Inputs:
• y_s: logits output $$y^s$$ by the main classifier on the source domain

• y_s_adv: logits output $$y_{adv}^s$$ by the adversarial classifier on the source domain

• y_t: logits output $$y^t$$ by the main classifier on the target domain

• y_t_adv: logits output $$y_{adv}^t$$ by the adversarial classifier on the target domain

Shape:
• Inputs: $$(minibatch, C)$$ where C = number of classes, or $$(minibatch, C, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

• Output: scalar. If reduction is 'none', then the same size as the target: $$(minibatch)$$, or $$(minibatch, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

Examples:

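>>> import torch
>>> from tllib.alignment.mdd import ClassificationMarginDisparityDiscrepancy
>>> num_classes = 2
>>> batch_size = 10
>>> loss = ClassificationMarginDisparityDiscrepancy(margin=4.)
>>> # logits output from source domain and target domain
>>> y_s, y_t = torch.randn(batch_size, num_classes), torch.randn(batch_size, num_classes)
>>> # adversarial logits output from source domain and target domain
>>> y_s_adv, y_t_adv = torch.randn(batch_size, num_classes), torch.randn(batch_size, num_classes)
>>> output = loss(y_s, y_s_adv, y_t, y_t_adv)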

class tllib.alignment.mdd.ImageClassifier(backbone, num_classes, bottleneck_dim=1024, width=1024, grl=None, finetune=True, pool_layer=None)

Classifier for MDD.

The classifier for MDD has one backbone, one bottleneck, and two classifier heads. The first classifier head is used for final predictions. The adversarial classifier head is only used when calculating MarginDisparityDiscrepancy.

Parameters
• backbone (torch.nn.Module) – Any backbone to extract 1-d features from data

• num_classes (int) – Number of classes

• bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: 1024

• width (int, optional) – Feature dimension of the classifier head. Default: 1024

• grl (nn.Module) – Gradient reversal layer. Will use default parameters if None. Default: None.

• finetune (bool, optional) – Whether to use a 10x smaller learning rate in the backbone. Default: True

Inputs:
• x (tensor): input data

Outputs:
• outputs: logits outputs by the main classifier

• outputs_adv: logits outputs by the adversarial classifier

Shape:
• x: $$(minibatch, *)$$, same shape as the input of the backbone.

• outputs, outputs_adv: $$(minibatch, C)$$, where C means the number of classes.

Note

Remember to call step() after forward() during the training phase. For instance,

>>> # x is inputs, classifier is an ImageClassifier
>>> outputs, outputs_adv = classifier(x)
>>> classifier.step()

tllib.alignment.mdd.shift_log(x, offset=1e-06)

First shift, then calculate log, which can be described as:

$y = \min(\log(x+\text{offset}), 0)$

Used to avoid the gradient explosion problem in log(x) function when x=0.

Parameters
• x (torch.Tensor) – input tensor

• offset (float, optional) – offset size. Default: 1e-6

Note

Input tensor falls in [0., 1.] and the output tensor falls in [log(offset), 0]
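A scalar sketch of the idea, assuming the shifted input is capped at 1 before the logarithm so that the result never exceeds the upper bound of 0 given in the note. The capping step is an assumption made for this illustration, and the function works on Python floats rather than tensors.

```python
import math


def shift_log(x, offset=1e-6):
    """log(min(x + offset, 1)): finite at x = 0 and never positive."""
    return math.log(min(x + offset, 1.0))


at_zero = shift_log(0.0)  # log(offset) instead of -inf
at_half = shift_log(0.5)
at_one = shift_log(1.0)   # capped, so the result is exactly 0
```

Without the offset, `log(0)` is undefined and the gradient of `log(x)` grows without bound as x approaches 0; the shift keeps both finite.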

### MDD for Regression

class tllib.alignment.mdd.RegressionMarginDisparityDiscrepancy(margin=1, loss_function=<function l1_loss>, **kwargs)

The margin disparity discrepancy (MDD) proposed in Bridging Theory and Algorithm for Domain Adaptation (ICML 2019).

It measures the distribution discrepancy in domain adaptation for regression.

The $$y^s$$ and $$y^t$$ are logits output by the main regressor on the source and target domain respectively. The $$y_{adv}^s$$ and $$y_{adv}^t$$ are logits output by the adversarial regressor. They are expected to contain normalized values for each factor.

The definition can be described as:

$\mathcal{D}_{\gamma}(\hat{\mathcal{S}}, \hat{\mathcal{T}}) = -\gamma \mathbb{E}_{y^s, y_{adv}^s \sim\hat{\mathcal{S}}} L (y^s, y_{adv}^s) + \mathbb{E}_{y^t, y_{adv}^t \sim\hat{\mathcal{T}}} L (y^t, y_{adv}^t),$

where $$\gamma$$ is a margin hyper-parameter and $$L$$ refers to the disparity function defined on both domains. You can see more details in Bridging Theory and Algorithm for Domain Adaptation.

Parameters
• loss_function (callable) – The disparity function defined on both domains, $$L$$.

• margin (float) – margin $$\gamma$$. Default: 1

• reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'

Inputs:
• y_s: logits output $$y^s$$ by the main regressor on the source domain

• y_s_adv: logits output $$y_{adv}^s$$ by the adversarial regressor on the source domain

• y_t: logits output $$y^t$$ by the main regressor on the target domain

• y_t_adv: logits output $$y_{adv}^t$$ by the adversarial regressor on the target domain

Shape:
• Inputs: $$(minibatch, F)$$ where F = number of factors, or $$(minibatch, F, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

• Output: scalar. If reduction is 'none', then the same size as the target: $$(minibatch)$$, or $$(minibatch, d_1, d_2, ..., d_K)$$ with $$K \geq 1$$ in the case of K-dimensional loss.

Examples:

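>>> import torch
>>> import torch.nn.functional as F
>>> from tllib.alignment.mdd import RegressionMarginDisparityDiscrepancy
>>> num_outputs = 2
>>> batch_size = 10
>>> loss = RegressionMarginDisparityDiscrepancy(margin=4., loss_function=F.l1_loss)
>>> # output from source domain and target domain
>>> y_s, y_t = torch.randn(batch_size, num_outputs), torch.randn(batch_size, num_outputs)
>>> # adversarial output from source domain and target domain
>>> y_s_adv, y_t_adv = torch.randn(batch_size, num_outputs), torch.randn(batch_size, num_outputs)
>>> output = loss(y_s, y_s_adv, y_t, y_t_adv)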

class tllib.alignment.mdd.ImageRegressor(backbone, num_factors, bottleneck=None, head=None, adv_head=None, bottleneck_dim=1024, width=1024, finetune=True)

Regressor for MDD.

The regressor for MDD has one backbone, one bottleneck, and two regressor heads. The first regressor head is used for final predictions. The adversarial regressor head is only used when calculating MarginDisparityDiscrepancy.

Parameters
• backbone (torch.nn.Module) – Any backbone to extract 1-d features from data

• num_factors (int) – Number of factors

• bottleneck_dim (int, optional) – Feature dimension of the bottleneck layer. Default: 1024

• width (int, optional) – Feature dimension of the regressor head. Default: 1024

• finetune (bool, optional) – Whether to use a 10x smaller learning rate in the backbone. Default: True

Inputs:
• x (Tensor): input data

Outputs:
• outputs: outputs by the main regressor

• outputs_adv: outputs by the adversarial regressor

Shape:
• x: $$(minibatch, *)$$, same shape as the input of the backbone.

• outputs, outputs_adv: $$(minibatch, F)$$, where F means the number of factors.

Note

Remember to call step() after forward() during the training phase. For instance,

>>> # x is inputs, regressor is an ImageRegressor
>>> outputs, outputs_adv = regressor(x)
>>> regressor.step()


class tllib.alignment.regda.PseudoLabelGenerator2d(num_keypoints, height=64, width=64, sigma=2)

Generate ground truth heatmap and ground false heatmap from a prediction.

Parameters
• num_keypoints (int) – Number of keypoints

• height (int) – height of the heatmap. Default: 64

• width (int) – width of the heatmap. Default: 64

• sigma (int) – sigma parameter used when generating the heatmap. Default: 2

Inputs:
• y: predicted heatmap

Outputs:
• ground_truth: heatmap conforming to Gaussian distribution

• ground_false: ground false heatmap

Shape:
• y: $$(minibatch, K, H, W)$$ where K means the number of keypoints, H and W are the height and width of the heatmap respectively.

• ground_truth: $$(minibatch, K, H, W)$$

• ground_false: $$(minibatch, K, H, W)$$
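The sketch below shows one plausible way to build the two heatmaps for a single sample in plain Python. It rests on several assumptions that the text above does not state: the keypoint location is taken as the peak of each predicted heatmap, the Gaussian is unnormalized and not truncated, and the ground-false heatmap of a keypoint is the sum of the other keypoints' Gaussians clipped to 1. All function names are hypothetical.

```python
import math


def gaussian_heatmap(cx, cy, height, width, sigma=2):
    """Unnormalized Gaussian with peak value 1 at column cx, row cy."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]


def peak_location(heatmap):
    """Return (x, y) of the largest value in a 2-D heatmap."""
    best_y = max(range(len(heatmap)), key=lambda r: max(heatmap[r]))
    row = heatmap[best_y]
    best_x = max(range(len(row)), key=row.__getitem__)
    return best_x, best_y


def generate_pseudo_labels(y, sigma=2):
    """y: list of K predicted heatmaps (each H x W) for one sample."""
    height, width = len(y[0]), len(y[0][0])
    ground_truth = [gaussian_heatmap(*peak_location(h), height, width, sigma)
                    for h in y]
    ground_false = []
    for k in range(len(y)):
        # Assumed rule: positions of the *other* keypoints, clipped to 1.
        others = [ground_truth[j] for j in range(len(y)) if j != k]
        ground_false.append([[min(1.0, sum(o[r][c] for o in others))
                              for c in range(width)] for r in range(height)])
    return ground_truth, ground_false


H = W = 8
y = [[[0.0] * W for _ in range(H)] for _ in range(2)]
y[0][3][2] = 1.0  # keypoint 0 predicted at x=2, y=3
y[1][6][5] = 1.0  # keypoint 1 predicted at x=5, y=6
ground_truth, ground_false = generate_pseudo_labels(y)
```

Here `ground_truth[0]` peaks where keypoint 0 was predicted, while `ground_false[0]` peaks where keypoint 1 was predicted, that is, at a location that is wrong for keypoint 0.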

class tllib.alignment.regda.RegressionDisparity(pseudo_label_generator, criterion)

Regression Disparity proposed by Regressive Domain Adaptation for Unsupervised Keypoint Detection (CVPR 2021).

Parameters
• pseudo_label_generator (PseudoLabelGenerator2d) – generate ground truth heatmap and ground false heatmap from a prediction.

• criterion (torch.nn.Module) – the loss function to calculate distance between two predictions.

Inputs:
• y: output by the main head

• y_adv: output by the adversarial head

• weight (optional): instance weights

• mode (str): whether to minimize or maximize the disparity. Choices include min and max. Default: min.

Shape:
• y: $$(minibatch, K, H, W)$$ where K means the number of keypoints, H and W are the height and width of the heatmap respectively.

• y_adv: $$(minibatch, K, H, W)$$

• weight: $$(minibatch, K)$$.

• Output: depends on the criterion.
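The role of `mode` can be sketched as follows. This is an assumption about the behaviour rather than the library code: in `min` mode the adversarial output is compared with the ground-truth heatmap derived from the main head, and in `max` mode with the ground-false heatmap. The `mse` criterion and the flat-list inputs are hypothetical simplifications.

```python
def mse(a, b):
    """Hypothetical criterion: mean squared error between two flat lists."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def regression_disparity(y_adv, ground_truth, ground_false, criterion, mode="min"):
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    # Assumed rule: 'min' pulls y_adv toward the pseudo ground truth,
    # 'max' pulls it toward the ground-false target instead.
    target = ground_truth if mode == "min" else ground_false
    return criterion(y_adv, target)


y_adv = [1.0, 0.0, 0.0]
ground_truth = [1.0, 0.0, 0.0]
ground_false = [0.0, 1.0, 0.0]
loss_min = regression_disparity(y_adv, ground_truth, ground_false, mse, mode="min")
loss_max = regression_disparity(y_adv, ground_truth, ground_false, mse, mode="max")
```

Under this reading, both modes return a loss to be minimized; the mode only selects which pseudo label the adversarial head is compared against.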

Examples:

>>> num_keypoints = 5
>>> batch_size = 10
>>> H = W = 64
>>> pseudo_label_generator = PseudoLabelGenerator2d(num_keypoints)
>>> from tllib.vision.models.keypoint_detection.loss import JointsKLLoss
>>> loss = RegressionDisparity(pseudo_label_generator, JointsKLLoss())
>>> # output from source domain and target domain
>>> y_s, y_t = torch.randn(batch_size, num_keypoints, H, W), torch.randn(batch_size, num_keypoints, H, W)
>>> # adversarial output from source domain and target domain
>>> y_s_adv, y_t_adv = torch.randn(batch_size, num_keypoints, H, W), torch.randn(batch_size, num_keypoints, H, W)
>>> # minimize regression disparity on source domain
>>> output = loss(y_s, y_s_adv, mode='min')
>>> # maximize regression disparity on target domain
>>> output = loss(y_t, y_t_adv, mode='max')

class tllib.alignment.regda.PoseResNet2d(backbone, upsampling, feature_dim, num_keypoints, gl=None, finetune=True, num_head_layers=2)

Pose ResNet for RegDA has one backbone, one upsampling layer, and two regression heads.

Parameters
• backbone (torch.nn.Module) – Backbone to extract 2-d features from data

• upsampling (torch.nn.Module) – Layer to upsample image feature to heatmap size

• feature_dim (int) – The dimension of the features from upsampling layer.

• num_keypoints (int) – Number of keypoints

• finetune (bool, optional) – Whether to use a 10x smaller learning rate in the backbone. Default: True

Inputs:
• x (tensor): input data

Outputs:
• outputs: logits outputs by the main regressor

Shape:
• x: $$(minibatch, *)$$, same shape as the input of the backbone.

• outputs, outputs_adv: $$(minibatch, K, H, W)$$, where K means the number of keypoints.

Note

Remember to call step() after forward() during the training phase. For instance,

>>> # x is inputs, model is a PoseResNet