# Statistics Matching

class tllib.alignment.dan.MultipleKernelMaximumMeanDiscrepancy(kernels, linear=False)

The Multiple Kernel Maximum Mean Discrepancy (MK-MMD) used in Learning Transferable Features with Deep Adaptation Networks (ICML 2015).

Given source domain $$\mathcal{D}_s$$ of $$n_s$$ labeled points and target domain $$\mathcal{D}_t$$ of $$n_t$$ unlabeled points drawn i.i.d. from P and Q respectively, the deep networks will generate activations as $$\{z_i^s\}_{i=1}^{n_s}$$ and $$\{z_i^t\}_{i=1}^{n_t}$$. The MK-MMD $$D_k (P, Q)$$ between probability distributions P and Q is defined as

$D_k(P, Q) \triangleq \| E_P [\phi(z^s)] - E_Q [\phi(z^t)] \|^2_{\mathcal{H}_k},$

where $$\phi$$ is the feature map of the reproducing kernel Hilbert space $$\mathcal{H}_k$$ induced by the kernel $$k$$, and $$k$$ belongs to the family

$\mathcal{K} \triangleq \{ k=\sum_{u=1}^{m}\beta_{u} k_{u} \}$

where each $$k_{u}$$ is a single kernel.

Using the kernel trick, MK-MMD can be computed as

$\begin{split}\hat{D}_k(P, Q) &= \dfrac{1}{n_s^2} \sum_{i=1}^{n_s}\sum_{j=1}^{n_s} k(z_i^{s}, z_j^{s})\\ &+ \dfrac{1}{n_t^2} \sum_{i=1}^{n_t}\sum_{j=1}^{n_t} k(z_i^{t}, z_j^{t})\\ &- \dfrac{2}{n_s n_t} \sum_{i=1}^{n_s}\sum_{j=1}^{n_t} k(z_i^{s}, z_j^{t}).\\\end{split}$
Parameters
• kernels (tuple(torch.nn.Module)) – kernel functions.

• linear (bool) – whether to use the linear version of DAN. Default: False

Inputs:
• z_s (tensor): activations from the source domain, $$z^s$$

• z_t (tensor): activations from the target domain, $$z^t$$

Shape:
• Inputs: $$(minibatch, *)$$ where * means any number of additional dimensions

• Outputs: scalar

Note

Activations $$z^{s}$$ and $$z^{t}$$ must have the same shape.

Note

When multiple kernels are given, their kernel values are summed.

Examples:

>>> import torch
>>> from tllib.alignment.dan import MultipleKernelMaximumMeanDiscrepancy
>>> from tllib.modules.kernels import GaussianKernel
>>> feature_dim = 1024
>>> batch_size = 10
>>> # three Gaussian kernels; their values are summed inside the loss
>>> kernels = (GaussianKernel(alpha=0.5), GaussianKernel(alpha=1.), GaussianKernel(alpha=2.))
>>> loss = MultipleKernelMaximumMeanDiscrepancy(kernels)
>>> # features from source domain and target domain
>>> z_s, z_t = torch.randn(batch_size, feature_dim), torch.randn(batch_size, feature_dim)
>>> output = loss(z_s, z_t)
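
The kernel-trick estimate above can be checked by hand on a few points. What follows is a minimal, dependency-free sketch in plain Python, not the tllib implementation. The helper names (`gaussian_kernel`, `multi_kernel`, `mean_kernel`, `mk_mmd`) and the fixed bandwidths in `sigmas` are assumptions made for illustration; tllib's `GaussianKernel` takes `alpha` and may set its bandwidth differently. Kernels are summed with unit weights, as the note above describes.

```python
import math


def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def multi_kernel(x, y, sigmas):
    """Sum of single kernels: k = sum_u k_u (all weights equal to 1)."""
    return sum(gaussian_kernel(x, y, s) for s in sigmas)


def mean_kernel(xs, ys, sigmas):
    """Average of k(x, y) over every pair (x, y) with x in xs and y in ys."""
    total = sum(multi_kernel(x, y, sigmas) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mk_mmd(z_s, z_t, sigmas=(0.5, 1.0, 2.0)):
    """Source-source term + target-target term - 2 * cross term."""
    return (mean_kernel(z_s, z_s, sigmas)
            + mean_kernel(z_t, z_t, sigmas)
            - 2.0 * mean_kernel(z_s, z_t, sigmas))


# Hypothetical toy activations: three 2-D points per domain.
z_s = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
z_t = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]

same = mk_mmd(z_s, z_s)      # identical samples: the three terms cancel
shifted = mk_mmd(z_s, z_t)   # well-separated samples: strictly positive
print(same, shifted)
```

This sketch costs quadratic time in the batch size because every pair of points is visited; it does not cover the `linear=True` variant.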


## Deep CORAL: Correlation Alignment for Deep Domain Adaptation

class tllib.alignment.coral.CorrelationAlignmentLoss

The Correlation Alignment Loss in Deep CORAL: Correlation Alignment for Deep Domain Adaptation (ECCV 2016).

Given source features $$f_S$$ and target features $$f_T$$, the covariance matrices are given by

$C_S = \frac{1}{n_S-1}(f_S^Tf_S-\frac{1}{n_S}(\textbf{1}^Tf_S)^T(\textbf{1}^Tf_S))$
$C_T = \frac{1}{n_T-1}(f_T^Tf_T-\frac{1}{n_T}(\textbf{1}^Tf_T)^T(\textbf{1}^Tf_T))$

where $$\textbf{1}$$ denotes a column vector with all elements equal to 1, and $$n_S$$ and $$n_T$$ denote the number of source and target samples, respectively. We use $$d$$ to denote the feature dimension and $${\Vert\cdot\Vert}^2_F$$ to denote the squared matrix Frobenius norm. The correlation alignment loss is given by

$l_{CORAL} = \frac{1}{4d^2}\Vert C_S-C_T \Vert^2_F$
Inputs:
• f_s (tensor): feature representations on the source domain, $$f_S$$

• f_t (tensor): feature representations on the target domain, $$f_T$$

Shape:
• f_s, f_t: $$(N, d)$$, where $$d$$ is the dimension of the input features and $$N=n_S=n_T$$ is the mini-batch size.

• Outputs: scalar.
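
To make the covariance and loss formulas above concrete, here is a minimal sketch in plain Python that evaluates them literally on a tiny batch. It is not the tllib implementation: the helper names `covariance` and `coral_loss` and the toy features are assumptions chosen so the arithmetic can be followed by hand.

```python
def covariance(f):
    """C = (f^T f - (1^T f)^T (1^T f) / n) / (n - 1) for an n x d list of rows."""
    n, d = len(f), len(f[0])
    col_sums = [sum(row[j] for row in f) for j in range(d)]  # 1^T f
    cov = [[0.0] * d for _ in range(d)]
    for a in range(d):
        for b in range(d):
            gram = sum(row[a] * row[b] for row in f)  # entry (a, b) of f^T f
            cov[a][b] = (gram - col_sums[a] * col_sums[b] / n) / (n - 1)
    return cov


def coral_loss(f_s, f_t):
    """Squared Frobenius norm of C_S - C_T, scaled by 1 / (4 d^2)."""
    d = len(f_s[0])
    c_s, c_t = covariance(f_s), covariance(f_t)
    sq_frobenius = sum((c_s[a][b] - c_t[a][b]) ** 2
                       for a in range(d) for b in range(d))
    return sq_frobenius / (4.0 * d * d)


# Hypothetical toy features with N = 3 samples and d = 2 dimensions.
f_s = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # C_S = [[4, 4], [4, 4]]
f_t = [[2.0, 0.0], [4.0, 0.0], [6.0, 0.0]]   # C_T = [[4, 0], [0, 0]]

loss = coral_loss(f_s, f_t)  # (16 + 16 + 16) / (4 * 2^2) = 3.0
print(loss)
```

Because the covariance subtracts the column means, adding a constant offset to every row of either batch leaves the loss unchanged; only second-order statistics are matched.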

class tllib.alignment.jan.JointMultipleKernelMaximumMeanDiscrepancy(kernels, linear=True, thetas=None)

The Joint Multiple Kernel Maximum Mean Discrepancy (JMMD) used in Deep Transfer Learning with Joint Adaptation Networks (ICML 2017).

Given source domain $$\mathcal{D}_s$$ of $$n_s$$ labeled points and target domain $$\mathcal{D}_t$$ of $$n_t$$ unlabeled points drawn i.i.d. from P and Q respectively, the deep networks will generate activations in layers $$\mathcal{L}$$ as $$\{(z_i^{s1}, ..., z_i^{s|\mathcal{L}|})\}_{i=1}^{n_s}$$ and $$\{(z_i^{t1}, ..., z_i^{t|\mathcal{L}|})\}_{i=1}^{n_t}$$. The empirical estimate $$\hat{D}_{\mathcal{L}}(P, Q)$$ is computed as the squared distance between the empirical kernel mean embeddings:

$\begin{split}\hat{D}_{\mathcal{L}}(P, Q) &= \dfrac{1}{n_s^2} \sum_{i=1}^{n_s}\sum_{j=1}^{n_s} \prod_{l\in\mathcal{L}} k^l(z_i^{sl}, z_j^{sl}) \\ &+ \dfrac{1}{n_t^2} \sum_{i=1}^{n_t}\sum_{j=1}^{n_t} \prod_{l\in\mathcal{L}} k^l(z_i^{tl}, z_j^{tl}) \\ &- \dfrac{2}{n_s n_t} \sum_{i=1}^{n_s}\sum_{j=1}^{n_t} \prod_{l\in\mathcal{L}} k^l(z_i^{sl}, z_j^{tl}). \\\end{split}$
Parameters
• kernels (tuple(tuple(torch.nn.Module))) – kernel functions, where kernels[r] corresponds to kernel $$k^{\mathcal{L}[r]}$$.

• linear (bool) – whether to use the linear version of JAN. Default: True

• thetas (list(Theta)) – use the adversarial version of JAN if not None. Default: None

Inputs:
• z_s (tuple(tensor)): multiple layers’ activations from the source domain, $$z^s$$

• z_t (tuple(tensor)): multiple layers’ activations from the target domain, $$z^t$$

Shape:
• $$z^{sl}$$ and $$z^{tl}$$: $$(minibatch, *)$$ where * means any number of additional dimensions

• Outputs: scalar

Note

Activations $$z^{sl}$$ and $$z^{tl}$$ must have the same shape.

Note

When a layer has multiple kernels, their kernel values are summed.

Examples:

>>> import torch
>>> from tllib.alignment.jan import JointMultipleKernelMaximumMeanDiscrepancy
>>> from tllib.modules.kernels import GaussianKernel
>>> feature_dim = 1024
>>> batch_size = 10
>>> # one tuple of kernels per layer; kernels within a layer are summed
>>> layer1_kernels = (GaussianKernel(alpha=0.5), GaussianKernel(1.), GaussianKernel(2.))
>>> layer2_kernels = (GaussianKernel(1.), )
>>> loss = JointMultipleKernelMaximumMeanDiscrepancy((layer1_kernels, layer2_kernels))
>>> # layer1 features from source domain and target domain
>>> z1_s, z1_t = torch.randn(batch_size, feature_dim), torch.randn(batch_size, feature_dim)
>>> # layer2 features from source domain and target domain
>>> z2_s, z2_t = torch.randn(batch_size, feature_dim), torch.randn(batch_size, feature_dim)
>>> output = loss((z1_s, z2_s), (z1_t, z2_t))
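
The JMMD estimate above differs from MK-MMD only in that each pairwise kernel value is a product over layers. What follows is a minimal, dependency-free sketch in plain Python, not the tllib implementation. The helper names (`gaussian_kernel`, `joint_kernel`, `mean_joint_kernel`, `jmmd`) and the fixed bandwidths in `layer_sigmas` are assumptions made for illustration, and only the quadratic-time, non-adversarial form is covered (no `linear` estimator, no `thetas`).

```python
import math


def gaussian_kernel(x, y, sigma):
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def joint_kernel(sample_a, sample_b, layer_sigmas):
    """Product over layers of the summed per-layer kernel values."""
    value = 1.0
    for x, y, sigmas in zip(sample_a, sample_b, layer_sigmas):
        value *= sum(gaussian_kernel(x, y, s) for s in sigmas)
    return value


def mean_joint_kernel(samples_a, samples_b, layer_sigmas):
    """Average joint kernel value over every pair of samples."""
    total = sum(joint_kernel(a, b, layer_sigmas)
                for a in samples_a for b in samples_b)
    return total / (len(samples_a) * len(samples_b))


def jmmd(z_s, z_t, layer_sigmas):
    """z_s and z_t are layer-major: z_s[l][i] is sample i's activation in layer l."""
    src = list(zip(*z_s))  # sample-major: src[i] holds sample i's activations, one per layer
    tgt = list(zip(*z_t))
    return (mean_joint_kernel(src, src, layer_sigmas)
            + mean_joint_kernel(tgt, tgt, layer_sigmas)
            - 2.0 * mean_joint_kernel(src, tgt, layer_sigmas))


# Hypothetical setup: three kernels for layer 1, one kernel for layer 2.
layer_sigmas = ((0.5, 1.0, 2.0), (1.0,))
z1_s = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]   # layer-1 source activations
z2_s = [[0.0], [0.5], [1.0]]                  # layer-2 source activations
z1_t = [[3.0, 3.0], [4.0, 3.0], [3.0, 4.0]]   # layer-1 target activations
z2_t = [[2.0], [2.5], [3.0]]                  # layer-2 target activations

same = jmmd((z1_s, z2_s), (z1_s, z2_s), layer_sigmas)     # identical inputs cancel
shifted = jmmd((z1_s, z2_s), (z1_t, z2_t), layer_sigmas)  # strictly positive
print(same, shifted)
```

The inputs mirror the calling convention in the example above: one sequence per layer for each domain, with the same sample order in every layer.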