Statistics Matching¶
DAN: Deep Adaptation Network¶
class tllib.alignment.dan.MultipleKernelMaximumMeanDiscrepancy(kernels, linear=False)

The Multiple Kernel Maximum Mean Discrepancy (MK-MMD) used in Learning Transferable Features with Deep Adaptation Networks (ICML 2015).
Given source domain \(\mathcal{D}_s\) of \(n_s\) labeled points and target domain \(\mathcal{D}_t\) of \(n_t\) unlabeled points drawn i.i.d. from P and Q respectively, the deep networks will generate activations as \(\{z_i^s\}_{i=1}^{n_s}\) and \(\{z_i^t\}_{i=1}^{n_t}\). The MK-MMD \(D_k (P, Q)\) between probability distributions P and Q is defined as
\[D_k(P, Q) \triangleq \| E_p [\phi(z^s)] - E_q [\phi(z^t)] \|^2_{\mathcal{H}_k},\]

where \(k\) is a kernel function in the function space

\[\mathcal{K} \triangleq \{ k=\sum_{u=1}^{m}\beta_{u} k_{u} \}\]

in which each \(k_{u}\) is a single kernel.

Using the kernel trick, MK-MMD can be computed as

\[\begin{split}\hat{D}_k(P, Q) &= \dfrac{1}{n_s^2} \sum_{i=1}^{n_s}\sum_{j=1}^{n_s} k(z_i^{s}, z_j^{s})\\ &+ \dfrac{1}{n_t^2} \sum_{i=1}^{n_t}\sum_{j=1}^{n_t} k(z_i^{t}, z_j^{t})\\ &- \dfrac{2}{n_s n_t} \sum_{i=1}^{n_s}\sum_{j=1}^{n_t} k(z_i^{s}, z_j^{t}).\end{split}\]

- Parameters
kernels (tuple(torch.nn.Module)) – kernel functions.
linear (bool) – whether to use the linear version of DAN. Default: False
- Inputs:
z_s (tensor): activations from the source domain, \(z^s\)
z_t (tensor): activations from the target domain, \(z^t\)
- Shape:
Inputs: \((minibatch, *)\) where * means any dimension
Outputs: scalar
Note
Activations \(z^{s}\) and \(z^{t}\) must have the same shape.
Note
The kernel values will add up when there are multiple kernels.
Examples:
>>> import torch
>>> from tllib.modules.kernels import GaussianKernel
>>> feature_dim = 1024
>>> batch_size = 10
>>> kernels = (GaussianKernel(alpha=0.5), GaussianKernel(alpha=1.), GaussianKernel(alpha=2.))
>>> loss = MultipleKernelMaximumMeanDiscrepancy(kernels)
>>> # features from source domain and target domain
>>> z_s, z_t = torch.randn(batch_size, feature_dim), torch.randn(batch_size, feature_dim)
>>> output = loss(z_s, z_t)
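To make the quadratic formula above concrete, the following is a minimal sketch that restates the three double sums directly with a single fixed-bandwidth Gaussian kernel. The helper name mkmmd_quadratic and the sigma parameter are hypothetical illustrations, not part of tllib, and this is the quadratic estimator rather than the linear-time version selected by linear=True.

>>> import torch
>>> def mkmmd_quadratic(z_s, z_t, sigma=1.0):
...     # hypothetical helper: quadratic MK-MMD estimate with one Gaussian kernel
...     z = torch.cat([z_s, z_t], dim=0)
...     # Gaussian kernel on pairwise squared Euclidean distances
...     gram = torch.exp(-torch.cdist(z, z) ** 2 / (2 * sigma ** 2))
...     n_s = z_s.size(0)
...     # block means realise the three normalised double sums in the formula
...     k_ss = gram[:n_s, :n_s].mean()
...     k_tt = gram[n_s:, n_s:].mean()
...     k_st = gram[:n_s, n_s:].mean()
...     return k_ss + k_tt - 2 * k_st
>>> estimate = mkmmd_quadratic(z_s, z_t)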
Deep CORAL: Correlation Alignment for Deep Domain Adaptation¶
class tllib.alignment.coral.CorrelationAlignmentLoss

The Correlation Alignment Loss in Deep CORAL: Correlation Alignment for Deep Domain Adaptation (ECCV 2016).
Given source features \(f_S\) and target features \(f_T\), the covariance matrices are given by
\[C_S = \frac{1}{n_S-1}(f_S^Tf_S-\frac{1}{n_S}(\textbf{1}^Tf_S)^T(\textbf{1}^Tf_S))\]

\[C_T = \frac{1}{n_T-1}(f_T^Tf_T-\frac{1}{n_T}(\textbf{1}^Tf_T)^T(\textbf{1}^Tf_T))\]

where \(\textbf{1}\) denotes a column vector with all elements equal to 1, and \(n_S, n_T\) denote the numbers of source and target samples, respectively. We use \(d\) to denote the feature dimension and \({\Vert\cdot\Vert}^2_F\) to denote the squared matrix Frobenius norm. The correlation alignment loss is given by

\[l_{CORAL} = \frac{1}{4d^2}\Vert C_S-C_T \Vert^2_F\]

- Inputs:
f_s (tensor): feature representations on source domain, \(f^s\)
f_t (tensor): feature representations on target domain, \(f^t\)
- Shape:
f_s, f_t: \((N, d)\) where \(d\) is the dimension of the input features and \(N=n_S=n_T\) is the mini-batch size.
Outputs: scalar.
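Since this class has no usage example above, the following is a minimal sketch that restates the covariance and Frobenius-norm formulas directly. The function name coral_loss and the feature dimension used here are hypothetical, and this sketch is an illustration of the equations rather than tllib's implementation.

>>> import torch
>>> def coral_loss(f_s, f_t):
...     # hypothetical helper restating the C_S, C_T and l_CORAL formulas above
...     n_s, d = f_s.shape
...     n_t = f_t.shape[0]
...     ones_s = torch.ones(1, n_s)
...     ones_t = torch.ones(1, n_t)
...     # covariance matrices of source and target features
...     c_s = (f_s.t() @ f_s - (ones_s @ f_s).t() @ (ones_s @ f_s) / n_s) / (n_s - 1)
...     c_t = (f_t.t() @ f_t - (ones_t @ f_t).t() @ (ones_t @ f_t) / n_t) / (n_t - 1)
...     # squared Frobenius norm of the covariance difference, scaled by 1/(4 d^2)
...     return (c_s - c_t).pow(2).sum() / (4 * d * d)
>>> f_s, f_t = torch.randn(10, 256), torch.randn(10, 256)
>>> output = coral_loss(f_s, f_t)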
JAN: Joint Adaptation Network¶
class tllib.alignment.jan.JointMultipleKernelMaximumMeanDiscrepancy(kernels, linear=True, thetas=None)

The Joint Multiple Kernel Maximum Mean Discrepancy (JMMD) used in Deep Transfer Learning with Joint Adaptation Networks (ICML 2017).
Given source domain \(\mathcal{D}_s\) of \(n_s\) labeled points and target domain \(\mathcal{D}_t\) of \(n_t\) unlabeled points drawn i.i.d. from P and Q respectively, the deep networks will generate activations in layers \(\mathcal{L}\) as \(\{(z_i^{s1}, ..., z_i^{s|\mathcal{L}|})\}_{i=1}^{n_s}\) and \(\{(z_i^{t1}, ..., z_i^{t|\mathcal{L}|})\}_{i=1}^{n_t}\). The empirical estimate \(\hat{D}_{\mathcal{L}}(P, Q)\) of the JMMD is computed as the squared distance between the empirical kernel mean embeddings:
\[\begin{split}\hat{D}_{\mathcal{L}}(P, Q) &= \dfrac{1}{n_s^2} \sum_{i=1}^{n_s}\sum_{j=1}^{n_s} \prod_{l\in\mathcal{L}} k^l(z_i^{sl}, z_j^{sl}) \\ &+ \dfrac{1}{n_t^2} \sum_{i=1}^{n_t}\sum_{j=1}^{n_t} \prod_{l\in\mathcal{L}} k^l(z_i^{tl}, z_j^{tl}) \\ &- \dfrac{2}{n_s n_t} \sum_{i=1}^{n_s}\sum_{j=1}^{n_t} \prod_{l\in\mathcal{L}} k^l(z_i^{sl}, z_j^{tl}). \end{split}\]

- Parameters
kernels (tuple(tuple(torch.nn.Module))) – kernel functions, where kernels[r] corresponds to kernel \(k^{\mathcal{L}[r]}\).
linear (bool) – whether to use the linear version of JAN. Default: True
thetas (list(Theta)) – use the adversarial version of JAN if not None. Default: None
- Inputs:
z_s (tuple(tensor)): multiple layers’ activations from the source domain, \(z^s\)
z_t (tuple(tensor)): multiple layers’ activations from the target domain, \(z^t\)
- Shape:
\(z^{sl}\) and \(z^{tl}\): \((minibatch, *)\) where * means any dimension
Outputs: scalar
Note
Activations \(z^{sl}\) and \(z^{tl}\) must have the same shape.
Note
The kernel values will add up when there are multiple kernels for a certain layer.
Examples:
>>> import torch
>>> from tllib.modules.kernels import GaussianKernel
>>> feature_dim = 1024
>>> batch_size = 10
>>> layer1_kernels = (GaussianKernel(alpha=0.5), GaussianKernel(1.), GaussianKernel(2.))
>>> layer2_kernels = (GaussianKernel(1.), )
>>> loss = JointMultipleKernelMaximumMeanDiscrepancy((layer1_kernels, layer2_kernels))
>>> # layer1 features from source domain and target domain
>>> z1_s, z1_t = torch.randn(batch_size, feature_dim), torch.randn(batch_size, feature_dim)
>>> # layer2 features from source domain and target domain
>>> z2_s, z2_t = torch.randn(batch_size, feature_dim), torch.randn(batch_size, feature_dim)
>>> output = loss((z1_s, z2_s), (z1_t, z2_t))
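To illustrate the product over layers in the JMMD formula, the following sketch restates the quadratic estimate with one fixed-bandwidth Gaussian kernel per layer, reusing the layer activations from the example above. The helpers gaussian_gram and jmmd_quadratic are hypothetical illustrations, not tllib functions, and this is the quadratic estimator rather than the linear-time version used when linear=True.

>>> import torch
>>> def gaussian_gram(z_a, z_b, sigma=1.0):
...     # Gaussian kernel matrix on pairwise squared Euclidean distances
...     return torch.exp(-torch.cdist(z_a, z_b) ** 2 / (2 * sigma ** 2))
>>> def jmmd_quadratic(zs_layers, zt_layers):
...     # hypothetical helper: quadratic JMMD estimate with one kernel per layer
...     n_s, n_t = zs_layers[0].size(0), zt_layers[0].size(0)
...     k_ss = torch.ones(n_s, n_s)
...     k_tt = torch.ones(n_t, n_t)
...     k_st = torch.ones(n_s, n_t)
...     for z_s, z_t in zip(zs_layers, zt_layers):
...         # elementwise product over layers, as in the prod over l in L
...         k_ss = k_ss * gaussian_gram(z_s, z_s)
...         k_tt = k_tt * gaussian_gram(z_t, z_t)
...         k_st = k_st * gaussian_gram(z_s, z_t)
...     return k_ss.mean() + k_tt.mean() - 2 * k_st.mean()
>>> estimate = jmmd_quadratic((z1_s, z2_s), (z1_t, z2_t))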