Models

Image Classification

ResNets

Modified from torchvision.models.resnet. @author: Junguang Jiang @contact: JiangJunguang1123@outlook.com

class tllib.vision.models.resnet.ResNet(*args, **kwargs)[source]

ResNet models without the final fully connected layer

copy_head()[source]

Copy the original fully connected layer

property out_features

The dimension of output features

tllib.vision.models.resnet.resnet18(pretrained=False, progress=True, **kwargs)[source]

ResNet-18 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.resnet.resnet34(pretrained=False, progress=True, **kwargs)[source]

ResNet-34 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.resnet.resnet50(pretrained=False, progress=True, **kwargs)[source]

ResNet-50 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.resnet.resnet101(pretrained=False, progress=True, **kwargs)[source]

ResNet-101 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.resnet.resnet152(pretrained=False, progress=True, **kwargs)[source]

ResNet-152 model from “Deep Residual Learning for Image Recognition”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.resnet.resnext50_32x4d(pretrained=False, progress=True, **kwargs)[source]

ResNeXt-50 32x4d model from “Aggregated Residual Transformations for Deep Neural Networks”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.resnet.resnext101_32x8d(pretrained=False, progress=True, **kwargs)[source]

ResNeXt-101 32x8d model from “Aggregated Residual Transformations for Deep Neural Networks”

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.resnet.wide_resnet50_2(pretrained=False, progress=True, **kwargs)[source]

Wide ResNet-50-2 model from “Wide Residual Networks”

The model is the same as ResNet except for the bottleneck number of channels, which is twice as large in every block. The number of channels in the outer 1x1 convolutions is the same, e.g. the last block in ResNet-50 has 2048-512-2048 channels, while in Wide ResNet-50-2 it has 2048-1024-2048.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.resnet.wide_resnet101_2(pretrained=False, progress=True, **kwargs)[source]

Wide ResNet-101-2 model from “Wide Residual Networks”

The model is the same as ResNet except for the bottleneck number of channels, which is twice as large in every block. The number of channels in the outer 1x1 convolutions is the same, e.g. the last block in ResNet-50 has 2048-512-2048 channels, while in Wide ResNet-50-2 it has 2048-1024-2048.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr
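
The following is a minimal usage sketch for these backbones. Only the resnet50 constructor, out_features, and copy_head come from the API above; the new classifier head, the number of classes, and the input size are illustrative assumptions.

    import torch
    import torch.nn as nn
    from tllib.vision.models.resnet import resnet50

    backbone = resnet50(pretrained=True)             # ResNet-50 without the final FC layer
    imagenet_head = backbone.copy_head()             # copy of the original ImageNet classifier
    new_head = nn.Linear(backbone.out_features, 31)  # e.g. 31 target classes (assumption)

    x = torch.randn(4, 3, 224, 224)                  # dummy batch of images (assumed input size)
    features = backbone(x)                           # pooled features of dimension backbone.out_features
    logits = new_head(features)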

LeNet

LeNet model from “Gradient-based learning applied to document recognition”

Parameters
  • num_classes (int) – number of classes. Default: 10

Note

The input image size must be 28 x 28.

DTN

DTN model

Parameters
  • num_classes (int) – number of classes. Default: 10

Note

The input image size must be 32 x 32.

Object Detection

class tllib.vision.models.object_detection.meta_arch.TLGeneralizedRCNN(*args, finetune=False, **kwargs)[source]

Generalized R-CNN for Transfer Learning. Like its supervised-learning counterpart, TLGeneralizedRCNN has the following three components: 1. per-image feature extraction (aka backbone), 2. region proposal generation, 3. per-region feature extraction and prediction.

Different from its supervised-learning counterpart, TLGeneralizedRCNN 1. accepts unlabeled images during training (returning no losses for them), and 2. returns detection outputs, features, and losses during training.

Parameters
  • backbone – a backbone module, must follow detectron2’s backbone interface

  • proposal_generator – a module that generates proposals using backbone features

  • roi_heads – a ROI head that performs per-region computation

  • pixel_mean, pixel_std – list or tuple with #channels elements, representing the per-channel mean and std used to normalize the input image

  • input_format – describes the meaning of the input channels; needed by visualization

  • vis_period – the period to run visualization. Set to 0 to disable.

  • finetune (bool) – whether to finetune the detector or train from scratch. Default: False

Inputs:
  • batched_inputs: a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image. For now, each item in the list is a dict that contains:

    • image: Tensor, image in (C, H, W) format.

    • instances (optional): ground-truth Instances

    • proposals (optional): Instances, precomputed proposals.

    • “height”, “width” (int): the output resolution of the model, used in inference. See postprocess() for details.

  • labeled (bool, optional): whether the inputs have ground-truth labels

Outputs:
  • outputs: A list of dicts, one per input image. Each dict contains a key “instances” whose value is an Instances object and a key “features” whose value holds the intermediate-layer features. The Instances object has the following keys: “pred_boxes”, “pred_classes”, “scores”, “pred_masks”, “pred_keypoints”

  • losses: A dict of different losses

get_parameters(lr=1.0)[source]

Return a parameter list which decides optimization hyper-parameters, such as the learning rate of each layer
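
A hedged sketch of wiring get_parameters() into a torch optimizer, so that each parameter group keeps the relative learning rate decided by the model (e.g. a smaller rate for a finetuned backbone). The detectron2 config file, the meta-architecture registry name, and the optimizer settings are assumptions; any extra config keys the TL meta-architectures may require are omitted.

    import torch
    from detectron2.config import get_cfg
    from detectron2 import model_zoo
    from detectron2.modeling import build_model
    import tllib.vision.models.object_detection.meta_arch  # registers the TL meta-architectures (assumption)

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_101_C4_3x.yaml"))
    cfg.MODEL.META_ARCHITECTURE = "TLGeneralizedRCNN"  # assumed registry name
    model = build_model(cfg)

    base_lr = 0.02
    optimizer = torch.optim.SGD(model.get_parameters(lr=base_lr), lr=base_lr,
                                momentum=0.9, weight_decay=1e-4)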

class tllib.vision.models.object_detection.meta_arch.TLRetinaNet(*args, finetune=False, **kwargs)[source]

RetinaNet for Transfer Learning.

Different from its supervised-learning counterpart, TLRetinaNet 1. accepts unlabeled images during training (returning no losses for them), and 2. returns detection outputs, features, and losses during training.

Parameters
  • backbone – a backbone module, must follow detectron2’s backbone interface

  • head (nn.Module) – a module that predicts logits and regression deltas for each level from a list of per-level features

  • head_in_features (Tuple[str]) – Names of the input feature maps to be used in head

  • anchor_generator (nn.Module) – a module that creates anchors from a list of features. Usually an instance of AnchorGenerator

  • box2box_transform (Box2BoxTransform) – defines the transform from anchors boxes to instance boxes

  • anchor_matcher (Matcher) – label the anchors by matching them with ground truth.

  • num_classes (int) – number of classes. Used to label background proposals.

  • Loss parameters:

  • focal_loss_alpha (float) – alpha (weighting factor) of the focal loss

  • focal_loss_gamma (float) – gamma (focusing parameter) of the focal loss

  • smooth_l1_beta (float) – beta parameter of the smooth L1 loss

  • box_reg_loss_type (str) – Options are “smooth_l1”, “giou”

  • Inference parameters:

  • test_score_thresh (float) – Inference classification score threshold; only anchors with score above this threshold are considered for inference (to improve speed)

  • test_topk_candidates (int) – Select topk candidates before NMS

  • test_nms_thresh (float) – Overlap threshold used for non-maximum suppression (suppress boxes with IoU >= this threshold)

  • max_detections_per_image (int) – Maximum number of detections to return per image during inference (100 is based on the limit established for the COCO dataset).

  • Input parameters:

  • pixel_mean (Tuple[float]) – Values to be used for image normalization (BGR order). To train on images with a different number of channels, set different mean & std. Default values are the mean pixel values from ImageNet: [103.53, 116.28, 123.675]

  • pixel_std (Tuple[float]) – When using pre-trained models in Detectron1 or any MSRA models, std has been absorbed into its conv1 weights, so the std needs to be set to 1. Otherwise, you can use [57.375, 57.120, 58.395] (ImageNet std)

  • vis_period (int) – The period (in terms of steps) for minibatch visualization at train time. Set to 0 to disable.

  • input_format (str) – Whether the model needs RGB, YUV, HSV etc.

  • finetune (bool) – whether to finetune the detector or train from scratch. Default: False

Inputs:
  • batched_inputs: a list, batched outputs of DatasetMapper. Each item in the list contains the inputs for one image. For now, each item in the list is a dict that contains:

    • image: Tensor, image in (C, H, W) format.

    • instances (optional): ground-truth Instances

    • “height”, “width” (int): the output resolution of the model, used in inference. See postprocess() for details.

  • labeled (bool, optional): whether the inputs have ground-truth labels

Outputs:
  • outputs: A list of dicts, one per input image. Each dict contains a key “instances” whose value is an Instances object and a key “features” whose value holds the intermediate-layer features. The Instances object has the following keys: “pred_boxes”, “pred_classes”, “scores”, “pred_masks”, “pred_keypoints”

  • losses: A dict of different losses

get_parameters(lr=1.0)[source]

Return a parameter list which decides optimization hyper-parameters, such as the learning rate of each layer

class tllib.vision.models.object_detection.proposal_generator.rpn.TLRPN(*args, **kwargs)[source]

Region Proposal Network, introduced by Faster R-CNN.

Parameters
  • in_features (list[str]) – list of names of input features to use

  • head (nn.Module) – a module that predicts logits and regression deltas for each level from a list of per-level features

  • anchor_generator (nn.Module) – a module that creates anchors from a list of features. Usually an instance of AnchorGenerator

  • anchor_matcher (Matcher) – label the anchors by matching them with ground truth.

  • box2box_transform (Box2BoxTransform) – defines the transform from anchors boxes to instance boxes

  • batch_size_per_image (int) – number of anchors per image to sample for training

  • positive_fraction (float) – fraction of foreground anchors to sample for training

  • pre_nms_topk (tuple[float]) – (train, test) that represents the number of top k proposals to select before NMS, in training and testing.

  • post_nms_topk (tuple[float]) – (train, test) that represents the number of top k proposals to select after NMS, in training and testing.

  • nms_thresh (float) – NMS threshold used to de-duplicate the predicted proposals

  • min_box_size (float) – remove proposal boxes with any side smaller than this threshold, in the unit of input image pixels

  • anchor_boundary_thresh (float) – legacy option

  • loss_weight (float|dict) –

    weights to use for losses. Can be a single float for weighting all RPN losses together, or a dict of individual weightings. Valid dict keys are:

    • “loss_rpn_cls” – applied to classification loss

    • “loss_rpn_loc” – applied to box regression loss

  • box_reg_loss_type (str) – Loss type to use. Supported losses: “smooth_l1”, “giou”.

  • smooth_l1_beta (float) – beta parameter for the smooth L1 regression loss. The default is to use L1 loss. Only used when box_reg_loss_type is “smooth_l1”

Inputs:
  • images (ImageList): input images of length N

  • features (dict[str, Tensor]): input data as a mapping from feature map name to tensor. Axis 0 represents the number of images N in the input data; axes 1-3 are channels, height, and width, which may vary between feature maps (e.g., if a feature pyramid is used).

  • gt_instances (list[Instances], optional): a length-N list of Instances. Each Instances stores ground-truth instances for the corresponding image.

  • labeled (bool, optional): whether the inputs have ground-truth labels. Default: True

Outputs:
  • proposals: list[Instances]: contains fields “proposal_boxes”, “objectness_logits”

  • loss: dict[Tensor] or None

class tllib.vision.models.object_detection.roi_heads.TLRes5ROIHeads(*args, **kwargs)[source]

The ROIHeads in a typical “C4” R-CNN model, where the box and mask head share the cropping and the per-region feature computation by a Res5 block.

Parameters
  • in_features (list[str]) – list of backbone feature map names to use for feature extraction

  • pooler (ROIPooler) – pooler to extract region features from the backbone

  • res5 (nn.Sequential) – a CNN to compute per-region features, to be used by box_predictor and mask_head. Typically this is a “res5” block from a ResNet.

  • box_predictor (nn.Module) – make box predictions from the feature. Should have the same interface as FastRCNNOutputLayers.

  • mask_head (nn.Module) – transform features to make mask predictions

Inputs:
  • images (ImageList):

  • features (dict[str,Tensor]): input data as a mapping from feature map name to tensor. Axis 0 represents the number of images N in the input data; axes 1-3 are channels, height, and width, which may vary between feature maps (e.g., if a feature pyramid is used).

  • proposals (list[Instances]): length N list of Instances. The i-th Instances contains object proposals for the i-th input image, with fields “proposal_boxes” and “objectness_logits”.

  • targets (list[Instances], optional): length N list of Instances. The i-th Instances contains the ground-truth per-instance annotations for the i-th input image. Specify targets during training only. It may have the following fields:

    • gt_boxes: the bounding box of each instance.

    • gt_classes: the label for each instance with a category ranging in [0, #class].

    • gt_masks: PolygonMasks or BitMasks, the ground-truth masks of each instance.

    • gt_keypoints: NxKx3, the ground-truth keypoints for each instance.

  • labeled (bool, optional): whether the inputs have ground-truth labels. Default: True

Outputs:
  • list[Instances]: length N list of Instances containing the detected instances. Returned during inference only; may be [] during training.

  • dict[str->Tensor]: mapping from a named loss to a tensor storing the loss. Used during training only.

sample_unlabeled_proposals(proposals)[source]

Prepare some unlabeled proposals. It returns the top self.batch_size_per_image samples from proposals.

Parameters

proposals (list[Instances]) – length N list of Instances. The i-th Instances contains object proposals for the i-th input image, with fields “proposal_boxes” and “objectness_logits”.

Returns

length-N list of Instances containing the proposals sampled for training.

class tllib.vision.models.object_detection.roi_heads.TLStandardROIHeads(*args, **kwargs)[source]

It’s “standard” in the sense that there is no ROI transform sharing or feature sharing between tasks. Each head independently processes the input features with its own pooler and head.

Parameters
  • box_in_features (list[str]) – list of feature names to use for the box head.

  • box_pooler (ROIPooler) – pooler to extract region features for the box head

  • box_head (nn.Module) – transform features to make box predictions

  • box_predictor (nn.Module) – make box predictions from the feature. Should have the same interface as FastRCNNOutputLayers.

  • mask_in_features (list[str]) – list of feature names to use for the mask pooler or mask head. None if not using mask head.

  • mask_pooler (ROIPooler) – pooler to extract region features from image features. The mask head will then take region features to make predictions. If None, the mask head will directly take the dict of image features defined by mask_in_features

  • mask_head (nn.Module) – transform features to make mask predictions

  • keypoint_in_features, keypoint_pooler, keypoint_head – similar to mask_*.

  • train_on_pred_boxes (bool) – whether to use proposal boxes or predicted boxes from the box head to train other heads.

Inputs:
  • images (ImageList):

  • features (dict[str,Tensor]): input data as a mapping from feature map name to tensor. Axis 0 represents the number of images N in the input data; axes 1-3 are channels, height, and width, which may vary between feature maps (e.g., if a feature pyramid is used).

  • proposals (list[Instances]): length N list of Instances. The i-th Instances contains object proposals for the i-th input image, with fields “proposal_boxes” and “objectness_logits”.

  • targets (list[Instances], optional): length N list of Instances. The i-th Instances contains the ground-truth per-instance annotations for the i-th input image. Specify targets during training only. It may have the following fields:

    • gt_boxes: the bounding box of each instance.

    • gt_classes: the label for each instance with a category ranging in [0, #class].

    • gt_masks: PolygonMasks or BitMasks, the ground-truth masks of each instance.

    • gt_keypoints: NxKx3, the ground-truth keypoints for each instance.

  • labeled (bool, optional): whether the inputs have ground-truth labels. Default: True

Outputs:
  • list[Instances]: length N list of Instances containing the detected instances. Returned during inference only; may be [] during training.

  • dict[str->Tensor]: mapping from a named loss to a tensor storing the loss. Used during training only.

sample_unlabeled_proposals(proposals)[source]

Prepare some unlabeled proposals. It returns the top self.batch_size_per_image samples from proposals.

Parameters

proposals (list[Instances]) – length N list of Instances. The i-th Instances contains object proposals for the i-th input image, with fields “proposal_boxes” and “objectness_logits”.

Returns

length-N list of Instances containing the proposals sampled for training.

Semantic Segmentation

tllib.vision.models.segmentation.deeplabv2.deeplabv2_resnet101(num_classes=19, pretrained_backbone=True)[source]

Constructs a DeepLabV2 model with a ResNet-101 backbone.

Parameters
  • num_classes (int, optional) – number of classes. Default: 19

  • pretrained_backbone (bool, optional) – If True, returns a model pre-trained on ImageNet. Default: True.
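
A minimal construction sketch; the input resolution and the interpretation of the output (per-pixel class scores at a reduced resolution) are assumptions rather than part of the documented API.

    import torch
    from tllib.vision.models.segmentation.deeplabv2 import deeplabv2_resnet101

    model = deeplabv2_resnet101(num_classes=19, pretrained_backbone=True)
    model.eval()

    x = torch.randn(1, 3, 512, 512)   # dummy RGB image batch (assumed size)
    with torch.no_grad():
        out = model(x)                # per-pixel class scores (exact shape depends on the output stride)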

Keypoint Detection

PoseResNet

tllib.vision.models.keypoint_detection.pose_resnet.pose_resnet101(num_keypoints, pretrained_backbone=True, deconv_with_bias=False, finetune=False, progress=True, **kwargs)[source]

Constructs a Simple Baseline model with a ResNet-101 backbone.

Parameters
  • num_keypoints (int) – number of keypoints

  • pretrained_backbone (bool, optional) – If True, returns a model pre-trained on ImageNet. Default: True.

  • deconv_with_bias (bool, optional) – Whether to use bias in the deconvolution layers. Default: False

  • finetune (bool, optional) – Whether to use a 10x smaller learning rate for the backbone. Default: False

  • progress (bool, optional) – If True, displays a progress bar of the download to stderr. Default: True
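
A hedged usage sketch. The 256x256 input resolution and the 64x64 heatmap size are common defaults for Simple Baseline, assumed here rather than taken from this documentation.

    import torch
    from tllib.vision.models.keypoint_detection.pose_resnet import pose_resnet101

    model = pose_resnet101(num_keypoints=21, pretrained_backbone=True)
    x = torch.randn(2, 3, 256, 256)   # dummy batch of cropped person/hand images (assumed size)
    heatmaps = model(x)               # expected shape (2, 21, 64, 64) under the above assumptions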

class tllib.vision.models.keypoint_detection.pose_resnet.PoseResNet(backbone, upsampling, feature_dim, num_keypoints, finetune=False)[source]

Simple Baseline for keypoint detection.

Parameters
  • backbone (torch.nn.Module) – Backbone to extract 2-d features from data

  • upsampling (torch.nn.Module) – Layer to upsample image features to heatmap size

  • feature_dim (int) – The dimension of the features from upsampling layer.

  • num_keypoints (int) – Number of keypoints

  • finetune (bool, optional) – Whether to use a 10x smaller learning rate for the backbone. Default: False

class tllib.vision.models.keypoint_detection.pose_resnet.Upsampling(in_channel=2048, hidden_dims=(256, 256, 256), kernel_sizes=(4, 4, 4), bias=False)[source]

3-layer deconvolution used in Simple Baseline.

Joint Loss

class tllib.vision.models.keypoint_detection.loss.JointsMSELoss(reduction='mean')[source]

Typical MSE loss for keypoint detection.

Parameters

reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'

Inputs:
  • output (tensor): heatmap predictions

  • target (tensor): heatmap labels

  • target_weight (tensor): whether each keypoint is visible. All keypoints are treated as visible if None. Default: None.

Shape:
  • output: \((minibatch, K, H, W)\) where K is the number of keypoints, and H and W are the height and width of the heatmap, respectively.

  • target: \((minibatch, K, H, W)\).

  • target_weight: \((minibatch, K)\).

  • Output: scalar by default. If reduction is 'none', then \((minibatch, K)\).
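
A small sketch of computing the loss on dummy heatmaps, following the documented shapes; passing target_weight as the third positional argument is an assumption.

    import torch
    from tllib.vision.models.keypoint_detection.loss import JointsMSELoss

    criterion = JointsMSELoss(reduction='mean')

    B, K, H, W = 4, 21, 64, 64
    output = torch.rand(B, K, H, W, requires_grad=True)  # predicted heatmaps
    target = torch.rand(B, K, H, W)                      # ground-truth heatmaps
    target_weight = torch.ones(B, K)                     # all keypoints visible

    loss = criterion(output, target, target_weight)
    loss.backward()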

class tllib.vision.models.keypoint_detection.loss.JointsKLLoss(reduction='mean', epsilon=0.0)[source]

KL Divergence for keypoint detection proposed by Regressive Domain Adaptation for Unsupervised Keypoint Detection.

Parameters

reduction (str, optional) – Specifies the reduction to apply to the output: 'none' | 'mean'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output. Default: 'mean'

Inputs:
  • output (tensor): heatmap predictions

  • target (tensor): heatmap labels

  • target_weight (tensor): whether each keypoint is visible. All keypoints are treated as visible if None. Default: None.

Shape:
  • output: \((minibatch, K, H, W)\) where K is the number of keypoints, and H and W are the height and width of the heatmap, respectively.

  • target: \((minibatch, K, H, W)\).

  • target_weight: \((minibatch, K)\).

  • Output: scalar by default. If reduction is 'none', then \((minibatch, K)\).

Re-Identification

Models

class tllib.vision.models.reid.resnet.ReidResNet(*args, **kwargs)[source]

Modified ResNet architecture for ReID from Mutual Mean-Teaching: Pseudo Label Refinery for Unsupervised Domain Adaptation on Person Re-identification (ICLR 2020). The stride of \(layer4\_group1\_conv2\) and \(layer4\_group1\_downsample1\) is changed to 1, and self.relu is not applied during the forward pass. Please refer to the source code for details.

@author: Baixu Chen @contact: cbx_99_hasta@outlook.com

tllib.vision.models.reid.resnet.reid_resnet18(pretrained=False, progress=True, **kwargs)[source]

Constructs a Reid-ResNet-18 model.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.reid.resnet.reid_resnet34(pretrained=False, progress=True, **kwargs)[source]

Constructs a Reid-ResNet-34 model.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.reid.resnet.reid_resnet50(pretrained=False, progress=True, **kwargs)[source]

Constructs a Reid-ResNet-50 model.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

tllib.vision.models.reid.resnet.reid_resnet101(pretrained=False, progress=True, **kwargs)[source]

Constructs a Reid-ResNet-101 model.

Parameters
  • pretrained (bool) – If True, returns a model pre-trained on ImageNet

  • progress (bool) – If True, displays a progress bar of the download to stderr

class tllib.vision.models.reid.identifier.ReIdentifier(backbone, num_classes, bottleneck=None, bottleneck_dim=-1, finetune=True, pool_layer=None)[source]

Person re-identifier from Bag of Tricks and A Strong Baseline for Deep Person Re-identification (CVPR 2019). Given 2-d features \(f\) from the backbone network, \(f\) is passed through an additional BatchNorm1d layer to obtain \(bn\_f\), which is then fed into a Linear layer to produce predictions. During training, \(f\) is used to compute the triplet loss, while during testing \(bn\_f\) is used as the feature. This can be a little confusing; the figures in the original paper help clarify the design.

property features_dim

The dimension of features before the final head layer

get_parameters(base_lr=1.0, rate=0.1)[source]

Return a parameter list that decides optimization hyper-parameters, such as the relative learning rate of each layer
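
A hedged sketch of assembling the baseline and its optimizer; the number of identities (751, as in Market-1501) and the Adam settings are illustrative assumptions.

    import torch
    from tllib.vision.models.reid.resnet import reid_resnet50
    from tllib.vision.models.reid.identifier import ReIdentifier

    backbone = reid_resnet50(pretrained=True)
    model = ReIdentifier(backbone, num_classes=751)   # 751 identities, e.g. Market-1501 (assumption)

    base_lr = 3.5e-4
    optimizer = torch.optim.Adam(model.get_parameters(base_lr=base_lr, rate=0.1),
                                 lr=base_lr, weight_decay=5e-4)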

Loss

class tllib.vision.models.reid.loss.TripletLoss(margin, normalize_feature=False)[source]

Triplet loss with batch-hard mining from In Defense of the Triplet Loss for Person Re-Identification (ICCV 2017).

Parameters
  • margin (float) – margin of triplet loss

  • normalize_feature (bool, optional) – if True, normalize features to unit norm before computing the loss. Default: False.
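
A hedged sketch; batch-hard triplet losses of this kind usually take (features, labels) in the forward pass, but that call signature is an assumption, not something spelled out in this documentation.

    import torch
    from tllib.vision.models.reid.loss import TripletLoss

    criterion = TripletLoss(margin=0.3, normalize_feature=True)

    features = torch.randn(16, 2048)       # 2-d features from the backbone (assumed dimension)
    labels = torch.randint(0, 4, (16,))    # person identity labels
    loss = criterion(features, labels)     # assumed (features, labels) signature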

Sampler

class tllib.utils.data.RandomMultipleGallerySampler(dataset, num_instances=4)[source]

Sampler from In Defense of the Triplet Loss for Person Re-Identification (ICCV 2017). Assuming there are \(N\) identities in the dataset, this implementation samples \(K\) images for every identity to form an iteration of size \(N\times K\). During training, the __iter__ method of the PyTorch dataloader is called again once a StopIteration is reached, which guarantees that every image in the dataset is eventually selected and no training data is wasted.

Parameters
  • dataset (list) – each element of this list is a tuple (image_path, person_id, camera_id)

  • num_instances (int, optional) – number of images to sample for every identity (\(K\) here)
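
A hedged sketch of wiring the sampler into a PyTorch DataLoader. The dataset object and its .samples attribute are hypothetical names for illustration; as documented, the sampler itself only needs a list of (image_path, person_id, camera_id) tuples.

    from torch.utils.data import DataLoader
    from tllib.utils.data import RandomMultipleGallerySampler

    # reid_dataset: a hypothetical Dataset whose per-item sample list is exposed as
    # reid_dataset.samples, each entry being an (image_path, person_id, camera_id) tuple
    sampler = RandomMultipleGallerySampler(reid_dataset.samples, num_instances=4)
    loader = DataLoader(reid_dataset, batch_size=64, sampler=sampler,
                        num_workers=2, drop_last=True)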
