Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

University of Maryland, College Park1      Tsinghua University2
Microsoft Research3

Premier-TACO pretrains a general feature representation using multitask offline datasets, capturing essential environmental dynamics. This representation can then be fine-tuned to specific tasks with a few expert demonstrations.


Performance of Premier-TACO pretrained visual representation for few-shot imitation learning on downstream unseen tasks from Deepmind Control Suite, MetaWorld, and LIBERO. LfS here represents learning from scratch.

Abstract

We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy substantially improves TACO’s computational efficiency, making large-scale multitask offline pretraining feasible. Our extensive empirical evaluation on a diverse set of continuous control benchmarks, including Deepmind Control Suite, MetaWorld, and LIBERO, demonstrates Premier-TACO’s effectiveness in pretraining visual representations that significantly enhance few-shot imitation learning on novel tasks.

Method

Building upon the success of temporal contrastive loss, exemplified by TACO, in acquiring latent state representations that encapsulate individual task dynamics, our goal is to foster representation learning that effectively captures the intrinsic dynamics spanning a diverse set of tasks found in offline datasets.
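Objectives in this family score a predicted next latent state against a batch of candidates with an InfoNCE-style loss. The sketch below is a minimal NumPy illustration under assumed shapes, with a random linear map standing in for the learned latent dynamics predictor; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(pred, positives):
    """InfoNCE over a batch: each row of `pred` should match the
    same-index row of `positives`; other rows act as negatives."""
    logits = pred @ positives.T                   # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy, identity labels

B, D, A = 8, 16, 4
z_t = rng.normal(size=(B, D))                     # latent states at time t
a_t = rng.normal(size=(B, A))                     # actions at time t
W = rng.normal(size=(D + A, D))                   # stand-in for the learned predictor
pred = np.concatenate([z_t, a_t], axis=1) @ W     # predicted next latent states
z_next = rng.normal(size=(B, D))                  # encoded latent states at time t+k

loss = info_nce(pred, z_next)
```

Minimizing this loss pushes each predicted latent toward its true successor and away from the negatives, so the encoder must retain action-relevant dynamics information.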


For Premier-TACO, we propose a straightforward yet highly effective mechanism for selecting challenging negative examples. Instead of treating all the remaining examples in the batch as negatives, Premier-TACO draws each negative example from a window within the same episode, centered at the positive example, as shown in the figure above. This approach is both computationally efficient and statistically more powerful: because such negatives are hard to distinguish from the nearby positive, the model is forced to capture the temporal dynamics that differentiate positive from negative examples.
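The sampling rule can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code; the function name and the default window size of 5 are assumptions.

```python
import random

def sample_negative_index(t, episode_len, window=5, rng=random):
    """Sample a negative-example timestep from a window around the
    positive timestep t within the same episode, excluding t itself.
    `window` is a hyperparameter; 5 here is illustrative."""
    lo = max(0, t - window)
    hi = min(episode_len - 1, t + window)
    candidates = [i for i in range(lo, hi + 1) if i != t]
    return rng.choice(candidates)
```

Because the negative comes from the same episode and lies near the positive in time, it shares visual appearance with the positive but differs in dynamics, which is exactly what makes it a hard negative.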

Generalizing to unseen embodiments

The feature representation pretrained by Premier-TACO also generalizes across distinct embodiments.

Generalizing to unseen tasks with unseen views

The visual representation pretrained by Premier-TACO also possesses the capacity to adapt to unseen tasks under novel camera views.


(Left): generalizing to unseen embodiments. (Right): generalizing to unseen tasks with unseen views.

Finetuning existing large pretrained visual encoders (PVRs)

The learning objective of Premier-TACO can also be used to finetune existing large pretrained visual encoders such as R3M.


Finetuning R3M, a generalized pretrained visual encoder, with the Premier-TACO learning objective vs. R3M with in-domain finetuning proposed in Nicklas et al., on Deepmind Control Suite.


Finetuning R3M, a generalized pretrained visual encoder, with the Premier-TACO learning objective vs. R3M with in-domain finetuning proposed in Nicklas et al., on MetaWorld.

Resiliency to low-quality pretraining data

The visual representation pretrained by Premier-TACO is also resilient to low-quality data. Across all downstream tasks in Deepmind Control Suite, even when pretrained on randomly collected data, the Premier-TACO model still maintains a significant advantage over learning from scratch.


BibTeX

If you find our method or code relevant to your research, please consider citing the paper as follows:
@misc{zheng2024premiertaco,
        title={Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss}, 
        author={Ruijie Zheng and Yongyuan Liang and Xiyao Wang and Shuang Ma and Hal Daumé III and Huazhe Xu and John Langford and Praveen Palanisamy and Kalyan Shankar Basu and Furong Huang},
        year={2024},
        eprint={2402.06187},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
      }