Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss

University of Maryland, College Park1      Tsinghua University2
Microsoft Research3

Premier-TACO pretrains a general feature representation using multitask offline datasets, capturing essential environmental dynamics. This representation can then be fine-tuned to specific tasks with a few expert demonstrations.


Performance of Premier-TACO pretrained visual representation for few-shot imitation learning on downstream unseen tasks from Deepmind Control Suite, MetaWorld, and LIBERO. LfS here represents learning from scratch.

Abstract

We present Premier-TACO, a multitask feature representation learning approach designed to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier-TACO leverages a subset of multitask offline datasets for pretraining a general feature representation, which captures critical environmental dynamics and is fine-tuned using minimal expert demonstrations. It advances the temporal action contrastive learning (TACO) objective, known for state-of-the-art results in visual control tasks, by incorporating a novel negative example sampling strategy. This strategy substantially improves TACO’s computational efficiency, making large-scale multitask offline pretraining feasible. Our extensive empirical evaluation on a diverse set of continuous control benchmarks, including Deepmind Control Suite, MetaWorld, and LIBERO, demonstrates Premier-TACO’s effectiveness in pretraining visual representations that significantly enhance few-shot imitation learning on novel tasks.

Method

Building upon the success of temporal contrastive loss, exemplified by TACO, in acquiring latent state representations that encapsulate individual task dynamics, our goal is to foster representation learning that effectively captures the intrinsic dynamics spanning a diverse set of tasks found in offline datasets.
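Objectives in this family score a predicted next latent state against a batch of candidates with an InfoNCE-style loss. The sketch below is a minimal NumPy illustration under assumed shapes, with a random linear map standing in for the learned latent dynamics predictor; it is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(pred, positives):
    """InfoNCE over a batch: each row of `pred` should match the
    same-index row of `positives`; other rows act as negatives."""
    logits = pred @ positives.T                   # (B, B) similarity matrix
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy, identity labels

B, D, A = 8, 16, 4
z_t = rng.normal(size=(B, D))                     # latent states at time t
a_t = rng.normal(size=(B, A))                     # actions at time t
W = rng.normal(size=(D + A, D))                   # stand-in for the learned predictor
pred = np.concatenate([z_t, a_t], axis=1) @ W     # predicted next latent states
z_next = rng.normal(size=(B, D))                  # encoded latent states at time t+k

loss = info_nce(pred, z_next)
```

Minimizing this loss pushes each predicted latent toward its true successor and away from the negatives, so the encoder must retain action-relevant dynamics information.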


For Premier-TACO, we propose a straightforward yet highly effective mechanism for selecting challenging negative examples. Instead of treating all the remaining examples in the batch as negatives, Premier-TACO draws each negative example from a window within the same episode, centered at the positive example, as shown in the figure above. This approach is both computationally efficient and statistically more powerful: because such negatives are hard to distinguish from the nearby positive, the model is forced to capture the temporal dynamics that differentiate positive from negative examples.
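The sampling rule can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code; the function name and the default window size of 5 are assumptions.

```python
import random

def sample_negative_index(t, episode_len, window=5, rng=random):
    """Sample a negative-example timestep from a window around the
    positive timestep t within the same episode, excluding t itself.
    `window` is a hyperparameter; 5 here is illustrative."""
    lo = max(0, t - window)
    hi = min(episode_len - 1, t + window)
    candidates = [i for i in range(lo, hi + 1) if i != t]
    return rng.choice(candidates)
```

Because the negative comes from the same episode and lies near the positive in time, it shares visual appearance with the positive but differs in dynamics, which is exactly what makes it a hard negative.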

Generalizing to unseen embodiments

The feature representation pretrained by Premier-TACO also generalizes across distinct embodiments.

Generalizing to unseen tasks with unseen views

The visual representation pretrained by Premier-TACO also possesses the capacity to adapt to unseen tasks under novel camera views.


(Left): generalizing to unseen embodiments. (Right): generalizing to unseen tasks with unseen views.

Finetuning existing large pretrained visual encoders (PVRs)

The learning objective of Premier-TACO can also be used to finetune existing large pretrained visual encoders such as R3M.


Finetuning R3M, a generalized pretrained visual encoder, with the Premier-TACO learning objective vs. R3M with in-domain finetuning proposed in Nicklas et al., on Deepmind Control Suite.


Finetuning R3M, a generalized pretrained visual encoder, with the Premier-TACO learning objective vs. R3M with in-domain finetuning proposed in Nicklas et al., on MetaWorld.

Resiliency to low-quality pretraining data

The visual representation pretrained by Premier-TACO is also resilient to low-quality data. Across all downstream tasks in Deepmind Control Suite, even when pretrained on randomly collected data, the Premier-TACO model still maintains a significant advantage over learning from scratch.


BibTeX

If you find our method or code relevant to your research, please consider citing the paper as follows:
@misc{zheng2024premiertaco,
        title={Premier-TACO is a Few-Shot Policy Learner: Pretraining Multitask Representation via Temporal Action-Driven Contrastive Loss}, 
        author={Ruijie Zheng and Yongyuan Liang and Xiyao Wang and Shuang Ma and Hal Daumé III and Huazhe Xu and John Langford and Praveen Palanisamy and Kalyan Shankar Basu and Furong Huang},
        year={2024},
        eprint={2402.06187},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
      }