Evolving Parameterized Prompt Memory for Continual Learning

1Xi'an Jiaotong University, 2Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
in AAAI 2024 (Oral, Top 2.3%)


Delving into prompt-based continual learning, we focus on scenarios with non-expandable prompt pools and end-to-end training free of discrete selection. Our solution, EvoPrompt (Evolving Parameterized Prompt Memory), formulates the prompting function as a multi-layer perceptron (MLP) bottleneck. The prompts are stored in the weight space of the network and gradually evolve as new tasks are learned, all without expansion. Additionally, we present a novel method for synthesizing future classifiers from previously acquired knowledge. Remarkably, our approach uses minimal parameters, 5x and 13x fewer than CODA-P, while exhibiting superior performance.

Abstract

Recent studies have demonstrated the potency of leveraging prompts in Transformers for continual learning (CL). Nevertheless, employing a discrete key-prompt bottleneck can lead to selection mismatches and inappropriate prompt associations during testing. Furthermore, this approach hampers adaptive prompting, since nearly identical instances cannot share prompts at a finer granularity. To address these challenges, we introduce the Evolving Parameterized Prompt Memory (EvoPrompt), a novel method involving adaptive and continuous prompting attached to a pre-trained Vision Transformer (ViT) and conditioned on the specific instance. We formulate a continuous prompt function as a neural bottleneck and encode the collection of prompts in the network weights. We establish a paired prompt memory system consisting of a stable reference and a flexible working prompt memory. Inspired by linear mode connectivity, we progressively fuse the working prompt memory and reference prompt memory during inter-task periods, resulting in a continually evolving prompt memory. This fusion involves aligning functionally equivalent prompts using optimal transport and aggregating them in parameter space with an adjustable bias based on prompt node attribution. Additionally, to enhance backward compatibility, we propose compositional classifier initialization, which leverages prior prototypes from pre-trained models to guide the initialization of new classifiers in a subspace-aware manner. Comprehensive experiments validate that our approach achieves state-of-the-art performance in both class- and domain-incremental learning scenarios.

Evolving Parameterized Prompt Memory

Proposed Components

  1. Reformulating Incremental Prompt Tuning: We use a feedforward neural network (FFN) to shift from discrete to continuous prompting, employing a multilayer perceptron (MLP) bottleneck that encodes prompts in the neural weight space, which we call the prompt memory.
  2. Prompt Memory Evolution via Incremental Fusion: We adopt a dual-memory approach, combining a plastic working prompt memory (WPM) and a stable reference prompt memory (RPM). These memories are merged through alignment and attribution-dependent momentum rather than appended.
  3. Compositional Classifier Initialization: Leveraging insights from the anchoring-and-adjustment heuristic in psychology, we predict future classifiers by referencing current classifiers and the prototype relationships between classes.

Prompt Memory Architecture and Its Evolution

We parameterize the prompt memory with a feedforward network (FFN) and train it with a routine that progressively merges memories using alignment and attribution-aware momentum. Given an input, the linear key memory detects patterns and computes positive memory coefficients; these coefficients weight the value memory to produce the final prompt. We introduce a dual prompt memory consisting of a Reference Prompt Memory (RPM) and a Working Prompt Memory (WPM). The RPM accumulates all prompts encountered so far, while the WPM is task-specific and adapts quickly to emerging tasks.
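
As a concrete illustration, the sketch below shows one way such a key-value prompt memory could be written in PyTorch. The layer sizes, the softplus used to keep the memory coefficients positive, and the use of the [CLS] feature as the query are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PromptMemory(nn.Module):
    """FFN bottleneck mapping an instance embedding to a continuous prompt."""

    def __init__(self, embed_dim=768, num_slots=64, prompt_len=8):
        super().__init__()
        # Linear "key memory": detects patterns in the input embedding and
        # emits one coefficient per memory slot.
        self.key_memory = nn.Linear(embed_dim, num_slots, bias=False)
        # "Value memory": each slot stores a prompt fragment in weight space.
        self.value_memory = nn.Parameter(
            0.02 * torch.randn(num_slots, prompt_len * embed_dim))
        self.prompt_len, self.embed_dim = prompt_len, embed_dim

    def forward(self, x):
        # x: (B, embed_dim), e.g. the frozen ViT's [CLS] feature (assumption).
        coeff = F.softplus(self.key_memory(x))    # positive coefficients, (B, num_slots)
        prompt = coeff @ self.value_memory        # weighted value memory, (B, prompt_len * embed_dim)
        return prompt.view(-1, self.prompt_len, self.embed_dim)

# Usage: produce an instance-conditioned prompt to prepend to the ViT tokens.
memory = PromptMemory()
prompt = memory(torch.randn(4, 768))              # (4, 8, 768)
```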

To address catastrophic forgetting, we adopt ideas from linear mode connectivity, which posits a single low-error basin connecting different task solutions. Concretely, we introduce incremental fusion during inter-task periods: we align the functionality of the WPM with the RPM and subsequently fuse them in parameter space. We formulate the alignment as an optimal transport (OT) problem and the fusion as a linearly weighted aggregation adjusted by neuron attribution.
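
The sketch below illustrates the fusion step for a single memory matrix whose rows are prompt slots. For brevity, the soft OT alignment is approximated by a hard one-to-one matching (Hungarian algorithm), and the per-slot attribution scores in [0, 1] are assumed to be precomputed; both are simplifications of the procedure described above.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment

@torch.no_grad()
def fuse_memory(ref_slots, work_slots, attribution, base_momentum=0.5):
    """Merge working memory (WPM) slots into reference memory (RPM) slots.

    ref_slots, work_slots: (num_slots, dim) rows of a prompt-memory matrix.
    attribution:           (num_slots,) importance of each reference slot in [0, 1].
    """
    # 1) Alignment: match each reference slot with its most functionally
    #    similar working slot (hard assignment in place of soft OT).
    cost = 1.0 - F.normalize(ref_slots, dim=1) @ F.normalize(work_slots, dim=1).T
    _, col = linear_sum_assignment(cost.cpu().numpy())
    aligned = work_slots[torch.as_tensor(col)]
    # 2) Fusion: attribution-dependent momentum; highly attributed reference
    #    slots change little, weakly attributed slots absorb more of the new task.
    m = (base_momentum * (1.0 - attribution)).unsqueeze(1)   # (num_slots, 1)
    return (1.0 - m) * ref_slots + m * aligned

# Usage at the end of a task, e.g. on the value memory of the module above:
# rpm.value_memory.data = fuse_memory(rpm.value_memory, wpm.value_memory, attr)
```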

Compositional Classifier Initialization

The introduction of Compositional Classifier Initialization (CCI) is motivated by the anchoring-and-adjustment heuristic: estimating an unknown quantity, such as a future classifier, from relevant existing information. We compute class-mean embeddings, or prototypes, from the pre-trained model. Through an attention mechanism, we establish inter-class relationships among these prototypes, forming a foundational relation between past and target tasks. The resulting probability simplex from multi-head attention is then used to linearly combine previous classifiers, initializing future classifiers and introducing an implicit bias.
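
A minimal sketch of this initialization is given below. A single cosine-similarity softmax stands in for the multi-head attention described above, and the helper name and tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def init_new_classifier(new_prototypes, old_prototypes, old_classifier, tau=1.0):
    """new_prototypes: (C_new, D), old_prototypes: (C_old, D),
    old_classifier:    (C_old, D) weight rows of the existing linear head."""
    # Inter-class relation: how strongly each new class resembles each old one.
    sim = F.normalize(new_prototypes, dim=1) @ F.normalize(old_prototypes, dim=1).T
    simplex = (sim / tau).softmax(dim=1)    # rows lie on the probability simplex
    # Anchoring-and-adjustment: each new classifier starts as a convex
    # combination of previous classifiers and is then adjusted by training.
    return simplex @ old_classifier          # (C_new, D)

# Usage on a torch.nn.Linear head covering old + new classes:
# head.weight.data[new_idx] = init_new_classifier(
#     proto_new, proto_old, head.weight.data[old_idx])
```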

Empirical Benchmark Results

Class Incremental Learning

Methods evaluated on Split CIFAR-100.

Method        5 Steps             10 Steps            20 Steps            Avg
              Acc(↑)   Forget(↓)  Acc(↑)   Forget(↓)  Acc(↑)   Forget(↓)  Acc(↑)           Forget(↓)
FT-seq        73.17    2.95       62.77    20.73      55.97    32.74      63.97 (+0.00)    18.81 (-0.00)
L2P           86.53    7.67       84.97    8.21       83.39    10.18      84.96 (+20.99)   8.69 (-10.12)
DualPrompt    88.26    5.72       86.83    6.21       84.11    8.75       86.40 (+22.43)   6.89 (-11.92)
ESN           88.09    5.18       85.96    4.54       82.71    6.44       85.59 (+21.62)   5.39 (-13.42)
CODA-P-S      88.90    6.29       86.33    6.29       81.71    9.41       85.65 (+21.68)   7.33 (-11.48)
CODA-P        89.16    6.08       87.31    5.95       81.69    9.85       86.05 (+22.08)   7.29 (-11.52)
EvoPrompt-S   88.69    9.93       87.95    2.38       84.98    3.42       87.20 (+23.23)   5.24 (-13.57)
EvoPrompt     88.97    10.12      87.97    2.60       84.64    3.98       87.19 (+23.22)   5.57 (-13.24)

Methods evaluated on Split ImageNet-R.

Method        5 Steps             10 Steps            20 Steps            Avg
              Acc(↑)   Forget(↓)  Acc(↑)   Forget(↓)  Acc(↑)   Forget(↓)  Acc(↑)           Forget(↓)
FT-seq        61.41    5.76       50.28    24.28      39.25    40.38      50.31 (+0.00)    23.48 (-0.00)
L2P           66.63    6.65       64.05    10.05      60.34    14.44      63.67 (+13.36)   10.38 (-13.10)
DualPrompt    71.06    4.19       69.71    5.44       66.26    8.74       69.01 (+18.70)   6.12 (-17.36)
ESN           73.42    3.79       71.07    4.99       64.77    6.65       69.75 (+19.44)   5.14 (-18.34)
CODA-P-S      73.80    5.56       71.95    5.92       69.67    6.23       71.81 (+21.50)   5.90 (-17.58)
CODA-P        73.77    6.60       72.42    6.26       70.18    5.53       72.12 (+21.81)   6.13 (-17.35)
EvoPrompt-S   76.79    9.84       76.22    2.33       74.68    2.70       75.90 (+25.59)   4.96 (-18.52)
EvoPrompt     77.16    9.89       76.83    2.78       74.41    2.56       76.13 (+25.82)   5.08 (-18.40)

Domain Incremental Learning

Benchmark results on the CORe50 dataset.

Method        Test Acc. (%)   Δ Acc. (%)
NME-seq       78.20           +0.00
L2P           78.33           +0.13
S-iPrompts    83.13           +4.93
S-liPrompts   89.06           +10.86
ESN           91.80           +13.60
EvoPrompt-S   94.77           +16.57
EvoPrompt     95.27           +17.07

Online Learning

Benchmark results in the online setting on both Split CIFAR-100 and Split ImageNet-R.

Method        Split CIFAR-100        Split ImageNet-R
              Acc.(↑)   Forget.(↓)   Acc.(↑)   Forget.(↓)
L2P           80.49     8.74         57.52     6.54
DualPrompt    82.17     7.52         61.09     4.40
ESN           74.17     10.59        -         -
CODA-P-S      79.46     11.92        64.60     6.09
CODA-P        81.07     10.10        66.47     5.42
EvoPrompt-S   84.23     1.64         73.56     3.82
EvoPrompt     84.72     0.89         74.05     3.66

Further Analysis

Stability Gap

Neither random nor compositional classifier initialization exhibits an observable stability gap. Nevertheless, our compositional initialization yields more stable performance, smoother transitions between tasks, and faster acquisition of current knowledge.

Separability and Backward-compatibility

Our initialization method reduces the distances among points within the same class, indicating greater intra-class compactness. At the same time, it maintains balanced margins between classes, with smaller inter-class distances than random initialization, thereby improving backward compatibility.
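
For reference, intra-class compactness and inter-class margins of this kind can be quantified as in the snippet below; this is our illustration of such a distance analysis, not the paper's exact protocol.

```python
import torch

def intra_inter_distances(embeddings, labels):
    """Mean pairwise distance within classes and between class prototypes."""
    classes = labels.unique()
    protos = torch.stack([embeddings[labels == c].mean(0) for c in classes])
    # Intra-class: average pairwise distance inside each class
    # (the zero diagonal is included, which is fine for relative comparison).
    intra = torch.stack([
        torch.cdist(embeddings[labels == c], embeddings[labels == c]).mean()
        for c in classes]).mean()
    # Inter-class: average distance between distinct class prototypes.
    d = torch.cdist(protos, protos)
    inter = d[~torch.eye(len(classes), dtype=torch.bool)].mean()
    return intra.item(), inter.item()
```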

Conclusion

This paper presents EvoPrompt, a prompt-based approach that employs a continually evolving parameterized prompt memory, built on a continuous FFN bottleneck and attribution-aware incremental prompt fusion, which facilitates sharing and adaptability during prompting. To maximize the reuse of learned knowledge, we introduce compositional classifier initialization, enhancing both learning stability and backward compatibility. Our framework scales to many-step scenarios and to datasets with high intra-class diversity, such as Split ImageNet-R and CORe50, demonstrating the generalization capability of the proposed method. Comprehensive experiments show superior performance compared to the state of the art.

BibTeX

@inproceedings{kurniawan2024evoprompt,
      title     = {Evolving Parameterized Prompt Memory for Continual Learning},
      author    = {Kurniawan, Muhammad Rifki and Song, Xiang and Ma, Zhiheng and He, Yuhang and Gong, Yihong and Yang, Qi and Wei, Xing},
      booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
      year      = {2024},
}