Delving into prompt-based continual learning, we are interested in scenarios with non-expandable prompt pools and end-to-end training devoid of discrete selection. Our solution, EvoPrompt (Evolving Parameterized Prompt), leverages a multi-layer perceptron (MLP) bottleneck to formulate the prompting function. The prompts are stored in the weight space of the network and gradually evolve as new tasks are learned, all without expansion. Additionally, we present a novel method for synthesizing future classifiers from previously acquired knowledge. Remarkably, our approach employs minimal parameters, 5x and 13x smaller than CODA-P, while exhibiting superior performance.
Recent studies have demonstrated the potency of leveraging prompts in Transformers for continual learning (CL). Nevertheless, employing a discrete key-prompt bottleneck can lead to selection mismatches and inappropriate prompt associations during testing. Furthermore, this approach hampers adaptive prompting due to the lack of shareability among nearly identical instances at a more granular level. To address these challenges, we introduce the Evolving Parameterized Prompt Memory (EvoPrompt), a novel method involving adaptive and continuous prompting attached to a pre-trained Vision Transformer (ViT), conditioned on the specific instance. We formulate a continuous prompting function as a neural bottleneck and encode the collection of prompts in the network weights. We establish a paired prompt memory system consisting of a stable reference prompt memory and a flexible working prompt memory. Inspired by linear mode connectivity, we progressively fuse the working prompt memory into the reference prompt memory during inter-task periods, resulting in a continually evolving prompt memory. This fusion involves aligning functionally equivalent prompts using optimal transport and aggregating them in parameter space with an adjustable bias based on prompt node attribution. Additionally, to enhance backward compatibility, we propose compositional classifier initialization, which leverages prior prototypes from pre-trained models to guide the initialization of new classifiers in a subspace-aware manner. Comprehensive experiments validate that our approach achieves state-of-the-art performance in both class and domain incremental learning scenarios.
We parameterize the memory prompts with a feedforward network (FFN) and train them with a routine that involves step-by-step merging with alignment and attribution-aware momentum. Given an input, the linear key memory identifies patterns and computes positive memory coefficients. These coefficients are then used to weight the value memory, producing the final prompt. We introduce a dual-functional prompt memory consisting of a Reference Prompt Memory (RPM) and a Working Prompt Memory (WPM). The RPM encompasses all prompts encountered so far, while the WPM is task-specific and adapts swiftly to emerging tasks.
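A minimal sketch of this FFN-based prompt memory is shown below, assuming a ViT with embedding dimension `embed_dim`, `num_keys` memory slots, and prompts of length `prompt_len`; all names and hyperparameters are illustrative rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PromptMemory(nn.Module):
    """Instance-conditioned prompt memory: linear key memory + value memory."""

    def __init__(self, embed_dim=768, num_keys=64, prompt_len=8):
        super().__init__()
        # Linear key memory: maps an instance embedding to memory coefficients.
        self.key = nn.Linear(embed_dim, num_keys, bias=False)
        # Value memory: each slot stores a full prompt (prompt_len x embed_dim).
        self.value = nn.Parameter(torch.randn(num_keys, prompt_len * embed_dim) * 0.02)
        self.prompt_len = prompt_len
        self.embed_dim = embed_dim

    def forward(self, x):
        # x: [batch, embed_dim] instance embedding (e.g., the [CLS] token of a
        # frozen ViT). Softplus keeps the memory coefficients positive.
        coeff = F.softplus(self.key(x))                     # [batch, num_keys]
        coeff = coeff / (coeff.sum(dim=-1, keepdim=True) + 1e-8)
        # Weighted combination of value-memory slots yields the final prompt.
        prompt = coeff @ self.value                         # [batch, prompt_len * embed_dim]
        return prompt.view(-1, self.prompt_len, self.embed_dim)


# Usage: produce instance-conditioned prompts to prepend to the ViT tokens.
memory = PromptMemory()
cls_embedding = torch.randn(4, 768)
prompts = memory(cls_embedding)  # [4, 8, 768]
```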
To address catastrophic forgetting, we adopt ideas from linear mode connectivity, which suggests that solutions to different tasks can reside in a single low-error basin. Concretely, we introduce incremental fusion during inter-task periods: we align the functionality of the WPM with the RPM and subsequently fuse them in parameter space. We formulate the alignment as an Optimal Transport (OT) problem and the fusion as a linearly weighted aggregation adjusted by neuron attribution.
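The sketch below illustrates this inter-task fusion under simplifying assumptions: the OT alignment is approximated by a hard assignment (Hungarian matching) between value-memory slots, and `attribution` is a hypothetical per-slot importance score in [0, 1]; the paper's exact OT formulation and attribution measure may differ.

```python
import torch
from scipy.optimize import linear_sum_assignment


def fuse_prompt_memories(rpm, wpm, attribution, base_momentum=0.5):
    """Align WPM slots to RPM slots, then fuse with attribution-adjusted momentum."""
    # rpm, wpm: [num_keys, dim] value memories; attribution: [num_keys] in [0, 1].
    # Hard-assignment approximation of OT: match each WPM slot to its
    # functionally closest RPM slot via the Hungarian algorithm.
    cost = torch.cdist(rpm, wpm)                        # pairwise distances [num_keys, num_keys]
    _, col_idx = linear_sum_assignment(cost.detach().cpu().numpy())
    wpm_aligned = wpm[torch.as_tensor(col_idx)]         # reorder WPM to match RPM slots

    # Attribution-adjusted momentum: slots important to previous tasks retain
    # more of the reference memory; less important slots adopt the working memory.
    alpha = base_momentum * attribution.unsqueeze(-1)   # [num_keys, 1]
    return alpha * rpm + (1.0 - alpha) * wpm_aligned


# Usage with random tensors standing in for learned memories.
rpm = torch.randn(64, 768)
wpm = torch.randn(64, 768)
attribution = torch.rand(64)
rpm_new = fuse_prompt_memories(rpm, wpm, attribution)   # evolved reference memory
```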
The Compositional Classifier Initialization (CCI) is motivated by the bias adjustment heuristic: it estimates the unknown, such as future classifiers, by leveraging relevant existing information. We compute class-mean embeddings, or prototypes, from the pre-trained model. Through attention mechanisms, we establish inter-class relationships among these prototypes, forming a foundational relation between past and target tasks. The resulting probability simplex from multi-head attention is then used to linearly combine previous classifiers, initializing future classifiers and introducing an implicit bias.
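A minimal sketch of this initialization follows, assuming `old_prototypes` and `new_prototypes` are class-mean embeddings from the frozen pre-trained backbone and `old_classifier` holds the previously learned class weight vectors; single-head scaled dot-product attention is used here for brevity, whereas the text describes multi-head attention.

```python
import torch
import torch.nn.functional as F


def compositional_init(new_prototypes, old_prototypes, old_classifier, temperature=None):
    """Initialize new classifiers as convex combinations of previous classifiers."""
    # new_prototypes: [num_new, dim], old_prototypes: [num_old, dim],
    # old_classifier: [num_old, dim] (one weight vector per previous class).
    d = new_prototypes.shape[-1]
    temperature = temperature or d ** 0.5
    # Inter-class relation between future and past classes.
    attn = new_prototypes @ old_prototypes.t() / temperature   # [num_new, num_old]
    simplex = F.softmax(attn, dim=-1)                          # each row sums to 1
    # Linearly combine previous classifiers with the probability simplex.
    return simplex @ old_classifier                            # [num_new, dim]


# Usage: initialize classifier weights for 10 new classes from 20 old classes.
old_protos, new_protos = torch.randn(20, 768), torch.randn(10, 768)
old_cls_weights = torch.randn(20, 768)
new_cls_weights = compositional_init(new_protos, old_protos, old_cls_weights)
```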
Methods evaluated on Split CIFAR-100.
Method | 5 Steps Acc(↑) | 5 Steps Forget(↓) | 10 Steps Acc(↑) | 10 Steps Forget(↓) | 20 Steps Acc(↑) | 20 Steps Forget(↓) | Avg Acc(↑) | Avg Forget(↓)
---|---|---|---|---|---|---|---|---
FT-seq | 73.17 | 2.95 | 62.77 | 20.73 | 55.97 | 32.74 | 63.97 (+0.00) | 18.81 (-0.00) |
L2P | 86.53 | 7.67 | 84.97 | 8.21 | 83.39 | 10.18 | 84.96 (+20.99) | 8.69 (-10.12) |
DualPrompt | 88.26 | 5.72 | 86.83 | 6.21 | 84.11 | 8.75 | 86.40 (+22.43) | 6.89 (-11.92) |
ESN | 88.09 | 5.18 | 85.96 | 4.54 | 82.71 | 6.44 | 85.59 (+21.62) | 5.39 (-13.42) |
CODA-P-S | 88.90 | 6.29 | 86.33 | 6.29 | 81.71 | 9.41 | 85.65 (+21.68) | 7.33 (-11.48) |
CODA-P | 89.16 | 6.08 | 87.31 | 5.95 | 81.69 | 9.85 | 86.05 (+22.08) | 7.29 (-11.52) |
EvoPrompt-S | 88.69 | 9.93 | 87.95 | 2.38 | 84.98 | 3.42 | 87.20 (+23.23) | 5.24 (-13.57) |
EvoPrompt | 88.97 | 10.12 | 87.97 | 2.60 | 84.64 | 3.98 | 87.19 (+23.22) | 5.57 (-13.24) |
Methods evaluated on Split ImageNet-R.
Method | 5 Steps Acc(↑) | 5 Steps Forget(↓) | 10 Steps Acc(↑) | 10 Steps Forget(↓) | 20 Steps Acc(↑) | 20 Steps Forget(↓) | Avg Acc(↑) | Avg Forget(↓)
---|---|---|---|---|---|---|---|---
FT-seq | 61.41 | 5.76 | 50.28 | 24.28 | 39.25 | 40.38 | 50.31 (+0.00) | 23.48 (-0.00) |
L2P | 66.63 | 6.65 | 64.05 | 10.05 | 60.34 | 14.44 | 63.67 (+13.36) | 10.38 (-13.10) |
DualPrompt | 71.06 | 4.19 | 69.71 | 5.44 | 66.26 | 8.74 | 69.01 (+18.70) | 6.12 (-17.36) |
ESN | 73.42 | 3.79 | 71.07 | 4.99 | 64.77 | 6.65 | 69.75 (+19.44) | 5.14 (-18.34) |
CODA-P-S | 73.80 | 5.56 | 71.95 | 5.92 | 69.67 | 6.23 | 71.81 (+21.50) | 5.90 (-17.58) |
CODA-P | 73.77 | 6.60 | 72.42 | 6.26 | 70.18 | 5.53 | 72.12 (+21.81) | 6.13 (-17.35) |
EvoPrompt-S | 76.79 | 9.84 | 76.22 | 2.33 | 74.68 | 2.70 | 75.90 (+25.59) | 4.96 (-18.52) |
EvoPrompt | 77.16 | 9.89 | 76.83 | 2.78 | 74.41 | 2.56 | 76.13 (+25.82) | 5.08 (-18.40) |
Benchmark results on the CORe50 dataset.
Method | Test Acc. (%) | Δ Acc. (%) |
---|---|---
NME-seq | 78.20 | +0.00
L2P | 78.33 | +0.13 |
S-iPrompts | 83.13 | +4.93 |
S-liPrompts | 89.06 | +10.86 |
ESN | 91.80 | +13.60 |
EvoPrompt-S | 94.77 | +16.57 |
EvoPrompt | 95.27 | +17.07 |
Benchmark results in the online setting on both Split CIFAR-100 and Split ImageNet-R.
Method | Split CIFAR-100 Acc.(↑) | Split CIFAR-100 Forget.(↓) | Split ImageNet-R Acc.(↑) | Split ImageNet-R Forget.(↓)
---|---|---|---|---
L2P | 80.49 | 8.74 | 57.52 | 6.54 |
DualPrompt | 82.17 | 7.52 | 61.09 | 4.40 |
ESN | 74.17 | 10.59 | - | - |
CODA-P-S | 79.46 | 11.92 | 64.60 | 6.09 |
CODA-P | 81.07 | 10.10 | 66.47 | 5.42 |
EvoPrompt-S | 84.23 | 1.64 | 73.56 | 3.82 |
EvoPrompt | 84.72 | 0.89 | 74.05 | 3.66 |
We observe no pronounced stability gap with either random or compositional classifier initialization. Nevertheless, our compositional initialization yields more stable performance, smoother transitions between tasks, and faster acquisition of current knowledge.
Our initialization method reduces the distances among points within the same class, indicating increased intra-class compactness. At the same time, it maintains balanced margins between classes, yielding smaller inter-class distances than random initialization and thus improving backward compatibility.
This paper presents EvoPrompt, a prompt-based approach that employs a continually evolving parameterized prompt memory with a continuous FFN bottleneck and attribution-aware incremental prompt fusion, which facilitates sharing and adaptability during prompting. Learned knowledge is maximized through compositional classifier initialization, enhancing both learning stability and backward compatibility. Our framework scales to multi-step scenarios and to datasets with high intra-class diversity, such as Split ImageNet-R and CORe50, demonstrating the generalization capability of the proposed method. Comprehensive experiments exhibit superior performance compared to the state of the art.
@inproceedings{kurniawan2024evoprompt,
  title     = {Evolving Parameterized Prompt Memory for Continual Learning},
  author    = {Kurniawan, Muhammad Rifki and Song, Xiang and Ma, Zhiheng and He, Yuhang and Gong, Yihong and Yang, Qi and Wei, Xing},
  year      = {2024},
  booktitle = {Proceedings of the AAAI Conference on Artificial Intelligence},
}