The study proposes Kernel-Eigen Pair Sparse Variational Gaussian Processes (KEP-SVGP) for building uncertainty-aware self-attention in transformers. Because attention kernels are asymmetric, the method handles them with Kernel SVD (KSVD): a small set of adjoint eigenfunctions from the KSVD of the attention kernel fully characterizes the asymmetry and reduces the time complexity of deriving the SVGP posteriors. The study also derives an evidence lower bound (ELBO) for optimizing the variational parameters. Effectiveness and efficiency are validated on in-distribution, distribution-shift, and out-of-distribution benchmarks.
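To make the ingredients above concrete, the following is a minimal NumPy sketch of an asymmetric attention kernel, a rank-s truncated SVD standing in for the adjoint eigenpairs, and a Gaussian (mean plus variance) read-out over the attention output. This is an illustrative approximation under assumptions made here (the exponential kernel, the truncation rank `s`, and the noise term `sigma2` are all placeholders), not the paper's exact KEP-SVGP layer or ELBO.

```python
# Illustrative sketch only: asymmetric attention kernel + low-rank SVD
# + a GP-style mean/variance read-out. Not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)

n, d, s = 16, 8, 4          # sequence length, head dim, truncation rank (assumed)
sigma2 = 1e-2               # observation-noise variance (assumed)

X = rng.normal(size=(n, d))                                   # token features
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Asymmetric attention kernel: kappa(q_i, k_j) != kappa(q_j, k_i) in general.
Kmat = np.exp(Q @ K.T / np.sqrt(d))

# Truncated SVD keeps only s singular pairs; the left/right singular vectors
# play the role of the adjoint eigenfunctions mentioned in the summary.
U, svals, Vt = np.linalg.svd(Kmat)
U_s, S_s, V_s = U[:, :s], svals[:s], Vt[:s, :].T

# Low-rank kernel reconstruction and a GP-style predictive read-out:
# the mean uses the rank-s kernel; the variance is small where the
# low-rank fit is good (analogous to a sparse-GP posterior variance).
K_lowrank = U_s @ np.diag(S_s) @ V_s.T
mean = K_lowrank @ V                                           # (n, d) output
var = np.clip(np.diag(Kmat - K_lowrank), 0.0, None) + sigma2   # per-token variance

print("output mean shape:", mean.shape)
print("per-token predictive variance:", np.round(var[:4], 4))
```

The per-token variance gives the kind of uncertainty signal that distribution-shift and out-of-distribution benchmarks probe; in the actual method this role is played by the variational posterior rather than the residual heuristic used here.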
Publication date: 2 Feb 2024
Project Page: https://arxiv.org/abs/2402.01476v1
Paper: https://arxiv.org/pdf/2402.01476