This article presents Mamba-ND, a design that extends the Mamba architecture to multi-dimensional data. Transformers, the de-facto architecture for sequence modeling, incur compute and memory costs that grow quadratically with sequence length. Mamba, a state space model, has shown comparable performance while scaling linearly. Mamba-ND carries this over to multi-dimensional data, achieving competitive results on benchmarks such as ImageNet-1K classification, HMDB-51 action recognition, and ERA5 weather forecasting. The design unravels the input into 1D sequences along alternating dimension orderings across layers, yielding a simple, flexible, and scalable extension of Mamba to higher-dimensional inputs.
Publication date: 8 Feb 2024
Project Page: https://arxiv.org/abs/2402.05892v1
Paper: https://arxiv.org/pdf/2402.05892
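To make the "unravel along alternating orderings" idea concrete, here is a minimal, hypothetical sketch for 2D inputs: the H x W grid is flattened in row-major order for one layer and column-major order for the next, with a 1D sequence block applied to each flattened sequence. The names `Seq1D` and `MambaND2DSketch` are placeholders introduced here for illustration; the actual Mamba-ND uses Mamba SSM blocks, and its specific layer arrangement and scan directions follow the paper, not this sketch.

```python
# Illustrative sketch only: alternating row-major / column-major scans over a 2D grid.
import torch
import torch.nn as nn

class Seq1D(nn.Module):
    """Placeholder 1D sequence mixer (a real implementation would use a Mamba/SSM block)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):            # x: (B, L, D)
        return x + torch.tanh(self.proj(x))

class MambaND2DSketch(nn.Module):
    def __init__(self, dim, depth=4):
        super().__init__()
        self.blocks = nn.ModuleList(Seq1D(dim) for _ in range(depth))

    def forward(self, x):            # x: (B, H, W, D)
        B, H, W, D = x.shape
        for i, blk in enumerate(self.blocks):
            if i % 2 == 0:           # even layers: unravel in row-major order (H then W)
                seq = x.reshape(B, H * W, D)
                x = blk(seq).reshape(B, H, W, D)
            else:                    # odd layers: unravel in column-major order (W then H)
                seq = x.transpose(1, 2).reshape(B, W * H, D)
                x = blk(seq).reshape(B, W, H, D).transpose(1, 2)
        return x

if __name__ == "__main__":
    model = MambaND2DSketch(dim=64)
    out = model(torch.randn(2, 14, 14, 64))
    print(out.shape)  # torch.Size([2, 14, 14, 64])
```

The same pattern generalizes to 3D data (e.g., video or weather grids) by cycling through more scan orderings, which is how a 1D sequence model can mix information across every spatial or temporal dimension without a quadratic attention step.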