Parallel computation of xLSTM: dimension mismatch
Summary
In this recent paper, a new architecture called xLSTM is proposed. I've implemented the sequential version in PyTorch, but it's slower than I would like, so I'm now implementing the parallel version that's explained in the appendix (pages 25-26). I feel like these pages might contain a mistake, or maybe I'm missing something, so I wanted to check here.

The issue is as follows. We consider an input sequence $X\in\mathbb{R}^{T\times d}$ and we obtain two matrices $F, I\in\mathbb{R}^{T\times T}$ (see the paper for details), which combine into $D\in\mathbb{R}^{T\times T}$. All well and good. Now, it is stated that $Q, K, V\in\mathbb{R}^{T\times d}$. I believe this should be $\mathbb{R}^{T\times e}$, where $e$ denotes the embedding dimension, but that is not the main source of confusion. The point is that the paper gives the equation $C=QK^T\odot D$, and subsequently $H$ is obtained as $CV$. But $QK^T\in\mathbb{R}^{T\times e\times e}$, which makes sense to me since this is a sequence of linear transformations that will transform the sequence $V\in\mathbb{R}^{T\times e}$. But then the dimensions of $QK^T$ do not match those of $D$, so the equation $QK^T\odot D$ does not make sense to me.

Have I missed something here, or is there an issue with the equation in the paper?
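To make the two shape readings concrete, here is a minimal shape check. It is only a sketch under stated assumptions: `T` and `e` are hypothetical sizes, `D` is filled with arbitrary values as a placeholder for the gate matrix, and NumPy is used purely for the shape arithmetic (`@` and `einsum` behave the same way in PyTorch). It does not claim to reproduce the paper's implementation.

```python
import numpy as np

T, e = 5, 3  # hypothetical sequence length and embedding dimension
rng = np.random.default_rng(0)

Q = rng.standard_normal((T, e))
K = rng.standard_normal((T, e))
V = rng.standard_normal((T, e))
D = rng.random((T, T))  # placeholder for D; the values are arbitrary

# Reading 1: Q K^T is an ordinary matrix product of a (T, e) and an (e, T) matrix.
scores = Q @ K.T   # shape (T, T)
C = scores * D     # element-wise product with D is well defined
H = C @ V          # shape (T, e)

# Reading 2: one outer product per time step, giving a stack of (e, e) matrices.
outer = np.einsum("ti,tj->tij", Q, K)  # shape (T, e, e)

print(scores.shape, C.shape, H.shape, outer.shape)
```

Under the first reading, $QK^T$ has the same shape as $D$ and $H = CV$ has shape $T\times e$. Under the second reading, the result has shape $T\times e\times e$, which is the mismatch described above.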