The success of transformers in sequence modeling has inspired the search for architectures that are more interpretable and better aligned with biological principles. Recent work on Fractional Neural Sampling (FNS) has used stochastic differential equations (SDEs) driven by Lévy processes to characterize neural circuits and sampling in visual-spatial attention for cognitive function. Motivated by this, we integrate the corresponding fractional diffusion component into the attention mechanism of transformers and refer to the result as fractional self-attention (FSA). Under appropriate constructions, FSA can be viewed as a discretization of the fractional Laplacian of order alpha, which is precisely the infinitesimal generator of a symmetric alpha-stable Lévy process. The parameter alpha controls the jump-size distribution of the underlying stochastic process implicitly generated by self-attention, and the non-Gaussian regime (alpha < 2) encourages long-range spatial interactions in the embedding space. Using tools from geometric harmonics and stochastic analysis, we construct a random-walk counterpart of the fractional diffusion dynamics on a manifold. The model includes a kernel bandwidth hyperparameter epsilon, which naturally introduces a notion of continuous time scale; we also investigate how alpha affects model performance under various settings. At small time scales, tools from diffusion maps can additionally be leveraged to obtain dimensionality reduction that respects the underlying geometry and distribution of the initial embeddings and hidden states.

Host: Syed Shah
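
For reference, the generator identification mentioned in the abstract rests on standard facts about stable processes, not on anything specific to this talk: the fractional Laplacian of order alpha admits a singular-integral representation with a heavy-tailed jump kernel, and it is the negative of the infinitesimal generator of a symmetric alpha-stable Lévy process (alpha = 2 recovers the classical Laplacian and Brownian motion).

```latex
% Standard background facts (a sketch, not the talk's construction):
% the fractional Laplacian as a singular integral, and its role as the
% generator of a symmetric alpha-stable Levy process X_t.
\[
  (-\Delta)^{\alpha/2} f(x)
    = C_{d,\alpha}\; \mathrm{p.v.} \int_{\mathbb{R}^d}
      \frac{f(x) - f(y)}{\lVert x - y \rVert^{\,d+\alpha}} \, dy,
  \qquad 0 < \alpha < 2,
\]
\[
  \lim_{t \downarrow 0}
    \frac{\mathbb{E}\,[\, f(X_t) \mid X_0 = x \,] - f(x)}{t}
    = -(-\Delta)^{\alpha/2} f(x).
\]
```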
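To make the random-walk discretization concrete, here is a minimal, hypothetical sketch of how a heavy-tailed kernel could replace the usual softmax weights in an attention layer. The function name `fractional_attention` and the regularization by `eps` are illustrative assumptions, not the construction presented in the talk; the only grounded idea is that off-diagonal weights decay like the jump kernel of an alpha-stable process.

```python
import numpy as np

def fractional_attention(X, V, alpha=1.5, eps=1e-2):
    """Hypothetical heavy-tailed attention sketch (illustrative only).

    Off-diagonal weights decay like ||x_i - x_j||^(-(d + alpha)),
    mirroring the jump kernel of a symmetric alpha-stable process;
    eps regularizes the singularity at zero distance and plays the
    role of the kernel-bandwidth / time-scale hyperparameter.
    """
    n, d = X.shape
    # pairwise squared distances between token embeddings
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # heavy-tailed kernel in place of the usual softmax/Gaussian weights
    W = (eps + sq) ** (-(d + alpha) / 2.0)
    W /= W.sum(axis=1, keepdims=True)  # row-normalize: a random walk on tokens
    return W @ V                       # value mixing under the heavy-tailed walk

# Toy usage: 8 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
out = fractional_attention(X, X, alpha=1.2)
print(out.shape)  # (8, 4)
```

Smaller alpha fattens the tail of the weight distribution, so distant tokens retain non-negligible influence, which is the long-range interaction effect the abstract attributes to the non-Gaussian regime.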
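The closing sentence alludes to diffusion maps; below is a compact sketch of the standard construction (after Coifman & Lafon), assuming a Gaussian kernel with bandwidth eps and omitting the optional density-normalization step. It shows how a kernel bandwidth induces a Markov chain whose spectral coordinates give a geometry-respecting embedding.

```python
import numpy as np

def diffusion_map(X, eps=1.0, n_components=2, t=1):
    """Minimal diffusion-maps embedding (after Coifman & Lafon).

    Builds a Gaussian kernel with bandwidth eps, row-normalizes it into
    a Markov matrix, and embeds points with the leading non-trivial
    eigenvectors scaled by their eigenvalues to the power t.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / eps)                      # Gaussian affinities
    deg = K.sum(axis=1)                        # kernel degrees
    # symmetric conjugate S = D^{-1/2} K D^{-1/2} of the Markov matrix
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]             # descending eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(deg)[:, None]         # right eigenvectors of P = D^-1 K
    # drop the trivial constant eigenvector (eigenvalue 1)
    return (vals[1:n_components + 1] ** t) * psi[:, 1:n_components + 1]
```

The power t plays the role of the continuous time scale mentioned in the abstract: larger t damps high-frequency eigendirections, coarsening the embedding.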