1️⃣ Performer – Linear‑time attention via kernel tricks
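
To make the kernel trick named in this heading concrete, here is a minimal NumPy sketch of Performer-style attention with positive random features, next to exact softmax attention for comparison. It is an illustration under stated assumptions, not a reference implementation: the function names, the `num_features` and `seed` parameters, and the non-causal (bidirectional) setting are all choices made for this example, and it draws plain Gaussian features rather than the orthogonal ones a production implementation might use.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Exact softmax attention; cost grows quadratically with sequence length."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def positive_random_features(X, W):
    """phi(x) = exp(W x - |x|^2 / 2) / sqrt(m), so E[phi(q) . phi(k)] = exp(q . k)."""
    m = W.shape[0]
    sq_norm = 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)
    return np.exp(X @ W.T - sq_norm) / np.sqrt(m)

def performer_attention(Q, K, V, num_features=256, seed=0):
    """Approximate softmax attention without forming the n x n score matrix.

    Hypothetical helper for illustration; non-causal attention only.
    """
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, d))  # plain Gaussian projections
    scale = d ** -0.25  # splits the 1/sqrt(d) logit scaling between Q and K
    q_feat = positive_random_features(Q * scale, W)  # (n, m)
    k_feat = positive_random_features(K * scale, W)  # (n, m)
    kv = k_feat.T @ V            # (m, d_v): keys and values summarized once
    k_sum = k_feat.sum(axis=0)   # (m,): feeds the softmax normalizer
    numer = q_feat @ kv          # (n, d_v)
    denom = q_feat @ k_sum       # (n,)
    return numer / denom[:, None]
```

Because `kv` and `k_sum` are computed once and reused for every query, the cost scales linearly with sequence length for a fixed number of features. The approximation error depends on `num_features` and on the magnitude of the queries and keys.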
Linear‑Sparse Transformers: Merging Performers with SCAT
| # | Paper | Year | Idea |
|---|-------|------|------|
| 1 | (Liu et al.) | 2023 | Uses Performer’s random‑feature map only on the dense local windows of SCAT, leaving the global sparse connections exact. |
| 2 | Hybrid Efficient Attention (HEA) (Gupta et al.) | 2024 | Provides a unified PyTorch library where you can toggle linear, sparse, or linear‑sparse modes on a per‑layer basis. |
| 3 | Fast Autoregressive Generation with Performer‑SCAT (Zhang et al.) | 2024 | Benchmarks the hybrid on GPT‑style language models up to 2B parameters; shows ~4× speed‑up vs full softmax at comparable perplexity. |
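
The idea in row 1 (random features on local windows, exact scores on global connections) can be illustrated with a toy NumPy sketch. This is one reading of that one-line description, not the method from any of the listed papers: the sparsity pattern (a symmetric sliding window plus a few global key positions), the function names, and the parameters `window`, `global_idx`, `num_features`, and `seed` are all assumptions made for the example.

```python
import numpy as np

def feature_map(X, W):
    """Positive random features: E[phi(q) . phi(k)] = exp(q . k)."""
    m = W.shape[0]
    sq_norm = 0.5 * np.sum(X ** 2, axis=-1, keepdims=True)
    return np.exp(X @ W.T - sq_norm) / np.sqrt(m)

def hybrid_attention(Q, K, V, window=2, global_idx=(0,), num_features=512, seed=0):
    """Toy hybrid: approximate weights inside the local window, exact weights
    for the global keys, one shared normalizer. Hypothetical sketch only."""
    n, d = Q.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((num_features, d))
    scale = d ** -0.25  # splits the 1/sqrt(d) logit scaling between Q and K
    Qs, Ks = Q * scale, K * scale
    q_feat, k_feat = feature_map(Qs, W), feature_map(Ks, W)

    is_global = np.zeros(n, dtype=bool)
    is_global[np.asarray(global_idx, dtype=int)] = True

    out = np.empty((n, V.shape[1]))
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        local = np.arange(lo, hi)
        local = local[~is_global[local]]          # avoid counting global keys twice
        w_local = k_feat[local] @ q_feat[i]       # random-feature estimate of exp(q . k)
        w_global = np.exp(Ks[is_global] @ Qs[i])  # exact exp(q . k) for global keys
        numer = w_local @ V[local] + w_global @ V[is_global]
        denom = w_local.sum() + w_global.sum()
        out[i] = numer / denom
    return out
```

The per-query loop is written for readability. An efficient version would presumably summarize keys and values per window block instead of looping over queries, which this sketch does not attempt.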