HydraLM

HydraLM is a hybrid sub-quadratic language model that combines Gated DeltaNet, Sliding-Window Attention, and chunk-sparse Retrieval Attention to preserve fast scaling while recovering precise long-range information.

O(N): Training-friendly backbone with recurrent DeltaNet layers.
1M+: Streaming contexts supported through constant-memory serving paths.
0.3.0: Adds Retrieval Attention, Compressive Memory, and MTP.

Why HydraLM exists

Pure linear models scale well but can blur exact recall; pure softmax models recall well but scale poorly. HydraLM runs both lanes side by side, so you keep the linear-time speed while restoring precise local and long-range retrieval.
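The two lanes can be pictured with a toy sketch. This is an illustration only, not the HydraLM implementation: the function names, the single-head shapes, and the fixed `beta` are all assumptions made for clarity. The delta-rule path keeps a constant-size fast-weight state (linear in sequence length), while the sliding-window path applies exact softmax attention over a short local window; the block sums the two.

```python
import numpy as np

def delta_rule_path(q, k, v, beta=0.5):
    # Linear-time recurrent lane: a d x d fast-weight state S is updated
    # with the delta rule  S <- S + beta * (v_t - S k_t) k_t^T,
    # and the output is a single read  o_t = S q_t.
    N, d = q.shape
    S = np.zeros((d, d))
    out = np.zeros_like(v)
    for t in range(N):
        S = S + beta * np.outer(v[t] - S @ k[t], k[t])
        out[t] = S @ q[t]
    return out

def sliding_window_attention(q, k, v, window=4):
    # Exact-recall lane: causal softmax attention restricted to the
    # most recent `window` positions, so cost stays O(N * window).
    N, d = q.shape
    out = np.zeros_like(v)
    for t in range(N):
        lo = max(0, t - window + 1)
        scores = k[lo:t + 1] @ q[t] / np.sqrt(d)
        w = np.exp(scores - scores.max())
        w /= w.sum()
        out[t] = w @ v[lo:t + 1]
    return out

def hybrid_block(x, Wq, Wk, Wv, window=4):
    # Project once, run both lanes on the same q/k/v, and sum their outputs.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return delta_rule_path(q, k, v) + sliding_window_attention(q, k, v, window)
```

The key property the sketch shows: the recurrent lane's memory cost is independent of sequence length, while the windowed lane restores exact token-level matching within its span.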

What changed in the docs

This page now behaves like a documentation workspace: a persistent desktop sidebar, a mobile drawer, and hash-based sections that open directly to the topic you share.

Where to go next

Start with Installation if you are new to the project, Quick Start if you already have the repository cloned, or Architecture if you want the model design first.

The canonical Markdown sources live in the GitHub repository; this HTML layer is a browser-friendly shell for reading them on hydralm.pages.dev.