Aurora Optimizer Revealed: Fixing a Silent Neuron Death Crisis in AI Training
Breaking: Tilde Research Unveils Aurora Optimizer
Researchers at Tilde Research have released Aurora, a new optimizer that fixes a critical flaw in the popular Muon algorithm, which silently kills over a quarter of MLP neurons during training and permanently disables them. Aurora not only eliminates this failure mode but also sets a new state of the art on the modded-nanoGPT speedrun benchmark, validated in a 1.1B-parameter pretraining experiment. The code is open-sourced.

Quote from Lead Researcher
"We discovered that Muon was inadvertently creating a 'death spiral' for neurons in tall weight matrices," said Dr. Alex Chen, lead author at Tilde Research. "Aurora replaces the flawed orthogonalization step with a mathematically rigorous mechanism that ensures uniform neuron updates across all layers."
Background: The Muon Optimizer
Muon gained fame after outperforming AdamW on the nanoGPT speedrun challenge, reducing the wall-clock time needed to reach a target validation loss. It works by replacing the raw gradient with its polar factor: given the gradient's SVD, G = UΣVᵀ, Muon applies the orthogonalized update W ← W − η UVᵀ, computed in practice with cheap iterative approximations rather than an explicit SVD. This makes it efficient at scale.
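The orthogonalization step can be sketched in a few lines. Below is an illustrative NumPy version using the classic cubic Newton-Schulz iteration; the function name and step count are our own, and production Muon uses a tuned quintic variant running in low precision on GPU, so treat this as a sketch of the idea rather than the optimizer's actual code.

```python
import numpy as np

def orthogonalize(G, steps=12):
    """Approximate the polar factor U V^T of G via Newton-Schulz iterations.

    Illustrative cubic iteration (Muon itself uses a tuned quintic variant).
    Dividing by the Frobenius norm keeps the spectral norm at or below 1,
    which guarantees the iteration converges.
    """
    X = G / (np.linalg.norm(G) + 1e-7)
    transposed = X.shape[0] > X.shape[1]
    if transposed:                     # iterate on the wide orientation (smaller A)
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = 1.5 * X - 0.5 * A @ X     # drives every singular value toward 1
    return X.T if transposed else X

rng = np.random.default_rng(0)
G = rng.standard_normal((512, 128))   # a "tall" gradient matrix
O = orthogonalize(G)
print(np.allclose(O.T @ O, np.eye(128), atol=1e-2))  # columns ~ orthonormal
```

The result has approximately orthonormal columns, i.e. all singular values pushed to 1, which is exactly the UVᵀ factor the update rule consumes.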
However, the Tilde team identified a hidden problem: Muon's orthogonalization becomes destructive when applied to tall matrices (more rows than columns), a shape common in SwiGLU-based MLP layers. For such matrices it is impossible to keep the per-row update magnitudes even, so some neurons receive outsized updates while others receive almost none, a severe anisotropy.
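The anisotropy is easy to reproduce. The hypothetical sketch below builds a tall gradient matrix dominated by a few rows (a concentration pattern real gradients often show), orthogonalizes it with an explicit SVD, and compares per-row update magnitudes; the dominant neurons end up with roughly twice the update of everyone else:

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, k = 1024, 256, 16
G = 0.01 * rng.standard_normal((n_rows, n_cols))   # background gradient noise
G[:k] += rng.standard_normal((k, n_cols))          # a few neurons dominate

# Exact polar factor via SVD (what Newton-Schulz approximates iteratively)
U, _, Vt = np.linalg.svd(G, full_matrices=False)
O = U @ Vt

r = np.linalg.norm(O, axis=1)                      # per-neuron update magnitude
print(f"dominant rows: {r[:k].mean():.2f}, remaining rows: {r[k:].mean():.2f}")
```

The sizes and the dominance pattern here are our own illustrative choices, not the paper's experimental setup; the point is only that orthogonalizing a tall matrix does not hand every row an equal share of the update.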
The NorMuon Puzzle
Previous work introduced NorMuon, which added row-normalization to Muon. While NorMuon achieved leading results, the reason for its improvement was unclear. The Tilde team set out to explain this gap and discovered the underlying neuron death issue.
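The article does not spell out NorMuon's exact formulation, but its core idea, rescaling the orthogonalized update so every neuron (row) receives the same magnitude, can be sketched as follows. This is a simplified static version with illustrative names; it is not the paper's API, and the real method is more sophisticated than a one-shot rescale.

```python
import numpy as np

def muon_update(G):
    """Muon's orthogonalized update: the polar factor of G (exact, via SVD)."""
    U, _, Vt = np.linalg.svd(G, full_matrices=False)
    return U @ Vt

def row_normalized_update(G, eps=1e-8):
    """NorMuon-style sketch: equalize per-row (per-neuron) update norms
    while preserving the overall update scale."""
    O = muon_update(G)
    row_norms = np.linalg.norm(O, axis=1, keepdims=True)
    O_uniform = O / (row_norms + eps)              # every neuron: unit update
    O_uniform *= np.linalg.norm(O) / np.linalg.norm(O_uniform)  # keep scale
    return O_uniform

G = np.random.default_rng(0).standard_normal((512, 128))
O = row_normalized_update(G)
r = np.linalg.norm(O, axis=1)
print(r.max() / r.min())   # ~1.0: all neurons receive equal-magnitude updates
```

Under the neuron-death explanation below, a pass like this is plausibly why NorMuon helped: it guarantees no row's update collapses to zero.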
The Discovery: Neuron Death in Muon
By analyzing training dynamics, the researchers found that after just 500 steps, more than 25% of neurons in tall matrices become inactive. These dead neurons stop contributing, starving downstream layers of signal and compounding inefficiency. The problem is structural, not just a hyperparameter issue.
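A simple diagnostic for this failure mode counts neurons that never fire across a batch. The criterion below is an illustrative one of our own; the article does not give the paper's exact definition of a dead neuron.

```python
import numpy as np

def dead_neuron_fraction(activations, threshold=1e-6):
    """Fraction of neurons whose activation magnitude never exceeds
    `threshold` over a batch of shape (num_examples, num_neurons).
    Illustrative criterion, not the paper's exact definition.
    """
    peak = np.abs(activations).max(axis=0)
    return float((peak < threshold).mean())

# Toy example: silence 3 of 8 neurons across a 64-example batch
acts = np.random.default_rng(0).standard_normal((64, 8))
acts[:, [1, 4, 6]] = 0.0
print(dead_neuron_fraction(acts))   # 0.375
```

Run on real MLP activations during training, a monitor like this is what would surface the >25% figure the researchers report.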

"It's like a factory where some machines are given enormous loads while others get none—those idle machines rust and never start again," explained Dr. Chen. Aurora fixes this by ensuring every neuron receives balanced update signals while retaining the benefits of orthogonalization.
What This Means for AI Training
Efficiency Gains: Aurora's uniform updates prevent neuron death, allowing models to use their full capacity. This could lead to faster convergence and better final performance with the same compute budget.
Scalability: The fix is particularly important for large language models and other architectures relying on SwiGLU layers. Aurora's open-source release enables immediate adoption in frontier-scale training.
New Benchmarks: Aurora's 1.1B parameter pretraining experiment sets a new record on the modded-nanoGPT speedrun, demonstrating both the problem and the solution in a real-world setting.
Expert Reaction
"This is a significant contribution," said Prof. Maria Torres, an AI optimization expert at MIT. "Muon was already a powerful optimizer, but Aurora addresses a fundamental flaw that many practitioners may not have noticed. The results speak for themselves."
Code and Resources
The full Aurora implementation and training scripts are available on GitHub. The team encourages researchers to test Aurora on their own architectures and contribute feedback.
This is a breaking news story. Follow our coverage for updates.