Media Summary: Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard ... The podcast delves into a research paper on ... An illustrated, plain-English walkthrough of the SubQ-1.1-Small Technical Report from Subquadratic AI, a long-context ...

#280 Native Sparse Attention from DeepSeek: Detailed Analysis & Overview

#280 Native sparse attention from DeepSeek
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
DeepSeek Sparse Attention Explained: 80% Cheaper Long-Context AI
How Attention Got So Efficient [GQA/MLA/DSA]
What is Native Sparse Attention?
2502.11089 - Native Sparse Attention: Hardware Aligned and Natively Trainable Sparse Attention
Native Sparse Attention Hardware Aligned and Natively Trainable Sparse Attention
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Native Sparse Attention- Hardware-Aligned and Natively Trainable Sparse Attention(DeepSeek 2025)
Native Sparse Attention Boosts Speed by 6x: Long Text Processing with Large Language Models
DeepSeek new paper—Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
SubQ-1.1-Small: Subquadratic Sparse Attention Explained (Theory Illustrated)
#280 Native sparse attention from DeepSeek

Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard ...
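Standard attention gets expensive at long context because every query is scored against every key it can see, so the work grows quadratically with sequence length. The Python sketch below makes that arithmetic concrete by counting query-key scores for dense causal attention and for a hypothetical sparse scheme in which each query attends to at most a fixed `budget` of keys. The function names and the fixed-budget model are illustrative assumptions, not the selection scheme described in the Native Sparse Attention paper.

```python
def full_causal_scores(n: int) -> int:
    """Query-key scores computed by dense causal attention over n tokens.

    Query i (0-indexed) sees keys 0..i, i.e. i + 1 keys, so the total is
    1 + 2 + ... + n = n * (n + 1) / 2, which grows quadratically in n.
    """
    return n * (n + 1) // 2


def budgeted_sparse_scores(n: int, budget: int) -> int:
    """Scores for a hypothetical sparse scheme (an assumption for
    illustration): each query attends to at most `budget` visible keys.

    Query i scores min(i + 1, budget) keys, so the total grows linearly
    in n once n exceeds the budget.
    """
    if budget <= 0:
        raise ValueError("budget must be positive")
    if n <= budget:
        # Every query sees fewer keys than the budget: same as dense.
        return full_causal_scores(n)
    # The first `budget` queries see 1..budget keys; the rest see `budget`.
    return budget * (budget + 1) // 2 + (n - budget) * budget


if __name__ == "__main__":
    # Compare dense vs. budgeted counts at a few sequence lengths.
    for n in (4_096, 32_768, 65_536):
        dense = full_causal_scores(n)
        sparse = budgeted_sparse_scores(n, budget=2_048)
        print(f"n={n}: dense={dense}, sparse={sparse}, ratio={dense / sparse:.1f}")
```

Under this simplified model the dense-to-sparse ratio grows roughly as n / (2 · budget), i.e. linearly with sequence length, which is the intuition behind sparse attention's long-context savings; actual speedups also depend on kernels and hardware.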

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper: https://arxiv.org/abs/2502.11089 Notes: ...

DeepSeek Sparse Attention Explained: 80% Cheaper Long-Context AI

00:00:00 Introduction to DeepSeek

How Attention Got So Efficient [GQA/MLA/DSA]

Attention ...

What is Native Sparse Attention?

What is ...

2502.11089 - Native Sparse Attention: Hardware Aligned and Natively Trainable Sparse Attention

title: ...

Native Sparse Attention Hardware Aligned and Natively Trainable Sparse Attention

https://arxiv.org/abs/2502.11089.

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

The podcast delves into a research paper on ...

Native Sparse Attention- Hardware-Aligned and Natively Trainable Sparse Attention(DeepSeek 2025)

Native Sparse Attention

Native Sparse Attention Boosts Speed by 6x: Long Text Processing with Large Language Models

Reference: Arxiv: https://arxiv.org/abs/2502.11089 MoBoard (Video Maker): https://moboard.netlify.app/

DeepSeek new paper—Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Paper: https://arxiv.org/abs/2502.11089 RibbitRibbit: ...

SubQ-1.1-Small: Subquadratic Sparse Attention Explained (Theory Illustrated)

An illustrated, plain-English walkthrough of the SubQ-1.1-Small Technical Report from Subquadratic AI — a long-context ...

RTPurbo: 100-Step Sparse Attention for LLMs

In this AI Research Roundup episode, Alex discusses the paper: 'Full ...