Sparse Maximal Update Parameterization: A Holistic Approach to Sparse Training Dynamics

Sparse maximal update parameterization: A holistic approach to sparse training dynamics

In this video we provide a brief overview of our NeurIPS 2024 paper titled " ...

Pushing the Limits of Sparse Attention in LLMs - Marcos Treviso | ASAP 49

Paper: https://arxiv.org/pdf/2502.12082 Speaker: https://mtreviso.github.io/ Slides: ...

The Sparse Manifold Transform

Bruno Olshausen, UC Berkeley https://simons.berkeley.edu/talks/bruno-olshausen-4-18-18 Computational Theories of the Brain.

µP for MoE: Hyperparams that Transfer

In this AI Research Roundup episode, Alex discusses the paper: '$μ$-Parametrization for Mixture of Experts' (2508.09752v1). This ...

MiniMax Sparse Attention: Efficient Blockwise Sparsity for Ultra-Long Contexts

Introducing the MiniMax ...

MiniMax Sparse Attention: Blockwise Sparse GQA with 28x Attention Compute Reduction at 1M Conte

This video breaks down MiniMax ...

Sparse Regression and Auto Regression model in Seismic Data Processing - Maxim Ryabinskiy

Yandex School of Data Analysis Conference Machine Learning: Prospects and Applications ...

SAME: Sparse and Anchored Model Editing - CVPR 2026 Highlight

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer (μTransfer)

Join our Discord community: https://discord.gg/peBrCpheKE In this video I cover "Tensor Programs V: Tuning Large ...

10: Generative AI – Adapting LLMs with Parameter-Efficient Fine-Tuning

MIT 15.773 Hands-On Deep Learning Spring 2024 Instructor: Rama Ramakrishnan View the complete course: ...

Bayesian Optimization

In this video, we explore Bayesian Optimization, which constructs probabilistic models of unknown functions and strategically ...

Short Intro HPCA'21 SpAtten: Efficient Sparse Attention Architecture with Cascade Token/Head Pruning

Short intro video for HPCA 2021 paper: "SpAtten: Efficient ...

ECE 804 - Dr Bhaskar D. Rao - Bayesian Methods for Sparse Signal Recovery and Compressed Sensing

Compressive sensing (CS) as an approach for data acquisition has recently received much attention. In CS, the signal recovery ...