Media Summary: These talks were given by Sugu Sougoumarane and Stu Hood as part of the South Bay Systems meetup on March 31st, 2026. Learn how kernel density estimation (KDE) works with a simple exam score example. We'll explore how statisticians use kernels, ... Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention ...

Generalized Consensus Native Top K - Detailed Analysis & Overview

Generalized Consensus & Native Top-K Joins in ParadeDB
Kernel Density Estimation - Explained
No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval (May 2026)
#280 Native sparse attention from DeepSeek
LLMs | Efficient LLM Decoding-II | Lec15.2
Decoding Strategies — Greedy, Temperature, Top-k, Top-p & Beam Search | datarekha
What are the LLM’s Top-P + Top-K ?
Inside Transformers: How Attention Powers Modern LLMs
LLMs and Optimization Modeling: Insights for Education and Practice
How Attention Got So Efficient [GQA/MLA/DSA]
RAG vs. CAG: Solving Knowledge Gaps in AI Models
Fast Neural Kernel Embeddings for General Activations

Generalized Consensus & Native Top-K Joins in ParadeDB

These talks were given by Sugu Sougoumarane and Stu Hood as part of the South Bay Systems meetup on March 31st, 2026.

Kernel Density Estimation - Explained

Learn how kernel density estimation (KDE) works with a simple exam score example. We'll explore how statisticians use kernels, ...

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval (May 2026)

#280 Native sparse attention from DeepSeek

Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention ...

LLMs | Efficient LLM Decoding-II | Lec15.2

tl;dr: This lecture focuses on various advanced decoding strategies that are reshaping how Large Language Models process and ...

Decoding Strategies — Greedy, Temperature, Top-k, Top-p & Beam Search | datarekha

A trained model gives you a probability distribution over the next token; decoding is how you turn that into actual text — and it's ...

What are the LLM’s Top-P + Top-K ?

Inside Transformers: How Attention Powers Modern LLMs

LLMs explained—fast and clearly. In 15 minutes we unpack how transformers work: self-attention, multi-head + positional ...

LLMs and Optimization Modeling: Insights for Education and Practice

As large language models (LLMs) become increasingly capable, the optimization community is exploring how these tools may ...

How Attention Got So Efficient [GQA/MLA/DSA]

Attention mechanisms have been the key behind the recent AI boom. What happened after the multi-head attention in the seminal ...

RAG vs. CAG: Solving Knowledge Gaps in AI Models

Fast Neural Kernel Embeddings for General Activations

A Google TechTalk, presented by Insu Han, 2023-02-02 Algorithms Seminar Series. ABSTRACT: Infinite width limit has shed light ...

Tutorial: Community Detection

Speaker: Bogumił Kamiński, SGH Warsaw School of Economics. Thursday, June 18th, 2026 ...