Media Summary: Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI ... Open-source LLMs are great for conversational applications, but they can be difficult ... Why does a 70B language model crawl at 8 tokens per second on one setup, then feel instant on another? The difference is ...

Deep Dive Into Inference Optimization - Detailed Analysis & Overview

Deep Dive into Inference Optimization for LLMs with Philip Kiely
AI Inference: The Secret to AI's Superpowers
Deep Dive: Optimizing LLM inference
Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou
LLM Inference Optimization Explained — From 8 Tokens/sec to 50+
Deep Dive into LLMs like ChatGPT
Faster LLMs: Accelerate Inference with Speculative Decoding
LLM Inference Optimization. Coherence in KV Cache Management. LLM Intra-Turn Cache Dynamics.
LLM inference optimization: Architecture, KV cache and Flash attention
AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA
LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE)
The Golden Triangle of Inference Optimization: Balancing Latency, Throughput, and Quality

Deep Dive into Inference Optimization for LLMs with Philip Kiely

Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI ...

AI Inference: The Secret to AI's Superpowers

Download the AI model guide

Deep Dive: Optimizing LLM inference

Open-source LLMs are great for conversational applications, but they can be difficult

Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou

LLM

LLM Inference Optimization Explained — From 8 Tokens/sec to 50+

Why does a 70B language model crawl at 8 tokens per second on one setup, then feel instant on another? The difference is ...

Deep Dive into LLMs like ChatGPT

This is a general audience

Faster LLMs: Accelerate Inference with Speculative Decoding

Ready

LLM Inference Optimization. Coherence in KV Cache Management. LLM Intra-Turn Cache Dynamics.

LLM Caching strategies. As Large Language Models (LLMs) migrate from massive data centers

LLM inference optimization: Architecture, KV cache and Flash attention

...

AI Optimization Lecture 01 - Prefill vs Decode - Mastering LLM Techniques from NVIDIA

Video 1 of 6 | Mastering LLM Techniques:

LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE)

Part 2 of 5

The Golden Triangle of Inference Optimization: Balancing Latency, Throughput, and Quality

Philip Kiely, Head of Developer Relations at Baseten, presents the “Golden Triangle” of

Understand training and inference optimizations in deep learning: Technical Deep Dive #3

In