Optimize Training And Inference With - Detailed Analysis & Overview

AI Inference: The Secret to AI's Superpowers

Download the AI model guide to learn more → https://ibm.biz/BdaJTb Learn more about the technology → https://ibm.biz/BdaJTp ...

Faster LLMs: Accelerate Inference with Speculative Decoding

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...

The Golden Triangle of Inference Optimization: Balancing Latency, Throughput, and Quality

Philip Kiely, Head of Developer Relations at Baseten, presents the “Golden Triangle” of

Optimize Training and Inference with ONNX Runtime (ORT/ACPT/DeepSpeed)

This video demonstrates the composability of ONNX Runtime

Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

Four techniques to

Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou

LLM

LLM Inference Optimization #2: Tensor, Data & Expert Parallelism (TP, DP, EP, MoE)

Part 2 of 5 in the “5 Essential LLM

The Dual Nature of LLMs Training vs Inference

This clip clearly contrasts the technical differences between the two core stages of how large language models (LLMs) ...

Optimizing LLM Training and Inference Performance on GPUs (Workshop) - Faradawn Yang

Faradawn Yang delivers a three-part hands-on workshop covering GPU architecture fundamentals including tensor cores and ...

Dr Hicham Badri - Optimizing Linear Layers for Faster Inference

Description: In this talk, we dive into techniques for accelerating

Deep Dive into Inference Optimization for LLMs with Philip Kiely

Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI ...

AI Hardware: Training, Inference, Devices and Model Optimization

Learn more about artificial intelligence → https://ibm.biz/BdKDhG In Episode 10 of Mixture of Experts we are talking all hardware ...

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 4 - LLM Training

For more information about Stanford's graduate programs, visit: https://online.stanford.edu/graduate-education October 17, 2025 ...