Media Summary: Today we have Philip Kiely from Baseten on the show. Baseten is a Series B startup focused on providing infrastructure for AI ... Open-source LLMs are great for conversational applications, but they can be difficult ... Why does a 70B language model crawl at 8 tokens per second on one setup, then feel instant on another? The difference is ...
Deep Dive Into Inference Optimization - Detailed Analysis & Overview
LLM Caching strategies. As Large Language Models (LLMs) migrate from massive data centers ... Philip Kiely, Head of Developer Relations at Baseten, presents the “Golden Triangle” of ...