Media Summary: Speaker: Wenqi Jiang Abstract: Despite the recent popularity of large language models (LLMs), the transformer neural network ...

ISCA 2025 RAGO Systematic Performance - Detailed Analysis & Overview

[ISCA 2025] RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving
ISCA'25 - Session 6A - RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation
ISCA'25 - Session 6C - REIS: A High-Performance and Energy-Efficient Retrieval System with In-Storag
ISCA'25 - Session 6B - The Sparsity-Aware LazyGPU Architecture
ISCA'25 - Session 6C - DReX: Accurate and Scalable Dense Retrieval Acceleration via Algorithmic-Hard
ISCA'25 - Session 5A - MoPAC: Efficiently Mitigating Rowhammer with Probabilistic Activation Countin
ISCA'25 - Session 5A - DREAM: Enabling Low-Overhead Rowhammer Mitigation via Directed Refresh Manage
ISCA'25 - Session 6A - Hermes: Algorithm-System Co-design for Efficient Retrieval-Augmented Generati
Vector-Centric Machine Learning Systems: A Cross-Stack Approach
ISCA'25 - Session 7B - Nyx: Virtualizing dataflow execution on shared FPGA platforms
ISCA'25 - Session 7A - MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscal
ISCA'25 - Session 6A - Bishop: Sparsified Bundling Spiking Transformers on Heterogeneous Cores with
ISCA'25 - Session 6A - Transitive Array: An Efficient GEMM Accelerator with Result Reuse
