
MicroRec: Efficient Recommendation Inference on FPGAs - Detailed Analysis & Overview


MicroRec: Efficient Recommendation Inference on FPGAs

The XACC Tech Talks are a series of virtual talks covering a broad range of topics related to Adaptive Compute.

A Hands On Tutorial Using DeepRecSys to Optimize At-Scale Neural Recommendation Inference

To learn more about the latest research at the Harvard VLSI-Architecture group, please visit https://vlsiarch.eecs.harvard.edu.

DeepRecSys: A System for Optimizing End-To-End At-scale Neural Recommendation Inference (ISCA 2020)

To learn more about the latest research at the Harvard VLSI-Architecture group, please visit https://vlsiarch.eecs.harvard.edu.

Verflux Live Workshop: Systematic Review & Meta-Analysis from Protocol to PRISMA

Learn how to run a complete systematic

LLM Inference Optimization Explained | Quantization, KV Cache, Batching & GPU Performance

Want to optimize Large Language Model (LLM)

Unlocking the Full Potential of FPGAs for Real-Time ML Inference, by Salvador Alvarez, Achronix

An FPGA can be a very attractive platform for many Machine Learning (ML)

LLM Inference Optimization Explained | Quantization, Batching & Parallelism

Learn how modern AI systems optimize Large Language Model (LLM)

10. High-Performance Inference with vLLM | Production AI Engineering

Learn how to deploy machine learning and AI applications from a Jupyter Notebook to a production-ready system. This complete ...