SpatialBench: Benchmarking Multimodal Large Language Models - Detailed Analysis & Overview

SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition [Podcast]
What are Large Language Model (LLM) Benchmarks?
SpatialBench: Benchmark for Spatial Models
DiscoverPhysics: New LLM Scientific Benchmark
Between the Layers – Interpreting Large Language Models - Michelle Frost - NDC AI 2025
Benchmarking Memory in LLMs: Retrieval, Long Context, and Multi-Turn Interaction - Ali Modarressi
The Science of Benchmarking Panel (NeurIPS 2025 Tutorial)
Beyond Benchmarks 2.0: A Practical Framework for Measuring Multimodal and Agentic AI Success
AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies (EMNLP 2024)
SpatialTunnel: Probing 3D Spatial Bias in VLMs
Benchmarking Software Productivity in the Era of Large Language Model Code Generation & Low-Code
Multi-LCB: New Multilingual LLM Coding Benchmark
SpatialBench: Benchmarking Multimodal Large Language Models for Spatial Cognition [Podcast]

Podcast conversation covering "

What are Large Language Model (LLM) Benchmarks?

Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdKetJ Learn more about the ...

SpatialBench: Benchmark for Spatial Models

In this AI Research Roundup episode, Alex discusses the paper: '

DiscoverPhysics: New LLM Scientific Benchmark

In this AI Research Roundup episode, Alex discusses the paper: 'DiscoverPhysics:

Between the Layers – Interpreting Large Language Models - Michelle Frost - NDC AI 2025

This talk was recorded at NDC AI in Oslo, Norway. Attend the next NDC ...

Benchmarking Memory in LLMs: Retrieval, Long Context, and Multi-Turn Interaction - Ali Modarressi

As LLM systems increasingly rely on retrieval, long contexts, and extended interaction, it becomes essential to

The Science of Benchmarking Panel (NeurIPS 2025 Tutorial)

A panel discussion following the NeurIPS 2025 tutorial "The Science of

Beyond Benchmarks 2.0: A Practical Framework for Measuring Multimodal and Agentic AI Success

A talk by Li Fu, Data & AI Scientist. While most enterprise AI projects start with excitement, only 20% survive the move from demo to ...

AnaloBench: Benchmarking the Identification of Abstract and Long-context Analogies (EMNLP 2024)

Paper: https://arxiv.org/abs/2402.12370 Abstract: Humans regularly engage in analogical thinking, relating personal experiences ...

SpatialTunnel: Probing 3D Spatial Bias in VLMs

In this AI Research Roundup episode, Alex discusses the paper: 'Why Far Looks Up: Probing Spatial Representation in ...

Benchmarking Software Productivity in the Era of Large Language Model Code Generation & Low-Code

By Srinivasa Gopal. The rapid proliferation of code generation through

Multi-LCB: New Multilingual LLM Coding Benchmark

In this AI Research Roundup episode, Alex discusses the paper: 'Multi-LCB: Extending LiveCodeBench to Multiple Programming ...

InfiniBench: Infinite Benchmarking for Visual Spatial Reasoning with Customizable Scene Complexity
