GRASP: A Novel Benchmark for Evaluating Language Grounding and Situated Physics Understanding in Multimodal Language Models
Published in Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, 2024
This paper introduces GRASP, a benchmark for evaluating language grounding and situated physics understanding in multimodal language models. It provides a unified framework for assessing how well such models understand and reason about physical interactions in visual contexts.
Recommended citation: Jassim, S., Holubar, M., Richter, A., Wolff, C., Ohmer, X., & Bruni, E. (2024). "GRASP: A Novel Benchmark for Evaluating Language Grounding and Situated Physics Understanding in Multimodal Language Models." Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, 6297-6305.
Download Paper