How well do LLMs reason over tabular data?
Date:
In this presentation at AlphaXiv, I talked about my paper “How well do LLMs reason over tabular data, really?”, which asks whether general-purpose Large Language Models can reason effectively over tabular data. We identified flaws in current evaluation methods and propose an LLM-as-a-judge approach that reveals significant performance deficits. Testing against realistic variations such as missing values and duplicate entities, we found that these common real-world table characteristics substantially impair LLM tabular reasoning.
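To make the idea of "realistic variations" concrete, here is a minimal sketch of how one might perturb a small table before handing it to an LLM. The function names, seed handling, and example table are my own illustrative assumptions, not the paper's actual perturbation code:

```python
import copy
import random

def add_missing_values(rows, column, fraction, seed=0):
    # Hypothetical perturbation: blank out roughly `fraction`
    # of the values in `column` to simulate missing data.
    rng = random.Random(seed)
    rows = copy.deepcopy(rows)
    for row in rows:
        if rng.random() < fraction:
            row[column] = None
    return rows

def add_duplicate_entities(rows, n, seed=0):
    # Hypothetical perturbation: append `n` duplicated rows
    # to simulate duplicate entities in the table.
    rng = random.Random(seed)
    rows = copy.deepcopy(rows)
    rows.extend(copy.deepcopy(rng.choice(rows)) for _ in range(n))
    return rows

# A toy table; the paper's benchmarks use real tabular QA datasets.
table = [
    {"name": "Alice", "city": "Utrecht", "sales": 120},
    {"name": "Bob", "city": "Delft", "sales": 95},
    {"name": "Carol", "city": "Leiden", "sales": 130},
]

perturbed = add_duplicate_entities(add_missing_values(table, "sales", 0.5), n=1)
```

The perturbed table can then be serialized (e.g. as Markdown or CSV) into the prompt, and the model's answer compared against the answer on the clean table to measure the performance drop.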
