How well do LLMs reason over tabular data?
Date:
In this presentation at AlphaXiv, I talked about my paper “How well do LLMs reason over tabular data, really?”, which asks whether general-purpose Large Language Models can reason effectively over tabular data. We identified flaws in current evaluation methods and propose an LLM-as-a-judge approach that reveals significant performance deficits. Testing against realistic variations such as missing values and duplicate entities, we found that these common real-world table characteristics substantially impair LLM tabular reasoning.
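To make the idea of "realistic variations" concrete, here is a minimal sketch of how one might perturb a small table before handing it to an LLM. The function names, seed handling, and example table are my own illustrative assumptions, not the paper's actual perturbation code:

```python
import copy
import random

def add_missing_values(rows, column, fraction, seed=0):
    # Hypothetical perturbation: blank out roughly `fraction`
    # of the values in `column` to simulate missing data.
    rng = random.Random(seed)
    rows = copy.deepcopy(rows)
    for row in rows:
        if rng.random() < fraction:
            row[column] = None
    return rows

def add_duplicate_entities(rows, n, seed=0):
    # Hypothetical perturbation: append `n` duplicated rows
    # to simulate duplicate entities in the table.
    rng = random.Random(seed)
    rows = copy.deepcopy(rows)
    rows.extend(copy.deepcopy(rng.choice(rows)) for _ in range(n))
    return rows

# A toy table; the paper's benchmarks use real tabular QA datasets.
table = [
    {"name": "Alice", "city": "Utrecht", "sales": 120},
    {"name": "Bob", "city": "Delft", "sales": 95},
    {"name": "Carol", "city": "Leiden", "sales": 130},
]

perturbed = add_duplicate_entities(add_missing_values(table, "sales", 0.5), n=1)
```

The perturbed table can then be serialized (e.g. as Markdown or CSV) into the prompt, and the model's answer compared against the answer on the clean table to measure the performance drop.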
