Question Answering over Tabular Data with DataBench: A Large-Scale Empirical Evaluation of LLMs