Staff Working Paper No. 1,150
By Marcus Buckmann, Quynh Anh Nguyen and Ed Hill
We investigate whether hidden states of large language models (LLMs) can be used to estimate and impute economic and financial statistics. Focusing on county-level (eg unemployment) and firm-level (eg total assets) variables, we show that a linear regression trained on the hidden states of open-source LLMs outperforms the models' own text outputs. This indicates that internal representations encode richer economic information than is revealed directly in generated responses. A learning curve analysis shows that, in many cases, only a few dozen labelled examples suffice for training. We further propose a transfer learning method that improves estimation accuracy without requiring any labelled data for the target variable. Finally, we demonstrate the practical utility of hidden states in data imputation and super-resolution tasks.
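As an illustration of the core idea, the following is a minimal sketch of a linear probe fitted on hidden states, not the paper's implementation. To keep it self-contained, the hidden states are simulated random vectors standing in for activations that would in practice be extracted from an open-source LLM. The ridge penalty, the dimensions, the noise level and the helper names `fit_ridge_probe` and `predict` are all assumptions made for this example. The training set has 48 examples, reflecting the "few dozen labelled examples" setting.

```python
import numpy as np


def fit_ridge_probe(H, y, alpha=1.0):
    """Fit a ridge-regularised linear map y ~ H @ w + b.

    H: (n_samples, hidden_dim) array of hidden states (simulated here).
    y: (n_samples,) array of target values (eg an unemployment rate).
    alpha: ridge penalty; an assumption of this sketch, not from the paper.
    """
    H_mean = H.mean(axis=0)
    y_mean = y.mean()
    Hc = H - H_mean  # centre features so the intercept is not penalised
    yc = y - y_mean
    d = H.shape[1]
    # Closed-form ridge solution: (Hc'Hc + alpha * I) w = Hc'yc
    w = np.linalg.solve(Hc.T @ Hc + alpha * np.eye(d), Hc.T @ yc)
    b = y_mean - H_mean @ w
    return w, b


def predict(H, w, b):
    """Apply the fitted linear probe to new hidden states."""
    return H @ w + b


# Simulated placeholder data: in practice each row of H would be the hidden
# state of an LLM for a prompt naming one county or firm.
rng = np.random.default_rng(0)
n_train, n_test, hidden_dim = 48, 200, 16
true_w = rng.normal(size=hidden_dim)

H_train = rng.normal(size=(n_train, hidden_dim))
H_test = rng.normal(size=(n_test, hidden_dim))
y_train = H_train @ true_w + 0.1 * rng.normal(size=n_train)
y_test = H_test @ true_w + 0.1 * rng.normal(size=n_test)

w, b = fit_ridge_probe(H_train, y_train, alpha=1.0)
pred = predict(H_test, w, b)

# Out-of-sample R^2 of the probe on held-out entities
ss_res = np.sum((y_test - pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(f"held-out R^2: {r2:.3f}")
```

With real hidden states the feature dimension is typically far larger than the number of labelled examples, which is why some form of regularisation, here a ridge penalty, is a natural choice in a sketch like this.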