LLMs can help locate data sources, format data, extract data
from text, classify and score text, create figures, extract sentiment,
and even simulate human test subjects. Most of these capabilities
can be accessed not only through a web interface, as shown in the demonstrations
below, but also via an API (Application Programming Interface)
that allows large amounts of data to be formatted, extracted, classified,
etc. The operations can also be performed in batches to remain within
the token limit for each request. Moreover, building on the section
on coding, it goes without saying that LLMs can write the computer
code necessary to access their own APIs — for example, try out "Write
python code to ask GPT-4 to do [any data extraction or manipulation
task]''.
When performing data analysis tasks in bulk, cost is an important
consideration. Although a single prompt to a cutting-edge LLM costs
just fractions of a cent, the cost of performing thousands or millions
of queries quickly adds up. For many of the tasks described below,
smaller and cheaper models are available, and it is advisable to use
them rather than the most cutting-edge LLM.
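The cost difference is easy to quantify with back-of-the-envelope arithmetic. The per-token prices below are hypothetical placeholders; consult the provider's current price list before budgeting a project.

```python
# Back-of-the-envelope cost comparison for a bulk analysis job.
# Prices per 1,000 tokens are hypothetical placeholders.
def job_cost(n_queries, tokens_per_query, price_per_1k_tokens):
    """Total cost in dollars for a batch of identical queries."""
    return n_queries * tokens_per_query * price_per_1k_tokens / 1000

frontier = job_cost(1_000_000, 500, 0.03)    # hypothetical frontier-model price
small    = job_cost(1_000_000, 500, 0.0005)  # hypothetical small-model price
print(f"frontier model: ${frontier:,.0f}, small model: ${small:,.0f}")
```

At these illustrative prices, a million 500-token queries differ in cost by a factor of 60 between the two models.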
Creating figures
One of the most useful functions of ChatGPT for economists
is Advanced Data Analysis, which employs the coding capabilities of
GPT-4 to create versatile figures and graphs.
In the following example, I uploaded a file that contained
stock market prices for three large technology companies and instructed
ChatGPT Advanced Data Analysis to create one graph displaying stock
performance labeled with the corresponding betas and another graph
displaying portfolio weights.*
*To compile the underlying stock market data, I asked ChatGPT to write
a script to download the data, as described in the Online Appendix
of the paper.
[Code generated and executed]
The dataset contains the closing prices for the securities Google (GOOG), Meta (META), Microsoft (MSFT), and the S&P 500 index (GSPC) over a certain period of time.
Let's normalize the prices to start at 100 and plot the resulting performance of these securities. This normalization will allow us to compare the performance of the securities relative to their initial prices. To normalize the data, we'll divide all prices by their initial price and then multiply by 100.
[Code generated and executed]
Here is the graph showing the performance of the securities Google (GOOG), Meta (META), Microsoft (MSFT), and the S&P 500 index (GSPC), all normalized to start at 100. The x-axis represents time (the dates), and the y-axis represents the normalized prices.
From the graph, we can see how each of these securities has performed relative to its starting point over time. This type of comparison can be helpful when assessing the relative performance of different securities or indices.
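The normalization step that ChatGPT describes above can be sketched in a few lines of Python; the prices here are made up for illustration.

```python
# Rebase a price series to start at 100: divide each price by the first
# value in the series and scale by 100.
def normalize_to_100(prices):
    """Rebase a price series so that it starts at 100."""
    first = prices[0]
    return [100 * p / first for p in prices]

goog = [50.0, 55.0, 60.0]      # illustrative closing prices
print(normalize_to_100(goog))  # [100.0, 110.0, 120.0]
```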
The beta of a stock is a measure of the stock's volatility in relation to the overall market. The overall market has a beta of 1.0, and individual stocks are ranked according to how much they deviate from the market.
A stock that swings more than the market over time has a beta above 1.0. If a stock moves less than the market, the stock's beta is less than 1.0. High-beta stocks are supposed to be riskier but provide higher return potential; low-beta stocks pose less risk but also lower returns.
Let's calculate the beta for each of these stocks over the given period. We can calculate beta using the formula:
\[
\beta = \frac{\mathrm{Cov}(r_E, r_M)}{\mathrm{Var}(r_M)}
\]
where:
r_E is the return of the stock
r_M is the return of the market
\mathrm{Cov}(r_E, r_M) is the covariance between the stock and market returns
\mathrm{Var}(r_M) is the variance of the market returns
For the market return, we'll use the S&P 500 index (GSPC). We'll calculate daily returns for each stock and the market, then calculate the beta for each stock.
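The beta computation just described can be sketched directly from the formula; the prices below are made up for illustration.

```python
# Compute beta = Cov(r_E, r_M) / Var(r_M) from daily simple returns.
def returns(prices):
    """Daily simple returns from a price series."""
    return [p1 / p0 - 1 for p0, p1 in zip(prices, prices[1:])]

def beta(stock_prices, market_prices):
    """Beta of a stock relative to the market, using sample covariance/variance."""
    r_e = returns(stock_prices)
    r_m = returns(market_prices)
    mean_e = sum(r_e) / len(r_e)
    mean_m = sum(r_m) / len(r_m)
    n = len(r_m) - 1
    cov = sum((e - mean_e) * (m - mean_m) for e, m in zip(r_e, r_m)) / n
    var = sum((m - mean_m) ** 2 for m in r_m) / n
    return cov / var

goog  = [100.0, 102.0, 101.0, 105.0]     # illustrative stock prices
sp500 = [1000.0, 1010.0, 1005.0, 1025.0] # illustrative index levels
print(round(beta(goog, sp500), 3))
```

By construction, the market's beta against itself is 1, and a stock whose returns are exactly twice the market's has a beta of 2.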
After that, I'll recreate the graph with the new colors and the calculated betas included in the legend. The colors will be chosen to provide good contrast and clear visibility.
[Code generated and executed]
Here are the beta values for the stocks:
Google (GOOG): 1.1317
Meta (META): 1.2537
Microsoft (MSFT): 1.2108
These values indicate that all three stocks are generally more volatile than the market (S&P 500), since their beta values are greater than 1.
Now let's update the graph with the new title, colors, and the beta values in the legend. [Code generated and executed]
Here is the updated graph with the new title, colors, and beta values included in the legend.
The graph represents the one-decade performance of the three big tech companies: Google (GOOG), Meta (META), and Microsoft (MSFT), as well as the S&P 500 index (GSPC), all normalized to start at 100. The beta values for the stocks, which measure their volatility in relation to the overall market, are included in the legend.
[Code generated and executed]
Here is the updated graph with the requested changes:
The title, legend, axis labels, and ticks have all been doubled in size.
The "GSPC" label in the legend has been changed to "S&P 500".
This graph shows the one-decade performance of the three big tech companies: Google (GOOG), Meta (META), and Microsoft (MSFT), as well as the S&P 500 index, all normalized to start at 100. The beta values for the stocks are included in the legend.
[Code generated and executed]
Here is a pie chart that shows the current value of a $100 investment made in each of the three big tech stocks (Google, Meta, and Microsoft) at the start of the period represented in the data.
The pie chart is divided into three sections, each representing one of the stocks. The size of each section corresponds to the current value of the initial $100 investment in that stock, expressed as a percentage of the total value of all three investments. The percentages are also labeled on the chart for clarity.
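The arithmetic behind the pie chart can be sketched as follows; the first and last prices are made up for illustration.

```python
# Current value of a $100 investment in each stock, expressed as a share
# of the combined value of all three investments (the pie-chart weights).
first = {"GOOG": 50.0, "META": 40.0, "MSFT": 25.0}   # illustrative start prices
last  = {"GOOG": 150.0, "META": 60.0, "MSFT": 100.0} # illustrative end prices

value = {t: 100 * last[t] / first[t] for t in first}  # value of $100 today
total = sum(value.values())
weights = {t: 100 * v / total for t, v in value.items()}  # percentage shares
print(weights)
```

The weights sum to 100 percent by construction, which matches the section sizes in the chart.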
Extracting data from text
\begin{tabular}{lcc}
Mark & A & B+ \\
Sally & A- & A- \\
Frank & B & C \\
\end{tabular}
The LLM correctly processes the meaning of the expression
"both in econ and math'' for Sally and extracts the grades accordingly.
Other applications of extracting data from text include numerical
data, e.g., stock prices from news articles or dosage
information from drug databases. When I prompted the LLM with "Can
you provide examples of what kinds of numbers you can extract from
text?'' it listed the following 10 types: phone numbers, zip codes,
social security numbers, credit card numbers, bank account numbers,
dates, times, prices, percentages, and measurements (length, weight, etc.).
The process can be automated for large quantities of data using API
access, and can typically be performed with smaller and cheaper models
than GPT-4. Dunn et al. (2022)Dunn, A., Dagdelen, J., Walker, N., Lee, S., Rosen, A. S., Ceder, G., Persson, K., and Jain, A. (2022). Structured information extraction from complex scientific text with fine-tuned large language models. arXiv:2212.05238. show how to use LLMs for structured
information extraction tasks from scientific texts. This can also
be used in economics, for example, for entity recognition in economic
history research.
\begin{tabular}{lcc}
\hline
Name & econ grade & math grade \\
\hline
Mark & A & B+ \\
Sally & A- & A- \\
Frank & B & C \\
\hline
\end{tabular}
GPT-4 excelled at these and other similar tasks.
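When automating this kind of extraction over an API, a robust pattern is to ask the model for a machine-readable format and parse the reply. The prompt wording and the sample reply below are illustrative.

```python
# Sketch of structured extraction at scale: ask the model to answer in
# JSON, then parse its reply into Python records.
import json

PROMPT = ("Extract each student's econ and math grade from the text below. "
          "Reply with a JSON list of objects with keys name, econ, math.\n\n")

def parse_grades(reply):
    """Parse the model's JSON reply into a list of grade records."""
    return json.loads(reply)

# A reply of the kind the model returns for the report-card example:
sample_reply = '[{"name": "Sally", "econ": "A-", "math": "A-"}]'
records = parse_grades(sample_reply)
print(records[0]["econ"])  # A-
```

Requesting JSON keeps the downstream pipeline simple: each reply becomes a list of records that can be appended to a dataset.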
Classifying and scoring text
Social science research frequently employs statistical techniques
to represent text as data (Gentzkow et al., 2019)Gentzkow, M., Kelly, B. T., and Taddy, M. (2019). Text as data. Journal of Economic Literature, 57(3):535-74.. Modern LLMs can
go beyond traditional techniques for this because they are increasingly
capable of processing the meaning of the sentences that they are fed.
The following example asks GPT-4 to classify whether a given task
listed in the US Department of Labor's Occupational Information Network
(O*NET) database is easy or hard to automate and to justify its
classification.*
*Eloundou et al. (2023)Eloundou, T., Manning, S., Mishkin, P., and Rock, D. (2023). GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv:2303.10130. employ GPT-4 in this manner to systematically estimate the labor market impact of LLMs.
Following the principle of chain-of-thought prompting suggested by
Wei et al. (2022b)Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., and Zhou, D. (2022b). Chain-of-thought prompting elicits reasoning in large language models. arXiv:2201.11903., the prompt asks first for the justification in
order to induce the LLM to reason about its response before performing
the actual classification. This is akin to asking a student to think
before they respond to a question.
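The justification-first structure of the prompt, and the parsing of the model's final label, can be sketched as follows. The template wording and the sample reply are illustrative, not the exact prompt used in the paper.

```python
# Sketch of chain-of-thought classification: the prompt asks for a
# justification before the label, and the parser pulls out the label.
def build_prompt(task):
    """Build a justification-first classification prompt for an O*NET task."""
    return (f"Task: {task}\n"
            "First give a one-sentence justification, then on a new line "
            "write 'Classification: easy' or 'Classification: hard'.")

def parse_classification(reply):
    """Extract the final easy/hard label from the model's reply."""
    for line in reply.splitlines():
        if line.lower().startswith("classification:"):
            return line.split(":", 1)[1].strip().lower()
    return None

reply = ("This task is highly routine and rule-based.\n"
         "Classification: easy")
print(parse_classification(reply))  # easy
```

Putting the justification first matters: if the label came first, the model could not condition its answer on its own reasoning.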
The results are reasonable, although I should emphasize
that they are not entirely robust. Just like a human evaluator who
may make different calls depending on seemingly random circumstances,
the model's answers to what is essentially the same question change
when the wording of the prompt is changed — and in some cases even
when the order of the listed tasks is modified.
Extracting sentiment
The statement is hawkish. The FOMC is indicating a concern about
elevated inflation and is taking action by raising the target range
for the federal funds rate and reducing its holdings of Treasury securities
and agency debt. The statement also suggests that further increases
in the target range may be appropriate in the future to return inflation
to the 2 percent objective. This indicates a tightening of monetary
policy, which is a characteristic of a hawkish stance.
The assessment is correct and well-argued.
I also explored whether the LLM could identify whether the
December 2022 or February 2023 FOMC statement was more hawkish, but
its ability to assess Fed-speak was not quite nuanced enough
— it focused mainly on the level of interest rates in February 2023
being higher as opposed to the small and nuanced changes in the text
of the statement that indicated a potential change in direction. It
did so even when I explicitly instructed it to report its assessment
while "disregarding the target level for the federal funds rate.''
Only when I manually replaced the numbers for the target level by
"[range]'' did the system correctly replicate the assessment
that the February 2023 statement was slightly more dovish, as was
widely reported in the financial press at the time.*
*See, for example, \url{https://www.cnbc.com/2023/02/01/live-updates-fed-rate-hike-february.html}
Ardekani et al. (2023)Ardekani, A. M., Bertz, J., Dowling, M. M., and Long, S. (2023). EconSentGPT: a universal economic sentiment engine? SSRN Working Paper. develop an economic sentiment prediction
model along similar lines and employ it to analyze US economic news
and the ECB's monetary policy announcements.
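The manual masking step described above is easy to automate with a regular expression. The pattern and the sample sentence below are illustrative; a production version would need to cover all the range formats that appear in FOMC statements.

```python
# Sketch of masking the fed funds target range with "[range]" so the
# model compares the wording of two statements rather than rate levels.
import re

def mask_target_range(text):
    """Replace a range like '4-1/2 to 4-3/4 percent' with '[range]'."""
    return re.sub(r"\d+(?:-\d+/\d+)?\s+to\s+\d+(?:-\d+/\d+)?\s+percent",
                  "[range]", text)

statement = ("The Committee decided to raise the target range for the "
             "federal funds rate to 4-1/2 to 4-3/4 percent.")
print(mask_target_range(statement))
```

After masking, two statements that differ only in their rate levels become textually identical, so any remaining differences the model flags reflect changes in language.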
Simulating human subjects
Argyle et al. (2022)Argyle, L. P., Busby, E. C., Fulda, N., Gubler, J., Rytting, C., and Wingate, D. (2022). Out of one, many: Using language models to simulate human samples. arXiv:2209.06899. propose the use of LLMs to simulate human
subjects, based on the observation that the training data of LLMs
contains a large amount of information about humanity.
They condition GPT-3 on the socio-demographic backstories of real
humans and demonstrate that subsequent answers to survey questions
are highly correlated with the actual responses of humans with the
described backgrounds, in a nuanced and multifaceted manner. Horton (2022)Horton, J. J. (2022). Large language models as simulated economic agents: What can we learn from homo silicus? NBER Working Paper 31122.
showcases applications to economics, using simulated test subjects
to replicate and extend upon several behavioral experiments.
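The conditioning step in this approach amounts to prefixing each survey question with a socio-demographic backstory. The sketch below is in the spirit of Argyle et al. (2022); the profile fields and wording are illustrative, not theirs.

```python
# Sketch of backstory conditioning: a survey question is prefixed with a
# first-person socio-demographic backstory before being sent to the LLM.
def backstory_prompt(profile, question):
    """Build a backstory-conditioned survey prompt (illustrative fields)."""
    backstory = (f"I am a {profile['age']}-year-old {profile['occupation']} "
                 f"from {profile['state']}.")
    return f"{backstory}\nQuestion: {question}\nAnswer:"

prompt = backstory_prompt(
    {"age": 45, "occupation": "teacher", "state": "Ohio"},
    "How concerned are you about inflation?",
)
print(prompt)
```

Sampling the model's completions across many such profiles yields a simulated survey panel whose responses can be compared to those of real respondents.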
There is a significant risk that the simulated results simply
propagate false stereotypes, so they must be used with great
care. However, they also contain valuable information.
If used correctly, they can provide useful insights about our society,
from which all the data used to train the LLMs ultimately originate.
For experimental economists who prefer keeping to human subjects,
Charness et al. (2023)Charness, G., Jabarian, B., and List, J. A. (2023). Generation next: Experimentation with AI. Working Paper, University of Chicago. describe how LLMs can help to improve the
design and implementation of experiments.