Research starts with the process of ideation, i.e., generating,
developing, and selecting ideas. I start my exploration of LLMs with
use cases that involve ideation and feedback for two reasons. First,
starting with ideas follows the natural sequence of research. Second,
ideation and feedback showcase a new set of capabilities that starkly
distinguish LLMs from earlier applications of deep learning in research
— they display a form of creativity that had long been reserved
for humans. Ideation and feedback are areas where it pays off to use
the most advanced LLMs available. ...A model of idea generation
by Girotra et al. (2010) observes that creative performance depends
on (i) the quantity of ideas, (ii) the average quality of ideas, and
(iii) the variance, which determines how many exceptional ideas are
generated. Girotra et al. (2023) find that GPT-4 outperforms
MBA students at a top US business school on all three dimensions in
a contest to develop innovative new product ideas. As a result, they
argue that the bottleneck in ideation is increasingly shifting from
generating to evaluating ideas.
As we will see in the following, although the current capabilities
of cutting-edge LLMs in the areas of ideation and feedback are impressive,
they also have limitations. There are also broader potential pitfalls.
Whereas any researcher who uses LLMs for ideation and feedback will
naturally be careful about which points they use and which they reject
in any given use case, just as we do when we discuss ideas with colleagues,
there may be subtle downsides that materialize over time. Reliance
on LLM-generated ideas may lead individual researchers to depend more
on automation and practice less critical thinking of their own. Moreover,
if more and more economists rely on the same one or two cutting-edge
LLMs to generate ideas and obtain feedback, there is a risk that the
ideas economists work on will become increasingly homogeneous and
include fewer truly novel ideas. This risk of homogenization is also
discussed in Bommasani et al. (2021). Finally, when using GPT-4 for
brainstorming or feedback, it is important to keep in mind that its
training data has a cutoff in Fall 2021.
Brainstorming
Cutting-edge LLMs are quite useful for brainstorming (or,
perhaps more aptly, neural-net-storming) ideas and examples
related to a defined theme. Having been trained on a vast amount of
data that represents a cross-section of human knowledge, cutting-edge
LLMs have developed a broad representation of the world that includes
a fair bit of knowledge of economics.
However, at present, human experts still have an edge when it comes
to depth, and so LLMs are best suited for brainstorming in areas in
which one is not an expert.
The following prompt illustrates a simple example using GPT-4.
Throughout the remainder of this section, I will present all examples
generated by LLMs in boxes, with the prompt in bold in the header
and the LLM's generated response in the body of the box. Notice that
I added an instruction to limit the response to 10 words for each
point — otherwise the LLM produced a whole paragraph on each point,
which may be useful in general but would be too lengthy for our purposes
here:
1. Job loss due to automation in lower-skilled industries.
2. AI-driven wealth concentration in tech-savvy organizations.
3. Digital literacy gap leading to economic disparity.
4. Lack of universal access to AI technology.
5. AI-driven bias in hiring and selection processes.
6. Imbalance in job market due to AI specializations.
7. Data privacy issues affecting vulnerable populations.
9. Algorithms exacerbating social inequality through bias.
10. Inclusive AI product development lacking.
11. Higher prices due to AI-enhanced products/services.
12. AI-fueled gentrification in tech-centered cities.
13. Anticompetitive practices bolstered by AI applications.
14. Lack of labor rights for jobs displaced by AI.
15. Educational imbalance due to AI-learning tools.
16. AI in healthcare excluding lower socioeconomic groups.
17. Disproportionate influence of AI in political campaigns.
18. Undervaluing of human skills in favor of AI.
19. Biased AI systems perpetuating discrimination.
20. AI reinforcing societal hierarchies via data analysis.
The channels listed by the model are on-topic and largely
reasonable; they cover a wide range of what I could have thought of
myself, plus some more. However, some of the points are a bit repetitive.
A noteworthy aspect is the remarkable speed and volume of
the responses that LLMs generate in activities like brainstorming,
which creates its own distinct form of usefulness. Even if only
a single suggestion out of 20 in examples like this proves beneficial,
it may make our research significantly more productive.
Other brainstorming prompts that I found useful include the
following:...
I am an economist working on AI and inequality. Can you brainstorm
an outline on [insert topic]?
I am an economist working on AI and inequality. Can you brainstorm
5 potential paper topics and describe each in one sentence?
I am an economist working on an academic paper on [insert topic].
Can you brainstorm a research plan for me?
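For researchers who prefer to script such queries rather than type
them into a chat interface, the following is a minimal sketch of how
a brainstorming prompt along these lines could be sent to GPT-4 via
the OpenAI Python library. The prompt text, model name, and temperature
setting are illustrative assumptions, not the exact ones used for the
example above:

# Minimal sketch: sending a brainstorming prompt to GPT-4 via the
# OpenAI Python library (assumes OPENAI_API_KEY is set in the environment).
from openai import OpenAI

client = OpenAI()

prompt = (
    "I am an economist working on AI and inequality. "
    "Brainstorm 20 channels through which AI may increase inequality. "
    "Limit each point to 10 words."
)

response = client.chat.completions.create(
    model="gpt-4",  # illustrative; any capable chat model works
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,  # a higher temperature encourages more varied ideas
)

print(response.choices[0].message.content)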
Feedback
LLMs can also evaluate ideas, highlighting, for example, the
pros and cons of different hypotheses or research plans. The following
example asks the LLM to list the pros and cons of working on a specific
area of research. The response illustrates that LLMs can provide useful
input when weighing different research directions.
Pros of Working on the Economics of AGI:
1. High Impact: The development of AGI (Artificial General Intelligence)
has the potential to revolutionize many sectors of the economy, from
healthcare to finance to transportation. Working on the economics
of AGI allows you to be at the forefront of this transformation and
potentially shape its direction.
2. Interdisciplinary Approach: The economics of AGI involves a combination
of economics, computer science, and philosophy, among other disciplines.
This interdisciplinary approach can be intellectually stimulating
and rewarding.
3. Potential for Innovation: Given that AGI is a relatively new and
rapidly evolving field, there is a lot of room for innovative thinking
and novel approaches. This can be exciting and fulfilling for those
who enjoy problem-solving and creative thinking.
Cons of Working on the Economics of AGI:
1. Uncertainty: The future of AGI is highly uncertain, making it difficult
to make accurate predictions or develop robust economic models. This
uncertainty can make the work challenging and sometimes frustrating.
2. Ethical and Societal Implications: The development and deployment
of AGI raise a host of ethical and societal issues, from job displacement
to privacy concerns to potential misuse. Working on the economics
of AGI means grappling with these complex and often contentious issues.
3. Limited Data: Given that AGI is still in its early stages, there
is limited data available for economic analysis. This lack of data
can make it difficult to test hypotheses or draw definitive conclusions.
Speaking from experience, I very much agree with this
assessment.
Another example of a useful prompt for eliciting feedback
is:
I am an economist working on an academic paper on [insert topic].
What are the main challenges in researching this topic? How can I
best address them?
Iteration
What is particularly useful is to iterate between brainstorming and
evaluation. Similar to how a researcher comes up with ideas, selects
the most promising ones, and refines them, LLMs can be prompted to
brainstorm, select the ideas that they rate as the most promising, and
brainstorm further on those.
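As a rough illustration of how this loop could be scripted (again
assuming the OpenAI Python library; the prompts and model name are
hypothetical), the key is to keep the conversation history in the
message list so that each follow-up request builds on the model's
earlier output:

# Minimal sketch of a brainstorm -> select -> refine loop. The running
# message list preserves the conversation so each step builds on the last.
from openai import OpenAI

client = OpenAI()
messages = []

def ask(prompt):
    # Append the user turn, query the model, and record its reply.
    messages.append({"role": "user", "content": prompt})
    reply = client.chat.completions.create(
        model="gpt-4", messages=messages
    ).choices[0].message.content
    messages.append({"role": "assistant", "content": reply})
    return reply

ideas = ask("I am an economist working on AI and inequality. "
            "Brainstorm 10 potential paper topics, one sentence each.")
best = ask("Which two of these do you rate as the most promising, and why?")
plan = ask("Brainstorm a research plan for the most promising one.")
print(plan)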
Feedback on entire paper drafts
The long context window of Claude 2 makes it possible to upload entire
research papers into the LLM and ask for feedback. I fed the Feb. 2023
NBER working paper version of this paper (Korinek, 2023) into
Claude 2 and asked it the following:
Here are some suggestions for improving
the paper on large language models and cognitive automation:
Provide more specific details and examples when describing the different
use cases. For some of them, the explanations are a bit vague. Adding
more concrete illustrations would strengthen those sections.
Expand the discussion of the limitations and potential downsides of
over-relying on LLMs. The paper focuses mostly on the benefits, but
more balance discussing the risks would be helpful.
Strengthen the analysis of the long-term implications. The outlook
section is thought-provoking but remains speculative. Adding more
rigorous analysis to back up the predictions could make it more convincing.
Consider testing the use cases on multiple LLMs and comparing their
capabilities. The examples all seem to use a single LLM, but testing
different ones could reveal variation in strengths/weaknesses.
Provide guidance on how to properly evaluate LLM outputs before using
them. Since LLMs can make mistakes, some tips on double-checking the
results would be useful.
(additional suggestions omitted for space reasons)
In general, providing more specifics, analysis, examples, and structure
could take the paper from a promising start to a more polished and
convincing final product. Let me know if you would like me to elaborate
on any of these suggestions!
All of the listed bullet points are good suggestions,
and I have incorporated several of them in the paper during the revision
process.
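For readers who want to reproduce this workflow programmatically, the
following is a minimal sketch of how an entire draft could be passed
to Claude via Anthropic's Python library. The file name, model name,
and prompt are hypothetical, and the draft must fit within the model's
context window:

# Minimal sketch: requesting feedback on a full paper draft via
# Anthropic's Python library (assumes ANTHROPIC_API_KEY is set).
import anthropic

client = anthropic.Anthropic()

with open("paper_draft.txt") as f:
    paper_text = f.read()  # the full draft must fit in the context window

response = client.messages.create(
    model="claude-2.1",  # illustrative; any long-context Claude model works
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": paper_text
        + "\n\nWhat suggestions do you have for improving this paper?",
    }],
)

print(response.content[0].text)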
Since Claude 2 can hold the content of the entire paper in
its context window, it can offer comments on any part of it if requested.
The following are additional examples of useful prompts:...
What are the main strengths and weaknesses of this paper?
What are the main novel ideas in the paper that are not sufficiently
emphasized?
Can you identify any instances of bias in this paper?
How could I improve section [insert number]?
Can you draft a referee report for this paper for the Journal of Economic
Literature?
Here is a draft referee report for the paper "Large Language Models and Cognitive Automation for Economic Research":
To the editors of the Journal of Economic Literature:
I have completed my review of the paper "Large Language Models and Cognitive Automation for Economic Research" by Anton Korinek.
This paper provides a timely and useful overview of how large language models (LLMs) like ChatGPT can be leveraged to automate and enhance various research tasks for economists. The paper is well-written and clearly presented. I recommend it for publication after minor revisions.
Summary:
The paper categorizes potential applications of LLMs into six domains - ideation, writing, background research, coding, data analysis, and mathematical derivations. For each domain, the author provides general guidance on prompt engineering as well as specific examples to showcase the capabilities and limitations of LLMs. Based on experiments using the GPT-3 model, the author assigns a usefulness rating to each application ranging from experimental (inconsistent results requiring oversight) to highly useful (will likely save researcher time).
The paper concludes by predicting how LLMs may affect economic research productivity in the short and long run. In the short run, the author argues LLMs will act as useful assistants to automate micro-tasks, increasing researcher productivity. In the long run, the author speculates LLMs may produce superior economic models, potentially displacing human economists. The paper ends by proposing new research questions economists should explore regarding the impacts of cognitive automation.
Comments:
The paper succeeds in its aim of providing an accessible introduction and framework for economists to begin leveraging LLMs in their work. The catalog of potential applications is comprehensive and the examples are illuminating. Prompt engineering is crucially important to effectively using LLMs, and the author helpfully includes prompts for replicability.
I have only minor recommendations to further strengthen the paper:
Given the speed of developments in this area, mentioning the timing of experiments and capabilities of the specific LLM used would help frame the generalizability of the results.
The usefulness ratings are subjective. Surveying researchers who have used LLMs may produce more robust assessments.
Additional guidance on the factors that contribute to inconsistent vs. highly useful capabilities could better set expectations.
Exploring collaborative applications where LLMs complement humans would enrich the long-run outlook.
Overall, I believe this paper makes a valuable contribution in a rapidly evolving area. It will serve as a helpful guide to researchers looking to exploit LLMs to improve their productivity. I recommend publication after minor revisions.
Please let me know if I can provide any clarification or additional feedback.
Sincerely,
[Your name here]
The LLM offered several comments that resemble comments
made by the actual (presumably human) referees of the paper and wrote
that a revised version of this paper would be a good fit for the Journal
of Economic Literature. Interestingly, however, the comments in the
draft referee report are shallower than the comments in the previous
chat, which directly prompted Claude 2 for ways to improve the paper.
It is well known that Claude 2 is programmed to be friendly
and upbeat. To check whether the positive assessment in the previous
chat simply reflected a positivity bias, I also asked Claude 2 whether
the paper would be suitable for the American Economic Review:
Based on my review, I do not think this paper would be suitable for publication in the American Economic Review in its current form. Here is my assessment:
The American Economic Review is one of the most prestigious and selective economics journals, publishing theoretical and empirical research that makes significant advancements in economic analysis and knowledge.
While this paper provides a useful and timely overview of how economists can apply large language models in their research, it is more expository and demonstrative rather than presenting novel research or analysis. As such, it may not rise to the level of contributing major new economic insights demanded by AER.
Some factors to consider regarding AER suitability:
The paper catalogs potential LLM applications but does not rigorously evaluate their impact on economic research or productivity. An empirical analysis demonstrating and quantifying the benefits would strengthen it.
There is limited theoretical or conceptual economic analysis. Expanding on implications for economic theory and modeling could improve fit for AER.
The mathematical derivations section is quite limited in coverage. More technical treatment would be expected for AER.
Long-run predictions about LLMs surpassing human abilities are interesting but speculative. Linking these to formal economic growth models would make them more rigorous.
With additional empirical analysis, technical depth, and formal economic modeling, this line of research could potentially yield papers suitable for AER. However, in its current form as an applications overview, the paper may not satisfy AER's standards for novelty and rigor. Journals focused on applications like the Journal of Economic Perspectives may be better fits.
I hope these comments are useful. Please let me know if you would like me to expand on my assessment of the paper's suitability for AER. I'm happy to provide additional feedback.
Claude 2 argued that the paper would not be suitable
since "it is more expository and demonstrative rather than presenting
novel research or analysis." It continued: "As such, it may not
rise to the level of contributing major new economic insights demanded
by AER." This (correct) assessment suggests that the model already
possesses a rudimentary capacity to judge what makes for suitable
research papers in different journals.
Whereas Claude 2 is able to provide reasonable feedback on
a qualitative paper like this one, the current generation of LLMs
struggles to evaluate more analytic or quantitative papers in an
insightful manner.
Providing counterarguments
No matter what point we are arguing, there are always counterarguments.
LLMs do not care about which side of an argument they are on; they
are just as good at providing arguments in favor of a given point
as they are at providing the counterarguments. ...They are also
unconcerned about hurting our ego when we ask them for a critique.
This may help counteract the confirmation bias common to
our human brains. The following is an example (for space reasons,
asking for short responses):
1. Cognitive automation could lead to job creation in new sectors,
offsetting job losses in others.
2. It may increase productivity and economic growth, leading to overall
wealth increase.
3. Automation could potentially lower the cost of goods and services,
benefiting all consumers.
4. It could provide opportunities for upskilling and reskilling, leading
to wage growth.
5. Cognitive automation might reduce human error, increasing efficiency
and potentially leading to more equitable outcomes.
The counterarguments make sense, although I would point
out that, in contrast to most laypeople, economists would view
points 2 and 3 as the same, since productivity is a real concept
that already adjusts for cheaper goods and services.