Ideation and Feedback

Research starts with the process of ideation, i.e., generating, developing, and selecting ideas. I start my exploration of LLMs with use cases that involve ideation and feedback for two reasons. First, starting with ideas follows the natural sequence of research. Second, ideation and feedback showcase a new set of capabilities that starkly distinguish LLMs from earlier applications of deep learning in research — they display a form of creativity that had long been reserved for humans. Ideation and feedback are areas where it pays off to use the most advanced LLMs available. A model of idea generation by Girotra et al. (2010) observes that creative performance depends on (i) the quantity of ideas, (ii) the average quality of ideas, and (iii) the variance, which determines how many exceptional ideas are generated. Girotra et al. (2023) find that GPT-4 outperforms MBA students at a top US business school on all three dimensions in a contest to develop innovative new product ideas. As a result, they argue that the bottleneck in ideation is increasingly shifting from generating to evaluating ideas.

As we will see in the following, although the current capabilities of cutting-edge LLMs in the areas of ideation and feedback are impressive, they also have limitations. There are also broader potential pitfalls. Whereas any researcher who uses LLMs for ideation and feedback will naturally be careful about which points they accept and which they reject in any given use case — just as we do when we discuss ideas with colleagues — there may be subtle downsides that materialize over time. Relying on LLM-generated ideas may lead individual researchers to depend more on automation and practice less critical thinking of their own. Moreover, if more and more economists rely on the same one or two cutting-edge LLMs to generate ideas and obtain feedback, there is a risk that the ideas economists work on will become increasingly homogeneous and include fewer truly novel ideas. This risk of homogenization is also discussed in Bommasani et al. (2021). Finally, when using GPT-4 for brainstorming or feedback, it is important to keep in mind that its training data cuts off in Fall 2021.

Brainstorming

Cutting-edge LLMs are quite useful for brainstorming (or, perhaps more aptly, neural-net-storming) ideas and examples related to a defined theme. Having been trained on a vast amount of data that represents a cross-section of all human knowledge, cutting-edge LLMs have developed a broad representation of the world that includes a fair bit of knowledge of economics. However, at present, human experts still have an edge when it comes to depth, and so LLMs are best suited for brainstorming in areas in which one is not an expert.
The following prompt illustrates a simple example using GPT-4. Throughout the remainder of this section, I will present all examples generated by LLMs in boxes, with the prompt in bold in the header and the LLM's generated response in the body of the box. Notice that I added an instruction to limit the response to 10 words for each point — otherwise the LLM produced a whole paragraph on each point, which may be useful in general but would be too lengthy for our purposes here:

[Box: brainstorming prompt and GPT-4's list of 20 suggestions]

A noteworthy aspect to underscore is the remarkable speed and volume of responses that LLMs generate in activities like brainstorming, which is useful in its own right. Even if only a single suggestion out of 20 in examples like this proves beneficial, it may make our research significantly more productive.
Other brainstorming prompts that I found useful include the following:
  • I am an economist working on AI and inequality. Can you brainstorm an outline on [insert topic]?
  • I am an economist working on AI and inequality. Can you brainstorm 5 potential paper topics and describe each in one sentence?
  • I am an economist working on an academic paper on [insert topic]. Can you brainstorm a research plan for me?
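For readers who prefer to script such prompts rather than type them into a chat window, the following is a minimal sketch using the OpenAI Python SDK; the model name, temperature, and word limit are illustrative choices rather than settings used in this paper:

```python
# A minimal sketch of automating a brainstorming prompt via the OpenAI
# Python SDK (v1.x). Model name and parameters are illustrative; the
# examples in this section were run interactively in the chat interface.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = (
    "I am an economist working on AI and inequality. "
    "Can you brainstorm 5 potential paper topics and describe "
    "each in one sentence? Limit each description to 10 words."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=1.0,  # higher temperature encourages more varied ideas
)

print(response.choices[0].message.content)
```

Raising the temperature can trade some average quality for more variance across ideas, which, in the framework of Girotra et al. (2010), is what increases the chance of generating an exceptional idea.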

Feedback

LLMs can also evaluate ideas, highlighting, for example, the pros and cons of different hypotheses or research plans. The following example asks the LLM to list the pros and cons of working on a specific area of research, illustrating that LLMs can provide useful input on the choice between research directions.

[Box: prompt on the pros and cons of a research area and the LLM's response]

Another example of a useful prompt for eliciting feedback is:

  • I am an economist working on an academic paper on [insert topic]. What are the main challenges in researching this topic? How can I best address them?
Iteration It is particularly useful to iterate between brainstorming and evaluation. Similar to how a researcher comes up with ideas, selects the most promising ones, and refines them, an LLM can be prompted to brainstorm, select the ideas it rates as most promising, and brainstorm further on those, as in the sketch below.
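The following is a minimal sketch of such a loop, again using the OpenAI Python SDK; the prompts, the model name, and the three-step structure are illustrative assumptions rather than the exact procedure used in this paper:

```python
# A sketch of the brainstorm-select-refine loop described above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send a single prompt to the model and return its reply."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: generate a broad list of ideas.
ideas = ask("Brainstorm 20 research ideas on AI and inequality, one line each.")

# Step 2: let the model evaluate its own output and select the best ideas.
best = ask(f"Here are 20 research ideas:\n{ideas}\n\n"
           "Which three do you rate as most promising, and why?")

# Step 3: brainstorm further on the selected ideas.
plans = ask(f"Take these three ideas:\n{best}\n\n"
            "Sketch a short research plan for each.")
print(plans)
```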
Feedback on entire paper drafts The long context window of Claude 2 makes it possible to upload entire research papers into the LLM and ask for feedback. I fed the February 2023 NBER working paper version of this paper (Korinek, 2023) into Claude 2 and asked it the following:

[Box: feedback prompt to Claude 2 on the uploaded paper and its response]

Since Claude 2 can hold the content of the entire paper in its memory, it can offer comments on any parts of it if requested. The following are additional examples of useful prompts:
  • What are the main strengths and weaknesses of this paper?
  • What are the main novel ideas in the paper that are not sufficiently emphasized?
  • Can you identify any instances of bias in this paper?
  • How could I improve section [insert number]?
  • Can you draft a referee report for this paper for the Journal of Economic Literature?
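All of these prompts can also be scripted against the API rather than typed into the chat interface. The following is a minimal sketch using the Anthropic Python SDK's Messages API; the model name, the token limit, and the plain-text file holding the draft are all illustrative assumptions:

```python
# A sketch of requesting feedback on a full paper draft via the
# Anthropic Python SDK. The model "claude-2.1" and the file
# "paper_draft.txt" (a plain-text export of the draft) are assumptions;
# the examples in this section were run in Claude 2's chat interface.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("paper_draft.txt") as f:
    paper = f.read()  # the draft must fit within the model's context window

prompt = (
    f"Here is the draft of an academic paper:\n\n{paper}\n\n"
    "What are the main strengths and weaknesses of this paper?"
)

message = client.messages.create(
    model="claude-2.1",
    max_tokens=1500,
    messages=[{"role": "user", "content": prompt}],
)

print(message.content[0].text)
```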
The capability unlocked by the last of these prompts, drafting referee reports, is likely to revolutionize editing and refereeing, for better or worse. To provide an example, I asked Claude 2 to draft a referee report on the same working paper (Korinek, 2023):

[Box: Claude 2's draft referee report]

It is well known that Claude 2 is trained to be friendly and upbeat. To check whether the positive assessment in the previous chat simply reflected a positivity bias, I also asked Claude 2 whether the paper would be suitable for the American Economic Review:

[Box: Claude 2's assessment of the paper's suitability for the American Economic Review]

Whereas Claude 2 is able to provide reasonable feedback on a qualitative paper like this one, the current generation of LLMs struggles to evaluate more analytic or quantitative papers in an insightful manner.

Providing counterarguments

No matter what point we are arguing, there are always counterarguments. LLMs do not care about which side of an argument they are on — they are just as good at providing arguments in favor of a given point as they are for the counterarguments. They are also unconcerned about hurting our ego when we ask them for a critique. This may be helpful to counteract the confirmation bias common to our human brains. The following is an example (for space reasons, asking for short responses):

[Box: prompt for arguments and counterarguments and the LLM's response]

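A request like this is also easy to script. Below is a minimal sketch that asks for both sides of an illustrative claim; the claim, the prompts, and the model name are hypothetical:

```python
# A sketch of eliciting arguments on both sides of a claim, as
# described above. All specifics here are illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

claim = ("A universal basic income is the best response "
         "to AI-driven job displacement.")

for side in ("in favor of", "against"):
    prompt = (f"In three short bullet points, give the strongest arguments "
              f"{side} the following claim: {claim}")
    reply = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    print(f"Arguments {side} the claim:\n{reply}\n")
```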

From: Generative AI for Economic Research: Use Cases and Implications for Economists
by Anton Korinek, Journal of Economic Literature, Vol. 61, No. 4, December 2023.
Copyright (c) by American Economic Association. Reproduced with permission.