Overview of Commonly Used LLMs

Product Model Company Released Tokens Data Cutoff URL
ChatGPT GPT-3.5 OpenAI 11/2022 4k 9/2021 Link
New Bing GPT-4 Microsoft 1/2023 4k web access Link
ChatGPT+ GPT-4 OpenAI 1/2023 4k 9/2021 Link
Bard PaLM-2 Google 3/2023 2k 7/2023 Link
Claude Claude 2 Anthropic 7/2023 100k early 2023 Link
LlaMA LlaMA 2 Meta 7/2023 4k early 2023 open source
Gemini Gemini Google 2/2024 ~30K early 2023 Link
Gemini Ultra Gemini Google 2/2024 ~30K early 2023 Link

Table last updated February 11th, 2024

The Table provides an overview of commonly used LLMs as of September 2023, together with some of their key properties and limitations, including their release date, the maximum token limit that they can process, and the date as of which the training data cut off. It also lists the URLs at which chatbots powered by these LLMs can be accessed.

A website that provides occasional users with a user-friendly interface with access to all leading LLMs is https://poe.com.
Plugins The capabilities of base LLMs can be significantly enhanced with plugins that allow them to perform additional tasks that LLMs by themselves are not good at. For economists, the plugin that is perhaps most noteworthy at the time of writing is ChatGPT's Advanced Data Analysis, which is available to ChatGPT Plus subscribers. The plugin allows ChatGPT to write and execute computer code in a sandboxed environment and to display the results as well as to build and iterate on them. Advanced Data Analysis also allows users to upload files and perform data processing tasks on them, ranging from complex analysis like regressions to file conversions. We will cover several of these capabilities below. Google Bard also runs code in the background to perform certain mathematical tasks.

Another ChatGPT plugin that is useful for economists is Wolfram's Alpha, which can be activated in the plugin store that is available to ChatGPT Plus subscribers. The site https://www.wolfram.com/wolfram-plugin-chatgpt/ describes a range of examples for how to use this plugin.
Vision-Language Models (VLMs) combine LLMs with the ability to process visual information and integrate the two. A version of GPT-4, which is not publicly available at the time of writing, can incorporate visual information in its prompts. Bard can display images from Google Search in its responses. This is an area with a lot of potential for future use cases. For example, early demonstrations suggest that VLMs are able to produce complex outputs based on hand-drawn back-of-the-envelope drafts.
Reproducibility Most of the applications in the remainder of this section use the leading publicly available LLM at the time of writing, OpenAI's GPT-4, version gpt4-0613. In the online materials associated with this article (see footnote on the frontpage of the article), I provide python code to reproduce the results by calling OpenAI's API. The code sets the parameter "Temperature'' to zero, which makes the LLM responses close to deterministic. For non-programmers, a user-friendly way to replicate the results is the OpenAI web interface https://platform.openai.com/playground, in which "Temperature'' can also be set to zero. Both the OpenAI API and the Playground require a paid subscription to access GPT-4.*Executing all of the examples labeled GPT3.5/GPT-4
below required a bit over 5k of input
and 5k of output tokens each. At
the time of writing, the total cost was
slightly below 50 cents. Further pricing information is
available at https://openai.com/pricing

There are two factors that limit the reproducibility of my results. First, OpenAI states that "setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability will remain.'' I have observed these limits to reproducibility in particular for examples with responses that span multiple sentences.*See \urlhttps://platform.openai.com/docs/guides/gpt/why-are-model-outputs-inconsistent for further information on the inconsistency of model output, even at temperature zero, and \urlhttps://community.openai.com/t/a-question-on-determinism/8185 for a discussion of the inherent indeterminacy of efficiently performing LLM inference. In a nutshell, the efficient execution of LLMs with hundreds of billions of parameters requires that calculations are parallelized. However, given the discrete nature of computers, calculations such as (a\\cdot b)\\cdot c sometimes deliver a slightly different result than a\\cdot(b\\cdot c). When an LLM calculate which word has the top probability to be next, minor differences in the parallelization of the exact same calculations sometimes come to matter, resulting in different word choices. And once one word changes, everything that follows becomes different.

Second, OpenAI states that "as we launch safer and more capable models, we regularly retire older models.'' Moreover, "after a new version is launched, older versions will typically be deprecated 3 months later.'' If the gpt4-0613 model is retired, my results may no longer be reproducible.*Moreover, see https://platform.openai.com/docs/deprecations on OpenAI's policy of model deprecations as well as the current timelines for how long existing models are guaranteed to remain available.

The most convenient user interface is ChatGPT, available at https://chat.openai.com/, which employs a "Temperature'' parameter greater than zero, which introduces more variation into the model's responses. Accessing GPT-4 via this interface requires a paid subscription to ChatGPT Plus. This allows users to try out the spirit of all the examples employing GPT-4 below, but the extra variability implies that the exact results will differ every time a prompt is executed. The same applies to ChatGPT Advanced Data Analysis and the Wolfram plugin, which both rely on ChatGPT, and to Claude 2, which offers the ability to upload files. My reproduction code therefore exlcudes the results of the latter three models.

From: Generative AI for Economic Research: Use Cases and Implications for Economists
by Anton Korinek, Journal of Economic Literature, Vol. 61, No. 4, December 2023.
Copyright (c) by American Economic Association. Reproduced with permission.