Chapter 17 Large Language Model APIs

Felix Rusche

This chapter provides an introduction to APIs for Large Language Models (LLMs). Given the wealth of models, I focus on text-based models only. I provide information on one commercial and one open-source ‘platform’: OpenAI and Ollama. Both offer a range of models to suit different use cases and budgets. While OpenAI’s (and other commercial providers’) models are (potentially) more capable and accurate, the open-source models provided on Ollama are free to use, offer additional features (such as uncensored versions), and can be run on local machines. Their results may also be easier to replicate, given that commercial providers tend to deprecate outdated models once new ones are made available.

In this short introduction, I show how to make API calls to LLMs via R. Overall, LLMs offer substantial efficiency gains in certain tasks (such as classifying text) and allow researchers to delve into new data sets or revisit old ones with new tools at hand. However, working with LLMs also poses risks, some of which are yet to be discovered. For example, LLMs may amplify stereotypes and are well known to ‘hallucinate’, i.e. to provide false information with great confidence. This requires researchers to thoroughly evaluate the quality of LLMs’ output.

Final disclaimer: this chapter solely provides a short introduction to the use of LLMs via API requests and does not provide an in-depth introduction to the fine-tuning of LLMs or prompts.

17.1 Prerequisites

17.1.1 Software and Registration

For OpenAI, the key requirement is a registration via their website, including the provision of a method of payment. Users can then generate an API key. As suggested in the Best Practices Chapter, it is recommended to store the key as an environment variable. To do so, type the following in the console:

usethis::edit_r_environ(scope = "user")

A document will open. Add a new line with the key and re-start R:

OPENAI_API_KEY=ENTER_KEY_HERE

The key can now be called using the Sys.getenv("OPENAI_API_KEY") command (see below). While not recommended, users may also replace this command with the actual key.
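Before sending any requests, it can help to confirm that R actually sees the key after the restart. The following is a minimal sketch using base R only; the helper name has_api_key is hypothetical and not part of any package.

```r
# Hypothetical helper: TRUE if the named environment variable is set and non-empty.
# Sys.getenv() returns an empty string for variables that are not set.
has_api_key <- function(var = "OPENAI_API_KEY") {
  nzchar(Sys.getenv(var))
}

if (!has_api_key()) {
  message("OPENAI_API_KEY is not set; edit .Renviron and restart R.")
}
```

Checking for the key this way avoids printing it to the console, which is preferable when code or output is shared.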

To use Ollama, users simply need to download, install, and run Ollama.

17.1.2 Choosing a Model

Depending on use case and budget, researchers can choose from a host of different models from both Ollama and OpenAI. These mainly differ in their power and accuracy. They may also differ in other dimensions, e.g. some models are built for more specific use cases.

Starting with OpenAI, the firm offers models of different quality and pricing. For some tasks (like simple classification tasks), cheaper models may be sufficient. For more complex ones, users may prefer to draw on more expensive and capable ones. It is generally advisable to test the quality of different models to determine which one is the best fit. Usage is billed according to the length of the input and output text, measured in tokens. OpenAI’s pricing page allows users to estimate costs.
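Because costs depend on the length of input and output, a rough estimate is simple arithmetic. The sketch below assumes billing per token with prices quoted per one million tokens. The function name, the token counts, and the prices of 5 and 15 are placeholders chosen purely for illustration; actual prices need to be taken from OpenAI’s pricing page.

```r
# Hypothetical helper: rough cost of n_requests identical-sized requests.
# Prices are assumed to be quoted per one million tokens.
estimate_cost <- function(n_requests, input_tokens, output_tokens,
                          price_in_per_m, price_out_per_m) {
  cost_per_request <- input_tokens / 1e6 * price_in_per_m +
    output_tokens / 1e6 * price_out_per_m
  n_requests * cost_per_request
}

# Example: 10,000 texts with roughly 500 input and 50 output tokens each,
# using placeholder prices (NOT actual OpenAI prices)
estimate_cost(10000, 500, 50, price_in_per_m = 5, price_out_per_m = 15)
# returns 32.5
```

Running the same calculation for a cheaper and a more expensive model makes the trade-off between budget and capability explicit before any request is sent.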

At Ollama, the use of models is free of charge. However, open source models currently remain less powerful than commercial models. On the website, users can choose from a wide range of different models. For this article, Llama3 is chosen, a capable open source model developed by Meta. After choosing the model, a version of the model may need to be selected. Usually, multiple versions of models with the same name are offered. These vary by use case and, more importantly, parameter size. For example, Llama3 comes in two sizes: 8 billion and 70 billion parameters. While a higher number of parameters translates into a more powerful model, it also requires substantially more (GPU) RAM and storage space. For instance, while the 8B version is likely to run on a (good) notebook or desktop computer, the 70B one likely requires an external server or high-performance computer.
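To get a feeling for why parameter count matters for hardware, one can approximate the memory needed for the model weights alone as the number of parameters times the bytes stored per parameter. The sketch below is a back-of-the-envelope calculation under that assumption: it ignores all other memory use during inference, and the precisions shown (16 and 4 bits per parameter) are illustrative choices rather than statements about how a particular Ollama model is stored.

```r
# Hypothetical helper: approximate size of the model weights in gigabytes
# (1 GB = 1e9 bytes), ignoring any runtime overhead
weights_gb <- function(n_params, bits_per_param) {
  n_params * bits_per_param / 8 / 1e9
}

weights_gb(8e9, 16)   # 8B parameters at 16 bits each: 16 GB
weights_gb(8e9, 4)    # 8B parameters at 4 bits each: 4 GB
weights_gb(70e9, 4)   # 70B parameters at 4 bits each: 35 GB
```

Even under the most compact assumption, the larger model needs several times the memory of the smaller one, which is consistent with the hardware guidance above.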

To install the 8B version of Llama3, users simply open their terminal/console and type:

ollama run llama3

This will download and start the model. It also enables users to chat with the model directly via the terminal. This window can be closed once the respective model has been downloaded. In order to call Ollama’s API, one then needs to start the previously installed application and run it in the background. This will create an active access point for API calls.
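Whether the access point is active can be checked from R before sending a prompt. The sketch below assumes Ollama listens on its default address, http://localhost:11434, and that its /api/tags endpoint (which lists locally installed models) answers with status 200 when the application is running. The helper name ollama_is_running is hypothetical.

```r
library(httr)

# Hypothetical helper: TRUE if an Ollama server answers at base_url.
# Connection errors (e.g. Ollama not started) are caught and reported as FALSE.
ollama_is_running <- function(base_url = "http://localhost:11434") {
  res <- tryCatch(
    GET(paste0(base_url, "/api/tags"), timeout(2)),
    error = function(e) NULL
  )
  !is.null(res) && status_code(res) == 200
}

if (!ollama_is_running()) {
  message("Ollama does not seem to be running; start the application first.")
}
```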

17.2 Simple API Call in R

To access the APIs and prepare their results in R, the following three packages are required:

library(httr)
library(jsonlite)
library(stringr)

Further, a prompt needs to be defined, e.g.

prompt <- "Briefly answer: What is the most unusual item that has ever been used as currency?"

17.2.1 OpenAI: GPT-4o

For OpenAI, I choose the model GPT-4o. Models and their respective names can be found via OpenAI’s website. A simple API call then looks like this:

response_OpenAI <- POST(
  url = "https://api.openai.com/v1/chat/completions", 
  add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))),
  content_type_json(),
  encode = "json",
  body = list(
    model = "gpt-4o", # choose model
    messages = list(list(role = "user", content = prompt)), # enter prompt to be sent to model
    temperature = 0 # choose "temperature"
  )
)

# here, the answer is extracted from the json file provided by the API
answer_OpenAI <- content(response_OpenAI)$choices[[1]]$message$content
answer_OpenAI <- str_trim(answer_OpenAI)

Print answer:

cat(answer_OpenAI)
## One of the most unusual items ever used as currency is the **rai stones** of Yap Island in Micronesia. These large, circular stone disks, some of which can be up to 12 feet in diameter, were used in various transactions, including dowries and political deals. Despite their size and weight, ownership of the stones, rather than physical possession, was often what mattered, making them a unique form of currency.
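If a request fails (for instance because of an invalid key or an exceeded rate limit), the parsed response contains no choices element, and the extraction step silently returns NULL. The following sketch wraps the extraction in a small helper that stops with the API’s own message instead. It assumes that failed requests return a JSON object with an error element containing a message field; the helper name extract_answer is hypothetical.

```r
# Hypothetical helper: extract the answer from a parsed chat completions
# response, or stop with the API's error message if the request failed
extract_answer <- function(parsed) {
  if (!is.null(parsed$error)) {
    stop("API error: ", parsed$error$message, call. = FALSE)
  }
  trimws(parsed$choices[[1]]$message$content)
}

# Usage with the response object created above (not run here):
# answer_OpenAI <- extract_answer(content(response_OpenAI))
```

Failing loudly is particularly useful when looping over many texts, where a silent NULL could otherwise go unnoticed in the results.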

17.2.1.1 Alternative: Using GPT-4o via an API Wrapper

An alternative to directly calling the API via httr is the use of an API wrapper, i.e. a package that simplifies the call further. For Python, OpenAI maintains its own wrapper. For R, the focus of this article, Rudnytskyi (2023) maintains a package. It is updated frequently, so please check the package’s website for changes. The package is applied as follows:

# if not yet installed, install the package 
remotes::install_github("irudnyts/openai", ref = "r6")

#-------
# load it
library(openai)

# load the API key. The package expects it to be stored as an 
# environment variable called OPENAI_API_KEY!! 
# Make sure it is stored this way (see Prerequisites above)
client <- OpenAI()

# send API request
completion <- client$chat$completions$create(
    model = "gpt-4o", # choose model
    messages = list(list(role = "user", content = prompt)), # enter prompt to be sent to model
    temperature = 0 # choose "temperature" (and potentially other settings)
)

# Extract answer from returned object
answer_OpenAI2 <- completion[["choices"]][[1]][["message"]][["content"]]

Print answer:

cat(answer_OpenAI2)
## Historically, some unusual items used as currency include large stones, shells, and even human teeth.

17.2.2 Ollama: Llama3

Similarly, a simple API call using Llama3 can be conducted as follows:

response_Llama <- POST(
  url =  "http://localhost:11434/api/generate", 
  body =
    list(
      model = "llama3", # choose model
      prompt = prompt, # enter prompt to be sent to model
      stream = FALSE,
      options = list(
        temperature = 0 # choose "temperature"
    )),
  encode = "json"
)

# here, the answer is extracted from the JSON returned by the API (one JSON
# object per line) and assembled into a single string
response_text <- content(response_Llama, "text")
json_strings <- strsplit(response_text, "\n")[[1]]
parsed_jsons <- lapply(json_strings, fromJSON)
responses <- sapply(parsed_jsons, function(x) x$response)
answer_Llama <- paste(responses, collapse = " ")

Print answer:

cat(answer_Llama)
## One of the most unusual items used as currency is whale vomit, also known as ambergris. In the 18th century, it was used as a form of currency in some Pacific Island cultures, particularly in Fiji and Tonga. Ambergris is a rare and valuable substance produced by sperm whales, and its unique properties made it highly sought after for use in perfumes and medicines. The value of ambergris was so great that it was even used to settle debts and buy land!

17.2.3 Some Parameter Choices and Settings

For better replicability, the temperature (usually defined between 0 and 2) of the above LLMs is set to 0. A lower temperature makes the model more likely to select the words with the highest probability. While some randomness remains, this increases the probability that results will replicate. It also decreases the creativity/diversity of responses and, hence, the probability of ‘hallucination’.
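The effect of temperature can be illustrated with a toy calculation. In the common formulation, a model’s raw scores for candidate words are divided by the temperature before being turned into probabilities via the softmax function. The sketch below assumes this formulation and uses three made-up scores; it is an illustration of the principle, not a description of how any specific provider implements sampling.

```r
# Softmax with temperature: converts raw scores into probabilities
softmax_t <- function(scores, temperature) {
  z <- scores / temperature
  z <- z - max(z)            # subtract the maximum for numerical stability
  exp(z) / sum(exp(z))
}

scores <- c(2.0, 1.0, 0.5)   # made-up scores for three candidate words

round(softmax_t(scores, 1.0), 2)   # top word receives about 0.63
round(softmax_t(scores, 0.2), 2)   # top word receives about 0.99
```

As the temperature approaches 0, almost all probability mass moves to the highest-scoring word. A temperature of exactly 0 cannot be plugged into this formula (it would divide by zero) and is typically treated as always picking the most likely word.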

Models also allow users to control a number of additional parameters, for example the maximum response length, the number of responses created, or (sometimes) a seed. To find out about a specific model’s parameters, it is advisable to consult its documentation.
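As an illustration, such parameters are added to the request body alongside the model and prompt. The sketch below uses the names max_tokens, n, and seed as I understand them from OpenAI’s Chat Completions documentation; names and availability can change between models and over time, so they should be verified against the current documentation before use. The values are arbitrary.

```r
prompt <- "Briefly answer: What is the most unusual item that has ever been used as currency?"

# Request body with additional (assumed) parameters; this list would be passed
# as the `body` argument of the POST() call
body_with_options <- list(
  model = "gpt-4o",
  messages = list(list(role = "user", content = prompt)),
  temperature = 0,
  max_tokens = 100,  # upper bound on the length of the response, in tokens
  n = 1,             # number of responses to generate
  seed = 123         # requests more reproducible sampling where supported
)
```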

Another setting that may be relevant is the role assigned to the model. Specifically, one can tell the OpenAI model to answer and behave in specific ways via the messages item. The model will then behave accordingly. For example, one of the most common roles assigned is that of the “helpful assistant”:

messages = list(list(role = "system", content = "You are a helpful assistant."),
                list(role = "user", content = prompt))

To use this setting, simply replace the messages item in the API request above.
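When switching between requests with and without a system role, a small helper can build the messages item in either form. This is a minimal sketch; the function name build_messages is hypothetical.

```r
# Hypothetical helper: build the `messages` item, optionally with a system role
build_messages <- function(prompt, system = NULL) {
  msgs <- list(list(role = "user", content = prompt))
  if (!is.null(system)) {
    # the system message is placed before the user message
    msgs <- c(list(list(role = "system", content = system)), msgs)
  }
  msgs
}

# Example: user prompt preceded by a system message
build_messages("What is a rai stone?", system = "You are a helpful assistant.")
```

The returned list can be passed directly as the messages argument in the calls shown earlier.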

The most important ‘setting’ is the prompt itself. Prompts can substantially affect the quality of answers. It is advisable to read guides on how to best prompt specific models and to test different versions of the ‘same’ prompt. Models also differ in how easy they are to prompt. For instance, it is argued that OpenAI models are easy to prompt, while Llama3, for example, can produce results of similar quality but requires more effort to get the prompt right.
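Testing several versions of a prompt can be organised as a simple loop that stores each prompt next to its answer. The sketch below is deliberately generic: ask_model stands for any function that takes a prompt and returns the answer as a single string (for example, a wrapper around one of the API calls above). Both compare_prompts and the stand-in fake_model are hypothetical names used for illustration only.

```r
# Hypothetical helper: send each prompt variant to a model and collect answers
compare_prompts <- function(prompts, ask_model) {
  answers <- vapply(prompts, ask_model, character(1), USE.NAMES = FALSE)
  data.frame(prompt = prompts, answer = answers, stringsAsFactors = FALSE)
}

# Stand-in for a real API call, so that the example runs without a key
fake_model <- function(p) paste("answer to:", p)

variants <- c(
  "Classify the text as positive or negative.",
  "Is the following text positive or negative? Answer with one word."
)
compare_prompts(variants, fake_model)
```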

Finally, OpenAI allows users to send batch requests. In theory, these should be particularly interesting to power users who aim to send many requests of the same kind on different texts. However, at the time of writing, batch requests only become worthwhile once users have made a considerable number of requests and moved up OpenAI’s user ladder. Specifically, to increase rate limits, users effectively have to spend money and time on the platform. They subsequently move up the user ladder from “free” to tier 1 and eventually tier 5, receiving higher rate limits along the way. Batch requests only really become worthwhile once users reach tier 3.
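For orientation, a batch is submitted as a file with one JSON request per line. The sketch below prepares such a file from a vector of texts. It reflects my understanding of the input format described in OpenAI’s Batch API documentation (fields custom_id, method, url, and body), which should be verified before use; the texts are made up, and uploading the file and creating the batch are not shown.

```r
library(jsonlite)

texts <- c("First text to classify.", "Second text to classify.")  # made-up inputs

# One JSON object per text; custom_id allows results to be matched to inputs
batch_lines <- vapply(seq_along(texts), function(i) {
  request <- list(
    custom_id = paste0("request-", i),
    method = "POST",
    url = "/v1/chat/completions",
    body = list(
      model = "gpt-4o",
      messages = list(list(role = "user", content = texts[i])),
      temperature = 0
    )
  )
  as.character(toJSON(request, auto_unbox = TRUE))
}, character(1))

# Write the requests to a temporary .jsonl file (one request per line)
batch_file <- tempfile(fileext = ".jsonl")
writeLines(batch_lines, batch_file)
```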

17.3 Social Science Examples

LLMs are a recent tool, and their applications in research are still being explored. One application is the use of LLMs as a cheap research assistant. LLMs can read and classify hundreds or even thousands of texts within minutes and at very low cost. For example, in Evsyukova, Rusche, and Mill (2023), my co-authors and I send responses received in an experiment to ChatGPT-4 in order to evaluate their usefulness and classify their content. In a different application, Djourelova et al. (2024) explore newspaper coverage following extreme weather events. Specifically, the authors send local newspaper articles on the event to an LLM, asking whether the respective article draws a causal connection between the event and climate change (among other questions). In both papers, the authors find that agreement between LLMs and human annotators is at a similar level to agreement between any two human annotators. Another potential pathway for social science is ‘random silicon sampling’ as suggested by Sun et al. (2024). Specifically, LLMs can be assigned specific demographic features and be asked to answer surveys or questions in ways that resemble this demographic group.
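Agreement between an LLM and human annotators can be quantified in a few lines of base R. The sketch below computes the share of identical labels and Cohen’s kappa, which corrects that share for agreement expected by chance. The labels are made up for illustration and do not come from any of the cited papers, and these two measures are common choices rather than necessarily the ones used there.

```r
# Made-up labels for six texts, one set from a human and one from an LLM
human <- c("useful", "useful", "not useful", "useful", "not useful", "not useful")
llm   <- c("useful", "not useful", "not useful", "useful", "not useful", "useful")

# Share of texts on which both annotators agree
percent_agreement <- mean(human == llm)

# Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)
cohens_kappa <- function(a, b) {
  lev <- union(a, b)
  tab <- table(factor(a, lev), factor(b, lev))
  p_obs <- sum(diag(tab)) / sum(tab)
  p_exp <- sum(rowSums(tab) * colSums(tab)) / sum(tab)^2
  (p_obs - p_exp) / (1 - p_exp)
}

percent_agreement          # 4 of 6 labels match
cohens_kappa(human, llm)   # one third, given expected agreement of 0.5
```

Computing the same measures between two human annotators provides the benchmark against which the LLM’s agreement can be judged.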

References

Djourelova, Milena, Ruben Durante, Elliot Motte, and Eleonora Patacchini. 2024. “Experience, Narratives, and Climate Change Beliefs.” Working Paper.
Evsyukova, Yulia, Felix Rusche, and Wladislaw Mill. 2023. “LinkedOut? A Field Experiment on Discrimination in Job Network Formation.” CRC TR 224 Discussion Paper Series No. 482. https://www.crctr224.de/research/discussion-papers/archive/dp482.
Rudnytskyi, Iegor. 2023. Openai: R Wrapper for OpenAI API. https://github.com/irudnyts/openai.
Sun, Seungjong, Eungu Lee, Dongyan Nan, Xiangying Zhao, Wonbyung Lee, Bernard J. Jansen, and Jang Hyun Kim. 2024. “Random Silicon Sampling: Simulating Human Sub-Population Opinion Using a Large Language Model Based on Group-Level Demographic Information.” https://arxiv.org/abs/2402.18144.