Chapter 1 Introduction

This project APIs for Social Scientists: A collaborative Review is an outcome of the seminar Computational Social Science (CSS) taught at the University of Mannheim in 2021. While teaching the seminar, we had trouble finding short reviews of APIs with quick R-code examples. Fortunately, almost everyone participating in the seminar was motivated enough to write a quick API review. Hopefully, our resource will help future students to start diving into different APIs.

Below we review different data- and service-APIs that may be useful to social scientists. The chapters always include a simple R code example as well as references to social science research that has relied on them. The idea is to provide short reviews of max. 10 pages for the corresponding API with code to get you started. Each chapter follows a systematic set of questions:

  • What data/service is provided by the API? (+ who provides it?)
  • What are the prerequisites to access the API (e.g., authentication)?
  • What does a simple API call look like?
  • How can we access the API from R (httr + other packages)?
  • Are there social science research examples using the API?

1.1 Prerequisites: Authentication

A lot of the APIs require that you authenticate with the API provider. The underlying script of this review is written in such a way that it contains R chunks for authentication, however they will not be visible in the examples below (we only show placeholders for you to recognize at which step you will need to authenticate). These chunks in most cases make use of so-called keys in JSON format (e.g., service account key for Google APIs). However cloning the corresponding repository of this review will not result in giving you the keys, hence in order to replicate our API calls, you will have to generate and use your own individual keys.

1.2 Prerequisites: Software & packages

The code examples rely on R and different packages thereof. It is probably easiest if you install all of them in one go using the code below. The p_load() function (pacman package) checks whether packages are installed. If not, they are installed and loaded.

## # install.packages('pacman')
## library(pacman)
## p_load('checkpoint', 'httr',
## 'memoise', 'googleway', 'httr', 'tidyverse', 'ckanr', 'jsonlite', 'readxl',
## 'curl', 'httr', 'devtools', 'RCrowdTangle', 'dplyr', 'jsonlite', 'httr',
## 'remotes', 'dplyr', 'ggplot2', 'tidyr', 'Radlibrary', 'dplyr', 'tidyr', 'DT',
## 'DemografixeR', 'jsonlite', 'httr', 'httr', 'dplyr', 'httr', 'googleLanguageR',
## 'tidyverse', 'tm', 'ggwordcloud', 'httr', 'googleway', 'ggplot2', 'tidyverse',
## 'mapsapi', 'stars', 'tidyverse', 'googleLanguageR', 'googleLanguageR', 'httr',
## 'gtrendsR', 'ggplot2', 'dplyr', 'httr', 'httr', 'httr', 'jsonlite', 'tibble',
## 'archiveRetriever', 'httr', 'jsonlite', 'stringr', 'openai', 'httr', 'stringr',
## 'mediacloud', 'tidytext', 'quanteda', 'quanteda', 'tidyverse', 'osmdata', 'sf',
## 'ggmap', 'osmdata', 'osmdata', 'osmdata', 'ggmap', 'kableExtra', 'spotifyr',
## 'tidyverse', 'lubridate', 'httr', 'academictwitteR', 'tidyverse', 'lubridate',
## 'tidyverse', 'lubridate', 'dplyr', 'rtweet', 'WikipediR', 'rvest', 'xml2',
## 'httr', 'jsonlite', 'here', 'dplyr', 'ggplot2', 'tuber', 'tidyverse', 'purrr')
p_load_gh("quanteda/quanteda.corpora")
p_load_gh("cbpuschmann/RCrowdTangle")
p_load_gh("joon-e/mediacloud")
p_load_gh("facebookresearch/Radlibrary")

1.3 Replication

A lot of the R packages that you have installed and loaded above regularly get updated, which often comes with slight changes in functionality. Also, new versions of the R programming environment itself are constantly developed. While we rely on an automatized way to regularly check whether all code that is presented in the subsequent chapters replicates without errors, it might be that you are executing code just before we find out about conflicts that come with newer versions of R or used packages. If you would like to ensure that all of the presented code runs smoothly, you can execute the following commands before proceeding.

p_load('checkpoint')
checkpoint("2022-08-03")

The checkpoint package from the Reproducible R Toolkit (RRT) makes use of the daily snapshots of CRAN which are mirrored by the RRT-team on a separate server. Once you execute the command above, R will install all packages that you have loaded with the p_load() function in the same version as we used them on the date at which we last worked on the code. You may choose to only take this route if you would like to replicate the code presented in a particular chapter but run into an error message. To reset your session to the state it was before using checkpoint, call uncheckpoint() or simply restart R.