Chapter 8 Google News API
With the News API (formerly Google News API), you can get article snippets and news headlines, both up to four years old and real-time, from over 80,000 news sources worldwide.
You will need to install the following packages for this chapter (run the code):
# install.packages('pacman')
library(pacman)
p_load('httr', 'dplyr')
8.1 Prerequisites
What are the prerequisites to access the API (authentication)?
You need an API key, which can be requested via https://newsapi.org/register.
One big drawback of the News API is that the free version (“Developer”) does not get you very far. It comes with some serious limitations: you can only get articles that are up to a month old, you are restricted to 100 requests per day, and the article content is truncated to the first 200 characters. In addition, the Developer key expires after a while even if you stick to those limits, although it is easy to sign up for a new Developer account (with a Gmail address or any other email address). The “Business” version costs $449 per month and allows searching for articles up to four years old as well as 250,000 requests per month. More details on pricing can be found here.
Up until at least 2019, the Business version also allowed you to get the entire news article. The documentation is not clear whether this is still the case.
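The code in this chapter reads the key from an environment variable called GoogleNews_token. A minimal sketch of a helper that fails early when that variable is not set could look as follows. The function name get_news_api_key is hypothetical, and setting the variable with Sys.setenv() is just one option among several.

```r
# Hypothetical helper (not part of httr or the News API):
# read the API key from an environment variable and stop early if it is missing.
get_news_api_key <- function(env_var = "GoogleNews_token") {
  key <- Sys.getenv(env_var, unset = "")
  if (!nzchar(key)) {
    stop("No API key found in environment variable '", env_var, "'.",
         call. = FALSE)
  }
  key
}

# For the current session only; replace the placeholder with your own key.
# Sys.setenv(GoogleNews_token = "<YOUR_API_KEY>")
# my_api_key <- get_news_api_key()
```

Keeping the key out of the script itself means the code can be shared without exposing the key.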
8.2 Simple API call
What does a simple API call look like?
The documentation of the API is available here. A couple of good examples can be found on the landing page of the API. For instance, we could get all articles mentioning Biden that were published within the last four weeks, sorted by recency, with the following call:
library(httr)
endpoint_url <- "https://newsapi.org/v2/everything?"
my_query <- "biden"
my_start_date <- Sys.Date() - 28
my_api_key <- Sys.getenv("GoogleNews_token") # <YOUR_API_KEY>
params <- list(
  "q" = my_query,
  "from" = my_start_date,
  "language" = "en",
  "sortBy" = "publishedAt")
news <- httr::GET(url = endpoint_url,
                  httr::add_headers(Authorization = my_api_key),
                  query = params)
httr::content(news) # the resulting articles[[1]]$content shows that the article content is truncated
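The parsed response is a nested list, which is awkward to inspect. Assuming it follows the structure described in the documentation (a list of articles, each with a nested source plus fields such as title, publishedAt and content), the following sketch flattens it into a data frame. The helper name articles_to_df and the selection of fields are illustrative choices, not part of the API.

```r
# Hypothetical helper: flatten the parsed `everything` response into a data frame.
# Fields that come back as NULL are turned into NA.
articles_to_df <- function(parsed) {
  or_na <- function(x) if (is.null(x)) NA_character_ else as.character(x)
  rows <- lapply(parsed$articles, function(a) {
    data.frame(
      source      = or_na(a$source$name),
      title       = or_na(a$title),
      publishedAt = or_na(a$publishedAt),
      content     = or_na(a$content),
      stringsAsFactors = FALSE
    )
  })
  if (length(rows) == 0) return(data.frame())
  do.call(rbind, rows)
}

# Usage with the response object from the call above (requires a valid key):
# httr::stop_for_status(news)        # raise an R error on HTTP failures
# news_df <- articles_to_df(httr::content(news))
# nchar(news_df$content)             # the lengths reveal the truncation
```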
8.3 API access in R
How can we access the API from R (httr + other packages)?
To date, there is no R package facilitating access, but the API structure is simple enough to rely on httr. The API has three main endpoints:
- https://newsapi.org/v2/everything?, documented at https://newsapi.org/docs/endpoints/everything
- https://newsapi.org/v2/top-headlines/sources?, documented at https://newsapi.org/docs/endpoints/sources
- https://newsapi.org/v2/top-headlines?, documented at https://newsapi.org/docs/endpoints/top-headlines
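Because the three endpoints share the same base URL and the same authentication header, the request pattern used in this chapter can be wrapped in one small function. This is only a sketch: news_api_url and news_api_get are hypothetical names, not functions provided by httr.

```r
# Hypothetical helpers wrapping the request pattern used in this chapter.
news_api_url <- function(endpoint = c("everything", "top-headlines",
                                      "top-headlines/sources")) {
  endpoint <- match.arg(endpoint)
  paste0("https://newsapi.org/v2/", endpoint)
}

news_api_get <- function(endpoint, params = list(),
                         api_key = Sys.getenv("GoogleNews_token")) {
  response <- httr::GET(url = news_api_url(endpoint),
                        httr::add_headers(Authorization = api_key),
                        query = params)
  httr::stop_for_status(response)  # stop with an R error on HTTP 4xx/5xx
  httr::content(response)          # parsed JSON as a nested list
}

# Usage (requires a valid key):
# german_sources <- news_api_get("top-headlines/sources", list(country = "de"))
```

Restricting the endpoint argument with match.arg() catches typos in the endpoint name before any request is sent.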
We have already explored the everything endpoint. Additional parameters to use are, for example, searchIn (specifying whether you want to search in the title, the description or the main text), to (specifying until what date to search) or pageSize (how many results to return per page).
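These parameters can be combined into the query list that is passed to httr::GET(). The sketch below uses a hypothetical helper, build_everything_params. Its checks assume, based on my reading of the documentation, that searchIn accepts title, description and content (comma-separated when combined) and that pageSize cannot exceed 100.

```r
# Hypothetical helper: assemble and validate query parameters for `everything`.
build_everything_params <- function(q, from = NULL, to = NULL,
                                    search_in = NULL, page_size = 100,
                                    page = 1) {
  allowed <- c("title", "description", "content")
  if (!is.null(search_in) && !all(search_in %in% allowed)) {
    stop("search_in must be a subset of: ", paste(allowed, collapse = ", "))
  }
  if (page_size < 1 || page_size > 100) {
    stop("page_size must be between 1 and 100")
  }
  params <- list(
    q        = q,
    from     = if (!is.null(from)) format(as.Date(from)),
    to       = if (!is.null(to)) format(as.Date(to)),
    searchIn = if (!is.null(search_in)) paste(search_in, collapse = ","),
    pageSize = page_size,
    page     = page
  )
  # Drop parameters that were not supplied
  params[!vapply(params, is.null, logical(1))]
}

example_params <- build_everything_params(
  q         = "biden",
  from      = Sys.Date() - 28,
  to        = Sys.Date() - 7,
  search_in = c("title", "description"),
  page_size = 20
)
str(example_params)
```

The resulting list can be passed as the query argument of httr::GET(), in the same way as params in the earlier call.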
Though perhaps not so interesting from a research perspective, the sources endpoint is useful because it allows you to explore the list of sources in each country (not really documented anywhere). Let’s get all sources from Germany. We can see that there are ten from which the News API draws content:
library(dplyr)
library(httr)
endpoint_url <- "https://newsapi.org/v2/top-headlines/sources?"
my_country <- "de"
my_api_key <- Sys.getenv("GoogleNews_token") # <YOUR_API_KEY>
params <- list("country" = my_country)
sources <- httr::GET(url = endpoint_url,
                     httr::add_headers(Authorization = my_api_key),
                     query = params)
sources_df <- bind_rows(httr::content(sources)$sources)
sources_df[, c("id", "name", "url", "category")]
# A tibble: 10 × 4
id name url category
<chr> <chr> <chr> <chr>
1 bild Bild http://www.bild.de general
2 der-tagesspiegel Der Tagesspiegel http://www.tagesspiegel.de general
3 die-zeit Die Zeit http://www.zeit.de/index business
4 focus Focus http://www.focus.de general
5 gruenderszene Gruenderszene http://www.gruenderszene.de technology
6 handelsblatt Handelsblatt http://www.handelsblatt.com business
7 spiegel-online Spiegel Online http://www.spiegel.de general
8 t3n T3n https://t3n.de technology
9 wired-de Wired.de https://www.wired.de technology
10 wirtschafts-woche Wirtschafts Woche http://www.wiwo.de business
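To get a quick overview of such a source list, the categories can be tabulated. The sketch below re-creates the id and category columns from the output above by hand, so that it runs without an API key. With a live response, you would use sources_df directly.

```r
# id and category columns copied from the printed output above
german_sources <- data.frame(
  id = c("bild", "der-tagesspiegel", "die-zeit", "focus", "gruenderszene",
         "handelsblatt", "spiegel-online", "t3n", "wired-de",
         "wirtschafts-woche"),
  category = c("general", "general", "business", "general", "technology",
               "business", "general", "technology", "technology", "business"),
  stringsAsFactors = FALSE
)

# Count the number of sources per category
category_counts <- table(german_sources$category)
category_counts
# business: 3, general: 4, technology: 3
```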
This list of only ten German sources illustrates another weakness of the News API: the selection of sources is neither comprehensive nor transparent. In any case, let’s use this information to try out the headlines endpoint, getting breaking headlines from Bild (via its id), with 5 results per page. Note that in the Developer version, these headlines are not really breaking, but are delayed by one hour.
endpoint_url <- "https://newsapi.org/v2/top-headlines?"
my_source <- "bild"
my_api_key <- Sys.getenv("GoogleNews_token") # <YOUR_API_KEY>
params <- list(
  "sources" = my_source,
  "pageSize" = 5)
headlines <- httr::GET(url = endpoint_url,
                       httr::add_headers(Authorization = my_api_key),
                       query = params)
# `source` is a nested list (id, name), which can yield two rows per article;
# lowercasing both values and calling unique() collapses them into one row
headlines_df <- bind_rows(httr::content(headlines)$articles) %>%
  mutate(source = tolower(source)) %>%
  unique()
headlines_df[, c("publishedAt", "title")]
# A tibble: 5 × 2
publishedAt title
<chr> <chr>
1 2022-02-16T15:52:26.3782532Z Köln: Erweiterte Anklage gegen Pfarrer Hans U. – 85 weitere Missbrauchsfälle!
2 2022-02-16T15:46:52Z Hai-Attacke in Australien: Schwimmer vor der Küste Sydneys zerfetzt
3 2022-02-16T15:37:24.2222797Z Essen: Polizei befreit Hundewelpen aus Kofferraum
4 2022-02-16T15:20:45Z BVB – Erling Haaland: Idol verrät brisante Interna vom Adidas-Geheimtreffen
5 2022-02-16T15:07:32Z Texas, USA: Raubopfer ballert um sich, um Dieb aufzuhalten – Mädchen (9) tot
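The publishedAt values come back as character strings and, as the output shows, some carry fractional seconds while others do not. A minimal sketch for converting them to date-times in base R follows, assuming that all timestamps are in UTC (as the trailing Z suggests). The helper name parse_published_at is hypothetical.

```r
# Hypothetical helper: convert publishedAt strings to POSIXct (UTC).
parse_published_at <- function(x) {
  # Drop optional fractional seconds, e.g. ".3782532", before the trailing "Z"
  cleaned <- sub("\\.[0-9]+Z$", "Z", x)
  as.POSIXct(cleaned, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
}

# Two timestamps taken from the output above
timestamps <- c("2022-02-16T15:52:26.3782532Z", "2022-02-16T15:46:52Z")
parsed <- parse_published_at(timestamps)
format(parsed, "%Y-%m-%d %H:%M:%S")
# "2022-02-16 15:52:26" "2022-02-16 15:46:52"
```

Once converted, the timestamps can be sorted, filtered or differenced like any other date-time vector.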
8.4 Social science examples
Are there social science research examples using the API?
A search on Google Scholar (queries “Google News API” and “News API”) reveals that surprisingly few social science studies make use of the News API, although many rely on the Google News website for research (e.g., Haim, Graefe, and Brosius (2018)). One example from the social sciences is Chrisinger et al. (2021), who ask how the discourse on food stamps in the United States has changed over time. Through the News API, they collected 13,987 newspaper articles using keyword queries, and ran a structural topic model. In one of my papers, I ask whether US conservatives and liberals differ in their ability to discern true from false information, and in their tendency to give more credit to information that is ideologically congruent. As I argue that these questions can best be answered if the news items used in a survey represent the universe of news well, the News API helps me get a decent approximation of this universe. Note, however, that at the time I was still able to get complete articles through the Business version of the API, and it is unclear whether this is still the case (Clemm von Hohenberg 2022).