Introduction

Sources: Bertini (2017), Wickham (2021) and Fay et al. (2021)

1 Interactive data analysis

  • Source(s): Bertini (2017) and others.

1.1 Why?

  • “From Data Visualization to Interactive Data Analysis” (Bertini 2017)
  • Main uses of data visualization: inspirational, explanatory and analytical (see footnote 1)
  • “data analysis […] can help people improve their understanding of complex phenomena”
    • “if I understand a problem better, there are higher chances I can find a better solution for it”

1.2 Interactive data visualization: history

  • “interactive data visualization enables direct actions on a plot to change elements and link between multiple plots” (Swayne 1999) (Wikipedia)
  • Interactivity revolutionizes the way we work with and how we perceive data (cf. Cleveland and McGill 1984)
  • Started ~last quarter of the 20th century, PRIM-9 (1974) (Friendly 2006, 23, see also Cleveland and McGill, 1988, Young et al. 2006)
  • Interactivity allows for…
    • …making sense of big data (more dimensions)
    • …exploring data
    • …making data accessible to those without background
    • …generating interactive “publications”

1.3 How Does Interactive Data Analysis Work?

  • Figure 1 outlines process underlying interactive data analysis
    • Loop
      • start with loosely specified goal/problem (Decrease crime!)
      • translate goal into one or more questions (What causes crime?)
      • gather, organize and analyze the data to answer these questions (Gather data on crime and other factors, model and visualize it)
      • generate knowledge and new questions and start over
Figure 1: Process underlying interactive data analysis (Source: Bertini, 2017)

1.4 Steps of interactive data analysis

  1. Defining the problem: What problem/goal are you trying to solve/reach through interactive data analysis?
  2. Generating questions: Translate high-level problem into number of data analysis questions
  3. Gathering, transforming and familiarizing oneself with the data: this often means slicing, dicing and aggregating the data to prepare it for the planned analysis.
  4. Creating models out of data (not always): using statistical modeling and machine learning methods to summarize and analyze data
  5. Visualizing data and models: results obtained from data transformation and querying (or from some model) are turned into something our eyes can digest and hopefully understand.
    • Simple representations like tables and lists rather than fancy charts are perfectly reasonable visualization for many problems.
  6. Interpreting the results: once results have been generated and represented in visual format, they need to be interpreted by someone (crucial step!)
    • complex activity including understanding how to read the graph, understanding what graph communicates about phenomenon of interest, linking results to questions and pre-existing knowledge of problem (think of your audience!)
    • Interpretation heavily influenced by pre-existing knowledge (about domain problem, data transformation process, modeling, visual representation)
  7. Generating inferences and more questions: steps above lead to creating new knowledge, additional questions or hypotheses
    • Outcome: not only answers but also (hopefully better, more refined) questions
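
Steps 3 to 5 of this loop can be sketched in a few lines of R. This is only an illustration under stated assumptions: it uses R's built-in USArrests dataset as a stand-in for "data on crime", and the grouping at the median and the choice of model are hypothetical, not part of Bertini's description.

```r
# Step 3: gather and transform the data
# (USArrests ships with R: arrests per 100,000 residents by US state)
crime <- USArrests
crime$state <- rownames(crime)

# Hypothetical grouping: split states at the median share of urban population
crime$urban <- ifelse(crime$UrbanPop >= median(crime$UrbanPop),
                      "more urban", "less urban")

# Step 3 (aggregating): mean murder arrest rate per group
by_urban <- aggregate(Murder ~ urban, data = crime, FUN = mean)

# Step 4: a simple model (one of many possible choices)
fit <- lm(Murder ~ UrbanPop, data = crime)

# Step 5: a plain table is often a perfectly reasonable "visualization"
print(by_urban)
print(round(coef(fit), 3))
```

Interpreting the printed table and coefficients (step 6) and deciding what to ask next (step 7) remain human tasks.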

1.5 Important aspects of data analysis & quo vadis interaction?

  • Process not sequential but highly iterative (jumping back/forth between steps)
  • Some activities exclusively human, e.g., defining problems, generating questions, etc.
  • Visualization only small portion of process and effectiveness depends on other steps
  • Interaction: all over the place… every time you tell your computer what to do (and it returns information)
    • Gather and transform the data
    • Specify a model and/or a query from the data
    • Specify how to represent the results (and the model)
    • Browse the results
    • Synthesize and communicate the facts gathered
  • Direct manipulation vs. command-line interaction: WIMP interfaces (direct manipulation, clicks, mouse-overs, etc.) are interactive, but so is the command line
    • You can let users type!
  • Audience: what (interaction) skills and pre-knowledge do they have? (domain knowledge, statistics, graphs)

1.6 Challenges of Interactive Visual Data Analysis

  • Broadly three parts… (Bertini 2017)
  • Specification (Mind → Data/Model): necessary to translate our questions and ideas into specifications the computer can read
    • Shiny allows non-coders to perform data analysis, but requires R knowledge to build apps
    • But even simpler tools out there
  • Representation (Data/Model → Eyes)
    • next step is to find a (visual) representation so users can inspect and understand them
    • “deciding what to visualize is often equally, if not more, important, than deciding how to visualize it”
    • “how fancy does a visualization need to be in order to be useful for data analysis?”
      • “most visualization problems can be solved with a handful of graphs”
    • really hard to use, tweak, and combine graphs in clever/effective/innovative ways
  • Interpretation (Eyes → Mind)
    • “what does one need to know in order to reason effectively about the results of modeling and visualization?”
    • “Are people able to interpret and trust [your shiny app]?”

2 Why visualize?

2.1 Anscombe’s quartet (1)

  • Table 1 shows results from a linear regression based on Anscombe’s quartet (Anscombe 1973) often used to illustrate the usefulness of visualization
    • Q: What do we find here?
Table 1: Linear models based on sets of Anscombe’s quartet
              y1 (Set 1)   y2 (Set 2)   y3 (Set 3)   y4 (Set 4)
(Intercept)   3.000        3.001        3.002        3.002
              (1.125)      (1.125)      (1.124)      (1.124)
x1            0.500
              (0.118)
x2                         0.500
                           (0.118)
x3                                      0.500
                                        (0.118)
x4                                                   0.500
                                                     (0.118)
Notes: some notes...
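
The coefficients in Table 1 can be reproduced with R's built-in anscombe data frame (columns x1..x4 and y1..y4). The sketch below is one possible way to fit the four models; it is not necessarily the code used to produce the table.

```r
# Fit y_i ~ x_i for each of the four sets
fits <- lapply(1:4, function(i) {
  lm(reformulate(paste0("x", i), response = paste0("y", i)), data = anscombe)
})

# One row per set: intercept and slope, rounded to three decimals
coefs <- t(sapply(fits, function(m) round(unname(coef(m)), 3)))
dimnames(coefs) <- list(paste("Set", 1:4), c("intercept", "slope"))
print(coefs)
```

All four rows should show an intercept of about 3 and a slope of about 0.5, which is exactly why the regression table alone cannot tell the sets apart.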

2.2 Anscombe’s quartet (2)

  • Table 2 displays Anscombe’s quartet (Anscombe 1973), a dataset (or 4 little datasets)
    • Q: What does the table reveal about the data? Is it easy to read?
Table 2: Anscombe’s quartet: Data
Anscombe's quartet data
x1 y1 x2 y2 x3 y3 x4 y4
10 8.04 10 9.14 10 7.46 8 6.58
8 6.95 8 8.14 8 6.77 8 5.76
13 7.58 13 8.74 13 12.74 8 7.71
9 8.81 9 8.77 9 7.11 8 8.84
11 8.33 11 9.26 11 7.81 8 8.47
14 9.96 14 8.10 14 8.84 8 7.04
6 7.24 6 6.13 6 6.08 8 5.25
4 4.26 4 3.10 4 5.39 19 12.50
12 10.84 12 9.13 12 8.15 8 5.56
7 4.82 7 7.26 7 6.42 8 7.91
5 5.68 5 4.74 5 5.73 8 6.89
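
Reading eleven rows of raw numbers rarely reveals much, so a natural next step is to compute summary statistics per set. The following sketch again assumes the built-in anscombe data frame; the column names of the summary table are chosen here for illustration.

```r
# Per-set means and correlation of x and y
stats <- data.frame(
  set    = 1:4,
  mean_x = sapply(1:4, function(i) mean(anscombe[[paste0("x", i)]])),
  mean_y = sapply(1:4, function(i) mean(anscombe[[paste0("y", i)]])),
  cor_xy = sapply(1:4, function(i) {
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  })
)
print(round(stats, 2))
```

The summaries are nearly identical across the four sets, so they are no more informative than the regression table.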

2.3 Anscombe’s quartet (3)

  • Figure 2 finally visualizes the data underlying those models
    • Q: What do we see here? What is the insight?
Figure 2: Anscombe’s quartet: Visualization
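
A plot along the lines of Figure 2 can be sketched with base graphics. The helper name plot_quartet() and the long-format data frame quartet are hypothetical; the original figure may well have been produced differently (e.g., with ggplot2).

```r
# Reshape the wide anscombe data into long format: one row per point
quartet <- do.call(rbind, lapply(1:4, function(i) {
  data.frame(set = paste("Set", i),
             x   = anscombe[[paste0("x", i)]],
             y   = anscombe[[paste0("y", i)]])
}))

# Draw one scatterplot per set on shared axes, each with its regression line
plot_quartet <- function(d) {
  old <- par(mfrow = c(2, 2))
  on.exit(par(old))
  for (s in unique(d$set)) {
    sub <- d[d$set == s, ]
    plot(sub$x, sub$y, main = s, xlab = "x", ylab = "y",
         xlim = range(d$x), ylim = range(d$y))
    abline(lm(y ~ x, data = sub))  # nearly the same line in every panel
  }
}

# Only draw when a person is looking at the session
if (interactive()) plot_quartet(quartet)
```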

2.4 The Datasaurus Dozen

Figure 3: The Datasaurus Dozen animated by Tom Westlake

3 Shiny

3.1 What is Shiny?

  • History of Shiny: Joe Cheng: The Past and Future of Shiny (see footnote 2)
  • Popularity: Shiny (vs. ggplot2, dplyr)
  • A web application framework for R to turn analyses into interactive web applications. What does that mean?
    • The user interface is a webpage
    • On this webpage you can manipulate things
    • Behind the webpage there is a computer (your computer or a server)
    • That computer/server runs R and the R script of the webapp
    • When you change something on the webpage, the information is sent to the computer
    • Computer runs the script with the new inputs (input functions)
    • Computer sends back outputs to the webpage (output functions)
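
The round trip described above (input changes, the server reruns the relevant code, the output is sent back) can be simulated without a browser using shiny's testServer(). This is a minimal sketch; the input name n and the output name result are made up for illustration.

```r
library(shiny)

# Server logic: the output is recomputed whenever input$n changes
server <- function(input, output, session) {
  output$result <- renderText({
    paste("The square of", input$n, "is", input$n^2)
  })
}

# testServer() runs the server function without a webpage;
# session$setInputs() plays the role of the user changing a widget
testServer(server, {
  session$setInputs(n = 3)
  first <- output$result
  session$setInputs(n = 4)
  second <- output$result
  print(c(first, second))
})
```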

3.2 Components of a Shiny app

  • As depicted in Figure 4, a user interacts with a server on which the Shiny app/website is hosted
    • A Shiny app has two components, the user interface (UI) and the server, that are passed as arguments to shinyApp(), which creates a Shiny app from this ui/server pair
Figure 4: Source: https://hosting.analythium.io/the-anatomy-of-a-shiny-application/ (c) Analythium
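
As a minimal sketch of the ui/server pair: the example below defines one input and one output and combines them with shinyApp(). The IDs name and greeting are hypothetical. The result is assigned to a variable so that the script does not launch the app as a side effect.

```r
library(shiny)

# UI: what the user sees (one input, one output placeholder)
ui <- fluidPage(
  textInput("name", "Your name"),
  textOutput("greeting")
)

# Server: how outputs are computed from inputs
server <- function(input, output, session) {
  output$greeting <- renderText(paste0("Hello ", input$name, "!"))
}

# shinyApp() combines the pair into an app object; printing it
# (or calling runApp(app)) in an interactive session launches the app
app <- shinyApp(ui, server)
```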

3.3 Pro & contra Shiny

3.3.1 Pros of R Shiny:

  1. Fast Prototyping: Shiny is excellent for quickly turning ideas into applications; easy to use even for non-seasoned programmers
  2. Interactivity: lets you build interactive web apps, enhancing user engagement and experience (dashboards!)
  3. Integration with R Ecosystem: integrates seamlessly with R’s vast open-source ecosystem (see shiny for python)
  4. Statistical Modeling and Visualization: allows for complex statistical modeling and visualizations within your app
  5. No Need for Web Development Skills: Create web apps with R code alone (no need for HTML, CSS, or JavaScript)
  6. Reactivity: fairly simple to create applications that automatically update in response to user inputs
  7. Sharing and Publishing: apps can be easily published and shared (e.g., Shinyapps.io or shinylive)

3.3.2 Cons of R Shiny:

  1. Performance: apps run on top of R, an interpreted language, which can cause performance issues
  2. Single-threaded: R (and by extension Shiny) is single-threaded, which can also cause performance issues (see here).
  3. Complexity: Shiny’s basics are easy but mastering intricacies of reactivity is more challenging
  4. Data Gathering and Saving: It can be challenging to use Shiny for gathering and saving data to the database.
  5. Maintenance Cost: cost of maintaining a Shiny application over time can be high (now we have shinylive!)
  6. Software Dependencies: Certain Shiny applications may have many software dependencies, which could potentially lead to issues down the line

4 Exploring European Social Survey (ESS): The app we will build

  • In the workshop we will build the Shiny app shown in Figure 5 together. Please explore this app here (5-10 minutes) and answer the following questions:
    • What questions can we answer using the app?
    • How can this app help us to understand and analyze the underlying data?
    • What interactive elements can we identify in the app?
Figure 5: (Source: Original image)

4.1 The data

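Data preparation code of the app
library(knitr)  # provides kable()

# Load the individual-level ESS data
data <- readRDS("./data/ess_trust.rds")

# Show the first rows as a table
kable(head(data))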
Table 3: Data from the European Social Survey, Round 10
idno country internet_use trust_parliament trust_legal trust_police trust_politicians trust_parties trust_eu trust_un left_right happiness age income_feeling
27 AT 5 5 10 10 5 5 5 5 9 7 43 3
137 AT 5 7 8 8 3 4 5 2 5 8 67 2
194 AT 4 6 8 8 5 5 5 5 5 9 40 1
208 AT 5 0 5 8 3 3 0 2 NA 8 63 2
220 AT 1 7 8 8 7 7 5 5 5 8 71 2
254 AT 2 6 5 7 5 5 4 6 3 9 64 1
  • data (see Table 3) comprises 49519 individuals that live in 29 countries (File: ess_trust.rds)
    • Later on we also aggregate the data to the country-level
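
The country-level aggregation could look roughly like the sketch below. It is an assumption about the approach, not the app's actual code: to stay self-contained it uses only the six respondents and three of the columns shown in Table 3 rather than the full ess_trust.rds file, and it relies on base R's aggregate().

```r
# The six respondents shown in Table 3 (a subset of the columns)
ess_head <- data.frame(
  idno             = c(27, 137, 194, 208, 220, 254),
  country          = "AT",
  trust_parliament = c(5, 7, 6, 0, 7, 6),
  trust_police     = c(10, 8, 8, 8, 8, 7),
  left_right       = c(9, 5, 5, NA, 5, 3)
)

# Country-level means; na.action = na.pass keeps rows with missing values,
# and na.rm = TRUE then drops the missing answers variable by variable
country_means <- aggregate(
  cbind(trust_parliament, trust_police, left_right) ~ country,
  data = ess_head, FUN = mean, na.rm = TRUE, na.action = na.pass
)
print(country_means)
```

With the full dataset the same call would return one row per country instead of a single row for AT.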



Table 4: Data from the European Social Survey, Round 10 with geographic information
Data preparation code of the app
data_geo <- readRDS("./data/ess_trust_geo.rds")
# View(data_geo)
  • data_geo (see Table 4) comprises aggregated ESS data with geographic information (File: ess_trust_geo.rds; geometry variable describes the geographic shape of regions)
  • Advantages: Dataset is interesting and contains mapping data

5 Your (first) Shiny app

  • Below you will create your first app, and we’ll use the opportunity to discuss the basic components of a Shiny app (see analogous example here).
  1. Install the relevant packages:
install.packages("shiny")
install.packages("tidyverse")
  2. Create a directory with the name of your app “myfirstapp” in your working directory.
  3. Create an R script file in RStudio and save it in the working directory with the name app.R.
  4. Copy the code below and paste it into your app.R script (UPDATE).
  5. You can run and stop the app by clicking the Run App button (Figure 6) in the document toolbar.
Figure 6: The Run App button can be found at the top-right of the source pane.
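
As a stand-in for a first app, the following sketch shows what a small app.R might contain. It is not the workshop's own code: it assumes the built-in faithful dataset, and the IDs bins and hist are hypothetical.

```r
library(shiny)

# UI: a title, a slider for the number of bins, and a plot placeholder
ui <- fluidPage(
  titlePanel("My first app"),
  sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 30),
  plotOutput("hist")
)

# Server: redraw the histogram whenever the slider changes
server <- function(input, output, session) {
  output$hist <- renderPlot({
    x <- faithful$waiting
    breaks <- seq(min(x), max(x), length.out = input$bins + 1)
    hist(x, breaks = breaks,
         main = "Waiting time between eruptions", xlab = "Minutes")
  })
}

# Build the app object without launching it
app <- shinyApp(ui, server)
```

In app.R itself the file usually ends with the bare call shinyApp(ui, server), so that the Run App button launches the app.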

6 Minimum viable product (MVP)

  • …useful concept when building apps (see Figure 7)!
Figure 7: Illustration of MVP (Source: Fay et al. 2021 - read description)
  • “version […] with just enough features to be usable by early customers” to collect feedback (Wikipedia)
  • “Making things work before working on low-level optimization makes the whole engineering process easier” (Fay et al. 2021)
  • The “UI first” approach: often the safest way to go (Fay et al. 2021)
    • Agreeing on specifications: helps everybody involved in the application to agree on what the app is supposed to do, and once the UI is set, there should be no “surprise implementation”
    • Organizing work: “It’s much easier to work on a piece of the app you can visually identify and integrate in a complete app scenario”
    • But…
  • …we “follow” the same strategy, slowly building out our Shiny app, adding features & complexity
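
A “UI first” draft can be as simple as a complete layout paired with an empty server function. The sketch below is a hypothetical illustration: the widget IDs and the title are made up, and the variable names are taken from the column names in Table 3.

```r
library(shiny)

# Hypothetical choices, taken from the column names in the ESS data
trust_vars <- c("trust_parliament", "trust_legal", "trust_police")

ui <- fluidPage(
  titlePanel("Trust in institutions (draft)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("variable", "Trust variable", choices = trust_vars)
    ),
    mainPanel(
      plotOutput("plot"),    # stays empty until the server fills it
      tableOutput("table")
    )
  )
)

# Empty server: the layout can be reviewed before any logic exists
server <- function(input, output, session) {}

app_draft <- shinyApp(ui, server)
```

Everyone involved can then agree on the layout before any server logic is written, and features are added one output at a time.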

7 Workflow: Development, debugging and getting help

  • See discussions of workflow in Wickham (2021, Ch. 5, 20.2.1)
  • Three important Shiny workflows:
    • Basic development cycle of creating apps, making changes, and experimenting with the results.
    • Debugging, i.e., figure out what’s gone wrong with your code/brainstorm solutions
    • Writing reprexes, self-contained chunks of code that illustrate a problem (essential for getting others’ help)
  • Below: the development workflow; debugging follows later on

7.1 Development workflow

  1. Creating the app: start every app with the same lines of R code below (type shinyapp and press Shift + Tab, or use the menu: New Project -> Shiny Web Application)
  2. Seeing your changes: you’ll create a few apps a day (really?!?), but you’ll run apps hundreds of times, so mastering the development workflow is particularly important
  3. Write some code (see footnote 3).
  4. Launch the app with Cmd/Ctrl + Shift + Enter.
  5. Interactively experiment with the app.
  6. Close the app.
  7. Go back to step 3.
library(shiny)
ui <- fluidPage(
  
)
server <- function(input, output, session) {
  
}
shinyApp(ui, server)

7.1.1 RStudio & Shiny: A few tips

  • Controlling the view: Default is a pop-out window but you can also choose Run in Viewer Pane and Run External.
  • Document outline: Use it for navigation in your app code (Ctrl + Shift + O)
  • Using/exploring other apps: Inspect that app code, then slowly delete parts you don’t need
    • Rerun app to see whether it still works after each deletion
    • if only interested in the UI, delete everything within the server function: server <- function(input, output, session) {delete everything here}
    • Important: Search for dependencies (that can sometimes be deleted), e.g., search for www folder
      • also image links with src or png, jpg

7.2 Debugging workflow

  • Guaranteed that something will go wrong at the start
  • Cause is mismatch between your mental model of Shiny, and what Shiny actually does
  • We need to develop robust workflow for identifying and fixing mistakes
  • Three main cases of problems:
      1. Unexpected error: use traceback and interactive debugger
      2. No error but incorrect values: use interactive debugger
      3. Correct values but not updated: problem unique to Shiny, i.e., R skills don’t help
  • See Wickham (2021, Ch. 5.2, link) for explanations and examples
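
For the first case (unexpected error), the idea behind a traceback can be sketched in plain R. The helper functions f, g and h are illustrative only; the sketch records the call stack at the moment the error is signalled, which is the information traceback() shows interactively right after an error.

```r
# Three nested helpers; the bug only shows up in the innermost one
f <- function(x) g(x)
g <- function(x) h(x)
h <- function(x) x * 2

# Record the call stack when the error is signalled, then capture the message
stack <- NULL
err <- tryCatch(
  withCallingHandlers(
    f("a"),
    error = function(e) {
      calls <- sys.calls()
      stack <<- vapply(as.list(calls), function(cl) deparse(cl)[1], character(1))
    }
  ),
  error = function(e) conditionMessage(e)
)

print(err)
# The chain of calls that led to the failure
print(stack[stack %in% c("f(\"a\")", "g(x)", "h(x)")])
```

For the second case (no error but incorrect values), a common approach is to place a call to browser() inside the function or reactive in question, which pauses execution in an interactive session so that values can be inspected.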

References

Anscombe, F. J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27 (1): 17–21.
Bertini, Enrico. 2017. “From Data Visualization to Interactive Data Analysis.” https://medium.com/@FILWD/from-data-visualization-to-interactive-data-analysis-e24ae3751bf3.
Cleveland, William S, and Robert McGill. 1984. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association 79 (387): 531–54.
Fay, Colin, Sébastien Rochette, Vincent Guyader, and Cervan Girard. 2021. Engineering Production-Grade Shiny Apps. CRC Press.
Friendly, Michael. 2006. “A Brief History of Data Visualization.” In Handbook of Data Visualization, 15–56. Springer Handbooks of Computational Statistics. Springer Berlin Heidelberg.
Swayne, Deborah. 1999. “Introduction to the Special Issue on Interactive Graphical Data Analysis: What Is Interaction?” Computational Statistics 14 (1): 1–6.
Wickham, Hadley. 2021. Mastering Shiny. O’Reilly Media.

Footnotes

  1. Inspirational. The main goal here is to inspire people. To wow them! But not just on a superficial level, but to really engage people into deeper thinking, sense of beauty and awe. Explanatory. The main goal here is to use graphics as a way to explain some complex idea, phenomenon or process. Analytical. The main goal here is to extract information out of data with the purpose of answering questions and advancing understanding of some phenomenon of interest.

  2. Joe Cheng is the Chief Technology Officer at RStudio and was the original creator of the Shiny web framework, and continues to work on packages at the intersection of R and the web.

  3. Automated testing: allows you to turn interactive experiments you’re running into automated code, i.e., run tests more quickly and not forget them (because they are automated). Requires more initial investment.