Custom charting functions using ggplot2

Data visualisation series (Post #1)

Riddhiman

While R has a variety of options for 2D graphics and data visualisation, it is hard to beat ggplot2 in terms of features, functionality and overall visual quality. In this post I want to share my take on how to use the package: building customised charting functions for specific chart types, with ggplot2 as the underlying visualisation engine.

Libraries

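# pacman is a package management tool; install it only if it is missing
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
library(pacman)

# p_load installs packages if needed and then loads them
# (zoo is included because the moving average below uses zoo::rollapply)
p_load(dplyr, ggplot2, scales, stringr, zoo)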

Sample dataset

A summarised version of the COVID-19 Data Repository hosted by JHU is available for download here

df <- read.csv("covid_data.csv")

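Something of interest could be the daily number of confirmed cases for the top five countries. Note that the code below picks the top five by total deaths (the sum of deaths_daily) and then works with daily confirmed cases for those countries. Some data prep is needed to get to these numbers.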

# Get top 5 countries, ranked by total deaths
top_countries <- df %>% 
  group_by(country) %>% 
  summarise(count = sum(deaths_daily)) %>% 
  top_n(5, wt = count) %>%   # rank on count explicitly
  pull(country)
print(top_countries)
## [1] "Brazil" "India"  "Mexico" "Peru"   "US"

# Create a data frame with the required information
# Note that a centered 7 day moving average is used
plotdf <- df %>% 
  mutate(date = as.Date(date)) %>% 
  filter(country %in% top_countries, 
         date >= as.Date("2020-05-01")) %>% 
  group_by(country, date) %>% 
  summarise(count = sum(confirmed_daily)) %>%
  arrange(country, date) %>% 
  group_by(country) %>% 
  # fill = NA pads the first and last 3 days of each country with NA
  mutate(MA = zoo::rollapply(count, FUN = mean, width = 7, by = 1, fill = NA, align = "center"))
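
To make the centred window concrete, here is a small sketch on a toy vector (the vector is made up purely for illustration and is not taken from the dataset). With width = 7 and align = "center", each value is the mean of the observation itself plus the three before and the three after it, so the first and last three positions have no complete window and are filled with NA.

```r
# Toy input: nine consecutive integers (illustrative only)
x <- 1:9

# Same arguments as in the data prep step
ma <- zoo::rollapply(x, FUN = mean, width = 7, by = 1, fill = NA, align = "center")
print(ma)
## [1] NA NA NA  4  5  6 NA NA NA
```

The value at position 4 is the mean of 1 to 7, the value at position 5 is the mean of 2 to 8, and so on.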

Simple example

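Say we need a line chart visualising the data we just prepared. Note the use of aes_string() instead of aes(): it lets us supply column names to ggplot2 as strings. Be aware that aes_string() is deprecated in recent ggplot2 releases, where it still works but may emit a warning.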

# Function definition.
line_chart <- function(df, x, y, group_color = NULL, line_width = 1, line_type = 1){
  ggplot(df, aes_string(x = x, y = y, color = group_color)) + 
    geom_line(size = line_width, linetype = line_type)
}
# Test run
line_chart(plotdf, x = "date", y = "MA", group_color = "country", 
           line_type = 1, line_width = 1.2)
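
For readers on a recent ggplot2, the same idea can be sketched without aes_string() by using the .data pronoun inside aes(). This is only one possible variant, not the approach used in the rest of this post: the function name line_chart_tidy and the toy data frame are made up for illustration, and the sketch assumes ggplot2 >= 3.4.0, where geom_line() takes linewidth rather than size.

```r
library(ggplot2)

# Hypothetical variant of line_chart(); the name is illustrative only.
# Assumes ggplot2 >= 3.4.0 (linewidth replaces size for lines).
line_chart_tidy <- function(df, x, y, group_color = NULL,
                            line_width = 1, line_type = 1) {
  # .data[[x]] looks up the column whose name is stored in the string x
  p <- ggplot(df, aes(x = .data[[x]], y = .data[[y]])) +
    geom_line(linewidth = line_width, linetype = line_type)

  # Only add the colour mapping when a grouping column was supplied
  if (!is.null(group_color)) {
    p <- p + aes(colour = .data[[group_color]])
  }
  p
}

# Toy data frame, for illustration only
toy <- data.frame(
  day   = rep(1:3, times = 2),
  value = c(1, 2, 3, 2, 4, 6),
  grp   = rep(c("a", "b"), each = 3)
)

p_grouped <- line_chart_tidy(toy, x = "day", y = "value",
                             group_color = "grp", line_width = 1.2)
p_plain   <- line_chart_tidy(toy, x = "day", y = "value")
```

The explicit NULL check is needed because .data[[NULL]] is not a valid lookup, whereas aes_string() silently drops NULL arguments.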

Customised theme

Now that we know how to encapsulate the call to ggplot2 in a more intuitive manner, we can create a customised theme for our charts. This is useful since this theme can be applied to any chart.

custom_theme <- function(plt, base_size = 11, base_line_size = 1, palette = "Set1"){
  
  # Note the use of "+" and not "%>%"
  plt + 
    # Adjust overall font size
    theme_minimal(base_size = base_size, base_line_size = base_line_size) + 
    
    # Put legend at the bottom
    theme(legend.position = "bottom") + 
    
    # Different colour scale
    scale_color_brewer(palette = palette)
}
# Test run
line_chart(plotdf, "date", "MA", "country") %>% custom_theme()
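
The comment about "+" versus "%>%" points at a design choice: custom_theme() takes the plot as its first argument, so it is piped. A different option, sketched below, is to return the theme pieces as a list, because ggplot2 accepts a list of components on the right-hand side of "+". The helper name custom_theme_layers and the toy data are hypothetical and only serve the illustration.

```r
library(ggplot2)

# Hypothetical helper (not part of the functions in this post):
# returns the theme pieces as a list instead of taking a plot as input
custom_theme_layers <- function(base_size = 11, base_line_size = 1,
                                palette = "Set1") {
  list(
    theme_minimal(base_size = base_size, base_line_size = base_line_size),
    theme(legend.position = "bottom"),
    scale_color_brewer(palette = palette)
  )
}

# Toy data frame, for illustration only
toy <- data.frame(
  day   = rep(1:3, times = 2),
  value = c(1, 2, 3, 2, 4, 6),
  grp   = rep(c("a", "b"), each = 3)
)

# The list is added with "+", like any other ggplot2 component
layers <- custom_theme_layers(base_size = 12)
p <- ggplot(toy, aes(x = day, y = value, color = grp)) +
  geom_line() +
  layers
```

The piped version used in this post has one advantage over the list version: it can inspect the plot it receives, which becomes useful once the theme needs to react to the plot's mappings.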

Adding bells and whistles

Now that we have the basic components, we can add a few more features to our line_chart() function.

line_chart <- function(df, x, y, group_color = NULL, 
                       line_width = 1, line_type = 1, 
                       xlab = NULL, ylab = NULL, 
                       title = NULL, subtitle = NULL, caption = NULL){
  # Base plot
  ggplot(df, aes_string(x = x, y = y, color = group_color)) + 
    
    # Line chart 
    geom_line(size = line_width, linetype = line_type) + 
    
    # Titles and subtitles
    labs(x = xlab, y = ylab, title = title, subtitle = subtitle, caption = caption)
}

We’ll also tinker with our custom_theme() function.

custom_theme <- function(plt, 
                         palette = "Set1", 
                         format_x_axis_as = NULL, format_y_axis_as = NULL, 
                         x_axis_scale = 1, y_axis_scale = 1, 
                         x_axis_text_size = 10, y_axis_text_size = 10, 
                         base_size = 11, base_line_size = 1){
  
  mappings <- names(unlist(plt$mapping))
  
  p <- plt + 
    
    # Adjust overall font size
    theme_minimal(base_size = base_size, base_line_size = base_line_size) + 
    
    # Put legend at the bottom
    theme(legend.position = "bottom") + 
    
    # Different colour palette
    {if("colour" %in% mappings) scale_color_brewer(palette = palette)}+
    
    {if("fill" %in% mappings) scale_fill_brewer(palette = palette)}+
    
    # Change some theme options
    theme(plot.background = element_rect(fill = "#f7f7f7"), 
          plot.subtitle = element_text(face = "italic"), 
          axis.title.x = element_text(face = "bold", 
                                      size = x_axis_text_size), 
          axis.title.y = element_text(face = "bold", 
                                      size = y_axis_text_size)) + 
    
    # Change x-axis formatting
    # (decimal.mark = "," means a value such as 12.3 is printed as "12,3")
    {if(!is.null(format_x_axis_as))
      switch(format_x_axis_as, 
             "date" = scale_x_date(breaks = pretty_breaks(n = 12)), 
             "number" = scale_x_continuous(labels = number_format(accuracy = 0.1, 
                                                                  decimal.mark = ",", 
                                                                  scale = x_axis_scale)), 
             "percent" = scale_x_continuous(labels = percent))} + 
    
    # Change y-axis formatting
    {if(!is.null(format_y_axis_as))
      switch(format_y_axis_as, 
             "date" = scale_y_date(breaks = pretty_breaks(n = 12)), 
             "number" = scale_y_continuous(labels = number_format(accuracy = 0.1, 
                                                                  decimal.mark = ",", 
                                                                  scale = y_axis_scale)), 
             "percent" = scale_y_continuous(labels = percent))}
  
  # Convert all labels (axis titles, title, subtitle, legend) to title case
  vec <- lapply(p$labels, str_to_title)
  names(vec) <- names(p$labels)
  p$labels <- vec
  
  return(p)
}
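
Two details of custom_theme() are easy to miss, so here is a small sketch of both. First, the check is for "colour" rather than "color" because ggplot2 standardises aesthetic names to the British spelling. Second, the {if(...) ...} blocks work because an if without an else evaluates to NULL when the condition is false, and adding NULL to a plot leaves it unchanged. The toy data frame is made up for illustration.

```r
library(ggplot2)

# aes() renames the American spelling "color" to "colour"
m <- aes(x = day, y = value, color = grp)
print(names(m))
## [1] "x"      "y"      "colour"

# An if() without else returns NULL when the condition is FALSE
maybe_fill <- {if ("fill" %in% names(m)) scale_fill_brewer(palette = "Set1")}
is.null(maybe_fill)
## [1] TRUE

# Toy data frame, for illustration only
toy <- data.frame(day = 1:3, value = c(1, 2, 3), grp = "a")

# Adding NULL is a no-op, so the conditional block can sit in a "+" chain
p <- ggplot(toy, m) + geom_line() + maybe_fill
```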

Now let’s see how it all comes together.

line_chart(plotdf,
           x = "date", 
           y = "MA", 
           group_color = "country", 
           xlab = "Date", 
           ylab = "Moving Avg. (in '000)", 
           title = "Daily COVID19 Case Load", 
           subtitle = "Top 5 countries by volume") %>% 
  custom_theme(format_x_axis_as = "date", 
               format_y_axis_as = "number", 
               y_axis_scale = 0.001)
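
To see what the "number" option does to the axis labels, the labeller can be called on its own. The sketch below uses the same number_format() settings as custom_theme() with y_axis_scale = 0.001; the input values are invented for illustration. The second labeller is an assumption about what some readers may prefer (a comma as thousands separator) and is not what custom_theme() does.

```r
library(scales)

# Same settings custom_theme() uses for format_y_axis_as = "number"
fmt <- number_format(accuracy = 0.1, decimal.mark = ",", scale = 0.001)
labels_thousands <- fmt(c(45231, 1500))
labels_thousands
# returns "45,2" and "1,5": values are in thousands, with "," as the decimal mark

# Illustrative alternative: keep raw counts and use "," to separate thousands
fmt_big <- number_format(accuracy = 1, big.mark = ",")
labels_raw <- fmt_big(45231)
labels_raw
# returns "45,231"
```

Because the scale argument only changes the labels, the underlying data stay in raw counts, which is why the axis title needs to mention the '000 unit.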

Bar chart example

The good thing about the custom_theme() function is that it can be applied to any ggplot2 object.

p <- plotdf %>%  
  mutate(month = format(date, "%m-%b")) %>% 
  ggplot(aes(x = month, y = MA, fill = country)) + 
  geom_col(position = "dodge") + 
  labs(title = "Monthly COVID19 Case load trend", 
       subtitle = "Top 5 countries", 
       x = "Month", 
       y = "Moving Average ('000)")

custom_theme(p, palette = "Set1", format_y_axis_as = "number", y_axis_scale = 0.001)

Parting notes

It is worth noting that building customised charting functions using ggplot2 is most useful when you need to create the same type of chart(s) again and again. When doing any kind of exploratory work, using ggplot2 directly is easier and more useful since you can build all kinds of charts (or layer charts of different types) within the same pipeline.

Thoughts? Comments? Helpful? Not helpful? Like to see anything else added in here? Let me know!