library(ggplot2)

ggplot2 is a R package dedicated to data visualization. It can greatly improve the quality and aesthetics of your graphics, and will make you much more efficient in creating them.

In most cases a ggplot2 graphic start with ggplot(), supply a dataset and aesthetic mapping (with aes()). Then you can add on layers (like geom_point() or geom_histogram()), scales (like scale_colour_brewer()), faceting specifications (like facet_wrap()) and coordinate systems (like coord_flip()).

Basic plot

required data:

mpg
## # A tibble: 234 x 11
##   manufacturer model displ  year   cyl trans      drv     cty   hwy fl    class 
##   <chr>        <chr> <dbl> <int> <int> <chr>      <chr> <int> <int> <chr> <chr> 
## 1 audi         a4      1.8  1999     4 auto(l5)   f        18    29 p     compa…
## 2 audi         a4      1.8  1999     4 manual(m5) f        21    29 p     compa…
## 3 audi         a4      2    2008     4 manual(m6) f        20    31 p     compa…
## 4 audi         a4      2    2008     4 auto(av)   f        21    30 p     compa…
## 5 audi         a4      2.8  1999     6 auto(l5)   f        16    26 p     compa…
## # … with 229 more rows

ggplot2 builds charts through layers using geom_ functions (like geom_point(),geom_bar())

p <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
p

This produces a scatterplot defined by:

  1. Data: mpg

  2. Aesthetic mapping: engine size mapped to x position, fuel economy to y position.

  3. Layer: points.

Pay attention to the structure of this function call: data and aesthetic mappings are supplied in ggplot(), then layers are added on with +. This is an important pattern, and as you learn more about ggplot2 you’ll construct increasingly sophisticated plots by adding on more types of components.

To add additional variables to a plot, we can use other aesthetics like colour, shape, alpha, and size

p <- ggplot(mpg, aes(displ, hwy, colour = class)) + geom_point()
p

p <- ggplot(mpg, aes(displ, hwy, colour = class)) + scale_colour_brewer(palette="Oranges") + geom_point()
p

p <- ggplot(mpg, aes(displ, hwy, shape = drv)) + geom_point()
p

p <- ggplot(mpg, aes(displ, hwy, size = cyl)) + geom_point()
p

p <- ggplot(mpg, aes(displ, hwy, size = cyl, alpha=0.5)) + geom_point()
p

Annotations

When constructing a data visualisation, it is often necessary to make annotations to the data displayed. Conceptually, an annotation supplies metadata for the plot: that is, it provides additional information about the data being displayed. From a practical standpoint, however, metadata is just another form of data. Because of this, the annotation tools in ggplot2 reuse the same geoms that are used to create other plots. However, to meet the specific needs that users often have when annotating plots, there are some helper functions in ggplot2 itself, and a number of other packages have extended ggplot2 in ways you may find helpful.

Adding a title and xlabel/ylabel for the plot.

When customising a plot, it is often useful to modify the titles associated with the plot, axes, and legends. To assist with this task ggplot2 provides the labs() helper function, which lets you set the various titles using name-value pairs like title = "My plot title", x = "X axis" or fill = "fill legend":

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = factor(cyl))) +
  labs(
    x = "Engine displacement (litres)", 
    y = "Highway miles per gallon", 
    colour = "Number of cylinders",
    title = "Mileage by engine size and cylinders",
    subtitle = "Source: http://fueleconomy.gov"
  )
p

Adjust the position of title for the plot:

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = factor(cyl))) +
  labs(
    x = "Engine displacement (litres)", 
    y = "Highway miles per gallon", 
    colour = "Number of cylinders",
    title = "Mileage by engine size and cylinders",
    subtitle = "Source: http://fueleconomy.gov"
  ) +
  theme(
    plot.title = element_text(hjust = 0.5, size = 18),
    plot.subtitle = element_text(hjust = 0.5, size = 12)
  )
p

The values supplied to labs() are typically text strings, with \n used to specify line breaks, but you can also supply mathematical expressions wrapped in quote(). The rules by which these expressions are interpreted can be found by typing ?plotmath.

values <- seq(from = -2, to = 2, by = .01)
df <- data.frame(x = values, y = values ^ 3)

ggplot(df, aes(x, y)) +
  geom_path() +
  labs(y = quote(f(x) == x^3))

ggplot2 includes the annotate() function for annotation

yrng <- range(economics$unemploy)
xrng <- range(economics$date)
caption <- paste(
  strwrap("Unemployment rates in the US have varied a lot over the years", 40),
  collapse = "\n"
)

p <- ggplot(economics, aes(date, unemploy)) +
  geom_line() +
  annotate(
    geom = "text", x = xrng[1], y = yrng[2], 
    label = caption, hjust = 0, vjust = 1, size = 4
  )
p

The convenience of the annotate() function comes in handy in other situations.

For example, a common form of annotation is to highlight a subset of points by drawing larger points in a different colour underneath the main data set.

To highlight vehicles manufactured by Subaru you could use this to create the basic plot:

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(
    data = dplyr::filter(mpg, manufacturer == "subaru"), 
    colour = "orange",
    size = 3
  ) +
  geom_point()
p

The problem with this is that the highlighted category would not be labelled. This is easily rectified using annotate()

p +
  annotate(geom = "point", x = 5.5, y = 40, colour = "orange", size = 3) +
  annotate(geom = "point", x = 5.5, y = 40) +
  annotate(geom = "text", x = 5.6, y = 40, label = "subaru", hjust = "left")

Theme

Use theme() or theme_*() if you just need to tweak the display of an existing theme.

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = factor(cyl))) +
  labs(
    x = "Engine displacement (litres)", 
    y = "Highway miles per gallon", 
    colour = "Number of cylinders",
    title = "Mileage by engine size and cylinders",
    subtitle = "Source: http://fueleconomy.gov"
  ) +
  # theme_bw() +
  theme_classic()
p

Output

Use ggsave() function to save the figure.

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = factor(cyl))) +
  labs(
    x = "Engine displacement (litres)", 
    y = "Highway miles per gallon", 
    colour = "Number of cylinders",
    title = "Mileage by engine size and cylinders",
    subtitle = "Source: http://fueleconomy.gov"
  ) +
  theme_classic()

if (!dir.exists('./output/figures')) {dir.create('./output/figures', recursive = T)}
ggsave('./output/figures/plot.pdf', plot = p, width = 5, height = 5)
ggsave('./output/figures/plot.png', plot = p, width = 5, height = 5)

Use pdf() png() jpeg() function to save the figure.

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = factor(cyl))) +
  labs(
    x = "Engine displacement (litres)", 
    y = "Highway miles per gallon", 
    colour = "Number of cylinders",
    title = "Mileage by engine size and cylinders",
    subtitle = "Source: http://fueleconomy.gov"
  ) +
  theme_bw()

pdf('./output/figures/plot1.pdf')
p
dev.off()

png('./output/figures/plot1.png')
p
dev.off()