the principle is easy to grasp: we simply overlay successive forms on top of each other
low level operations \(=\) everything is possible
because everything is an overlay on something already in place, the first plot is critical: you need to plan everything in advance! Doing simple things can be surprisingly difficult.
many commands are not really intuitive. After 10 years, I still look up ?par regularly.
You miss most of the show!
I'm a ggplot2 noob; base R is my home. I have a lot of sympathy for it.
much more user friendly: stacks all your layers before creating the graph and makes all the computations for you. You don't need to overthink how to set the stage any more.
millions of contributed packages
there are numerous packages out there which make good graphs with a user-friendly interface (i.e. with minimal user input)
example: ggpubr or fplot for distributions; ggcorrplot for correlations; highcharter for many things, etc.
in a single line of code you get a graph of (usually) very decent quality
very good for exploratory graphs
I strongly encourage you to learn how to work with Inkscape
it's just crazy how fast you can edit/create images
that's indispensable in your skill set, you'll save so much time!
later: add gif edition of the castle image
you can do exactly what you want, as precisely as you want
with practice, you can edit very rapidly
direct editing only comes after the graph has first been created, so you have to move back and forth between software
cannot be automated: all the work that you do on one graph has to be done again for a graph with new data
R has multiple functions to display data:
plot(): the CORE graphical function
points(), lines(), abline(), text(): used to plot additional data
density(), hist(), boxplot(), etc.
The function plot() is the main graphical function of R (more precisely, it's a generic function with many methods).
By default it is a scatterplot between two variables, but it can be used to do much more than that.
Some functions, like density(), preprocess the data and completely modify the behavior of plot() when you apply it to the preprocessed data. More on that later.
When you apply plot(), it creates a new graphic and the previous one is lost (of course there are exceptions...). To add several pieces of information, you'll need to use other functions.
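As a small illustration of the preprocessing point above, here is a minimal sketch. The choice of the built-in iris data is mine, only for the example. The same function plot() draws a scatterplot for a raw vector, but a smooth curve once the data has gone through density().

```r
x = iris$Sepal.Length

# Raw numeric vector: plot() draws the values against their index
plot(x)

# density() returns an object of class "density";
# plot() then uses a different method and draws the estimated density curve
d = density(x)
class(d)
plot(d)
```

Note that the second call to plot() replaces the first graph: each call starts a new graphic.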
plot() arguments
Main plot arguments relating to data:
x, y: the data
xlim, ylim: the limits of the plotting region
col, pch, lty, lwd: color, symbol, line type and line width
type: the type of plot
log: whether to put the x/y axes on a logarithmic scale
Using type = "n" hides the data, but EVERYTHING else is there. This can be useful when constructing complex graphics, i.e. when setting the stage.
plot(1:5, type = "n")
plot(1:20, pch = 1:20)
grid()
plot(1:5, pch = 16, cex = 1:5, main = "cex: modify point size")
Lots of color possibilities:
rgb(), hsv(), etc.
the RColorBrewer package
Color interpolation:
rainbow(n), heat.colors(n), etc., create vectors of n colors.
colorRampPalette(c("white", "blue"))(5): creates a vector of 5 colors between the colors white and blue.
Nice introduction to R colors in the R-stats UBC course.
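The color functions listed above can be tried out directly. This is only a sketch: the value n = 5 and the way the colors are displayed (filled squares) are arbitrary choices for the example.

```r
n = 5

rainbow(n)       # n colors along the rainbow
heat.colors(n)   # n "heat" colors

# n colors interpolated between white and blue
blues = colorRampPalette(c("white", "blue"))(n)

# Show the interpolated colors as filled squares with a black border
plot(1:n, rep(1, n), pch = 22, bg = blues, col = "black", cex = 5,
     axes = FALSE, ann = FALSE)

# rgb() builds a color from its red/green/blue components (0-1 scale here)
red = rgb(1, 0, 0)
```

All these functions return plain character vectors of color codes, so they can be passed to any col argument.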
Generate a 100-period Brownian motion \(x_{t+1} = x_{t} + \epsilon_{t}\), \(\epsilon_{t}\sim N(0,1)\).
Plot it, using rainbow() to set the color of each point.
To add points/lines onto an existing plot:
lines()
points()
They behave like the function plot() and take the same arguments (col, lty, cex, lwd, pch).
plot(1:5, ylim = c(-2, 5))
lines(1:5 - 1)
points(1:5 - 2)
In the following graph, the functions plot(), lines() and points() have been called. Can you tell which command produced each graphical element, and in what order the commands were called?
Re-generate the previous Brownian motion.
Plot it with both line and dots.
Generate another Brownian motion with \(\epsilon_{t}\sim N(0, 4)\).
Plot the two motions on a single graph. The second one should be in "firebrick" color, with a thick dashed line and triangle symbols.
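One possible way to approach these exercises is sketched below. It rests on a few assumptions of mine: the motion is built with cumsum() over the shocks, N(0, 4) is read as a variance of 4 (hence sd = 2), and the seed is arbitrary.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
n = 100

# x_{t+1} = x_t + eps_t is the cumulative sum of the shocks
x1 = cumsum(rnorm(n))           # eps ~ N(0, 1)
x2 = cumsum(rnorm(n, sd = 2))   # eps ~ N(0, 4): variance 4, so sd = 2

# Set the stage first: the y limits must accommodate both series
plot(x1, type = "o", pch = 16, ylim = range(x1, x2), xlab = "t", ylab = "x")

# Second motion: firebrick, thick dashed line, triangle symbols
lines(x2, col = "firebrick", lwd = 3, lty = 2)
points(x2, col = "firebrick", pch = 17)
```

The ylim argument has to be set in the first call to plot(): lines() and points() only overlay on the existing plotting region and never resize it.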
The function abline() draws lines. Its arguments are:
h: coordinate of a horizontal line
v: coordinate of a vertical line
a, b: intercept (a) and slope (b) of a straight line. A shorthand exists: it can take the result of an OLS regression (function lm()) instead.
plot(iris$Sepal.Length, iris$Petal.Width)
abline(lm(Petal.Width ~ Sepal.Length, iris))
abline(h = c(1, 2), v = c(5, 7), col = "gray", lty = 3)
You want to illustrate the relation between the variables "Sepal.Length" and "Petal.Width" for each species of the iris data.
You can add text to an existing plot with the function text(). The most important arguments are:
plot(5:1, col = "firebrick", pch = 18, xlim = c(0, 6))
text(1, 5, "pos = default")
text(2:5, 4:1, paste0("pos = ", 1:4), pos = 1:4)
Base R can be surprisingly painful for doing seemingly simple stuff.
Base R is so painful, that if you stick to it, it will make you a good programmer (or a masochist!).
Remember though: it's not just painful, it's also extremely powerful!
It's very easy to write code that is specific to your current data! In fact, it's usually the first thing we do, and it works well.
plot(iris$Petal.Length, iris$Sepal.Length, col = iris$Species, pch = 20, cex = 2)
text(1.5, 5, "Setosa", font = 2, cex = 4)
text(4, 6, "Versicolor", font = 2, cex = 4, col = 2)
text(6, 7, "Virginica", font = 2, cex = 4, col = 3)
If your data changes, even slightly, your code is messed up.
Changing Sepal.Length into Sepal.Width loses the labels (they end up outside the plotting region):
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
text(1.5, 5, "Setosa", font = 2, cex = 4)
text(4, 6, "Versicolor", font = 2, cex = 4, col = 2)
text(6, 7, "Virginica", font = 2, cex = 4, col = 3)
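A quick check makes the problem concrete. The sketch below compares the hard-coded y positions of the three labels with the range of each variable; the name label_y is just a helper introduced for this illustration.

```r
# Hard-coded y positions of the three labels
label_y = c(5, 6, 7)

range(iris$Sepal.Length)   # the labels fit within this range
range(iris$Sepal.Width)    # the labels lie above this range

# TRUE for every label that lies beyond the largest Sepal.Width
label_y > max(iris$Sepal.Width)
```

Since text() only overlays on the existing plotting region, labels placed beyond the range of the y axis are simply not visible.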
The data always changes!
If you want to replicate a hard coded graph to a new data set you:
I think I don't need to write that each of these three steps is highly error-prone and can cost you dearly.
very simple: don't hard code!
OK, here come some tips
BAD
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
text(1.5, 5, "Setosa", font = 2, cex = 4)
text(4, 6, "Versicolor", font = 2, cex = 4, col = 2)
text(6, 7, "Virginica", font = 2, cex = 4, col = 3)
GOOD
FONT = 2
CEX = 4
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
text(1.5, 5, "Setosa", font = FONT, cex = CEX)
text(4, 6, "Versicolor", font = FONT, cex = CEX, col = 2)
text(6, 7, "Virginica", font = FONT, cex = CEX, col = 3)
when you decide to place some text here, or a legend there, how do you make that decision?
you decide based on heuristics (although you may not even notice there was a decision process!)
the game is to extract the (often implicit) rules that led you to a decision
if you manage to make the heuristic explicit, you win: now you can automate it!
Remember when I asked to put the names in the middle of the points?
BAD
FONT = 2
CEX = 4
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
text(1.5, 5, "Setosa", font = FONT, cex = CEX)
text(4, 6, "Versicolor", font = FONT, cex = CEX, col = 2)
text(6, 7, "Virginica", font = FONT, cex = CEX, col = 3)
GOOD
FONT = 2
CEX = 4
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
bary = aggregate(cbind(Petal.Length, Sepal.Width) ~ Species, iris, mean)
text(bary[1, 2], bary[1, 3], "Setosa", font = FONT, cex = CEX)
text(bary[2, 2], bary[2, 3], "Versicolor", font = FONT, cex = CEX, col = 2)
text(bary[3, 2], bary[3, 3], "Virginica", font = FONT, cex = CEX, col = 3)
BAD
FONT = 2
CEX = 4
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
bary = aggregate(cbind(Petal.Length, Sepal.Width) ~ Species, iris, mean)
text(bary[1, 2], bary[1, 3], "Setosa", font = FONT, cex = CEX)
text(bary[2, 2], bary[2, 3], "Versicolor", font = FONT, cex = CEX, col = 2)
text(bary[3, 2], bary[3, 3], "Virginica", font = FONT, cex = CEX, col = 3)
GOOD
FONT = 2
CEX = 4
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
categ_val = levels(iris$Species)
for(i in seq_along(categ_val)){
  data = iris[iris$Species == categ_val[i], ]
  text(mean(data$Petal.Length), mean(data$Sepal.Width), categ_val[i], font = FONT, cex = CEX, col = i)
}
Apply recursively Tip 1, Tip 2 and Tip 3 until you can't any more.
BAD
FONT = 2
CEX = 4
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
categ = levels(iris$Species)
for(i in seq_along(categ)){
  data = iris[iris$Species == categ[i], ]
  text(mean(data$Petal.Length), mean(data$Sepal.Width), categ[i], font = FONT, cex = CEX, col = i)
}
GOOD
FONT = 2
CEX = 4
x = iris$Petal.Length
y = iris$Sepal.Width
categ = iris$Species
plot(x, y, col = categ, pch = 20, cex = 2)
categ_val = levels(categ)
for(i in seq_along(categ_val)){
  who = categ == categ_val[i]
  text(mean(x[who]), mean(y[who]), categ_val[i], font = FONT, cex = CEX, col = i)
}
Can those tips be concretely helpful?
To find out, let's summon the copy-paste demon.
plot(iris$Petal.Length, iris$Sepal.Width, col = iris$Species, pch = 20, cex = 2)
text(1.5, 5, "Setosa", font = 2, cex = 4)
text(4, 6, "Versicolor", font = 2, cex = 4, col = 2)
text(6, 7, "Virginica", font = 2, cex = 4, col = 3)
The demon has immense powers
FONT = 2
CEX = 4
x = iris$Petal.Length
y = iris$Sepal.Width
categ = iris$Species
plot(x, y, col = categ, pch = 20, cex = 2)
categ_val = levels(categ)
for(i in seq_along(categ_val)){
  who = categ == categ_val[i]
  text(mean(x[who]), mean(y[who]), categ_val[i], font = FONT, cex = CEX, col = i)
}
The demon is weak
If you've followed the tips, guess what:
You can create a function for your graph for free!
Before
FONT = 2
CEX = 4
x = iris$Petal.Length
y = iris$Sepal.Width
categ = iris$Species
plot(x, y, col = categ, pch = 20, cex = 2)
categ_val = levels(categ)
for(i in seq_along(categ_val)){
  who = categ == categ_val[i]
  text(mean(x[who]), mean(y[who]), categ_val[i], font = FONT, cex = CEX, col = i)
}
After
scatter_name = function(x, y, categ, font = 2, cex = 4){
  plot(x, y, col = categ, pch = 20, cex = 2)
  categ_val = levels(categ)
  for(i in seq_along(categ_val)){
    who = categ == categ_val[i]
    text(mean(x[who]), mean(y[who]), categ_val[i], font = font, cex = cex, col = i)
  }
}
scatter_name(iris$Petal.Length, iris$Sepal.Width, iris$Species)
guards you against, or limits, copy-paste problems
facilitates graph replications
you don't have to think about implementation details when running the function (reduces mental load)
it's very easy to include new features to the functions, and all the calls benefit from it
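To illustrate the last point, here is a sketch of one possible new feature: forwarding extra arguments to plot() through `...`. The `...` argument is my addition for the example, not part of the original function.

```r
scatter_name = function(x, y, categ, font = 2, cex = 4, ...){
  # '...' is forwarded to plot(): titles, axis labels, limits, etc.
  plot(x, y, col = categ, pch = 20, cex = 2, ...)
  categ_val = levels(categ)
  for(i in seq_along(categ_val)){
    who = categ == categ_val[i]
    text(mean(x[who]), mean(y[who]), categ_val[i], font = font, cex = cex, col = i)
  }
}

# Existing calls keep working unchanged...
scatter_name(iris$Petal.Length, iris$Sepal.Width, iris$Species)

# ...and new calls can customize the plot
scatter_name(iris$Petal.Length, iris$Sepal.Width, iris$Species,
             xlab = "Petal length", ylab = "Sepal width", main = "Iris")
```

The change is made once, inside the function, and every call in your scripts can benefit from it without being rewritten.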
Code telling what you do and not how you do it increases productivity tremendously.
FONT = 2
CEX = 4
x = iris$Petal.Length
y = iris$Sepal.Width
categ = iris$Species
plot(x, y, col = categ, pch = 20, cex = 2)
categ_val = levels(categ)
for(i in seq_along(categ_val)){
  who = categ == categ_val[i]
  text(mean(x[who]), mean(y[who]), categ_val[i], col = i, font = FONT, cex = CEX)
}
scatter_name(iris$Petal.Length, iris$Sepal.Width, iris$Species)
The function call will always be easier to understand than the code written out in full.
you have a presentation in 30 minutes and have to finish that graph
you're making a graph that you think you will never replicate
the graph is really simple (in terms of lines of code!)
thinking in functions will change the way you code
it will clarify your code: it will be easier to understand and share, and less error-prone
because functions have a high fixed cost but zero marginal cost, you'll gain a lot of productivity
Remember the scatterplot with different colors and a linear fit? Let's redo it.
plot the scatterplot between the variables "Sepal.Length" and "Petal.Width" for each species of the iris data, and add a linear fit
use segments() to shorten the fit to the width of the scatterplot
transform it into a function, with the appropriate arguments
add the argument line_extend giving how much the length of the segment should be extended, in % of the graph width (default is 0)
check the arguments given by the user
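The steps of this exercise could be sketched as follows. This is only one possible solution, under assumptions of mine: the function name scatter_fit and its arguments are hypothetical, the extension given by line_extend is split equally between both ends of each segment, and the argument checks rely on a plain stopifnot().

```r
scatter_fit = function(x, y, categ, line_extend = 0, pch = 20, lwd = 2){
  # Check the arguments given by the user
  stopifnot(is.numeric(x), is.numeric(y), length(x) == length(y),
            length(categ) == length(x),
            is.numeric(line_extend), length(line_extend) == 1, line_extend >= 0)
  categ = as.factor(categ)

  plot(x, y, col = categ, pch = pch)

  # Convert line_extend (in % of the graph width) into data units
  usr = par("usr")   # limits of the plotting region: x1, x2, y1, y2
  ext = line_extend / 100 * (usr[2] - usr[1])

  categ_val = levels(categ)
  fits = list()
  for(i in seq_along(categ_val)){
    who = categ == categ_val[i]
    cf = coef(lm(y[who] ~ x[who]))   # intercept and slope for this category
    # The segment spans the x range of the category, plus the extension
    x0 = min(x[who]) - ext / 2
    x1 = max(x[who]) + ext / 2
    segments(x0, cf[1] + cf[2] * x0, x1, cf[1] + cf[2] * x1, col = i, lwd = lwd)
    fits[[categ_val[i]]] = cf
  }
  invisible(fits)   # return the coefficients without printing them
}

scatter_fit(iris$Sepal.Length, iris$Petal.Width, iris$Species, line_extend = 10)
```

Unlike abline(), which always crosses the whole plotting region, segments() takes explicit start and end coordinates, which is what makes it possible to restrict each fit to its own cloud of points.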
So far we've seen only data content. But there's much more to make a good graph: all the surrounding information!
plot() arguments:
xlab, ylab: x/y axis labels
sub, main: subtitle and main title
axes: whether to draw the axes
ann: if FALSE, removes the x/y labels and the titles
legend(): adds a legend
title(): adds axis labels and titles (close to the previous plot() arguments)
axis(): draws the axes of a plot
plot(-1:1, -1:1, xlab = "xlab", ylab = "ylab", main = "main", sub = "sub", type = "n")
text(0, 0, 'plot(-1:1, -1:1, xlab = "xlab", ylab = "ylab", main = "main", sub = "sub")')
Do the scatterplot between the variables "Sepal.Length" and "Petal.Width" of the iris data.
You can add a title after a plot is done with title().
plot(1:5)
title(main = "This is the title", sub = "This is the subtitle")
Use mtext() to add text in the margin of the graph. It can be used to insert a title.
plot(1:5)
mtext("That's a basic graph", side = 3, line = 1, font = 2, adj = 0)
You can add a legend to clarify the content of a plot. A legend is a piece of information appearing inside the plotting region.
Here are the main arguments:
x, y: the location of the legend (top left corner). Shorthands exist: you can use "topleft", "right", etc. instead.
legend: the content of the legend (a character vector)
pch, lty, col, lwd: the pch, lty, col, lwd associated with the legend vector
bty: whether or not to show the legend box ("o" is the default, "n" removes it)
Keep in mind that there are many more arguments.
How to have a legend at the bottom? Here's some ready-made code.
legend_bottom = function(..., bty = "n"){
  # Original credits to: https://stackoverflow.com/questions/3932038/plot-a-legend-outside-of-the-plotting-area-in-base-graphics/3932558
  op = par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE)
  on.exit(par(op))
  plot(0, 0, type = 'n', axes = FALSE, ann = FALSE)
  legend("bottom", horiz = TRUE, bty = bty, ...)
}
plot(iris$Sepal.Length, iris$Petal.Width, col = iris$Species, pch = 15)
legend_bottom(legend = levels(iris$Species), col = 1:3, pch = 15)
Oh, yeah, the legend ends up being too close to the label... Remember about "setting the stage"? To make it nicer, you'd need to increase the bottom margin beforehand with, e.g., par(mar = c(7, 4, 2, 2)) :-/ That's one of the reasons why ggplot2 is so much easier to handle.
You can modify the axes at will.
axis(i) draws the ith axis, with 1: bottom, 2: left, 3: top and 4: right.
plot(1:5, axes = FALSE)
axis(1)
axis(4)
The function axis() has the following main options:
side: where to draw the axis (1: bottom, 2: left, 3: top, 4: right)
at: where the ticks are drawn
labels: the labels at the ticks (usually numbers)
lwd: line width of the axis line (the horizontal line if side is 1 or 3)
lwd.ticks: the line width of the ticks. Default is equal to lwd
tck: length of the ticks (as a fraction of the plotting region)
las: orientation of the text
There are many other options.
plot(1:5, axes = FALSE, ann = FALSE)
box() # draw a simple box
axis(1, 1:4, c("First", "Second", "Third", "Fourth"), cex.axis = .9)
axis(3, 5, "Fifth", lwd.ticks = 2)
axis(2, 1:3, c("One", "Two", "Three"), las = 2)
axis(2, 4, "Four", col.axis = "red")
axis(2, 5, "Five", col.ticks = "blue", lwd = 2)
You can add mathematical expressions to graphics. Advice: for single mathematical formulas, use only the function substitute().
Write a formula inside the function substitute() (a bit like in LaTeX). Some elements that can compose the formula:
x[i] for subscript \(x_{i}\)
x^i for superscript \(x^{i}\)
x %in% y for \(x\in y\)
alpha, beta, gamma, etc. for Greek letters
paste(x, y, z): juxtaposes the three components
See ?plotmath for more details.
substitute() takes a second argument. It can be used to replace some variables with numbers:
curve(sin(x)*sqrt(x), 1, 10000, log = "x", axes = FALSE, ann = FALSE)
title(main = substitute(sin(x)%*%sqrt(x)))
box() ; axis(2)
for(i in 0:4) axis(1, at = 10**i, substitute(10^p, list(p = i)), cex.axis = 2)
The function par() contains most graphical parameters (there are 72...).
You can change the graphical parameters directly with par().
par(cex = 2, lwd = 2)
plot(1:5, type = "o")
All subsequent plots will use these parameters by default. To reset them, you can either set them back by hand:
par(cex = 1, lwd = 1)
or save the old values and restore them afterwards:
op = par(cex = 2, lwd = 2) # save old params
# make the graphs
par(op) # reinitialize it
Some useful parameters:
mar: the margins of the plot, i.e. the space between the axes and the edge of the plot. It's a vector of length 4 (bottom, left, top, right).
cex, cex.axis, cex.lab, cex.main, cex.sub: expansion factor for different situations
col, col.axis, col.lab, col.main, col.sub: the corresponding colors
bg: color of the background (default is white)
family: font family: can be "serif", "sans" and "mono"
las: orientation of text (1: horizontal, 3: vertical)
See ?par for more details.
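A few of these parameters can be combined as in the sketch below. The specific values (margins, colors, font family) are arbitrary choices for the example.

```r
# Save the old parameters so that they can be restored afterwards
op = par(mar = c(4, 4, 1, 1),     # bottom, left, top, right margins (in lines)
         las = 1,                 # tick labels always horizontal
         cex.axis = 0.8,          # smaller tick labels
         col.lab = "firebrick",   # color of the x/y axis labels
         family = "serif",        # font family
         bg = "gray95")           # background color

plot(iris$Sepal.Length, iris$Petal.Width, pch = 20)

par(op)                           # restore the previous parameters
```

par() returns the previous values of the parameters it changes, which is what makes the op = par(...) / par(op) pattern work.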
You can combine multiple graphs in one. The simplest way is to use mfrow:
op = par(mfrow = c(2, 2))
for(i in 1:4) curve(sin(x) * x**i, -10, 10, ylab = paste0("sin(x) * x^", i))
par(op) # reset to a single frame
I'm afraid you won't be able to make nice graphs from this bare bones introduction!
It only scratches the surface of the topic, but I hope that you got some insights along the way! And especially that the function-based approach convinced you!
Cheers!