Data visualization is the art of summarizing information from a data source into a pleasant, non-distorted, informative visual representation. Pleasant and informative: those are the keys to delivering a high-impact message. Miss one of them and nobody will listen. The objective of this course is to give you the keys to understanding what a good visualization is, and to provide some tools to make such visualizations.

The course will cover some theory: we will talk about things such as color, placement, font and how the brain perceives shapes. We will also work our way through the powerful R graphics engine and the ggplot2 library. Throughout this course there will be many small assignments.

Data analysis II

Visualisation

1 / 80

Laurent Bergé

University of Bordeaux, BxSE

09/12/2021

Who am I?

Laurent Bergé

2 / 80

What do I do?

My fields

  • Applied economics ( = Data + methods)
  • Economics of Innovation ( = Large data)
  • (a bit of) Statistics ( = Computational methods)
3 / 80

R and me

The story

  • I met R during my master's in 2010
  • since then... it's a love story

The outcome of our relationship

  • 7 packages, 6 of which are public
  • the packages cover:
    • econometrics
    • data handling
    • statistical models
    • package development
    • graphics
4 / 80

This course

Data analysis II

  1. Data visualization
  2. Webscraping
5 / 80

Data visualization

This crash course is not...

  • really a tutorial on how to make graphs in R

it's rather about...

  • opening your eyes to what makes a good statistical graph

  • learning the tradeoffs in graph making

6 / 80

Q: Why do we need data visualisation?

8 / 80

A: Because numbers suck

9 / 80

Numbers suck

Sort these countries by GDP per capita:

Country          GDP per capita (US$, PPP)
Belgium          50442.05
France           45149.10
Germany          53752.03
Spain            39907.56
United Kingdom   45504.84
10 / 80

Numbers suck

Sort these countries by GDP per capita:
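
A minimal base R sketch of the kind of graph this slide calls for, using the five values from the table on the previous slide. The object names (`gdp`, `gdp_sorted`) are illustrative, and base `barplot` is an assumption made here for portability; the original slide may have used a different graph.

```r
# GDP per capita (US$, PPP), values copied from the table on the previous slide
gdp <- c(
  Belgium = 50442.05,
  France = 45149.10,
  Germany = 53752.03,
  Spain = 39907.56,
  "United Kingdom" = 45504.84
)

# Sorting before plotting lets the ranking be read directly off the bars
gdp_sorted <- sort(gdp, decreasing = TRUE)

barplot(
  gdp_sorted,
  ylab = "GDP per capita (US$, PPP)",
  main = "GDP per capita, sorted",
  cex.names = 0.8   # slightly smaller country labels
)
```

Once the bars are sorted, the ranking no longer requires comparing digits one by one.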

11 / 80

Numbers suck II

Numbers
Graph
Anscombe

What can you say about these numbers?

Summary statistics: mean_X = 54.3, sd_X = 16.8, mean_Y = 47.8, sd_Y = 26.9, cor_XY = -0.0645

x      y
55.4   97.2
51.5   96
46.2   94.5
42.8   91.4
40.8   88.3
38.7   84.9
35.6   79.9
33.1   77.6
...    ...
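
The "Anscombe" tab refers to Anscombe's quartet, which ships with base R as the `anscombe` data set. The sketch below is independent of the numbers shown on this slide: it computes the same kind of summary statistics for the four sets and then plots them. The object name `stats` is illustrative.

```r
# `anscombe` ships with base R (package datasets): columns x1..x4 and y1..y4
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), sd_x = sd(x), mean_y = mean(y), sd_y = sd(y), cor_xy = cor(x, y))
})
colnames(stats) <- paste("set", 1:4)
print(round(stats, 2))

# Nearly identical numbers, very different pictures: plot the four sets
old_par <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), main = paste("Set", i))
}
par(old_par)
```

The summary table alone gives no hint of how different the four scatterplots are.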
12 / 80

Numbers suck III

Numbers
Graph

Do you see a pattern?

13 / 80

Why should you take visualization seriously?

14 / 80

Because love is an axis story

15 / 80

No, it's still because numbers suck

  • remember that we're very limited human beings!

  • we can only grasp and understand the world with our 5 senses

  • abstract concepts (e.g. numbers) are tied to these physical senses

  • to compare numbers we need to visualize them in our head

  • when we graphically represent numbers, we cut the middle man

16 / 80

Experiment: give the students the numbers 1, 10, 100, 1000 and tell them to close their eyes

Visualization: why?

  • powerful way to understand the data

  • powerful way to send a message (not the same as the previous point!)

  • most graphs suck (Excel, do you hear me?): it's an easy way to stand out

17 / 80

Event study graphs next year: many examples of how to make your message impactful based on a set of data. The problem is that the students don't see the value of it if they haven't tried hard to make a nice graph, so I should give them an in-class assignment that they try hard to complete. Only then can I come with theory and advice.

Why R?

  • Powerful: imagination is the limit, you can graph anything you have in your mind (really)

  • Versatile: there are so many packages...

  • Communication: smooth integration in HTML documents, create a website in minutes (Rmarkdown)

18 / 80

Small gallery of nice graphs

19 / 80

Shiny web apps

20 / 80

Maps

Graph
Code
21 / 80

Animations

Graph
Code
22 / 80

Nice spider stuff

23 / 80

Purpose of a graph

Good graphs have a main purpose.

Main purposes:

  • looking nice

  • send a message

  • exploratory

24 / 80

Purpose of a publication graph

Good graphs have a main purpose.

Main purposes:

  • looking nice

  • send a message ( = we're here!)

  • exploratory

25 / 80

Graph theory

26 / 80

Publication graph

What's the best graph?

  1. informative ( = clear and not misleading)
  2. pleasant

But...

  • the medium should not take precedence over the content!
  • graphs which are too beautiful (too arty) distract the reader from the message
27 / 80

informative: get the information quickly and unambiguously

pleasant: looks nice -- hard to tell

stacking information is good but can reduce clarity. There are ways to stack information without reducing clarity too much: colors / shapes

The classic misleading graph

Problem?
Reply
28 / 80

Maybe too nice?

Basic
Too nice?
29 / 80

The 3 components of a graph

  1. content

  2. clarity

  3. attractiveness

30 / 80

Content

  • the information you want to share/observe

  • typically: relationships, time series, consequence of events, conditional distributions

  • you can stack several pieces of content in the same graph

31 / 80

Clarity

  • how easy it is to understand the content

  • defined by its measure: the time it takes to extract a piece of information. The lower the time, the higher the clarity

32 / 80

Attractiveness

  • is the graph pleasant to look at?

  • do you stare at it for its own sake?

33 / 80

Graphs as a maths problem

Graph making is just about optimization

Precision approach (us)

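$$\text{graph} = \arg\max_{g} \ \text{clarity}(g) \quad \text{s.t.} \quad \text{content}(g) \geq \tau, \quad \text{attractiveness}(g) \geq \eta$$

Infography approach

$$\text{graph} = \arg\max_{g} \ \text{attractiveness}(g) \quad \text{s.t.} \quad \text{content}(g) \geq \tau, \quad \text{clarity}(g) \geq \gamma$$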
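
A toy sketch of the two optimization problems, assuming the constraints are lower bounds (thresholds). The candidate graphs, their scores and the helper `pick_graph` are all made up for illustration; in practice nobody scores graphs on a numeric scale like this.

```r
# Hypothetical candidate graphs with made-up scores on a 0-10 scale
candidates <- data.frame(
  graph = c("plain bars", "annotated bars", "3D pie", "dense dashboard"),
  content = c(2, 3, 2, 6),
  clarity = c(9, 8, 3, 4),
  attractiveness = c(5, 6, 9, 5)
)

# Maximize one criterion subject to minimum thresholds on the others
pick_graph <- function(df, maximize, thresholds) {
  feasible <- rep(TRUE, nrow(df))
  for (v in names(thresholds)) {
    feasible <- feasible & df[[v]] >= thresholds[[v]]
  }
  kept <- df[feasible, , drop = FALSE]
  if (nrow(kept) == 0) return(NA_character_)
  kept$graph[which.max(kept[[maximize]])]
}

# Precision approach: maximize clarity s.t. content >= tau, attractiveness >= eta
precision <- pick_graph(candidates, "clarity", list(content = 3, attractiveness = 5))

# Infography approach: maximize attractiveness s.t. content >= tau, clarity >= gamma
infography <- pick_graph(candidates, "attractiveness", list(content = 2, clarity = 3))

print(c(precision = precision, infography = infography))
```

With these made-up scores the two approaches pick different graphs from the same set of candidates, which is the point of writing them as two separate problems.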

34 / 80

Tradeoffs

  • adding content necessarily reduces clarity (there are solutions to limit that)

  • the relationship between attractiveness and content/clarity is not clear. Graphs which are visually too attractive usually reduce clarity or weaken the message.

35 / 80

Example (iris again)

Graph 1
Criticism 1
Graph 2
Criticism 2
36 / 80

Golden rules of graph making

  1. Thou shalt not expect the reader to be interested in what you do

  2. Thou shalt not expect the reader to spend more than 5 seconds on your graph

37 / 80

Lessons of the rules

  • Clarity is cardinal

    • what is this graph about?

    • what does it represent?

    • what's the value of that point?

    • a graph should be self explanatory

  • Attractiveness is important, but second tier

    • hooks the reader

    • makes the reader stay longer and maybe decide to put in some effort to understand your graph

38 / 80

clarity: think of it this way: there is a threshold of effort above which the reader will just stop

attractiveness: that's why infographics in newspapers are so nice: to hook the uninterested reader

The reader as a shopper

The money is the effort. The reader has little money.

  • Clarity = value for money

    • you can extract more content with lower effort

    • if you don't get much for your money, you just switch to another product

  • Attractiveness: increases the amount of money you wanna spend. It's the same effect as advertising:

    Buy that ticket to become a millionaire!
    vs
    Buy that ticket and you may become a millionaire if you win a lottery with probability one over one billion.
39 / 80

Typology of graphs

For publication (i.e. for the world)

  • low content: to send one or two messages
  • clarity should be maximal: the reader must understand fast
  • high attractiveness: to hook the reader

For exploration (i.e. for yourself)

  • high content: you want the most information in a single graph
  • tolerance for lower clarity: you know what you're doing, and that's the price to pay for having more content in a single graph, although you still need a minimum of clarity
  • attractiveness: isn't the priority

You still have the same optimization problem to solve but the thresholds are different!

40 / 80

How to solve the optimization problem?

41 / 80
  1. content

  2. clarity

  3. attractiveness

42 / 80

Attractiveness

  • what is considered nice largely depends on a consensus which can evolve over time

  • it depends on preferences that vary between people and even within the same person over time (just have a look at your haircuts of 10 years ago!)

  • we won't cover attractiveness since we can't please everyone

But...

  • there are guiding principles of proportions and colors

  • good news: clear graphs usually look ok

43 / 80
  1. content

  2. clarity

  3. attractiveness

44 / 80

Content

  • tailor your content, ask yourself:

    • is that information important?
    • what would the graph look like without it?
    • cut anything not necessary
  • add elements of context if needed (do they strengthen your point?)

  • is the graph faithful?

  • hierarchy of information! (the main message should be emphasized vis-à-vis other messages/the elements of context)

  • do I get the takeaway just from looking at the graph? Or do I need to read the text / get an oral explanation to get the central message?

45 / 80

Next year: add examples for each of those cases. It is very important to add concrete examples: make an in-class exercise of having to graph an idea, and come with these concepts afterwards (so that the students can see what it really means).

  1. content

  2. clarity

  3. attractiveness

46 / 80

Clarity

47 / 80

The good news

There are many tips to make graphs clear!

48 / 80

What humans are good at

  • discerning colors

  • discerning shapes

  • comparing heights

Let's leverage these three properties to make good graphs!

49 / 80

Graph with and without a horizontal grid: narrow graph => OK; wide graph => hard to compare

Colors

  • colors are extremely powerful: used properly, they add an extra layer of information without requiring any extra graph-space and have almost no processing cost for the reader

Moral of the story: Abuse colors!

50 / 80

Colors: what for?

  • two main uses of color:

    • to distinguish categories
    • to represent intensity
  • two different usages = two very different color picks!

51 / 80

Colors: OK, but which colors?

Colors: a tricky topic!

52 / 80

Colors: a mini primer

Colors can be decomposed into

  • Hue
  • Saturation
  • Lightness

Hue

  • pure color

Saturation

  • quantity of grey added to the color
  • 0: only grey
  • 100: no grey

Lightness

  • quantity of black (<50) or white (>50) added to the color
  • 0: black
  • 50: pure color
  • 100: white
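
A short sketch of the hue/saturation/lightness decomposition described above, using the standard HSL-to-RGB conversion. `hsl_to_hex` is an illustrative helper written for this example, not a base R function; the three hues and the saturation value of 60 are arbitrary choices.

```r
# Convert hue (0-360), saturation (0-100) and lightness (0-100) to a hex color.
# `hsl_to_hex` is an illustrative helper, not a base R function.
hsl_to_hex <- function(h, s, l) {
  s <- s / 100
  l <- l / 100
  chroma <- (1 - abs(2 * l - 1)) * s
  h6 <- (h %% 360) / 60                     # position on the 6-sector color wheel
  x <- chroma * (1 - abs(h6 %% 2 - 1))
  base <- switch(floor(h6) + 1,
    c(chroma, x, 0), c(x, chroma, 0), c(0, chroma, x),
    c(0, x, chroma), c(x, 0, chroma), c(chroma, 0, x)
  )
  m <- l - chroma / 2                       # shift towards black or white
  v <- pmin(pmax(base + m, 0), 1)           # guard against rounding outside [0, 1]
  rgb(v[1], v[2], v[3])
}

hues <- c(red = 0, green = 120, blue = 240)
pure <- sapply(hues, hsl_to_hex, s = 100, l = 50)  # pure colors
soft <- sapply(hues, hsl_to_hex, s = 60, l = 50)   # same hues, less saturation
print(rbind(pure, soft))
```

Keeping the hue fixed and lowering only the saturation gives softer variants of the same colors.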
53 / 80

Colors: general rule

Don't use pure colors!

They're a bit of an eyesore. They're very bright, making them hard to read. That's why I had to use a heavy font. Note that the brightness depends on the hue: hues are not equal light-wise!

These are the same hues. I've just reduced saturation. The text is easier to read. I can even remove the heavy font!

54 / 80

Colors: distinguishing categories

  • use colors that are "different" but have some harmony

  • how to find harmonious color sets?

Don't do it yourself!

55 / 80

Distinct colors: 3 rules

General rules (and like French grammar, there are always exceptions!)

  1. use colors that are "clear-cut" (we remember colors by their names, e.g. using blue + half-blue-half-green + green makes them hard to remember)

  2. use different hues, not only different shades (shade variations are harder to remember and discern than hue variations; having both is even better)

  3. don't use colors to distinguish too many categories:

    • 2-4: best
    • 5: best avoided, but can be OK depending on the graph and if the palette is good
    • >5: forget about it (too big a hit on clarity)
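
One possible way to follow these rules in R without hand-picking colors: take a ready-made qualitative palette and refuse to color-code too many categories. The helper `category_colors` and its default cap of 5 are illustrative; `palette.colors()` with the "Okabe-Ito" palette is part of base R from version 4.0.0.

```r
# Return one color per category from a ready-made qualitative palette.
# `category_colors` is an illustrative helper; the cap mirrors rule 3.
category_colors <- function(categories, max_categories = 5) {
  categories <- unique(as.character(categories))
  if (length(categories) > max_categories) {
    stop("Too many categories for color coding: cut or group content first.")
  }
  # Okabe-Ito is built into R (>= 4.0.0); its first entry is black, so skip it
  cols <- palette.colors(n = length(categories) + 1, palette = "Okabe-Ito")[-1]
  setNames(unname(cols), categories)
}

species_cols <- category_colors(iris$Species)
print(species_cols)
```

Calling the helper with six or more categories stops with an error, which is a blunt reminder to revisit the content of the graph first.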
56 / 80

1) People remember colors by their names, hence a color that is half blue, half green is harder to remember than something blue. IF TIME => show two palettes with these differences

2) Same comment: don't use shades of the same hue. Light blue/dark blue => hard to remember. IF TIME => show two palettes with these differences

IF TIME: 4 panes: one with the names and associated colors, then the graph; same set of two but with clear-cut colors

3) What to do if I have 5+ categories to display? Cut content! In general, if you have a graph with 5+ categories to display, ask yourself whether there is a problem in terms of content

Distinct colors: Rule 3

OK, but what if I really have to display 6+ categories?

57 / 80

Distinct colors: Example

Graph 1
Graph 2
58 / 80

Colors: intensity

Two main types of things to represent:

  • min-max: dichotomic representation (you can use >2 colors though)
  • negative-zero-positive: at least three colors (dichromatic works well)
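
A base R sketch of the two cases: a sequential ramp for min-max data and a diverging ramp, centered on zero, for negative-zero-positive data. The hex colors, the example values and the helper `diverging_colors` are illustrative choices, not taken from the slides.

```r
# Sequential ramp for min-max data (e.g. unemployment rates): light to dark
sequential_pal <- colorRampPalette(c("#F7FBFF", "#08306B"))(9)

# Diverging ramp for negative-zero-positive data (e.g. correlations).
# An odd number of colors gives zero its own neutral middle color.
diverging_pal <- colorRampPalette(c("#2166AC", "#F7F7F7", "#B2182B"))(11)

# Map values to a diverging palette using limits symmetric around zero
diverging_colors <- function(x, pal) {
  lim <- max(abs(x))
  breaks <- seq(-lim, lim, length.out = length(pal) + 1)
  pal[findInterval(x, breaks, all.inside = TRUE)]
}

correlations <- c(-0.8, -0.2, 0, 0.4, 0.8)
cols <- diverging_colors(correlations, diverging_pal)
barplot(correlations, col = cols, ylab = "Correlation")
```

Using symmetric limits matters: otherwise the neutral color would sit at the middle of the observed range rather than at zero.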
59 / 80

A) things with positive values only: unemployment, earnings, whatever

B) correlation, deviations from the mean, etc

Colors: intensity

Type of colors
min-max
neg-0-pos
60 / 80

Color: Accessibility

  • about 5% of the population is colorblind
61 / 80

Source for the numbers: https://www.colorblindguide.com/post/colorblind-people-population-live-counter

Mostly men: 8% of men, 1 in 200 women.

The main consequence is that there is no silver bullet to represent discrete color points

Color accessibility in R

Two dedicated packages (among others):

62 / 80

Color accessibility

scico
viridis
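
A quick check that can be run without installing either package, assuming the viridis approximation that ships with base R (`hcl.colors()`, available from R 3.6.0). The `luma` helper is illustrative and uses a common brightness approximation; it is only a rough proxy for how the palette reads in grayscale, not a test of color-blind accessibility.

```r
# Five colors from the viridis palette as shipped with base R (grDevices)
pal <- hcl.colors(5, palette = "viridis")

# Approximate perceived brightness (luma) from the RGB channels
luma <- function(cols) {
  rgb_values <- col2rgb(cols)            # 3 x n matrix, values in 0..255
  as.vector(c(0.299, 0.587, 0.114) %*% rgb_values)
}
print(round(luma(pal)))

# Show the palette above its grayscale rendering
gray_pal <- gray(luma(pal) / 255)
old_par <- par(mfrow = c(2, 1))
barplot(rep(1, 5), col = pal, axes = FALSE, main = "Color")
barplot(rep(1, 5), col = gray_pal, axes = FALSE, main = "Grayscale")
par(old_par)
```

If the brightness changes steadily from one color to the next, the ordering of the palette survives a black-and-white printout.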
63 / 80

Shapes

  • it's fairly easy for humans to discern shapes
  • however it's more difficult and less automatic than for colours
  • add shapes in scatterplots in conjunction with colours to facilitate reading
  • very useful in B&W
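
A minimal base R sketch of combining shapes with colors in a scatterplot, using the built-in `iris` data. The specific colors and point shapes are arbitrary illustrative choices.

```r
species <- as.character(iris$Species)

# One color and one point shape per species (redundant encoding)
cols <- c(setosa = "#E69F00", versicolor = "#56B4E9", virginica = "#009E73")
shapes <- c(setosa = 16, versicolor = 17, virginica = 15)  # circle, triangle, square

plot(
  iris$Sepal.Length, iris$Petal.Length,
  col = cols[species],
  pch = shapes[species],
  xlab = "Sepal length (cm)", ylab = "Petal length (cm)"
)
legend("topleft", legend = names(cols), col = cols, pch = shapes, bty = "n")
```

Because each group has its own shape, the groups remain distinguishable if the graph is printed in black and white.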
64 / 80

Shapes: example

Shape
Sh. + Col.
BW
BW + Sh.
65 / 80

Harnessing visual perception

  • humans are pretty good at comparing heights and widths
  • in contrast, they're pretty bad at comparing surfaces

use bars to compare numbers, use a grid to help make comparisons

66 / 80

Comparisons: Example

Difficult
Easy

Can you see a difference?

67 / 80

Pie charts can be good to show big discrepancies--but that's illustration then. For precise statistical graphs, forget about it.

If you have many categorical values to display, vertical bar graphs can be good.

68 / 80

Although too much content will always be hard to read. The solution is to cut out the cars that are not important in our study or make a single group of them. => In the future I should also explain how to curate the content to sharpen the message.

Clarity: Helping the reader

69 / 80

Helping the reader

  • the data in your graph is central
  • the text describing the graph is no less central!

Without description, a graph is worthless!

70 / 80

Describing a graph: First commandment

First commandment

You must absolutely ALWAYS label your axes, or else you'll endure divine wrath!

71 / 80

Describing a graph: Other main textual components

  • the legend, if there are 2+ types of data
  • the title

To keep in mind

Text takes space, and space is limited! The text should be as short as possible while remaining as informative as possible.

72 / 80

Helping the reader: Tips

  • the text must be read easily: readable font + min. font size

  • hierarchy: the explanations should not take more space than the data. More important information should be more emphasized.

  • repeat information

  • add a grid when relevant

  • minimize the distance between the legend and the data: especially with 4+ categories

73 / 80

Making the text easy to read

  1. you must choose a readable font
  2. ensure the size of the text is easily readable (but not too big)
74 / 80

Font family

Font families
showtext
Graph

There are three main font families:

  • serif
  • sans-serif
  • monospace

  • sans-serif fonts are recognized as the family with the highest readability
  • in general, don't use serif fonts in graphs
  • mono fonts can be nice when you have stacked words of the same length (like country codes)
75 / 80

Font size

  • do you think that once you find a graph displaying nicely on your screen, you can just ggsave it and it will be fine?

Think twice!

  • the final location of your graph is your document, not your screen!
  • the graph will be rescaled to fit its width in the document: the font may become too small!
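
A small sketch of the arithmetic behind this warning, assuming the figure is scaled proportionally to the text width of the document. The helper `effective_pt`, its argument names and the 12-inch/6-inch widths are illustrative; the second half uses base R's `pdf()` device to export directly at the final width.

```r
# Font size the reader actually sees once the figure is scaled to the page.
# All names and numbers here are illustrative.
effective_pt <- function(pt_saved, width_saved_in, width_in_doc_in) {
  pt_saved * width_in_doc_in / width_saved_in
}

# A 12-inch-wide export with 11pt text, squeezed into a 6-inch text width
effective_pt(pt_saved = 11, width_saved_in = 12, width_in_doc_in = 6)
# returns 5.5

# Safer: export at the final width, so no rescaling is needed afterwards
out <- tempfile(fileext = ".pdf")
pdf(out, width = 6, height = 6 / 1.75, pointsize = 11)
plot(cars, xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
dev.off()
```

Halving the width halves the font size, which is why a graph that looks fine on a wide screen can become unreadable on the page.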
76 / 80

Font size: A helper function

Some functions may help you to deal with that: pdf_fit/png_fit from fplot:

  • use setFplot_page to define the size of the final document if needed (by default it's an A4 page with some usual margins)
library(fplot)

# Start recording the graph to a PDF file:
# - pt = 11 saves the graph with an 11pt font size
# - w2h = 1.75 sets the width-to-height ratio to 1.75 (wide graph)
pdf_fit("path.pdf", pt = 11, w2h = 1.75)

# your graph goes here

# Stop recording.
# The final look of your graph is displayed in the viewer pane
fit_off()
77 / 80

Repeat + grid: An example

78 / 80

Minimize the legend-to-data distance

Difficult
Easy
79 / 80

Can you spot the problems?

Context
The graph
Criticism

I will show you a graph coming from a top (top) publication.

The research was careful and the results are of great relevance: there is no doubt about the quality and the importance of the research done.

Despite the stellar work, the clarity of the graphs could be improved at no cost.

80 / 80
