
ETX2250/ETF5922: Data Visualization and Analytics
Good and Bad visualization
Lecturer:
Department of Econometrics and Business Statistics 
 Week 4

Why visualisation?
Gain insights from data and get an overview of large datasets. Search for:
Trends
Relationships
Irregularities
In business, data visualisation is a crucial tool to support decision making.

Tesla Motors
Tesla vehicles collect a large amount of data from sensors. The plot on the next slide shows tyre pressure over time. This visualisation was used to:
Check pressure when vehicles left the factory
See how long customers take to respond to a low-pressure alert
Do predictive modelling on when tyres go flat

Tesla Motors
You can read more about the case study here.

Plotting Principles

Tufte’s principles
Principles of good practice in data visualisation are outlined in The Visual Display of Quantitative Information by Edward Tufte. These include:
Avoid distorting what the data have to say
Present many numbers in a small space
Make large data sets coherent
Encourage the eye to compare different pieces of data

Bad plots
Tufte also provides a catalog of bad plots.
What makes these plots bad can be put into three categories:
Taste (aesthetic)
Perceptual
Data

Bad Taste

An ugly plot

Chartjunk is the inclusion of elements that are not necessary to communicate the information. The inclusion of the following can be considered chartjunk:
Heavy gridlines
Unnecessary text
Pictures within the chart

Another example
[Figure: chart with a vertical axis running from 0 to 18 and horizontal axis positions 1 to 5]

Guidelines
Aim for a high data-ink ratio:
\[ \text{Data-ink ratio} = \frac{\text{Ink used to display data}}{\text{Ink used in graphic}} \]
Also aim for a high data density:
\[ \text{Data density} = \frac{\text{Number of data points}}{\text{Area of graphic}} \]
If data density is small, perhaps use a table
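
As a rough illustration of these two guidelines, the sketch below computes both ratios in Python. The function names and every input number are hypothetical, and measuring "ink" as a count of drawn pixels is only one possible assumption.

```python
def data_ink_ratio(data_ink: float, total_ink: float) -> float:
    """Share of the graphic's ink that is used to display data."""
    if total_ink <= 0:
        raise ValueError("total_ink must be positive")
    return data_ink / total_ink


def data_density(n_points: int, area: float) -> float:
    """Number of data points per unit area of the graphic."""
    if area <= 0:
        raise ValueError("area must be positive")
    return n_points / area


# Hypothetical chart (not taken from the slides): 5 data points drawn in a
# 15 cm x 10 cm graphic, with 2,000 of its 10,000 inked pixels showing data.
ratio = data_ink_ratio(2_000, 10_000)
density = data_density(5, 15 * 10)

print(f"Data-ink ratio: {ratio:.2f}")
print(f"Data density: {density:.3f} points per square cm")
```

A chart with numbers like these carries very little information for its size, which is the situation where a small table may serve better.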

Low data density

Bad… but not misleading
Note that although the previous plots look bad, strictly speaking they do not mislead. Also, maximising the data-ink ratio should be seen as a guideline rather than a strict rule. For instance, the default grey background for ggplot2 is arguably chartjunk, but there are good reasons for using it.

ggplot2

Wickham on the grey background
“We can still see the gridlines to aid in the judgement of position (Cleveland, 1993b), but they have little visual impact and we can easily ‘tune’ them out… Finally, the grey background creates a continuous field of colour which ensures that the plot is perceived as a single visual entity.”
ggplot2: Elegant Graphics for Data Analysis

Bad Perception

What can we perceive?
Human perception is a broad field that takes in ideas from psychology and philosophy. For data visualisation we can perceive:
Length
Area
Volume
Shape
Position
Colour
Angle

Errors of perception
Data visualisation is all about mapping data to things we can perceive. This should not be done in a way that is inaccurate or misleading.
The following plots provide some examples of what can go wrong.

Confusing length and area

Confusing length and area
On the previous plot the number of customers is represented by length (the height of the computer). However, the area of the 2D pictures of computers scales up more than their heights.
Also, the picture leads us to imagine a 3D computer, making this effect worse.
The value for Mac is only about 3 to 4 times that for None, but we perceive the difference to be much more.
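
To see why the pictures exaggerate the difference, here is a minimal sketch of how a uniformly scaled picture grows. It assumes the picture is scaled by the same factor in every direction, and it uses 3.5 as an assumed midpoint of "3 to 4 times"; the function name is hypothetical.

```python
def perceived_ratio(height_ratio: float, dimensions: int) -> float:
    """Ratio a viewer may perceive when a picture is scaled uniformly.

    dimensions=1: only the length (height) is compared
    dimensions=2: the area of the 2D picture is compared
    dimensions=3: the volume of an imagined 3D object is compared
    """
    return height_ratio ** dimensions


height_ratio = 3.5  # assumed midpoint of "about 3 to 4 times"

for dims, label in [(1, "length"), (2, "area"), (3, "volume")]:
    print(f"Compared by {label}: {perceived_ratio(height_ratio, dims):.2f} times")
```

Under these assumptions a 3.5-fold difference in height becomes a difference of about 12 times in area and about 43 times in imagined volume.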

Beware 3D

Beware 3D

Beware 3D
Difficult to line up the heights of bars with the actual values
Closer green bar (MSN) looks bigger
On the pie chart, rendering in 3D makes the blue segment (Google) look the biggest
Do not use three dimensions when two will work well.

Lie Factor
The lie factor is given by
\[ \text{Lie factor} = \frac{\text{Size of effect in graph}}{\text{Size of effect in data}} \]
The lie factor should be 1.

Road miles (from Tufte)

Effects
The data says that mileage rose from 18 to 27.5, which is a 53% increase.
The line on the graph increases from 0.6 inches to 5.3 inches, which is a 783% increase!
The lie factor is
\[ \frac{783}{53} \approx 14.8 \]
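
The arithmetic for the road-miles example can be checked with a short Python sketch. The function names are hypothetical; the four input values are the ones quoted on this slide.

```python
def relative_change(start: float, end: float) -> float:
    """Size of an effect as a proportional change from start to end."""
    return (end - start) / start


def lie_factor(graph_start: float, graph_end: float,
               data_start: float, data_end: float) -> float:
    """Size of effect shown in the graph divided by size of effect in the data."""
    return relative_change(graph_start, graph_end) / relative_change(data_start, data_end)


data_effect = relative_change(18, 27.5)    # mileage in the data
graph_effect = relative_change(0.6, 5.3)   # line length on the graph, in inches
factor = lie_factor(0.6, 5.3, 18, 27.5)

print(f"Effect in data:  {data_effect:.0%}")
print(f"Effect in graph: {graph_effect:.0%}")
print(f"Lie factor:      {factor:.1f}")
```

Working from the unrounded changes gives a lie factor of about 14.8, far from the ideal value of 1.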

Bad Data

Bad Data
Sometimes there is nothing wrong with the plot but with the data.
On the following slide is a plot comparing the cost of going to college in the US against the salaries of college graduates.
Can you find problems with this graph?

College cost

Problems
There is nothing incorrect about this graph.
However the message is misleading.
The income shown is a yearly income, while the cost of college covers four years (and is only paid once). Also, the graph does not show the income of people who are not college graduates.
Think carefully about comparisons on a plot.
Make sure your conclusions align with what is in the plot.

The x and y axes

The y-axis
Watch this video.
Are we interested in the size of the variable rather than changes in the variable?
Is zero a reasonable value for the variable to take?
Are we using a bar chart?
Answering yes to these questions means we should give more consideration to including zero on the y-axis.
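
One way to see why zero matters for bar charts is to compare the ratio of the drawn bar heights with the ratio of the values themselves. The sketch below is only an illustration: the function name and the values 50, 55 and 45 are hypothetical and do not come from the slides.

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at `baseline` instead of zero."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (b - baseline) / (a - baseline)


a, b = 50.0, 55.0  # hypothetical values: b is 10% larger than a

with_zero = apparent_ratio(a, b)               # y-axis starts at 0
truncated = apparent_ratio(a, b, baseline=45)  # y-axis starts at 45

print(f"Bar height ratio with zero included: {with_zero:.1f}")
print(f"Bar height ratio with axis cut at 45: {truncated:.1f}")
```

With the axis cut at 45, the second bar is drawn twice as tall as the first even though the value is only 10% larger, so the bar lengths no longer reflect the sizes of the values.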

Stock Prices
From this graph we conclude that Twitter stock prices increased dramatically on April 26.

A longer term view
Not that dramatic anymore.

More bad plots

Electrolux

McKesson

McKinsey

Pizza

Pie chart

Climate Change

Season

Narcotics

Summary
Graphs can be misleading
The default options in ggplot2 are chosen to protect the user from errors of taste and errors of perception.
Nothing protects you from using bad or misleading data…
…except for your own common sense

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.