Semester 2 2021
Lecture 2: Visualisation – Part II
Basic Visualisation
✓ Line plots
✓ Boxplots
• Histograms
• Bar charts
• Scatter plots
• Heatmaps
• Parallel coordinate plots
Histograms with equal width bins
• The most commonly used kind of histogram
• x-axis: divide the range of values into consecutive, non-overlapping, equal-width intervals (bins).
• y-axis: the height of each bar is proportional to the frequency (count) of its bin.
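The binning described above can be sketched in a few lines of plain Python. This is only an illustration: the function name `equal_width_histogram` and the sample data are made up for the example, and it assumes the data contains at least two distinct values so that the bin width is non-zero.

```python
def equal_width_histogram(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # Bins are half-open [left, right); the maximum value is placed
        # in the last bin so that it is not lost.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # illustrative values only
edges, counts = equal_width_histogram(data, 4)

# Text rendering: bar length is proportional to the bin frequency.
for left, right, c in zip(edges, edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f})  {'#' * c}")
```

In practice a plotting library would normally do both the binning and the drawing; the point here is only that every bin has the same width, so the bar height alone carries the frequency.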
Histogram with variable width bins
• Less common
• x-axis: divide the range of values into consecutive, non-overlapping, variable-width intervals.
• y-axis: the height of each bar is proportional to the frequency density, i.e. the number of cases per unit of the variable, so the area of each rectangle is proportional to the frequency.
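A small worked example may help with frequency density. The sketch below divides each bin's count by its width; the helper name `frequency_density`, the `ages` values and the bin edges are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def frequency_density(values, edges):
    """Return a (count, density) pair per bin, where density = count / bin width."""
    result = []
    last = len(edges) - 2  # index of the final bin
    for i, (left, right) in enumerate(zip(edges, edges[1:])):
        # Bins are half-open [left, right), except the last one, which
        # also includes its right edge.
        count = sum(
            1 for v in values
            if left <= v < right or (i == last and v == right)
        )
        result.append((count, count / (right - left)))
    return result


ages = [2, 3, 7, 12, 15, 18, 25, 33, 41, 58]  # hypothetical data
edges = [0, 10, 20, 60]                       # variable-width bins
result = frequency_density(ages, edges)

for (count, density), left, right in zip(result, edges, edges[1:]):
    print(f"[{left}, {right}): count={count}, density={density:.3f}")
```

The last bin holds the most points (4) but is four times as wide as the others, so its bar is the lowest (density 0.1 versus 0.3). Multiplying each density by its bin width recovers the count, which is the sense in which area, not height, represents frequency.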
Histogram with variable width bins
By Qwfp at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=20290683
Histograms – patterns
• Is the distribution symmetric, or left/right skewed? Is it unimodal, bimodal or multimodal?
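Skew is usually judged by eye from the histogram, but it can also be quantified. As a rough sketch, the moment-based sample skewness below (a technique not taken from the slides) is close to zero for symmetric data, positive for a right-skewed distribution and negative for a left-skewed one. The data lists are invented for illustration, and this measure says nothing about the number of modes.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]           # long tail to the right
left_skewed = [-v for v in right_skewed]     # mirror image: long tail to the left

for name, sample in [("symmetric", symmetric),
                     ("right skewed", right_skewed),
                     ("left skewed", left_skewed)]:
    print(f"{name:>12}: skewness = {skewness(sample):+.2f}")
```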
Histograms – cont.
• Histograms of the same dataset may look different with different bin sizes
• Problem: it is hard to choose an appropriate bin size for a histogram
• Too small → normal objects fall into empty or rare bins and look like outliers (false positives)
• Too big → outliers fall into frequent bins and go unnoticed (false negatives)
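The trade-off above can be illustrated with a toy outlier rule: flag every value whose bin contains at most one point. The rule, the function name `flag_rare` and the data are assumptions made for this sketch, not a method given in the lecture. Integer data and integer bin widths are used to keep the arithmetic exact.

```python
from collections import Counter


def flag_rare(values, bin_width, max_count=1):
    """Return the values whose bin holds at most max_count points.

    Hypothetical rule for illustration; bins start at min(values).
    """
    lo = min(values)
    counts = Counter((v - lo) // bin_width for v in values)
    return [v for v in values if counts[(v - lo) // bin_width] <= max_count]


# Illustrative data: 13 "normal" values between 100 and 128, plus 139 as
# the intended outlier.
data = [100, 103, 105, 108, 110, 112, 115, 117, 119,
        121, 124, 126, 128, 139]

for width in (2, 10, 20):
    print(f"bin width {width:2d}: flagged {flag_rare(data, width)}")
```

With a width of 2 every value sits alone in its bin, so all 14 values are flagged (false positives). With a width of 10 only 139 is flagged. With a width of 20, 139 shares a bin with 121 to 128 and is missed (a false negative).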
Iris dataset
• Well-known dataset of 150 objects, introduced by a statistician (https://en.wikipedia.org/wiki/Iris_flower_data_set)
• Four features:
• Petal width
• Petal length
• Sepal width
• Sepal length
• Three flower species (classes):
• Setosa
• Versicolor
• Virginica
Image credit: Mohlenbrock. USDA NRCS. 1995. Northeast wetland flora: Field office guide to plant species. Northeast National Technical Center, Chester, PA. Courtesy of USDA NRCS Wetland Science Institute.
Histogram – petal width of Iris flowers
Histograms of the same dataset may look different with different bin sizes
Outliers and histograms
Paternity case: “The study of outliers”, V. Barnett, Journal of the Royal Statistical Society, 27(3), 1978
Bar charts
• Summarise data points over a categorical variable.
X-axis: categorical variable
Y-axis: numeric value
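To make the idea concrete, the sketch below summarises numeric values over a categorical variable and prints one text bar per category. The records (fruit names and quantities) are invented for illustration, and the sum is just one possible summary; a mean or a count would work the same way.

```python
from collections import defaultdict

# Hypothetical (category, value) records.
records = [("apple", 3), ("banana", 5), ("apple", 4),
           ("cherry", 2), ("banana", 1)]

# Summarise the numeric values per category (here: their sum).
totals = defaultdict(int)
for category, value in records:
    totals[category] += value

# Categories have no inherent order, so any ordering is valid;
# here the bars are sorted by descending total.
ordered = sorted(totals.items(), key=lambda item: item[1], reverse=True)
for category, total in ordered:
    print(f"{category:<8}{'#' * total} {total}")
```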
Bar charts vs histograms
• Histograms:
X-axis: intervals of a numeric variable
Y-axis: frequency or frequency density
The bars can sensibly be ordered in only one way
• Bar charts:
X-axis: a categorical variable
Y-axis: a numeric quantity
The bars can be in any order
They look similar but they have different semantics.