Semester 2 2021
Lecture 2: Visualisation – Part II

Basic Visualisation
✓ Line plots
✓ Boxplots
• Histograms
• Bar charts
• Scatter plots
• Heatmap
• Parallel coordinate plots

Histograms with equal width bins
• Commonly used histograms
• x-axis: Divide the range of values into consecutive, non-overlapping,
and equal width intervals.
• y-axis: height proportional to the frequency of the bin
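
The binning described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function name `equal_width_histogram`, the sample values, and the choice of half-open bins (with the maximum placed in the last bin) are not from the lecture.

```python
def equal_width_histogram(values, num_bins):
    """Count values in num_bins consecutive, non-overlapping, equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; bin width would be zero")
    width = (hi - lo) / num_bins
    edges = [lo + i * width for i in range(num_bins + 1)]
    counts = [0] * num_bins
    for v in values:
        # Bins are half-open [left, right); the maximum goes into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return edges, counts


# Made-up sample values, for illustration only.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges, counts = equal_width_histogram(values, num_bins=4)

# Text rendering: bar length is proportional to the frequency of the bin.
for left, right, count in zip(edges, edges[1:], counts):
    print(f"[{left:4.1f}, {right:4.1f})  {'#' * count}")
```

Here the range 1 to 9 is split into four bins of width 2, giving counts 3, 5, 1 and 1.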

Histogram with variable width bins
• Not very common
• x-axis: Divide the range of values into consecutive, non-overlapping,
and variable width intervals.
• y-axis: height proportional to the frequency density, i.e. the number of cases per unit of the variable. The area of each rectangle is then proportional to the frequency of its bin.
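
A minimal sketch of the frequency-density calculation, assuming hand-picked bin edges. The function name `frequency_density`, the edges, and the sample values are hypothetical and chosen only to show that height times width recovers the frequency.

```python
def frequency_density(values, edges):
    """Return a (count, density) pair for each consecutive bin defined by edges."""
    result = []
    last = len(edges) - 2
    for i, (left, right) in enumerate(zip(edges, edges[1:])):
        if i == last:
            # The last bin is closed on the right so the maximum is included.
            count = sum(left <= v <= right for v in values)
        else:
            count = sum(left <= v < right for v in values)
        # Height of the rectangle: cases per unit of the variable.
        result.append((count, count / (right - left)))
    return result


# Made-up sample values and variable-width bin edges, for illustration only.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
edges = [0, 2, 4, 10]
bins = frequency_density(values, edges)

for (left, right), (count, density) in zip(zip(edges, edges[1:]), bins):
    area = density * (right - left)
    print(f"[{left}, {right}): count={count}, density={density:.3f}, area={area:.1f}")
```

The wide last bin holds 4 values but has a height of only about 0.667, because the height is the count divided by the bin width of 6.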

Histogram with variable width bins
By Qwfp at English Wikipedia, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=20290683

Histograms – patterns
• Is the distribution symmetric, or left/right skewed? Is it unimodal, bimodal, or multimodal?

Histograms – cont.
• Histograms of the same dataset may look different with different bin sizes
• Problem: it is hard to choose an appropriate bin size for a histogram
  • Too small → normal objects fall in empty/rare bins (false positives when looking for outliers)
  • Too big → outliers fall in frequent bins (false negatives)
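
The effect of the bin size can be illustrated with a small sketch. It assumes a simple rule that is not from the lecture: a value is treated as an outlier candidate if its bin holds at most `max_count` values. The function name, the threshold, and the data (a cluster from 50 to 56 plus one far value, 68) are hypothetical.

```python
from collections import Counter


def values_in_rare_bins(values, bin_width, max_count=1):
    """Return the values whose bin holds at most max_count values.

    Bins start at 0 and have the given width (an assumption for this sketch).
    """
    def bin_of(v):
        return int(v // bin_width)

    counts = Counter(bin_of(v) for v in values)
    return [v for v in values if counts[bin_of(v)] <= max_count]


# Made-up data: a cluster of normal values and one far-away value (68).
values = [50, 51, 52, 52, 53, 53, 53, 54, 54, 55, 56, 68]

too_small = values_in_rare_bins(values, bin_width=1)    # [50, 51, 55, 56, 68]
moderate = values_in_rare_bins(values, bin_width=10)    # [68]
too_big = values_in_rare_bins(values, bin_width=50)     # []

print("width 1: ", too_small)
print("width 10:", moderate)
print("width 50:", too_big)
```

With width 1, four normal values are flagged alongside 68 (false positives). With width 50, 68 shares a bin with the whole cluster and is not flagged (a false negative).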

Iris dataset
• Well-known dataset of 150 objects, introduced by a statistician (https://en.wikipedia.org/wiki/Iris_flower_data_set)
• Four features:
  • Petal width
  • Petal length
  • Sepal width
  • Sepal length
• Three flower species (classes):
  • Setosa
  • Virginica
  • Versicolor
Mohlenbrock. USDA NRCS. 1995. Northeast wetland flora: Field office guide to plant species. Northeast National Technical Center, Chester, PA. Courtesy of USDA NRCS Wetland Science Institute.

Histogram – petal width of Iris flowers
Histograms of the same dataset may look different with different bin sizes

Outliers and histograms
Paternity case: “The study of outliers”, V. Barnett, Journal of the Royal Statistical Society, 27(3), 1978

Bar charts
• Summarise data points over a categorical variable.
X-axis: categorical variable
Y-axis: numeric value
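
As a rough sketch of how bar heights are obtained, the snippet below groups data points by a categorical variable and summarises each group with its mean. The mean is only one possible summary (a count or a sum would work equally well), and the function name and the records are made up for illustration.

```python
from collections import defaultdict


def summarise_by_category(records):
    """Map each category to the mean of its numeric values."""
    groups = defaultdict(list)
    for category, value in records:
        groups[category].append(value)
    return {category: sum(vals) / len(vals) for category, vals in groups.items()}


# Hypothetical (category, value) records, made up for illustration.
records = [("A", 2.0), ("B", 5.0), ("A", 4.0), ("C", 1.0), ("B", 7.0)]
bar_heights = summarise_by_category(records)

# Text rendering: one bar per category; the order of the categories is arbitrary.
for category, height in bar_heights.items():
    print(f"{category:>2} | {'#' * round(height)} {height:.1f}")
```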

Bar charts vs histograms
• Histograms:
X-axis is intervals of a numeric variable
Y-axis is the frequency or frequency density
Only sensible to be ordered in one way
• Bar charts:
X-axis is a categorical variable
Y-axis is a numeric quantity
Can be in any order
They look similar but they have different semantics.
