计算机代考程序代写 data science decision tree algorithm Where are we now? – cscodehelp代写
Where are we now?
XML Template (2011) [10.8.2011–6:17pm] [1–18] K:/IVI/IVI 415994.3d (IVI) [PREPRINTER stage]
Kandel et al. 3
Figure 1. The iterative process of wrangling and analysis. One or more initial data sets may be used and new versions may come later. The wrangling and analysis phases overlap. While wrangling tools tend to be separated from the visual analysis tools, the ideal system would provide integrated tools (light yellow). The purple line illustrates a typical iterative process with multiple back and forth steps. Much wrangling may need to take place before the data can be loaded within visualization and analysis tools, which typically immediately reveals new problems with the data. Wrangling might take
2
place at all the stages of analysis as users sort out interesting insights from dirty data, or new data become available or
Classification
• Introduction to classification
• Comparing Classification and Regression
• Decision tree classification
• Will make connections to entropy
• k nearest neighbour classification
3
Classification vs Regression
4
Classification
• Classification
• Making predictions about the future, based on historical data
• Predictive modelling/classification
• The sexy part of data science ?!
• A foundation for machine intelligence, AI, machines replacing humans and taking our jobs ….
5
Classification – Example 1
Predicting disease from microarray data Training
Gene 1
Gene 2
Gene 3
…
Gene n
Develop cancer <1 year
Person 1
2.3
1.1
0.3
...
2.1
1
Person 2
3.2
0.2
1.2
...
1.1
1
Person 3
1.9
3.8
2.7
...
0.2
1
...
...
...
...
...
...
..
Person m
2.8
3.1
2.5
...
3.4
0
Testing
Gene 1
Gene 2
Gene 3
...
Gene n
Develop cancer <1 year
Person X
2.1
0.9
0.6
...
1.9
?
6
Classification – Example 2
Animal classification
Test data
https://www-users.cs.umn.edu/~kumar/dmbook/ch4.pdf
7
Classification- Example 3
Banking: classifying borrowers
Test data
Tid
Home Owner
Marital status
Annual Income
Defaulted Borrower
11
No
Single
55K
?
8
Classification – Example 4
Detecting tax cheats
Tid
1 2 3 4 5 6 7 8 9 10
10
Refund
Marital
Taxable
Cheat
No No No No Yes No No Yes No Yes
Status
Income
Yes
Single
125K
No
Married
100K
No
Single
70K
Yes
Married
120K
No
Divorced
95K
No
Married
60K
Yes
Divorced
220K
No
Single
85K
No
Married
75K
No
Single
90K
Test data
Tid
Refund
Marital Status
Taxable Income
Tax Cheat
11
Yes
Married
125K
?
9
categorical categorical
continuous class
Classification: Definition
• Given a collection of records (training set )
• Eachrecordcontainsasetofattributes,oneclasslabel.
• Find a predictive model for each class label as a function of the values of all attributes, i.e. y = f(x1, x2, ..., xn)
• y: discrete value, target variable
• x1,... xn: attributes, predictors
• f: is the predictive model (a tree, a rule, a mathematical formula)
• Goal: previously unseen records should be assigned a class as accurately as possible
• Atestsetisusedtodeterminetheaccuracyofthemodel,i.e.the full data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it
10
Classification framework
11
Regression: Recall Definition
• Givenacollectionofrecords(trainingset)
• Eachrecordcontainsasetofattributesandonetargetvariable
• Learn predictive model from data, i.e. y = f(x1, x2, ..., xn)
• y: target variable is continuous (key difference to classification) • x1,... xn: attributes, predictors
12
Regression example 1
Predicting ice-creams consumption from temperature: y = f(x)
?
13
Regression example 2
Predicting the activity level of a target gene
Gene 1
Gene 2
Gene 3
...
Gene n
Person 1
2.3
1.1
0.3
...
2.1
Person 2
3.2
0.2
1.2
...
1.1
Person 3
1.9
3.8
2.7
...
0.2
...
...
...
...
...
...
Person m
2.8
3.1
2.5
...
3.4
Gene n+1
3.2
1.1
0.2
...
0.9
Gene n+1
?
Gene 1
Gene 2
Gene 3
...
Gene n
Person m+1
2.1
0.9
0.6
...
1.9
14
Quiz
• Is it classification or regression?
• Prediction of rain or no rain tomorrow based on today’s rain data
• Prediction of the amount of rain tomorrow based on today’s rain data • Prediction of exam mark based on study hours
• Prediction of exam pass or fail based on study hours
• Prediction of salary based on degree results
• Prediction of complications of surgery based on pre-op medical data
15
Decision Trees
16
Decision Tree example: Prediction of tax cheats
Splitting Attributes
Tid
1 2 3 4 5 6 7 8 9 10
10
Refund
Marital
Taxable
Cheat
No No No No Yes No No Yes No Yes
Status
Income
Yes
Single
125K
No
Married
100K
No
Single
70K
Yes
Married
120K
No
Divorced
95K
No
Married
60K
Yes
Divorced
220K
No
Single
85K
No
Married
75K
No
Single
90K
Refund
Yes No
NO MarSt Single, Divorced
Married
NO
TaxInc
< 80K
NO
> 80K
YES
Training Data
Model: Decision Tree
17
categorical categorical
continuous class
Decision Tree example: Prediction of tax cheats 2
MarSt
NO
NO
Single, Divorced
Refund
TaxInc
< 80K
NO
Married
Tid
1 2 3 4 5 6 7 8 9 10
10
Refund
Marital
Taxable
Cheat
No No No No Yes No No Yes No Yes
Status
Income
Yes
Single
125K
No
Married
100K
No
Single
70K
Yes
Married
120K
No
Divorced
95K
No
Married
60K
Yes
Divorced
220K
No
Single
85K
No
Married
75K
No
Single
90K
Yes
No
> 80K
YES
Training Data
There can be more than one tree that fits the same data!
Model: Decision Tree
18
categorical categorical
continuous class
Decision Tree Classification Task
Attrib1
Yes No No Yes No No Yes No No No
Attrib2
Attrib3
125K 100K 70K 120K 95K 60K 220K 85K 75K 90K
Class
No No No No Yes No No Yes No Yes
Large
Medium
Small
Medium
Large
Medium
Large
Small
Medium
Small
Tid
1 2 3 4 5 6 7 8 9 10
Tree Induction algorithm
Induction
Learn Model
10
Model
Decision Tree
Training Set
Apply Model
Attrib2
Small
Medium
Large
Tid Attrib1 Attrib3
11 No 55K ?
12 Yes 80K ?
13 Yes 110K ?
14 No Small 95K ?
15 No Large 67K ?
Test Set
Class
Deduction
10
19
Apply DT Model to Test Data (1)
Start from the root of the tree
Refund
Yes No
Test Data
Refund
Marital
Taxable
Status
Income
Cheat
No
10
Married
80K
?
NO
MarSt
Single, Divorced
TaxInc
Married
NO
< 80K
NO
> 80K
YES
20
Apply DT Model to Test Data (2)
Test Data
Refund
Marital
Taxable
Status
Income
Cheat
No
10
Married
80K
?
Refund
Yes No
NO
MarSt
Single, Divorced
TaxInc
Married
NO
< 80K
NO
> 80K
YES
21
Apply DT Model to Test Data (3)
Test Data
Refund
Marital
Taxable
Status
Income
Cheat
No
10
Married
80K
?
Refund
Yes No
NO
MarSt
Single, Divorced
TaxInc
Married
NO
< 80K
NO
> 80K
YES
22
Apply DT Model to Test Data (4)
Test Data
Refund
Marital
Taxable
Status
Income
Cheat
No
10
Married
80K
?
Refund
Yes No
NO
MarSt
Single, Divorced
TaxInc
Married
NO
< 80K
NO
> 80K
YES
23
Apply DT Model to Test Data (5)
Test Data
Refund
Marital
Taxable
Status
Income
Cheat
No
10
Married
80K
?
Refund
Yes No
NO
MarSt
Single, Divorced
TaxInc
Married
NO
< 80K
NO
> 80K
YES
24
Apply DT Model to Test Data (6)
Test Data
Refund
Marital
Taxable
Status
Income
Cheat
No
10
Married
80K
?
Refund
Yes No
NO
MarSt
Single, Divorced
Married
NO
Assign Cheat to “No”
TaxInc
< 80K
NO
> 80K
YES
25
Apply DT Model to Test Data (7)
Start from the root of tree.
Test Data
Refund
Marital
Taxable
Cheat
Status
Income
10
No
Single
100K
?
Refund
Yes No
NO
MarSt
Single, Divorced
TaxInc
Married
NO
< 80K
NO
> 80K
YES
26
Decision Trees
Decision tree
• A flow-chart-like tree structure
• Internal node denotes a test on an attribute • Branch represents an outcome of the test
• Leaf nodes represent class labels
27
Decision Tree Algorithm
Tid
1 2 3 4 5 6 7 8 9 10
Tree Induction algorithm
Induction
Attrib1
Yes No No Yes No No Yes No No No
Attrib2
Attrib3
125K 100K 70K 120K 95K 60K 220K 85K 75K 90K
Class
No No No No Yes No No Yes No Yes
Large
Medium
Small
Medium
Large
Medium
Large
Small
Medium
Small
Learn Model
10
Model
Decision Tree
Training Set
Apply Model
Attrib2
Small
Medium
Large
Tid Attrib1 Attrib3
11 No 55K ?
12 Yes 80K ?
13 Yes 110K ?
14 No Small 95K ?
15 No Large 67K ?
Test Set
Class
Deduction
10
28
Decision Tree Algorithm (2)
• Many Algorithms:
• We will look at a representative one
(Hunt’s algorithm)
• How would you code an algorithm that automatically constructs a decision tree for the tax cheat data?
Tid
1 2 3 4 5 6 7 8 9 10
Refund
Yes No No Yes No No Yes No No No
Marital Status
Taxable Income
Cheat
No No No No Yes No No Yes No Yes
Single
125K
Married
100K
Single
70K
Married
120K
Divorced
95K
Married
60K
Divorced
220K
Single
85K
Married
75K
Single
90K
10
29
General Structure of Hunt’s Algorithm
• Let Dt be the set of training records that reach node t • General Procedure:
• If Dt contains only records that belong to the same class yt, then t is a leaf node labeled as yt
• If Dt is an empty set, then t is a leaf node labeled by the default class yd
• If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets
• Recursively apply the procedure to each subset
Tid
1 2 3 4 5 6 7 8 9 10
Refund
Yes No No Yes No No Yes No No No
Marital Status
Taxable Income
Cheat
No No No No Yes No No Yes No Yes
Single
125K
Married
100K
Single
70K
Married
120K
Divorced
95K
Married
60K
Divorced
220K
Single
85K
Married
75K
Single
90K
10
Dt
Refund Yes
No
30
Refund
Yes No No Yes No No Yes No No No
Marital Status
Taxable Income
Cheat
No No No No Yes No No Yes No Yes
Single
125K
Married
100K
Single
70K
Married
120K
Divorced
95K
Married
60K
Divorced
220K
Single
85K
Married
75K
Single
90K
Tid
Hunt’s Algorithm Example 1 2
If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets. Recursively apply the procedure to each subset.
Refund Yes
3
4
5
6
7
8
9
10
10 D
t
No
31
Stopping condition: leaf node
• If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt • If Dt is an empty set, then t is a leaf node labeled as the default class yd
Refund Yes
No
32
Decision Tree splitting example
Splitting Attributes
Refund
Yes No
NO MarSt Single, Divorced
Married
NO
TaxInc
< 80K
NO
> 80K
YES
33
Tree induction (creation)
34
Tree Induction
• Determine how to split the records
• How to specify the attribute test condition? • How to determine the best split?
• Determine when to stop splitting
• When a node only has instances of a single class
35
How to Specify Test Conditions?
• Depends on attribute types • Nominal
• Ordinal
• Continuous
• Depends on number of ways to split • 2-way split
• Multi-way split
36
Splitting Nominal Attributes
• Multi-waysplit:Useasmanypartitionsasdistinctvalues
CarType
Sports
Family
Luxury
• Binary split: Divide values into two subsets, find an optimal partitioning
{Sports, CarType Luxury}
{Family}
OR {Family, CarType Luxury}
{Sports}
37
Splitting Ordinal Attributes
• Multi-way split: Use as many partitions as distinct values
Size
Medium
{Small, Medium}
Size
{Medium, Size OR Large}
Small
Large
• Binary split: Divides values into two subsets, find optimal partitioning
{Large}
{Small}
• What about this split?
{Small, Large}
Size
{Medium}
38
Splitting Continuous Attributes
• Different ways of handling
• Discretisation to form an ordinal categorical attribute
• Static – discretise once at the beginning
• Dynamic – ranges can be found by equal interval bucketing, equal frequency bucketing (percentiles) or clustering
• Binary Decision: (A < v) or (A 3 v)
• consider all possible splits and find the best cut
• can be more computing intensive
39
Splitting Continuous Attributes
T axable Income > 80K?
Yes No
(i) Binary split
T axable Income?
Tid Refund
1 Yes
2 No
3 No
4 Yes
5 No
6 No
7 Yes
8 No
9 No
10 No
Marital Taxable
< 10K
> 80K
Single 125K Married 100K Single 70K Married 120K Divorced 95K Married 60K Divorced 220K Single 85K Married 75K Single 90K
No No No No Yes No No Yes No Yes
[10K,25K) [25K,50K) [50K,80K) (ii) Multi-way split
10
Status
Income Cheat
40
Tree Induction
• Determine how to split the records
• How to specify the attribute test condition? • How to determine the best split?
• Determine when to stop splitting
• When a node only has instances of a single class
41
How to determine the
Before Splitting: 10 records of class 0, 10 records of class 1
Own Car?
Car Type?
Family Sports
Student ID?
c1 c
C0:1 … C0:1 C0:0 … C0:0
Yes
No
Luxury
c c20 11
Which test condition is better?
10
C1: 0 C1: 0 C1: 1
C1: 1
C0: 6 C1: 4
C0: 4 C1: 6
C0: 1 C1: 3
C0: 8 C1: 0
C0: 1 C1: 7
42
How to determine the best split
• Greedy approach: Nodes with homogeneous class distribution are better • Need a measure of node impurity:
C0: 5 C1: 5
Non-homogeneous, Homogeneous,
High degree of impurity Low degree of impurity
C0: 9 C1: 1
43
Measures of Node Impurity
• Entropy
• We have seen entropy in the feature correlation section, where it was used to
measure the amount of uncertainty in an outcome
• Entropy can also be viewed as an impurity measure
• The set {A,B,C,A,A,A,A,A} has low entropy: low uncertainty and low impurity • The set {A,B,C,D,B,E,A,F} has high entropy: high uncertainty and high impurity
44
Node Impurity Criteria based on entropy
Entropy (H) at a given node t:
H(t)=−∑p(j|t)logp(j|t) j
(NOTE: p( j | t) is the relative frequency of class j at node t) Measures homogeneity of a node:
• Maximum (log nc) when records are equally distributed amongst all classes (nc is number of classes)
• Minimum (0.0) when all records belong to one class
45
Examples of computing Entropy
P(C1)=0/6=0 P(C2)=6/6=1 Entropy=–0log2 0–1log2 1=–0–0=0
P(C1) = 1/6 P(C2) = 5/6
Entropy = – (1/6) log2 (1/6) – (5/6) log2 (5/6) = 0.65
P(C1) = 2/6 P(C2) = 4/6
Entropy = – (2/6) log2 (2/6) – (4/6) log2 (4/6) = 0.92
H(t)=−∑p(j|t)log2 p(j|t) j
C1
0
C2
6
C1
1
C2
5
C1
2
C2
4
46
Question: What is the entropy?
P(C1) = P(C2) = Entropy =
H(t)=−∑p(j|t)log2 p(j|t) j
C1
13
C2
20
47
Question: What is the entropy?
P(C1) = 13 P(C2) = 20
H(t)=−∑p(j|t)log2 p(j|t) j
C1
13
C2
20
33
33
Entropy = – ((13/33)log(13/33) + (20/33)log(20/33)) = 0.97
48
How good is a Split?
Compare the impurity (entropy) of parent node (before splitting) with the impurity (entropy) of the children nodes (after splitting)
Gain = H(Parent) HX(Parent|Child) k N(vj)
= H(Parent) j=1 N H(vj)
H(vj): impurity measure of node vj
j: children node index
N(vj): number of data points in child node vj N: number of data points in parent node The larger the gain, the better
49
How good is a Split?
Gain = H(Parent) HX(Parent|Child) k N(vj)
= H(Parent) j=1 N H(vj)
Note: the information gain is equivalent to the mutual information
between the class feature and the feature being split on
Thus splitting using the information gain is to choose the feature with highest information shared with the class variable
50
How good is a Split?
Own Car?
Car Type?
Student ID?
c1 c
C0:1 … C0:1 C0:0 … C0:0
Yes
No Family Sports
Luxury
c c20 11
10
C1: 0 C1: 0 C1: 1
C1: 1
C0: 6 C1: 4
C0: 4 C1: 6
C0: 1 C1: 3
C0: 8 C1: 0
C0: 1 C1: 7
51
How to determine the ?
Before Splitting: 10 records of class 0, 10 records of class 1
C0:10 C1:10
C0:10 C1:10
Own Car?
Car Type?
Family Sports
Student ID?
Yes
No
c c20 11
Luxury
c1 c
C0:1 … C0:1 C0:0 … C0:0
C1: 0 C1: 0 C1: 1 C1: 1
C0: 6 C1: 4
C0: 4 C1: 6
C0: 1 C1: 3
C0: 8 C1: 0
Which test condition is better?
– Compute the gain of both splits
– Choose the one with the larger gain
52
C0: 1 C1: 7
10
How to determine the ?
Before Splitting: 10 records of class 0, 10 records of class 1
Own Car?
Car Type?
No Family Sports
Student ID?
Yes
c c20 11
Luxury
c1 c
C0:1 … C0:1 C0:0 … C0:0
C1: 0 C1: 0 C1: 1 C1: 1
10
C0: 6 C1: 4
C0: 4 C1: 6
C0: 1 C1: 3
Own Car: Information gain=0.029
Car type: Information gain=0.62
We should choose Car type as the better split
53
C0: 8 C1: 0
C0: 1 C1: 7
Creating a decision tree
• Calculate information gain [Left Child],[Right Child] for all:
• Refund [Yes], [No]
• Marital status [Single],[Married],[Divorced]
• Taxable income
• [60,60], (60,220]
• [60,70], (70,220]
• [60,75],(75,220]
• [60,85],(85,220]
• [60,90],(90,220]
• [60,95],(95,220]
• [60,100],(100,220]
• [60,120],(120,220]
• [60,125],(125,220]
• Choose feature + split with the highest information gain and use this as the root node and its split
• Do recursively, terminating when a node consists of only Cheat=No or Cheat=Yes.
Tid
1 2 3 4 5 6 7 8 9 10
Refund
Yes No No Yes No No Yes No No No
Marital Status
Single Married Single Married Divorced Married Divorced Single Married Single
Taxable Income
125K 100K 70K 120K 95K 60K 220K 85K 75K 90K
Cheat
No No No No Yes No No Yes No Yes
10
54
Question 2c) from 2016 exam
Given a dataset with two classes, A and B, suppose the root node of a decision tree has 50 instances of class A and 150 instances of class B. Consider a candidate split of this root node into two children, the first with (25 class A and 25 class B), the second with (25 class A and 125 class B). Write a formula to measure the utility of this split using the entropy criterion. Explain how this formula helps measure split utility.
55
Question 2c) from 2016 exam
Given a dataset with two classes, A and B, suppose the root node of a decision tree has 50 instances of class A and 150 instances of class B. Consider a candidate split of this root node into two children, the first with (25 class A and 25 class B), the second with (25 class A and 125 class B). Write a formula to measure the utility of this split using the entropy criterion. Explain how this formula helps measure split utility.
56
Question 2c) from 2016 exam
Given a dataset with two classes, A and B, suppose the root node of a decision tree has 50 instances of class A and 150 instances of class B. Consider a candidate split of this root node into two children, the first with (25 class A and 25 class B), the second with (25 class A and 125 class B). Write a formula to measure the utility of this split using the entropy criterion. Explain how this formula helps measure split utility.
A:50 B:150
A:25 B:25
A:25 B:125
57
Question 2c) from 2016 exam
Given a dataset with two classes, A and B, suppose the root node of a decision tree has 50 instances of class A and 150 instances of class B. Consider a candidate split of this root node into two children, the first with (25 class A and 25 class B), the second with (25 class A and 125 class B). Write a formula to measure the utility of this split using the entropy criterion. Explain how this formula helps measure split utility.
Entropy(root)= -(50/200)log(50/200) –(150/200)log(150/200)
A:25 B:25
A:25 B:125
A:50 B:150
58
Question 2c) from 2016 exam
Given a dataset with two classes, A and B, suppose the root node of a decision tree has 50 instances of class A and 150 instances of class B. Consider a candidate split of this root node into two children, the first with (25 class A and 25 class B), the second with (25 class A and 125 class B). Write a formula to measure the utility of this split using the entropy criterion. Explain how this formula helps measure split utility.
Entropy(root)= -(50/200)log(50/200) –(150/200)log(150/200)
Entropy(left child)= A:25
A:25 B:125
A:50 B:150
-(25/50)log(25/50) -(25/50)log(25/50)
B:25
59
Question 2c) from 2016 exam
Given a dataset with two classes, A and B, suppose the root node of a decision tree has 50 instances of class A and 150 instances of class B. Consider a candidate split of this root node into two children, the first with (25 class A and 25 class B), the second with (25 class A and 125 class B). Write a formula to measure the utility of this split using the entropy criterion. Explain how this formula helps measure split utility.
Entropy(root)= -(50/200)log(50/200) –(150/200)log(150/200)
Entropy(left child)= A:25 -(25/50)log(25/50) B:25 -(25/50)log(25/50)
A:25 B:125
Entropy(right child)= -(25/150)log(25/150)
–(125/150)log(125/150)
A:50 B:150
60
Question 2c) from 2016 exam
Given a dataset with two classes, A and B, suppose the root node of a decision tree has 50 instances of class A and 150 instances of class B. Consider a candidate split of this root node into two children, the first with (25 class A and 25 class B), the second with (25 class A and 125 class B). Write a formula to measure the utility of this split using the entropy criterion. Explain how this formula helps measure split utility.
Split utility= Information Gain
=Entropy(root) – Entropy(root|split)
=Entropy(root) – [(50/200)*Entropy(left child) +(150/200)*Entropy(right child)]
61
MI trees: advantages and disadvantages
• Advantages
• Easy to interpret
• Relatively efficient to construct
• Fast for deciding about a test instance
• Disadvantages
• A simple greedy construction strategy, producing a set of If ..then rules.
Sometimes this is too simple for data with complex structure: • Everything should be as simple as possible, but not simpler
• For complex datasets, the tree might grow very big and difficult to understand 62
K-nearest neighbour classifier
63
Nearest Neighbour Classifier
Basic idea: “If it walks like a duck and quacks like a duck, then it’s probably a duck”
Training Records
Choose k of the “nearest” records
Compute Distance
Test Record
64
Nearest-Neighbour Classifier
Unknown record
! Requires three things
– The set of stored records
– Distance Metric to compute distance between records
– The value of k, the number of nearest neighbours to retrieve
! To classify an unknown record:
1. Compute distance to other
training records
2. Identify k nearest neighbors
3. Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking a majority vote)
65
Definition of Nearest Neighbour
X
X
X
(a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor
K-nearest neighbours of a record x are data points that have the k smallest distance to x
66
Distance measure
• Compute distance between two points p=(p1,p2,…), • Euclidean distance
d(p,q)= å(p-q)2
q=(q1,q2,…)
i
ii
• Can also use Pearson coefficient (similarity measure) • Determine the class from nearest neighbour list
• take the majority vote of class labels among the k-nearest neighbours
• weigh the vote according to distance, e.g. w = % &!
67
1 nearest-neighbour
Voronoi Diagram defines the classification boundary
The green area takes the class of the green point
68
K-Nearest Neighbour classifier
Choosing a value for k:
• If k is too small, its sensitive to noise points
• If k is too large, the neighbourhood may include points from other classes
X
http://2.bp.blogspot.com/-EK4tA2525EM/U-c6Q4jJuwI/AAAAAAAADjw/hdqRXuunpnQ/s1600/knn.png
69
Points to remember
• Understand what is the difference between classification and regression and why it is useful to build models for these tasks
• Understand how a decision tree may be used to make predictions about the class of a test instance
• Understand the key steps in building a decision tree: • how to split the instances
• how to specify the attribute test condition • how to determine the best split
• how to decide when to stop splitting
70
Points to remember
• Understand the operation and rationale of the k nearest neighbour algorithm for classification
• Understand the advantages and disadvantages of using the k nearest neighbour or decision tree for classification
71