FACULTY OF SCIENCE

AND TECHNOLOGY

MSc. Applied Data Analytics

June 2016

Learning Deep Structured Network for Identification of

Mixed Patterns in Semiconductor Wafer Maps

Van Hoa Trinh

DISSERTATION DECLARATION

This Dissertation/Project Report is submitted in partial fulfilment of the requirements

for a Masters degree at Bournemouth University. I declare that this Dissertation/

Project Report is my own work and that it does not contravene any academic offence

as specified in the University’s regulations.

Retention

I agree that, should the University wish to retain it for reference purposes, a copy of

my Dissertation/Project Report may be held by Bournemouth University normally for

a period of 3 academic years. I understand that my Dissertation/Project Report may

be destroyed once the retention period has expired. I am also aware that the University

does not guarantee to retain this Dissertation/Project Report for any length of time (if

at all) and that I have been advised to retain a copy for my future reference.

Confidentiality

I confirm that this Dissertation/Project Report does not contain information of a commercial or confidential nature or include personal information other than that which would normally be in the public domain unless the relevant permissions have been obtained. In particular any information which identifies a particular individual's religious or political beliefs, information relating to their health, ethnicity, criminal history or personal life has been anonymised unless permission for its publication has been granted from the person to whom it relates.

The copyright for this dissertation remains with me.

Requests for Information

I agree that this Dissertation/Project Report may be made available as the result of a

request for information under the Freedom of Information Act.

Signed:

Name: Van Hoa Trinh

Date: 30/06/2016

Programme: MSc. Applied Data Analytics

Abstract

Wafer defect detection has been a focal research topic in the wafer manufacturing industry. A large gap in research on the identification of mixed defect patterns on semiconductor wafers is the main motivation for this thesis. This dissertation describes the design and implementation of wafer map defect detection based on deep convolutional neural networks. It is the first study to test the performance of a deep learning model on mixed defect pattern recognition. The thesis starts with a literature review of defect detection processes, the machine learning methods recently used by researchers, and the shortcomings of these methods. It then gives a detailed review of convolutional neural networks and proposes appropriate parameters for wafer defect detection, of which the main part is implemented. The experimental results are discussed and justified on the basis of a comprehensive model selection. All experiments were run on Windows 7 Professional 64-bit (6.1, Build 7601) with an Intel(R) Core(TM) i5-4690 CPU @ 3.50GHz (4 CPUs) and 16GB of installed memory. A defect detection test accuracy of 79.6% was achieved.

Acknowledgements

I would like to express my deep gratitude to my supervisor, Dr. Paul Yoo. His great guidance and very insightful suggestions and comments have enabled me to write up this study. Last but not least, I extend my gratitude to my beloved family and my darling for being my moral support and for everything they have done for me. I could never be what I am today without them.

Contents

Abstract
Acknowledgements
List of Figures
List of Tables
Abbreviations

1 Introduction
1.1 Introduction
1.2 Motivation
1.3 Proposal
1.4 Structure of The Thesis

2 Identification of Mixed Defect Pattern Classification in Semiconductor Wafer Maps
2.1 Defect Pattern
2.1.1 Generated Wafer Map
2.1.1.1 Statistical Models
2.1.1.2 Probabilistic Models
2.1.2 Real Data
2.2 De-noising
2.3 Feature Extraction
2.4 Feature Selection
2.5 Spatial Randomness Test
2.6 Classification Models
2.6.1 Traditional Machine Learning Techniques
2.6.2 Deep Learning Techniques
2.7 Root Cause Analysis
2.8 Machine Learning In Identification On Defect Pattern In Semiconductor Wafers
2.9 Summary

3 Learning Deep Structured Convolutional Neural Network
3.1 Wafer Dataset
3.2 Tools And Environment
3.2.1 Tools
3.2.2 Experimental Environment
3.3 Convolutional Layers
3.3.1 Local Receptive Fields
3.3.2 Feature Maps
3.3.3 Weight Sharing
3.3.4 Estimation of the Feature Map Volume
3.3.5 Experimental Design of Convolutional Layers
3.4 Activation Functions
3.4.1 Background of Activation Functions
3.4.2 Experimental Design of Activation Functions
3.5 Pooling Layers
3.5.1 Background of Pooling Layers
3.5.2 Experimental Design Of Pooling Layer
3.6 Fully-Connected Layer
3.7 Gradient Descent Optimisation Algorithms
3.7.1 Cost Function
3.7.2 Stochastic Gradient Descent
3.7.3 Momentum
3.7.4 Adagrad
3.7.5 Adadelta And RMSprop
3.7.6 Adam
3.7.7 Experimental Design of Gradient Descent Optimisation Algorithms
3.8 Dropout
3.9 Performance Evaluation Metrics
3.9.1 Mean Squared Error And Model Accuracy
3.9.2 Standard Deviation
3.9.3 Coefficient of Determination
3.9.4 ROC Curve (Receiver Operating Characteristics)
3.10 Stratified K-fold Cross Validation
3.11 Experiment and results
3.11.1 Experiment
3.11.2 Results
3.11.2.1 Convolutional Layer
3.11.2.2 Fully-Connected Layer
3.11.2.3 Gradient Descent Optimisation Algorithms
3.11.2.4 De-Noising Effect
3.11.2.5 Performance Comparison of CNNs Against Other Shallow Learning Networks
3.12 Summary

4 Conclusion and Future Work
4.1 Summary
4.2 Review of Chapters
4.3 Future Work

A CNN's Python Code
B Matlab's De-Noising Code
Bibliography

List of Figures

1.1 Wafer manufacturing process (Kang et al., 2015)
1.2 A finished wafer contains hundreds of squares, each representing a chip (Geng, 2005)
1.3 Packaged microprocessors (Geng, 2005)
1.4 Root cause determination (Imai et al., 2010)
1.5 Scope of the thesis

2.1 Process of defect pattern analysis
2.2 Example of Chao et al.'s generated defect patterns: Cluster pattern with intensity of 90% and 50-100 defective chips
2.3 Choi's generated defect patterns
2.4 Multivariate normal distribution
2.5 Jeong et al. (2008)'s simulated data
2.6 Example of using the spatial filter with size of 3×3
2.7 Median-filtering technique
2.8 (a) Example of wafer maps. (b) Radon transform output. (c) Radon-based attributes Rϕ. (d) Radon-based attributes Rσ (Wu et al., 2015)
2.9 Rotation moment invariant
2.10 Summary of some algorithms in deep and shallow nets (Ranzato, 2014)
2.11 Deep and shallow learning network

3.1 Circle and spot mixed patterns
3.2 Circle and scratch mixed patterns
3.3 Cluster and scratch mixed patterns
3.4 Cluster and circle mixed patterns
3.5 A typical structure of CNNs (Lecun et al., 1998)
3.6 Local receptive field
3.7 Output image when using different weights in local receptive fields
3.8 Convolution operation in CNNs
3.9 Local receptive field's movement at stride = 2
3.10 Feature map
3.11 Weight sharing (Le, 2015)
3.12 Convolution with 3×3 Filter (Karpathy, 2015)
3.13 Sigmoid, tanh and RELU function
3.14 Max pooling and average pooling
3.15 Stochastic gradient descent algorithm (Ian Goodfellow and Courville, 2016)
3.16 Stochastic gradient descent algorithm with momentum (Ian Goodfellow and Courville, 2016)
3.17 Adagrad algorithm (Ian Goodfellow and Courville, 2016)
3.18 RMSprop and Adadelta algorithm (Ian Goodfellow and Courville, 2016)
3.19 Adam algorithm (Ian Goodfellow and Courville, 2016)
3.20 Example of dropout neural network
3.21 Receiver Operating Characteristics
3.22 Relationship between AUC and diagnostic accuracy
3.23 Performance comparison in terms of pooling layers
3.24 Performance comparison in terms of stride
3.25 Performance comparison corresponding to the value of dropout
3.26 Performance comparison corresponding to gradient descent optimisation algorithms
3.27 Effect of de-noising techniques on Circle and spot mixed patterns: (A) Initial image input, (B) Effect of Median Filter, (C) Effect of Averaging Filter, and (D) Effect of Adaptive filtering
3.28 Example of inefficiency of de-noising techniques on complex Circle and spot mixed patterns: (A) Initial image input, (B) Effect of Median Filter, (C) Effect of Averaging Filter, and (D) Effect of Adaptive filtering

List of Tables

1.1 Examples of root causes of defect patterns on wafer

2.1 Process summary of various methods

3.1 Example of the ROC curve
3.2 Experimental design in terms of Convolutional layers
3.3 Experimental design in terms of fully-connected layers
3.4 Sample architecture of 3C-1P-0.25D-1F-0.5D-1F-1111S-3332K
3.5 Experimental results for various convolutional layer's architectures
3.6 Model performance corresponding to stride value
3.7 Experimental results for various architectures of fully-connected layer
3.8 Experimental results corresponding to gradient descent optimisation algorithms
3.9 Experimental results of de-noising techniques
3.10 The proposed CNN architecture of 2C-1MP-1F-0.25D-1F-111S-334K
3.11 Models' parameters
3.12 Performance comparison between CNNs and other traditional machine learning networks

Abbreviations

ANN Artificial Neural Network

AUC Area Under Curve

CNN Convolutional Neural Network

CP Circuit Probe

DBN Deep Belief Network

GRNN General Regression Neural Network

HPP Homogeneous Poisson Process

IC Integrated Circuit

kNN K Nearest Neighbour

LNPP Local and Nonlocal Preserving Projection

LOR Log Odds Ratio

MLP MultiLayer Perceptron

MRF Markov Random Field

MSE Mean Squared Error

MVN Multivariate Normal

NMF Non-negative Matrix Factorization

PCA Principal Component Analysis

PFT Polar Fourier Transform

PNN Probabilistic Neural Network

RBF Radial Basis Function

ReLU Rectified Linear Unit

RMI Rotation Moment Invariant

ROC Receiver Operating Characteristics

SGD Stochastic Gradient Descent

SHBP Spatially Homogeneous Bernoulli Process

SRT Spatial Randomness Test

SVD Singular Value Decomposition

SVM Support Vector Machine

TBM Time to Build Model

WBM Wafer Bin Map

Chapter 1

Introduction

1.1 Introduction

A manufacturing process can be defined as a procedure for converting unprocessed material into finished goods. In the semiconductor manufacturing industry, semiconductor materials such as silicon, zinc oxide, dopants, and insulators (Prasad et al., 2013) are used to produce finished products including integrated circuits or integrated circuit packages. Production of semiconductor devices involves a lengthy and complex process and takes weeks to complete (Chou et al., 1997). Overall, the process can be divided into four main steps (Peleg, 2004; Uzsoy et al., 1994) (Figure 1.1):

1. Fabrication: Identical IC chips or dies are fabricated in batches on a thin slice of crystalline silicon called a wafer. Each wafer may contain many chips (Figure 1.2). The recent trend in wafer manufacturing is to create smaller chips (e.g. 65nm) on larger wafers (e.g. 450mm) (Sonderman, 2011).

Figure 1.1: Wafer manufacturing process (Kang et al., 2015)

Figure 1.2: A finished wafer contains hundreds of squares, each representing a chip (Geng, 2005)

Figure 1.3: Packaged microprocessors (Geng, 2005)

2. Wafer test or Circuit probe (CP) test: After fabrication, not all IC dies work well, so each die is tested to determine whether it works properly before the wafer is cut by a diamond saw into individual chips. A defective IC die is marked with a spot of ink and a wafer bin map (WBM) is created (Geng, 2005). A wafer that contains more defective chips than a given threshold is then discarded.

3. Assembly: Only non-defective ICs proceed to the next step. They are inserted into a protective package that shields the chips from high humidity (Geng, 2005; Rabaey and Nikolic, 2002) and protects their fragile wire bonds (Geng, 2005). The type of package is determined by how the chips will be used and by the type of microprocessor (Figure 1.3).

4. Final test: Finally, the IC chips are tested again with a functional test to catch any faults introduced in the packaging process. The final test evaluates whether the ICs perform well in hot and cold environments (Kang et al., 2015).

In semiconductor manufacturing industries, the three fundamental targets of manufac-

turing are to produce wafers that have the following characteristics (Gary S. May, 2006):

• Low cost. Yield and throughput are the key factors in terms of cost reduction. Yield rate can be defined as the proportion of products on the wafer found to work properly, while throughput is the number of wafers processed by a machine per hour. High yield rate and throughput result in lower cost.

• High quality. High quality must come from a stable and reliable manufacturing process. Products should be produced uniformly and efficiently in large quantities.

• High reliability. Manufacturing faults should be reduced to increase the degree of reliability.
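The definitions of yield rate and throughput, together with the discard rule from the wafer test step, can be sketched in a few lines of Python. This is only an illustration: the wafer bin map encoding (0 = no die, 1 = working die, 2 = defective die), the function names and the 25% discard threshold are assumptions made for the example, not values taken from the thesis.

```python
# Assumed encoding of a wafer bin map (WBM); not taken from the thesis.
NO_DIE, GOOD, DEFECTIVE = 0, 1, 2


def yield_rate(wbm):
    """Proportion of dies on the wafer that work properly."""
    dies = [cell for row in wbm for cell in row if cell != NO_DIE]
    if not dies:
        raise ValueError("wafer bin map contains no dies")
    return sum(1 for die in dies if die == GOOD) / len(dies)


def throughput(wafers_processed, hours):
    """Number of wafers processed by a machine per hour."""
    return wafers_processed / hours


def should_discard(wbm, max_defect_fraction):
    """Discard the wafer if its defective fraction exceeds the threshold."""
    return (1.0 - yield_rate(wbm)) > max_defect_fraction


# Toy 4x4 map: the corners hold no die, two of the twelve dies are defective.
toy_wbm = [
    [0, 1, 1, 0],
    [1, 1, 2, 1],
    [1, 2, 1, 1],
    [0, 1, 1, 0],
]

print(round(yield_rate(toy_wbm), 3))   # 0.833
print(throughput(120, 8))              # 15.0
print(should_discard(toy_wbm, 0.25))   # False
```

With ten working dies out of twelve, the toy wafer has a defective fraction of about 17%, which is below the assumed 25% threshold, so it is kept.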

It is clear that the most important factor in semiconductor manufacturing is yield rate. As mentioned above, the wafer is cut into many individual chips. Hence, the manufacturers' top priority is to obtain the highest possible number of chips from a single wafer (Kenneth A. Jackson, 2008). For that reason, almost all semiconductor manufacturing companies pay close attention to defect patterns on wafers in order to find the root cause of the defects and then improve the manufacturing process. In the past, defect patterns were identified by manual inspection. However, this method is time consuming, and human experts have a limited capability to identify defect patterns with high accuracy among a huge number of wafer maps. As a result, many semiconductor manufacturing companies developed automatic detection methods in an attempt to reduce cost and increase yield rate.

1.2 Motivation

There are many reasons why a defect might occur. Yuan et al. (2010) pointed out

that spatial defect patterns are separated into two main groups: global/random defects,

which are caused by random causes, and local/systematic defects, which are caused by

assignable causes.

Local defect patterns can be divided into single patterns and mixed patterns (Figure 1.5). A single pattern can have many defect shapes, e.g. centre, doughnut, ring or spot. A mixed pattern is a combination of two single patterns, such as ring+spot or cluster+ring. Local defects are normally generated by failing equipment, with causes including chemical stains, micron-scale particles from manufacturing equipment and human mistakes. Local defects create distinguishable patterns at specific positions on the wafer surface, and they can be detected easily by applying detection and classification processes to wafer maps (Tan and Lau, 2011).

In contrast to local defects, global defects are generated by normally operating equipment (Imai et al., 2010), with causes including air quality in the manufacturing room and variation in heating or deposition (Yuan et al., 2010). They create randomly positioned defects spread over any area of the wafer. Mitigating global defects can take a long time, since it involves revising clean-room operation protocols or fixing equipment.

Figure 1.4: Root cause determination (Imai et al., 2010)

Figure 1.5: Scope of the thesis

In addition, the various defect patterns in the wafer map provide crucial information that can help manufacturing companies determine the root causes of fabrication problems. To be more specific, Figure 1.4 shows a defect occurrence model (Imai et al., 2010) that illustrates how a local defect is created. The defect patterns in the upper-right area and the lower-left corner are generated by the failure of Equipment 1 of Process X and Equipment 2 of Process Z respectively. The other, random defect patterns are created by normally operating equipment. Based on these defect patterns, a quality engineer can tell which equipment (i.e. Equipment 1 and 2) and which processes (i.e. Process X and Z) are failing. Because local defect patterns can be visualised using a wafer map, the root cause can be determined easily from this information. Table 1.1 illustrates some examples of root causes of defect patterns. Identifying the root causes can then increase the yield rate and reduce cost per die (Chen and Liu, 2000).

Assignable causes (the defect pattern column consisted of wafer map images):

• Machine handling problem (Chen and Liu, 2000; Wang et al., 2006)

• Thin film deposition process (Wang et al., 2006)

• Etching process problem (Wang et al., 2006; Yuan et al., 2011)

• Stepper malfunctions and sawing imperfections (Kim et al., 2016)

• Scrape error (Liu and Chien, 2013)

• Mask error (Liu and Chien, 2013)

• Probe-pin error (Liu and Chien, 2013)

• Probe-card error (Liu and Chien, 2013)

• Test-spec. error (Liu and Chien, 2013)

• Process error (Liu and Chien, 2013)

Table 1.1: Examples of root causes of defect patterns on wafer

The identification of defect patterns in semiconductor wafers has recently led many machine learning researchers to search for the most appropriate network. As mentioned above, there are two main types of local defect pattern: single and mixed. Current approaches involve many different machine learning methods and algorithms; however, almost all current research focuses on building a classification model to detect single defect patterns, and these models achieve very high accuracy. Recently, in December 2015, Adly et al. (2015a) proposed the Simplified Subspaced Regression Network, which outperforms other current methods with a very high classification accuracy of 99.884%. However, there is still a research gap in the identification of defect patterns in semiconductor wafer maps: the identification of mixed defect patterns has not been researched thoroughly because of the complexity of these patterns. Based on the current situation, this thesis focuses on finding the most appropriate machine learning technique to close this research gap.

1.3 Proposal

Traditional machine learning approaches cause some significant problems. First, using every pixel of the input image as an input often increases the computational cost and the time needed to build a model. For example, for an input image of size 40×40, each neuron in a fully connected first hidden layer has 40×40=1600 weights. Clearly, such a large number of parameters can lead to overfitting and high computational costs. Second, traditional machine learning approaches are not invariant to shifts in the input image. That means that when a defect pattern is shifted several pixels in a certain direction, a traditional machine learning technique may give a different result.
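Both problems can be made concrete with a small Python sketch. It counts the weights of a fully connected layer on a 40×40 input and shows how a single linear unit, whose weights match a defect at one position, responds much more weakly once the same defect is moved three pixels. The 4×4 square defect, the hidden-layer width of 100 and the helper names are assumptions chosen for illustration only.

```python
def dense_weight_count(height, width, hidden_units=1):
    """Weights (biases excluded) linking every pixel to every hidden unit."""
    return height * width * hidden_units


def make_map(size, top, left, block=4):
    """Binary wafer-like map with a square block of defective dies (1s)."""
    return [
        [1 if top <= r < top + block and left <= c < left + block else 0
         for c in range(size)]
        for r in range(size)
    ]


def linear_score(weights, image):
    """Response of one fully connected unit: weighted sum over all pixels."""
    return sum(w * x
               for w_row, x_row in zip(weights, image)
               for w, x in zip(w_row, x_row))


SIZE = 40
template = make_map(SIZE, 10, 10)  # weights matched to one defect position
original = make_map(SIZE, 10, 10)  # defect exactly where the unit expects it
shifted = make_map(SIZE, 10, 13)   # same defect moved 3 pixels to the right

print(dense_weight_count(SIZE, SIZE))        # 1600
print(dense_weight_count(SIZE, SIZE, 100))   # 160000
print(linear_score(template, original))      # 16
print(linear_score(template, shifted))       # 4
```

The response drops from 16 to 4 although the defect shape is unchanged, which is the shift sensitivity described above.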

Recent research on deep learning models, especially convolutional neural networks (CNNs), has contributed substantially to the field of computer vision. The idea of CNNs is that each hidden neuron is connected only to a small area of the previous layer rather than to all of it. These neurons extract the important features of the image. After that, an ordinary neural network processes these features to classify the input into predefined categories. However, there is currently no research on learning deep structured networks for the identification of mixed patterns. Therefore, in an attempt to address the lack of research on mixed defect patterns, this thesis focuses on building a deep structured learning model to classify the mixed patterns of wafer maps (Figure 1.5).
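The local connectivity idea can be sketched without any deep learning library. In the example below, one 3×3 filter slides over a 40×40 map, so each output value depends on only nine input pixels and the same nine weights are reused at every position. The all-ones filter, the 4×4 square defect and the global maximum used as a summary are assumptions for illustration; they are not the architecture evaluated in the thesis.

```python
def make_map(size, top, left, block=4):
    """Binary wafer-like map with a square block of defective dies (1s)."""
    return [
        [1 if top <= r < top + block and left <= c < left + block else 0
         for c in range(size)]
        for r in range(size)
    ]


def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    As in most CNN libraries, this is a cross-correlation: the kernel
    is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [sum(kernel[i][j] * image[r + i][c + j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def global_max(feature_map):
    """Largest response anywhere in the feature map."""
    return max(max(row) for row in feature_map)


kernel = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # 9 shared weights (assumed values)
original = make_map(40, 10, 10)
shifted = make_map(40, 10, 13)               # same defect, 3 pixels to the right

fmap_original = convolve2d_valid(original, kernel)
fmap_shifted = convolve2d_valid(shifted, kernel)

print(len(kernel) * len(kernel[0]))                 # 9
print(len(fmap_original), len(fmap_original[0]))    # 38 38
print(global_max(fmap_original))                    # 9
print(global_max(fmap_shifted))                     # 9
```

The filter uses 9 weights instead of 1600, and the strongest response is the same for both defect positions; only its location in the feature map moves with the defect.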

In summary, the purpose of the thesis is threefold: (1) to conduct a thorough literature review of wafer defect pattern recognition and point out the research gap in the identification of defect patterns in semiconductor wafer maps; (2) to design a deep structured learning network, based mainly on CNNs, that can effectively identify mixed defect patterns in semiconductor wafers; and (3) to demonstrate the superiority of the deep learning model over traditional machine learning approaches in terms of mixed defect pattern classification.

Many experiments were carried out to validate the CNN, and the results show that it is superior to the other traditional machine learning networks tested in terms of classification accuracy and coefficient of determination, achieving 79.626% and 74.224% respectively.

1.4 Structure of The Thesis

• Chapter 2: Identification of Mixed Defect Pattern Classification in Semiconductor Wafer Maps. Chapter 2 starts with an overview of each step in the process of defect pattern classification on the wafer. It then briefly summarises the focus of recent research.

• Chapter 3: Learning Deep Structured Convolutional Neural Network. Chapter 3 provides a detailed discussion of CNNs, including a thorough explanation of how the best CNN parameters are found. It then focuses on the performance of CNNs and compares it with that of other shallow networks.

• Chapter 4: Conclusion and Future Work. The final chapter concludes the thesis with a summary of the results, followed by suggestions for further development and recommendations.

Chapter 2

Identification of Mixed Defect

Pattern Classification in

Semiconductor Wafer Maps

Recent years have witnessed a rapid growth in research on wafer defect pattern classification. Chapter 2 introduces the overall strategy that researchers normally use, followed by a brief summary of the machine learning techniques they have studied and what they have achieved in the last few years.

The process of defect pattern classification is summarised in Figure 2.1, following the research of Lee and Kim (2015) and Yum et al. (2012).

• Step 1: Data collection/generation. The wafer dataset comes from one of two main sources: real data or self-generated data. Because acquiring real data from manufacturing companies is expensive, self-generated data based on known wafer defect patterns are widely used in computer science research.

• Step 2: De-noising. As mentioned in the first chapter, a typical wafer contains both random (global) defect patterns and local defect patterns. Global defect patterns reduce the accuracy of the classification and make the computation more complex. Therefore, before training the network, removing the random defect patterns so that only the local ones remain is a necessary pre-processing step to increase the classification rate.

Figure 2.1: Process of defect pattern analysis

• Step 3: Feature generation. The two main parts of step 3 are feature extraction and feature/attribute selection. Critical features are extracted to support the later steps. These feature vectors act as inputs to defect pattern detection and defect pattern recognition.

• Step 4: Defect detection. Automatic defect detection includes a spatial randomness test, which tests the dependence between data points. The output of this stage is whether the defect pattern is normal or not. If the defect pattern is abnormal, i.e. contains local defective chips, the wafer proceeds to step 5.

• Step 5: Defect classification. Using the feature vectors from step 3, an abnormal defect pattern is classified into predefined categories, e.g. ring, spot or curvilinear pattern, using various machine learning techniques and algorithms.

• Step 6: Root cause analysis. Once the defect pattern is known, other machine learning techniques are used for root cause analysis to identify which processes or machines failed.
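As an illustration of the de-noising step, the following Python sketch applies a 3×3 median filter, one of the de-noising techniques named in the thesis figures, to a small binary wafer map. The toy map, the zero padding at the border and the function name are assumptions made for this example; it is not the Matlab de-noising code from the appendix.

```python
from statistics import median


def median_filter_3x3(wafer_map):
    """Replace each die value by the median of its 3x3 neighbourhood.

    Positions outside the map are treated as 0 (zero padding, an assumption).
    """
    rows, cols = len(wafer_map), len(wafer_map[0])

    def value(r, c):
        return wafer_map[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    return [
        [int(median(value(r + dr, c + dc)
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)))
         for c in range(cols)]
        for r in range(rows)
    ]


# Toy 8x8 map: a 4x4 local cluster plus two isolated random defects.
noisy = [[0] * 8 for _ in range(8)]
for r in range(1, 5):
    for c in range(1, 5):
        noisy[r][c] = 1      # local (systematic) cluster
noisy[6][6] = 1              # isolated random defect
noisy[0][7] = 1              # isolated random defect

denoised = median_filter_3x3(noisy)

print(sum(map(sum, noisy)))         # 18
print(sum(map(sum, denoised)))      # 12
print(denoised[6][6], denoised[0][7])  # 0 0
```

The two isolated defects are removed while the cluster survives. The filter also rounds off the four corners of the cluster, which is why the count drops to 12 rather than 16.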

In this section, we first describe defect pattern generation, which is the first step of the classification process. After explaining how data are collected, we discuss in more detail the main methodologies used in each stage of defect pattern detection, such as the spatial randomness test, the de-noising process and feature extraction.

2.1 Defect Pattern

Many authors used different type of wafer maps
