
Introduction to information system

Introduction to Data Science

Bowei Chen

School of Computer Science

University of Lincoln

CMP3036M/CMP9063M Data Science

Hello, I’m a

Data Scientist!

My research interests lie mostly in developing intelligent algorithms and data solutions for the following fields:

• Computational advertising: programmatic guarantee

• Internet economics and digital products: inventory pricing, information systems

• Mathematical finance: derivatives pricing, algorithmic trading

http://staff.lincoln.ac.uk/bchen


Module Motivation

“Torture the data, and it will confess to anything.”

Ronald Coase, Nobel Prize Laureate in Economics

“I keep saying that the sexy job in the next 10 years will be statisticians.”

Hal Varian, Google Chief Economist

“Data is a precious thing and will last longer than the systems themselves.”

Tim Berners-Lee, Inventor of the World Wide Web

Top 10 Hot Job Titles That Barely Existed 5 Years Ago | 2014

The Alan Turing Institute

The Alan Turing Institute is the UK’s national centre for data science, headquartered at the British Library. Following a public competition with international peer review, the Institute was founded in 2015 as a joint venture by the universities of Cambridge, Edinburgh, Oxford and Warwick, University College London, and the UK Engineering and Physical Sciences Research Council (EPSRC). https://turing.ac.uk


Module Information

Title: Data Science

Code: CMP3036M/CMP9063M

Semester: 2016-2017, Semesters A & B

Coordinator: Bowei Chen

Instructors: Bowei Chen (Semester A); TBC (Semester B)

Demonstrators: Deema Abdal Hafeth, Liyun Gong, JingMin Huang

Assessment (CMP3036M): Assignment (50%) + Assignment (50%)

Assessment (CMP9063M): Assignment (40%) + Assignment (40%) + Report (20%)

Topics in Semester A

Week A01 Introduction (Lecture)

Weeks A02-06 Theory: Fundamentals of Probability and Statistics (Lecture)

• Probability Concept

• Popular Distributions

• Point and Interval Estimation

• Sampling and Hypothesis Testing

Practice: R (Workshop)

Week A07 Directed Study

Weeks A08-13 Theory: Supervised Learning (Lecture)

• Data Preparation and Model Evaluation

• Linear and Logistic Regressions

• Naïve Bayes and Decision Tree

Practice: R and Microsoft Azure (Workshop)

Timetable in Semester A

Lecture

Thursday 15:00 – 16:00 @ MB0312

Workshop

Group A:

Thursday 9:00 – 11:00 @ MC3203

Group C:

Thursday 16:00 – 18:00 @ MC3204

Group B:

Friday 15:00 – 17:00 @ MC3204

Contact Information

Name Role Contact

Bowei Chen* Module Coordinator bchen@lincoln.ac.uk

Deema Abdal Hafeth Demonstrator/TA dabdalhafeth@lincoln.ac.uk

Liyun Gong Demonstrator/TA lgong@lincoln.ac.uk

Jingmin Huang Demonstrator/TA jhua8590@gmail.com

* Office Hours: Monday 14:00 – 16:00 @ MC3220B, MHT


Module GitHub page

https://github.com/boweichen/CMP3036MDataScience

• Detailed Course Topics (by Week for Semester A)

• Reading List

Note: please check Blackboard for your slides, example code, and assessment documents. The reading list on the GitHub page is a general guide for your directed study. I suggest you select materials according to your background and interests.


What is Data Science?
There is much debate about what data science is, and what it isn’t.

Data Science

Data science is an interdisciplinary field concerned with the processes and systems used to extract knowledge or insights from data in various forms. It includes:

• Dealing with data storage and retrieval

• Summarising and analysing data

• Parallel data processing

• Pattern recognition and statistical testing

• Building predictive models

• Data visualisation

• Management information system (MIS) reporting

Statistics

Statistics is the study of the collection, analysis, interpretation, presentation, and organisation of data.

Some people regard statistics as a branch of mathematics, but not all mathematicians and statisticians agree with this view:
https://www.quora.com/What-do-pure-mathematicians-and-statisticians-think-of-each-other/answer/Michael-Hochster


Machine Learning

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Tom Mitchell

Example (a handwriting recognition learning problem)

Task T: recognizing and classifying handwritten words within images

Performance measure P: percent of words correctly classified

Training experience E: a database of handwritten words with given classifications
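The T / P / E definition can also be illustrated with a much smaller toy problem than handwriting recognition. The sketch below is only an assumption-laden illustration, not part of the original slides: the task, the threshold of 30, the 1-nearest-neighbour learner, and all function names are hypothetical choices made to show performance P improving as experience E grows.

```python
# Toy illustration of the T / P / E definition (all names are hypothetical).
# Task T: label each integer 0..99 as "low" (below 30) or "high".
# Performance measure P: fraction of integers labelled correctly.
# Training experience E: a list of (value, label) examples.

THRESHOLD = 30  # assumed ground truth; the learner never sees this directly


def true_label(x):
    """Return the correct label for x under the assumed ground truth."""
    return "low" if x < THRESHOLD else "high"


def predict(experience, x):
    """1-nearest-neighbour: copy the label of the closest known example."""
    _, nearest_label = min(experience, key=lambda ex: abs(ex[0] - x))
    return nearest_label


def performance(experience, tasks):
    """P: share of tasks whose predicted label matches the true label."""
    correct = sum(predict(experience, x) == true_label(x) for x in tasks)
    return correct / len(tasks)


tasks = range(100)
little_experience = [(0, "low"), (99, "high")]
more_experience = little_experience + [(28, "low"), (31, "high")]

p_before = performance(little_experience, tasks)
p_after = performance(more_experience, tasks)
print(p_before, p_after)  # prints 0.8 1.0
```

With only two examples the learner places its decision boundary halfway between 0 and 99 and mislabels the integers 30 to 49. Two further examples near the true threshold move the boundary, so P rises from 0.8 to 1.0 as E grows.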

Five Tribes of Machine Learning

Tribe | Origins | Master algorithm | Representative scientists
Symbolists | Logic, philosophy | Inverse deduction | Tom Mitchell, Steve Muggleton, Ross Quinlan
Connectionists | Neuroscience | Backpropagation | Geoff Hinton, Yann LeCun, Yoshua Bengio
Evolutionaries | Evolutionary biology | Genetic programming | John Koza, John Holland, Hod Lipson
Bayesians | Statistics | Probabilistic inference | David Heckerman, Judea Pearl, Michael Jordan
Analogisers | Psychology | Kernel machines | Peter Hart, Vladimir Vapnik, Douglas Hofstadter

Pedro Domingos. The Five Tribes of Machine Learning and What You Can Take from Each. University of Washington.
