Introduction to Data Science
Bowei Chen
School of Computer Science
University of Lincoln
CMP3036M/CMP9063M Data Science
Hello, I’m a
Data Scientist!
My research interests lie mostly in developing
intelligent algorithms and data solutions for the
following fields:
• Computational advertising:
programmatic guarantee
• Internet economics and digital products:
inventory pricing, information systems
• Mathematical finance:
derivatives pricing, algorithmic trading
http://staff.lincoln.ac.uk/bchen
Module Motivation
“Torture the data, and it will confess to anything.”
Ronald Coase, Nobel Prize Laureate in Economics
“I keep saying that the sexy job in the next 10 years will be statisticians.”
Hal Varian, Google Chief Economist
“Data is a precious thing and will last longer than the systems themselves.”
Tim Berners-Lee, Inventor of the World Wide Web
Top 10 Hot Job Titles That
Barely Existed 5 Years Ago | 2014
The Alan Turing Institute
It is the UK’s national centre for data science, headquartered at the British Library.
Following a public competition with international peer review, the Institute was
founded in 2015 as a joint venture by the universities of Cambridge, Edinburgh,
Oxford, University College London, Warwick and the UK EPSRC. https://turing.ac.uk
Module Information
Title Data Science
Code CMP3036M/CMP9063M
Semester 2016-2017 Semesters A & B
Coordinator Bowei Chen
Instructors Bowei Chen (Semester A)
TBC (Semester B)
Demonstrators Deema Abdal Hafeth
Liyun Gong
JingMin Huang
Assessment CMP3036M: Assignment (50%) + Assignment (50%)
CMP9063M: Assignment (40%) + Assignment (40%) + Report (20%)
Topics in Semester A
Week A01 Introduction (Lecture)
Weeks A02-06 Theory: Fundamentals of Probability and Statistics (Lecture)
• Probability Concept
• Popular Distributions
• Point and Interval Estimation
• Sampling and Hypothesis Testing
Practice: R (Workshop)
Week A07 Direct Study
Weeks A08-13 Theory: Supervised Learning (Lecture)
• Data Preparation and Model Evaluation
• Linear and Logistic Regressions
• Naïve Bayes and Decision Tree
Practice: R and Microsoft Azure (Workshop)
Timetable in Semester A
Lecture
Thursday 15:00 – 16:00 @ MB0312
Workshop
Group A:
Thursday 9:00 – 11:00 @ MC3203
Group C:
Thursday 16:00 – 18:00 @ MC3204
Group B:
Friday 15:00 – 17:00 @ MC3204
Contact Information
Name Role Contact
Bowei Chen* Module Coordinator bchen@lincoln.ac.uk
Deema Abdal Hafeth Demonstrator/TA dabdalhafeth@lincoln.ac.uk
Liyun Gong Demonstrator/TA lgong@lincoln.ac.uk
Jingmin Huang Demonstrator/TA jhua8590@gmail.com
* Office Hours: Monday 14:00 – 16:00 @ MC3220B, MHT
Module Github page
https://github.com/boweichen/CMP3036MDataScience
• Detailed Course Topics (by Week for Semester A)
• Reading List
Note: please check Blackboard for your slides, example code, and
assessment documents. The reading
list on the Github page is a general guide for your
direct study. I suggest that you select materials
according to your background and interests.
What is Data Science?
There is much debate about what
data science is, and what it isn’t.
Data Science
It is an interdisciplinary field concerned with the processes and systems used to
extract knowledge or insights from data in various forms. It includes:
• Dealing with data storage and retrieval
• Summarising and analysing data
• Parallel data processing
• Pattern recognition and statistical testing
• Building predictive models
• Data visualisation
• Management information system (MIS) reporting
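To make a few of these activities concrete, here is a minimal sketch in Python (standard library only) that touches storage and retrieval, summarising, and a predictive model. The table name, column names, and data values are made up for illustration and do not come from the module materials.

```python
import sqlite3
import statistics

# Storage and retrieval: an in-memory SQLite table of made-up observations.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?)",
    [(1, 2), (2, 4), (3, 6), (4, 8), (5, 10)],
)
rows = conn.execute("SELECT x, y FROM observations ORDER BY x").fetchall()
conn.close()

xs = [x for x, _ in rows]
ys = [y for _, y in rows]

# Summarising: simple descriptive statistics of the response variable.
mean_y = statistics.mean(ys)
stdev_y = statistics.stdev(ys)
print(f"mean(y) = {mean_y:.2f}, stdev(y) = {stdev_y:.2f}")

# Predictive model: ordinary least squares fit of y = intercept + slope * x.
mean_x = statistics.mean(xs)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict(x):
    """Predict y for a new x using the fitted line."""
    return intercept + slope * x


print(f"fitted line: y = {intercept:.2f} + {slope:.2f} * x")
print(f"prediction for x = 6: {predict(6.0):.2f}")
```

Real projects would typically replace each step with a dedicated tool (a database server, a statistics package, a modelling library), but the flow from stored data to summary to prediction stays the same.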
Statistics
Statistics is the study of the collection, analysis, interpretation, presentation, and
organisation of data.
Some people regard statistics as a branch of mathematics, although not all
mathematicians and statisticians agree.
https://www.quora.com/What-do-pure-mathematicians-and-statisticians-think-of-each-other/answer/Michael-Hochster
Machine Learning
A computer program is said to learn from experience E with respect to some class of
tasks T and performance measure P, if its performance at tasks in T, as measured by
P, improves with experience E.
Tom Mitchell
Example (a handwriting recognition learning problem)
Task T: recognizing and classifying handwritten words within images
Performance measure P: percent of words correctly classified
Training experience E: a database of handwritten words with given classifications
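Mitchell's definition can be illustrated with a toy stand-in for the handwriting problem. The sketch below is an assumption-laden illustration, not a handwriting recogniser: the "images" are single synthetic numbers drawn from two Gaussian classes, and the learner is a simple nearest-centroid classifier chosen for brevity. It reports performance P (percent correct on a fixed test set) as experience E (the number of labelled training examples) grows; P typically improves with E, although a single random run is not guaranteed to be monotonic.

```python
import random
import statistics


def make_examples(n_per_class, rng):
    """Experience E: labelled 1-D feature values for two classes (toy data)."""
    examples = []
    for label, mean in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            examples.append((rng.gauss(mean, 1.0), label))
    return examples


def train(examples):
    """Learn one centroid (mean feature value) per class."""
    return {
        label: statistics.fmean(x for x, y in examples if y == label)
        for label in (0, 1)
    }


def classify(centroids, x):
    """Task T: assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def percent_correct(centroids, examples):
    """Performance measure P: percent of examples classified correctly."""
    hits = sum(classify(centroids, x) == y for x, y in examples)
    return 100.0 * hits / len(examples)


rng = random.Random(0)              # fixed seed so the run is repeatable
test_set = make_examples(500, rng)  # held-out data used only to measure P

results = {}
for n in (1, 5, 50, 500):
    model = train(make_examples(n, rng))
    results[n] = percent_correct(model, test_set)
    print(f"E = {2 * n:4d} labelled examples -> P = {results[n]:.1f}% correct")
```

Measuring P on examples that were not used for training matters: performance on the training data alone would overstate how well the program has learned the task.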
Five Tribes of Machine Learning
Tribe            Origins                Master algorithm          Representative scientists
Symbolists       Logic, philosophy      Inverse deduction         Tom Mitchell, Steve Muggleton, Ross Quinlan
Connectionists   Neuroscience           Backpropagation           Geoff Hinton, Yann LeCun, Yoshua Bengio
Evolutionaries   Evolutionary biology   Genetic programming       John Koza, John Holland, Hod Lipson
Bayesians        Statistics             Probabilistic inference   David Heckerman, Judea Pearl, Michael Jordan
Analogisers      Psychology             Kernel machines           Peter Hart, Vladimir Vapnik, Douglas Hofstadter
Pedro Domingos. The Five Tribes of Machine Learning and What You Can Take from Each. University of Washington.