add first notebook and supporting files

pull/7/head
Kevin Markham 2015-04-08 00:49:52 -04:00
parent b600604bd6
commit b6513636d6
7 changed files with 317 additions and 0 deletions

2
.gitignore vendored 100644

@@ -0,0 +1,2 @@
.ipynb_checkpoints/
*.pyc


@@ -0,0 +1,248 @@
{
"metadata": {
"name": "",
"signature": "sha256:3a45466f81f7926609b8d5a7f9daaac6a202c78255a1369eb02391279866cba5"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# What is machine learning, and how does it work?\n",
"*From the video series: [Introduction to machine learning with scikit-learn](https://github.com/justmarkham/scikit-learn-videos)*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Machine learning](images/01_robot.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Agenda\n",
"\n",
"- What is machine learning?\n",
"- What are the two main categories of machine learning?\n",
"- What are some examples of machine learning?\n",
"- How does machine learning \"work\"?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is machine learning?\n",
"\n",
"One definition: \"Machine learning is the semi-automated extraction of knowledge from data\"\n",
"\n",
"- **Knowledge from data**: Starts with a question that might be answerable using data\n",
"- **Automated extraction**: A computer provides the insight\n",
"- **Semi-automated**: Requires many smart decisions by a human"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What are the two main categories of machine learning?\n",
"\n",
"**Supervised learning**: Making predictions using data\n",
" \n",
"- Example: Is a given email \"spam\" or \"ham\"?\n",
"- There is an outcome we are trying to predict"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Spam filter](images/01_spam_filter.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Unsupervised learning**: Extracting structure from data\n",
"\n",
"- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors\n",
"- There is no \"right answer\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Clustering](images/01_clustering.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How does machine learning \"work\"?\n",
"\n",
"High-level steps of supervised learning:\n",
"\n",
"1. First, train a **machine learning model** using **labeled data**\n",
"\n",
" - \"Labeled data\" has been labeled with the outcome\n",
" - \"Machine learning model\" learns the relationship between the attributes of the data and its outcome\n",
"\n",
"2. Then, make **predictions** on **new data** for which the label is unknown"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Supervised learning](images/01_supervised_learning.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The primary goal of supervised learning is to build a model that \"generalizes\": it accurately predicts the **future** (new, unseen data) rather than the **past** (the data it was trained on)!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Questions about machine learning\n",
"\n",
"- How do I choose **which attributes** of my data to include in the model?\n",
"- How do I choose **which model** to use?\n",
"- How do I **optimize** this model for best performance?\n",
"- How do I ensure that I'm building a model that will **generalize** to unseen data?\n",
"- Can I **estimate** how well my model is likely to perform on unseen data?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resources\n",
"\n",
"- Book: [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/) (section 2.1, 14 pages)\n",
"- Video: [Learning Paradigms](http://work.caltech.edu/library/014.html) (13 minutes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Comments or Questions?\n",
"\n",
"- Email: <kevin@dataschool.io>\n",
"- Website: http://dataschool.io\n",
"- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.display import HTML\n",
"def css_styling():\n",
"    # Read the custom CSS/JS and wrap it in HTML so the notebook renders it\n",
"    with open(\"styles/custom.css\", \"r\") as f:\n",
"        return HTML(f.read())\n",
"css_styling()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"html": [
"<style>\n",
" @font-face {\n",
" font-family: \"Computer Modern\";\n",
" src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
" }\n",
" div.cell{\n",
" width: 90%;\n",
"/* margin-left:auto;*/\n",
"/* margin-right:auto;*/\n",
" }\n",
" ul {\n",
" line-height: 145%;\n",
" font-size: 90%;\n",
" }\n",
" li {\n",
" margin-bottom: 1em;\n",
" }\n",
" h1 {\n",
" font-family: Helvetica, serif;\n",
" }\n",
" h4{\n",
" margin-top: 12px;\n",
" margin-bottom: 3px;\n",
" }\n",
" div.text_cell_render{\n",
" font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
" line-height: 145%;\n",
" font-size: 130%;\n",
" width: 90%;\n",
" margin-left:auto;\n",
" margin-right:auto;\n",
" }\n",
" .CodeMirror{\n",
" font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
" }\n",
"/* .prompt{\n",
" display: None;\n",
" }*/\n",
" .text_cell_render h5 {\n",
" font-weight: 300;\n",
" font-size: 16pt;\n",
" color: #4057A1;\n",
" font-style: italic;\n",
" margin-bottom: 0.5em;\n",
" margin-top: 0.5em;\n",
" display: block;\n",
" }\n",
"\n",
" .warning{\n",
" color: rgb( 240, 20, 20 )\n",
" }\n",
"</style>\n",
"<script>\n",
" MathJax.Hub.Config({\n",
" TeX: {\n",
" extensions: [\"AMSmath.js\"]\n",
" },\n",
" tex2jax: {\n",
" inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
" displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
" },\n",
" displayAlign: 'center', // Centers displayed equations; use 'left' or 'right' to change alignment.\n",
" \"HTML-CSS\": {\n",
" styles: {'.MathJax_Display': {\"margin\": 4}}\n",
" }\n",
" });\n",
"</script>"
],
"metadata": {},
"output_type": "pyout",
"prompt_number": 1,
"text": [
"<IPython.core.display.HTML at 0x3edad30>"
]
}
],
"prompt_number": 1
}
],
"metadata": {}
}
]
}
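
The two supervised-learning steps described in the notebook (train a model on labeled data, then predict on new data) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the email attributes (number of links, number of exclamation marks), the toy data, and the `train` and `predict` helpers are all hypothetical, and the nearest-centroid rule used here is a stand-in chosen for brevity, not the method used in the video series.

```python
# Hypothetical toy data: each email is described by two made-up attributes,
# (number of links, number of exclamation marks), plus its known label.
labeled_emails = [
    ((8, 5), "spam"),
    ((6, 7), "spam"),
    ((7, 4), "spam"),
    ((0, 1), "ham"),
    ((1, 0), "ham"),
    ((2, 1), "ham"),
]


def train(labeled_data):
    """Step 1: learn one average point (centroid) per label from labeled data."""
    sums = {}
    counts = {}
    for features, label in labeled_data:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(model, features):
    """Step 2: give new, unlabeled data the label of the closest centroid."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


model = train(labeled_emails)

# New emails whose labels are unknown to the model.
new_emails = [(9, 6), (1, 1)]
predictions = [predict(model, email) for email in new_emails]
print(predictions)  # ['spam', 'ham']
```

The model here is just a dictionary of per-label averages, which makes the "learned relationship" between attributes and outcome easy to inspect. Whether such a model generalizes can only be judged on emails it did not see during training.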

Binary file not shown.

Size: 40 KiB

BIN
images/01_robot.png 100644

Binary file not shown.

Size: 66 KiB

Binary file not shown.

Size: 58 KiB

Binary file not shown.

Size: 37 KiB

67
styles/custom.css 100644

@@ -0,0 +1,67 @@
<style>
@font-face {
font-family: "Computer Modern";
src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');
}
div.cell{
width: 90%;
/* margin-left:auto;*/
/* margin-right:auto;*/
}
ul {
line-height: 145%;
font-size: 90%;
}
li {
margin-bottom: 1em;
}
h1 {
font-family: Helvetica, serif;
}
h4{
margin-top: 12px;
margin-bottom: 3px;
}
div.text_cell_render{
font-family: Computer Modern, "Helvetica Neue", Arial, Helvetica, Geneva, sans-serif;
line-height: 145%;
font-size: 130%;
width: 90%;
margin-left:auto;
margin-right:auto;
}
.CodeMirror{
font-family: "Source Code Pro", source-code-pro,Consolas, monospace;
}
/* .prompt{
display: None;
}*/
.text_cell_render h5 {
font-weight: 300;
font-size: 16pt;
color: #4057A1;
font-style: italic;
margin-bottom: 0.5em;
margin-top: 0.5em;
display: block;
}
.warning{
color: rgb( 240, 20, 20 )
}
</style>
<script>
MathJax.Hub.Config({
TeX: {
extensions: ["AMSmath.js"]
},
tex2jax: {
inlineMath: [ ['$','$'], ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
},
displayAlign: 'center', // Centers displayed equations; use 'left' or 'right' to change alignment.
"HTML-CSS": {
styles: {'.MathJax_Display': {"margin": 4}}
}
});
</script>