add first notebook and supporting files
parent b600604bd6
commit b6513636d6
@@ -0,0 +1,2 @@
.ipynb_checkpoints/
*.pyc

@@ -0,0 +1,248 @@
{
 "metadata": {
  "name": "",
  "signature": "sha256:3a45466f81f7926609b8d5a7f9daaac6a202c78255a1369eb02391279866cba5"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# What is machine learning, and how does it work?\n",
      "*From the video series: [Introduction to machine learning with scikit-learn](https://github.com/justmarkham/scikit-learn-videos)*"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Machine learning](images/01_robot.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Agenda\n",
      "\n",
      "- What is machine learning?\n",
      "- What are the two main categories of machine learning?\n",
      "- What are some examples of machine learning?\n",
      "- How does machine learning \"work\"?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## What is machine learning?\n",
      "\n",
      "One definition: \"Machine learning is the semi-automated extraction of knowledge from data\"\n",
      "\n",
      "- **Knowledge from data**: Starts with a question that might be answerable using data\n",
      "- **Automated extraction**: A computer provides the insight\n",
      "- **Semi-automated**: Requires many smart decisions by a human"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## What are the two main categories of machine learning?\n",
      "\n",
      "**Supervised learning**: Making predictions using data\n",
      "\n",
      "- Example: Is a given email \"spam\" or \"ham\"?\n",
      "- There is an outcome we are trying to predict"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Spam filter](images/01_spam_filter.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**Unsupervised learning**: Extracting structure from data\n",
      "\n",
      "- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors\n",
      "- There is no \"right answer\""
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Clustering](images/01_clustering.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## How does machine learning \"work\"?\n",
      "\n",
      "High-level steps of supervised learning:\n",
      "\n",
      "1. First, train a **machine learning model** using **labeled data**\n",
      "\n",
      "    - \"Labeled data\" has been labeled with the outcome\n",
      "    - \"Machine learning model\" learns the relationship between the attributes of the data and its outcome\n",
      "\n",
      "2. Then, make **predictions** on **new data** for which the label is unknown"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Supervised learning](images/01_supervised_learning.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The primary goal of supervised learning is to build a model that \"generalizes\": it accurately predicts the **future** (new data whose outcome is unknown) rather than the **past** (the labeled data it was trained on)!"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Questions about machine learning\n",
      "\n",
      "- How do I choose **which attributes** of my data to include in the model?\n",
      "- How do I choose **which model** to use?\n",
      "- How do I **optimize** this model for best performance?\n",
      "- How do I ensure that I'm building a model that will **generalize** to unseen data?\n",
      "- Can I **estimate** how well my model is likely to perform on unseen data?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Resources\n",
      "\n",
      "- Book: [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/) (section 2.1, 14 pages)\n",
      "- Video: [Learning Paradigms](http://work.caltech.edu/library/014.html) (13 minutes)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Comments or Questions?\n",
      "\n",
      "- Email: <kevin@dataschool.io>\n",
      "- Website: http://dataschool.io\n",
      "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import HTML\n",
      "\n",
      "def css_styling():\n",
      "    # read the stylesheet (closing the file afterwards) and return it as HTML\n",
      "    with open(\"styles/custom.css\", \"r\") as f:\n",
      "        styles = f.read()\n",
      "    return HTML(styles)\n",
      "\n",
      "css_styling()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<style>\n",
        " @font-face {\n",
        " font-family: \"Computer Modern\";\n",
        " src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
        " }\n",
        " div.cell{\n",
        " width: 90%;\n",
        "/* margin-left:auto;*/\n",
        "/* margin-right:auto;*/\n",
        " }\n",
        " ul {\n",
        " line-height: 145%;\n",
        " font-size: 90%;\n",
        " }\n",
        " li {\n",
        " margin-bottom: 1em;\n",
        " }\n",
        " h1 {\n",
        " font-family: Helvetica, sans-serif;\n",
        " }\n",
        " h4{\n",
        " margin-top: 12px;\n",
        " margin-bottom: 3px;\n",
        " }\n",
        " div.text_cell_render{\n",
        " font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
        " line-height: 145%;\n",
        " font-size: 130%;\n",
        " width: 90%;\n",
        " margin-left:auto;\n",
        " margin-right:auto;\n",
        " }\n",
        " .CodeMirror{\n",
        " font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
        " }\n",
        "/* .prompt{\n",
        " display: None;\n",
        " }*/\n",
        " .text_cell_render h5 {\n",
        " font-weight: 300;\n",
        " font-size: 16pt;\n",
        " color: #4057A1;\n",
        " font-style: italic;\n",
        " margin-bottom: 0.5em;\n",
        " margin-top: 0.5em;\n",
        " display: block;\n",
        " }\n",
        "\n",
        " .warning{\n",
        " color: rgb( 240, 20, 20 )\n",
        " }\n",
        "</style>\n",
        "<script>\n",
        " MathJax.Hub.Config({\n",
        " TeX: {\n",
        " extensions: [\"AMSmath.js\"]\n",
        " },\n",
        " tex2jax: {\n",
        " inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
        " displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
        " },\n",
        " displayAlign: 'center', // Use 'left' or 'right' to change the alignment of display equations.\n",
        " \"HTML-CSS\": {\n",
        " styles: {'.MathJax_Display': {\"margin\": 4}}\n",
        " }\n",
        " });\n",
        "</script>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 1,
       "text": [
        "<IPython.core.display.HTML at 0x3edad30>"
       ]
      }
     ],
     "prompt_number": 1
    }
   ],
   "metadata": {}
  }
 ]
}
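The notebook's two high-level steps of supervised learning (train a model on labeled data, then predict labels for new data) can be sketched in plain Python. This is a minimal illustration, not code from the notebook or the video series: it assumes a 1-nearest-neighbor classifier as the "machine learning model", and the toy email attributes, the labels, and the helper names `train`, `predict`, and `accuracy` are all hypothetical. The last few lines hold out some labeled examples to estimate how well the model generalizes to data it has not seen.

```python
import math

# Hypothetical labeled data: each email is described by two made-up attributes
# (say, counts of two suspicious words) and labeled with its known outcome.
labeled_data = [
    ((0.0, 1.0), "ham"),
    ((1.0, 0.0), "ham"),
    ((0.0, 0.0), "ham"),
    ((5.0, 6.0), "spam"),
    ((6.0, 5.0), "spam"),
    ((7.0, 7.0), "spam"),
]


def train(examples):
    """Hypothetical helper: 'training' a 1-nearest-neighbor model just stores the labeled examples."""
    return list(examples)


def predict(model, features):
    """Return the label of the stored example closest to `features` (Euclidean distance)."""
    _, label = min(model, key=lambda example: math.dist(example[0], features))
    return label


def accuracy(model, examples):
    """Fraction of labeled examples whose predicted label matches the true label."""
    correct = sum(predict(model, features) == label for features, label in examples)
    return correct / len(examples)


# Step 1: train a model using labeled data.
model = train(labeled_data)

# Step 2: make predictions on new data for which the label is unknown.
new_emails = [(0.5, 0.5), (6.0, 6.5)]
predictions = [predict(model, features) for features in new_emails]
print(predictions)  # ['ham', 'spam']

# Generalization: hold out labeled examples the model never sees during training,
# then compare its predictions on them with their known labels.
train_set = [ex for i, ex in enumerate(labeled_data) if i % 3 != 2]
test_set = [ex for i, ex in enumerate(labeled_data) if i % 3 == 2]
holdout_model = train(train_set)
print(accuracy(holdout_model, test_set))  # 1.0
```

Six invented, well-separated points prove nothing about real email; the sketch only shows the workflow. Scoring the model on the examples it was trained on would be misleading here, because each training point is typically its own nearest neighbor, which is why the estimate uses held-out data.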

Binary file not shown (size: 40 KiB).
Binary file not shown (size: 66 KiB).
Binary file not shown (size: 58 KiB).
Binary file not shown (size: 37 KiB).
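The notebook's unsupervised example (segmenting grocery store shoppers into clusters) can be sketched the same way. This minimal sketch swaps in k-means as one common clustering technique; the notebook does not name an algorithm, and the shopper attributes (store visits per month, average basket in dollars) and the `kmeans` helper are hypothetical. Note that the data carries no labels: the algorithm only groups points that lie close together.

```python
import math
import random

# Hypothetical shopper data with no labels:
# (store visits per month, average basket in dollars).
shoppers = [(2, 20.0), (3, 25.0), (2, 22.0), (12, 80.0), (14, 85.0), (13, 90.0)]


def kmeans(points, k, iterations=10, seed=0):
    """Hypothetical minimal k-means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[i] = tuple(sum(values) / len(members) for values in zip(*members))
    return centroids, clusters


centroids, clusters = kmeans(shoppers, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

Because k-means compares raw distances, the attribute with the larger range (here, basket size) dominates the grouping; in practice attributes are usually standardized first.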

@@ -0,0 +1,67 @@
<style>
    @font-face {
        font-family: "Computer Modern";
        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');
    }
    div.cell{
        width: 90%;
/*        margin-left:auto;*/
/*        margin-right:auto;*/
    }
    ul {
        line-height: 145%;
        font-size: 90%;
    }
    li {
        margin-bottom: 1em;
    }
    h1 {
        font-family: Helvetica, sans-serif;
    }
    h4{
        margin-top: 12px;
        margin-bottom: 3px;
    }
    div.text_cell_render{
        font-family: Computer Modern, "Helvetica Neue", Arial, Helvetica, Geneva, sans-serif;
        line-height: 145%;
        font-size: 130%;
        width: 90%;
        margin-left:auto;
        margin-right:auto;
    }
    .CodeMirror{
        font-family: "Source Code Pro", source-code-pro,Consolas, monospace;
    }
/*    .prompt{
        display: None;
    }*/
    .text_cell_render h5 {
        font-weight: 300;
        font-size: 16pt;
        color: #4057A1;
        font-style: italic;
        margin-bottom: 0.5em;
        margin-top: 0.5em;
        display: block;
    }

    .warning{
        color: rgb( 240, 20, 20 )
    }
</style>
<script>
    MathJax.Hub.Config({
        TeX: {
            extensions: ["AMSmath.js"]
        },
        tex2jax: {
            inlineMath: [ ['$','$'], ["\\(","\\)"] ],
            displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
        },
        displayAlign: 'center', // Use 'left' or 'right' to change the alignment of display equations.
        "HTML-CSS": {
            styles: {'.MathJax_Display': {"margin": 4}}
        }
    });
</script>