add first notebook and supporting files

2015-04-08 00:49:52 -04:00 · b6513636d6
parent b600604bd6
commit b6513636d6
7 changed files with 317 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,2 @@
+.ipynb_checkpoints/
+*.pyc
--- a/01_machine_learning_intro.ipynb
+++ b/01_machine_learning_intro.ipynb
@@ -0,0 +1,248 @@
+{
+ "metadata": {
+  "name": "",
+  "signature": "sha256:3a45466f81f7926609b8d5a7f9daaac6a202c78255a1369eb02391279866cba5"
+ },
+ "nbformat": 3,
+ "nbformat_minor": 0,
+ "worksheets": [
+  {
+   "cells": [
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "# What is machine learning, and how does it work?\n",
+      "*From the video series: [Introduction to machine learning with scikit-learn](https://github.com/justmarkham/scikit-learn-videos)*"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "![Machine learning](images/01_robot.png)"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## Agenda\n",
+      "\n",
+      "- What is machine learning?\n",
+      "- What are the two main categories of machine learning?\n",
+      "- What are some examples of machine learning?\n",
+      "- How does machine learning \"work\"?"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## What is machine learning?\n",
+      "\n",
+      "One definition: \"Machine learning is the semi-automated extraction of knowledge from data\"\n",
+      "\n",
+      "- **Knowledge from data**: Starts with a question that might be answerable using data\n",
+      "- **Automated extraction**: A computer provides the insight\n",
+      "- **Semi-automated**: Requires many smart decisions by a human"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## What are the two main categories of machine learning?\n",
+      "\n",
+      "**Supervised learning**: Making predictions using data\n",
+      "    \n",
+      "- Example: Is a given email \"spam\" or \"ham\"?\n",
+      "- There is an outcome we are trying to predict"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "![Spam filter](images/01_spam_filter.png)"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "**Unsupervised learning**: Extracting structure from data\n",
+      "\n",
+      "- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors\n",
+      "- There is no \"right answer\""
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "![Clustering](images/01_clustering.png)"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## How does machine learning \"work\"?\n",
+      "\n",
+      "High-level steps of supervised learning:\n",
+      "\n",
+      "1. First, train a **machine learning model** using **labeled data**\n",
+      "\n",
+      "    - \"Labeled data\" has been labeled with the outcome\n",
+      "    - \"Machine learning model\" learns the relationship between the attributes of the data and its outcome\n",
+      "\n",
+      "2. Then, make **predictions** on **new data** for which the label is unknown"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "![Supervised learning](images/01_supervised_learning.png)"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "The primary goal of supervised learning is to build a model that \"generalizes\": one that accurately predicts the **future** (new, unseen data) rather than just the **past** (the data it was trained on)!"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## Questions about machine learning\n",
+      "\n",
+      "- How do I choose **which attributes** of my data to include in the model?\n",
+      "- How do I choose **which model** to use?\n",
+      "- How do I **optimize** this model for best performance?\n",
+      "- How do I ensure that I'm building a model that will **generalize** to unseen data?\n",
+      "- Can I **estimate** how well my model is likely to perform on unseen data?"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## Resources\n",
+      "\n",
+      "- Book: [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/) (section 2.1, 14 pages)\n",
+      "- Video: [Learning Paradigms](http://work.caltech.edu/library/014.html) (13 minutes)"
+     ]
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "## Comments or Questions?\n",
+      "\n",
+      "- Email: <kevin@dataschool.io>\n",
+      "- Website: http://dataschool.io\n",
+      "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "from IPython.display import HTML\n",
+      "def css_styling():\n",
+      "    with open(\"styles/custom.css\", \"r\") as f:\n",
+      "        return HTML(f.read())\n",
+      "css_styling()"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": [
+      {
+       "html": [
+        "<style>\n",
+        "    @font-face {\n",
+        "        font-family: \"Computer Modern\";\n",
+        "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
+        "    }\n",
+        "    div.cell{\n",
+        "        width: 90%;\n",
+        "/*        margin-left:auto;*/\n",
+        "/*        margin-right:auto;*/\n",
+        "    }\n",
+        "    ul {\n",
+        "        line-height: 145%;\n",
+        "        font-size: 90%;\n",
+        "    }\n",
+        "    li {\n",
+        "        margin-bottom: 1em;\n",
+        "    }\n",
+        "    h1 {\n",
+        "        font-family: Helvetica, serif;\n",
+        "    }\n",
+        "    h4{\n",
+        "        margin-top: 12px;\n",
+        "        margin-bottom: 3px;\n",
+        "       }\n",
+        "    div.text_cell_render{\n",
+        "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
+        "        line-height: 145%;\n",
+        "        font-size: 130%;\n",
+        "        width: 90%;\n",
+        "        margin-left:auto;\n",
+        "        margin-right:auto;\n",
+        "    }\n",
+        "    .CodeMirror{\n",
+        "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
+        "    }\n",
+        "/*    .prompt{\n",
+        "        display: None;\n",
+        "    }*/\n",
+        "    .text_cell_render h5 {\n",
+        "        font-weight: 300;\n",
+        "        font-size: 16pt;\n",
+        "        color: #4057A1;\n",
+        "        font-style: italic;\n",
+        "        margin-bottom: 0.5em;\n",
+        "        margin-top: 0.5em;\n",
+        "        display: block;\n",
+        "    }\n",
+        "\n",
+        "    .warning{\n",
+        "        color: rgb( 240, 20, 20 )\n",
+        "        }\n",
+        "</style>\n",
+        "<script>\n",
+        "    MathJax.Hub.Config({\n",
+        "                        TeX: {\n",
+        "                           extensions: [\"AMSmath.js\"]\n",
+        "                           },\n",
+        "                tex2jax: {\n",
+        "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
+        "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
+        "                },\n",
+        "                displayAlign: 'center', // 'center' centers equations; use 'left' or 'right' to align them to a side.\n",
+        "                \"HTML-CSS\": {\n",
+        "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
+        "                }\n",
+        "        });\n",
+        "</script>"
+       ],
+       "metadata": {},
+       "output_type": "pyout",
+       "prompt_number": 1,
+       "text": [
+        "<IPython.core.display.HTML at 0x3edad30>"
+       ]
+      }
+     ],
+     "prompt_number": 1
+    }
+   ],
+   "metadata": {}
+  }
+ ]
+}
--- a/images/01_clustering.png
+++ b/images/01_clustering.png
--- a/images/01_robot.png
+++ b/images/01_robot.png
--- a/images/01_spam_filter.png
+++ b/images/01_spam_filter.png
--- a/images/01_supervised_learning.png
+++ b/images/01_supervised_learning.png
--- a/styles/custom.css
+++ b/styles/custom.css
@@ -0,0 +1,67 @@
+<style>
+    @font-face {
+        font-family: "Computer Modern";
+        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');
+    }
+    div.cell{
+        width: 90%;
+/*        margin-left:auto;*/
+/*        margin-right:auto;*/
+    }
+    ul {
+        line-height: 145%;
+        font-size: 90%;
+    }
+    li {
+        margin-bottom: 1em;
+    }
+    h1 {
+        font-family: Helvetica, serif;
+    }
+    h4{
+        margin-top: 12px;
+        margin-bottom: 3px;
+       }
+    div.text_cell_render{
+        font-family: Computer Modern, "Helvetica Neue", Arial, Helvetica, Geneva, sans-serif;
+        line-height: 145%;
+        font-size: 130%;
+        width: 90%;
+        margin-left:auto;
+        margin-right:auto;
+    }
+    .CodeMirror{
+            font-family: "Source Code Pro", source-code-pro,Consolas, monospace;
+    }
+/*    .prompt{
+        display: None;
+    }*/
+    .text_cell_render h5 {
+        font-weight: 300;
+        font-size: 16pt;
+        color: #4057A1;
+        font-style: italic;
+        margin-bottom: 0.5em;
+        margin-top: 0.5em;
+        display: block;
+    }
+
+    .warning{
+        color: rgb( 240, 20, 20 )
+        }
+</style>
+<script>
+    MathJax.Hub.Config({
+                        TeX: {
+                           extensions: ["AMSmath.js"]
+                           },
+                tex2jax: {
+                    inlineMath: [ ['$','$'], ["\\(","\\)"] ],
+                    displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
+                },
+                displayAlign: 'center', // 'center' centers equations; use 'left' or 'right' to align them to a side.
+                "HTML-CSS": {
+                    styles: {'.MathJax_Display': {"margin": 4}}
+                }
+        });
+</script>