add first notebook and supporting files

2015-04-08 00:49:52 -04:00 · b6513636d6
parent b600604bd6
commit b6513636d6
7 changed files with 317 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,2 @@
 .ipynb_checkpoints/
 *.pyc
--- a/01_machine_learning_intro.ipynb
+++ b/01_machine_learning_intro.ipynb
@@ -0,0 +1,248 @@
 {
 "metadata": {
  "name": "",
  "signature": "sha256:3a45466f81f7926609b8d5a7f9daaac6a202c78255a1369eb02391279866cba5"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "# What is machine learning, and how does it work?\n",
      "*From the video series: [Introduction to machine learning with scikit-learn](https://github.com/justmarkham/scikit-learn-videos)*"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Machine learning](images/01_robot.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Agenda\n",
      "\n",
      "- What is machine learning?\n",
      "- What are the two main categories of machine learning?\n",
      "- What are some examples of machine learning?\n",
      "- How does machine learning \"work\"?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## What is machine learning?\n",
      "\n",
      "One definition: \"Machine learning is the semi-automated extraction of knowledge from data\"\n",
      "\n",
      "- **Knowledge from data**: Starts with a question that might be answerable using data\n",
      "- **Automated extraction**: A computer provides the insight\n",
      "- **Semi-automated**: Requires many smart decisions by a human"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## What are the two main categories of machine learning?\n",
      "\n",
      "**Supervised learning**: Making predictions using data\n",
      "    \n",
      "- Example: Is a given email \"spam\" or \"ham\"?\n",
      "- There is an outcome we are trying to predict"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Spam filter](images/01_spam_filter.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "**Unsupervised learning**: Extracting structure from data\n",
      "\n",
      "- Example: Segment grocery store shoppers into clusters that exhibit similar behaviors\n",
      "- There is no \"right answer\""
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Clustering](images/01_clustering.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## How does machine learning \"work\"?\n",
      "\n",
      "High-level steps of supervised learning:\n",
      "\n",
      "1. First, train a **machine learning model** using **labeled data**\n",
      "\n",
      "    - \"Labeled data\" is data for which the outcome is already known\n",
      "    - The \"machine learning model\" learns the relationship between the attributes of the data and its outcome\n",
      "\n",
      "2. Then, make **predictions** on **new data** for which the label is unknown"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "![Supervised learning](images/01_supervised_learning.png)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The primary goal of supervised learning is to build a model that \"generalizes\": it accurately predicts the **future** (new, unseen data) rather than the **past** (the data it was trained on)!"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Questions about machine learning\n",
      "\n",
      "- How do I choose **which attributes** of my data to include in the model?\n",
      "- How do I choose **which model** to use?\n",
      "- How do I **optimize** this model for best performance?\n",
      "- How do I ensure that I'm building a model that will **generalize** to unseen data?\n",
      "- Can I **estimate** how well my model is likely to perform on unseen data?"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Resources\n",
      "\n",
      "- Book: [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/) (section 2.1, 14 pages)\n",
      "- Video: [Learning Paradigms](http://work.caltech.edu/library/014.html) (13 minutes)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "## Comments or Questions?\n",
      "\n",
      "- Email: <kevin@dataschool.io>\n",
      "- Website: http://dataschool.io\n",
      "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from IPython.display import HTML\n",
      "def css_styling():\n",
      "    with open(\"styles/custom.css\", \"r\") as f:  # file holds raw <style>/<script> markup\n",
      "        return HTML(f.read())\n",
      "css_styling()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "html": [
        "<style>\n",
        "    @font-face {\n",
        "        font-family: \"Computer Modern\";\n",
        "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
        "    }\n",
        "    div.cell{\n",
        "        width: 90%;\n",
        "/*        margin-left:auto;*/\n",
        "/*        margin-right:auto;*/\n",
        "    }\n",
        "    ul {\n",
        "        line-height: 145%;\n",
        "        font-size: 90%;\n",
        "    }\n",
        "    li {\n",
        "        margin-bottom: 1em;\n",
        "    }\n",
        "    h1 {\n",
        "        font-family: Helvetica, serif;\n",
        "    }\n",
        "    h4{\n",
        "        margin-top: 12px;\n",
        "        margin-bottom: 3px;\n",
        "       }\n",
        "    div.text_cell_render{\n",
        "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
        "        line-height: 145%;\n",
        "        font-size: 130%;\n",
        "        width: 90%;\n",
        "        margin-left:auto;\n",
        "        margin-right:auto;\n",
        "    }\n",
        "    .CodeMirror{\n",
        "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
        "    }\n",
        "/*    .prompt{\n",
        "        display: None;\n",
        "    }*/\n",
        "    .text_cell_render h5 {\n",
        "        font-weight: 300;\n",
        "        font-size: 16pt;\n",
        "        color: #4057A1;\n",
        "        font-style: italic;\n",
        "        margin-bottom: 0.5em;\n",
        "        margin-top: 0.5em;\n",
        "        display: block;\n",
        "    }\n",
        "\n",
        "    .warning{\n",
        "        color: rgb( 240, 20, 20 )\n",
        "        }\n",
        "</style>\n",
        "<script>\n",
        "    MathJax.Hub.Config({\n",
        "                        TeX: {\n",
        "                           extensions: [\"AMSmath.js\"]\n",
        "                           },\n",
        "                tex2jax: {\n",
        "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
        "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
        "                },\n",
        "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
        "                \"HTML-CSS\": {\n",
        "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
        "                }\n",
        "        });\n",
        "</script>"
       ],
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 1,
       "text": [
        "<IPython.core.display.HTML at 0x3edad30>"
       ]
      }
     ],
     "prompt_number": 1
    }
   ],
   "metadata": {}
  }
 ]
 }
--- a/images/01_clustering.png
+++ b/images/01_clustering.png
--- a/images/01_robot.png
+++ b/images/01_robot.png
--- a/images/01_spam_filter.png
+++ b/images/01_spam_filter.png
--- a/images/01_supervised_learning.png
+++ b/images/01_supervised_learning.png
--- a/styles/custom.css
+++ b/styles/custom.css
@@ -0,0 +1,67 @@
 <style>
    @font-face {
        font-family: "Computer Modern";
        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');
    }
    div.cell{
        width: 90%;
 /*        margin-left:auto;*/
 /*        margin-right:auto;*/
    }
    ul {
        line-height: 145%;
        font-size: 90%;
    }
    li {
        margin-bottom: 1em;
    }
    h1 {
        font-family: Helvetica, sans-serif;
    }
    h4{
        margin-top: 12px;
        margin-bottom: 3px;
       }
    div.text_cell_render{
        font-family: "Computer Modern", "Helvetica Neue", Arial, Helvetica, Geneva, sans-serif;
        line-height: 145%;
        font-size: 130%;
        width: 90%;
        margin-left:auto;
        margin-right:auto;
    }
    .CodeMirror{
            font-family: "Source Code Pro", source-code-pro,Consolas, monospace;
    }
 /*    .prompt{
        display: None;
    }*/
    .text_cell_render h5 {
        font-weight: 300;
        font-size: 16pt;
        color: #4057A1;
        font-style: italic;
        margin-bottom: 0.5em;
        margin-top: 0.5em;
        display: block;
    }
    .warning{
        color: rgb(240, 20, 20);
    }
 </style>
 <script>
    MathJax.Hub.Config({
                        TeX: {
                           extensions: ["AMSmath.js"]
                           },
                tex2jax: {
                    inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                    displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
                },
                displayAlign: 'center', // Equations are centered; change this to 'left' to left-align them.
                "HTML-CSS": {
                    styles: {'.MathJax_Display': {"margin": 4}}
                }
        });
 </script>