update to scikit-learn 0.23.2 and Python 3.9.1

2021-03-02 08:53:12 -05:00 · 4e8af9d831
parent cec096b944
commit 4e8af9d831
11 changed files with 519 additions and 1229 deletions
--- a/01_machine_learning_intro.ipynb
+++ b/01_machine_learning_intro.ipynb
@@ -4,16 +4,16 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# What is machine learning, and how does it work? ([video #1](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1))\n",
+    "# What is Machine Learning, and how does it work? ([video #1](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1))\n",
    "\n",
-    "Created by [Data School](http://www.dataschool.io/). Watch all 9 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos)."
+    "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "![Machine learning](images/01_robot.png)"
+    "![Machine Learning](images/01_robot.png)"
   ]
  },
  {
@@ -22,19 +22,19 @@
   "source": [
    "## Agenda\n",
    "\n",
-    "- What is machine learning?\n",
-    "- What are the two main categories of machine learning?\n",
-    "- What are some examples of machine learning?\n",
-    "- How does machine learning \"work\"?"
+    "- What is Machine Learning?\n",
+    "- What are the two main categories of Machine Learning?\n",
+    "- What are some examples of Machine Learning?\n",
+    "- How does Machine Learning \"work\"?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## What is machine learning?\n",
+    "## What is Machine Learning?\n",
    "\n",
-    "One definition: \"Machine learning is the semi-automated extraction of knowledge from data\"\n",
+    "One definition: \"Machine Learning is the semi-automated extraction of knowledge from data\"\n",
    "\n",
    "- **Knowledge from data**: Starts with a question that might be answerable using data\n",
    "- **Automated extraction**: A computer provides the insight\n",
@@ -45,7 +45,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## What are the two main categories of machine learning?\n",
+    "## What are the two main categories of Machine Learning?\n",
    "\n",
    "**Supervised learning**: Making predictions using data\n",
    "    \n",
@@ -81,14 +81,14 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## How does machine learning \"work\"?\n",
+    "## How does Machine Learning \"work\"?\n",
    "\n",
    "High-level steps of supervised learning:\n",
    "\n",
-    "1. First, train a **machine learning model** using **labeled data**\n",
+    "1. First, train a **Machine Learning model** using **labeled data**\n",
    "\n",
    "    - \"Labeled data\" has been labeled with the outcome\n",
-    "    - \"Machine learning model\" learns the relationship between the attributes of the data and its outcome\n",
+    "    - \"Machine Learning model\" learns the relationship between the attributes of the data and its outcome\n",
    "\n",
    "2. Then, make **predictions** on **new data** for which the label is unknown"
   ]
@@ -111,7 +111,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Questions about machine learning\n",
+    "## Questions about Machine Learning\n",
    "\n",
    "- How do I choose **which attributes** of my data to include in the model?\n",
    "- How do I choose **which model** to use?\n",
@@ -126,7 +126,7 @@
   "source": [
    "## Resources\n",
    "\n",
-    "- Book: [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/) (section 2.1, 14 pages)\n",
+    "- Book: [An Introduction to Statistical Learning](https://www.statlearning.com/) (section 2.1, 14 pages)\n",
    "- Video: [Learning Paradigms](http://work.caltech.edu/library/014.html) (13 minutes)"
   ]
  },
@@ -137,102 +137,9 @@
    "## Comments or Questions?\n",
    "\n",
    "- Email: <kevin@dataschool.io>\n",
-    "- Website: http://dataschool.io\n",
+    "- Website: https://www.dataschool.io\n",
    "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<style>\n",
-       "    @font-face {\n",
-       "        font-family: \"Computer Modern\";\n",
-       "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
-       "    }\n",
-       "    div.cell{\n",
-       "        width: 90%;\n",
-       "/*        margin-left:auto;*/\n",
-       "/*        margin-right:auto;*/\n",
-       "    }\n",
-       "    ul {\n",
-       "        line-height: 145%;\n",
-       "        font-size: 90%;\n",
-       "    }\n",
-       "    li {\n",
-       "        margin-bottom: 1em;\n",
-       "    }\n",
-       "    h1 {\n",
-       "        font-family: Helvetica, serif;\n",
-       "    }\n",
-       "    h4{\n",
-       "        margin-top: 12px;\n",
-       "        margin-bottom: 3px;\n",
-       "       }\n",
-       "    div.text_cell_render{\n",
-       "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
-       "        line-height: 145%;\n",
-       "        font-size: 130%;\n",
-       "        width: 90%;\n",
-       "        margin-left:auto;\n",
-       "        margin-right:auto;\n",
-       "    }\n",
-       "    .CodeMirror{\n",
-       "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
-       "    }\n",
-       "/*    .prompt{\n",
-       "        display: None;\n",
-       "    }*/\n",
-       "    .text_cell_render h5 {\n",
-       "        font-weight: 300;\n",
-       "        font-size: 16pt;\n",
-       "        color: #4057A1;\n",
-       "        font-style: italic;\n",
-       "        margin-bottom: 0.5em;\n",
-       "        margin-top: 0.5em;\n",
-       "        display: block;\n",
-       "    }\n",
-       "\n",
-       "    .warning{\n",
-       "        color: rgb( 240, 20, 20 )\n",
-       "        }\n",
-       "</style>\n",
-       "<script>\n",
-       "    MathJax.Hub.Config({\n",
-       "                        TeX: {\n",
-       "                           extensions: [\"AMSmath.js\"]\n",
-       "                           },\n",
-       "                tex2jax: {\n",
-       "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
-       "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
-       "                },\n",
-       "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
-       "                \"HTML-CSS\": {\n",
-       "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
-       "                }\n",
-       "        });\n",
-       "</script>"
-      ],
-      "text/plain": [
-       "<IPython.core.display.HTML object>"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from IPython.core.display import HTML\n",
-    "def css_styling():\n",
-    "    styles = open(\"styles/custom.css\", \"r\").read()\n",
-    "    return HTML(styles)\n",
-    "css_styling()"
-   ]
  }
 ],
 "metadata": {
@@ -251,7 +158,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.6.5"
+   "version": "3.9.1"
  }
 },
 "nbformat": 4,
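Reviewer note on `01_machine_learning_intro.ipynb`: the two high-level steps of supervised learning listed in this notebook (train a model on labeled data, then make predictions on new data) can be sketched in a few lines of plain Python. This is a minimal illustration, not code from the notebooks or from scikit-learn: the `fit` and `predict` helpers, the 1-nearest-neighbor rule, and the toy measurements are all assumptions chosen for the example.

```python
import math

def fit(features, labels):
    """'Train' a 1-nearest-neighbor model by storing the labeled data."""
    if len(features) != len(labels):
        raise ValueError("each observation needs exactly one label")
    return list(zip(features, labels))

def predict(model, new_observation):
    """Return the label of the stored observation closest to the new one."""
    nearest = min(model, key=lambda pair: math.dist(pair[0], new_observation))
    return nearest[1]

# Step 1: labeled data (hypothetical petal length/width pairs with outcomes)
X = [(1.4, 0.2), (1.3, 0.2), (4.7, 1.4), (4.5, 1.5)]
y = ["setosa", "setosa", "versicolor", "versicolor"]
model = fit(X, y)

# Step 2: predict the label of new data whose outcome is unknown
prediction = predict(model, (1.5, 0.3))
print(prediction)
```

Storing the data is the whole "training" step for nearest neighbors, which keeps the sketch short; other models would learn parameters in `fit` instead.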
--- a/02_machine_learning_setup.ipynb
+++ b/02_machine_learning_setup.ipynb
@@ -4,9 +4,9 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Setting up Python for machine learning: scikit-learn and Jupyter Notebook ([video #2](https://www.youtube.com/watch?v=IsXXlYVBt1M&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=2))\n",
+    "# Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook ([video #2](https://www.youtube.com/watch?v=IsXXlYVBt1M&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=2))\n",
    "\n",
-    "Created by [Data School](http://www.dataschool.io/). Watch all 9 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n",
+    "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n",
    "\n",
    "**Note:** Since the video recording, the official name of the \"IPython Notebook\" was changed to \"Jupyter Notebook\". However, the functionality is the same."
   ]
@@ -38,7 +38,7 @@
    "\n",
    "### Benefits:\n",
    "\n",
-    "- **Consistent interface** to machine learning models\n",
+    "- **Consistent interface** to Machine Learning models\n",
    "- Provides many **tuning parameters** but with **sensible defaults**\n",
    "- Exceptional **documentation**\n",
    "- Rich set of functionality for **companion tasks**\n",
@@ -46,14 +46,14 @@
    "\n",
    "### Potential drawbacks:\n",
    "\n",
-    "- Harder (than R) to **get started with machine learning**\n",
+    "- Harder (than R) to **get started with Machine Learning**\n",
    "- Less emphasis (than R) on **model interpretability**\n",
    "\n",
    "### Further reading:\n",
    "\n",
-    "- Ben Lorica: [Six reasons why I recommend scikit-learn](http://radar.oreilly.com/2013/12/six-reasons-why-i-recommend-scikit-learn.html)\n",
-    "- scikit-learn authors: [API design for machine learning software](http://arxiv.org/pdf/1309.0238v1.pdf)\n",
-    "- Data School: [Should you teach Python or R for data science?](http://www.dataschool.io/python-or-r-for-data-science/)"
+    "- Ben Lorica: [Six reasons why I recommend scikit-learn](https://www.oreilly.com/content/six-reasons-why-i-recommend-scikit-learn/)\n",
+    "- scikit-learn authors: [API design for machine learning software](https://arxiv.org/pdf/1309.0238v1.pdf)\n",
+    "- Data School: [Should you teach Python or R for data science?](https://www.dataschool.io/python-or-r-for-data-science/)"
   ]
  },
  {
@@ -69,9 +69,9 @@
   "source": [
    "## Installing scikit-learn\n",
    "\n",
-    "**Option 1:** [Install scikit-learn library](http://scikit-learn.org/stable/install.html) and dependencies (NumPy and SciPy)\n",
+    "**Option 1:** [Install scikit-learn library](https://scikit-learn.org/stable/install.html) and dependencies (NumPy and SciPy)\n",
    "\n",
-    "**Option 2:** [Install Anaconda distribution](https://www.anaconda.com/download/) of Python, which includes:\n",
+    "**Option 2:** [Install Anaconda distribution](https://www.anaconda.com/products/individual) of Python, which includes:\n",
    "\n",
    "- Hundreds of useful packages (including scikit-learn)\n",
    "- IPython and Jupyter Notebook\n",
@@ -124,9 +124,9 @@
    "\n",
    "### IPython, Jupyter, and Markdown resources:\n",
    "\n",
-    "- [nbviewer](http://nbviewer.jupyter.org/): view notebooks online as static documents\n",
-    "- [IPython documentation](http://ipython.readthedocs.io/en/stable/)\n",
-    "- [Jupyter Notebook quickstart](http://jupyter.readthedocs.io/en/latest/content-quickstart.html)\n",
+    "- [nbviewer](https://nbviewer.jupyter.org/): view notebooks online as static documents\n",
+    "- [IPython documentation](https://ipython.readthedocs.io/en/stable/)\n",
+    "- [Jupyter Notebook quickstart](https://jupyter.readthedocs.io/en/latest/content-quickstart.html)\n",
    "- [GitHub's Mastering Markdown](https://guides.github.com/features/mastering-markdown/): short guide with lots of examples"
   ]
  },
@@ -149,102 +149,9 @@
    "## Comments or Questions?\n",
    "\n",
    "- Email: <kevin@dataschool.io>\n",
-    "- Website: http://dataschool.io\n",
+    "- Website: https://www.dataschool.io\n",
    "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<style>\n",
-       "    @font-face {\n",
-       "        font-family: \"Computer Modern\";\n",
-       "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
-       "    }\n",
-       "    div.cell{\n",
-       "        width: 90%;\n",
-       "/*        margin-left:auto;*/\n",
-       "/*        margin-right:auto;*/\n",
-       "    }\n",
-       "    ul {\n",
-       "        line-height: 145%;\n",
-       "        font-size: 90%;\n",
-       "    }\n",
-       "    li {\n",
-       "        margin-bottom: 1em;\n",
-       "    }\n",
-       "    h1 {\n",
-       "        font-family: Helvetica, serif;\n",
-       "    }\n",
-       "    h4{\n",
-       "        margin-top: 12px;\n",
-       "        margin-bottom: 3px;\n",
-       "       }\n",
-       "    div.text_cell_render{\n",
-       "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
-       "        line-height: 145%;\n",
-       "        font-size: 130%;\n",
-       "        width: 90%;\n",
-       "        margin-left:auto;\n",
-       "        margin-right:auto;\n",
-       "    }\n",
-       "    .CodeMirror{\n",
-       "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
-       "    }\n",
-       "/*    .prompt{\n",
-       "        display: None;\n",
-       "    }*/\n",
-       "    .text_cell_render h5 {\n",
-       "        font-weight: 300;\n",
-       "        font-size: 16pt;\n",
-       "        color: #4057A1;\n",
-       "        font-style: italic;\n",
-       "        margin-bottom: 0.5em;\n",
-       "        margin-top: 0.5em;\n",
-       "        display: block;\n",
-       "    }\n",
-       "\n",
-       "    .warning{\n",
-       "        color: rgb( 240, 20, 20 )\n",
-       "        }\n",
-       "</style>\n",
-       "<script>\n",
-       "    MathJax.Hub.Config({\n",
-       "                        TeX: {\n",
-       "                           extensions: [\"AMSmath.js\"]\n",
-       "                           },\n",
-       "                tex2jax: {\n",
-       "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
-       "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
-       "                },\n",
-       "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
-       "                \"HTML-CSS\": {\n",
-       "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
-       "                }\n",
-       "        });\n",
-       "</script>"
-      ],
-      "text/plain": [
-       "<IPython.core.display.HTML object>"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from IPython.core.display import HTML\n",
-    "def css_styling():\n",
-    "    styles = open(\"styles/custom.css\", \"r\").read()\n",
-    "    return HTML(styles)\n",
-    "css_styling()"
-   ]
  }
 ],
 "metadata": {
@@ -263,7 +170,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.6.5"
+   "version": "3.9.1"
  }
 },
 "nbformat": 4,
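Reviewer note on the version bump: the notebooks now state Python 3.9.1 and scikit-learn 0.23.2, so a reader may want to compare their own environment before running them. The sketch below uses only the standard library (`sys` and `importlib.metadata`, available from Python 3.8). The helper name `installed_version` is hypothetical, and returning `None` for a missing package is a choice made for this example.

```python
import sys
from importlib import metadata

def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None

# Format the interpreter version as major.minor.micro, e.g. "3.9.1"
python_version = ".".join(str(part) for part in sys.version_info[:3])
print("Python:", python_version)
print("scikit-learn:", installed_version("scikit-learn"))
```

Reading package metadata avoids importing scikit-learn itself, so the check still works when the library is missing.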
--- a/03_getting_started_with_iris.ipynb
+++ b/03_getting_started_with_iris.ipynb
@@ -6,9 +6,9 @@
   "source": [
    "# Getting started in scikit-learn with the famous iris dataset ([video #3](https://www.youtube.com/watch?v=hd1W4CyPX58&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=3))\n",
    "\n",
-    "Created by [Data School](http://www.dataschool.io/). Watch all 9 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n",
+    "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n",
    "\n",
-    "**Note:** This notebook uses Python 3.6 and scikit-learn 0.19.1. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16, and can be downloaded from the [archive branch](https://github.com/justmarkham/scikit-learn-videos/tree/archive)."
+    "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16."
   ]
  },
  {
@@ -17,9 +17,9 @@
   "source": [
    "## Agenda\n",
    "\n",
-    "- What is the famous iris dataset, and how does it relate to machine learning?\n",
+    "- What is the famous iris dataset, and how does it relate to Machine Learning?\n",
    "- How do we load the iris dataset into scikit-learn?\n",
-    "- How do we describe a dataset using machine learning terminology?\n",
+    "- How do we describe a dataset using Machine Learning terminology?\n",
    "- What are scikit-learn's four key requirements for working with data?"
   ]
  },
@@ -47,8 +47,19 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 1,
   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# added empty cell so that the cell numbering matches the video"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "scrolled": false
+   },
   "outputs": [
    {
     "data": {
@@ -57,14 +68,14 @@
       "        <iframe\n",
       "            width=\"300\"\n",
       "            height=\"200\"\n",
-       "            src=\"http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\"\n",
+       "            src=\"https://www.dataschool.io/files/iris.txt\"\n",
       "            frameborder=\"0\"\n",
       "            allowfullscreen\n",
       "        ></iframe>\n",
       "        "
      ],
      "text/plain": [
-       "<IPython.lib.display.IFrame at 0x10caa2470>"
+       "<IPython.lib.display.IFrame at 0x7fe408230e80>"
      ]
     },
     "execution_count": 2,
@@ -74,17 +85,17 @@
   ],
   "source": [
    "from IPython.display import IFrame\n",
-    "IFrame('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', width=300, height=200)"
+    "IFrame('https://www.dataschool.io/files/iris.txt', width=300, height=200)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Machine learning on the iris dataset\n",
+    "## Machine Learning on the iris dataset\n",
    "\n",
    "- Framed as a **supervised learning** problem: Predict the species of an iris using the measurements\n",
-    "- Famous dataset for machine learning because prediction is **easy**\n",
+    "- Famous dataset for Machine Learning because prediction is **easy**\n",
    "- Learn more about the iris dataset: [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets/Iris)"
   ]
  },
@@ -130,7 +141,9 @@
  {
   "cell_type": "code",
   "execution_count": 5,
-   "metadata": {},
+   "metadata": {
+    "scrolled": true
+   },
   "outputs": [
    {
     "name": "stdout",
@@ -170,10 +183,10 @@
      " [5.4 3.4 1.5 0.4]\n",
      " [5.2 4.1 1.5 0.1]\n",
      " [5.5 4.2 1.4 0.2]\n",
-      " [4.9 3.1 1.5 0.1]\n",
+      " [4.9 3.1 1.5 0.2]\n",
      " [5.  3.2 1.2 0.2]\n",
      " [5.5 3.5 1.3 0.2]\n",
-      " [4.9 3.1 1.5 0.1]\n",
+      " [4.9 3.6 1.4 0.1]\n",
      " [4.4 3.  1.3 0.2]\n",
      " [5.1 3.4 1.5 0.2]\n",
      " [5.  3.5 1.3 0.3]\n",
@@ -298,7 +311,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "## Machine learning terminology\n",
+    "## Machine Learning terminology\n",
    "\n",
    "- Each row is an **observation** (also known as: sample, example, instance, record)\n",
    "- Each column is a **feature** (also known as: predictor, attribute, independent variable, input, regressor, covariate)"
@@ -378,7 +391,7 @@
    "## Requirements for working with data in scikit-learn\n",
    "\n",
    "1. Features and response are **separate objects**\n",
-    "2. Features and response should be **numeric**\n",
+    "2. Features should always be **numeric**, and response should be **numeric** for regression problems\n",
    "3. Features and response should be **NumPy arrays**\n",
    "4. Features and response should have **specific shapes**"
   ]
@@ -458,9 +471,9 @@
   "source": [
    "## Resources\n",
    "\n",
-    "- scikit-learn documentation: [Dataset loading utilities](http://scikit-learn.org/stable/datasets/)\n",
+    "- scikit-learn documentation: [Dataset loading utilities](https://scikit-learn.org/stable/datasets.html)\n",
    "- Jake VanderPlas: Fast Numerical Computing with NumPy ([slides](https://speakerdeck.com/jakevdp/losing-your-loops-fast-numerical-computing-with-numpy-pycon-2015), [video](https://www.youtube.com/watch?v=EEUXKG97YRw))\n",
-    "- Scott Shell: [An Introduction to NumPy](http://www.engr.ucsb.edu/~shell/che210d/numpy.pdf) (PDF)"
+    "- Scott Shell: [An Introduction to NumPy](https://sites.engineering.ucsb.edu/~shell/che210d/numpy.pdf) (PDF)"
   ]
  },
  {
@@ -470,102 +483,9 @@
    "## Comments or Questions?\n",
    "\n",
    "- Email: <kevin@dataschool.io>\n",
-    "- Website: http://dataschool.io\n",
+    "- Website: https://www.dataschool.io\n",
    "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<style>\n",
-       "    @font-face {\n",
-       "        font-family: \"Computer Modern\";\n",
-       "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
-       "    }\n",
-       "    div.cell{\n",
-       "        width: 90%;\n",
-       "/*        margin-left:auto;*/\n",
-       "/*        margin-right:auto;*/\n",
-       "    }\n",
-       "    ul {\n",
-       "        line-height: 145%;\n",
-       "        font-size: 90%;\n",
-       "    }\n",
-       "    li {\n",
-       "        margin-bottom: 1em;\n",
-       "    }\n",
-       "    h1 {\n",
-       "        font-family: Helvetica, serif;\n",
-       "    }\n",
-       "    h4{\n",
-       "        margin-top: 12px;\n",
-       "        margin-bottom: 3px;\n",
-       "       }\n",
-       "    div.text_cell_render{\n",
-       "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
-       "        line-height: 145%;\n",
-       "        font-size: 130%;\n",
-       "        width: 90%;\n",
-       "        margin-left:auto;\n",
-       "        margin-right:auto;\n",
-       "    }\n",
-       "    .CodeMirror{\n",
-       "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
-       "    }\n",
-       "/*    .prompt{\n",
-       "        display: None;\n",
-       "    }*/\n",
-       "    .text_cell_render h5 {\n",
-       "        font-weight: 300;\n",
-       "        font-size: 16pt;\n",
-       "        color: #4057A1;\n",
-       "        font-style: italic;\n",
-       "        margin-bottom: 0.5em;\n",
-       "        margin-top: 0.5em;\n",
-       "        display: block;\n",
-       "    }\n",
-       "\n",
-       "    .warning{\n",
-       "        color: rgb( 240, 20, 20 )\n",
-       "        }\n",
-       "</style>\n",
-       "<script>\n",
-       "    MathJax.Hub.Config({\n",
-       "                        TeX: {\n",
-       "                           extensions: [\"AMSmath.js\"]\n",
-       "                           },\n",
-       "                tex2jax: {\n",
-       "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
-       "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
-       "                },\n",
-       "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
-       "                \"HTML-CSS\": {\n",
-       "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
-       "                }\n",
-       "        });\n",
-       "</script>"
-      ],
-      "text/plain": [
-       "<IPython.core.display.HTML object>"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from IPython.core.display import HTML\n",
-    "def css_styling():\n",
-    "    styles = open(\"styles/custom.css\", \"r\").read()\n",
-    "    return HTML(styles)\n",
-    "css_styling()"
-   ]
  }
 ],
 "metadata": {
@@ -584,7 +504,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.6.5"
+   "version": "3.9.1"
  }
 },
 "nbformat": 4,
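Reviewer note on `03_getting_started_with_iris.ipynb`: the four requirements for working with data in scikit-learn (separate objects, numeric features, NumPy arrays, specific shapes) can be checked mechanically. The sketch below assumes NumPy is installed. The function name `check_features_and_response` is hypothetical and not part of scikit-learn, and the integer labels are made up for illustration.

```python
import numpy as np

def check_features_and_response(X, y):
    """Raise ValueError unless X and y meet the shape and type requirements."""
    if not isinstance(X, np.ndarray) or not isinstance(y, np.ndarray):
        raise ValueError("X and y should both be NumPy arrays")
    if X.ndim != 2:
        raise ValueError("X should be 2-dimensional: (observations, features)")
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError("y should be 1-dimensional with one entry per observation")
    if not np.issubdtype(X.dtype, np.number):
        raise ValueError("features should be numeric")
    return X.shape, y.shape

# Three illustrative observations with four features each, stored separately
# from the response (the integer labels here are made up for the example)
X = np.array([[5.1, 3.4, 1.5, 0.2],
              [5.0, 3.5, 1.3, 0.3],
              [4.4, 3.0, 1.3, 0.2]])
y = np.array([0, 0, 0])

shapes = check_features_and_response(X, y)
print(shapes)
```

The check on `y` only enforces shape, in line with the revised wording that the response needs to be numeric for regression problems rather than in every case.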
--- a/04_model_training.ipynb
+++ b/04_model_training.ipynb
@@ -4,11 +4,11 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Training a machine learning model with scikit-learn ([video #4](https://www.youtube.com/watch?v=RlQuVL6-qe8&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=4))\n",
+    "# Training a Machine Learning model with scikit-learn ([video #4](https://www.youtube.com/watch?v=RlQuVL6-qe8&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=4))\n",
    "\n",
-    "Created by [Data School](http://www.dataschool.io/). Watch all 9 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n",
+    "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n",
    "\n",
-    "**Note:** This notebook uses Python 3.6 and scikit-learn 0.19.1. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16, and can be downloaded from the [archive branch](https://github.com/justmarkham/scikit-learn-videos/tree/archive)."
+    "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 2.7 and scikit-learn 0.16."
   ]
  },
  {
@@ -19,7 +19,7 @@
    "\n",
    "- What is the **K-nearest neighbors** classification model?\n",
    "- What are the four steps for **model training and prediction** in scikit-learn?\n",
-    "- How can I apply this pattern to **other machine learning models**?"
+    "- How can I apply this pattern to **other Machine Learning models**?"
   ]
  },
  {
@@ -29,6 +29,15 @@
    "## Reviewing the iris dataset"
   ]
  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# added empty cell so that the cell numbering matches the video"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": 2,
@@ -41,14 +50,14 @@
       "        <iframe\n",
       "            width=\"300\"\n",
       "            height=\"200\"\n",
-       "            src=\"http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\"\n",
+       "            src=\"https://www.dataschool.io/files/iris.txt\"\n",
       "            frameborder=\"0\"\n",
       "            allowfullscreen\n",
       "        ></iframe>\n",
       "        "
      ],
      "text/plain": [
-       "<IPython.lib.display.IFrame at 0x10fb4e4a8>"
+       "<IPython.lib.display.IFrame at 0x7f8c18558700>"
      ]
     },
     "execution_count": 2,
@@ -58,7 +67,7 @@
   ],
   "source": [
    "from IPython.display import IFrame\n",
-    "IFrame('http://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', width=300, height=200)"
+    "IFrame('https://www.dataschool.io/files/iris.txt', width=300, height=200)"
   ]
  },
  {
@@ -119,7 +128,7 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "*Image Credits: [Data3classes](http://commons.wikimedia.org/wiki/File:Data3classes.png#/media/File:Data3classes.png), [Map1NN](http://commons.wikimedia.org/wiki/File:Map1NN.png#/media/File:Map1NN.png), [Map5NN](http://commons.wikimedia.org/wiki/File:Map5NN.png#/media/File:Map5NN.png) by Agor153. Licensed under CC BY-SA 3.0*"
+    "*Image Credits: [Data3classes](https://commons.wikimedia.org/wiki/File:Data3classes.png#/media/File:Data3classes.png), [Map1NN](https://commons.wikimedia.org/wiki/File:Map1NN.png#/media/File:Map1NN.png), [Map5NN](https://commons.wikimedia.org/wiki/File:Map5NN.png#/media/File:Map5NN.png) by Agor153. Licensed under CC BY-SA 3.0*"
   ]
  },
  {
@@ -228,9 +237,7 @@
     "name": "stdout",
     "output_type": "stream",
     "text": [
-      "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
-      "           metric_params=None, n_jobs=1, n_neighbors=1, p=2,\n",
-      "           weights='uniform')\n"
+      "KNeighborsClassifier(n_neighbors=1)\n"
     ]
    }
   ],
@@ -256,9 +263,7 @@
    {
     "data": {
      "text/plain": [
-       "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
-       "           metric_params=None, n_jobs=1, n_neighbors=1, p=2,\n",
-       "           weights='uniform')"
+       "KNeighborsClassifier(n_neighbors=1)"
      ]
     },
     "execution_count": 8,
@@ -390,8 +395,8 @@
    "# import the class\n",
    "from sklearn.linear_model import LogisticRegression\n",
    "\n",
-    "# instantiate the model (using the default parameters)\n",
-    "logreg = LogisticRegression()\n",
+    "# instantiate the model\n",
+    "logreg = LogisticRegression(solver='liblinear')\n",
    "\n",
    "# fit the model with data\n",
    "logreg.fit(X, y)\n",
@@ -406,9 +411,9 @@
   "source": [
    "## Resources\n",
    "\n",
-    "- [Nearest Neighbors](http://scikit-learn.org/stable/modules/neighbors.html) (user guide), [KNeighborsClassifier](http://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) (class documentation)\n",
-    "- [Logistic Regression](http://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) (user guide), [LogisticRegression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) (class documentation)\n",
-    "- [Videos from An Introduction to Statistical Learning](http://www.dataschool.io/15-hours-of-expert-machine-learning-videos/)\n",
+    "- [Nearest Neighbors](https://scikit-learn.org/stable/modules/neighbors.html) (user guide), [KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) (class documentation)\n",
+    "- [Logistic Regression](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) (user guide), [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) (class documentation)\n",
+    "- [Videos from An Introduction to Statistical Learning](https://www.dataschool.io/15-hours-of-expert-machine-learning-videos/)\n",
    "    - Classification Problems and K-Nearest Neighbors (Chapter 2)\n",
    "    - Introduction to Classification (Chapter 4)\n",
    "    - Logistic Regression and Maximum Likelihood (Chapter 4)"
@@ -421,102 +426,9 @@
    "## Comments or Questions?\n",
    "\n",
    "- Email: <kevin@dataschool.io>\n",
-    "- Website: http://dataschool.io\n",
+    "- Website: https://www.dataschool.io\n",
    "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<style>\n",
-       "    @font-face {\n",
-       "        font-family: \"Computer Modern\";\n",
-       "        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');\n",
-       "    }\n",
-       "    div.cell{\n",
-       "        width: 90%;\n",
-       "/*        margin-left:auto;*/\n",
-       "/*        margin-right:auto;*/\n",
-       "    }\n",
-       "    ul {\n",
-       "        line-height: 145%;\n",
-       "        font-size: 90%;\n",
-       "    }\n",
-       "    li {\n",
-       "        margin-bottom: 1em;\n",
-       "    }\n",
-       "    h1 {\n",
-       "        font-family: Helvetica, serif;\n",
-       "    }\n",
-       "    h4{\n",
-       "        margin-top: 12px;\n",
-       "        margin-bottom: 3px;\n",
-       "       }\n",
-       "    div.text_cell_render{\n",
-       "        font-family: Computer Modern, \"Helvetica Neue\", Arial, Helvetica, Geneva, sans-serif;\n",
-       "        line-height: 145%;\n",
-       "        font-size: 130%;\n",
-       "        width: 90%;\n",
-       "        margin-left:auto;\n",
-       "        margin-right:auto;\n",
-       "    }\n",
-       "    .CodeMirror{\n",
-       "            font-family: \"Source Code Pro\", source-code-pro,Consolas, monospace;\n",
-       "    }\n",
-       "/*    .prompt{\n",
-       "        display: None;\n",
-       "    }*/\n",
-       "    .text_cell_render h5 {\n",
-       "        font-weight: 300;\n",
-       "        font-size: 16pt;\n",
-       "        color: #4057A1;\n",
-       "        font-style: italic;\n",
-       "        margin-bottom: 0.5em;\n",
-       "        margin-top: 0.5em;\n",
-       "        display: block;\n",
-       "    }\n",
-       "\n",
-       "    .warning{\n",
-       "        color: rgb( 240, 20, 20 )\n",
-       "        }\n",
-       "</style>\n",
-       "<script>\n",
-       "    MathJax.Hub.Config({\n",
-       "                        TeX: {\n",
-       "                           extensions: [\"AMSmath.js\"]\n",
-       "                           },\n",
-       "                tex2jax: {\n",
-       "                    inlineMath: [ ['$','$'], [\"\\\\(\",\"\\\\)\"] ],\n",
-       "                    displayMath: [ ['$$','$$'], [\"\\\\[\",\"\\\\]\"] ]\n",
-       "                },\n",
-       "                displayAlign: 'center', // Change this to 'center' to center equations.\n",
-       "                \"HTML-CSS\": {\n",
-       "                    styles: {'.MathJax_Display': {\"margin\": 4}}\n",
-       "                }\n",
-       "        });\n",
-       "</script>"
-      ],
-      "text/plain": [
-       "<IPython.core.display.HTML object>"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from IPython.core.display import HTML\n",
-    "def css_styling():\n",
-    "    styles = open(\"styles/custom.css\", \"r\").read()\n",
-    "    return HTML(styles)\n",
-    "css_styling()"
-   ]
  }
 ],
 "metadata": {
@ -535,7 +447,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.6.5"
+   "version": "3.9.1"
  }
 },
 "nbformat": 4,
--- a/05_model_evaluation.ipynb
+++ b/05_model_evaluation.ipynb
--- a/06_linear_regression.ipynb
+++ b/06_linear_regression.ipynb
--- a/07_cross_validation.ipynb
+++ b/07_cross_validation.ipynb
--- a/08_grid_search.ipynb
+++ b/08_grid_search.ipynb
--- a/09_classification_metrics.ipynb
+++ b/09_classification_metrics.ipynb
--- a/10_categorical_features.ipynb
+++ b/10_categorical_features.ipynb
@ -4,11 +4,11 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Encoding categorical features ([video #10](https://www.youtube.com/watch?v=irHhDMbw3xo&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10))\n",
+    "# Building a Machine Learning workflow ([video #10](https://www.youtube.com/watch?v=irHhDMbw3xo&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10))\n",
    "\n",
    "Created by [Data School](https://www.dataschool.io). Watch all 10 videos on [YouTube](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A). Download the notebooks from [GitHub](https://github.com/justmarkham/scikit-learn-videos).\n",
    "\n",
-    "**Note:** This notebook uses scikit-learn 0.20. Some of the code below will not work if you are using an earlier version of scikit-learn."
+    "**Note:** This notebook uses Python 3.9.1 and scikit-learn 0.23.2. The original notebook (shown in the video) used Python 3.7 and scikit-learn 0.20.2."
   ]
  },
  {
@ -297,33 +297,71 @@
   "metadata": {},
   "outputs": [
    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(889, 1)\n",
-      "(889,)\n"
-     ]
+     "data": {
+      "text/plain": [
+       "(889, 1)"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
    }
   ],
   "source": [
-    "print(X.shape)\n",
-    "print(y.shape)"
+    "X.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(889,)"
+      ]
+     },
+     "execution_count": 12,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
   "source": [
-    "from sklearn.linear_model import LogisticRegression\n",
-    "logreg = LogisticRegression(solver='lbfgs')"
+    "y.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.linear_model import LogisticRegression"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "logreg = LogisticRegression()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.model_selection import cross_val_score"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 16,
+   "metadata": {},
   "outputs": [
    {
     "data": {
@ -331,19 +369,18 @@
       "0.6783406335301212"
      ]
     },
-     "execution_count": 13,
+     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
-    "from sklearn.model_selection import cross_val_score\n",
    "cross_val_score(logreg, X, y, cv=5, scoring='accuracy').mean()"
   ]
  },
  {
   "cell_type": "code",
-   "execution_count": 14,
+   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
@ -354,7 +391,7 @@
       "Name: Survived, dtype: float64"
      ]
     },
-     "execution_count": 14,
+     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -372,7 +409,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 15,
+   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
@ -451,7 +488,7 @@
       "4         0       3    male        S"
      ]
     },
-     "execution_count": 15,
+     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -462,7 +499,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
@ -473,7 +510,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 17,
+   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
@ -488,7 +525,7 @@
       "       [0., 1.]])"
      ]
     },
-     "execution_count": 17,
+     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -499,7 +536,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 18,
+   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
@ -508,7 +545,7 @@
       "[array(['female', 'male'], dtype=object)]"
      ]
     },
-     "execution_count": 18,
+     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -519,7 +556,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 19,
+   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
@ -534,7 +571,7 @@
       "       [0., 1., 0.]])"
      ]
     },
-     "execution_count": 19,
+     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -545,7 +582,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 20,
+   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
@ -554,7 +591,7 @@
       "[array(['C', 'Q', 'S'], dtype=object)]"
      ]
     },
-     "execution_count": 20,
+     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -563,32 +600,6 @@
    "ohe.categories_"
   ]
  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "array([[0., 1., 0., 0., 1.],\n",
-       "       [1., 0., 1., 0., 0.],\n",
-       "       [1., 0., 0., 0., 1.],\n",
-       "       ...,\n",
-       "       [1., 0., 0., 0., 1.],\n",
-       "       [0., 1., 1., 0., 0.],\n",
-       "       [0., 1., 0., 1., 0.]])"
-      ]
-     },
-     "execution_count": 21,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "ohe.fit_transform(df[['Sex', 'Embarked']])"
-   ]
-  },
  {
   "cell_type": "markdown",
   "metadata": {},
@ -598,7 +609,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 22,
+   "execution_count": 24,
   "metadata": {},
   "outputs": [],
   "source": [
@ -607,7 +618,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 23,
+   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
@ -680,7 +691,7 @@
       "4       3    male        S"
      ]
     },
-     "execution_count": 23,
+     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -691,7 +702,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 24,
+   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
@ -701,7 +712,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 25,
+   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
@ -712,7 +723,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 26,
+   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
@ -727,7 +738,7 @@
       "       [0., 1., 0., 1., 0., 3.]])"
      ]
     },
-     "execution_count": 26,
+     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -738,7 +749,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 27,
+   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
@ -748,7 +759,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 28,
+   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
@ -757,7 +768,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 29,
+   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
@ -766,7 +777,7 @@
       "0.7727924839713071"
      ]
     },
-     "execution_count": 29,
+     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -786,7 +797,16 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 30,
+   "execution_count": 32,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# added empty cell so that the cell numbering matches the video"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
   "metadata": {
    "scrolled": true
   },
@ -861,7 +881,7 @@
       "790       3    male        Q"
      ]
     },
-     "execution_count": 30,
+     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -873,7 +893,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 31,
+   "execution_count": 34,
   "metadata": {
    "scrolled": true
   },
@ -881,15 +901,15 @@
    {
     "data": {
      "text/plain": [
-       "Pipeline(memory=None,\n",
-       "     steps=[('columntransformer', ColumnTransformer(n_jobs=None, remainder='passthrough', sparse_threshold=0.3,\n",
-       "         transformer_weights=None,\n",
-       "         transformers=[('onehotencoder', OneHotEncoder(categorical_features=None, categories=None,\n",
-       "       dtype=<class 'numpy.float64'>, handle_unknown='error...enalty='l2', random_state=None, solver='lbfgs',\n",
-       "          tol=0.0001, verbose=0, warm_start=False))])"
+       "Pipeline(steps=[('columntransformer',\n",
+       "                 ColumnTransformer(remainder='passthrough',\n",
+       "                                   transformers=[('onehotencoder',\n",
+       "                                                  OneHotEncoder(),\n",
+       "                                                  ['Sex', 'Embarked'])])),\n",
+       "                ('logisticregression', LogisticRegression())])"
      ]
     },
-     "execution_count": 31,
+     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -900,7 +920,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 32,
+   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
@ -909,7 +929,7 @@
       "array([1, 0, 1, 1, 0])"
      ]
     },
-     "execution_count": 32,
+     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
@ -927,7 +947,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 33,
+   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
@ -941,7 +961,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
@ -953,7 +973,7 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 35,
+   "execution_count": 38,
   "metadata": {},
   "outputs": [],
   "source": [
@ -965,67 +985,13 @@
  },
  {
   "cell_type": "code",
-   "execution_count": 36,
+   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "pipe = make_pipeline(column_trans, logreg)"
   ]
  },
-  {
-   "cell_type": "code",
-   "execution_count": 37,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0.7727924839713071"
-      ]
-     },
-     "execution_count": 37,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cross_val_score(pipe, X, y, cv=5, scoring='accuracy').mean()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 38,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "X_new = X.sample(5, random_state=99)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 39,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Pipeline(memory=None,\n",
-       "     steps=[('columntransformer', ColumnTransformer(n_jobs=None, remainder='passthrough', sparse_threshold=0.3,\n",
-       "         transformer_weights=None,\n",
-       "         transformers=[('onehotencoder', OneHotEncoder(categorical_features=None, categories=None,\n",
-       "       dtype=<class 'numpy.float64'>, handle_unknown='error...enalty='l2', random_state=None, solver='lbfgs',\n",
-       "          tol=0.0001, verbose=0, warm_start=False))])"
-      ]
-     },
-     "execution_count": 39,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "pipe.fit(X, y)"
-   ]
-  },
  {
   "cell_type": "code",
   "execution_count": 40,
@ -1034,7 +1000,7 @@
    {
     "data": {
      "text/plain": [
-       "array([1, 0, 1, 1, 0])"
+       "0.7727924839713071"
      ]
     },
     "execution_count": 40,
@ -1043,8 +1009,49 @@
    }
   ],
   "source": [
+    "cross_val_score(pipe, X, y, cv=5, scoring='accuracy').mean()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "X_new = X.sample(5, random_state=99)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 42,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([1, 0, 1, 1, 0])"
+      ]
+     },
+     "execution_count": 42,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "pipe.fit(X, y)\n",
    "pipe.predict(X_new)"
   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Comments or Questions?\n",
+    "\n",
+    "- Email: <kevin@dataschool.io>\n",
+    "- Website: https://www.dataschool.io\n",
+    "- Twitter: [@justmarkham](https://twitter.com/justmarkham)"
+   ]
  }
 ],
 "metadata": {
@ -1063,7 +1070,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.7.5"
+   "version": "3.9.1"
  }
 },
 "nbformat": 4,
--- a/README.md
+++ b/README.md
@ -1,41 +1,41 @@
-# Introduction to machine learning with scikit-learn
+# Introduction to Machine Learning with scikit-learn

-This video series will teach you how to solve machine learning problems using Python's popular scikit-learn library. There are **10 video tutorials** totaling 4.5 hours, each with a corresponding **Jupyter notebook**. The notebook contains everything you see in the video: code, output, images, and comments.
+This video series will teach you how to solve Machine Learning problems using Python's popular scikit-learn library. There are **10 video tutorials** totaling 4.5 hours, each with a corresponding **Jupyter notebook**. The notebook contains everything you see in the video: code, output, images, and comments.

-**Note:** The notebooks in this repository have been updated to use Python 3.6 and scikit-learn 0.19.1. The original notebooks (shown in the video) used Python 2.7 and scikit-learn 0.16, and can be downloaded from the [archive branch](https://github.com/justmarkham/scikit-learn-videos/tree/archive). You can read about how I updated the code in this [blog post](https://www.dataschool.io/how-to-update-your-scikit-learn-code-for-2018/).
+**Note:** The notebooks in this repository have been updated to use Python 3.9.1 and scikit-learn 0.23.2. The original notebooks (shown in the video) used Python 2.7 and scikit-learn 0.16, and can be downloaded from the [archive branch](https://github.com/justmarkham/scikit-learn-videos/tree/archive). You can read about how I updated the code in this [blog post](https://www.dataschool.io/how-to-update-your-scikit-learn-code-for-2018/).

 You can [watch the entire series](https://www.youtube.com/playlist?list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A) on YouTube, and [view all of the notebooks](http://nbviewer.jupyter.org/github/justmarkham/scikit-learn-videos/tree/master/) using nbviewer.

 [![Watch the first tutorial video](images/youtube.png)](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1 "Watch the first tutorial video")

-Once you complete this video series, I recommend enrolling in my online course, [Machine Learning with Text in Python](http://www.dataschool.io/learn/), to gain a deeper understanding of scikit-learn and Natural Language Processing.
+Once you complete this video series, I recommend enrolling in my online course, [Machine Learning with Text in Python](https://www.dataschool.io/learn/), to gain a deeper understanding of scikit-learn and Natural Language Processing.

 ## Table of Contents

-1. What is machine learning, and how does it work? ([video](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1), [notebook](01_machine_learning_intro.ipynb))
-    - What is machine learning?
-    - What are the two main categories of machine learning?
-    - What are some examples of machine learning?
-    - How does machine learning "work"?
+1. What is Machine Learning, and how does it work? ([video](https://www.youtube.com/watch?v=elojMnjn4kk&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=1), [notebook](01_machine_learning_intro.ipynb))
+    - What is Machine Learning?
+    - What are the two main categories of Machine Learning?
+    - What are some examples of Machine Learning?
+    - How does Machine Learning "work"?

-2. Setting up Python for machine learning: scikit-learn and Jupyter Notebook ([video](https://www.youtube.com/watch?v=IsXXlYVBt1M&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=2), [notebook](02_machine_learning_setup.ipynb))
+2. Setting up Python for Machine Learning: scikit-learn and Jupyter Notebook ([video](https://www.youtube.com/watch?v=IsXXlYVBt1M&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=2), [notebook](02_machine_learning_setup.ipynb))
    - What are the benefits and drawbacks of scikit-learn?
    - How do I install scikit-learn?
    - How do I use the Jupyter Notebook?
    - What are some good resources for learning Python?

 3. Getting started in scikit-learn with the famous iris dataset ([video](https://www.youtube.com/watch?v=hd1W4CyPX58&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=3), [notebook](03_getting_started_with_iris.ipynb))
-    - What is the famous iris dataset, and how does it relate to machine learning?
+    - What is the famous iris dataset, and how does it relate to Machine Learning?
    - How do we load the iris dataset into scikit-learn?
-    - How do we describe a dataset using machine learning terminology?
+    - How do we describe a dataset using Machine Learning terminology?
    - What are scikit-learn's four key requirements for working with data?

-4. Training a machine learning model with scikit-learn ([video](https://www.youtube.com/watch?v=RlQuVL6-qe8&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=4), [notebook](04_model_training.ipynb))
+4. Training a Machine Learning model with scikit-learn ([video](https://www.youtube.com/watch?v=RlQuVL6-qe8&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=4), [notebook](04_model_training.ipynb))
    - What is the K-nearest neighbors classification model?
    - What are the four steps for model training and prediction in scikit-learn?
-    - How can I apply this pattern to other machine learning models?
+    - How can I apply this pattern to other Machine Learning models?

-5. Comparing machine learning models in scikit-learn ([video](https://www.youtube.com/watch?v=0pP4EwWJgIU&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=5), [notebook](05_model_evaluation.ipynb))
+5. Comparing Machine Learning models in scikit-learn ([video](https://www.youtube.com/watch?v=0pP4EwWJgIU&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=5), [notebook](05_model_evaluation.ipynb))
    - How do I choose which model to use for my supervised learning task?
    - How do I choose the best tuning parameters for that model?
    - How do I estimate the likely performance of my model on out-of-sample data?
@ -70,7 +70,7 @@ Once you complete this video series, I recommend enrolling in my online course,
    - What is the purpose of an ROC curve?
    - How does Area Under the Curve (AUC) differ from classification accuracy?

-10. Encoding categorical features ([video](https://www.youtube.com/watch?v=irHhDMbw3xo&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10), [notebook](10_categorical_features.ipynb))
+10. Building a Machine Learning workflow ([video](https://www.youtube.com/watch?v=irHhDMbw3xo&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10), [notebook](10_categorical_features.ipynb))
    - Why should you use a Pipeline?
    - How do you encode categorical features with OneHotEncoder?
    - How do you apply OneHotEncoder to selected columns with ColumnTransformer?
@ -80,7 +80,7 @@ Once you complete this video series, I recommend enrolling in my online course,

 ## Bonus Video

-At the PyCon 2016 conference, I taught a **3-hour tutorial** that builds upon this video series and focuses on **text-based data**. You can watch the [tutorial video](https://www.youtube.com/watch?v=ZiKMIuYidY0&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=10) on YouTube.
+At the PyCon 2016 conference, I taught a **3-hour tutorial** that builds upon this video series and focuses on **text-based data**. You can watch the [tutorial video](https://www.youtube.com/watch?v=ZiKMIuYidY0&list=PL5-da3qGB5ICeMbQuqbbCOQWcS6OYBr5A&index=11) on YouTube.

 Here are the topics I covered: