Add source material from second edition

master
Jake VanderPlas 2023-05-05 16:20:45 -07:00
parent 8a34a4f653
commit d66231454e
215 changed files with 91674 additions and 12480 deletions

View File

@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"| [Contents](Index.ipynb) | [IPython: Beyond Normal Python](01.00-IPython-Beyond-Normal-Python.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/00.00-Preface.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -37,27 +15,27 @@
"\n",
"This is a book about doing data science with Python, which immediately begs the question: what is *data science*?\n",
"It's a surprisingly hard definition to nail down, especially given how ubiquitous the term has become.\n",
"Vocal critics have variously dismissed the term as a superfluous label (after all, what science doesn't involve data?) or a simple buzzword that only exists to salt resumes and catch the eye of overzealous tech recruiters.\n",
"Vocal critics have variously dismissed it as a superfluous label (after all, what science doesn't involve data?) or a simple buzzword that only exists to salt resumes and catch the eye of overzealous tech recruiters.\n",
"\n",
"In my mind, these critiques miss something important.\n",
"Data science, despite its hype-laden veneer, is perhaps the best label we have for the cross-disciplinary set of skills that are becoming increasingly important in many applications across industry and academia.\n",
"This cross-disciplinary piece is key: in my mind, the best extisting definition of data science is illustrated by Drew Conway's Data Science Venn Diagram, first published on his blog in September 2010:"
"This *cross-disciplinary* piece is key: in my mind, the best existing definition of data science is illustrated by Drew Conway's Data Science Venn Diagram, first published on his blog in September 2010 (see the following figure)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Data Science Venn Diagram](figures/Data_Science_VD.png)\n",
"![Data Science Venn Diagram](images/Data_Science_VD.png)\n",
"\n",
"<small>(Source: [Drew Conway](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram). Used by permission.)</small>"
"<small>(source: [Drew Conway](http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram), used by permission)</small>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While some of the intersection labels are a bit tongue-in-cheek, this diagram captures the essence of what I think people mean when they say \"data science\": it is fundamentally an *interdisciplinary* subject.\n",
"While some of the intersection labels are a bit tongue-in-cheek, this diagram captures the essence of what I think people mean when they say \"data science\": it is fundamentally an interdisciplinary subject.\n",
"Data science comprises three distinct and overlapping areas: the skills of a *statistician* who knows how to model and summarize datasets (which are growing ever larger); the skills of a *computer scientist* who can design and use algorithms to efficiently store, process, and visualize this data; and the *domain expertise*—what we might think of as \"classical\" training in a subject—necessary both to formulate the right questions and to put their answers in context.\n",
"\n",
"With this in mind, I would encourage you to think of data science not as a new domain of knowledge to learn, but a new set of skills that you can apply within your current area of expertise.\n",
@ -65,18 +43,19 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Who Is This Book For?\n",
"\n",
"In my teaching both at the University of Washington and at various tech-focused conferences and meetups, one of the most common questions I have heard is this: \"how should I learn Python?\"\n",
"In my teaching both at the University of Washington and at various tech-focused conferences and meetups, one of the most common questions I have heard is this: \"How should I learn Python?\"\n",
"The people asking are generally technically minded students, developers, or researchers, often with an already strong background in writing code and using computational and numerical tools.\n",
"Most of these folks don't want to learn Python *per se*, but want to learn the language with the aim of using it as a tool for data-intensive and computational science.\n",
"Most of these folks don't want to learn Python per se, but want to learn the language with the aim of using it as a tool for data-intensive and computational science.\n",
"While a large patchwork of videos, blog posts, and tutorials for this audience is available online, I've long been frustrated by the lack of a single good answer to this question; that is what inspired this book.\n",
"\n",
"The book is not meant to be an introduction to Python or to programming in general; I assume the reader has familiarity with the Python language, including defining functions, assigning variables, calling methods of objects, controlling the flow of a program, and other basic tasks.\n",
"Instead it is meant to help Python users learn to use Python's data science stack—libraries such as IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related tools—to effectively store, manipulate, and gain insight from data."
"Instead, it is meant to help Python users learn to use Python's data science stack—libraries such as those mentioned in the following section, and related tools—to effectively store, manipulate, and gain insight from data."
]
},
{
@ -85,44 +64,31 @@
"source": [
"## Why Python?\n",
"\n",
"Python has emerged over the last couple decades as a first-class tool for scientific computing tasks, including the analysis and visualization of large datasets.\n",
"Python has emerged over the last couple of decades as a first-class tool for scientific computing tasks, including the analysis and visualization of large datasets.\n",
"This may have come as a surprise to early proponents of the Python language: the language itself was not specifically designed with data analysis or scientific computing in mind.\n",
"The usefulness of Python for data science stems primarily from the large and active ecosystem of third-party packages: *NumPy* for manipulation of homogeneous array-based data, *Pandas* for manipulation of heterogeneous and labeled data, *SciPy* for common scientific computing tasks, *Matplotlib* for publication-quality visualizations, *IPython* for interactive execution and sharing of code, *Scikit-Learn* for machine learning, and many more tools that will be mentioned in the following pages.\n",
"\n",
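"To make this concrete, here is an illustrative sketch of how two of these packages interoperate (the array values and column names here are invented for the example):\n",
"\n",
"```python\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# NumPy provides efficient homogeneous arrays...\n",
"arr = np.arange(6).reshape(2, 3)\n",
"\n",
"# ...and Pandas wraps them with labels for heterogeneous, columnar data\n",
"df = pd.DataFrame(arr, columns=[\"a\", \"b\", \"c\"])\n",
"print(df[\"b\"].sum())  # 1 + 4 = 5\n",
"```\n",
"\n",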
"If you are looking for a guide to the Python language itself, I would suggest the sister project to this book, \"[A Whirlwind Tour of the Python Language](https://github.com/jakevdp/WhirlwindTourOfPython)\".\n",
"If you are looking for a guide to the Python language itself, I would suggest the sister project to this book, [*A Whirlwind Tour of the Python Language*](https://www.oreilly.com/library/view/a-whirlwind-tour/9781492037859).\n",
"This short report provides a tour of the essential features of the Python language, aimed at data scientists who already are familiar with one or more other programming languages."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Python 2 vs Python 3\n",
"\n",
"This book uses the syntax of Python 3, which contains language enhancements that are not compatible with the 2.x series of Python.\n",
"Though Python 3.0 was first released in 2008, adoption has been relatively slow, particularly in the scientific and web development communities.\n",
"This is primarily because it took some time for many of the essential third-party packages and toolkits to be made compatible with the new language internals.\n",
"Since early 2014, however, stable releases of the most important tools in the data science ecosystem have been fully compatible with both Python 2 and 3, and so this book will use the newer Python 3 syntax.\n",
"However, the vast majority of code snippets in this book will also work without modification in Python 2: in cases where a Py2-incompatible syntax is used, I will make every effort to note it explicitly."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Outline of the Book\n",
"\n",
"Each chapter of this book focuses on a particular package or tool that contributes a fundamental piece of the Python Data Sciece story.\n",
"Each numbered part of this book focuses on a particular package or tool that contributes a fundamental piece of the Python data science story, and is broken into short self-contained chapters that each discuss a single concept:\n",
"\n",
"1. IPython and Jupyter: these packages provide the computational environment in which many Python-using data scientists work.\n",
"2. NumPy: this library provides the ``ndarray`` for efficient storage and manipulation of dense data arrays in Python.\n",
"3. Pandas: this library provides the ``DataFrame`` for efficient storage and manipulation of labeled/columnar data in Python.\n",
"4. Matplotlib: this library provides capabilities for a flexible range of data visualizations in Python.\n",
"5. Scikit-Learn: this library provides efficient & clean Python implementations of the most important and established machine learning algorithms.\n",
"- *Part I, Jupyter: Beyond Normal Python*, introduces IPython and Jupyter. These packages provide the computational environment in which many Python-using data scientists work.\n",
"- *Part II, Introduction to NumPy*, focuses on the NumPy library, which provides the `ndarray` for efficient storage and manipulation of dense data arrays in Python.\n",
"- *Part III, Data Manipulation with Pandas*, introduces the Pandas library, which provides the `DataFrame` for efficient storage and manipulation of labeled/columnar data in Python.\n",
"- *Part IV, Visualization with Matplotlib*, concentrates on Matplotlib, a library that provides capabilities for a flexible range of data visualizations in Python.\n",
"- *Part V, Machine Learning*, focuses on the Scikit-Learn library, which provides efficient and clean Python implementations of the most important and established machine learning algorithms.\n",
"\n",
"The PyData world is certainly much larger than these five packages, and is growing every day.\n",
"With this in mind, I make every attempt through these pages to provide references to other interesting efforts, projects, and packages that are pushing the boundaries of what can be done in Python.\n",
"Nevertheless, these five are currently fundamental to much of the work being done in the Python data science space, and I expect they will remain important even as the ecosystem continues growing around them."
"The PyData world is certainly much larger than these six packages, and is growing every day.\n",
"With this in mind, I make every attempt throughout this book to provide references to other interesting efforts, projects, and packages that are pushing the boundaries of what can be done in Python.\n",
"Nevertheless, the packages I concentrate on are currently fundamental to much of the work being done in the Python data science space, and I expect they will remain important even as the ecosystem continues growing around them."
]
},
{
@ -133,53 +99,47 @@
"\n",
"Supplemental material (code examples, figures, etc.) is available for download at http://github.com/jakevdp/PythonDataScienceHandbook/. This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.\n",
"\n",
"We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example:\n",
"We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: \"*Python Data Science Handbook*, 2nd edition, by Jake VanderPlas (O'Reilly). Copyright 2023 Jake VanderPlas, 978-1-098-12122-8.\"\n",
"\n",
"> *The Python Data Science Handbook* by Jake VanderPlas (O'Reilly). Copyright 2016 Jake VanderPlas, 978-1-491-91205-8.\n",
"\n",
"If you feel your use of code examples falls outside fair use or the per mission given above, feel free to contact us at permissions@oreilly.com."
"If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installation Considerations\n",
"\n",
"Installing Python and the suite of libraries that enable scientific computing is straightforward . This section will outline some of the considerations when setting up your computer.\n",
"Installing Python and the suite of libraries that enable scientific computing is straightforward. This section will outline some of the things to keep in mind when setting up your computer.\n",
"\n",
"Though there are various ways to install Python, the one I would suggest for use in data science is the Anaconda distribution, which works similarly whether you use Windows, Linux, or Mac OS X.\n",
"Though there are various ways to install Python, the one I would suggest for use in data science is the Anaconda distribution, which works similarly whether you use Windows, Linux, or macOS.\n",
"The Anaconda distribution comes in two flavors:\n",
"\n",
"- [Miniconda](http://conda.pydata.org/miniconda.html) gives you the Python interpreter itself, along with a command-line tool called ``conda`` which operates as a cross-platform package manager geared toward Python packages, similar in spirit to the apt or yum tools that Linux users might be familiar with.\n",
"- [Miniconda](http://conda.pydata.org/miniconda.html) gives you the Python interpreter itself, along with a command-line tool called *conda* which operates as a cross-platform package manager geared toward Python packages, similar in spirit to the apt or yum tools that Linux users might be familiar with.\n",
"\n",
"- [Anaconda](https://www.continuum.io/downloads) includes both Python and conda, and additionally bundles a suite of other pre-installed packages geared toward scientific computing. Because of the size of this bundle, expect the installation to consume several gigabytes of disk space.\n",
"- [Anaconda](https://www.continuum.io/downloads) includes both Python and conda, and additionally bundles a suite of other preinstalled packages geared toward scientific computing. Because of the size of this bundle, expect the installation to consume several gigabytes of disk space.\n",
"\n",
"Any of the packages included with Anaconda can also be installed manually on top of Miniconda; for this reason I suggest starting with Miniconda.\n",
"\n",
"To get started, download and install the Miniconda package—make sure to choose a version with Python 3—and then install the core packages used in this book:\n",
"To get started, download and install the Miniconda package—make sure to choose a version with Python 3—and then install the core packages used in this book:\n",
"\n",
"```\n",
"[~]$ conda install numpy pandas scikit-learn matplotlib seaborn jupyter\n",
"```\n",
"\n",
"Throughout the text, we will also make use of other more specialized tools in Python's scientific ecosystem; installation is usually as easy as typing **``conda install packagename``**.\n",
"For more information on conda, including information about creating and using conda environments (which I would *highly* recommend), refer to [conda's online documentation](http://conda.pydata.org/docs/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"| [Contents](Index.ipynb) | [IPython: Beyond Normal Python](01.00-IPython-Beyond-Normal-Python.ipynb) >\n",
"Throughout the text, we will also make use of other more specialized tools in Python's scientific ecosystem; installation is usually as easy as typing **`conda install packagename`**.\n",
"If you ever come across packages that are not available in the default conda channel, be sure to check out [*conda-forge*](https://conda-forge.org/), a broad, community-driven repository of conda packages.\n",
"\n",
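"As a hedged sketch of this workflow (the environment name *pdsh* is just an example, not from the text), creating and activating a dedicated environment for the book's packages looks like:\n",
"\n",
"```\n",
"[~]$ conda create -n pdsh python numpy pandas jupyter\n",
"[~]$ conda activate pdsh\n",
"```\n",
"\n",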
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/00.00-Preface.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"For more information on conda, including information about creating and using conda environments (which I would *highly* recommend), refer to [conda's online documentation](http://conda.pydata.org/docs/)."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
@ -195,9 +155,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

View File

@ -4,29 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Preface](00.00-Preface.ipynb) | [Contents](Index.ipynb) | [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.00-IPython-Beyond-Normal-Python.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# IPython: Beyond Normal Python"
"# Jupyter: Beyond Normal Python"
]
},
{
@ -34,101 +12,25 @@
"metadata": {},
"source": [
"There are many options for development environments for Python, and I'm often asked which one I use in my own work.\n",
"My answer sometimes surprises people: my preferred environment is [IPython](http://ipython.org/) plus a text editor (in my case, Emacs or Atom depending on my mood).\n",
"IPython (short for *Interactive Python*) was started in 2001 by Fernando Perez as an enhanced Python interpreter, and has since grown into a project aiming to provide, in Perez's words, \"Tools for the entire life cycle of research computing.\"\n",
"If Python is the engine of our data science task, you might think of IPython as the interactive control panel.\n",
"My answer sometimes surprises people: my preferred environment is [IPython](http://ipython.org/) plus a text editor (in my case, Emacs or VSCode depending on my mood).\n",
"Jupyter got its start as the IPython shell, which was created in 2001 by Fernando Perez as an enhanced Python interpreter and has since grown into a project aiming to provide, in Perez's words, \"Tools for the entire life cycle of research computing.\"\n",
"If Python is the engine of our data science task, you might think of Jupyter as the interactive control panel.\n",
"\n",
"As well as being a useful interactive interface to Python, IPython also provides a number of useful syntactic additions to the language; we'll cover the most useful of these additions here.\n",
"In addition, IPython is closely tied with the [Jupyter project](http://jupyter.org), which provides a browser-based notebook that is useful for development, collaboration, sharing, and even publication of data science results.\n",
"The IPython notebook is actually a special case of the broader Jupyter notebook structure, which encompasses notebooks for Julia, R, and other programming languages.\n",
"As an example of the usefulness of the notebook format, look no further than the page you are reading: the entire manuscript for this book was composed as a set of IPython notebooks.\n",
"As well as being a useful interactive interface to Python, Jupyter also provides a number of useful syntactic additions to the language; we'll cover the most useful of these additions here.\n",
"Perhaps the most familiar interface provided by the Jupyter project is the Jupyter Notebook, a browser-based environment that is useful for development, collaboration, sharing, and even publication of data science results.\n",
"As an example of the usefulness of the notebook format, look no further than the page you are reading: the entire manuscript for this book was composed as a set of Jupyter notebooks.\n",
"\n",
"IPython is about using Python effectively for interactive scientific and data-intensive computing.\n",
"This chapter will start by stepping through some of the IPython features that are useful to the practice of data science, focusing especially on the syntax it offers beyond the standard features of Python.\n",
"Next, we will go into a bit more depth on some of the more useful \"magic commands\" that can speed-up common tasks in creating and using data science code.\n",
"Finally, we will touch on some of the features of the notebook that make it useful in understanding data and sharing results."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Shell or Notebook?\n",
"\n",
"There are two primary means of using IPython that we'll discuss in this chapter: the IPython shell and the IPython notebook.\n",
"The bulk of the material in this chapter is relevant to both, and the examples will switch between them depending on what is most convenient.\n",
"In the few sections that are relevant to just one or the other, we will explicitly state that fact.\n",
"Before we start, some words on how to launch the IPython shell and IPython notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Launching the IPython Shell\n",
"\n",
"This chapter, like most of this book, is not designed to be absorbed passively.\n",
"I recommend that as you read through it, you follow along and experiment with the tools and syntax we cover: the muscle-memory you build through doing this will be far more useful than the simple act of reading about it.\n",
"Start by launching the IPython interpreter by typing **``ipython``** on the command-line; alternatively, if you've installed a distribution like Anaconda or EPD, there may be a launcher specific to your system (we'll discuss this more fully in [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)).\n",
"\n",
"Once you do this, you should see a prompt like the following:\n",
"```\n",
"IPython 4.0.1 -- An enhanced Interactive Python.\n",
"? -> Introduction and overview of IPython's features.\n",
"%quickref -> Quick reference.\n",
"help -> Python's own help system.\n",
"object? -> Details about 'object', use 'object??' for extra details.\n",
"In [1]:\n",
"```\n",
"With that, you're ready to follow along."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Launching the Jupyter Notebook\n",
"\n",
"The Jupyter notebook is a browser-based graphical interface to the IPython shell, and builds on it a rich set of dynamic display capabilities.\n",
"As well as executing Python/IPython statements, the notebook allows the user to include formatted text, static and dynamic visualizations, mathematical equations, JavaScript widgets, and much more.\n",
"Furthermore, these documents can be saved in a way that lets other people open them and execute the code on their own systems.\n",
"\n",
"Though the IPython notebook is viewed and edited through your web browser window, it must connect to a running Python process in order to execute code.\n",
"This process (known as a \"kernel\") can be started by running the following command in your system shell:\n",
"\n",
"```\n",
"$ jupyter notebook\n",
"```\n",
"\n",
"This command will launch a local web server that will be visible to your browser.\n",
"It immediately spits out a log showing what it is doing; that log will look something like this:\n",
"\n",
"```\n",
"$ jupyter notebook\n",
"[NotebookApp] Serving notebooks from local directory: /Users/jakevdp/PythonDataScienceHandbook\n",
"[NotebookApp] 0 active kernels \n",
"[NotebookApp] The IPython Notebook is running at: http://localhost:8888/\n",
"[NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).\n",
"```\n",
"\n",
"Upon issuing the command, your default browser should automatically open and navigate to the listed local URL;\n",
"the exact address will depend on your system.\n",
"If the browser does not open automatically, you can open a window and manually open this address (*http://localhost:8888/* in this example)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Preface](00.00-Preface.ipynb) | [Contents](Index.ipynb) | [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.00-IPython-Beyond-Normal-Python.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"This part of the book will start by stepping through some of the Jupyter and IPython features that are useful to the practice of data science, focusing especially on the syntax they offer beyond the standard features of Python.\n",
"Next, we will go into a bit more depth on some of the more useful *magic commands* that can speed up common tasks in creating and using data science code.\n",
"Finally, we will touch on some of the features of the notebook that make it useful for understanding data and sharing results."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
@ -144,9 +46,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

View File

@ -4,29 +4,76 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"# Getting Started in IPython and Jupyter\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
"In writing Python code for data science, I generally go between three modes of working: I use the IPython shell for trying out short sequences of commands, the Jupyter Notebook for longer interactive analysis and for sharing content with others, and interactive development environments (IDEs) like Emacs or VSCode for creating reusable Python packages.\n",
"This chapter focuses on the first two modes: the IPython shell and the Jupyter Notebook.\n",
"Use of an IDE for software development is an important third tool in the data scientist's repertoire, but we will not directly address that here."
]
},
{
"cell_type": "markdown",
"id": "7b582097",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [IPython: Beyond Normal Python](01.00-IPython-Beyond-Normal-Python.ipynb) | [Contents](Index.ipynb) | [Keyboard Shortcuts in the IPython Shell](01.02-Shell-Keyboard-Shortcuts.ipynb) >\n",
"## Launching the IPython Shell\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.01-Help-And-Documentation.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"The text in this part, like most of this book, is not designed to be absorbed passively.\n",
"I recommend that as you read through it, you follow along and experiment with the tools and syntax we cover: the muscle memory you build through doing this will be far more useful than the simple act of reading about it.\n",
"Start by launching the IPython interpreter by typing **`ipython`** on the command line; alternatively, if you've installed a distribution like Anaconda or EPD, there may be a launcher specific to your system (we'll discuss this more fully in [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)).\n",
"\n",
"Once you do this, you should see a prompt like the following:\n",
"\n",
"```ipython\n",
"Python 3.9.2 (v3.9.2:1a79785e3e, Feb 19 2021, 09:06:10) \n",
"Type 'copyright', 'credits' or 'license' for more information\n",
"IPython 7.21.0 -- An enhanced Interactive Python. Type '?' for help.\n",
"\n",
"In [1]:\n",
"```\n",
"With that, you're ready to follow along."
]
},
{
"cell_type": "markdown",
"id": "d1d2d0fb",
"metadata": {},
"source": [
"# Help and Documentation in IPython"
"## Launching the Jupyter Notebook\n",
"\n",
"The Jupyter Notebook is a browser-based graphical interface to the IPython shell, and builds on it a rich set of dynamic display capabilities.\n",
"As well as executing Python/IPython statements, notebooks allow the user to include formatted text, static and dynamic visualizations, mathematical equations, JavaScript widgets, and much more.\n",
"Furthermore, these documents can be saved in a way that lets other people open them and execute the code on their own systems.\n",
"\n",
"Though you'll view and edit Jupyter notebooks through your web browser window, they must connect to a running Python process in order to execute code.\n",
"You can start this process (known as a \"kernel\") by running the following command in your system shell:\n",
"\n",
"```\n",
"$ jupyter lab\n",
"```\n",
"\n",
"This command will launch a local web server that will be visible to your browser.\n",
"It immediately spits out a log showing what it is doing; that log will look something like this:\n",
"\n",
"```\n",
"$ jupyter lab\n",
"[ServerApp] Serving notebooks from local directory: /Users/jakevdp/PythonDataScienceHandbook\n",
"[ServerApp] Jupyter Server 1.4.1 is running at:\n",
"[ServerApp] http://localhost:8888/lab?token=dd852649\n",
"[ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).\n",
"```\n",
"\n",
"Upon issuing the command, your default browser should automatically open and navigate to the listed local URL;\n",
"the exact address will depend on your system.\n",
"If the browser does not open automatically, you can open a window and manually open this address (*http://localhost:8888/lab/* in this example)."
]
},
{
"cell_type": "markdown",
"id": "92286db8",
"metadata": {},
"source": [
"## Help and Documentation in IPython"
]
},
{
@ -35,60 +82,55 @@
"source": [
"If you read no other section in this chapter, read this one: I find the tools discussed here to be the most transformative contributions of IPython to my daily workflow.\n",
"\n",
"When a technologically-minded person is asked to help a friend, family member, or colleague with a computer problem, most of the time it's less a matter of knowing the answer as much as knowing how to quickly find an unknown answer.\n",
"In data science it's the same: searchable web resources such as online documentation, mailing-list threads, and StackOverflow answers contain a wealth of information, even (especially?) if it is a topic you've found yourself searching before.\n",
"When a technologically minded person is asked to help a friend, family member, or colleague with a computer problem, most of the time it's less a matter of knowing the answer than of knowing how to quickly find an unknown answer.\n",
"In data science it's the same: searchable web resources such as online documentation, mailing list threads, and Stack Overflow answers contain a wealth of information, even (especially?) about topics you've found yourself searching on before.\n",
"Being an effective practitioner of data science is less about memorizing the tool or command you should use for every possible situation, and more about learning to effectively find the information you don't know, whether through a web search engine or another means.\n",
"\n",
"One of the most useful functions of IPython/Jupyter is to shorten the gap between the user and the type of documentation and search that will help them do their work effectively.\n",
"While web searches still play a role in answering complicated questions, an amazing amount of information can be found through IPython alone.\n",
"Some examples of the questions IPython can help answer in a few keystrokes:\n",
"Some examples of the questions IPython can help answer in a few keystrokes include:\n",
"\n",
"- How do I call this function? What arguments and options does it have?\n",
"- What does the source code of this Python object look like?\n",
"- What is in this package I imported? What attributes or methods does this object have?\n",
"- What is in this package I imported? \n",
"- What attributes or methods does this object have?\n",
"\n",
"Here we'll discuss IPython's tools to quickly access this information, namely the ``?`` character to explore documentation, the ``??`` characters to explore source code, and the Tab key for auto-completion."
"Here we'll discuss the tools provided in the IPython shell and Jupyter Notebook to quickly access this information, namely the `?` character to explore documentation, the `??` characters to explore source code, and the Tab key for autocompletion."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Accessing Documentation with ``?``\n",
"### Accessing Documentation with ?\n",
"\n",
"The Python language and its data science ecosystem is built with the user in mind, and one big part of that is access to documentation.\n",
"Every Python object contains the reference to a string, known as a *doc string*, which in most cases will contain a concise summary of the object and how to use it.\n",
"Python has a built-in ``help()`` function that can access this information and prints the results.\n",
"For example, to see the documentation of the built-in ``len`` function, you can do the following:\n",
"The Python language and its data science ecosystem are built with the user in mind, and one big part of that is access to documentation.\n",
"Every Python object contains a reference to a string, known as a *docstring*, which in most cases will contain a concise summary of the object and how to use it.\n",
"Python has a built-in `help` function that can access this information and prints the results.\n",
"For example, to see the documentation of the built-in `len` function, you can do the following:\n",
"\n",
"```ipython\n",
"In [1]: help(len)\n",
"Help on built-in function len in module builtins:\n",
"\n",
"len(...)\n",
" len(object) -> integer\n",
" \n",
" Return the number of items of a sequence or mapping.\n",
"len(obj, /)\n",
" Return the number of items in a container.\n",
"```\n",
"\n",
"Depending on your interpreter, this information may be displayed as inline text, or in some separate pop-up window."
"Depending on your interpreter, this information may be displayed as inline text or in a separate pop-up window."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because finding help on an object is so common and useful, IPython introduces the ``?`` character as a shorthand for accessing this documentation and other relevant information:\n",
"Because finding help on an object is so common and useful, IPython and Jupyter introduce the `?` character as a shorthand for accessing this documentation and other relevant information:\n",
"\n",
"```ipython\n",
"In [2]: len?\n",
"Type: builtin_function_or_method\n",
"String form: <built-in function len>\n",
"Namespace: Python builtin\n",
"Docstring:\n",
"len(object) -> integer\n",
"\n",
"Return the number of items of a sequence or mapping.\n",
"Signature: len(obj, /)\n",
"Docstring: Return the number of items in a container.\n",
"Type: builtin_function_or_method\n",
"```"
]
},
@ -101,9 +143,9 @@
"```ipython\n",
"In [3]: L = [1, 2, 3]\n",
"In [4]: L.insert?\n",
"Type: builtin_function_or_method\n",
"String form: <built-in method insert of list object at 0x1024b8ea8>\n",
"Docstring: L.insert(index, object) -- insert object before index\n",
"Signature: L.insert(index, object, /)\n",
"Docstring: Insert object before index.\n",
"Type: builtin_function_or_method\n",
"```\n",
"\n",
"or even objects themselves, with the documentation from their type:\n",
@ -113,9 +155,11 @@
"Type: list\n",
"String form: [1, 2, 3]\n",
"Length: 3\n",
"Docstring:\n",
"list() -> new empty list\n",
"list(iterable) -> new list initialized from iterable's items\n",
"Docstring: \n",
"Built-in mutable sequence.\n",
"\n",
"If no argument is given, the constructor creates a new empty list.\n",
"The argument must be an iterable if specified.\n",
"```"
]
},
@ -134,21 +178,21 @@
"```\n",
"\n",
"Note that to create a docstring for our function, we simply placed a string literal in the first line.\n",
"Because doc strings are usually multiple lines, by convention we used Python's triple-quote notation for multi-line strings."
"Because docstrings are usually multiple lines, by convention we used Python's triple-quote notation for multiline strings."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we'll use the ``?`` mark to find this doc string:\n",
"Now we'll use the `?` to find this docstring:\n",
"\n",
"```ipython\n",
"In [7]: square?\n",
"Type: function\n",
"String form: <function square at 0x103713cb0>\n",
"Definition: square(a)\n",
"Docstring: Return the square of a.\n",
"Signature: square(a)\n",
"Docstring: Return the square of a.\n",
"File: <ipython-input-6>\n",
"Type: function\n",
"```\n",
"\n",
"This quick access to documentation via docstrings is one reason you should get in the habit of always adding such inline documentation to the code you write!"
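The help text that `?` displays is built from the object's `__doc__` attribute, which plain Python can read directly as well; a minimal sketch using the same `square` function:

```python
def square(a):
    """Return the square of a."""
    return a ** 2

# The ? help text comes from the object's __doc__ attribute:
print(square.__doc__)  # Return the square of a.
```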
@ -158,100 +202,101 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Accessing Source Code with ``??``\n",
"### Accessing Source Code with ??\n",
"\n",
"Because the Python language is so easily readable, another level of insight can usually be gained by reading the source code of the object you're curious about.\n",
"IPython provides a shortcut to the source code with the double question mark (``??``):\n",
"IPython and Jupyter provide a shortcut to the source code with the double question mark (`??`):\n",
"\n",
"```ipython\n",
"In [8]: square??\n",
"Type: function\n",
"String form: <function square at 0x103713cb0>\n",
"Definition: square(a)\n",
"Source:\n",
"Signature: square(a)\n",
"Source: \n",
"def square(a):\n",
" \"Return the square of a\"\n",
" \"\"\"Return the square of a.\"\"\"\n",
" return a ** 2\n",
"File: <ipython-input-6>\n",
"Type: function\n",
"```\n",
"\n",
"For simple functions like this, the double question-mark can give quick insight into the under-the-hood details."
"For simple functions like this, the double question mark can give quick insight into the under-the-hood details."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you play with this much, you'll notice that sometimes the ``??`` suffix doesn't display any source code: this is generally because the object in question is not implemented in Python, but in C or some other compiled extension language.\n",
"If this is the case, the ``??`` suffix gives the same output as the ``?`` suffix.\n",
"You'll find this particularly with many of Python's built-in objects and types, for example ``len`` from above:\n",
"If you play with this much, you'll notice that sometimes the `??` suffix doesn't display any source code: this is generally because the object in question is not implemented in Python, but in C or some other compiled extension language.\n",
"If this is the case, the `??` suffix gives the same output as the `?` suffix.\n",
"You'll find this particularly with many of Python's built-in objects and types, including the `len` function from earlier:\n",
"\n",
"```ipython\n",
"In [9]: len??\n",
"Type: builtin_function_or_method\n",
"String form: <built-in function len>\n",
"Namespace: Python builtin\n",
"Docstring:\n",
"len(object) -> integer\n",
"\n",
"Return the number of items of a sequence or mapping.\n",
"Signature: len(obj, /)\n",
"Docstring: Return the number of items in a container.\n",
"Type: builtin_function_or_method\n",
"```\n",
"\n",
"Using ``?`` and/or ``??`` gives a powerful and quick interface for finding information about what any Python function or module does."
"Using `?` and/or `??` is a powerful and quick way of finding information about what any Python function or module does."
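If you want this kind of introspection in a plain Python script, the standard library's `inspect` module provides roughly what `??` does behind the scenes; a small sketch, using `json.dumps` as a convenient pure-Python example:

```python
import inspect
import json

# ?? shows the source of pure-Python objects; the standard inspect
# module offers similar access (json.dumps is a pure-Python function):
print(inspect.getsource(json.dumps).splitlines()[0])

# C-implemented builtins such as len have no Python source to show,
# which is why ?? falls back to the plain ? output for them:
try:
    inspect.getsource(len)
except TypeError:
    print("len is implemented in C; no Python source available")
```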
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring Modules with Tab-Completion\n",
"### Exploring Modules with Tab Completion\n",
"\n",
"IPython's other useful interface is the use of the tab key for auto-completion and exploration of the contents of objects, modules, and name-spaces.\n",
"In the examples that follow, we'll use ``<TAB>`` to indicate when the Tab key should be pressed."
"Another useful interface is the use of the Tab key for autocompletion and exploration of the contents of objects, modules, and namespaces.\n",
"In the examples that follow, I'll use `<TAB>` to indicate when the Tab key should be pressed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tab-completion of object contents\n",
"#### Tab completion of object contents\n",
"\n",
"Every Python object has various attributes and methods associated with it.\n",
"Like with the ``help`` function discussed before, Python has a built-in ``dir`` function that returns a list of these, but the tab-completion interface is much easier to use in practice.\n",
"To see a list of all available attributes of an object, you can type the name of the object followed by a period (\"``.``\") character and the Tab key:\n",
"Like the `help` function mentioned earlier, Python has a built-in `dir` function that returns a list of these, but the tab-completion interface is much easier to use in practice.\n",
"To see a list of all available attributes of an object, you can type the name of the object followed by a period (\"`.`\") character and the Tab key:\n",
"\n",
"```ipython\n",
"In [10]: L.<TAB>\n",
"L.append L.copy L.extend L.insert L.remove L.sort \n",
"L.clear L.count L.index L.pop L.reverse \n",
" append() count insert reverse \n",
" clear extend pop sort \n",
" copy index remove \n",
"```\n",
"\n",
"To narrow-down the list, you can type the first character or several characters of the name, and the Tab key will find the matching attributes and methods:\n",
"To narrow down the list, you can type the first character or several characters of the name, and the Tab key will find the matching attributes and methods:\n",
"\n",
"```ipython\n",
"In [10]: L.c<TAB>\n",
"L.clear L.copy L.count \n",
" clear() count()\n",
" copy() \n",
"\n",
"In [10]: L.co<TAB>\n",
"L.copy L.count \n",
" copy() count()\n",
"```\n",
"\n",
"If there is only a single option, pressing the Tab key will complete the line for you.\n",
"For example, the following will instantly be replaced with ``L.count``:\n",
"For example, the following will instantly be replaced with `L.count`:\n",
"\n",
"```ipython\n",
"In [10]: L.cou<TAB>\n",
"\n",
"```\n",
"\n",
"Though Python has no strictly-enforced distinction between public/external attributes and private/internal attributes, by convention a preceding underscore is used to denote such methods.\n",
"Though Python has no strictly enforced distinction between public/external attributes and private/internal attributes, by convention a preceding underscore is used to denote the latter.\n",
"For clarity, these private methods and special methods are omitted from the list by default, but it's possible to list them by explicitly typing the underscore:\n",
"\n",
"```ipython\n",
"In [10]: L._<TAB>\n",
"L.__add__ L.__gt__ L.__reduce__\n",
"L.__class__ L.__hash__ L.__reduce_ex__\n",
" __add__ __delattr__ __eq__ \n",
" __class__ __delitem__ __format__()\n",
" __class_getitem__() __dir__() __ge__ >\n",
" __contains__ __doc__ __getattribute__ \n",
"```\n",
"\n",
"For brevity, we've only shown the first couple lines of the output.\n",
"For brevity, I've only shown the first few columns of the output.\n",
"Most of these are Python's special double-underscore methods (often nicknamed \"dunder\" methods)."
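Tab completion draws on the same information as the `dir` function mentioned above, so its behavior can be roughly emulated in plain Python (an illustrative sketch, not IPython's actual implementation):

```python
L = [1, 2, 3]

# Public attributes and methods, as shown by a bare L.<TAB>:
public = [name for name in dir(L) if not name.startswith('_')]
print(public)

# Double-underscore ("dunder") methods, surfaced by L._<TAB>:
dunders = [name for name in dir(L) if name.startswith('__')]
print(dunders[:4])
```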
]
},
@ -259,41 +304,43 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tab completion when importing\n",
"#### Tab completion when importing\n",
"\n",
"Tab completion is also useful when importing objects from packages.\n",
"Here we'll use it to find all possible imports in the ``itertools`` package that start with ``co``:\n",
"```\n",
"Here we'll use it to find all possible imports in the `itertools` package that start with `co`:\n",
"\n",
"```ipython\n",
"In [10]: from itertools import co<TAB>\n",
"combinations compress\n",
"combinations_with_replacement count\n",
" combinations() compress()\n",
" combinations_with_replacement() count()\n",
"```\n",
"\n",
"Similarly, you can use tab-completion to see which imports are available on your system (this will change depending on which third-party scripts and modules are visible to your Python session):\n",
"```\n",
"\n",
"```ipython\n",
"In [10]: import <TAB>\n",
"Display all 399 possibilities? (y or n)\n",
"Crypto dis py_compile\n",
"Cython distutils pyclbr\n",
"... ... ...\n",
"difflib pwd zmq\n",
" abc anyio \n",
" activate_this appdirs \n",
" aifc appnope >\n",
" antigravity argon2 \n",
"\n",
"In [10]: import h<TAB>\n",
"hashlib hmac http \n",
"heapq html husl \n",
"```\n",
"(Note that for brevity, I did not print here all 399 importable packages and modules on my system.)"
" hashlib html \n",
" heapq http \n",
" hmac \n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Beyond tab completion: wildcard matching\n",
"#### Beyond tab completion: Wildcard matching\n",
"\n",
"Tab completion is useful if you know the first few characters of the object or attribute you're looking for, but is little help if you'd like to match characters at the middle or end of the word.\n",
"For this use-case, IPython provides a means of wildcard matching for names using the ``*`` character.\n",
"Tab completion is useful if you know the first few characters of the name of the object or attribute you're looking for, but is little help if you'd like to match characters in the middle or at the end of the name.\n",
"For this use case, IPython and Jupyter provide a means of wildcard matching for names using the `*` character.\n",
"\n",
"For example, we can use this to list every object in the namespace that ends with ``Warning``:\n",
"For example, we can use this to list every object in the namespace whose name ends with `Warning`:\n",
"\n",
"```ipython\n",
"In [10]: *Warning?\n",
@ -305,35 +352,28 @@
"ResourceWarning\n",
"```\n",
"\n",
"Notice that the ``*`` character matches any string, including the empty string.\n",
"Notice that the `*` character matches any string, including the empty string.\n",
"\n",
"Similarly, suppose we are looking for a string method that contains the word ``find`` somewhere in its name.\n",
"Similarly, suppose we are looking for a string method that contains the word `find` somewhere in its name.\n",
"We can search for it this way:\n",
"\n",
"```ipython\n",
"In [10]: str.*find*?\n",
"In [11]: str.*find*?\n",
"str.find\n",
"str.rfind\n",
"```\n",
"\n",
"I find this type of flexible wildcard search can be very useful for finding a particular command when getting to know a new package or reacquainting myself with a familiar one."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [IPython: Beyond Normal Python](01.00-IPython-Beyond-Normal-Python.ipynb) | [Contents](Index.ipynb) | [Keyboard Shortcuts in the IPython Shell](01.02-Shell-Keyboard-Shortcuts.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.01-Help-And-Documentation.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"I find this type of flexible wildcard search can be useful for finding a particular command when getting to know a new package or reacquainting myself with a familiar one."
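Both wildcard searches above can be approximated in plain Python with `dir` and a comprehension (again, an illustrative sketch rather than IPython's actual implementation):

```python
import builtins

# IPython's wildcard queries, approximated with dir() and comprehensions.
# *Warning?  ->  builtin names ending with "Warning"
warning_names = [name for name in dir(builtins) if name.endswith('Warning')]
print(warning_names)

# str.*find*?  ->  str attributes with "find" anywhere in the name
print([name for name in dir(str) if 'find' in name])  # ['find', 'rfind']
```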
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3.9.6 64-bit ('3.9.6')",
"language": "python",
"name": "python3"
},
@ -347,9 +387,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.6"
},
"vscode": {
"interpreter": {
"hash": "513788764cd0ec0f97313d5418a13e1ea666d16d72f976a8acadce25a5af2ffc"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}


@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) | [Contents](Index.ipynb) | [IPython Magic Commands](01.03-Magic-Commands.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.02-Shell-Keyboard-Shortcuts.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,33 +11,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If you spend any amount of time on the computer, you've probably found a use for keyboard shortcuts in your workflow.\n",
"Most familiar perhaps are the Cmd-C and Cmd-V (or Ctrl-C and Ctrl-V) for copying and pasting in a wide variety of programs and systems.\n",
"Power-users tend to go even further: popular text editors like Emacs, Vim, and others provide users an incredible range of operations through intricate combinations of keystrokes.\n",
"If you spend any amount of time on a computer, you've probably found a use for keyboard shortcuts in your workflow.\n",
"Most familiar perhaps are Cmd-c and Cmd-v (or Ctrl-c and Ctrl-v), used for copying and pasting in a wide variety of programs and systems.\n",
"Power users tend to go even further: popular text editors like Emacs, Vim, and others provide users an incredible range of operations through intricate combinations of keystrokes.\n",
"\n",
"The IPython shell doesn't go this far, but does provide a number of keyboard shortcuts for fast navigation while typing commands.\n",
"These shortcuts are not in fact provided by IPython itself, but through its dependency on the GNU Readline library: as such, some of the following shortcuts may differ depending on your system configuration.\n",
"Also, while some of these shortcuts do work in the browser-based notebook, this section is primarily about shortcuts in the IPython shell.\n",
"While some of these shortcuts do work in the browser-based notebooks, this section is primarily about shortcuts in the IPython shell.\n",
"\n",
"Once you get accustomed to these, they can be very useful for quickly performing certain commands without moving your hands from the \"home\" keyboard position.\n",
"If you're an Emacs user or if you have experience with Linux-style shells, the following will be very familiar.\n",
"We'll group these shortcuts into a few categories: *navigation shortcuts*, *text entry shortcuts*, *command history shortcuts*, and *miscellaneous shortcuts*."
"I'll group these shortcuts into a few categories: *navigation shortcuts*, *text entry shortcuts*, *command history shortcuts*, and *miscellaneous shortcuts*."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Navigation shortcuts\n",
"## Navigation Shortcuts\n",
"\n",
"While the use of the left and right arrow keys to move backward and forward in the line is quite obvious, there are other options that don't require moving your hands from the \"home\" keyboard position:\n",
"\n",
"| Keystroke | Action |\n",
"|-----------------------------------|--------------------------------------------|\n",
"| ``Ctrl-a`` | Move cursor to the beginning of the line |\n",
"| ``Ctrl-e`` | Move cursor to the end of the line |\n",
"| ``Ctrl-b`` or the left arrow key | Move cursor back one character |\n",
"| ``Ctrl-f`` or the right arrow key | Move cursor forward one character |"
"| Keystroke | Action |\n",
"|---------------------------------|--------------------------------------------|\n",
"| Ctrl-a | Move cursor to beginning of line |\n",
"| Ctrl-e | Move cursor to end of the line |\n",
"| Ctrl-b or the left arrow key | Move cursor back one character |\n",
"| Ctrl-f or the right arrow key | Move cursor forward one character |"
]
},
{
@ -69,18 +46,17 @@
"## Text Entry Shortcuts\n",
"\n",
"While everyone is familiar with using the Backspace key to delete the previous character, reaching for the key often requires some minor finger gymnastics, and it only deletes a single character at a time.\n",
"In IPython there are several shortcuts for removing some portion of the text you're typing.\n",
"The most immediately useful of these are the commands to delete entire lines of text.\n",
"In IPython there are several shortcuts for removing some portion of the text you're typing; the most immediately useful of these are the commands to delete entire lines of text.\n",
"You'll know these have become second nature if you find yourself using a combination of Ctrl-b and Ctrl-d instead of reaching for Backspace to delete the previous character!\n",
"\n",
"| Keystroke | Action |\n",
"|-------------------------------|--------------------------------------------------|\n",
"| Backspace key | Delete previous character in line |\n",
"| ``Ctrl-d`` | Delete next character in line |\n",
"| ``Ctrl-k`` | Cut text from cursor to end of line |\n",
"| ``Ctrl-u`` | Cut text from beginning of line to cursor |\n",
"| ``Ctrl-y`` | Yank (i.e. paste) text that was previously cut |\n",
"| ``Ctrl-t`` | Transpose (i.e., switch) previous two characters |"
"| Keystroke | Action |\n",
"|-----------------------------|--------------------------------------------------|\n",
"| Backspace key | Delete previous character in line |\n",
"| Ctrl-d | Delete next character in line |\n",
"| Ctrl-k | Cut text from cursor to end of line |\n",
"| Ctrl-u | Cut text from beginning of line to cursor |\n",
"| Ctrl-y | Yank (i.e., paste) text that was previously cut |\n",
"| Ctrl-t | Transpose (i.e., switch) previous two characters |"
]
},
{
@ -91,21 +67,21 @@
"\n",
"Perhaps the most impactful shortcuts discussed here are the ones IPython provides for navigating the command history.\n",
"This command history goes beyond your current IPython session: your entire command history is stored in a SQLite database in your IPython profile directory.\n",
"The most straightforward way to access these is with the up and down arrow keys to step through the history, but other options exist as well:\n",
"The most straightforward way to access previous commands is by using the up and down arrow keys to step through the history, but other options exist as well:\n",
"\n",
"| Keystroke | Action |\n",
"|-------------------------------------|--------------------------------------------|\n",
"| ``Ctrl-p`` (or the up arrow key) | Access previous command in history |\n",
"| ``Ctrl-n`` (or the down arrow key) | Access next command in history |\n",
"| ``Ctrl-r`` | Reverse-search through command history |"
"| Keystroke | Action |\n",
"|-----------------------------------|--------------------------------------------|\n",
"| Ctrl-p (or the up arrow key) | Access previous command in history |\n",
"| Ctrl-n (or the down arrow key) | Access next command in history |\n",
"| Ctrl-r | Reverse-search through command history |"
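For the curious, that SQLite history database can also be inspected directly; the path and table layout below reflect what current IPython versions use by default, but both are implementation details that could change:

```python
import sqlite3
from pathlib import Path

# Default location of the history database for the "default" profile;
# this path and the table layout are IPython implementation details.
db = Path.home() / ".ipython" / "profile_default" / "history.sqlite"

if db.exists():
    con = sqlite3.connect(db)
    # The "history" table stores one row per input line: session, line, source.
    for session, line, source in con.execute(
        "SELECT session, line, source FROM history "
        "ORDER BY session DESC, line DESC LIMIT 5"
    ):
        print(f"[{session}:{line}] {source}")
    con.close()
else:
    print("no IPython history database found at", db)
```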
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The reverse-search can be particularly useful.\n",
"Recall that in the previous section we defined a function called ``square``.\n",
"The reverse-search option can be particularly useful.\n",
"Recall that earlier we defined a function called `square`.\n",
"Let's reverse-search our Python history from a new IPython shell and find this definition again.\n",
"When you press Ctrl-r in the IPython terminal, you'll see the following prompt:\n",
"\n",
@ -114,14 +90,14 @@
"(reverse-i-search)`': \n",
"```\n",
"\n",
"If you start typing characters at this prompt, IPython will auto-fill the most recent command, if any, that matches those characters:\n",
"If you start typing characters at this prompt, IPython will autofill the most recent command, if any, that matches those characters:\n",
"\n",
"```ipython\n",
"In [1]: \n",
"(reverse-i-search)`sqa': square??\n",
"```\n",
"\n",
"At any point, you can add more characters to refine the search, or press Ctrl-r again to search further for another command that matches the query. If you followed along in the previous section, pressing Ctrl-r twice more gives:\n",
"At any point, you can add more characters to refine the search, or press Ctrl-r again to search further for another command that matches the query. If you followed along earlier, pressing Ctrl-r twice more gives:\n",
"\n",
"```ipython\n",
"In [1]: \n",
@ -131,7 +107,7 @@
"```\n",
"\n",
"Once you have found the command you're looking for, press Return and the search will end.\n",
"We can then use the retrieved command, and carry-on with our session:\n",
"You can then use the retrieved command and carry on with your session:\n",
"\n",
"```ipython\n",
"In [1]: def square(a):\n",
@ -142,8 +118,8 @@
"Out[2]: 4\n",
"```\n",
"\n",
"Note that Ctrl-p/Ctrl-n or the up/down arrow keys can also be used to search through history, but only by matching characters at the beginning of the line.\n",
"That is, if you type **``def``** and then press Ctrl-p, it would find the most recent command (if any) in your history that begins with the characters ``def``."
"Note that you can use Ctrl-p/Ctrl-n or the up/down arrow keys to search through your history in a similar way, but only by matching characters at the beginning of the line.\n",
"That is, if you type **`def`** and then press Ctrl-p, it will find the most recent command (if any) in your history that begins with the characters `def`."
]
},
{
@ -154,36 +130,29 @@
"\n",
"Finally, there are a few miscellaneous shortcuts that don't fit into any of the preceding categories, but are nevertheless useful to know:\n",
"\n",
"| Keystroke | Action |\n",
"|-------------------------------|--------------------------------------------|\n",
"| ``Ctrl-l`` | Clear terminal screen |\n",
"| ``Ctrl-c`` | Interrupt current Python command |\n",
"| ``Ctrl-d`` | Exit IPython session |\n",
"| Keystroke | Action |\n",
"|-----------------------------|--------------------------------------------|\n",
"| Ctrl-l | Clear terminal screen |\n",
"| Ctrl-c | Interrupt current Python command |\n",
"| Ctrl-d | Exit IPython session |\n",
"\n",
"The Ctrl-c in particular can be useful when you inadvertently start a very long-running job."
"The Ctrl-c shortcut in particular can be useful when you inadvertently start a very long-running job."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"While some of the shortcuts discussed here may seem a bit tedious at first, they quickly become automatic with practice.\n",
"While some of the shortcuts discussed here may seem a bit obscure at first, they quickly become automatic with practice.\n",
"Once you develop that muscle memory, I suspect you will even find yourself wishing they were available in other contexts."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) | [Contents](Index.ipynb) | [IPython Magic Commands](01.03-Magic-Commands.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.02-Shell-Keyboard-Shortcuts.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
@ -199,9 +168,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}


@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Keyboard Shortcuts in the IPython Shell](01.02-Shell-Keyboard-Shortcuts.ipynb) | [Contents](Index.ipynb) | [Input and Output History](01.04-Input-Output-History.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.03-Magic-Commands.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,85 +11,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The previous two sections showed how IPython lets you use and explore Python efficiently and interactively.\n",
"The previous chapter showed how IPython lets you use and explore Python efficiently and interactively.\n",
"Here we'll begin discussing some of the enhancements that IPython adds on top of the normal Python syntax.\n",
"These are known in IPython as *magic commands*, and are prefixed by the ``%`` character.\n",
"These are known in IPython as *magic commands*, and are prefixed by the `%` character.\n",
"These magic commands are designed to succinctly solve various common problems in standard data analysis.\n",
"Magic commands come in two flavors: *line magics*, which are denoted by a single ``%`` prefix and operate on a single line of input, and *cell magics*, which are denoted by a double ``%%`` prefix and operate on multiple lines of input.\n",
"We'll demonstrate and discuss a few brief examples here, and come back to more focused discussion of several useful magic commands later in the chapter."
"Magic commands come in two flavors: *line magics*, which are denoted by a single `%` prefix and operate on a single line of input, and *cell magics*, which are denoted by a double `%%` prefix and operate on multiple lines of input.\n",
"I'll demonstrate and discuss a few brief examples here, and come back to a more focused discussion of several useful magic commands later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Pasting Code Blocks: ``%paste`` and ``%cpaste``\n",
"\n",
"When working in the IPython interpreter, one common gotcha is that pasting multi-line code blocks can lead to unexpected errors, especially when indentation and interpreter markers are involved.\n",
"A common case is that you find some example code on a website and want to paste it into your interpreter.\n",
"Consider the following simple function:\n",
"\n",
"``` python\n",
">>> def donothing(x):\n",
"... return x\n",
"\n",
"```\n",
"The code is formatted as it would appear in the Python interpreter, and if you copy and paste this directly into IPython you get an error:\n",
"\n",
"```ipython\n",
"In [2]: >>> def donothing(x):\n",
" ...: ... return x\n",
" ...: \n",
" File \"<ipython-input-20-5a66c8964687>\", line 2\n",
" ... return x\n",
" ^\n",
"SyntaxError: invalid syntax\n",
"```\n",
"\n",
"In the direct paste, the interpreter is confused by the additional prompt characters.\n",
"But never fear—IPython's ``%paste`` magic function is designed to handle this exact type of multi-line, marked-up input:\n",
"\n",
"```ipython\n",
"In [3]: %paste\n",
">>> def donothing(x):\n",
"... return x\n",
"\n",
"## -- End pasted text --\n",
"```\n",
"\n",
"The ``%paste`` command both enters and executes the code, so now the function is ready to be used:\n",
"\n",
"```ipython\n",
"In [4]: donothing(10)\n",
"Out[4]: 10\n",
"```\n",
"\n",
"A command with a similar intent is ``%cpaste``, which opens up an interactive multiline prompt in which you can paste one or more chunks of code to be executed in a batch:\n",
"\n",
"```ipython\n",
"In [5]: %cpaste\n",
"Pasting code; enter '--' alone on the line to stop or use Ctrl-D.\n",
":>>> def donothing(x):\n",
":... return x\n",
":--\n",
"```\n",
"\n",
"These magic commands, like others we'll see, make available functionality that would be difficult or impossible in a standard Python interpreter."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Running External Code: ``%run``\n",
"As you begin developing more extensive code, you will likely find yourself working in both IPython for interactive exploration, as well as a text editor to store code that you want to reuse.\n",
"## Running External Code: %run\n",
"As you begin developing more extensive code, you will likely find yourself working in IPython for interactive exploration, as well as a text editor to store code that you want to reuse.\n",
"Rather than running this code in a new window, it can be convenient to run it within your IPython session.\n",
"This can be done with the ``%run`` magic.\n",
"This can be done with the `%run` magic command.\n",
"\n",
"For example, imagine you've created a ``myscript.py`` file with the following contents:\n",
"For example, imagine you've created a *myscript.py* file with the following contents:\n",
"\n",
"```python\n",
"#-------------------------------------\n",
"# file: myscript.py\n",
"\n",
"def square(x):\n",
@ -119,7 +38,7 @@
" return x ** 2\n",
"\n",
"for N in range(1, 4):\n",
" print(N, \"squared is\", square(N))\n",
" print(f\"{N} squared is {square(N)}\")\n",
"```\n",
"\n",
"You can execute this from your IPython session as follows:\n",
@ -138,25 +57,25 @@
"Out[7]: 25\n",
"```\n",
"\n",
"There are several options to fine-tune how your code is run; you can see the documentation in the normal way, by typing **``%run?``** in the IPython interpreter."
"There are several options to fine-tune how your code is run; you can see the documentation in the normal way, by typing **`%run?`** in the IPython interpreter."
]
},
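Outside IPython, the closest standard-library analogue to `%run` is the `runpy` module, which executes a script and hands back its global namespace so the definitions remain usable afterward. A minimal sketch, recreating the *myscript.py* example from the text in a temporary file:

```python
import os
import runpy
import tempfile

# Write the myscript.py example from the text to a temporary file
code = (
    "def square(x):\n"
    "    return x ** 2\n"
    "\n"
    "for N in range(1, 4):\n"
    "    print(f'{N} squared is {square(N)}')\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(code)
    path = f.name

# runpy executes the file and returns its global namespace, so the
# functions it defines are accessible afterward, much as with %run
namespace = runpy.run_path(path)
print(namespace["square"](5))  # prints 25
os.remove(path)
```

This is only a rough stand-in: `%run` additionally supports options such as `-t` and `-d` and integrates with the interactive namespace directly.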
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Timing Code Execution: ``%timeit``\n",
"Another example of a useful magic function is ``%timeit``, which will automatically determine the execution time of the single-line Python statement that follows it.\n",
"## Timing Code Execution: %timeit\n",
"Another example of a useful magic function is `%timeit`, which will automatically determine the execution time of the single-line Python statement that follows it.\n",
"For example, we may want to check the performance of a list comprehension:\n",
"\n",
"```ipython\n",
"In [8]: %timeit L = [n ** 2 for n in range(1000)]\n",
"1000 loops, best of 3: 325 µs per loop\n",
"430 µs ± 3.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n",
"```\n",
"\n",
"The benefit of ``%timeit`` is that for short commands it will automatically perform multiple runs in order to attain more robust results.\n",
"For multi line statements, adding a second ``%`` sign will turn this into a cell magic that can handle multiple lines of input.\n",
"For example, here's the equivalent construction with a ``for``-loop:\n",
"The benefit of `%timeit` is that for short commands it will automatically perform multiple runs in order to attain more robust results.\n",
"For multiline statements, adding a second `%` sign will turn this into a cell magic that can handle multiple lines of input.\n",
"For example, here's the equivalent construction with a `for` loop:\n",
"\n",
"```ipython\n",
"In [9]: %%timeit\n",
@ -164,22 +83,22 @@
" ...: for n in range(1000):\n",
" ...: L.append(n ** 2)\n",
" ...: \n",
"1000 loops, best of 3: 373 µs per loop\n",
"484 µs ± 5.67 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n",
"```\n",
"\n",
"We can immediately see that list comprehensions are about 10% faster than the equivalent ``for``-loop construction in this case.\n",
"We'll explore ``%timeit`` and other approaches to timing and profiling code in [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb)."
"We can immediately see that list comprehensions are about 10% faster than the equivalent `for` loop construction in this case.\n",
"We'll explore `%timeit` and other approaches to timing and profiling code in [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb)."
]
},
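For reference, a comparable measurement can be made in plain Python with the standard library's `timeit` module, which `%timeit` builds on (adding conveniences like automatic loop counts and run statistics). A minimal sketch timing both constructions from the text:

```python
import timeit

# Time the list comprehension from the text
comp_time = timeit.timeit("[n ** 2 for n in range(1000)]", number=1000)

# Time the equivalent for-loop construction
loop_stmt = """
L = []
for n in range(1000):
    L.append(n ** 2)
"""
loop_time = timeit.timeit(loop_stmt, number=1000)

print(f"comprehension: {comp_time:.4f} s  loop: {loop_time:.4f} s")
```

The exact numbers will vary by machine, which is precisely why `%timeit`'s repeated runs and summary statistics are convenient.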
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Help on Magic Functions: ``?``, ``%magic``, and ``%lsmagic``\n",
"## Help on Magic Functions: ?, %magic, and %lsmagic\n",
"\n",
"Like normal Python functions, IPython magic functions have docstrings, and this useful\n",
"documentation can be accessed in the standard manner.\n",
"So, for example, to read the documentation of the ``%timeit`` magic simply type this:\n",
"So, for example, to read the documentation of the `%timeit` magic function, simply type this:\n",
"\n",
"```ipython\n",
"In [10]: %timeit?\n",
@ -199,24 +118,17 @@
"```\n",
"\n",
"Finally, I'll mention that it is quite straightforward to define your own magic functions if you wish.\n",
"We won't discuss it here, but if you are interested, see the references listed in [More IPython Resources](01.08-More-IPython-Resources.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Keyboard Shortcuts in the IPython Shell](01.02-Shell-Keyboard-Shortcuts.ipynb) | [Contents](Index.ipynb) | [Input and Output History](01.04-Input-Output-History.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.03-Magic-Commands.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"I won't discuss it here, but if you are interested, see the references listed in [More IPython Resources](01.08-More-IPython-Resources.ipynb)."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3.9.6 64-bit ('3.9.6')",
"language": "python",
"name": "python3"
},
@ -230,9 +142,14 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.6"
},
"vscode": {
"interpreter": {
"hash": "513788764cd0ec0f97313d5418a13e1ea666d16d72f976a8acadce25a5af2ffc"
}
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}


@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [IPython Magic Commands](01.03-Magic-Commands.ipynb) | [Contents](Index.ipynb) | [IPython and Shell Commands](01.05-IPython-And-Shell-Commands.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.04-Input-Output-History.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,8 +11,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Previously we saw that the IPython shell allows you to access previous commands with the up and down arrow keys, or equivalently the Ctrl-p/Ctrl-n shortcuts.\n",
"Additionally, in both the shell and the notebook, IPython exposes several ways to obtain the output of previous commands, as well as string versions of the commands themselves.\n",
"Previously you saw that the IPython shell allows you to access previous commands with the up and down arrow keys, or equivalently the Ctrl-p/Ctrl-n shortcuts.\n",
"Additionally, in both the shell and notebooks, IPython exposes several ways to obtain the output of previous commands, as well as string versions of the commands themselves.\n",
"We'll explore those here."
]
},
@ -42,11 +20,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## IPython's ``In`` and ``Out`` Objects\n",
"## IPython's In and Out Objects\n",
"\n",
"By now I imagine you're quite familiar with the ``In [1]:``/``Out[1]:`` style prompts used by IPython.\n",
"By now I imagine you're becoming familiar with the `In [1]:`/`Out[1]:` style of prompts used by IPython.\n",
"But it turns out that these are not just pretty decoration: they give a clue as to how you can access previous inputs and outputs in your current session.\n",
"Imagine you start a session that looks like this:\n",
"Suppose we start a session that looks like this:\n",
"\n",
"```ipython\n",
"In [1]: import math\n",
@ -63,15 +41,18 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We've imported the built-in ``math`` package, then computed the sine and the cosine of the number 2.\n",
"These inputs and outputs are displayed in the shell with ``In``/``Out`` labels, but there's more—IPython actually creates some Python variables called ``In`` and ``Out`` that are automatically updated to reflect this history:\n",
"We've imported the built-in `math` package, then computed the sine and the cosine of the number 2.\n",
"These inputs and outputs are displayed in the shell with `In`/`Out` labels, but there's more—IPython actually creates some Python variables called `In` and `Out` that are automatically updated to reflect this history:\n",
"\n",
"```ipython\n",
"In [4]: print(In)\n",
"['', 'import math', 'math.sin(2)', 'math.cos(2)', 'print(In)']\n",
"In [4]: In\n",
"Out[4]: ['', 'import math', 'math.sin(2)', 'math.cos(2)', 'In']\n",
"\n",
"In [5]: Out\n",
"Out[5]: {2: 0.9092974268256817, 3: -0.4161468365471424}\n",
"Out[5]:\n",
"{2: 0.9092974268256817,\n",
" 3: -0.4161468365471424,\n",
" 4: ['', 'import math', 'math.sin(2)', 'math.cos(2)', 'In', 'Out']}\n",
"```"
]
},
@ -79,33 +60,33 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``In`` object is a list, which keeps track of the commands in order (the first item in the list is a place-holder so that ``In[1]`` can refer to the first command):\n",
"The `In` object is a list, which keeps track of the commands in order (the first item in the list is a placeholder so that `In [1]` can refer to the first command):\n",
"\n",
"```ipython\n",
"In [6]: print(In[1])\n",
"import math\n",
"```\n",
"\n",
"The ``Out`` object is not a list but a dictionary mapping input numbers to their outputs (if any):\n",
"The `Out` object is not a list but a dictionary mapping input numbers to their outputs (if any):\n",
"\n",
"```ipython\n",
"In [7]: print(Out[2])\n",
"0.9092974268256817\n",
"```\n",
"\n",
"Note that not all operations have outputs: for example, ``import`` statements and ``print`` statements don't affect the output.\n",
"The latter may be surprising, but makes sense if you consider that ``print`` is a function that returns ``None``; for brevity, any command that returns ``None`` is not added to ``Out``.\n",
"Note that not all operations have outputs: for example, `import` statements and `print` statements don't affect the output.\n",
"The latter may be surprising, but makes sense if you consider that `print` is a function that returns `None`; for brevity, any command that returns `None` is not added to `Out`.\n",
"\n",
"Where this can be useful is if you want to interact with past results.\n",
"For example, let's check the sum of ``sin(2) ** 2`` and ``cos(2) ** 2`` using the previously-computed results:\n",
"For example, let's check the sum of `sin(2) ** 2` and `cos(2) ** 2` using the previously computed results:\n",
"\n",
"```ipython\n",
"In [8]: Out[2] ** 2 + Out[3] ** 2\n",
"Out[8]: 1.0\n",
"```\n",
"\n",
"The result is ``1.0`` as we'd expect from the well-known trigonometric identity.\n",
"In this case, using these previous results probably is not necessary, but it can become very handy if you execute a very expensive computation and want to reuse the result!"
"The result is `1.0`, as we'd expect from the well-known trigonometric identity.\n",
"In this case, using these previous results probably is not necessary, but it can become quite handy if you execute a very expensive computation and forget to assign the result to a variable."
]
},
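Two claims above are easy to check in plain Python: that `print` returns `None` (which is why its results never appear in `Out`), and that the trigonometric identity holds for the cached values:

```python
import math

# print writes to stdout but returns None, which is why IPython
# does not record its result in the Out dictionary
result = print("hello")
assert result is None

# The identity sin(2)**2 + cos(2)**2 == 1, checked with a tolerance
# because floating-point arithmetic may not land on 1.0 exactly
assert math.isclose(math.sin(2) ** 2 + math.cos(2) ** 2, 1.0)
```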
{
@ -114,7 +95,7 @@
"source": [
"## Underscore Shortcuts and Previous Outputs\n",
"\n",
"The standard Python shell contains just one simple shortcut for accessing previous output; the variable ``_`` (i.e., a single underscore) is kept updated with the previous output; this works in IPython as well:\n",
"The standard Python shell contains just one simple shortcut for accessing previous output: the variable `_` (i.e., a single underscore) is kept updated with the previous output. This works in IPython as well:\n",
"\n",
"```ipython\n",
"In [9]: print(_)\n",
@ -133,7 +114,7 @@
"\n",
"IPython stops there: more than three underscores starts to get a bit hard to count, and at that point it's easier to refer to the output by line number.\n",
"\n",
"There is one more shortcut we should mention, however—a shorthand for ``Out[X]`` is ``_X`` (i.e., a single underscore followed by the line number):\n",
"There is one more shortcut I should mention, however—a shorthand for `Out[X]` is `_X` (i.e., a single underscore followed by the line number):\n",
"\n",
"```ipython\n",
"In [12]: Out[2]\n",
@ -150,14 +131,14 @@
"source": [
"## Suppressing Output\n",
"Sometimes you might wish to suppress the output of a statement (this is perhaps most common with the plotting commands that we'll explore in [Introduction to Matplotlib](04.00-Introduction-To-Matplotlib.ipynb)).\n",
"Or maybe the command you're executing produces a result that you'd prefer not like to store in your output history, perhaps so that it can be deallocated when other references are removed.\n",
"Or maybe the command you're executing produces a result that you'd prefer not to store in your output history, perhaps so that it can be deallocated when other references are removed.\n",
"The easiest way to suppress the output of a command is to add a semicolon to the end of the line:\n",
"\n",
"```ipython\n",
"In [14]: math.sin(2) + math.cos(2);\n",
"```\n",
"\n",
"Note that the result is computed silently, and the output is neither displayed on the screen or stored in the ``Out`` dictionary:\n",
"The result is computed silently, and the output is neither displayed on the screen nor stored in the `Out` dictionary:\n",
"\n",
"```ipython\n",
"In [15]: 14 in Out\n",
@ -170,35 +151,26 @@
"metadata": {},
"source": [
"## Related Magic Commands\n",
"For accessing a batch of previous inputs at once, the ``%history`` magic command is very helpful.\n",
"For accessing a batch of previous inputs at once, the `%history` magic command is very helpful.\n",
"Here is how you can print the first three inputs:\n",
"\n",
"```ipython\n",
"In [16]: %history -n 1-4\n",
"In [16]: %history -n 1-3\n",
" 1: import math\n",
" 2: math.sin(2)\n",
" 3: math.cos(2)\n",
" 4: print(In)\n",
"```\n",
"\n",
"As usual, you can type ``%history?`` for more information and a description of options available.\n",
"Other similar magic commands are ``%rerun`` (which will re-execute some portion of the command history) and ``%save`` (which saves some set of the command history to a file).\n",
"For more information, I suggest exploring these using the ``?`` help functionality discussed in [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [IPython Magic Commands](01.03-Magic-Commands.ipynb) | [Contents](Index.ipynb) | [IPython and Shell Commands](01.05-IPython-And-Shell-Commands.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.04-Input-Output-History.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"As usual, you can type `%history?` for more information and a description of options available (see [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) for details on the `?` functionality).\n",
"Other useful magic commands are `%rerun`, which will re-execute some portion of the command history, and `%save`, which saves some set of the command history to a file."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
@ -214,9 +186,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}


@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Input and Output History](01.04-Input-Output-History.ipynb) | [Contents](Index.ipynb) | [Errors and Debugging](01.06-Errors-and-Debugging.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.05-IPython-And-Shell-Commands.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -35,11 +13,11 @@
"source": [
"When working interactively with the standard Python interpreter, one of the frustrations is the need to switch between multiple windows to access Python tools and system command-line tools.\n",
"IPython bridges this gap, and gives you a syntax for executing shell commands directly from within the IPython terminal.\n",
"The magic happens with the exclamation point: anything appearing after ``!`` on a line will be executed not by the Python kernel, but by the system command-line.\n",
"The magic happens with the exclamation point: anything appearing after `!` on a line will be executed not by the Python kernel, but by the system command line.\n",
"\n",
"The following assumes you're on a Unix-like system, such as Linux or Mac OSX.\n",
"Some of the examples that follow will fail on Windows, which uses a different type of shell by default (though with the 2016 announcement of native Bash shells on Windows, soon this may no longer be an issue!).\n",
"If you're unfamiliar with shell commands, I'd suggest reviewing the [Shell Tutorial](http://swcarpentry.github.io/shell-novice/) put together by the always excellent Software Carpentry Foundation."
"The following discussion assumes you're on a Unix-like system, such as Linux or macOS.\n",
"Some of the examples that follow will fail on Windows, which uses a different type of shell by default, though if you use the *Windows Subsystem for Linux* the examples here should run correctly.\n",
"If you're unfamiliar with shell commands, I'd suggest reviewing the [Unix shell tutorial](http://swcarpentry.github.io/shell-novice/) put together by the always excellent Software Carpentry Foundation."
]
},
{
@ -48,24 +26,24 @@
"source": [
"## Quick Introduction to the Shell\n",
"\n",
"A full intro to using the shell/terminal/command-line is well beyond the scope of this chapter, but for the uninitiated we will offer a quick introduction here.\n",
"A full introduction to using the shell/terminal/command line is well beyond the scope of this chapter, but for the uninitiated I will offer a quick introduction here.\n",
"The shell is a way to interact textually with your computer.\n",
"Ever since the mid 1980s, when Microsoft and Apple introduced the first versions of their now ubiquitous graphical operating systems, most computer users have interacted with their operating system through familiar clicking of menus and drag-and-drop movements.\n",
"Ever since the mid-1980s, when Microsoft and Apple introduced the first versions of their now ubiquitous graphical operating systems, most computer users have interacted with their operating systems through the familiar menu selections and drag-and-drop movements.\n",
"But operating systems existed long before these graphical user interfaces, and were primarily controlled through sequences of text input: at the prompt, the user would type a command, and the computer would do what the user told it to.\n",
"Those early prompt systems are the precursors of the shells and terminals that most active data scientists still use today.\n",
"Those early prompt systems were the precursors of the shells and terminals that most data scientists still use today.\n",
"\n",
"Someone unfamiliar with the shell might ask why you would bother with this, when many results can be accomplished by simply clicking on icons and menus.\n",
"A shell user might reply with another question: why hunt icons and click menus when you can accomplish things much more easily by typing?\n",
"While it might sound like a typical tech preference impasse, when moving beyond basic tasks it quickly becomes clear that the shell offers much more control of advanced tasks, though admittedly the learning curve can intimidate the average computer user.\n",
"Someone unfamiliar with the shell might ask why you would bother with this, when many of the same results can be accomplished by simply clicking on icons and menus.\n",
"A shell user might reply with another question: why hunt for icons and menu items when you can accomplish things much more easily by typing?\n",
"While it might sound like a typical tech preference impasse, when moving beyond basic tasks it quickly becomes clear that the shell offers much more control of advanced tasks—though admittedly the learning curve can be intimidating.\n",
"\n",
"As an example, here is a sample of a Linux/OSX shell session where a user explores, creates, and modifies directories and files on their system (``osx:~ $`` is the prompt, and everything after the ``$`` sign is the typed command; text that is preceded by a ``#`` is meant just as description, rather than something you would actually type in):\n",
"As an example, here is a sample of a Linux/macOS shell session where a user explores, creates, and modifies directories and files on their system (`osx:~ $` is the prompt, and everything after the `$` is the typed command; text that is preceded by a `#` is meant just as description, rather than something you would actually type in):\n",
"\n",
"```bash\n",
"osx:~ $ echo \"hello world\" # echo is like Python's print function\n",
"hello world\n",
"\n",
"osx:~ $ pwd # pwd = print working directory\n",
"/home/jake # this is the \"path\" that we're sitting in\n",
"/home/jake # This is the \"path\" that we're sitting in\n",
"\n",
"osx:~ $ ls # ls = list working directory contents\n",
"notebooks projects \n",
@ -84,14 +62,13 @@
"\n",
"osx:myproject $ mv ../myproject.txt ./ # mv = move file. Here we're moving the\n",
" # file myproject.txt from one directory\n",
" # up (../) to the current directory (./)\n",
" # up (../) to the current directory (./).\n",
"osx:myproject $ ls\n",
"myproject.txt\n",
"```\n",
"\n",
"Notice that all of this is just a compact way to do familiar operations (navigating a directory structure, creating a directory, moving a file, etc.) by typing commands rather than clicking icons and menus.\n",
"Note that with just a few commands (``pwd``, ``ls``, ``cd``, ``mkdir``, and ``cp``) you can do many of the most common file operations.\n",
"It's when you go beyond these basics that the shell approach becomes really powerful."
"With just a few commands (`pwd`, `ls`, `cd`, `mkdir`, and `cp`) you can do many of the most common file operations, but it's when you go beyond these basics that the shell approach becomes really powerful."
]
},
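Outside IPython, the closest plain-Python analogue to running shell commands is the standard library's `subprocess` module. A minimal sketch, loosely comparable to what the `!` syntax provides interactively:

```python
import subprocess

# Run a shell-style command from Python and capture its output.
# check=True raises CalledProcessError on a nonzero exit status,
# and text=True decodes stdout/stderr as strings rather than bytes.
result = subprocess.run(
    ["echo", "hello world"], capture_output=True, text=True, check=True
)
print(result.stdout, end="")  # hello world
```

This assumes a Unix-like system where `echo` is available, consistent with the rest of the section.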
{
@ -100,8 +77,8 @@
"source": [
"## Shell Commands in IPython\n",
"\n",
"Any command that works at the command-line can be used in IPython by prefixing it with the ``!`` character.\n",
"For example, the ``ls``, ``pwd``, and ``echo`` commands can be run as follows:\n",
"Any standard shell command can be used directly in IPython by prefixing it with the `!` character.\n",
"For example, the `ls`, `pwd`, and `echo` commands can be run as follows:\n",
"\n",
"```ipython\n",
"In [1]: !ls\n",
@ -121,8 +98,8 @@
"source": [
"## Passing Values to and from the Shell\n",
"\n",
"Shell commands can not only be called from IPython, but can also be made to interact with the IPython namespace.\n",
"For example, you can save the output of any shell command to a Python list using the assignment operator:\n",
"Shell commands not only can be called from IPython, but can also be made to interact with the IPython namespace.\n",
"For example, you can save the output of any shell command to a Python list using the assignment operator, `=`:\n",
"\n",
"```ipython\n",
"In [4]: contents = !ls\n",
@ -136,15 +113,15 @@
"['/Users/jakevdp/notebooks/tmp/myproject']\n",
"```\n",
"\n",
"Note that these results are not returned as lists, but as a special shell return type defined in IPython:\n",
"These results are not returned as lists, but as a special shell return type defined in IPython:\n",
"\n",
"```ipython\n",
"In [8]: type(directory)\n",
"IPython.utils.text.SList\n",
"```\n",
"\n",
"This looks and acts a lot like a Python list, but has additional functionality, such as\n",
"the ``grep`` and ``fields`` methods and the ``s``, ``n``, and ``p`` properties that allow you to search, filter, and display the results in convenient ways.\n",
"This looks and acts a lot like a Python list but has additional functionality, such as\n",
"the `grep` and `fields` methods and the `s`, `n`, and `p` properties that allow you to search, filter, and display the results in convenient ways.\n",
"For more information on these, you can use IPython's built-in help features."
]
},
@ -152,7 +129,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Communication in the other direction—passing Python variables into the shell—is possible using the ``{varname}`` syntax:\n",
"Communication in the other direction—passing Python variables into the shell—is possible using the `{varname}` syntax:\n",
"\n",
"```ipython\n",
"In [9]: message = \"hello from Python\"\n",
@ -168,9 +145,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Shell-Related Magic Commands\n",
"## Shell-Related Magic Commands\n",
"\n",
"If you play with IPython's shell commands for a while, you might notice that you cannot use ``!cd`` to navigate the filesystem:\n",
"If you play with IPython's shell commands for a while, you might notice that you cannot use `!cd` to navigate the filesystem:\n",
"\n",
"```ipython\n",
"In [11]: !pwd\n",
@ -182,24 +159,24 @@
"/home/jake/projects/myproject\n",
"```\n",
"\n",
"The reason is that shell commands in the notebook are executed in a temporary subshell.\n",
"If you'd like to change the working directory in a more enduring way, you can use the ``%cd`` magic command:\n",
"The reason is that shell commands in the notebook are executed in a temporary subshell that does not maintain state from command to command.\n",
"If you'd like to change the working directory in a more enduring way, you can use the `%cd` magic command:\n",
"\n",
"```ipython\n",
"In [14]: %cd ..\n",
"/home/jake/projects\n",
"```\n",
"\n",
"In fact, by default you can even use this without the ``%`` sign:\n",
"In fact, by default you can even use this without the `%` sign:\n",
"\n",
"```ipython\n",
"In [15]: cd myproject\n",
"/home/jake/projects/myproject\n",
"```\n",
"\n",
"This is known as an ``automagic`` function, and this behavior can be toggled with the ``%automagic`` magic function.\n",
"This is known as an *automagic* function, and the ability to execute such commands without an explicit `%` can be toggled with the `%automagic` magic function.\n",
"\n",
"Besides ``%cd``, other available shell-like magic functions are ``%cat``, ``%cp``, ``%env``, ``%ls``, ``%man``, ``%mkdir``, ``%more``, ``%mv``, ``%pwd``, ``%rm``, and ``%rmdir``, any of which can be used without the ``%`` sign if ``automagic`` is on.\n",
"Besides `%cd`, other available shell-like magic functions are `%cat`, `%cp`, `%env`, `%ls`, `%man`, `%mkdir`, `%more`, `%mv`, `%pwd`, `%rm`, and `%rmdir`, any of which can be used without the `%` sign if `automagic` is on.\n",
"This makes it so that you can almost treat the IPython prompt as if it's a normal shell:\n",
"\n",
"```ipython\n",
@ -216,22 +193,15 @@
"In [20]: rm -r tmp\n",
"```\n",
"\n",
"This access to the shell from within the same terminal window as your Python session means that there is a lot less switching back and forth between interpreter and shell as you write your Python code."
]
},
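The subshell behavior described above can be illustrated in plain Python: a `cd` executed through a subshell affects only that child process, while `os.chdir` (roughly the in-process mechanism `%cd` relies on) changes the directory of the Python session itself. A sketch, not IPython's actual implementation:

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()

# A `cd` run in a subshell changes only that subshell's working directory;
# the parent Python process is unaffected, which is why `!cd` has no effect.
subprocess.run(f"cd {target}", shell=True, check=True)
cwd_after_subshell = os.getcwd()

# Changing directory in-process persists for the rest of the session:
os.chdir(target)
cwd_after_chdir = os.getcwd()

os.chdir(start)  # restore the original working directory
print(cwd_after_subshell == start)
```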
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Input and Output History](01.04-Input-Output-History.ipynb) | [Contents](Index.ipynb) | [Errors and Debugging](01.06-Errors-and-Debugging.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.05-IPython-And-Shell-Commands.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"This access to the shell from within the same terminal window as your Python session lets you more naturally combine Python and the shell in your workflows with fewer context switches."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
@ -247,9 +217,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}


@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [IPython and Shell Commands](01.05-IPython-And-Shell-Commands.ipynb) | [Contents](Index.ipynb) | [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.06-Errors-and-Debugging.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -41,11 +19,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Controlling Exceptions: ``%xmode``\n",
"## Controlling Exceptions: %xmode\n",
"\n",
"Most of the time when a Python script fails, it will raise an Exception.\n",
"Most of the time when a Python script fails, it will raise an exception.\n",
"When the interpreter hits one of these exceptions, information about the cause of the error can be found in the *traceback*, which can be accessed from within Python.\n",
"With the ``%xmode`` magic function, IPython allows you to control the amount of information printed when the exception is raised.\n",
"With the `%xmode` magic function, IPython allows you to control the amount of information printed when the exception is raised.\n",
"Consider the following code:"
]
},
@ -53,7 +31,10 @@
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@ -70,7 +51,10 @@
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -94,20 +78,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Calling ``func2`` results in an error, and reading the printed trace lets us see exactly what happened.\n",
"By default, this trace includes several lines showing the context of each step that led to the error.\n",
"Using the ``%xmode`` magic function (short for *Exception mode*), we can change what information is printed.\n",
"Calling `func2` results in an error, and reading the printed trace lets us see exactly what happened.\n",
"In the default mode, this trace includes several lines showing the context of each step that led to the error.\n",
"Using the `%xmode` magic function (short for *exception mode*), we can change what information is printed.\n",
"\n",
"``%xmode`` takes a single argument, the mode, and there are three possibilities: ``Plain``, ``Context``, and ``Verbose``.\n",
"The default is ``Context``, and gives output like that just shown before.\n",
"``Plain`` is more compact and gives less information:"
"`%xmode` takes a single argument, the mode, and there are three possibilities: `Plain`, `Context`, and `Verbose`.\n",
"The default is `Context`, which gives output like that just shown.\n",
"`Plain` is more compact and gives less information:"
]
},
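The idea of trimming traceback detail can be sketched with the standard `traceback` module, whose `limit` argument caps how many stack entries are shown. This is only an analogy to `%xmode`'s modes, not how IPython implements them:

```python
import traceback

def func1(a, b):
    return a / b

def func2(x):
    a = x
    b = x - 1
    return func1(a, b)

try:
    func2(1)
except ZeroDivisionError:
    # Full traceback: one "File ..." entry per call leading to the error.
    full = traceback.format_exc()
    # Capping the number of stack entries gives a more compact report:
    short = traceback.format_exc(limit=1)

print(full.count('File "'), short.count('File "'))
```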
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -126,7 +113,10 @@
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -150,14 +140,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``Verbose`` mode adds some extra information, including the arguments to any functions that are called:"
"The `Verbose` mode adds some extra information, including the arguments to any functions that are called:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -176,7 +169,10 @@
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -200,10 +196,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This extra information can help narrow-in on why the exception is being raised.\n",
"So why not use the ``Verbose`` mode all the time?\n",
"This extra information can help you narrow in on why the exception is being raised.\n",
"So why not use the `Verbose` mode all the time?\n",
"As code gets complicated, this kind of traceback can get extremely long.\n",
"Depending on the context, sometimes the brevity of ``Default`` mode is easier to work with."
"Depending on the context, sometimes the brevity of `Plain` or `Context` mode is easier to work with."
]
},
{
@ -212,36 +208,39 @@
"source": [
"## Debugging: When Reading Tracebacks Is Not Enough\n",
"\n",
"The standard Python tool for interactive debugging is ``pdb``, the Python debugger.\n",
"The standard Python tool for interactive debugging is `pdb`, the Python debugger.\n",
"This debugger lets the user step through the code line by line in order to see what might be causing a more difficult error.\n",
"The IPython-enhanced version of this is ``ipdb``, the IPython debugger.\n",
"The IPython-enhanced version of this is `ipdb`, the IPython debugger.\n",
"\n",
"There are many ways to launch and use both these debuggers; we won't cover them fully here.\n",
"Refer to the online documentation of these two utilities to learn more.\n",
"\n",
"In IPython, perhaps the most convenient interface to debugging is the ``%debug`` magic command.\n",
"In IPython, perhaps the most convenient interface to debugging is the `%debug` magic command.\n",
"If you call it after hitting an exception, it will automatically open an interactive debugging prompt at the point of the exception.\n",
"The ``ipdb`` prompt lets you explore the current state of the stack, explore the available variables, and even run Python commands!\n",
"The `ipdb` prompt lets you explore the current state of the stack, explore the available variables, and even run Python commands!\n",
"\n",
"Let's look at the most recent exception, then do some basic tasksprint the values of ``a`` and ``b``, and type ``quit`` to quit the debugging session:"
"Let's look at the most recent exception, then do some basic tasks. We'll print the values of `a` and `b`, then type `quit` to quit the debugging session:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"> \u001b[0;32m<ipython-input-1-d849e34d61fb>\u001b[0m(2)\u001b[0;36mfunc1\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32m 1 \u001b[0;31m\u001b[0;32mdef\u001b[0m \u001b[0mfunc1\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m----> 2 \u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0ma\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m 3 \u001b[0;31m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\n",
"> <ipython-input-1-d849e34d61fb>(2)func1()\n",
" 1 def func1(a, b):\n",
"----> 2 return a / b\n",
" 3 \n",
"\n",
"ipdb> print(a)\n",
"1\n",
"ipdb> print(b)\n",
@ -258,43 +257,46 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The interactive debugger allows much more than this, thoughwe can even step up and down through the stack and explore the values of variables there:"
"The interactive debugger allows much more than this, thoughwe can even step up and down through the stack and explore the values of variables there:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"> \u001b[0;32m<ipython-input-1-d849e34d61fb>\u001b[0m(2)\u001b[0;36mfunc1\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32m 1 \u001b[0;31m\u001b[0;32mdef\u001b[0m \u001b[0mfunc1\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m----> 2 \u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0ma\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m 3 \u001b[0;31m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\n",
"> <ipython-input-1-d849e34d61fb>(2)func1()\n",
" 1 def func1(a, b):\n",
"----> 2 return a / b\n",
" 3 \n",
"\n",
"ipdb> up\n",
"> \u001b[0;32m<ipython-input-1-d849e34d61fb>\u001b[0m(7)\u001b[0;36mfunc2\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32m 5 \u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m 6 \u001b[0;31m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m----> 7 \u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc1\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\n",
"> <ipython-input-1-d849e34d61fb>(7)func2()\n",
" 5 a = x\n",
" 6 b = x - 1\n",
"----> 7 return func1(a, b)\n",
"\n",
"ipdb> print(x)\n",
"1\n",
"ipdb> up\n",
"> \u001b[0;32m<ipython-input-6-b2e110f6fc8f>\u001b[0m(1)\u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32m----> 1 \u001b[0;31m\u001b[0mfunc2\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\n",
"> <ipython-input-6-b2e110f6fc8f>(1)<module>()\n",
"----> 1 func2(1)\n",
"\n",
"ipdb> down\n",
"> \u001b[0;32m<ipython-input-1-d849e34d61fb>\u001b[0m(7)\u001b[0;36mfunc2\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32m 5 \u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m 6 \u001b[0;31m \u001b[0mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m-\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m----> 7 \u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mfunc1\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\n",
"> <ipython-input-1-d849e34d61fb>(7)func2()\n",
" 5 a = x\n",
" 6 b = x - 1\n",
"----> 7 return func1(a, b)\n",
"\n",
"ipdb> quit\n"
]
}
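What `up` and `down` do in the debugger can be sketched in plain Python by walking the exception's traceback chain and inspecting each frame's local variables; this is a sketch of the underlying frame machinery, not of `ipdb` itself:

```python
import sys

def func1(a, b):
    return a / b

def func2(x):
    a = x
    b = x - 1
    return func1(a, b)

try:
    func2(1)
except ZeroDivisionError:
    tb = sys.exc_info()[2]

# Walk "down" the stack frame by frame, the way up/down move in ipdb:
frames = []
while tb is not None:
    frame = tb.tb_frame
    frames.append((frame.f_code.co_name, dict(frame.f_locals)))
    tb = tb.tb_next

for name, local_vars in frames:
    # Show only the variables of interest at each level
    print(name, {k: v for k, v in local_vars.items() if k in ("a", "b", "x")})
```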
@ -307,16 +309,19 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This allows you to quickly find out not only what caused the error, but what function calls led up to the error.\n",
"This allows us to quickly find out not only what caused the error, but what function calls led up to the error.\n",
"\n",
"If you'd like the debugger to launch automatically whenever an exception is raised, you can use the ``%pdb`` magic function to turn on this automatic behavior:"
"If you'd like the debugger to launch automatically whenever an exception is raised, you can use the `%pdb` magic function to turn on this automatic behavior:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -343,11 +348,11 @@
"name": "stdout",
"output_type": "stream",
"text": [
"> \u001b[0;32m<ipython-input-1-d849e34d61fb>\u001b[0m(2)\u001b[0;36mfunc1\u001b[0;34m()\u001b[0m\n",
"\u001b[0;32m 1 \u001b[0;31m\u001b[0;32mdef\u001b[0m \u001b[0mfunc1\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m----> 2 \u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0ma\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mb\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\u001b[0;32m 3 \u001b[0;31m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0m\n",
"> <ipython-input-1-d849e34d61fb>(2)func1()\n",
" 1 def func1(a, b):\n",
"----> 2 return a / b\n",
" 3 \n",
"\n",
"ipdb> print(b)\n",
"0\n",
"ipdb> quit\n"
@ -364,7 +369,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, if you have a script that you'd like to run from the beginning in interactive mode, you can run it with the command ``%run -d``, and use the ``next`` command to step through the lines of code interactively."
"Finally, if you have a script that you'd like to run from the beginning in interactive mode, you can run it with the command `%run -d`, and use the `next` command to step through the lines of code interactively."
]
},
{
@ -373,38 +378,31 @@
"source": [
"### Partial list of debugging commands\n",
"\n",
"There are many more available commands for interactive debugging than we've listed here; the following table contains a description of some of the more common and useful ones:\n",
"There are many more available commands for interactive debugging than I've shown here. The following table contains a description of some of the more common and useful ones:\n",
"\n",
"| Command | Description |\n",
"|-----------------|-------------------------------------------------------------|\n",
"| ``list`` | Show the current location in the file |\n",
"| ``h(elp)`` | Show a list of commands, or find help on a specific command |\n",
"| ``q(uit)`` | Quit the debugger and the program |\n",
"| ``c(ontinue)`` | Quit the debugger, continue in the program |\n",
"| ``n(ext)`` | Go to the next step of the program |\n",
"| ``<enter>`` | Repeat the previous command |\n",
"| ``p(rint)`` | Print variables |\n",
"| ``s(tep)`` | Step into a subroutine |\n",
"| ``r(eturn)`` | Return out of a subroutine |\n",
"| Command | Description |\n",
"|---------------|-------------------------------------------------------------|\n",
"| `l(ist)` | Show the current location in the file |\n",
"| `h(elp)` | Show a list of commands, or find help on a specific command |\n",
"| `q(uit)` | Quit the debugger and the program |\n",
"| `c(ontinue)` | Quit the debugger, continue in the program |\n",
"| `n(ext)` | Go to the next step of the program |\n",
"| `<enter>` | Repeat the previous command |\n",
"| `p(rint)` | Print variables |\n",
"| `s(tep)` | Step into a subroutine |\n",
"| `r(eturn)` | Return out of a subroutine |\n",
"\n",
"For more information, use the ``help`` command in the debugger, or take a look at ``ipdb``'s [online documentation](https://github.com/gotcha/ipdb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [IPython and Shell Commands](01.05-IPython-And-Shell-Commands.ipynb) | [Contents](Index.ipynb) | [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.06-Errors-and-Debugging.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"For more information, use the `help` command in the debugger, or take a look at `ipdb`'s [online documentation](https://github.com/gotcha/ipdb)."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -418,9 +416,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}


@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Errors and Debugging](01.06-Errors-and-Debugging.ipynb) | [Contents](Index.ipynb) | [More IPython Resources](01.08-More-IPython-Resources.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.07-Timing-and-Profiling.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -37,27 +15,27 @@
"Early in developing your algorithm, it can be counterproductive to worry about such things. As Donald Knuth famously quipped, \"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.\"\n",
"\n",
"But once you have your code working, it can be useful to dig into its efficiency a bit.\n",
"Sometimes it's useful to check the execution time of a given command or set of commands; other times it's useful to dig into a multiline process and determine where the bottleneck lies in some complicated series of operations.\n",
"Sometimes it's useful to check the execution time of a given command or set of commands; other times it's useful to examine a multiline process and determine where the bottleneck lies in some complicated series of operations.\n",
"IPython provides access to a wide array of functionality for this kind of timing and profiling of code.\n",
"Here we'll discuss the following IPython magic commands:\n",
"\n",
"- ``%time``: Time the execution of a single statement\n",
"- ``%timeit``: Time repeated execution of a single statement for more accuracy\n",
"- ``%prun``: Run code with the profiler\n",
"- ``%lprun``: Run code with the line-by-line profiler\n",
"- ``%memit``: Measure the memory use of a single statement\n",
"- ``%mprun``: Run code with the line-by-line memory profiler\n",
"- `%time`: Time the execution of a single statement\n",
"- `%timeit`: Time repeated execution of a single statement for more accuracy\n",
"- `%prun`: Run code with the profiler\n",
"- `%lprun`: Run code with the line-by-line profiler\n",
"- `%memit`: Measure the memory use of a single statement\n",
"- `%mprun`: Run code with the line-by-line memory profiler\n",
"\n",
"The last four commands are not bundled with IPythonyou'll need to get the ``line_profiler`` and ``memory_profiler`` extensions, which we will discuss in the following sections."
"The last four commands are not bundled with IPython; to use them you'll need to get the `line_profiler` and `memory_profiler` extensions, which we will discuss in the following sections."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Timing Code Snippets: ``%timeit`` and ``%time``\n",
"## Timing Code Snippets: %timeit and %time\n",
"\n",
"We saw the ``%timeit`` line-magic and ``%%timeit`` cell-magic in the introduction to magic functions in [IPython Magic Commands](01.03-Magic-Commands.ipynb); it can be used to time the repeated execution of snippets of code:"
"We saw the `%timeit` line magic and `%%timeit` cell magic in the introduction to magic functions in [IPython Magic Commands](01.03-Magic-Commands.ipynb); these can be used to time the repeated execution of snippets of code:"
]
},
{
@ -69,7 +47,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"100000 loops, best of 3: 1.54 µs per loop\n"
"1.53 µs ± 47.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
]
}
],
@ -81,8 +59,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that because this operation is so fast, ``%timeit`` automatically does a large number of repetitions.\n",
"For slower commands, ``%timeit`` will automatically adjust and perform fewer repetitions:"
"Note that because this operation is so fast, `%timeit` automatically does a large number of repetitions.\n",
"For slower commands, `%timeit` will automatically adjust and perform fewer repetitions:"
]
},
{
@ -94,7 +72,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"1 loops, best of 3: 407 ms per loop\n"
"536 ms ± 15.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
@ -111,8 +89,7 @@
"metadata": {},
"source": [
"Sometimes repeating an operation is not the best option.\n",
"For example, if we have a list that we'd like to sort, we might be misled by a repeated operation.\n",
"Sorting a pre-sorted list is much faster than sorting an unsorted list, so the repetition will skew the result:"
"For example, if we have a list that we'd like to sort, we might be misled by a repeated operation; sorting a pre-sorted list is much faster than sorting an unsorted list, so the repetition will skew the result:"
]
},
{
@ -124,7 +101,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"100 loops, best of 3: 1.9 ms per loop\n"
"1.71 ms ± 334 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n"
]
}
],
@ -138,7 +115,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For this, the ``%time`` magic function may be a better choice. It also is a good choice for longer-running commands, when short, system-related delays are unlikely to affect the result.\n",
"For this, the `%time` magic function may be a better choice. It also is a good choice for longer-running commands, when short, system-related delays are unlikely to affect the result.\n",
"Let's time the sorting of an unsorted and a presorted list:"
]
},
@ -152,8 +129,8 @@
"output_type": "stream",
"text": [
"sorting an unsorted list:\n",
"CPU times: user 40.6 ms, sys: 896 µs, total: 41.5 ms\n",
"Wall time: 41.5 ms\n"
"CPU times: user 31.3 ms, sys: 686 µs, total: 32 ms\n",
"Wall time: 33.3 ms\n"
]
}
],
@ -174,8 +151,8 @@
"output_type": "stream",
"text": [
"sorting an already sorted list:\n",
"CPU times: user 8.18 ms, sys: 10 µs, total: 8.19 ms\n",
"Wall time: 8.24 ms\n"
"CPU times: user 5.19 ms, sys: 268 µs, total: 5.46 ms\n",
"Wall time: 14.1 ms\n"
]
}
],
@ -185,15 +162,16 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice how much faster the presorted list is to sort, but notice also how much longer the timing takes with ``%time`` versus ``%timeit``, even for the presorted list!\n",
"This is a result of the fact that ``%timeit`` does some clever things under the hood to prevent system calls from interfering with the timing.\n",
"For example, it prevents cleanup of unused Python objects (known as *garbage collection*) which might otherwise affect the timing.\n",
"For this reason, ``%timeit`` results are usually noticeably faster than ``%time`` results.\n",
"Notice how much faster the presorted list is to sort, but notice also how much longer the timing takes with `%time` versus `%timeit`, even for the presorted list!\n",
"This is a result of the fact that `%timeit` does some clever things under the hood to prevent system calls from interfering with the timing.\n",
"For example, it prevents cleanup of unused Python objects (known as *garbage collection*) that might otherwise affect the timing.\n",
"For this reason, `%timeit` results are usually noticeably faster than `%time` results.\n",
"\n",
"For ``%time`` as with ``%timeit``, using the double-percent-sign cell magic syntax allows timing of multiline scripts:"
"For `%time`, as with `%timeit`, using the `%%` cell magic syntax allows timing of multiline scripts:"
]
},
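Both magics build on machinery available in the standard library; here is a sketch using the `timeit` module directly, including the repetition skew for in-place sorting discussed earlier (the list sizes and loop counts are illustrative):

```python
import timeit

# Direct use of the standard-library timeit module, which %timeit builds on:
t = timeit.timeit("sum(range(100))", number=100_000)
print(f"{t:.4f} s total for 100,000 loops")

# Repetition can mislead for in-place operations: after the first L.sort(),
# the list is already sorted, so the remaining repetitions are cheap.
setup = "import random; L = [random.random() for _ in range(10000)]"
in_place = timeit.timeit("L.sort()", setup=setup, number=100)

# sorted(L) leaves L unchanged, so every repetition does the full work:
full_work = timeit.timeit("sorted(L)", setup=setup, number=100)
print(in_place, full_work)
```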
{
@ -205,8 +183,8 @@
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 504 ms, sys: 979 µs, total: 505 ms\n",
"Wall time: 505 ms\n"
"CPU times: user 655 ms, sys: 5.68 ms, total: 661 ms\n",
"Wall time: 710 ms\n"
]
}
],
@ -222,21 +200,26 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For more information on ``%time`` and ``%timeit``, as well as their available options, use the IPython help functionality (i.e., type ``%time?`` at the IPython prompt)."
"For more information on `%time` and `%timeit`, as well as their available options, use the IPython help functionality (e.g., type `%time?` at the IPython prompt)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Profiling Full Scripts: ``%prun``\n",
"## Profiling Full Scripts: %prun\n",
"\n",
"A program is made of many single statements, and sometimes timing these statements in context is more important than timing them on their own.\n",
"Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function ``%prun``.\n",
"A program is made up of many single statements, and sometimes timing these statements in context is more important than timing them on their own.\n",
"Python contains a built-in code profiler (which you can read about in the Python documentation), but IPython offers a much more convenient way to use this profiler, in the form of the magic function `%prun`.\n",
"\n",
"By way of example, we'll define a simple function that does some calculations:"
]
},
{
"cell_type": "code",
"execution_count": 7,
@ -255,7 +238,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can call ``%prun`` with a function call to see the profiled results:"
"Now we can call `%prun` with a function call to see the profiled results:"
]
},
{
@ -269,6 +252,25 @@
"text": [
" "
]
},
{
"data": {
"text/plain": [
" 14 function calls in 0.932 seconds\n",
"\n",
" Ordered by: internal time\n",
"\n",
" ncalls tottime percall cumtime percall filename:lineno(function)\n",
" 5 0.808 0.162 0.808 0.162 <ipython-input-7-f105717832a2>:4(<listcomp>)\n",
" 5 0.066 0.013 0.066 0.013 {built-in method builtins.sum}\n",
" 1 0.044 0.044 0.918 0.918 <ipython-input-7-f105717832a2>:1(sum_of_lists)\n",
" 1 0.014 0.014 0.932 0.932 <string>:1(<module>)\n",
" 1 0.000 0.000 0.932 0.932 {built-in method builtins.exec}\n",
" 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
@ -279,42 +281,27 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the notebook, the output is printed to the pager, and looks something like this:\n",
"The result is a table that indicates, in order of total time on each function call, where the execution is spending the most time. In this case, the bulk of the execution time is in the list comprehension inside `sum_of_lists`.\n",
"From here, we could start thinking about what changes we might make to improve the performance of the algorithm.\n",
"\n",
"```\n",
"14 function calls in 0.714 seconds\n",
"\n",
" Ordered by: internal time\n",
"\n",
" ncalls tottime percall cumtime percall filename:lineno(function)\n",
" 5 0.599 0.120 0.599 0.120 <ipython-input-19>:4(<listcomp>)\n",
" 5 0.064 0.013 0.064 0.013 {built-in method sum}\n",
" 1 0.036 0.036 0.699 0.699 <ipython-input-19>:1(sum_of_lists)\n",
" 1 0.014 0.014 0.714 0.714 <string>:1(<module>)\n",
" 1 0.000 0.000 0.714 0.714 {built-in method exec}\n",
"```\n",
"\n",
"The result is a table that indicates, in order of total time on each function call, where the execution is spending the most time. In this case, the bulk of execution time is in the list comprehension inside ``sum_of_lists``.\n",
"From here, we could start thinking about what changes we might make to improve the performance in the algorithm.\n",
"\n",
"For more information on ``%prun``, as well as its available options, use the IPython help functionality (i.e., type ``%prun?`` at the IPython prompt)."
"For more information on `%prun`, as well as its available options, use the IPython help functionality (i.e., type `%prun?` at the IPython prompt)."
]
},
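Under the hood, `%prun` relies on Python's built-in profiler, so the same function-level statistics can be gathered in a plain script with the standard library's `cProfile` and `pstats` modules. A minimal sketch (the `sum_of_lists` definition is repeated so the snippet is self-contained):

```python
import cProfile
import io
import pstats

# The chapter's example function, reproduced here for a standalone script
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

profiler = cProfile.Profile()
profiler.enable()
sum_of_lists(1_000_000)
profiler.disable()

# Sorting by "tottime" mirrors %prun's internal-time ordering
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(5)
print(stream.getvalue())
```

The printed table has the same columns (`ncalls`, `tottime`, `cumtime`, and so on) as the `%prun` output shown above.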
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Line-By-Line Profiling with ``%lprun``\n",
"## Line-by-Line Profiling with %lprun\n",
"\n",
"The function-by-function profiling of ``%prun`` is useful, but sometimes it's more convenient to have a line-by-line profile report.\n",
"This is not built into Python or IPython, but there is a ``line_profiler`` package available for installation that can do this.\n",
"Start by using Python's packaging tool, ``pip``, to install the ``line_profiler`` package:\n",
"The function-by-function profiling of `%prun` is useful, but sometimes it's more convenient to have a line-by-line profile report.\n",
"This is not built into Python or IPython, but there is a `line_profiler` package available for installation that can do this.\n",
"Start by using Python's packaging tool, `pip`, to install the `line_profiler` package:\n",
"\n",
"```\n",
"$ pip install line_profiler\n",
"```\n",
"\n",
"Next, you can use IPython to load the ``line_profiler`` IPython extension, offered as part of this package:"
"Next, you can use IPython to load the `line_profiler` IPython extension, offered as part of this package:"
]
},
{
@ -330,14 +317,37 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now the ``%lprun`` command will do a line-by-line profiling of any functionin this case, we need to tell it explicitly which functions we're interested in profiling:"
"Now the `%lprun` command will do a line-by-line profiling of any function. In this case, we need to tell it explicitly which functions we're interested in profiling:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"outputs": [
{
"data": {
"text/plain": [
"Timer unit: 1e-06 s\n",
"\n",
"Total time: 0.014803 s\n",
"File: <ipython-input-7-f105717832a2>\n",
"Function: sum_of_lists at line 1\n",
"\n",
"Line # Hits Time Per Hit % Time Line Contents\n",
"==============================================================\n",
" 1 def sum_of_lists(N):\n",
" 2 1 6.0 6.0 0.0 total = 0\n",
" 3 6 13.0 2.2 0.1 for i in range(5):\n",
" 4 5 14242.0 2848.4 96.2 L = [j ^ (j >> i) for j in range(N)]\n",
" 5 5 541.0 108.2 3.7 total += sum(L)\n",
" 6 1 1.0 1.0 0.0 return total"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%lprun -f sum_of_lists sum_of_lists(5000)"
]
@ -346,51 +356,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As before, the notebook sends the result to the pager, but it looks something like this:\n",
"\n",
"```\n",
"Timer unit: 1e-06 s\n",
"\n",
"Total time: 0.009382 s\n",
"File: <ipython-input-19-fa2be176cc3e>\n",
"Function: sum_of_lists at line 1\n",
"\n",
"Line # Hits Time Per Hit % Time Line Contents\n",
"==============================================================\n",
" 1 def sum_of_lists(N):\n",
" 2 1 2 2.0 0.0 total = 0\n",
" 3 6 8 1.3 0.1 for i in range(5):\n",
" 4 5 9001 1800.2 95.9 L = [j ^ (j >> i) for j in range(N)]\n",
" 5 5 371 74.2 4.0 total += sum(L)\n",
" 6 1 0 0.0 0.0 return total\n",
"```\n",
"\n",
"The information at the top gives us the key to reading the results: the time is reported in microseconds and we can see where the program is spending the most time.\n",
"The information at the top gives us the key to reading the results: the time is reported in microseconds, and we can see where the program is spending the most time.\n",
"At this point, we may be able to use this information to modify aspects of the script and make it perform better for our desired use case.\n",
"\n",
"For more information on ``%lprun``, as well as its available options, use the IPython help functionality (i.e., type ``%lprun?`` at the IPython prompt)."
"For more information on `%lprun`, as well as its available options, use the IPython help functionality (i.e., type `%lprun?` at the IPython prompt)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Profiling Memory Use: ``%memit`` and ``%mprun``\n",
"## Profiling Memory Use: %memit and %mprun\n",
"\n",
"Another aspect of profiling is the amount of memory an operation uses.\n",
"This can be evaluated with another IPython extension, the ``memory_profiler``.\n",
"As with the ``line_profiler``, we start by ``pip``-installing the extension:\n",
"This can be evaluated with another IPython extension, the `memory_profiler`.\n",
"As with the `line_profiler`, we start by `pip`-installing the extension:\n",
"\n",
"```\n",
"$ pip install memory_profiler\n",
"```\n",
"\n",
"Then we can use IPython to load the extension:"
"Then we can use IPython to load it:"
]
},
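If installing a third-party package isn't convenient, the standard library's `tracemalloc` module offers a coarser view of the same question, reporting peak allocated memory rather than a line-by-line breakdown. A sketch, reusing the chapter's example function:

```python
import tracemalloc

# The chapter's example function, repeated so the snippet is self-contained
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

tracemalloc.start()
sum_of_lists(100_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# peak reflects the largest temporary list; current is what remains afterward
print(f"current: {current / 1024**2:.2f} MiB, peak: {peak / 1024**2:.2f} MiB")
```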
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
@ -401,20 +392,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The memory profiler extension contains two useful magic functions: the ``%memit`` magic (which offers a memory-measuring equivalent of ``%timeit``) and the ``%mprun`` function (which offers a memory-measuring equivalent of ``%lprun``).\n",
"The ``%memit`` function can be used rather simply:"
"The memory profiler extension contains two useful magic functions: `%memit` (which offers a memory-measuring equivalent of `%timeit`) and `%mprun` (which offers a memory-measuring equivalent of `%lprun`).\n",
"The `%memit` magic function can be used rather simply:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"peak memory: 100.08 MiB, increment: 61.36 MiB\n"
"peak memory: 141.70 MiB, increment: 75.65 MiB\n"
]
}
],
@ -426,15 +417,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We see that this function uses about 100 MB of memory.\n",
"We see that this function uses about 140 MB of memory.\n",
"\n",
"For a line-by-line description of memory use, we can use the ``%mprun`` magic.\n",
"Unfortunately, this magic works only for functions defined in separate modules rather than the notebook itself, so we'll start by using the ``%%file`` magic to create a simple module called ``mprun_demo.py``, which contains our ``sum_of_lists`` function, with one addition that will make our memory profiling results more clear:"
"For a line-by-line description of memory use, we can use the `%mprun` magic function.\n",
"Unfortunately, this works only for functions defined in separate modules rather than the notebook itself, so we'll start by using the `%%file` cell magic to create a simple module called `mprun_demo.py`, which contains our `sum_of_lists` function, with one addition that will make our memory profiling results more clear:"
]
},
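The `%%file` magic is just a convenience for writing a notebook cell to disk; the same module can be created with ordinary file I/O, as this sketch shows (the `del L` line is the addition that makes the memory profile clearer):

```python
import importlib
import sys
from pathlib import Path

source = '''\
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
        del L  # remove reference to L
    return total
'''

Path("mprun_demo.py").write_text(source)
sys.path.insert(0, str(Path.cwd()))  # make the new file importable
mprun_demo = importlib.import_module("mprun_demo")
print(mprun_demo.sum_of_lists(1000))
```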
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 13,
"metadata": {},
"outputs": [
{
@ -465,7 +456,7 @@
},
{
"cell_type": "code",
"execution_count": 15,
"execution_count": 14,
"metadata": {},
"outputs": [
{
@ -474,6 +465,25 @@
"text": [
"\n"
]
},
{
"data": {
"text/plain": [
"Filename: /Users/jakevdp/github/jakevdp/PythonDataScienceHandbook/notebooks_v2/mprun_demo.py\n",
"\n",
"Line # Mem usage Increment Occurences Line Contents\n",
"============================================================\n",
" 1 66.7 MiB 66.7 MiB 1 def sum_of_lists(N):\n",
" 2 66.7 MiB 0.0 MiB 1 total = 0\n",
" 3 75.1 MiB 8.4 MiB 6 for i in range(5):\n",
" 4 105.9 MiB 30.8 MiB 5000015 L = [j ^ (j >> i) for j in range(N)]\n",
" 5 109.8 MiB 3.8 MiB 5 total += sum(L)\n",
" 6 75.1 MiB -34.6 MiB 5 del L # remove reference to L\n",
" 7 66.9 MiB -8.2 MiB 1 return total"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
@ -485,48 +495,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The result, printed to the pager, gives us a summary of the memory use of the function, and looks something like this:\n",
"```\n",
"Filename: ./mprun_demo.py\n",
"\n",
"Line # Mem usage Increment Line Contents\n",
"================================================\n",
" 4 71.9 MiB 0.0 MiB L = [j ^ (j >> i) for j in range(N)]\n",
"\n",
"\n",
"Filename: ./mprun_demo.py\n",
"\n",
"Line # Mem usage Increment Line Contents\n",
"================================================\n",
" 1 39.0 MiB 0.0 MiB def sum_of_lists(N):\n",
" 2 39.0 MiB 0.0 MiB total = 0\n",
" 3 46.5 MiB 7.5 MiB for i in range(5):\n",
" 4 71.9 MiB 25.4 MiB L = [j ^ (j >> i) for j in range(N)]\n",
" 5 71.9 MiB 0.0 MiB total += sum(L)\n",
" 6 46.5 MiB -25.4 MiB del L # remove reference to L\n",
" 7 39.1 MiB -7.4 MiB return total\n",
"```\n",
"Here the ``Increment`` column tells us how much each line affects the total memory budget: observe that when we create and delete the list ``L``, we are adding about 25 MB of memory usage.\n",
"Here, the `Increment` column tells us how much each line affects the total memory budget: observe that when we create and delete the list `L`, we are adding about 30 MB of memory usage.\n",
"This is on top of the background memory usage from the Python interpreter itself.\n",
"\n",
"For more information on ``%memit`` and ``%mprun``, as well as their available options, use the IPython help functionality (i.e., type ``%memit?`` at the IPython prompt)."
]
},
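As a rough sanity check on numbers like these, the standard library's `sys.getsizeof` can estimate what a million-element list costs. Note that it counts only the list's pointer array, so the referenced integer objects must be added separately (a sketch; it overcounts cached small integers):

```python
import sys

N = 1_000_000
L = [j ^ (j >> 2) for j in range(N)]

list_bytes = sys.getsizeof(L)                  # the list's pointer array
int_bytes = sum(sys.getsizeof(x) for x in L)   # each int object is ~28 bytes

print(f"list object:   {list_bytes / 1024**2:.1f} MiB")
print(f"plus the ints: {(list_bytes + int_bytes) / 1024**2:.1f} MiB")
```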
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Errors and Debugging](01.06-Errors-and-Debugging.ipynb) | [Contents](Index.ipynb) | [More IPython Resources](01.08-More-IPython-Resources.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.07-Timing-and-Profiling.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"For more information on `%memit` and `%mprun`, as well as their available options, use the IPython help functionality (e.g., type `%memit?` at the IPython prompt)."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python [default]",
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
@ -540,9 +522,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 1
"nbformat_minor": 4
}


@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb) | [Contents](Index.ipynb) | [Introduction to NumPy](02.00-Introduction-to-NumPy.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.08-More-IPython-Resources.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,8 +11,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In this chapter, we've just scratched the surface of using IPython to enable data science tasks.\n",
"Much more information is available both in print and on the Web, and here we'll list some other resources that you may find helpful."
"In this set of chapters, we've just scratched the surface of using IPython to enable data science tasks.\n",
"Much more information is available both in print and on the web, and here I'll list some other resources that you may find helpful."
]
},
{
@ -43,10 +21,10 @@
"source": [
"## Web Resources\n",
"\n",
"- [The IPython website](http://ipython.org): The IPython website links to documentation, examples, tutorials, and a variety of other resources.\n",
"- [The nbviewer website](http://nbviewer.jupyter.org/): This site shows static renderings of any IPython notebook available on the internet. The front page features some example notebooks that you can browse to see what other folks are using IPython for!\n",
"- [A gallery of interesting Jupyter Notebooks](https://github.com/jupyter/jupyter/wiki/A-gallery-of-interesting-Jupyter-Notebooks/): This ever-growing list of notebooks, powered by nbviewer, shows the depth and breadth of numerical analysis you can do with IPython. It includes everything from short examples and tutorials to full-blown courses and books composed in the notebook format!\n",
"- Video Tutorials: searching the Internet, you will find many video-recorded tutorials on IPython. I'd especially recommend seeking tutorials from the PyCon, SciPy, and PyData conferenes by Fernando Perez and Brian Granger, two of the primary creators and maintainers of IPython and Jupyter."
"- [The IPython website](http://ipython.org): The IPython website provides links to documentation, examples, tutorials, and a variety of other resources.\n",
"- [The nbviewer website](http://nbviewer.jupyter.org/): This site shows static renderings of any Jupyter notebook available on the internet. The front page features some example notebooks that you can browse to see what other folks are using IPython for!\n",
"- [A curated collection of Jupyter notebooks](https://github.com/jupyter/jupyter/wiki): This ever-growing list of notebooks, powered by nbviewer, shows the depth and breadth of numerical analysis you can do with IPython. It includes everything from short examples and tutorials to full-blown courses and books composed in the notebook format!\n",
"- Video tutorials: Searching the internet, you will find many video tutorials on IPython. I'd especially recommend seeking tutorials from the PyCon, SciPy, and PyData conferences by Fernando Perez and Brian Granger, two of the primary creators and maintainers of IPython and Jupyter."
]
},
{
@ -55,27 +33,20 @@
"source": [
"## Books\n",
"\n",
"- [*Python for Data Analysis*](http://shop.oreilly.com/product/0636920023784.do): Wes McKinney's book includes a chapter that covers using IPython as a data scientist. Although much of the material overlaps what we've discussed here, another perspective is always helpful.\n",
"- [*Learning IPython for Interactive Computing and Data Visualization*](https://www.packtpub.com/big-data-and-business-intelligence/learning-ipython-interactive-computing-and-data-visualization): This short book by Cyrille Rossant offers a good introduction to using IPython for data analysis.\n",
"- [*IPython Interactive Computing and Visualization Cookbook*](https://www.packtpub.com/big-data-and-business-intelligence/ipython-interactive-computing-and-visualization-cookbook): Also by Cyrille Rossant, this book is a longer and more advanced treatment of using IPython for data science. Despite its name, it's not just about IPythonit also goes into some depth on a broad range of data science topics.\n",
"- [*Python for Data Analysis* (O'Reilly)](http://shop.oreilly.com/product/0636920023784.do): Wes McKinney's book includes a chapter that covers using IPython as a data scientist. Although much of the material overlaps what we've discussed here, another perspective is always helpful.\n",
"- [*Learning IPython for Interactive Computing and Data Visualization* (Packt)](https://www.packtpub.com/big-data-and-business-intelligence/learning-ipython-interactive-computing-and-data-visualization): This short book by Cyrille Rossant offers a good introduction to using IPython for data analysis.\n",
"- [*IPython Interactive Computing and Visualization Cookbook* (Packt)](https://www.packtpub.com/big-data-and-business-intelligence/ipython-interactive-computing-and-visualization-cookbook): Also by Cyrille Rossant, this book is a longer and more advanced treatment of using IPython for data science. Despite its name, it's not just about IPython; it also goes into some depth on a broad range of data science topics.\n",
"\n",
"Finally, a reminder that you can find help on your own: IPython's ``?``-based help functionality (discussed in [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)) can be very useful if you use it well and use it often.\n",
"Finally, a reminder that you can find help on your own: IPython's `?`-based help functionality (discussed in [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)) can be useful if you use it well and use it often.\n",
"As you go through the examples here and elsewhere, this can be used to familiarize yourself with all the tools that IPython has to offer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb) | [Contents](Index.ipynb) | [Introduction to NumPy](02.00-Introduction-to-NumPy.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/01.08-More-IPython-Resources.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
@ -91,9 +62,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}


@ -2,64 +2,25 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"# Introduction to NumPy\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"This part of the book, along with [Part 3](03.00-Introduction-to-Pandas.ipynb), outlines techniques for effectively loading, storing, and manipulating in-memory data in Python.\n",
"The topic is very broad: datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else.\n",
"Despite this apparent heterogeneity, many datasets can be represented fundamentally as arrays of numbers.\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--NAVIGATION-->\n",
"< [More IPython Resources](01.08-More-IPython-Resources.ipynb) | [Contents](Index.ipynb) | [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"# Introduction to NumPy"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"This chapter, along with chapter 3, outlines techniques for effectively loading, storing, and manipulating in-memory data in Python.\n",
"The topic is very broad: datasets can come from a wide range of sources and a wide range of formats, including be collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else.\n",
"Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.\n",
"\n",
"For example, imagesparticularly digital imagescan be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area.\n",
"For example, images—particularly digital images—can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area.\n",
"Sound clips can be thought of as one-dimensional arrays of intensity versus time.\n",
"Text can be converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words.\n",
"No matter what the data are, the first step in making it analyzable will be to transform them into arrays of numbers.\n",
"(We will discuss some specific examples of this process later in [Feature Engineering](05.04-Feature-Engineering.ipynb))\n",
"Text can be converted in various ways into numerical representations, such as binary digits representing the frequency of certain words or pairs of words.\n",
"No matter what the data is, the first step in making it analyzable will be to transform it into arrays of numbers.\n",
"(We will discuss some specific examples of this process in [Feature Engineering](05.04-Feature-Engineering.ipynb).)\n",
"\n",
"For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science.\n",
"We'll now take a look at the specialized tools that Python has for handling such numerical arrays: the NumPy package, and the Pandas package (discussed in Chapter 3).\n",
"We'll now take a look at the specialized tools that Python has for handling such numerical arrays: the NumPy package and the Pandas package (discussed in [Part 3](03.00-Introduction-to-Pandas.ipynb)).\n",
"\n",
"This chapter will cover NumPy in detail. NumPy (short for *Numerical Python*) provides an efficient interface to store and operate on dense data buffers.\n",
"In some ways, NumPy arrays are like Python's built-in ``list`` type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.\n",
"This part of the book will cover NumPy in detail. NumPy (short for *Numerical Python*) provides an efficient interface to store and operate on dense data buffers.\n",
"In some ways, NumPy arrays are like Python's built-in `list` type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.\n",
"NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.\n",
"\n",
"If you followed the advice outlined in the Preface and installed the Anaconda stack, you already have NumPy installed and ready to go.\n",
@ -73,13 +34,16 @@
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
"editable": true,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"'1.11.1'"
"'1.21.2'"
]
},
"execution_count": 1,
@ -94,13 +58,10 @@
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"For the pieces of the package discussed here, I'd recommend NumPy version 1.8 or later.\n",
"By convention, you'll find that most people in the SciPy/PyData world will import NumPy using ``np`` as an alias:"
"By convention, you'll find that most people in the SciPy/PyData world will import NumPy using `np` as an alias:"
]
},
{
@ -109,7 +70,10 @@
"metadata": {
"collapsed": false,
"deletable": true,
"editable": true
"editable": true,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@ -118,26 +82,20 @@
},
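To make the "data as arrays of numbers" framing concrete, here is a small sketch (with made-up pixel values) representing a grayscale image as a two-dimensional NumPy array:

```python
import numpy as np

# A toy 4x4 "grayscale image": pixel brightness stored as a 2D array of numbers
image = np.array([[  0,  50, 100, 150],
                  [ 50, 100, 150, 200],
                  [100, 150, 200, 250],
                  [150, 200, 250, 255]], dtype=np.uint8)

print(image.shape)    # (4, 4)
print(image.max())    # 255: the brightest pixel
print(image[:2, :2])  # the top-left 2x2 corner is itself an array
```

Sound clips and vectorized text get the same treatment: once the data is an array, the tools in this part of the book apply uniformly.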
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"Throughout this chapter, and indeed the rest of the book, you'll find that this is the way we will import and use NumPy."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"metadata": {},
"source": [
"## Reminder about Built In Documentation\n",
"## Reminder About Built-in Documentation\n",
"\n",
"As you read through this chapter, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature), as well as the documentation of various functions (using the ``?`` character Refer back to [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)).\n",
"As you read through this part of the book, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab completion feature), as well as the documentation of various functions (using the `?` character). For a refresher on these, refer back to [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb).\n",
"\n",
"For example, to display all the contents of the numpy namespace, you can type this:\n",
"For example, to display all the contents of the NumPy namespace, you can type this:\n",
"\n",
"```ipython\n",
"In [3]: np.<TAB>\n",
@ -151,25 +109,15 @@
"\n",
"More detailed documentation, along with tutorials and other resources, can be found at http://www.numpy.org."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--NAVIGATION-->\n",
"< [More IPython Resources](01.08-More-IPython-Resources.ipynb) | [Contents](Index.ipynb) | [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.00-Introduction-to-NumPy.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -183,9 +131,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}


@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Introduction to NumPy](02.00-Introduction-to-NumPy.ipynb) | [Contents](Index.ipynb) | [The Basics of NumPy Arrays](02.02-The-Basics-Of-NumPy-Arrays.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.01-Understanding-Data-Types.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -34,11 +12,11 @@
"metadata": {},
"source": [
"Effective data-driven science and computation requires understanding how data is stored and manipulated.\n",
"This section outlines and contrasts how arrays of data are handled in the Python language itself, and how NumPy improves on this.\n",
"This chapter outlines and contrasts how arrays of data are handled in the Python language itself, and how NumPy improves on this.\n",
"Understanding this difference is fundamental to understanding much of the material throughout the rest of the book.\n",
"\n",
"Users of Python are often drawn-in by its ease of use, one piece of which is dynamic typing.\n",
"While a statically-typed language like C or Java requires each variable to be explicitly declared, a dynamically-typed language like Python skips this specification. For example, in C you might specify a particular operation as follows:\n",
"Users of Python are often drawn in by its ease of use, one piece of which is dynamic typing.\n",
"While a statically typed language like C or Java requires each variable to be explicitly declared, a dynamically typed language like Python skips this specification. For example, in C you might specify a particular operation as follows:\n",
"\n",
"```C\n",
"/* C code */\n",
@ -57,7 +35,7 @@
" result += i\n",
"```\n",
"\n",
"Notice the main difference: in C, the data types of each variable are explicitly declared, while in Python the types are dynamically inferred. This means, for example, that we can assign any kind of data to any variable:\n",
"Notice one main difference: in C, the data types of each variable are explicitly declared, while in Python the types are dynamically inferred. This means, for example, that we can assign any kind of data to any variable:\n",
"\n",
"```python\n",
"# Python code\n",
@ -65,7 +43,7 @@
"x = \"four\"\n",
"```\n",
"\n",
"Here we've switched the contents of ``x`` from an integer to a string. The same thing in C would lead (depending on compiler settings) to a compilation error or other unintented consequences:\n",
"Here we've switched the contents of `x` from an integer to a string. The same thing in C would lead (depending on compiler settings) to a compilation error or other unintended consequences:\n",
"\n",
"```C\n",
"/* C code */\n",
@ -73,9 +51,9 @@
"x = \"four\"; // FAILS\n",
"```\n",
"\n",
"This sort of flexibility is one piece that makes Python and other dynamically-typed languages convenient and easy to use.\n",
"This sort of flexibility is one element that makes Python and other dynamically typed languages convenient and easy to use.\n",
"Understanding *how* this works is an important piece of learning to analyze data efficiently and effectively with Python.\n",
"But what this type-flexibility also points to is the fact that Python variables are more than just their value; they also contain extra information about the type of the value. We'll explore this more in the sections that follow."
"But what this type flexibility also points to is the fact that Python variables are more than just their values; they also contain extra information about the *type* of the value. We'll explore this more in the sections that follow."
]
},
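To see this dynamic typing in action, we can check the type of a variable before and after reassignment (a quick sketch using only built-ins):

```python
# Dynamic typing: the same variable name can reference objects of different types
x = 1
print(type(x))   # <class 'int'>
x = "four"
print(type(x))   # <class 'str'>
```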
{
"## A Python Integer Is More Than Just an Integer\n",
"\n",
"The standard Python implementation is written in C.\n",
"This means that every Python object is simply a cleverly-disguised C structure, which contains not only its value, but other information as well. For example, when we define an integer in Python, such as ``x = 10000``, ``x`` is not just a \"raw\" integer. It's actually a pointer to a compound C structure, which contains several values.\n",
"Looking through the Python 3.4 source code, we find that the integer (long) type definition effectively looks like this (once the C macros are expanded):\n",
"This means that every Python object is simply a cleverly disguised C structure, which contains not only its value, but other information as well. For example, when we define an integer in Python, such as `x = 10000`, `x` is not just a \"raw\" integer. It's actually a pointer to a compound C structure, which contains several values.\n",
"Looking through the Python 3.10 source code, we find that the integer (long) type definition effectively looks like this (once the C macros are expanded):\n",
"\n",
"```C\n",
"struct _longobject {\n",
"};\n",
"```\n",
"\n",
"A single integer in Python 3.4 actually contains four pieces:\n",
"A single integer in Python 3.10 actually contains four pieces:\n",
"\n",
"- `ob_refcnt`, a reference count that helps Python silently handle memory allocation and deallocation\n",
"- `ob_type`, which encodes the type of the variable\n",
"- `ob_size`, which specifies the size of the following data members\n",
"- `ob_digit`, which contains the actual integer value that we expect the Python variable to represent\n",
"\n",
"This means that there is some overhead in storing an integer in Python as compared to an integer in a compiled language like C, as illustrated in the following figure:"
"This means that there is some overhead involved in storing an integer in Python as compared to a compiled language like C, as illustrated in the following figure:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Integer Memory Layout](figures/cint_vs_pyint.png)"
"![Integer Memory Layout](images/cint_vs_pyint.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here ``PyObject_HEAD`` is the part of the structure containing the reference count, type code, and other pieces mentioned before.\n",
"Here, `PyObject_HEAD` is the part of the structure containing the reference count, type code, and other pieces mentioned before.\n",
"\n",
"Notice the difference here: a C integer is essentially a label for a position in memory whose bytes encode an integer value.\n",
"A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value.\n",
"## A Python List Is More Than Just a List\n",
"\n",
"Let's consider now what happens when we use a Python data structure that holds many Python objects.\n",
"The standard mutable multi-element container in Python is the list.\n",
"The standard mutable multielement container in Python is the list.\n",
"We can create a list of integers as follows:"
]
},
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other informationthat is, each item is a complete Python object.\n",
"In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array.\n",
"But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type, reference count, and other information. That is, each item is a complete Python object.\n",
"In the special case that all variables are of the same type, much of this information is redundant, so it can be much more efficient to store the data in a fixed-type array.\n",
"The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:"
]
},
"cell_type": "markdown",
"metadata": {},
"source": [
"![Array Memory Layout](figures/array_vs_list.png)"
"![Array Memory Layout](images/array_vs_list.png)"
]
},
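We can make this storage difference concrete by comparing the memory used by a list of integers with the equivalent NumPy array (a rough sketch; byte counts vary by platform):

```python
import sys
import numpy as np

n = 1000
pylist = list(range(n))
arr = np.arange(n)

# The array stores n fixed-size integers in one contiguous buffer
array_bytes = arr.nbytes
# The list stores n pointers, each referring to a full Python int object
list_bytes = sys.getsizeof(pylist) + sum(sys.getsizeof(i) for i in pylist)
print(array_bytes, list_bytes)
```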
{
"## Fixed-Type Arrays in Python\n",
"\n",
"Python offers several different options for storing data in efficient, fixed-type data buffers.\n",
"The built-in ``array`` module (available since Python 3.3) can be used to create dense arrays of a uniform type:"
"The built-in `array` module (available since Python 3.3) can be used to create dense arrays of a uniform type:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here ``'i'`` is a type code indicating the contents are integers.\n",
"Here, `'i'` is a type code indicating the contents are integers.\n",
"\n",
"Much more useful, however, is the ``ndarray`` object of the NumPy package.\n",
"While Python's ``array`` object provides efficient storage of array-based data, NumPy adds to this efficient *operations* on that data.\n",
"We will explore these operations in later sections; here we'll demonstrate several ways of creating a NumPy array.\n",
"Much more useful, however, is the `ndarray` object of the NumPy package.\n",
"While Python's `array` object provides efficient storage of array-based data, NumPy adds to this efficient *operations* on that data.\n",
"We will explore these operations in later chapters; next, I'll show you a few different ways of creating a NumPy array."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Arrays from Python Lists\n",
"\n",
"We'll start with the standard NumPy import, under the alias ``np``:"
"We'll start with the standard NumPy import, under the alias `np`:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np"
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Arrays from Python Lists\n",
"\n",
"First, we can use ``np.array`` to create arrays from Python lists:"
"Now we can use `np.array` to create arrays from Python lists:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -377,7 +379,7 @@
}
],
"source": [
"# integer array:\n",
"# Integer array\n",
"np.array([1, 4, 2, 5, 3])"
]
},
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember that unlike Python lists, NumPy is constrained to arrays that all contain the same type.\n",
"If types do not match, NumPy will upcast if possible (here, integers are up-cast to floating point):"
"Remember that unlike Python lists, NumPy arrays can only contain data of the same type.\n",
"If the types do not match, NumPy will upcast them according to its type promotion rules; here, integers are upcast to floating point:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 3.14, 4. , 2. , 3. ])"
"array([3.14, 4. , 2. , 3. ])"
]
},
"execution_count": 9,
"cell_type": "markdown",
"metadata": {},
"source": [
"If we want to explicitly set the data type of the resulting array, we can use the ``dtype`` keyword:"
"If we want to explicitly set the data type of the resulting array, we can use the `dtype` keyword:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1., 2., 3., 4.], dtype=float32)"
"array([1., 2., 3., 4.], dtype=float32)"
]
},
"execution_count": 10,
}
],
"source": [
"np.array([1, 2, 3, 4], dtype='float32')"
"np.array([1, 2, 3, 4], dtype=np.float32)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:"
"Finally, unlike Python lists, which are always one-dimensional sequences, NumPy arrays can be multidimensional. Here's one way of initializing a multidimensional array using a list of lists:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
}
],
"source": [
"# nested lists result in multi-dimensional arrays\n",
"# Nested lists result in multidimensional arrays\n",
"np.array([range(i, i + 3) for i in [2, 4, 6]])"
]
},
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
}
],
"source": [
"# Create a length-10 integer array filled with zeros\n",
"# Create a length-10 integer array filled with 0s\n",
"np.zeros(10, dtype=int)"
]
},
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., 1., 1., 1., 1.],\n",
" [ 1., 1., 1., 1., 1.],\n",
" [ 1., 1., 1., 1., 1.]])"
"array([[1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.],\n",
" [1., 1., 1., 1., 1.]])"
]
},
"execution_count": 13,
}
],
"source": [
"# Create a 3x5 floating-point array filled with ones\n",
"# Create a 3x5 floating-point array filled with 1s\n",
"np.ones((3, 5), dtype=float)"
]
},
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 3.14, 3.14, 3.14, 3.14, 3.14],\n",
" [ 3.14, 3.14, 3.14, 3.14, 3.14],\n",
" [ 3.14, 3.14, 3.14, 3.14, 3.14]])"
"array([[3.14, 3.14, 3.14, 3.14, 3.14],\n",
" [3.14, 3.14, 3.14, 3.14, 3.14],\n",
" [3.14, 3.14, 3.14, 3.14, 3.14]])"
]
},
"execution_count": 14,
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
],
"source": [
"# Create an array filled with a linear sequence\n",
"# Starting at 0, ending at 20, stepping by 2\n",
"# (this is similar to the built-in range() function)\n",
"# starting at 0, ending at 20, stepping by 2\n",
"# (this is similar to the built-in range function)\n",
"np.arange(0, 20, 2)"
]
},
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 0.25, 0.5 , 0.75, 1. ])"
"array([0. , 0.25, 0.5 , 0.75, 1. ])"
]
},
"execution_count": 16,
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 0.99844933, 0.52183819, 0.22421193],\n",
" [ 0.08007488, 0.45429293, 0.20941444],\n",
" [ 0.14360941, 0.96910973, 0.946117 ]])"
"array([[0.09610171, 0.88193001, 0.70548015],\n",
" [0.35885395, 0.91670468, 0.8721031 ],\n",
" [0.73237865, 0.09708562, 0.52506779]])"
]
},
"execution_count": 17,
],
"source": [
"# Create a 3x3 array of uniformly distributed\n",
"# random values between 0 and 1\n",
"# pseudorandom values between 0 and 1\n",
"np.random.random((3, 3))"
]
},
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1.51772646, 0.39614948, -0.10634696],\n",
" [ 0.25671348, 0.00732722, 0.37783601],\n",
" [ 0.68446945, 0.15926039, -0.70744073]])"
"array([[-0.46652655, -0.59158776, -1.05392451],\n",
" [-1.72634268, 0.03194069, -0.51048869],\n",
" [ 1.41240208, 1.77734462, -0.43820037]])"
]
},
"execution_count": 18,
}
],
"source": [
"# Create a 3x3 array of normally distributed random values\n",
"# with mean 0 and standard deviation 1\n",
"# Create a 3x3 array of normally distributed pseudorandom\n",
"# values with mean 0 and standard deviation 1\n",
"np.random.normal(0, 1, (3, 3))"
]
},
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[2, 3, 4],\n",
" [5, 7, 8],\n",
" [0, 5, 0]])"
"array([[4, 3, 8],\n",
" [6, 5, 0],\n",
" [1, 1, 4]])"
]
},
"execution_count": 19,
}
],
"source": [
"# Create a 3x3 array of random integers in the interval [0, 10)\n",
"# Create a 3x3 array of pseudorandom integers in the interval [0, 10)\n",
"np.random.randint(0, 10, (3, 3))"
]
},
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 1., 0., 0.],\n",
" [ 0., 1., 0.],\n",
" [ 0., 0., 1.]])"
"array([[1., 0., 0.],\n",
" [0., 1., 0.],\n",
" [0., 0., 1.]])"
]
},
"execution_count": 20,
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1., 1., 1.])"
"array([1., 1., 1.])"
]
},
"execution_count": 21,
@ -731,8 +772,8 @@
}
],
"source": [
"# Create an uninitialized array of three integers\n",
"# The values will be whatever happens to already exist at that memory location\n",
"# Create an uninitialized array of three integers; the values will be\n",
"# whatever happens to already exist at that memory location\n",
"np.empty(3)"
]
},
"cell_type": "markdown",
"metadata": {},
"source": [
"| Data type\t | Description |\n",
"|---------------|-------------|\n",
"| ``bool_`` | Boolean (True or False) stored as a byte |\n",
"| ``int_`` | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| \n",
"| ``intc`` | Identical to C ``int`` (normally ``int32`` or ``int64``)| \n",
"| ``intp`` | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| \n",
"| ``int8`` | Byte (-128 to 127)| \n",
"| ``int16`` | Integer (-32768 to 32767)|\n",
"| ``int32`` | Integer (-2147483648 to 2147483647)|\n",
"| ``int64`` | Integer (-9223372036854775808 to 9223372036854775807)| \n",
"| ``uint8`` | Unsigned integer (0 to 255)| \n",
"| ``uint16`` | Unsigned integer (0 to 65535)| \n",
"| ``uint32`` | Unsigned integer (0 to 4294967295)| \n",
"| ``uint64`` | Unsigned integer (0 to 18446744073709551615)| \n",
"| ``float_`` | Shorthand for ``float64``.| \n",
"| ``float16`` | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| \n",
"| ``float32`` | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| \n",
"| ``float64`` | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| \n",
"| ``complex_`` | Shorthand for ``complex128``.| \n",
"| ``complex64`` | Complex number, represented by two 32-bit floats| \n",
"| ``complex128``| Complex number, represented by two 64-bit floats| "
"| Data type\t | Description |\n",
"|-------------|-------------|\n",
"| `bool_` | Boolean (True or False) stored as a byte |\n",
"| `int_` | Default integer type (same as C `long`; normally either `int64` or `int32`)| \n",
"| `intc` | Identical to C `int` (normally `int32` or `int64`)| \n",
"| `intp` | Integer used for indexing (same as C `ssize_t`; normally either `int32` or `int64`)| \n",
"| `int8` | Byte (128 to 127)| \n",
"| `int16` | Integer (32768 to 32767)|\n",
"| `int32` | Integer (2147483648 to 2147483647)|\n",
"| `int64` | Integer (9223372036854775808 to 9223372036854775807)| \n",
"| `uint8` | Unsigned integer (0 to 255)| \n",
"| `uint16` | Unsigned integer (0 to 65535)| \n",
"| `uint32` | Unsigned integer (0 to 4294967295)| \n",
"| `uint64` | Unsigned integer (0 to 18446744073709551615)| \n",
"| `float_` | Shorthand for `float64`| \n",
"| `float16` | Half-precision float: sign bit, 5 bits exponent, 10 bits mantissa| \n",
"| `float32` | Single-precision float: sign bit, 8 bits exponent, 23 bits mantissa| \n",
"| `float64` | Double-precision float: sign bit, 11 bits exponent, 52 bits mantissa| \n",
"| `complex_` | Shorthand for `complex128`| \n",
"| `complex64` | Complex number, represented by two 32-bit floats| \n",
"| `complex128`| Complex number, represented by two 64-bit floats| "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"More advanced type specification is possible, such as specifying big or little endian numbers; for more information, refer to the [NumPy documentation](http://numpy.org/).\n",
"More advanced type specification is possible, such as specifying big- or little-endian numbers; for more information, refer to the [NumPy documentation](http://numpy.org/).\n",
"NumPy also supports compound data types, which will be covered in [Structured Data: NumPy's Structured Arrays](02.09-Structured-Data-NumPy.ipynb)."
]
},
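As a quick sketch of working with these types, a dtype can be specified either as a string code or as the corresponding NumPy object, and the results are equivalent:

```python
import numpy as np

# These two spellings produce identical dtypes
a = np.zeros(4, dtype='int16')
b = np.zeros(4, dtype=np.int16)
print(a.dtype, b.dtype)
```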
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Computation on NumPy Arrays: Universal Functions"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"Up until now, we have been discussing some of the basic nuts and bolts of NumPy; in the next few sections, we will dive into the reasons that NumPy is so important in the Python data science world.\n",
"Namely, it provides an easy and flexible interface to optimized computation with arrays of data.\n",
"Up until now, we have been discussing some of the basic nuts and bolts of NumPy. In the next few chapters, we will dive into the reasons that NumPy is so important in the Python data science world: namely, because it provides an easy and flexible interface to optimize computation with arrays of data.\n",
"\n",
"Computation on NumPy arrays can be very fast, or it can be very slow.\n",
"The key to making it fast is to use *vectorized* operations, generally implemented through NumPy's *universal functions* (ufuncs).\n",
"This section motivates the need for NumPy's ufuncs, which can be used to make repeated calculations on array elements much more efficient.\n",
"The key to making it fast is to use vectorized operations, generally implemented through NumPy's *universal functions* (ufuncs).\n",
"This chapter motivates the need for NumPy's ufuncs, which can be used to make repeated calculations on array elements much more efficient.\n",
"It then introduces many of the most common and useful arithmetic ufuncs available in the NumPy package."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## The Slowness of Loops\n",
"\n",
"Python's default implementation (known as CPython) does some operations very slowly.\n",
"This is in part due to the dynamic, interpreted nature of the language: the fact that types are flexible, so that sequences of operations cannot be compiled down to efficient machine code as in languages like C and Fortran.\n",
"Recently there have been various attempts to address this weakness: well-known examples are the [PyPy](http://pypy.org/) project, a just-in-time compiled implementation of Python; the [Cython](http://cython.org) project, which converts Python code to compilable C code; and the [Numba](http://numba.pydata.org/) project, which converts snippets of Python code to fast LLVM bytecode.\n",
"This is partly due to the dynamic, interpreted nature of the language; types are flexible, so sequences of operations cannot be compiled down to efficient machine code as in languages like C and Fortran.\n",
"Recently there have been various attempts to address this weakness: well-known examples are the [PyPy project](http://pypy.org/), a just-in-time compiled implementation of Python; the [Cython project](http://cython.org), which converts Python code to compilable C code; and the [Numba project](http://numba.pydata.org/), which converts snippets of Python code to fast LLVM bytecode.\n",
"Each of these has its strengths and weaknesses, but it is safe to say that none of the three approaches has yet surpassed the reach and popularity of the standard CPython engine.\n",
"\n",
"The relative sluggishness of Python generally manifests itself in situations where many small operations are being repeated for instance looping over arrays to operate on each element.\n",
"The relative sluggishness of Python generally manifests itself in situations where many small operations are being repeated; for instance, looping over arrays to operate on each element.\n",
"For example, imagine we have an array of values and we'd like to compute the reciprocal of each.\n",
"A straightforward approach might look like this:"
]
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0.16666667, 1. , 0.25 , 0.25 , 0.125 ])"
"array([0.11111111, 0.25 , 1. , 0.33333333, 0.125 ])"
]
},
"execution_count": 1,
],
"source": [
"import numpy as np\n",
"np.random.seed(0)\n",
"rng = np.random.default_rng(seed=1701)\n",
"\n",
"def compute_reciprocals(values):\n",
" output = np.empty(len(values))\n",
" for i in range(len(values)):\n",
" output[i] = 1.0 / values[i]\n",
" return output\n",
" \n",
"values = np.random.randint(1, 10, size=5)\n",
"values = rng.integers(1, 10, size=5)\n",
"compute_reciprocals(values)"
]
},
"metadata": {},
"source": [
"This implementation probably feels fairly natural to someone from, say, a C or Java background.\n",
"But if we measure the execution time of this code for a large input, we see that this operation is very slow, perhaps surprisingly so!\n",
"We'll benchmark this with IPython's ``%timeit`` magic (discussed in [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb)):"
"But if we measure the execution time of this code for a large input, we see that this operation is very slowperhaps surprisingly so!\n",
"We'll benchmark this with IPython's `%timeit` magic (discussed in [Profiling and Timing Code](01.07-Timing-and-Profiling.ipynb)):"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1 loop, best of 3: 2.91 s per loop\n"
"2.61 s ± 192 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n"
]
}
],
"source": [
"big_array = np.random.randint(1, 100, size=1000000)\n",
"big_array = rng.integers(1, 100, size=1000000)\n",
"%timeit compute_reciprocals(big_array)"
]
},
"metadata": {},
"source": [
"It takes several seconds to compute these million operations and to store the result!\n",
"When even cell phones have processing speeds measured in Giga-FLOPS (i.e., billions of numerical operations per second), this seems almost absurdly slow.\n",
"It turns out that the bottleneck here is not the operations themselves, but the type-checking and function dispatches that CPython must do at each cycle of the loop.\n",
"When even cell phones have processing speeds measured in gigaflops (i.e., billions of numerical operations per second), this seems almost absurdly slow.\n",
"It turns out that the bottleneck here is not the operations themselves, but the type checking and function dispatches that CPython must do at each cycle of the loop.\n",
"Each time the reciprocal is computed, Python first examines the object's type and does a dynamic lookup of the correct function to use for that type.\n",
"If we were working in compiled code instead, this type specification would be known before the code executes and the result could be computed much more efficiently."
"If we were working in compiled code instead, this type specification would be known before the code executed and the result could be computed much more efficiently."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introducing UFuncs\n",
"## Introducing Ufuncs\n",
"\n",
"For many types of operations, NumPy provides a convenient interface into just this kind of statically typed, compiled routine. This is known as a *vectorized* operation.\n",
"This can be accomplished by simply performing an operation on the array, which will then be applied to each element.\n",
"For simple operations like the element-wise division here, vectorization is as simple as using Python arithmetic operators directly on the array object.\n",
"This vectorized approach is designed to push the loop into the compiled layer that underlies NumPy, leading to much faster execution.\n",
"\n",
"Compare the results of the following two:"
"Compare the results of the following two operations:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0.16666667 1. 0.25 0.25 0.125 ]\n",
"[ 0.16666667 1. 0.25 0.25 0.125 ]\n"
"[0.11111111 0.25 1. 0.33333333 0.125 ]\n",
"[0.11111111 0.25 1. 0.33333333 0.125 ]\n"
]
}
],
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"100 loops, best of 3: 4.6 ms per loop\n"
"2.54 ms ± 383 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n"
]
}
],
"cell_type": "markdown",
"metadata": {},
"source": [
"Vectorized operations in NumPy are implemented via *ufuncs*, whose main purpose is to quickly execute repeated operations on values in NumPy arrays.\n",
"Ufuncs are extremely flexible before we saw an operation between a scalar and an array, but we can also operate between two arrays:"
"Vectorized operations in NumPy are implemented via ufuncs, whose main purpose is to quickly execute repeated operations on values in NumPy arrays.\n",
"Ufuncs are extremely flexiblebefore we saw an operation between a scalar and an array, but we can also operate between two arrays:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 0. , 0.5 , 0.66666667, 0.75 , 0.8 ])"
"array([0. , 0.5 , 0.66666667, 0.75 , 0.8 ])"
]
},
"execution_count": 5,
"cell_type": "markdown",
"metadata": {},
"source": [
"And ufunc operations are not limited to one-dimensional arraysthey can also act on multi-dimensional arrays as well:"
"And ufunc operations are not limited to one-dimensional arrays. They can act on multidimensional arrays as well:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Computations using vectorization through ufuncs are nearly always more efficient than their counterpart implemented using Python loops, especially as the arrays grow in size.\n",
"Any time you see such a loop in a Python script, you should consider whether it can be replaced with a vectorized expression."
"Computations using vectorization through ufuncs are nearly always more efficient than their counterparts implemented using Python loops, especially as the arrays grow in size.\n",
"Any time you see such a loop in a NumPy script, you should consider whether it can be replaced with a vectorized expression."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exploring NumPy's UFuncs\n",
"## Exploring NumPy's Ufuncs\n",
"\n",
"Ufuncs exist in two flavors: *unary ufuncs*, which operate on a single input, and *binary ufuncs*, which operate on two inputs.\n",
"We'll see examples of both these types of functions here."
@ -274,7 +271,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Array arithmetic\n",
"### Array Arithmetic\n",
"\n",
"NumPy's ufuncs feel very natural to use because they make use of Python's native arithmetic operators.\n",
"The standard addition, subtraction, multiplication, and division can all be used:"
@ -284,29 +281,32 @@
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x = [0 1 2 3]\n",
"x + 5 = [5 6 7 8]\n",
"x - 5 = [-5 -4 -3 -2]\n",
"x * 2 = [0 2 4 6]\n",
"x / 2 = [ 0. 0.5 1. 1.5]\n",
"x = [0 1 2 3]\n",
"x + 5 = [5 6 7 8]\n",
"x - 5 = [-5 -4 -3 -2]\n",
"x * 2 = [0 2 4 6]\n",
"x / 2 = [0. 0.5 1. 1.5]\n",
"x // 2 = [0 0 1 1]\n"
]
}
],
"source": [
"x = np.arange(4)\n",
"print(\"x =\", x)\n",
"print(\"x + 5 =\", x + 5)\n",
"print(\"x - 5 =\", x - 5)\n",
"print(\"x * 2 =\", x * 2)\n",
"print(\"x / 2 =\", x / 2)\n",
"print(\"x =\", x)\n",
"print(\"x + 5 =\", x + 5)\n",
"print(\"x - 5 =\", x - 5)\n",
"print(\"x * 2 =\", x * 2)\n",
"print(\"x / 2 =\", x / 2)\n",
"print(\"x // 2 =\", x // 2) # floor division"
]
},
@ -314,14 +314,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"There is also a unary ufunc for negation, and a ``**`` operator for exponentiation, and a ``%`` operator for modulus:"
"There is also a unary ufunc for negation, a `**` operator for exponentiation, and a `%` operator for modulus:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -351,7 +354,10 @@
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -373,14 +379,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Each of these arithmetic operations are simply convenient wrappers around specific functions built into NumPy; for example, the ``+`` operator is a wrapper for the ``add`` function:"
"All of these arithmetic operations are simply convenient wrappers around specific ufuncs built into NumPy. For example, the `+` operator is a wrapper for the `add` ufunc:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -404,25 +413,25 @@
"source": [
"The following table lists the arithmetic operators implemented in NumPy:\n",
"\n",
"| Operator\t | Equivalent ufunc | Description |\n",
"|---------------|---------------------|---------------------------------------|\n",
"|``+`` |``np.add`` |Addition (e.g., ``1 + 1 = 2``) |\n",
"|``-`` |``np.subtract`` |Subtraction (e.g., ``3 - 2 = 1``) |\n",
"|``-`` |``np.negative`` |Unary negation (e.g., ``-2``) |\n",
"|``*`` |``np.multiply`` |Multiplication (e.g., ``2 * 3 = 6``) |\n",
"|``/`` |``np.divide`` |Division (e.g., ``3 / 2 = 1.5``) |\n",
"|``//`` |``np.floor_divide`` |Floor division (e.g., ``3 // 2 = 1``) |\n",
"|``**`` |``np.power`` |Exponentiation (e.g., ``2 ** 3 = 8``) |\n",
"|``%`` |``np.mod`` |Modulus/remainder (e.g., ``9 % 4 = 1``)|\n",
"| Operator | Equivalent ufunc | Description |\n",
"|-------------|-------------------|-------------------------------------|\n",
"|`+` |`np.add` |Addition (e.g., `1 + 1 = 2`) |\n",
"|`-` |`np.subtract` |Subtraction (e.g., `3 - 2 = 1`) |\n",
"|`-` |`np.negative` |Unary negation (e.g., `-2`) |\n",
"|`*` |`np.multiply` |Multiplication (e.g., `2 * 3 = 6`) |\n",
"|`/` |`np.divide` |Division (e.g., `3 / 2 = 1.5`) |\n",
"|`//` |`np.floor_divide` |Floor division (e.g., `3 // 2 = 1`) |\n",
"|`**` |`np.power` |Exponentiation (e.g., `2 ** 3 = 8`) |\n",
"|`%` |`np.mod` |Modulus/remainder (e.g., `9 % 4 = 1`)|\n",
"\n",
"Additionally there are Boolean/bitwise operators; we will explore these in [Comparisons, Masks, and Boolean Logic](02.06-Boolean-Arrays-and-Masks.ipynb)."
"Additionally, there are Boolean/bitwise operators; we will explore these in [Comparisons, Masks, and Boolean Logic](02.06-Boolean-Arrays-and-Masks.ipynb)."
]
},
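As a quick, hedged preview (an editorial sketch, not part of the original notebook), the Boolean/bitwise operators mentioned above follow the same wrapper pattern as the arithmetic operators:

```python
import numpy as np

x = np.arange(4)

# & and | are wrappers for the bitwise_and and bitwise_or ufuncs,
# just as + wraps np.add
print(np.array_equal(x & 1, np.bitwise_and(x, 1)))  # True
print(x & 1)  # [0 1 0 1]
print(x | 2)  # [2 3 2 3]
```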
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Absolute value\n",
"### Absolute Value\n",
"\n",
"Just as NumPy understands Python's built-in arithmetic operators, it also understands Python's built-in absolute value function:"
]
@ -431,7 +440,10 @@
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -454,14 +466,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The corresponding NumPy ufunc is ``np.absolute``, which is also available under the alias ``np.abs``:"
"The corresponding NumPy ufunc is `np.absolute`, which is also available under the alias `np.abs`:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -483,7 +498,10 @@
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -505,20 +523,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"This ufunc can also handle complex data, in which the absolute value returns the magnitude:"
"This ufunc can also handle complex data, in which case it returns the magnitude:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array([ 5., 5., 2., 1.])"
"array([5., 5., 2., 1.])"
]
},
"execution_count": 14,
@ -535,7 +556,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Trigonometric functions\n",
"### Trigonometric Functions\n",
"\n",
"NumPy provides a large number of useful ufuncs, and some of the most useful for the data scientist are the trigonometric functions.\n",
"We'll start by defining an array of angles:"
@ -545,7 +566,10 @@
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@ -563,17 +587,20 @@
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"theta = [ 0. 1.57079633 3.14159265]\n",
"sin(theta) = [ 0.00000000e+00 1.00000000e+00 1.22464680e-16]\n",
"cos(theta) = [ 1.00000000e+00 6.12323400e-17 -1.00000000e+00]\n",
"tan(theta) = [ 0.00000000e+00 1.63312394e+16 -1.22464680e-16]\n"
"theta = [0. 1.57079633 3.14159265]\n",
"sin(theta) = [0.0000000e+00 1.0000000e+00 1.2246468e-16]\n",
"cos(theta) = [ 1.000000e+00 6.123234e-17 -1.000000e+00]\n",
"tan(theta) = [ 0.00000000e+00 1.63312394e+16 -1.22464680e-16]\n"
]
}
],
@ -596,7 +623,10 @@
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -605,7 +635,7 @@
"text": [
"x = [-1, 0, 1]\n",
"arcsin(x) = [-1.57079633 0. 1.57079633]\n",
"arccos(x) = [ 3.14159265 1.57079633 0. ]\n",
"arccos(x) = [3.14159265 1.57079633 0. ]\n",
"arctan(x) = [-0.78539816 0. 0.78539816]\n"
]
}
@ -622,35 +652,38 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exponents and logarithms\n",
"### Exponents and Logarithms\n",
"\n",
"Another common type of operation available in a NumPy ufunc are the exponentials:"
"Other common operations available in NumPy ufuncs are the exponentials:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"x = [1, 2, 3]\n",
"e^x = [ 2.71828183 7.3890561 20.08553692]\n",
"2^x = [ 2. 4. 8.]\n",
"3^x = [ 3 9 27]\n"
"x = [1, 2, 3]\n",
"e^x = [ 2.71828183 7.3890561 20.08553692]\n",
"2^x = [2. 4. 8.]\n",
"3^x = [ 3. 9. 27.]\n"
]
}
],
"source": [
"x = [1, 2, 3]\n",
"print(\"x =\", x)\n",
"print(\"e^x =\", np.exp(x))\n",
"print(\"2^x =\", np.exp2(x))\n",
"print(\"3^x =\", np.power(3, x))"
"print(\"x =\", x)\n",
"print(\"e^x =\", np.exp(x))\n",
"print(\"2^x =\", np.exp2(x))\n",
"print(\"3^x =\", np.power(3., x))"
]
},
{
@ -658,14 +691,17 @@
"metadata": {},
"source": [
"The inverse of the exponentials, the logarithms, are also available.\n",
"The basic ``np.log`` gives the natural logarithm; if you prefer to compute the base-2 logarithm or the base-10 logarithm, these are available as well:"
"The basic `np.log` gives the natural logarithm; if you prefer to compute the base-2 logarithm or the base-10 logarithm, these are available as well:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -673,9 +709,9 @@
"output_type": "stream",
"text": [
"x = [1, 2, 4, 10]\n",
"ln(x) = [ 0. 0.69314718 1.38629436 2.30258509]\n",
"log2(x) = [ 0. 1. 2. 3.32192809]\n",
"log10(x) = [ 0. 0.30103 0.60205999 1. ]\n"
"ln(x) = [0. 0.69314718 1.38629436 2.30258509]\n",
"log2(x) = [0. 1. 2. 3.32192809]\n",
"log10(x) = [0. 0.30103 0.60205999 1. ]\n"
]
}
],
@ -698,15 +734,18 @@
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"exp(x) - 1 = [ 0. 0.0010005 0.01005017 0.10517092]\n",
"log(1 + x) = [ 0. 0.0009995 0.00995033 0.09531018]\n"
"exp(x) - 1 = [0. 0.0010005 0.01005017 0.10517092]\n",
"log(1 + x) = [0. 0.0009995 0.00995033 0.09531018]\n"
]
}
],
@ -720,20 +759,20 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When ``x`` is very small, these functions give more precise values than if the raw ``np.log`` or ``np.exp`` were to be used."
"When `x` is very small, these functions give more precise values than if the raw `np.log` or `np.exp` were to be used."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Specialized ufuncs\n",
"### Specialized Ufuncs\n",
"\n",
"NumPy has many more ufuncs available, including hyperbolic trig functions, bitwise arithmetic, comparison operators, conversions from radians to degrees, rounding and remainders, and much more.\n",
"NumPy has many more ufuncs available, including for hyperbolic trigonometry, bitwise arithmetic, comparison operations, conversions from radians to degrees, rounding and remainders, and much more.\n",
"A look through the NumPy documentation reveals a lot of interesting functionality.\n",
"\n",
"Another excellent source for more specialized and obscure ufuncs is the submodule ``scipy.special``.\n",
"If you want to compute some obscure mathematical function on your data, chances are it is implemented in ``scipy.special``.\n",
"Another excellent source for more specialized ufuncs is the submodule `scipy.special`.\n",
"If you want to compute some obscure mathematical function on your data, chances are it is implemented in `scipy.special`.\n",
"There are far too many functions to list them all, but the following snippet shows a couple that might come up in a statistics context:"
]
},
@ -741,7 +780,10 @@
"cell_type": "code",
"execution_count": 21,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@ -752,16 +794,19 @@
"cell_type": "code",
"execution_count": 22,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"gamma(x) = [ 1.00000000e+00 2.40000000e+01 3.62880000e+05]\n",
"ln|gamma(x)| = [ 0. 3.17805383 12.80182748]\n",
"beta(x, 2) = [ 0.5 0.03333333 0.00909091]\n"
"gamma(x) = [1.0000e+00 2.4000e+01 3.6288e+05]\n",
"ln|gamma(x)| = [ 0. 3.17805383 12.80182748]\n",
"beta(x, 2) = [0.5 0.03333333 0.00909091]\n"
]
}
],
@ -777,21 +822,24 @@
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"erf(x) = [ 0. 0.32862676 0.67780119 0.84270079]\n",
"erfc(x) = [ 1. 0.67137324 0.32219881 0.15729921]\n",
"erfinv(x) = [ 0. 0.27246271 0.73286908 inf]\n"
"erf(x) = [0. 0.32862676 0.67780119 0.84270079]\n",
"erfc(x) = [1. 0.67137324 0.32219881 0.15729921]\n",
"erfinv(x) = [0. 0.27246271 0.73286908 inf]\n"
]
}
],
"source": [
"# Error function (integral of Gaussian)\n",
"# Error function (integral of Gaussian),\n",
"# its complement, and its inverse\n",
"x = np.array([0, 0.3, 0.7, 1.0])\n",
"print(\"erf(x) =\", special.erf(x))\n",
@ -803,7 +851,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"There are many, many more ufuncs available in both NumPy and ``scipy.special``.\n",
"There are many, many more ufuncs available in both NumPy and `scipy.special`.\n",
"Because the documentation of these packages is available online, a web search along the lines of \"gamma function python\" will generally find the relevant information."
]
},
@ -814,32 +862,34 @@
"## Advanced Ufunc Features\n",
"\n",
"Many NumPy users make use of ufuncs without ever learning their full set of features.\n",
"We'll outline a few specialized features of ufuncs here."
"I'll outline a few specialized features of ufuncs here."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Specifying output\n",
"### Specifying Output\n",
"\n",
"For large calculations, it is sometimes useful to be able to specify the array where the result of the calculation will be stored.\n",
"Rather than creating a temporary array, this can be used to write computation results directly to the memory location where you'd like them to be.\n",
"For all ufuncs, this can be done using the ``out`` argument of the function:"
"For all ufuncs, this can be done using the `out` argument of the function:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 0. 10. 20. 30. 40.]\n"
"[ 0. 10. 20. 30. 40.]\n"
]
}
],
@ -861,14 +911,17 @@
"cell_type": "code",
"execution_count": 25,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1. 0. 2. 0. 4. 0. 8. 0. 16. 0.]\n"
"[ 1. 0. 2. 0. 4. 0. 8. 0. 16. 0.]\n"
]
}
],
@ -882,28 +935,31 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If we had instead written ``y[::2] = 2 ** x``, this would have resulted in the creation of a temporary array to hold the results of ``2 ** x``, followed by a second operation copying those values into the ``y`` array.\n",
"This doesn't make much of a difference for such a small computation, but for very large arrays the memory savings from careful use of the ``out`` argument can be significant."
"If we had instead written `y[::2] = 2 ** x`, this would have resulted in the creation of a temporary array to hold the results of `2 ** x`, followed by a second operation copying those values into the `y` array.\n",
"This doesn't make much of a difference for such a small computation, but for very large arrays the memory savings from careful use of the `out` argument can be significant."
]
},
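To make the no-copy claim concrete, here is a small editorial sketch (not from the original notebook) using `np.shares_memory` to confirm that the slice is a view into the original array:

```python
import numpy as np

y = np.zeros(10)
view = y[::2]

# The slice shares y's memory, so a ufunc writing to out=view
# fills y directly, with no temporary array in between
print(np.shares_memory(y, view))  # True
np.add(np.arange(5), 1, out=view)
print(y)  # [1. 0. 2. 0. 3. 0. 4. 0. 5. 0.]
```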
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Aggregates\n",
"### Aggregations\n",
"\n",
"For binary ufuncs, there are some interesting aggregates that can be computed directly from the object.\n",
"For example, if we'd like to *reduce* an array with a particular operation, we can use the ``reduce`` method of any ufunc.\n",
"For binary ufuncs, aggregations can be computed directly from the object.\n",
"For example, if we'd like to *reduce* an array with a particular operation, we can use the `reduce` method of any ufunc.\n",
"A reduce repeatedly applies a given operation to the elements of an array until only a single result remains.\n",
"\n",
"For example, calling ``reduce`` on the ``add`` ufunc returns the sum of all elements in the array:"
"For example, calling `reduce` on the `add` ufunc returns the sum of all elements in the array:"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -926,14 +982,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Similarly, calling ``reduce`` on the ``multiply`` ufunc results in the product of all array elements:"
"Similarly, calling `reduce` on the `multiply` ufunc results in the product of all array elements:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -955,14 +1014,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"If we'd like to store all the intermediate results of the computation, we can instead use ``accumulate``:"
"If we'd like to store all the intermediate results of the computation, we can instead use `accumulate`:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -984,7 +1046,10 @@
"cell_type": "code",
"execution_count": 29,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -1006,16 +1071,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that for these particular cases, there are dedicated NumPy functions to compute the results (``np.sum``, ``np.prod``, ``np.cumsum``, ``np.cumprod``), which we'll explore in [Aggregations: Min, Max, and Everything In Between](02.04-Computation-on-arrays-aggregates.ipynb)."
"Note that for these particular cases, there are dedicated NumPy functions to compute the results (`np.sum`, `np.prod`, `np.cumsum`, `np.cumprod`), which we'll explore in [Aggregations: Min, Max, and Everything In Between](02.04-Computation-on-arrays-aggregates.ipynb)."
]
},
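As a quick editorial sanity check (assuming nothing beyond the methods just shown), the ufunc methods and the dedicated aggregation functions agree:

```python
import numpy as np

x = np.arange(1, 6)

# Each ufunc method has a dedicated NumPy counterpart
print(np.add.reduce(x) == np.sum(x))        # True (both are 15)
print(np.multiply.reduce(x) == np.prod(x))  # True (both are 120)
print(np.array_equal(np.add.accumulate(x), np.cumsum(x)))  # True
```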
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Outer products\n",
"### Outer Products\n",
"\n",
"Finally, any ufunc can compute the output of all pairs of two different inputs using the ``outer`` method.\n",
"Finally, any ufunc can compute the output of all pairs of two different inputs using the `outer` method.\n",
"This allows you, in one line, to do things like create a multiplication table:"
]
},
@ -1024,6 +1089,9 @@
"execution_count": 30,
"metadata": {
"collapsed": false,
"jupyter": {
"outputs_hidden": false
},
"scrolled": true
},
"outputs": [
@ -1051,10 +1119,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``ufunc.at`` and ``ufunc.reduceat`` methods, which we'll explore in [Fancy Indexing](02.07-Fancy-Indexing.ipynb), are very helpful as well.\n",
"The `ufunc.at` and `ufunc.reduceat` methods are useful as well, and we will explore them in [Fancy Indexing](02.07-Fancy-Indexing.ipynb).\n",
"\n",
"Another extremely useful feature of ufuncs is the ability to operate between arrays of different sizes and shapes, a set of operations known as *broadcasting*.\n",
"This subject is important enough that we will devote a whole section to it (see [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb))."
"We will also encounter the ability of ufuncs to operate between arrays of different shapes and sizes, a set of operations known as *broadcasting*.\n",
"This subject is important enough that we will devote a whole chapter to it (see [Computation on Arrays: Broadcasting](02.05-Computation-on-arrays-broadcasting.ipynb))."
]
},
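As a hedged preview of `ufunc.at` (an editorial sketch; the full treatment comes in the chapter referenced above), it applies an operation in place at specified indices, handling repeated indices correctly:

```python
import numpy as np

x = np.zeros(5)
i = np.array([0, 1, 1, 3])

# np.add.at adds 1 once per index occurrence, even for repeats;
# plain x[i] += 1 would count the repeated index 1 only once
np.add.at(x, i, 1)
print(x)  # [1. 2. 0. 1. 0.]
```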
{
@ -1070,22 +1138,15 @@
"source": [
"More information on universal functions (including the full list of available functions) can be found on the [NumPy](http://www.numpy.org) and [SciPy](http://www.scipy.org) documentation websites.\n",
"\n",
"Recall that you can also access information directly from within IPython by importing the packages and using IPython's tab-completion and help (``?``) functionality, as described in [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [The Basics of NumPy Arrays](02.02-The-Basics-Of-NumPy-Arrays.ipynb) | [Contents](Index.ipynb) | [Aggregations: Min, Max, and Everything In Between](02.04-Computation-on-arrays-aggregates.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.03-Computation-on-arrays-ufuncs.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"Recall that you can also access information directly from within IPython by importing the packages and using IPython's tab completion and help (`?`) functionality, as described in [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
@ -1101,9 +1162,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

File diff suppressed because one or more lines are too long

View File

@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Sorting Arrays](02.08-Sorting.ipynb) | [Contents](Index.ipynb) | [Data Manipulation with Pandas](03.00-Introduction-to-Pandas.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.09-Structured-Data-NumPy.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,14 +11,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"While often our data can be well represented by a homogeneous array of values, sometimes this is not the case. This section demonstrates the use of NumPy's *structured arrays* and *record arrays*, which provide efficient storage for compound, heterogeneous data. While the patterns shown here are useful for simple operations, scenarios like this often lend themselves to the use of Pandas ``Dataframe``s, which we'll explore in [Chapter 3](03.00-Introduction-to-Pandas.ipynb)."
"While often our data can be well represented by a homogeneous array of values, sometimes this is not the case. This chapter demonstrates the use of NumPy's *structured arrays* and *record arrays*, which provide efficient storage for compound, heterogeneous data. While the patterns shown here are useful for simple operations, scenarios like this often lend themselves to the use of Pandas `DataFrame`s, which we'll explore in [Part 3](03.00-Introduction-to-Pandas.ipynb)."
"While often our data can be well represented by a homogeneous array of values, sometimes this is not the case. This chapter demonstrates the use of NumPy's *structured arrays* and *record arrays*, which provide efficient storage for compound, heterogeneous data. While the patterns shown here are useful for simple operations, scenarios like this often lend themselves to the use of Pandas `DataFrame`s, which we'll explore in [Part 3](03.00-Introduction-to-Pandas.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
"tags": []
},
"outputs": [],
"source": [
@ -59,7 +37,10 @@
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@ -72,8 +53,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"But this is a bit clumsy. There's nothing here that tells us that the three arrays are related; it would be more natural if we could use a single structure to store all of this data.\n",
"NumPy can handle this through structured arrays, which are arrays with compound data types.\n",
"But this is a bit clumsy. There's nothing here that tells us that the three arrays are related; NumPy's structured arrays allow us to do this more naturally by using a single structure to store all of this data.\n",
"\n",
"Recall that previously we created a simple array using an expression like this:"
]
@ -82,7 +62,10 @@
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [],
"source": [
@ -100,7 +83,10 @@
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -122,7 +108,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Here ``'U10'`` translates to \"Unicode string of maximum length 10,\" ``'i4'`` translates to \"4-byte (i.e., 32 bit) integer,\" and ``'f8'`` translates to \"8-byte (i.e., 64 bit) float.\"\n",
"Here `'U10'` translates to \"Unicode string of maximum length 10,\" `'i4'` translates to \"4-byte (i.e., 32-bit) integer,\" and `'f8'` translates to \"8-byte (i.e., 64-bit) float.\"\n",
"We'll discuss other options for these type codes in the following section.\n",
"\n",
"Now that we've created an empty container array, we can fill the array with our lists of values:"
@ -132,14 +118,17 @@
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[('Alice', 25, 55.0) ('Bob', 45, 85.5) ('Cathy', 37, 68.0)\n",
"[('Alice', 25, 55. ) ('Bob', 45, 85.5) ('Cathy', 37, 68. )\n",
" ('Doug', 19, 61.5)]\n"
]
}
@ -155,23 +144,25 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"As we had hoped, the data is now arranged together in one convenient block of memory.\n",
"As we had hoped, the data is now conveniently arranged in one structured array.\n",
"\n",
"The handy thing with structured arrays is that you can now refer to values either by index or by name:"
"The handy thing with structured arrays is that we can now refer to values either by index or by name:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array(['Alice', 'Bob', 'Cathy', 'Doug'], \n",
" dtype='<U10')"
"array(['Alice', 'Bob', 'Cathy', 'Doug'], dtype='<U10')"
]
},
"execution_count": 6,
@ -188,13 +179,16 @@
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"('Alice', 25, 55.0)"
"('Alice', 25, 55.)"
]
},
"execution_count": 7,
@ -211,7 +205,10 @@
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -234,21 +231,23 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Using Boolean masking, this even allows you to do some more sophisticated operations such as filtering on age:"
"Using Boolean masking, we can even do some more sophisticated operations, such as filtering on age:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"array(['Alice', 'Doug'], \n",
" dtype='<U10')"
"array(['Alice', 'Doug'], dtype='<U10')"
]
},
"execution_count": 9,
@ -265,15 +264,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that if you'd like to do any operations that are any more complicated than these, you should probably consider the Pandas package, covered in the next chapter.\n",
"As we'll see, Pandas provides a ``Dataframe`` object, which is a structure built on NumPy arrays that offers a variety of useful data manipulation functionality similar to what we've shown here, as well as much, much more."
"If you'd like to do any operations more complicated than these, you should probably consider the Pandas package, covered in [Part 3](03.00-Introduction-to-Pandas.ipynb).\n",
"As you'll see, Pandas provides a `DataFrame` object, which is a structure built on NumPy arrays that offers a variety of useful data manipulation functionality similar to what you've seen here, as well as much, much more."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Creating Structured Arrays\n",
"## Exploring Structured Array Creation\n",
"\n",
"Structured array data types can be specified in a number of ways.\n",
"Earlier, we saw the dictionary method:"
@ -283,7 +282,10 @@
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -306,14 +308,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"For clarity, numerical types can be specified using Python types or NumPy ``dtype``s instead:"
"For clarity, numerical types can be specified using Python types or NumPy `dtype`s instead:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -343,7 +348,10 @@
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -372,7 +380,10 @@
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -394,21 +405,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The shortened string format codes may seem confusing, but they are built on simple principles.\n",
"The first (optional) character is ``<`` or ``>``, which means \"little endian\" or \"big endian,\" respectively, and specifies the ordering convention for significant bits.\n",
"The shortened string format codes may not be immediately intuitive, but they are built on simple principles.\n",
"The first (optional) character, `<` or `>`, means \"little endian\" or \"big endian,\" respectively, and specifies the ordering convention for significant bits.\n",
"The next character specifies the type of data: characters, bytes, ints, floating points, and so on (see the table below).\n",
"The last character or characters represents the size of the object in bytes.\n",
"The last character or characters represent the size of the object in bytes.\n",
"\n",
"| Character | Description | Example |\n",
"| --------- | ----------- | ------- | \n",
"| ``'b'`` | Byte | ``np.dtype('b')`` |\n",
"| ``'i'`` | Signed integer | ``np.dtype('i4') == np.int32`` |\n",
"| ``'u'`` | Unsigned integer | ``np.dtype('u1') == np.uint8`` |\n",
"| ``'f'`` | Floating point | ``np.dtype('f8') == np.int64`` |\n",
"| ``'c'`` | Complex floating point| ``np.dtype('c16') == np.complex128``|\n",
"| ``'S'``, ``'a'`` | String | ``np.dtype('S5')`` |\n",
"| ``'U'`` | Unicode string | ``np.dtype('U') == np.str_`` |\n",
"| ``'V'`` | Raw data (void) | ``np.dtype('V') == np.void`` |"
"| Character | Description | Example |\n",
"| --------- | ----------- | ------- | \n",
"| `'b'` | Byte | `np.dtype('b')` |\n",
"| `'i'` | Signed integer | `np.dtype('i4') == np.int32` |\n",
"| `'u'` | Unsigned integer | `np.dtype('u1') == np.uint8` |\n",
    "| `'f'` | Floating point | `np.dtype('f8') == np.float64` |\n",
    "| `'c'` | Complex floating point | `np.dtype('c16') == np.complex128` |\n",
"| `'S'`, `'a'` | String | `np.dtype('S5')` |\n",
"| `'U'` | Unicode string | `np.dtype('U') == np.str_` |\n",
"| `'V'` | Raw data (void) | `np.dtype('V') == np.void` |"
]
},
{
@ -419,24 +430,27 @@
"\n",
"It is possible to define even more advanced compound types.\n",
"For example, you can create a type where each element contains an array or matrix of values.\n",
"Here, we'll create a data type with a ``mat`` component consisting of a $3\\times 3$ floating-point matrix:"
"Here, we'll create a data type with a `mat` component consisting of a $3\\times 3$ floating-point matrix:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])\n",
"[[ 0. 0. 0.]\n",
" [ 0. 0. 0.]\n",
" [ 0. 0. 0.]]\n"
"(0, [[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]])\n",
"[[0. 0. 0.]\n",
" [0. 0. 0.]\n",
" [0. 0. 0.]]\n"
]
}
],
@ -451,27 +465,30 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now each element in the ``X`` array consists of an ``id`` and a $3\\times 3$ matrix.\n",
"Now each element in the `X` array consists of an `id` and a $3\\times 3$ matrix.\n",
"Why would you use this rather than a simple multidimensional array, or perhaps a Python dictionary?\n",
"The reason is that this NumPy ``dtype`` directly maps onto a C structure definition, so the buffer containing the array content can be accessed directly within an appropriately written C program.\n",
"If you find yourself writing a Python interface to a legacy C or Fortran library that manipulates structured data, you'll probably find structured arrays quite useful!"
"One reason is that this NumPy `dtype` directly maps onto a C structure definition, so the buffer containing the array content can be accessed directly within an appropriately written C program.\n",
"If you find yourself writing a Python interface to a legacy C or Fortran library that manipulates structured data, structured arrays can provide a powerful interface."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## RecordArrays: Structured Arrays with a Twist\n",
"## Record Arrays: Structured Arrays with a Twist\n",
"\n",
"NumPy also provides the ``np.recarray`` class, which is almost identical to the structured arrays just described, but with one additional feature: fields can be accessed as attributes rather than as dictionary keys.\n",
"Recall that we previously accessed the ages by writing:"
"NumPy also provides record arrays (instances of the `np.recarray` class), which are almost identical to the structured arrays just described, but with one additional feature: fields can be accessed as attributes rather than as dictionary keys.\n",
"Recall that we previously accessed the ages in our sample dataset by writing:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -500,7 +517,10 @@
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
@ -523,23 +543,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The downside is that for record arrays, there is some extra overhead involved in accessing the fields, even when using the same syntax. We can see this here:"
"The downside is that for record arrays, there is some extra overhead involved in accessing the fields, even when using the same syntax:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1000000 loops, best of 3: 241 ns per loop\n",
"100000 loops, best of 3: 4.61 µs per loop\n",
"100000 loops, best of 3: 7.27 µs per loop\n"
"121 ns ± 1.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n",
"2.41 µs ± 15.7 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n",
"3.98 µs ± 20.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
@ -553,7 +571,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Whether the more convenient notation is worth the additional overhead will depend on your own application."
"Whether the more convenient notation is worth the (slight) overhead will depend on your own application."
]
},
{
@ -562,26 +580,19 @@
"source": [
"## On to Pandas\n",
"\n",
"This section on structured and record arrays is purposely at the end of this chapter, because it leads so well into the next package we will cover: Pandas.\n",
"Structured arrays like the ones discussed here are good to know about for certain situations, especially in case you're using NumPy arrays to map onto binary data formats in C, Fortran, or another language.\n",
"For day-to-day use of structured data, the Pandas package is a much better choice, and we'll dive into a full discussion of it in the chapter that follows."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Sorting Arrays](02.08-Sorting.ipynb) | [Contents](Index.ipynb) | [Data Manipulation with Pandas](03.00-Introduction-to-Pandas.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.09-Structured-Data-NumPy.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"This chapter on structured and record arrays is purposely located at the end of this part of the book, because it leads so well into the next package we will cover: Pandas.\n",
"Structured arrays can come in handy in certain situations, like when you're using NumPy arrays to map onto binary data formats in C, Fortran, or another language.\n",
"But for day-to-day use of structured data, the Pandas package is a much better choice; we'll explore it in depth in the chapters that follow."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -595,9 +606,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

View File

@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Structured Data: NumPy's Structured Arrays](02.09-Structured-Data-NumPy.ipynb) | [Contents](Index.ipynb) | [Introducing Pandas Objects](03.01-Introducing-Pandas-Objects.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.00-Introduction-to-Pandas.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,17 +11,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the previous chapter, we dove into detail on NumPy and its ``ndarray`` object, which provides efficient storage and manipulation of dense typed arrays in Python.\n",
"Here we'll build on this knowledge by looking in detail at the data structures provided by the Pandas library.\n",
"Pandas is a newer package built on top of NumPy, and provides an efficient implementation of a ``DataFrame``.\n",
"``DataFrame``s are essentially multidimensional arrays with attached row and column labels, and often with heterogeneous types and/or missing data.\n",
"In [Part 2](02.00-Introduction-to-NumPy.ipynb), we dove into detail on NumPy and its `ndarray` object, which enables efficient storage and manipulation of dense typed arrays in Python.\n",
"Here we'll build on this knowledge by looking in depth at the data structures provided by the Pandas library.\n",
"Pandas is a newer package built on top of NumPy that provides an efficient implementation of a `DataFrame`.\n",
    "`DataFrame`s are essentially multidimensional arrays with attached row and column labels, often with heterogeneous types and/or missing data.\n",
"As well as offering a convenient storage interface for labeled data, Pandas implements a number of powerful data operations familiar to users of both database frameworks and spreadsheet programs.\n",
"\n",
"As we saw, NumPy's ``ndarray`` data structure provides essential features for the type of clean, well-organized data typically seen in numerical computing tasks.\n",
"As we've seen, NumPy's `ndarray` data structure provides essential features for the type of clean, well-organized data typically seen in numerical computing tasks.\n",
"While it serves this purpose very well, its limitations become clear when we need more flexibility (e.g., attaching labels to data, working with missing data, etc.) and when attempting operations that do not map well to element-wise broadcasting (e.g., groupings, pivots, etc.), each of which is an important piece of analyzing the less structured data available in many forms in the world around us.\n",
"Pandas, and in particular its ``Series`` and ``DataFrame`` objects, builds on the NumPy array structure and provides efficient access to these sorts of \"data munging\" tasks that occupy much of a data scientist's time.\n",
"Pandas, and in particular its `Series` and `DataFrame` objects, builds on the NumPy array structure and provides efficient access to these sorts of \"data munging\" tasks that occupy much of a data scientist's time.\n",
"\n",
"In this chapter, we will focus on the mechanics of using ``Series``, ``DataFrame``, and related structures effectively.\n",
"In this part of the book, we will focus on the mechanics of using `Series`, `DataFrame`, and related structures effectively.\n",
"We will use examples drawn from real datasets where appropriate, but these examples are not necessarily the focus."
]
},
@ -53,24 +31,27 @@
"source": [
"## Installing and Using Pandas\n",
"\n",
"Installation of Pandas on your system requires NumPy to be installed, and if building the library from source, requires the appropriate tools to compile the C and Cython sources on which Pandas is built.\n",
"Details on this installation can be found in the [Pandas documentation](http://pandas.pydata.org/).\n",
"Installation of Pandas on your system requires NumPy to be installed, and if you're building the library from source, you will need the appropriate tools to compile the C and Cython sources on which Pandas is built.\n",
"Details on the installation process can be found in the [Pandas documentation](http://pandas.pydata.org/).\n",
"If you followed the advice outlined in the [Preface](00.00-Preface.ipynb) and used the Anaconda stack, you already have Pandas installed.\n",
"\n",
"Once Pandas is installed, you can import it and check the version:"
"Once Pandas is installed, you can import it and check the version; here is the version used by this book:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
"collapsed": false,
"jupyter": {
"outputs_hidden": false
}
},
"outputs": [
{
"data": {
"text/plain": [
"'0.18.1'"
"'1.3.5'"
]
},
"execution_count": 1,
@ -87,14 +68,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Just as we generally import NumPy under the alias ``np``, we will import Pandas under the alias ``pd``:"
"Just as we generally import NumPy under the alias `np`, we will import Pandas under the alias `pd`:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
"tags": []
},
"outputs": [],
"source": [
@ -112,17 +93,17 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reminder about Built-In Documentation\n",
"## Reminder About Built-in Documentation\n",
"\n",
"As you read through this chapter, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature) as well as the documentation of various functions (using the ``?`` character). (Refer back to [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) if you need a refresher on this.)\n",
"As you read through this part of the book, don't forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab completion feature) as well as the documentation of various functions (using the `?` character). Refer back to [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb) if you need a refresher on this.\n",
"\n",
"For example, to display all the contents of the pandas namespace, you can type\n",
"For example, to display all the contents of the Pandas namespace, you can type:\n",
"\n",
"```ipython\n",
"In [3]: pd.<TAB>\n",
"```\n",
"\n",
"And to display Pandas's built-in documentation, you can use this:\n",
"And to display the built-in Pandas documentation, you can use this:\n",
"\n",
"```ipython\n",
"In [4]: pd?\n",
@ -130,22 +111,15 @@
"\n",
"More detailed documentation, along with tutorials and other resources, can be found at http://pandas.pydata.org/."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Structured Data: NumPy's Structured Arrays](02.09-Structured-Data-NumPy.ipynb) | [Contents](Index.ipynb) | [Introducing Pandas Objects](03.01-Introducing-Pandas-Objects.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.00-Introduction-to-Pandas.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -159,9 +133,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large Load Diff

View File

@ -1,33 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--NAVIGATION-->\n",
"< [High-Performance Pandas: eval() and query()](03.12-Performance-Eval-and-Query.ipynb) | [Contents](Index.ipynb) | [Visualization with Matplotlib](04.00-Introduction-To-Matplotlib.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.13-Further-Resources.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -42,39 +14,29 @@
"editable": true
},
"source": [
"In this chapter, we've covered many of the basics of using Pandas effectively for data analysis.\n",
"In this part of the book, we've covered many of the basics of using Pandas effectively for data analysis.\n",
"Still, much has been omitted from our discussion.\n",
"To learn more about Pandas, I recommend the following resources:\n",
"\n",
"- [Pandas online documentation](http://pandas.pydata.org/): This is the go-to source for complete documentation of the package. While the examples in the documentation tend to be small generated datasets, the description of the options is complete and generally very useful for understanding the use of various functions.\n",
"- [Pandas online documentation](http://pandas.pydata.org/): This is the go-to source for complete documentation of the package. While the examples in the documentation tend to be based on small generated datasets, the description of the options is complete and generally very useful for understanding the use of various functions.\n",
"\n",
"- [*Python for Data Analysis*](http://shop.oreilly.com/product/0636920023784.do) Written by Wes McKinney (the original creator of Pandas), this book contains much more detail on the Pandas package than we had room for in this chapter. In particular, he takes a deep dive into tools for time series, which were his bread and butter as a financial consultant. The book also has many entertaining examples of applying Pandas to gain insight from real-world datasets. Keep in mind, though, that the book is now several years old, and the Pandas package has quite a few new features that this book does not cover (but be on the lookout for a new edition in 2017).\n",
"- [*Python for Data Analysis*](https://learning.oreilly.com/library/view/python-for-data/9781098104023/): Written by Wes McKinney (the original creator of Pandas), this book contains much more detail on the Pandas package than we had room for in this chapter. In particular, McKinney takes a deep dive into tools for time series, which were his bread and butter as a financial consultant. The book also has many entertaining examples of applying Pandas to gain insight from real-world datasets.\n",
"\n",
    "- [Stack Overflow](http://stackoverflow.com/questions/tagged/pandas): Pandas has so many users that any question you have has likely been asked and answered on Stack Overflow. Using Pandas is a case where some Google-Fu is your best friend. Simply go to your favorite search engine and type in the question, problem, or error you're coming across; more than likely you'll find your answer on a Stack Overflow page.\n",
"- [*Effective Pandas*](https://leanpub.com/effective-pandas): This short e-book by Pandas developer Tom Augspurger provides a succinct outline of using the full power of the Pandas library in an effective and idiomatic way.\n",
"\n",
"- [Pandas on PyVideo](http://pyvideo.org/search?q=pandas): From PyCon to SciPy to PyData, many conferences have featured tutorials from Pandas developers and power users. The PyCon tutorials in particular tend to be given by very well-vetted presenters.\n",
"- [Pandas on PyVideo](http://pyvideo.org/search?q=pandas): From PyCon to SciPy to PyData, many conferences have featured tutorials by Pandas developers and power users. The PyCon tutorials in particular tend to be given by very well-vetted presenters.\n",
"\n",
"Using these resources, combined with the walk-through given in this chapter, my hope is that you'll be poised to use Pandas to tackle any data analysis problem you come across!"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--NAVIGATION-->\n",
"< [High-Performance Pandas: eval() and query()](03.12-Performance-Eval-and-Query.ipynb) | [Contents](Index.ipynb) | [Visualization with Matplotlib](04.00-Introduction-To-Matplotlib.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/03.13-Further-Resources.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"Using these resources, combined with the walkthrough given in these chapters, my hope is that you'll be poised to use Pandas to tackle any data analysis problem you come across!"
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -88,9 +50,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -4,78 +4,44 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"# Further Resources\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Visualization with Seaborn](04.14-Visualization-With-Seaborn.ipynb) | [Contents](Index.ipynb) | [Machine Learning](05.00-Machine-Learning.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/04.15-Further-Resources.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Further Resources"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Matplotlib Resources\n",
"\n",
"A single chapter in a book can never hope to cover all the available features and plot types available in Matplotlib.\n",
"As with other packages we've seen, liberal use of IPython's tab-completion and help functions (see [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)) can be very helpful when exploring Matplotlib's API.\n",
    "A single part of a book can never hope to cover all the features and plot types available in Matplotlib.\n",
"As with other packages we've seen, liberal use of IPython's tab completion and help functions (see [Help and Documentation in IPython](01.01-Help-And-Documentation.ipynb)) can be very helpful when exploring Matplotlib's API.\n",
    "In addition, Matplotlib's [online documentation](http://matplotlib.org/) can be a helpful reference.\n",
"See in particular the [Matplotlib gallery](http://matplotlib.org/gallery.html) linked on that page: it shows thumbnails of hundreds of different plot types, each one linked to a page with the Python code snippet used to generate it.\n",
"In this way, you can visually inspect and learn about a wide range of different plotting styles and visualization techniques.\n",
"See in particular the [Matplotlib gallery](https://matplotlib.org/stable/gallery/), which shows thumbnails of hundreds of different plot types, each one linked to a page with the Python code snippet used to generate it.\n",
"This allows you to visually inspect and learn about a wide range of different plotting styles and visualization techniques.\n",
"\n",
"For a book-length treatment of Matplotlib, I would recommend [*Interactive Applications Using Matplotlib*](https://www.packtpub.com/application-development/interactive-applications-using-matplotlib), written by Matplotlib core developer Ben Root."
"For a book-length treatment of Matplotlib, I would recommend *Interactive Applications Using Matplotlib* (Packt), written by Matplotlib core developer Ben Root."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"## Other Python Graphics Libraries\n",
"## Other Python Visualization Libraries\n",
"\n",
"Although Matplotlib is the most prominent Python visualization library, there are other more modern tools that are worth exploring as well.\n",
"I'll mention a few of them briefly here:\n",
"\n",
"- [Bokeh](http://bokeh.pydata.org) is a JavaScript visualization library with a Python frontend that creates highly interactive visualizations capable of handling very large and/or streaming datasets. The Python front-end outputs a JSON data structure that can be interpreted by the Bokeh JS engine.\n",
"- [Plotly](http://plot.ly) is the eponymous open source product of the Plotly company, and is similar in spirit to Bokeh. Because Plotly is the main product of a startup, it is receiving a high level of development effort. Use of the library is entirely free.\n",
"- [Vispy](http://vispy.org/) is an actively developed project focused on dynamic visualizations of very large datasets. Because it is built to target OpenGL and make use of efficient graphics processors in your computer, it is able to render some quite large and stunning visualizations.\n",
"- [Vega](https://vega.github.io/) and [Vega-Lite](https://vega.github.io/vega-lite) are declarative graphics representations, and are the product of years of research into the fundamental language of data visualization. The reference rendering implementation is JavaScript, but the API is language agnostic. There is a Python API under development in the [Altair](https://altair-viz.github.io/) package. Though as of summer 2016 it's not yet fully mature, I'm quite excited for the possibilities of this project to provide a common reference point for visualization in Python and other languages.\n",
"- [Bokeh](http://bokeh.pydata.org) is a JavaScript visualization library with a Python frontend that creates highly interactive visualizations capable of handling very large and/or streaming datasets.\n",
"- [Plotly](http://plot.ly) is the eponymous open source product of the Plotly company, and is similar in spirit to Bokeh. It is actively developed and provides a wide range of interactive chart types.\n",
"- [HoloViews](https://holoviews.org/) is a more declarative, unified API for generating charts in a variety of backends, including Bokeh and Matplotlib.\n",
"- [Vega](https://vega.github.io/) and [Vega-Lite](https://vega.github.io/vega-lite) are declarative graphics representations, and are the product of years of research into how to think about data visualization and interaction. The reference rendering implementation is JavaScript, and the [Altair package](https://altair-viz.github.io/) provides a Python API to generate these charts.\n",
"\n",
"The visualization space in the Python community is very dynamic, and I fully expect this list to be out of date as soon as it is published.\n",
"Keep an eye out for what's coming in the future!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Visualization with Seaborn](04.14-Visualization-With-Seaborn.ipynb) | [Contents](Index.ipynb) | [Machine Learning](05.00-Machine-Learning.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/04.15-Further-Resources.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"The visualization landscape in the Python world is constantly evolving, and I expect that this list may be out of date by the time this book is published.\n",
"Additionally, because Python is used in so many domains, you'll find many other visualization tools built for more specific use cases.\n",
"It can be hard to keep track of all of them, but a good resource for learning about this wide variety of visualization tools is https://pyviz.org/, an open, community-driven site containing tutorials and examples of many different visualization tools."
]
}
],
"metadata": {
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -89,9 +55,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

View File

@ -1,27 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Further Resources](04.15-Further-Resources.ipynb) | [Contents](Index.ipynb) | [What Is Machine Learning?](05.01-What-Is-Machine-Learning.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.00-Machine-Learning.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -33,43 +11,28 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In many ways, machine learning is the primary means by which data science manifests itself to the broader world.\n",
"Machine learning is where these computational and algorithmic skills of data science meet the statistical thinking of data science, and the result is a collection of approaches to inference and data exploration that are not about effective theory so much as effective computation.\n",
"\n",
"The term \"machine learning\" is sometimes thrown around as if it is some kind of magic pill: *apply machine learning to your data, and all your problems will be solved!*\n",
"As you might expect, the reality is rarely this simple.\n",
"While these methods can be incredibly powerful, to be effective they must be approached with a firm grasp of the strengths and weaknesses of each method, as well as a grasp of general concepts such as bias and variance, overfitting and underfitting, and more.\n",
"\n",
"This chapter will dive into practical aspects of machine learning, primarily using Python's [Scikit-Learn](http://scikit-learn.org) package.\n",
"This final part is an introduction to the very broad topic of machine learning, mainly via Python's [Scikit-Learn](http://scikit-learn.org) package.\n",
"You can think of machine learning as a class of algorithms that allow a program to detect particular patterns in a dataset, and thus \"learn\" from the data to draw inferences from it.\n",
"This is not meant to be a comprehensive introduction to the field of machine learning; that is a large subject and necessitates a more technical approach than we take here.\n",
"Nor is it meant to be a comprehensive manual for the use of the Scikit-Learn package (for this, you can refer to the resources listed in [Further Machine Learning Resources](05.15-Learning-More.ipynb)).\n",
"Rather, the goals of this chapter are:\n",
"Rather, the goals here are:\n",
"\n",
"- To introduce the fundamental vocabulary and concepts of machine learning.\n",
"- To introduce the Scikit-Learn API and show some examples of its use.\n",
"- To take a deeper dive into the details of several of the most important machine learning approaches, and develop an intuition into how they work and when and where they are applicable.\n",
"- To introduce the fundamental vocabulary and concepts of machine learning\n",
"- To introduce the Scikit-Learn API and show some examples of its use\n",
"- To take a deeper dive into the details of several of the more important classical machine learning approaches, and develop an intuition into how they work and when and where they are applicable\n",
"\n",
"Much of this material is drawn from the Scikit-Learn tutorials and workshops I have given on several occasions at PyCon, SciPy, PyData, and other conferences.\n",
"Any clarity in the following pages is likely due to the many workshop participants and co-instructors who have given me valuable feedback on this material over the years!\n",
"\n",
"Finally, if you are seeking a more comprehensive or technical treatment of any of these subjects, I've listed several resources and references in [Further Machine Learning Resources](05.15-Learning-More.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!--NAVIGATION-->\n",
"< [Further Resources](04.15-Further-Resources.ipynb) | [Contents](Index.ipynb) | [What Is Machine Learning?](05.01-What-Is-Machine-Learning.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.00-Machine-Learning.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"Any clarity in the following pages is likely due to the many workshop participants and co-instructors who have given me valuable feedback on this material over the years!"
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -83,9 +46,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

View File

@ -1,33 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--NAVIGATION-->\n",
"< [Machine Learning](05.00-Machine-Learning.ipynb) | [Contents](Index.ipynb) | [Introducing Scikit-Learn](05.02-Introducing-Scikit-Learn.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.01-What-Is-Machine-Learning.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@ -42,12 +14,11 @@
"editable": true
},
"source": [
"Before we take a look at the details of various machine learning methods, let's start by looking at what machine learning is, and what it isn't.\n",
"Machine learning is often categorized as a subfield of artificial intelligence, but I find that categorization can often be misleading at first brush.\n",
"Before we take a look at the details of several machine learning methods, let's start by looking at what machine learning is, and what it isn't.\n",
"Machine learning is often categorized as a subfield of artificial intelligence, but I find that categorization can be misleading.\n",
"The study of machine learning certainly arose from research in this context, but in the data science application of machine learning methods, it's more helpful to think of machine learning as a means of *building models of data*.\n",
"\n",
"Fundamentally, machine learning involves building mathematical models to help understand data.\n",
"\"Learning\" enters the fray when we give these models *tunable parameters* that can be adapted to observed data; in this way the program can be considered to be \"learning\" from the data.\n",
"In this context, \"learning\" enters the fray when we give these models *tunable parameters* that can be adapted to observed data; in this way the program can be considered to be \"learning\" from the data.\n",
"Once these models have been fit to previously seen data, they can be used to predict and understand aspects of newly observed data.\n",
"I'll leave to the reader the more philosophical digression regarding the extent to which this type of mathematical, model-based \"learning\" is similar to the \"learning\" exhibited by the human brain.\n",
"\n",
@ -63,22 +34,23 @@
"source": [
"## Categories of Machine Learning\n",
"\n",
"At the most fundamental level, machine learning can be categorized into two main types: supervised learning and unsupervised learning.\n",
"Machine learning can be categorized into two main types: supervised learning and unsupervised learning.\n",
"\n",
"*Supervised learning* involves somehow modeling the relationship between measured features of data and some label associated with the data; once this model is determined, it can be used to apply labels to new, unknown data.\n",
"This is further subdivided into *classification* tasks and *regression* tasks: in classification, the labels are discrete categories, while in regression, the labels are continuous quantities.\n",
"We will see examples of both types of supervised learning in the following section.\n",
"*Supervised learning* involves somehow modeling the relationship between measured features of data and some labels associated with the data; once this model is determined, it can be used to apply labels to new, unknown data.\n",
"This is sometimes further subdivided into classification tasks and regression tasks: in *classification*, the labels are discrete categories, while in *regression*, the labels are continuous quantities.\n",
"You will see examples of both types of supervised learning in the following section.\n",
"\n",
"*Unsupervised learning* involves modeling the features of a dataset without reference to any label, and is often described as \"letting the dataset speak for itself.\"\n",
"*Unsupervised learning* involves modeling the features of a dataset without reference to any label.\n",
"These models include tasks such as *clustering* and *dimensionality reduction.*\n",
"Clustering algorithms identify distinct groups of data, while dimensionality reduction algorithms search for more succinct representations of the data.\n",
"We will see examples of both types of unsupervised learning in the following section.\n",
"You will also see examples of both types of unsupervised learning in the following section.\n",
"\n",
"In addition, there are so-called *semi-supervised learning* methods, which falls somewhere between supervised learning and unsupervised learning.\n",
"In addition, there are so-called *semi-supervised learning* methods, which fall somewhere between supervised learning and unsupervised learning.\n",
"Semi-supervised learning methods are often useful when only incomplete labels are available."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"deletable": true,
@ -88,9 +60,9 @@
"## Qualitative Examples of Machine Learning Applications\n",
"\n",
"To make these ideas more concrete, let's take a look at a few very simple examples of a machine learning task.\n",
"These examples are meant to give an intuitive, non-quantitative overview of the types of machine learning tasks we will be looking at in this chapter.\n",
"In later sections, we will go into more depth regarding the particular models and how they are used.\n",
"For a preview of these more technical aspects, you can find the Python source that generates the following figures in the [Appendix: Figure Code](06.00-Figure-Code.ipynb).\n"
"These examples are meant to give an intuitive, non-quantitative overview of the types of machine learning tasks we will be looking at in this part of the book.\n",
"In later chapters, we will go into more depth regarding the particular models and how they are used.\n",
"For a preview of these more technical aspects, you can find the Python source that generates the following figures in the online [appendix](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/06.00-Figure-Code.ipynb).\n"
]
},
{
@ -100,9 +72,9 @@
"editable": true
},
"source": [
"### Classification: Predicting discrete labels\n",
"### Classification: Predicting Discrete Labels\n",
"\n",
"We will first take a look at a simple *classification* task, in which you are given a set of labeled points and want to use these to classify some unlabeled points.\n",
"We will first take a look at a simple classification task, in which we are given a set of labeled points and want to use these to classify some unlabeled points.\n",
"\n",
"Imagine that we have the data shown in this figure:"
]
@ -114,8 +86,7 @@
"editable": true
},
"source": [
"![](figures/05.01-classification-1.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-1)"
"![](images/05.01-classification-1.png)"
]
},
{
@ -125,15 +96,15 @@
"editable": true
},
"source": [
"Here we have two-dimensional data: that is, we have two *features* for each point, represented by the *(x,y)* positions of the points on the plane.\n",
"This data is two-dimensional: that is, we have two *features* for each point, represented by the (x,y) positions of the points on the plane.\n",
"In addition, we have one of two *class labels* for each point, here represented by the colors of the points.\n",
"From these features and labels, we would like to create a model that will let us decide whether a new point should be labeled \"blue\" or \"red.\"\n",
"\n",
"There are a number of possible models for such a classification task, but here we will use an extremely simple one. We will make the assumption that the two groups can be separated by drawing a straight line through the plane between them, such that points on each side of the line fall in the same group.\n",
"Here the *model* is a quantitative version of the statement \"a straight line separates the classes\", while the *model parameters* are the particular numbers describing the location and orientation of that line for our data.\n",
"There are a number of possible models for such a classification task, but we will start with a very simple one. We will make the assumption that the two groups can be separated by drawing a straight line through the plane between them, such that points on each side of the line all fall in the same group.\n",
"Here the *model* is a quantitative version of the statement \"a straight line separates the classes,\" while the *model parameters* are the particular numbers describing the location and orientation of that line for our data.\n",
"The optimal values for these model parameters are learned from the data (this is the \"learning\" in machine learning), which is often called *training the model*.\n",
"\n",
"The following figure shows a visual representation of what the trained model looks like for this data:"
"See the following figure shows a visual representation of what the trained model looks like for this data."
]
},
{
@ -143,8 +114,7 @@
"editable": true
},
"source": [
"![](figures/05.01-classification-2.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-2)"
"![](images/05.01-classification-2.png)"
]
},
{
@ -155,8 +125,8 @@
},
"source": [
"Now that this model has been trained, it can be generalized to new, unlabeled data.\n",
"In other words, we can take a new set of data, draw this model line through it, and assign labels to the new points based on this model.\n",
"This stage is usually called *prediction*. See the following figure:"
"In other words, we can take a new set of data, draw this line through it, and assign labels to the new points based on this model (see the following figure).\n",
"This stage is usually called *prediction*."
]
},
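The train-then-predict workflow just described (choose a model, fit it to labeled points, then assign labels to new points) can be sketched in a few lines of Scikit-Learn. This is not the code behind the figures (that lives in the appendix); it is a minimal illustration on synthetic blob data, with a linear support vector classifier standing in for the straight-line model:

```python
# A minimal sketch of the classification workflow above, on synthetic data.
# SVC with a linear kernel plays the role of the "straight line" model.
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated groups of labeled 2D points: X has shape (100, 2)
X, y = make_blobs(n_samples=100, centers=2, random_state=0, cluster_std=0.6)

model = SVC(kernel='linear')
model.fit(X, y)  # "training": the line's location/orientation is learned

# "Prediction": assign labels to new, unlabeled points
X_new = [[0.0, 2.0], [2.0, 5.0]]
labels = model.predict(X_new)  # one class label per new point
```

The same `fit`/`predict` pattern applies essentially unchanged to the more sophisticated classifiers covered later in this part of the book.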
{
@ -166,8 +136,7 @@
"editable": true
},
"source": [
"![](figures/05.01-classification-3.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Classification-Example-Figure-3)"
"![](images/05.01-classification-3.png)"
]
},
{
@ -178,12 +147,12 @@
},
"source": [
"This is the basic idea of a classification task in machine learning, where \"classification\" indicates that the data has discrete class labels.\n",
"At first glance this may look fairly trivial: it would be relatively easy to simply look at this data and draw such a discriminatory line to accomplish this classification.\n",
"At first glance this may seem trivial: it's easy to look at our data and draw such a discriminatory line to accomplish this classification.\n",
"A benefit of the machine learning approach, however, is that it can generalize to much larger datasets in many more dimensions.\n",
"\n",
"For example, this is similar to the task of automated spam detection for email; in this case, we might use the following features and labels:\n",
"For example, this is similar to the task of automated spam detection for email. In this case, we might use the following features and labels:\n",
"\n",
"- *feature 1*, *feature 2*, etc. $\\to$ normalized counts of important words or phrases (\"Viagra\", \"Nigerian prince\", etc.)\n",
"- *feature 1*, *feature 2*, etc. $\\to$ normalized counts of important words or phrases (\"Viagra\", \"Extended warranty\", etc.)\n",
"- *label* $\\to$ \"spam\" or \"not spam\"\n",
"\n",
"For the training set, these labels might be determined by individual inspection of a small representative sample of emails; for the remaining emails, the label would be determined using the model.\n",
@ -200,11 +169,11 @@
"editable": true
},
"source": [
"### Regression: Predicting continuous labels\n",
"### Regression: Predicting Continuous Labels\n",
"\n",
"In contrast with the discrete labels of a classification algorithm, we will next look at a simple *regression* task in which the labels are continuous quantities.\n",
"In contrast with the discrete labels of a classification algorithm, we will next look at a simple regression task in which the labels are continuous quantities.\n",
"\n",
"Consider the data shown in the following figure, which consists of a set of points each with a continuous label:"
"Consider the data shown in the following figure, which consists of a set of points each with a continuous label."
]
},
{
@ -214,8 +183,7 @@
"editable": true
},
"source": [
"![](figures/05.01-regression-1.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Regression-Example-Figure-1)"
"![](images/05.01-regression-1.png)"
]
},
{
@ -228,8 +196,8 @@
"As with the classification example, we have two-dimensional data: that is, there are two features describing each data point.\n",
"The color of each point represents the continuous label for that point.\n",
"\n",
"There are a number of possible regression models we might use for this type of data, but here we will use a simple linear regression to predict the points.\n",
"This simple linear regression model assumes that if we treat the label as a third spatial dimension, we can fit a plane to the data.\n",
"There are a number of possible regression models we might use for this type of data, but here we will use a simple linear regression model to predict the points.\n",
"This simple model assumes that if we treat the label as a third spatial dimension, we can fit a plane to the data.\n",
"This is a higher-level generalization of the well-known problem of fitting a line to data with two coordinates.\n",
"\n",
"We can visualize this setup as shown in the following figure:"
@ -242,8 +210,7 @@
"editable": true
},
"source": [
"![](figures/05.01-regression-2.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Regression-Example-Figure-2)"
"![](images/05.01-regression-2.png)"
]
},
{
@ -253,7 +220,7 @@
"editable": true
},
"source": [
"Notice that the *feature 1-feature 2* plane here is the same as in the two-dimensional plot from before; in this case, however, we have represented the labels by both color and three-dimensional axis position.\n",
"Notice that the *feature 1feature 2* plane here is the same as in the two-dimensional plot in Figure 37-4; in this case, however, we have represented the labels by both color and three-dimensional axis position.\n",
"From this view, it seems reasonable that fitting a plane through this three-dimensional data would allow us to predict the expected label for any set of input parameters.\n",
"Returning to the two-dimensional projection, when we fit such a plane we get the result shown in the following figure:"
]
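A minimal sketch of this plane-fitting idea, on synthetic data invented for illustration (here the true underlying plane is label = 3*x1 - 2*x2 + 1, plus a little noise):

```python
# Fitting a plane to two-feature data with linear regression.
# The synthetic data is generated from a known plane so we can check the fit.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 2))  # columns: feature 1, feature 2
y = 3 * X[:, 0] - 2 * X[:, 1] + 1 + 0.05 * rng.normal(size=200)

model = LinearRegression()
model.fit(X, y)  # learns the plane's tilt (coef_) and height (intercept_)

model.coef_, model.intercept_   # approximately [3, -2] and 1
model.predict([[0.5, -0.5]])    # approximately 3*0.5 - 2*(-0.5) + 1 = 3.5
```

The recovered coefficients are the "model parameters" in the sense used above: the particular numbers describing the fitted plane.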
@ -265,8 +232,7 @@
"editable": true
},
"source": [
"![](figures/05.01-regression-3.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Regression-Example-Figure-3)"
"![](images/05.01-regression-3.png)"
]
},
{
@ -287,8 +253,7 @@
"editable": true
},
"source": [
"![](figures/05.01-regression-4.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Regression-Example-Figure-4)"
"![](images/05.01-regression-4.png)"
]
},
{
@ -298,15 +263,15 @@
"editable": true
},
"source": [
"As with the classification example, this may seem rather trivial in a low number of dimensions.\n",
"As with the classification example, this task may seem trivial in a low number of dimensions.\n",
"But the power of these methods is that they can be straightforwardly applied and evaluated in the case of data with many, many features.\n",
"\n",
"For example, this is similar to the task of computing the distance to galaxies observed through a telescope—in this case, we might use the following features and labels:\n",
"\n",
"- *feature 1*, *feature 2*, etc. $\\to$ brightness of each galaxy at one of several wave lengths or colors\n",
"- *feature 1*, *feature 2*, etc. $\\to$ brightness of each galaxy at one of several wavelengths or colors\n",
"- *label* $\\to$ distance or redshift of the galaxy\n",
"\n",
"The distances for a small number of these galaxies might be determined through an independent set of (typically more expensive) observations.\n",
"The distances for a small number of these galaxies might be determined through an independent set of (typically more expensive or complex) observations.\n",
"Distances to remaining galaxies could then be estimated using a suitable regression model, without the need to employ the more expensive observation across the entire set.\n",
"In astronomy circles, this is known as the \"photometric redshift\" problem.\n",
"\n",
@ -320,9 +285,9 @@
"editable": true
},
"source": [
"### Clustering: Inferring labels on unlabeled data\n",
"### Clustering: Inferring Labels on Unlabeled Data\n",
"\n",
"The classification and regression illustrations we just looked at are examples of supervised learning algorithms, in which we are trying to build a model that will predict labels for new data.\n",
"The classification and regression illustrations we just saw are examples of supervised learning algorithms, in which we are trying to build a model that will predict labels for new data.\n",
"Unsupervised learning involves models that describe data without reference to any known labels.\n",
"\n",
"One common case of unsupervised learning is \"clustering,\" in which data is automatically assigned to some number of discrete groups.\n",
@ -336,8 +301,7 @@
"editable": true
},
"source": [
"![](figures/05.01-clustering-1.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Clustering-Example-Figure-2)"
"![](images/05.01-clustering-1.png)"
]
},
{
@ -359,8 +323,7 @@
"editable": true
},
"source": [
"![](figures/05.01-clustering-2.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Clustering-Example-Figure-2)"
"![](images/05.01-clustering-2.png)"
]
},
{
@ -371,10 +334,10 @@
},
"source": [
"*k*-means fits a model consisting of *k* cluster centers; the optimal centers are assumed to be those that minimize the distance of each point from its assigned center.\n",
"Again, this might seem like a trivial exercise in two dimensions, but as our data becomes larger and more complex, such clustering algorithms can be employed to extract useful information from the dataset.\n",
"Again, this might seem like a trivial exercise in two dimensions, but as our data becomes larger and more complex such clustering algorithms can continue to be employed to extract useful information from the dataset.\n",
"\n",
"We will discuss the *k*-means algorithm in more depth in [In Depth: K-Means Clustering](05.11-K-Means.ipynb).\n",
"Other important clustering algorithms include Gaussian mixture models (See [In Depth: Gaussian Mixture Models](05.12-Gaussian-Mixtures.ipynb)) and spectral clustering (See [Scikit-Learn's clustering documentation](http://scikit-learn.org/stable/modules/clustering.html))."
"Other important clustering algorithms include Gaussian mixture models (see [In Depth: Gaussian Mixture Models](05.12-Gaussian-Mixtures.ipynb)) and spectral clustering (see [Scikit-Learn's clustering documentation](http://scikit-learn.org/stable/modules/clustering.html))."
]
},
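A minimal sketch of this clustering workflow, on synthetic data with four groups (mirroring the setup in the figure above, though not its exact source code):

```python
# k-means on synthetic data: four blobs, no labels given to the algorithm.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0, cluster_std=0.6)

# n_clusters is k; fit_predict assigns each point to its nearest learned center
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

kmeans.cluster_centers_.shape  # (4, 2): four learned centers in 2D
len(set(labels))               # 4 distinct cluster assignments
```

Note that the label values themselves (0 through 3) are arbitrary: with no ground truth provided, the algorithm can only identify the groups, not name them.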
{
@ -384,7 +347,7 @@
"editable": true
},
"source": [
"### Dimensionality reduction: Inferring structure of unlabeled data\n",
"### Dimensionality Reduction: Inferring Structure of Unlabeled Data\n",
"\n",
"Dimensionality reduction is another example of an unsupervised algorithm, in which labels or other information are inferred from the structure of the dataset itself.\n",
"Dimensionality reduction is a bit more abstract than the examples we looked at before, but generally it seeks to pull out some low-dimensional representation of data that in some way preserves relevant qualities of the full dataset.\n",
@ -400,8 +363,7 @@
"editable": true
},
"source": [
"![](figures/05.01-dimesionality-1.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Dimensionality-Reduction-Example-Figure-1)"
"![](images/05.01-dimesionality-1.png)"
]
},
{
@ -412,10 +374,10 @@
},
"source": [
"Visually, it is clear that there is some structure in this data: it is drawn from a one-dimensional line that is arranged in a spiral within this two-dimensional space.\n",
"In a sense, you could say that this data is \"intrinsically\" only one dimensional, though this one-dimensional data is embedded in higher-dimensional space.\n",
"A suitable dimensionality reduction model in this case would be sensitive to this nonlinear embedded structure, and be able to pull out this lower-dimensionality representation.\n",
"In a sense, you could say that this data is \"intrinsically\" only one-dimensional, though this one-dimensional data is embedded in two-dimensional space.\n",
"A suitable dimensionality reduction model in this case would be sensitive to this nonlinear embedded structure and be able to detect this lower-dimensionality representation.\n",
"\n",
"The following figure shows a visualization of the results of the Isomap algorithm, a manifold learning algorithm that does exactly this:"
"The following figure shows a visualization of the results of the Isomap algorithm, a manifold learning algorithm that does exactly this."
]
},
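A minimal sketch of this idea: generate a one-dimensional spiral embedded in two dimensions (invented to mimic the figure, not its exact source code) and ask Isomap to recover a one-dimensional representation:

```python
# Manifold learning on a synthetic spiral: 1D structure embedded in 2D.
import numpy as np
from sklearn.manifold import Isomap

t = np.linspace(0, 10, 300)                       # the hidden 1D coordinate
X = np.column_stack([t * np.cos(t), t * np.sin(t)])  # 2D spiral embedding

# Isomap builds a neighborhood graph along the curve and unrolls it
iso = Isomap(n_neighbors=10, n_components=1)
X_1d = iso.fit_transform(X)
X_1d.shape  # (300, 1): one recovered coordinate per point
```

If the algorithm succeeds, the recovered coordinate varies monotonically with the hidden parameter `t`, which is exactly the structure the figure's colors illustrate.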
{
@ -425,8 +387,7 @@
"editable": true
},
"source": [
"![](figures/05.01-dimesionality-2.png)\n",
"[figure source in Appendix](06.00-Figure-Code.ipynb#Dimensionality-Reduction-Example-Figure-2)"
"![](images/05.01-dimesionality-2.png)"
]
},
{
@ -439,9 +400,9 @@
"Notice that the colors (which represent the extracted one-dimensional latent variable) change uniformly along the spiral, which indicates that the algorithm did in fact detect the structure we saw by eye.\n",
"As with the previous examples, the power of dimensionality reduction algorithms becomes clearer in higher-dimensional cases.\n",
"For example, we might wish to visualize important relationships within a dataset that has 100 or 1,000 features.\n",
"Visualizing 1,000-dimensional data is a challenge, and one way we can make this more manageable is to use a dimensionality reduction technique to reduce the data to two or three dimensions.\n",
"Visualizing 1,000-dimensional data is a challenge, and one way we can make this more manageable is to use a dimensionality reduction technique to reduce the data to 2 or 3 dimensions.\n",
"\n",
"Some important dimensionality reduction algorithms that we will discuss are principal component analysis (see [In Depth: Principal Component Analysis](05.09-Principal-Component-Analysis.ipynb)) and various manifold learning algorithms, including Isomap and locally linear embedding (See [In-Depth: Manifold Learning](05.10-Manifold-Learning.ipynb))."
"Some important dimensionality reduction algorithms that we will discuss are principal component analysis (see [In Depth: Principal Component Analysis](05.09-Principal-Component-Analysis.ipynb)) and various manifold learning algorithms, including Isomap and locally linear embedding (see [In-Depth: Manifold Learning](05.10-Manifold-Learning.ipynb))."
]
},
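As a brief sketch of using dimensionality reduction for visualization, the following projects the 64-dimensional handwritten digits dataset bundled with Scikit-Learn down to two dimensions with PCA:

```python
# PCA for visualization: reduce 64 pixel features to 2 plottable dimensions.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
digits.data.shape  # (1797, 64): 1,797 samples, 64 features each

pca = PCA(n_components=2)
projected = pca.fit_transform(digits.data)
projected.shape    # (1797, 2): each sample now has a 2D position
```

The two resulting columns can be used directly as x and y coordinates in a scatter plot, turning an unvisualizable 64-dimensional dataset into a single figure.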
{
@ -454,7 +415,7 @@
"## Summary\n",
"\n",
"Here we have seen a few simple examples of some of the basic types of machine learning approaches.\n",
"Needless to say, there are a number of important practical details that we have glossed over, but I hope this section was enough to give you a basic idea of what types of problems machine learning approaches can solve.\n",
"Needless to say, there are a number of important practical details that we have glossed over, but this chapter was designed to give you a basic idea of what types of problems machine learning approaches can solve.\n",
"\n",
"In short, we saw the following:\n",
"\n",
@ -470,27 +431,17 @@
" \n",
"In the following sections we will go into much greater depth within these categories, and see some more interesting examples of where these concepts can be useful.\n",
"\n",
"All of the figures in the preceding discussion are generated based on actual machine learning computations; the code behind them can be found in [Appendix: Figure Code](06.00-Figure-Code.ipynb)."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--NAVIGATION-->\n",
"< [Machine Learning](05.00-Machine-Learning.ipynb) | [Contents](Index.ipynb) | [Introducing Scikit-Learn](05.02-Introducing-Scikit-Learn.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.01-What-Is-Machine-Learning.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"All of the figures in the preceding discussion are generated based on actual machine learning computations; the code behind them can be found in [Appendix: Figure Code](https://github.com/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/06.00-Figure-Code.ipynb)."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@ -504,9 +455,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -1,33 +1,5 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--BOOK_INFORMATION-->\n",
"<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
"\n",
"*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
"\n",
"*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--NAVIGATION-->\n",
"< [Application: A Face Detection Pipeline](05.14-Image-Features.ipynb) | [Contents](Index.ipynb) | [Appendix: Figure Code](06.00-Figure-Code.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.15-Learning-More.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -36,76 +8,34 @@
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"This chapter has been a quick tour of machine learning in Python, primarily using the tools within the Scikit-Learn library.\n",
"As long as the chapter is, it is still too short to cover many interesting and important algorithms, approaches, and discussions.\n",
"Here I want to suggest some resources to learn more about machine learning for those who are interested."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## Machine Learning in Python\n",
"This part of the book has been a quick tour of machine learning in Python, primarily using the tools within the Scikit-Learn library.\n",
"As long as these chapters are, they are still too short to cover many interesting and important algorithms, approaches, and discussions.\n",
"Here I want to suggest some resources to learn more about machine learning in Python, for those who are interested:\n",
"\n",
"To learn more about machine learning in Python, I'd suggest some of the following resources:\n",
"- [The Scikit-Learn website](http://scikit-learn.org): The Scikit-Learn website has an impressive breadth of documentation and examples covering some of the models discussed here, and much, much more. If you want a brief survey of the most important and often-used machine learning algorithms, this is a good place to start.\n",
"\n",
"- [The Scikit-Learn website](http://scikit-learn.org): The Scikit-Learn website has an impressive breadth of documentation and examples covering some of the models discussed here, and much, much more. If you want a brief survey of the most important and often-used machine learning algorithms, this website is a good place to start.\n",
"- *SciPy, PyCon, and PyData tutorial videos*: Scikit-Learn and other machine learning topics are perennial favorites in the tutorial tracks of many Python-focused conference series, in particular the PyCon, SciPy, and PyData conferences. Most of these conferences publish videos of their keynotes, talks, and tutorials for free online, and you should be able to find these easily via a suitable web search (for example, \"PyCon 2022 videos\").\n",
"\n",
"- *SciPy, PyCon, and PyData tutorial videos*: Scikit-Learn and other machine learning topics are perennial favorites in the tutorial tracks of many Python-focused conference series, in particular the PyCon, SciPy, and PyData conferences. You can find the most recent ones via a simple web search.\n",
"- [*Introduction to Machine Learning with Python*](http://shop.oreilly.com/product/0636920030515.do), by Andreas C. Müller and Sarah Guido (O'Reilly). This book covers many of the machine learning fundamentals discussed in these chapters, but is particularly relevant for its coverage of more advanced features of Scikit-Learn, including additional estimators, model validation approaches, and pipelining.\n",
"\n",
"- [*Introduction to Machine Learning with Python*](http://shop.oreilly.com/product/0636920030515.do): Written by Andreas C. Mueller and Sarah Guido, this book includes a fuller treatment of the topics in this chapter. If you're interested in reviewing the fundamentals of Machine Learning and pushing the Scikit-Learn toolkit to its limits, this is a great resource, written by one of the most prolific developers on the Scikit-Learn team.\n",
"\n",
"- [*Python Machine Learning*](https://www.packtpub.com/big-data-and-business-intelligence/python-machine-learning): Sebastian Raschka's book focuses less on Scikit-learn itself, and more on the breadth of machine learning tools available in Python. In particular, there is some very useful discussion on how to scale Python-based machine learning approaches to large and complex datasets."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"## General Machine Learning\n",
"\n",
"Of course, machine learning is much broader than just the Python world. There are many good resources to take your knowledge further, and here I will highlight a few that I have found useful:\n",
"\n",
"- [*Machine Learning*](https://www.coursera.org/learn/machine-learning): Taught by Andrew Ng (Coursera), this is a very clearly-taught free online course which covers the basics of machine learning from an algorithmic perspective. It assumes undergraduate-level understanding of mathematics and programming, and steps through detailed considerations of some of the most important machine learning algorithms. Homework assignments, which are algorithmically graded, have you actually implement some of these models yourself.\n",
"\n",
"- [*Pattern Recognition and Machine Learning*](http://www.springer.com/us/book/9780387310732): Written by Christopher Bishop, this classic technical text covers the concepts of machine learning discussed in this chapter in detail. If you plan to go further in this subject, you should have this book on your shelf.\n",
"\n",
"- [*Machine Learning: a Probabilistic Perspective*](https://mitpress.mit.edu/books/machine-learning-0): Written by Kevin Murphy, this is an excellent graduate-level text that explores nearly all important machine learning algorithms from a ground-up, unified probabilistic perspective.\n",
"\n",
"These resources are more technical than the material presented in this book, but to really understand the fundamentals of these methods requires a deep dive into the mathematics behind them.\n",
"If you're up for the challenge and ready to bring your data science to the next level, don't hesitate to dive-in!"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": true,
"editable": true
},
"source": [
"<!--NAVIGATION-->\n",
"< [Application: A Face Detection Pipeline](05.14-Image-Features.ipynb) | [Contents](Index.ipynb) | [Appendix: Figure Code](06.00-Figure-Code.ipynb) >\n",
"\n",
"<a href=\"https://colab.research.google.com/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/05.15-Learning-More.ipynb\"><img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open in Colab\" title=\"Open and Execute in Google Colaboratory\"></a>\n"
"- [*Machine Learning with PyTorch and Scikit-Learn*](https://www.packtpub.com/product/machine-learning-with-pytorch-and-scikit-learn/9781801819312), by Sebastian Raschka (Packt). Sebastian Raschka's most recent book starts with some of the fundamental topics covered in these chapters, but goes deeper and shows how those concepts apply to more sophisticated and computationally intensive deep learning and reinforcement learning models using the well-known [PyTorch library](https://pytorch.org/)."
]
}
],
"metadata": {
"anaconda-cloud": {},
"jupytext": {
"formats": "ipynb,md"
},
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -119,9 +49,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
"version": "3.9.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
"nbformat_minor": 4
}

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,6 @@
{
"cells": [],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 5
}

View File

@@ -41,3 +41,5 @@ order,name,height(cm)
42,Bill Clinton,188
43,George W. Bush,182
44,Barack Obama,185
45,Donald Trump,191
46,Joseph Biden,182

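The rows added above bring the presidents' heights data up to date through the 46th president. A minimal standard-library sketch of parsing rows in this format (the values are copied from the diff; the parsing code itself is illustrative, not part of the repository):

```python
import csv
import io

# A few rows in the same format as the updated data file
# (values copied from the diff above).
data = """order,name,height(cm)
44,Barack Obama,185
45,Donald Trump,191
46,Joseph Biden,182
"""

rows = list(csv.DictReader(io.StringIO(data)))
heights = [int(row["height(cm)"]) for row in rows]
print(heights)       # [185, 191, 182]
print(max(heights))  # 191
```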

Binary files not shown (a set of figure images was regenerated; only before/after byte sizes, roughly 6–190 KiB each, were recorded in this view).
View File

@ -1,6 +1,6 @@
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.pyplot as plt; plt.rcParams['figure.dpi'] = 600
from sklearn.tree import DecisionTreeClassifier
from ipywidgets import interact
@@ -30,8 +30,7 @@ def visualize_tree(estimator, X, y, boundaries=True,
Z = Z.reshape(xx.shape)
contours = ax.contourf(xx, yy, Z, alpha=0.3,
levels=np.arange(n_classes + 1) - 0.5,
cmap='viridis', clim=(y.min(), y.max()),
zorder=1)
cmap='viridis', zorder=1)
ax.set(xlim=xlim, ylim=ylim)
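The hunk above drops the redundant `clim` argument from `contourf`; the `levels` expression that remains, `np.arange(n_classes + 1) - 0.5`, centers each integer class label inside its own filled band. A quick numeric check of that arithmetic (a sketch, assuming three classes):

```python
import numpy as np

n_classes = 3
levels = np.arange(n_classes + 1) - 0.5
print(levels)  # [-0.5  0.5  1.5  2.5]

# Each integer class label k falls strictly inside the band
# (k - 0.5, k + 0.5), so contourf draws one filled region per class.
for k in range(n_classes):
    assert levels[k] < k < levels[k + 1]
```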
@@ -63,7 +62,7 @@ def plot_tree_interactive(X, y):
clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
visualize_tree(clf, X, y)
return interact(interactive_tree, depth=[1, 5])
return interact(interactive_tree, depth=(1, 5))
def randomized_tree_interactive(X, y):
@@ -80,4 +79,4 @@ def randomized_tree_interactive(X, y):
visualize_tree(clf, X[i[:N]], y[i[:N]], boundaries=False,
xlim=xlim, ylim=ylim)
interact(fit_randomized_tree, random_state=[0, 100]);
interact(fit_randomized_tree, random_state=(0, 100));
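The two `interact` changes above swap list arguments for tuples. Under ipywidgets' widget-abbreviation rules, a `(min, max)` tuple maps to a slider over that range, while a list becomes a dropdown of discrete options; so `depth=[1, 5]` offered only the two choices 1 and 5, whereas `depth=(1, 5)` yields a slider across the whole range. A hypothetical pure-Python sketch of that dispatch rule (an illustration of the documented behavior, not the library's actual code):

```python
# Hypothetical sketch of ipywidgets' "widget abbreviation" dispatch;
# illustrative only, not ipywidgets source code.
def abbreviation(value):
    if isinstance(value, tuple) and len(value) in (2, 3):
        # (min, max) or (min, max, step) -> a slider over the range
        if all(isinstance(v, int) for v in value):
            return "IntSlider"
        return "FloatSlider"
    if isinstance(value, list):
        # a list -> a dropdown of discrete choices
        return "Dropdown"
    raise TypeError(f"no widget abbreviation for {value!r}")

print(abbreviation((1, 5)))    # IntSlider: a slider from 1 to 5
print(abbreviation([0, 100]))  # Dropdown: only the options 0 and 100
```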

Some files were not shown because too many files have changed in this diff.