diff --git a/cheat_sheet/Cheat Sheet.ipynb b/cheat_sheet/Cheat Sheet.ipynb deleted file mode 100644 index 83bc104..0000000 --- a/cheat_sheet/Cheat Sheet.ipynb +++ /dev/null @@ -1,2680 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "raw_mimetype": "-" - }, - "source": [ - "# Pandas Cheat Sheet" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Table of contents\n", - "\n", - "\n", - "- **The setup**: anaconda, Python, pandas, Jupyter\n", - "- **Importing data**: from csv (and options), from the web, creating from scratch, converting types, rename cols\n", - "- **Summarizing data**: len(df), shape, value_counts, head, tail, max(), min(), mean, dtype, info(), describe(), memory_usage(), scatter matrix, corr, isnull, notnull, unique(), nlargest\n", - "- **Selecting and computing**: select subset of rows and cols, .loc, .iloc, drop columns, assign, apply/map/applymap, multiindex\n", - "- **Filtering and sorting**: >=, AND, OR, ==, ~, str.contains, str.startswith, sort_values, sort_index, filtering on sorted/unsorted, isin()\n", - "- **Split-apply-combine and pivots**: groupby, dt.month, dt.year, groupby.mean(), agg, stack, unstack, pivot, melt, merge\n", - "- **Time series manipulations**: downsampling, upsampling, rolling, mean, simple plotting\n", - "- **Plotting**: built-in plotting, advanced plotting, matplotlib, seaborn, styles, saving\n", - "- **Modeling and machine learning**: .value, feeding data, saving data\n", - "- **Misc tips and tricks**: pandas options, vectorization, timings with %%timeit, profiling with lprun\n", - "\n", - "**principles:** small examples, no more than 5 rows. one or two data sets, no more." 
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# The setup" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Python and Anaconda\n", - "\n", - "If you haven't already, start by installing Python.\n", - "The [Anaconda Distribution](https://www.anaconda.com/download/) is a good choice; install version `3.X`.\n", - "- If you're on Windows, you will get a program called *Anaconda Prompt*. Open it and run `conda --version` to verify that everything works.\n", - "- If you're on Linux, open a terminal and run `conda --version`.\n", - "\n", - "## Pandas, NumPy and matplotlib\n", - "\n", - "To install packages, run `conda install `. The Anaconda distribution comes with the three packages we will require, namely [pandas](https://pandas.pydata.org/), [NumPy](http://www.numpy.org/) and [matplotlib](https://matplotlib.org/).\n", - "\n", - "- **NumPy** implements $n$-dimensional arrays in Python for efficient computations. See the [arXiv](https://arxiv.org/pdf/1102.1523.pdf) paper for a nice introduction. To learn basic NumPy, consider doing these [100 NumPy exercises](https://github.com/rougier/numpy-100).\n", - "- **Matplotlib** is the most popular library for plotting in Python. See the beautiful [gallery](https://matplotlib.org/gallery.html) to get an overview of the capabilities of matplotlib.\n", - "- **Pandas** is a library for data analysis based on two objects, the [Series](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html) and the [DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).\n", - "\n", - "## Jupyter\n", - "\n", - "The [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/) is an environment in which you can run Python code, display graphs and work with data interactively. Think of it as a tool between the simple terminal and the full-fledged IDE. 
Move to a directory using the `cd` command in the terminal, then run `jupyter notebook` to start up a notebook. " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Importing packages" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "import matplotlib\n", - "%matplotlib inline" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "To make this Jupyter Notebook reproducible, here are the versions of the libraries we will be using." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "pandas version 0.22.0\n", - "numpy version 1.14.2\n", - "matplotlib version 2.2.2\n" - ] - } - ], - "source": [ - "for lib in [pd, np, matplotlib]:\n", - " print(f'{lib.__name__.ljust(12)} version {lib.__version__}')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Importing data\n", - "\n", - "Using `!` lets us use terminal commands. The `head` command shows the first rows of the file." 
- ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,movie_title,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes\n", - "Color,James Cameron,723,178,0,855,Joel David Moore,1000,760505847,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar ,886204,4834,Wes Studi,0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1,3054,English,USA,PG-13,237000000,2009,936,7.9,1.78,33000\n" - ] - } - ], - "source": [ - "!head data/movie_metadata.csv -n 2" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "It's a huge file, so we'll only load a couple of columns into a pandas DataFrame.\n", - "To familiarize ourselves with [magic commands](http://ipython.readthedocs.io/en/stable/interactive/magics.html), we'll use `%%time` to time the execution of the cell below." 
- ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Loaded data of size (5043, 6) into memory.\n", - "CPU times: user 44 ms, sys: 525 µs, total: 44.5 ms\n", - "Wall time: 42.9 ms\n" - ] - } - ], - "source": [ - "%%time\n", - "\n", - "cols_to_use = ['movie_title', 'director_name', 'country', 'content_rating', 'imdb_score', 'gross']\n", - "df = pd.read_csv(r'data/movie_metadata.csv', sep=',', usecols=cols_to_use)\n", - "print(f'Loaded data of size {df.shape} into memory.')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The df.shape attribute gives the number of rows and columns of the DataFrame. \n", - "This leads us naturally to consider summarizations." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Summarizing data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Three methods are useful for peeking at the data: df.head, df.tail and df.sample.\n", - "Head and tail are $\\mathcal{O}(1)$ operations, while sample is $\\mathcal{O}(n)$, where $n$ is the number of rows.\n", - "For small datasets, this makes no difference in practice. We'll use df.sample here." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
| | director_name | gross | movie_title | country | content_rating | imdb_score |
|---|---|---|---|---|---|---|
| 3097 | Darren Lynn Bousman | 63270259.0 | Saw IV | USA | R | 5.9 |
| 1999 | Roman Polanski | NaN | Carnage | France | R | 7.2 |
\n", - "
" - ], - "text/plain": [ - " director_name gross movie_title country content_rating \\\n", - "3097 Darren Lynn Bousman 63270259.0 Saw IV  USA R \n", - "1999 Roman Polanski NaN Carnage  France R \n", - "\n", - " imdb_score \n", - "3097 5.9 \n", - "1999 7.2 " - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.sample(n=2, replace=False, weights=None, random_state=None)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We should make sure the data types are correct. To do so, we can use df.dtypes, or df.info() for some more information." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "RangeIndex: 5043 entries, 0 to 5042\n", - "Data columns (total 6 columns):\n", - "director_name 4939 non-null object\n", - "gross 4159 non-null float64\n", - "movie_title 5043 non-null object\n", - "country 5038 non-null object\n", - "content_rating 4740 non-null object\n", - "imdb_score 5043 non-null float64\n", - "dtypes: float64(2), object(4)\n", - "memory usage: 236.5+ KB\n" - ] - } - ], - "source": [ - "df.info(verbose=True, memory_usage=True, null_counts=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We have some null values. Let's count them by chaining df.isnull() and df.sum()." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "director_name 104\n", - "gross 884\n", - "movie_title 0\n", - "country 5\n", - "content_rating 303\n", - "imdb_score 0\n", - "dtype: int64" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "null_values = df.isnull().sum()\n", - "null_values" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The result of the above is not a DataFrame, but a Series." 
- ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "pandas.core.series.Series" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "type(null_values)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We can make the output prettier by converting null_values to a DataFrame using to_frame(), then transposing using .T, and finally renaming the first index." - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
| | director_name | gross | movie_title | country | content_rating | imdb_score |
|---|---|---|---|---|---|---|
| Missing values | 104 | 884 | 0 | 5 | 303 | 0 |
\n", - "
" - ], - "text/plain": [ - " director_name gross movie_title country content_rating \\\n", - "Missing values 104 884 0 5 303 \n", - "\n", - " imdb_score \n", - "Missing values 0 " - ] - }, - "execution_count": 9, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "null_values.to_frame().T.rename(index={0:'Missing values'})" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The above is called method chaining, and can be written like so:" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
| | director_name | gross | movie_title | country | content_rating | imdb_score |
|---|---|---|---|---|---|---|
| Missing values | 104 | 884 | 0 | 5 | 303 | 0 |
\n", - "
" - ], - "text/plain": [ - " director_name gross movie_title country content_rating \\\n", - "Missing values 104 884 0 5 303 \n", - "\n", - " imdb_score \n", - "Missing values 0 " - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "(df\n", - " .isnull() # Figure out whether every entry is null (missing), or not\n", - " .sum(axis=0) # Sum over each column, axis=0 is the default\n", - " .to_frame() # The result is a Series, convert to DataFrame\n", - " .T # Transpose (switch rows and columns)\n", - " .rename(index={0:'Missing values'}) # Rename the index and show it\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A tour of summarization would not be completed without df.describe().\n", - "Calling df.count(), df.nunique(), df.mean(), df.std(), df.min(), df.quantile(), df.max() is also possible." - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
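| | director_name | gross | movie_title | country | content_rating | imdb_score |
|---|---|---|---|---|---|---|
| count | 4939 | 4159 | 5043 | 5038 | 4740 | 5043 |
| unique | 2398 | | 4917 | 65 | 18 | |
| top | Steven Spielberg | | King Kong | USA | R | |
| freq | 26 | | 3 | 3807 | 2118 | |
| mean | | 4.84684e+07 | | | | 6.44214 |
| std | | 6.8453e+07 | | | | 1.12512 |
| min | | 162 | | | | 1.6 |
| 50% | | 2.55175e+07 | | | | 6.6 |
| max | | 7.60506e+08 | | | | 9.5 |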
\n", - "
" - ], - "text/plain": [ - " director_name gross movie_title country content_rating \\\n", - "count 4939 4159 5043 5038 4740 \n", - "unique 2398 4917 65 18 \n", - "top Steven Spielberg King Kong  USA R \n", - "freq 26 3 3807 2118 \n", - "mean 4.84684e+07 \n", - "std 6.8453e+07 \n", - "min 162 \n", - "50% 2.55175e+07 \n", - "max 7.60506e+08 \n", - "\n", - " imdb_score \n", - "count 5043 \n", - "unique \n", - "top \n", - "freq \n", - "mean 6.44214 \n", - "std 1.12512 \n", - "min 1.6 \n", - "50% 6.6 \n", - "max 9.5 " - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.describe(percentiles=[0.5], include='all').fillna('')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Visualizations" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
| | gross | imdb_score |
|---|---|---|
| gross | 1.000000 | 0.096247 |
| imdb_score | 0.096247 | 1.000000 |
\n", - "
" - ], - "text/plain": [ - " gross imdb_score\n", - "gross 1.000000 0.096247\n", - "imdb_score 0.096247 1.000000" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.corr(method='spearman', min_periods=1)" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAhIAAAE8CAYAAACVc1hkAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAIABJREFUeJzs3XeUXNd94Pnvfa9y6pzQjUY3EolAACTBTEqUqGhZpGlbGkuW7XEYe+SZY48n7KQ9Z9YT9kzYsWfPenY98npkr23RsiTLsnIWk0gCJIhEgEjdjc7V3ZXzS3f/eIUiwAaIRgONQP4+5+h096t6VfeVIL1f3fu7v5/SWiOEEEIIsRrGjR6AEEIIIW5dEkgIIYQQYtUkkBBCCCHEqkkgIYQQQohVk0BCCCGEEKsmgYQQQgghVk0CCSGEEEKsmgQSQgghhFg1CSSEEEIIsWqBGz2AW0F3d7ceGRm50cMQ4h1pYmIC+d+fENffK6+8sqS17rnc8ySQWIGRkRFefvnlGz0MId6R9u7dK//7E7eUquWQKVusa49iGupGD2fVlFJnV/I8CSSEEEKIa8R2PT730iSlusO2gRQf2tl/o4e05iRHQgghhLhGbNej3HAAyFasGzya60NmJIQQQohrJBYK8L5tfZzNVNk70nGjh3NdSCAhhBBCXEM7B9vYOdh2o4dx3UggcRVG/sXXr/lrTvzHj1zz1xRCCCHWiuRICCGEEGLVJJAQQgghxKpJICGEEEKIVZNAQgghhBCrJoGEEEIIIVZNAgkhhBBCrJoEEkIIIYRYNQkkhBBCCLFqEkgIIYQQYtUkkBBCCCHEqkkgIYQQQohVk0BCCCGEEKsmTbuEEEKIVchVLL5ycAbDUDyxZ5C2aPBGD+mGkBkJIYQQ7xi26/HVQ7M8tW+SpXLjql7rRLpErmqTKVucXihfoxHeeiSQEEII8Y5xNlPh9EKZ+UKdg5P5q3qtjT1xoiGTRDjAaHf8Go3w1iNLG0IIId4xelMRYiGTuu2xoSt2da+VjPAb79qIUuoaje7WJIGEEEKId4xUJMgvPzSK62miIfOqX++dHkSABBJCCCHeYUIBWdW/liSQEEIIIdbY0ycW+LMXzzLYEeNf/8S2t1Uw8/a5EiGEEOIm9e3X5inVHV6fK3IyXbrRw7mmJJAQQggh1tj9m7owlKK/LcLGnrfXDo81XdpQSv0i8EuACfw88E+BvcABrfVvN5/z+zfLMSGEEGItPL57kA/tGHhbLWmcs2ZXpJQaBN6ttX5Ma/0o0AfEtdaPACGl1D1KqbtulmNr9TkIIYQQ8PZN8lzLGYkPAqZS6vvAMeB14HvNx74H3A94N9Gx/Vd5vUIIIcQ7zlqGR31ASGv9GFAF2oFi87EC0HGTHbuAUurXlVIvK6VeXlxcvPKrF0IIId4B1jKQKABPN3//QfNn6ryf+eZ/bpZjF9Baf0ZrvVdrvbenp+cKLlsIIYR451jLQOLHwK7m73sADTzW/Pt9wIvACzfRMSGEEEJcoTULJLTWB4GaUupHwD3A/wHUlVLPAp7Wep/W+sDNcmytPgchhBDi7WxNt39qrf/pmw4t2
2Z5sa2XN+qYEEIIIa7M23MvihBCCCGuCwkkhBBCCLFqEkgIIYQQYtUkkBBCCCHEqkkbcSGEEOIWZ7se+8azmIbi3pFODENdt/eWQEIIIYS4xR2cyrNvPAtAIhxg52DbdXtvWdoQQgghbnHx0BvzAonw9Z0jkBkJIYQQ4ha3fV2KRDiAaSoG26PX9b0lkBBCCCHeBoa7YjfkfWVpQwghhBCrJoGEEEIIIVZNAgkhhBBCrJoEEkIIIYRYNQkkhBBCCLFqEkgIIYS4JeWrFj94Pc2J+dKNHso7mmz/FEIIcUv6/vEFJrNVDk8XGGiPkIoEb/SQ3pFkRkIIIcQtKR42AQgFDELm2/92NpOvkSk3bvQwlpEZCSGEELek923rY1NPgp5kmEjQXNP3euVsjhfHMmzpTfCBHf1r+l4Xc3g6z/ePL2AoxSfuXU9vKnLdx3Apb/8QTgghxNtSwDTY0pekPRZa8/c6OJXHcjxemy1St901fS+tNeWGg9a6dSxXtQHwtKZYt9f0/a+UzEgIIYQQl3HHYBsvjmXY3JtY89mPbxyZ52S6xJa+BD+5ax0A9450Yjke8ZDJpp7Emr7/lZJAQgghxDue1ppSwyERCmAYatnj9452cu9o53UZy/hS2f+5WGkdi4ZM3r+977q8/5Va0dKGUmqTUirc/P1RpdRvKaXa13ZoQgghxPXxzaPz/PGz43z18OxVv9aPTizwJ8+Pcyq9um2pD2/poSsR4uEt3Vc9luthpTkSXwJcpdRm4I+BUeBzazYqIYQQ4jqayPjf/ieWqhfkJlyM62kqDeeij5XqNq9O5slVbV4cz65qLHvWt/OLD4xw53DHqs6/3lYaSHhaawd4EvhvWuvfAQbWblhCCCHE9fPI5h66k2Ee2dLNt1+b5y9eOst8ob7seZbj8RcvneUzz4zx8sTyQCEeCrCu3d9RsaknvqqxHJ0p8NS+SY7OFFZ1/vW20hwJWyn1CeCXgI82j0nlDyGEEG8Ldwy1ccdQG1PZKk+fXARg/0SWj+5ed8HzCjWbTNkCYHypwt6RC/MmDEPx8b3raTjeipIyTy+UqDRcdg62YTZzM54+uYjleGTKDXYOtl2Ly1tTK52R+GXgAeA/aK3HlVKjwJ+v3bCEEEKI66dQszkwmcM0FEFTUazbjHYvn1HoToTYNdRGdzLM/Ru7LvpaSqm3DCJcT1Oo2kxmKnz10Bw/eH2Bfectgwx1RJs/Y1d5VdfHimYktNbHgN8CUEp1AEmt9X9cyblKqX8M/LTW+mGl1O8De4EDWuvfbj5+0xwTQghxYxTrNsWazWB7FKWW75pYa18+ME2uahM0FQvFBvmaRaayvIqkUorHtvXhuB6BVVbT/NKBaWZyNXqS4dYxzRt5GR/dtY5CzaYtemtM/K9018aPlFIppVQncAj4rFLq91ZwXhjY3fz9LiCutX4ECCml7rmZjl3RpyaEEOKaKdVt/uyFs3zh5WleOJO5IWPwmvfxpVKDE+kS6WKDH59ePhbb9Xhq3yR/8MPTHJ7OX/H7OK7HTK4GQN12+ciuAR69rYf7Rt+Y3TAMRUc8dNFtqDejleZItGmti0qpXwM+q7X+N0qpwys479eAPwX+Lf7SyPeax78H3A94N9Gx/Su4HiGEENdYpeFiOR4AmYp1Q8bwxJ51nFoo0xUP8UfPjlNpOOwaWp6fkK/arSTME/Mldg1dWSWEgGnwrq3dnJgvc/eGDrb2Ja/J+G+klQYSAaXUAPBx4F+v5ASlVBB4t9b6vyul/i3QDpxpPlwAdgDuTXTszeP/deDXAYaHh1dyyUIIIVahvy3Cw1u6WSo1eGDTxfMO1lpXIkxXwl9q+J33byFXsbmtf/lNvise4rb+JLP52qq3Z969oZO7N1yf4lbXw0oDiX8LfBt4Xmu9Xym1ETh1mXN+gQtrTeSBVPP3VPNv9yY6dgGt9WeAzwDs3bv3rTcVCyGEuCr3jNw8N9aBtigDbdGLPmYYiq19CZKRAP1tq
2ucdXqhxKl0mV3r2xlsv/j7XAmt9Q3JKzlnRTkSWusvaK13aa0/3fx7TGv9M5c57Tbg00qpb+F/4+8GHms+9j7gReCFm+iYEEIIQbpY5/X5Iq63/DtkoWrztcNzvDyR4/vH01f82o7r8Y0j87w+X+LbR+evapy26/H5/ZP8Xz84zbHZ4lW91tVYabLlkFLqy0qpBaVUWin1JaXU0Fudo7X+51rrD2qtPwS8prX+XaCulHoWv8DVPq31gZvl2Ko+PSGEEG8ruYrF5/dP8c0j8zx3eql1vFi32T+RJVu1CDSTIMOBK2/eZRqKZMRfDGiPXd2ujGzFYjZfx/U0x+duXCCx0qWNz+IvU3ys+fenmsfev5KTtdYPN38u22Z5Mx0TQgjxzma7Xmsm4vx24V8/PMd8oU4oYPDTdw+Sq9gXJEpO5ar87cFZdq5L8e7bepe9bqFms388S18qws/dM0y6WGew4+qWNboTYUa748wX6+xef+MKV600kOjRWn/2vL//RCn1j9ZiQEIIIcSN0puK8MEd/WQrFndveCOZ8lwGglLQm4iwru3CYlF/8IPTTCxVeO7UEtsGUvSmLsyf+M5r8/z4zBLxcIB/9L6tjDSLXR2fK1K1HHYPtV9xXQrTUPzUnYNXfpHX2EoDiSWl1KeAp5p/fwK4MZt9hRBCvOM8c3KR6VyNR7Z0s77z2ld8LNVtxpcqbOiKk4oGcD1N0HwjgfEjuwY4MV9iuDN20Rt+tFnJMmAqQoHlj48tlZnN1zENRc3yZzrOZip8q5kn0bA9Htx8a3T7fLOVBhK/AvwB8PuABn7cPCaEEEKsqWzF4pWzOQBeOJNZk0Diy6/OkClbmIZCa/C0ZqnS4D3NZYpkJLisr8b5/ukHbuMHJ9Js70/RHgste/zO9R1U6i6JcKCVG2Gct9PiVik+dTGXDSSUUibwM1rrx6/DeIQQQoiWuXyN9miQzniIbMVifWes1eb7Ulsei3Ub19V0xJff0C/lXEEsy/Uwm6/ruCvf+Z+IBHh896WXGd57ey9DHTG6kyGSET+QWN8Z44k966haLtsHUpc8F/zdIt87niYeNnnftr5Vl+deC5cNJLTWrlLqCfzZCCGEEOK6+IMfnOLZU0sMdUT5D0/spOFpLMfjfzwzBsDOdSm+/Vqa2weS/Px9GwB/6+Zf7Z/C1Zqf3DXA5t43EiJt1yN4iRvwE3sGOTFfYktfgmLNJlOx2LP+yqpWvhXL9chXLUIBg/OGxMaexIrOPzCZYzJbBWC0O3HRYlk3ykqXNp5XSv0B8Hmgcu5gcxulEEIIcc291qyNMJ2rUbU9OhMh9k9kWzkGf/zcOFXL5WS6xLs297C+K8ZiqYHT3HWRLjZagcTXD89xMl1iz3B7a7nifD3JcKuJVl8qwpZrfC0/fH2Rk+kSSsEvPTByRbMlAIMdUQ5N5wmaBrP5Gt87nub2/iSPbeu7xiO9cisNJB5s/vzd5k+Fnyvx3ms+IiGEEO9YWmtOpEuETIOfuWuQrxyaZddgG50J/8a7tTfJ63NFUIptA0leOZsnGQnQHveXC27rTzJXqGM5Ht2JEP/zuXG6EyFOLZQxlOLkfOmCQKJuuwQMdc2XCt5cbfJc4qahFKZ55fkQW/uS9LdFCJkGf/7iWSzH4/B0gXdt7bnkLMv1stJA4mv4gcO5q9dAUSm1R2t9cE1GJoQQ4h3n1ak8T59YBGDvhg4e2tTNpt43pv/bYkF+4YERADzP48hMgfUd8VbeQdA0eP92/1v637w6Q6FmU6jZbOlNkC42uHvkjS2dJ9MlvnlknnjY5BP3DhMPX/qWeHSmwEvjWbb0JnjX1p5LPs/zNH97aJaJTIVHtvS0tpC+5/Ze1rVH6U6ESUVWV4jq3Hm39yf54YkFdg213/AgAlYeSNwN7AX+Fj+Y+Ah+t8zfUEp9QWv9n9dofEIIId5Bzk9wfGk8i2ko5gp17hhsIxK8sJKkYRjsXt/xpvM9nj65iO16DHfGOJup0
hkP8sGd/YQDJnXb5al9k5TrDpGgwXSuSiRokC7W3zJfYd94lmLN5pWzOe4d7Vw2lnMqlsP4kp8BcGyu2AokgqbBzsFLF41yXI/XZou0x4Js6Iq/5WeUqVjEQwEKNfuG99mAlQcSXcBdWusygFLq3wBfBN4FvAJIICGEEOKq3TXcjqH88tOL5TqHpgr0psKEVvjN+0S6xOHpAuB/g//N92wiYCiePrnI8bkSfclwqw14uuSwVG4QDpgELrPcsLk3wdMnF7itP0X4InUiTqZLfPnADNsHkgx2RDk2W+Td581czOZrPH1ykb5UmPfc1rvs5v/c6SVencyjFHzyvmF6k5duCJYpWyilyFVsXE9fduxrbaWBxDBwfpN4G9igta4ppRrXflhCCCHWQsNxmcrWGGiLXHQq/8BkjudOLTHaHecndw1c92+7AdNo1WvQOsWd6ztIRgIrrrPQFQ9jGgpPa3qSYYKmgedpXp30mzzPFGp0xkOUGw7DXSnioQBKXb5vRqnuEAn6Mxpa+xUuz/dHz4wxma3yymSWu4Y7aIsGmcxWuas5I/HSeIb5Qp35Qp0d69roe1Ply+aOVrR+4/dLef/2Pl6dyrOlN3FTbANdaSDxOeBFpdRXmn9/FHhKKRUHjq3JyIQQQlxzf3twlulcjbZokF9+aGRZoPDaTAHX05xeKFOx/AJKV2KhWOevXp5q1ki4uvLNSqkr3t3Q3xbhlx4YwfY8uhP+LgzDUAx1RNk3nuWxbb18aOcAWmtcT3NsrkgqEiQZCXB8rsj6zthFr/lsxq9MWbNcyg0by9Wt1wfojIeYzFaJBMxW0GO5XuvxeCjAq5M5BtoipJpNuzxP8+zpJaoNh/s3dvlJo7HgsiDjzQylqDacZcHMjbKifyFa63+nlPoG8DB+jsTf11q/3Hz459dqcEIIIa6tUt0BoNJw8DS8eVZ811A7z532ZyTioSvvbvnHz41xcMpfWtjSm2D7uuvfTKrtTV01tdZkKhYDbREWSv4kulKKgKnYNeTXivjzF8+yWGrQHgvyyw+NLnvNmu0yk6uCjvGX+6eoNFzuG+1slbX+Jx+4jRfOLLG1L4ntaiazVe4YeuPaD0/nqVoOM/kai+UGw50BziyWOdCs2BkJmRfdlnoxv/fdE0znanz3WJo//NRdBFfRhfRaWnGoqbV+BT8fQgghxC3qw3f0c2S6wJa+JOZFlgt2r29n91UUYjq3e8I0FMnw1bXJPl+uYrFYbrCxO37R6fyXJ7KMLVa4d7Sz1RDrfAo/eLjUUs1UtsrYUoXB9ovPBswW6iilmC/WGGyPYJoGs81cC4BQwOChzd2tsQ13XVjGWymFaRgYSrW2P7bHQgQMheNpes6b3biccwmptrfyyptr6crmrIQQQtzSBtqiDLRdXfvqt/L337WJ2/uTrOuIsr7r2vTEqFkun9s3ieV4bBtI8aGd/csef/bUEgDPnFpcFkgopfjZu4eYyFQuqHR5PtNQxEImAdO46E6I3UPtBA2D7kSY+zZ2slBs8NB5TbaeP73E37w6w5a+BL/68MZlQdqn7t9ATzLM+o4o6zv98fUkw/zigyM0bHdZt9C38g/fu5nvv77A3pGOGz4bARJICCGEuIYCAYP3be+//BOvgOV62M18g6rlLHs8HDDoToZZKjUYbL94kNSVCNP1Ft/6t/QlCJoGgx1RTi2UyZQt7hxub23zfPLOQbYNpNjQGbvoTf9vDs4wma0ynavyyJZuXM/f6XHu/L5UhF9s1r84X1s0CNErm7nZ0pdkS9+tVyJbCCGEWDNus49G9CJ5GW3RIB/eOcBsvsZdwx3LHjcMxc/ds55S3aEjdvmb8sRShXLDYdtAqjVz8PjuQTLlBo72+Py+aQCKNYvhrjiRoMlod5x73qL752hXnNl8jY5YkO8cTeMBZxbL3Lm+HdvVFxTVeruRQEIIIcQNda5IVKFm89jtfRckKZ5zW3/yoo2qClWbWNgkaBp0rmCHx2y+xpdfn
QH8LqEPbvKXJ0p1m7PZKp2xEIbyt4+ezVY5NlcC4KfuHCRgKHqS4QuKUZ1eKGMoGO6M8sxJTVskyGSuSq5qs1iq84c/OoOnNf/kA1v5wI6BVX0+NzsJJIQQQtxQmYpFvmoDMLZUvmggcY7naV6d8gs3VRsO+ydydMZDfPK+4RWVi3bPS1A8//cvvzpDvmqTCAd419ZuJrNVOmJBXjnr15/4/vF0a8bj3tFO0qUGkYDB5/dPoRTkqzY12+XYXInR7jipSICpXI1i3b+u505nJJAQQgghVuuVszmKdZv7R7uWLV8MpCJsG0iyWGq0ilFdytHZAs+c9HtxeFpjKEW2YlGuO62aE+eKRl1smWR9Z4wP39FPqe5c0Cbccjy/c6jr8czJJTyt2TXYxiNbuokETV6eyAJ+R9FvHp3HUIr5Qo255s4NBeSqFm3REB3xIHP5Og9t7qJmubiex+O7Vx5EaK2p2S7RoHnDy1+vhAQSQggh1tRkptq6+Wutee/tF7a+NgzFh3Ze+kZ7rhHWdK7KSHecbMUvtPzYtl7mCnWGOqLsn8gytlRhc0+CLx2YxnY9fvuxLdwxdOFW1rrtcniqQKnhMNTxxg4WrTWFmkUyYhLQ/syGq3UrsGmPBXl1Ms+6tigvjmewHI/bB5J4zSqX6zsi5KoOAVNhoBjuipGKhPijX9zrb+9Mrnx757eOzvP6fInb+pP8xB03/yyGBBJCCCHWVDRktvIOEquoLZGrWq1GWIcm87wwlkEBD27qJNZ87ddmiwB848hcK9D47rE0Y0sVepLhVi7EVLbKTL4GwNGZYiuQODZXYqls4XgeD23uYTpXY9tAqjWGoY4YQx3+dtbRnjhL5QbDHVG+ezyNoQzu2tDeDDQivHw2R6nu0B4NcnS2gONqHtzcddky3OecWSz7PxfKV/xZ3QgSSAghhFhTPckwn7hvPeW6w+hFikVdTkcsxGh3nOlclaVyg1wzUPjsjyeIhwIkIwHuHO5gfKnCo7d389RL01iOSzCgGFusMLZYYWN3gv62COvao7THglQaDlv73thJMdQRRSmIBg0y5QbRoMmh6TzrO5fXwuiMh+iMh3jlbI7TC36AEw4YlBsOdcfjk/cNkylbZCsWP3h9AfAriZ5cKLOxO85P3fnWpcMf3NzN4an8stmUm5UEEkII8Q6QKTcIBYxW5cnrrTcZ4RK1oC7LMFTr5vu5l87y/JkMAAFDUbVcbNcjW7FojwaZyTXYO9KB1hALBSjWHKIhk3jYJF+1aIsG+aUHRnBcj9B5uy9++q4hjszk2didYN94lnLDYX3HWxfUOr8nx7G5Iq6nmcnVcD3NUrnRCk60hu8eTzNfqLNvLMPOwdSywlhT2SrfOjpPeyzIE3sGL7rN9WYlgYQQQrzNvTZb4DuvpQmaik/cO/yWhZkuZypb5emTiwy2R3n0tp5rngzoeprD03lCAYMd69qoNByiwTcaYd072sWTd/qzAFv7khybK9KbjHB8rkiuajHaFac9HkJruH9jFz2JMIlIgM8+P8FrswV2D7ZzbK5IueHw6Uc3tXIgPvv8GM+fzrB3pJP/8ORO6rbnF4tqqloOp9JlhjqiVBou88U6dwy28bN3D6EUnFmscOBsjljI5MWxDIZS5KoWHbEgluuxrj3CfKFO0DQ4lS7xwpkMd2/oZPs6f/nkyEyBcsOh3HCYzdda1TkPTObIVy3uG+26aLfWm8HNOSohhBDXzELRb1Rlu5pc1bqqQOKFsQyLpQaLpQZ3DLVd0AHzWnh1Mtcqd31kusBcoc5ge5SP7R1CKcXm3gS/9OAoSvnlvj9Qt9Fa81++fRLX0/QkI3xs7xCW67VyGhzH47lT/k6Mrx+Za+3m+OGJxVYg8fzpDLbr8eJYhtPpIoWay+717ZxeKBMNmRyczDOTr6EULJXqZKs2j27poSsZxlCKh7d0s3NdimjQ5Kn9UxRrNpbtUbFcAO4b7eLu4Q4G22N867V5AH58ZqkVSGztS
3JmoUxbLEjNdvnqoVm6k2FebM6+2K7mgzuubcXQa2XNAgml1H3A7wMu8LLW+neUUv8MeAI4C/xdrbV9Mx1bq89CCCFupL0jHZQbDolwgNHuq6uwONodZyZXozMeIrUGyyTnz3DM5GsYSjGTr5Gv2liuR28yzLrzymAnI0E8TxMNmZzN2KRiAT63b5Ka5fKrD4/Sm4oQCBjsHExxZrHC3cMdnFooUajbbOtL8odPn6EtGmTP+nZePptjW3+S7xzzd5i8OpXHcvzS3OcqYJZqDk+fWsLzNIulRiuJsyMeYtdgG4ah+Pn7hslULCo1m//lr4/guB6PbOluBS3H5opMZqus74jxraNzVBou79vWxz94z2YMQ/HvvnaMmVyVrkSY/lQEx9MkIzfv9/61HNlZ4L1a67pS6i+UUo8A79FaP6yU+ufATymlfnSzHAO+sIafhRBC3DDJSJCP7l53TV7rnpFOtg2kiASMi3bhvFoDbRFOL5aJBU0+dd8GDs3kGe6M8aUD05TqDndt6ODdW3sAmMnV+JtXZxjtjnEqXaJmuXzvWJq67d/8n3ppkmQ0SE8yzG8/toWZfI267fL1p2apOx4/OrnAYEeMmuXyW49tYUNnlELd5c9eOIunNUFD0dAapeDRrb2UG/72zgOTOSzHoz8V4VzcM5ev8aMTC6zviPHknYMMtkf567EMQVMRNEwOTeVbgcS5gCMeMfn+sQUs16MjFuS92/xtsdO5KvmqjQb+8fu3Uqw7jFyjBmhrYc0CCa31/Hl/OsAu4EfNv78HfBKo3kTHJJAQQojLKNVtvvPaPFv6kuy6yl0FharFy2dz7F7fRnfCb4T11UOzzOVrKGAqV+UXHxghV7F47tQS5YbDTK7aOv+///A0ZxbL/OiERzDg51FEgiYKhetpCjUbDyjUbI7PFhlbKqO1JluxAc3YYplIyKQn4c9ymKZJZ9zk4/cMkavYRIIG//P5ceKhACM98dYMzKcf3cTJdImfuWsI01QYSvGd19LYjsfZTIVS3aEtFuT2gRSxUADP060lDMf1+OZr87ie5vhcgVenctiOvqA41ru39nAyXWZrX+KyzcZuBms+V6KU2gV0A3n8ZQ6AAtABtAPFm+TYm8f968CvAwwPD1/pZQshxNvS//n9UxyZLhAwFL/3d/bQ1+yEWbNcHM+7ol0h//7rx5nMVumMh/h/PnU34NecWCjWMQyDVNS/RYWaWytzVQvXg+eaORSmAblKg0QkyK8+sIGZfI1Ht/aQq9pYjkc8bPLDE4u0RQN862gaAFd7dCdC1B2P2/qTdMXDaPxg41wPjbOZKpmyBWg6YyGUUhyazFOxXDb3xvnAjn4+0MxXmM3XMJS/g2T/RIaNPQniYf91tg+k+N8+uh3b02ztS1KzXIKmIhkJkK/aGEoRNA1MQ7e2tALsWJeiWLfZ1p/im0fmyFVtHtvW2/qsbzZrGkgopTqBPwA+DtwNnNs8m8IPLPI30bELaK0/A3wGYO/evfrNjwshxDu4tGP9AAAgAElEQVTRuZwBT4Pj+b8vlRt8fv8Ujqv5yK4BNl+k0+UPXl8gXazz5J51REL+rWe2UCNTbmC7Hp7nYRgGm3oSPLatj4ChWrMUntYMtEXpS0UYWyrxpy+MA7CpO46n/UDj3tEO4uFewgGTrx+epm55PLy5h3jIZKg9yo51KY7NFXloYxdt0TDZisVId5SnTywRCZpMLFV4cSxDdyLMvnG/HLbjuXz10Jy/a0RpPK04mS7xxG6DTNUiZBp895gfoByayjOTr1OoOWQrFr0pf5fG1w7P4Wm/udeJ+RKd8RBP3jlIruoniRaauR/nF7967rRfOfPrR+ZaOzVeOZu7aatcrmWyZQD4c+Cfaa3nlVL7gd8E/jPwPuBF4GY6JoQQ4jL+4Xs285VDM9zen2Kw3V+3Xyg2WgHGbL62LJA4NJXjfzx9BvCrVP7mo5sBiAdNqpZLTyqM360C7hruoFR3CAeNVrfPZCTIE
3vWMVuo8cyJRSp1B4DJbA3TMKhbLn/xwhRzxToKjy8dmEUDm3sThEyDYEChtWZssUoiZLJjsB3TUBydKTGdqxELmXz/eJpoKMCpdJlI0MB2NQfO5slXbQrK5mS6zObeJEp7/M5fHSRbsbh7Q0drZ0i20qBmuSjgm0fnKDdc+lJhSg0HrTVHZwoETYNsxaLheK3CXL/6yCiVhsvOwTcalQ22RxlfqrC5L0G57m8J3dAVQ2t9U/beWMsZiY8B9wD/qXnh/xJ4Rin1HDAJ/DettaWUuimOreHnIIQQa8JyPNzmjoXrpTcV4e89sumCY5t7E0xkktRtlz3Dl8+bqFkuAVPhaE04aOC5b0z6RkMmH9q5fJvjSHecke5465s6wPZ1KcaXKrRFgzx3ZhFQzOSqWK4f1ExmKjiexkBje6CBF8ezbB9sw3Y0lusRChi4WpOtWKTnS9zen+Q33r2RcsMhX2lwbLaIgeJdW7rZO9rFVKbC//v8WTzP48h0vlXj4vaBJDXboy0aYDZXIxIKMJ2rcWK+hO15PLlngG+/tsCm3gQn0yW+emiWe0Y62b1++ef1+O51zRoUIVytWwW3/u8fnSEWMvnY3vUXFMO60dYy2fIp4Kk3HX4B+E9vet5/ulmOCSHErSJftXhq3xSW4/GTuwfY1HN12zqvRihgvOW0++71HXz60U2ki3XWd0b5lT/ZT1ssQKnuULNcarbHc6cXOJWu8HP3rGeuWCccMC9anjoaNLirGawMd/r9L5SCE/MlFkt1RrrjzBcbaK2JBE1s18M0FG7Dr4AZNg0Wig3yVZvH9wwQCRi0x4IUKjalhp9b4ffrsPmJ3etQhkEyEuCjewaJhQKETEVnLEip4dDXFm0lQrZH4wy0xeiKhwgHTU6nS2jtsX88gweUazbRUICDk3lqlkNHzF9C2b2+fdlMw2K5wcl0ia19SfpSEYKmwYtj/nKH5XhM56rc3p9a9tncKDdPSCOEEGLF5ot16rafvz6Zrd7QQGIlHr2tF4Df+twBJrMVyPo9NDZ0xXE8j3/110fRwPeOpalYDsGAye8+vuOCKX+Auu0xlfObbv3UnevoS8XojAX4V18+SrZiMdQRpS/l39zXtUWYyFRpiwXpjHosVWy6kuFWA7BT6TL/4sPbMA3Fv/zrw+SrDlrXiIcDWK7HTK7Ge2/vJRI0iTYTMYc6YvyvP7mdyUyVuzZ08OJYBrNZwttQinBA8Rf7poiFAzx7ahHL9dDAYqnBcFeAoGkw2p0gX7UZ7orxJ8+PU7FcHt+9rhU4feXgDJWGy7HZIr/xbn/2Z/tAG2OLFWKhABs6r7xfyVqSQEIIIW5BG7sTbOpNULMc9lzD5k5HZwo8f3qJke44H9jed8Vr8plygx+8vkBfKsIjW7qXnX/bQIpXJnMEAwb/4NFNzBTqxEIm//2HZwDNWKZCqW6jUDx9YmFZIDGdq7UCgflig/fc3s9CscbphTKO52/tvK0/Sd32GO6I0HA04YAiV7VJRIJEgyZBU1G1XW7vS6LRgOKOwTa64mEiQYOvHZ6lUHMo1W36U37xq45YkC19SZRSPLKlh+oGh/ZYaNnSRKXhUKz59Q03dsc4vVButU7P1yw29iT4hfs30HA8pnM1jjW7lp5aKLUCiXDApNJwCQfeqNPR3xbh1x7ZeEX/XVwvEkgIIcQtKBQwePwaFZk636uTOaqW/234oc3db7kWX6jazORrbOyJt7ZO/tXLU/zg9QUiQZOhjigb3zRTsmeojVPpTuJhk7tGOvlwKoLWmnzV5sximXLd5rnTSyggYLxxIy3WbbJli5fGlig0b9T7xjN84t4NJMIBIiGTTLlBOGBwbLaI42riQYNoyA8c7tqQoFh36EmEODJdoO54HJsr8o3X5v0y31rz/dcXGO2OMZur43geh6by9O/wG28FTYN94xk2dMX4+uF5CjWbHetSTGarGEpx/8ZOjs2VGO6MsXMwxaHpAr/x7s08vnuQmu1SbjhMZ
v2ZlL/cP8ViqcHOwRR9qQhVy7lg18ZP3zXI2UyVDTdxEarzSSAhhBCiZdtAiudOLzHSFSf+Fkmcjuvxl/snqVouw50xfubuIQDGFsuML5YJB0zKDWfZeQulBlO5KrGQn0fw8kSO2/qT/OZ7/J0cXz80y8l0maBpsKk3ztcOzzLYHuUrB2dZKjc4lS6hm7mZubLFZKZKJGSA1rieZjpXZaHk12SYztf5yK4BNnYnuH9TFxOZCmfS5VZr768enqPhuBhKka9auJ5moVjDdjUefgOxn75rkHDA5A+fPs2hqQKpaIBcpUGp7rbyGAD+9tAs4YDJ+GIZlCJgKF4az7BYtrAdj0dv6+b4fImh9ijpYh1DKSaWqnz4jn5qlkv/eTUiEuEAw12xmyqh8q3cGqMUQgjxlhaKdWLhwAU3n9MLJQo1mzsG2wkF3rqctev5d+e9I53cNdzR6rZ5KZ5+o6ZEzXZ4eSJLw/HzAUzDwDShZrtMLFXoS4X5xpE50sU6s4Ua07kaAUPxtcNz9KYijC2WW30mEpEge0c6MQ3FMycXWSpbBE3FbMG/+bqeJtAcm2EqvvjKFK7nMZuvY7keDdvh3B6QSsPiyEyByWyVLX0Jqg2XDd0xNnYnqNkusZDJZLbqJ2NqqDsekYBBRzxE3XHZs76dM4tlIkGTg1MFFop10kWIhwxQfkfQ/RNZAobigzv7eX2uRG8yBErhuJqT6RIn5suAP6PSFQ9TsfytnpOZKiNdMb74yjRac0Evji8dmObVyTx3Drfz5J1D2K7XmvG5Ui+cyTCZrfDgpu6LJq9eCxJICCHELe7liSzPnloiHDT41P0bSEWCzBfqfPWQv02yWHd4TzPZ8WJm8zW+/OoMAUPx8b3r6YiHLvueoYDB43vWMb5UIR4yWx07qw2XcNAgHDD58qvTZEo2yajJyxM5XE+TPC/QUUozm6+xqSfOkZkCi6UGnfEgpuFXfMxULCazVeLhAJW6zULZYmtvnPmC383UcTTPn84QDRoopdFaY5gGCj+gqTYcprJV0gb82Ytn6YiFaI8G+Nm9gyyVLB7Y2Ml//s5JBtuifPu1eRzXIxhQDLaHyVZsokGTQ1MFABJhkzlP05sKMdqdYK5Qpy8V4eBUHgU8e3KB0wsVBtsj/MpDGzmeLrG1r4vT6TKuBkPB8bkCbdEQf+ee9XxwRz/jSxWONnMkJjIV9k1k6YiF+M7Reaq2y3Su1srV+OCO/guWP1aiWLd5cczvHvrsqSU+ed/aVGmWQEIIIW5xiyX/xtqwPV44k2Gh1GB9h7+2rzUYl0mYHF+q+FsL8ftbXCyQqDT8b/rnz3hs6IqzoSvOdK7aeq9zXUEjQYPXZ0toYCbv0ZzwIBY2KTUcYkGTzniIcsOlUHf4wetpzhWlKtVtgqbRKj3tuh65qoWp4MxSBbtZJ2Ku4OcyxIMmtqNxPIgoRcDwa0YYhsnJdImgabClP8VMzs87+MqhORq2y0vjWWqWy+nFMsrAb8DlaY7Pl/E8zb6JLB/cMYChlN/Qq2Yz3B5lPl9nrlgjGQ1Qt10UigNn81Qsh4VSnS8dmMZtzths7E3gupqG7TGRqRILWq3dNqPdcXYPtZOrNggYiobtMV+o0x4LQg3aY8FW4ub4UuWKA4lY0KQrESJT9nezrBUJJIQQ4hb3wKYuHE/TGQ/xyln/m3+xZvNTewZbSYHnazgu4cAbU+XbB1Icmy0SCqiLbiM9vVDiv3z7BFrD33/XRqq2R39bpFXBcqgjxsfuHqJuu/zVy9MslOskQwGSkSDpYp3tAyk+sKOdhWKDhu2SLjZouB6Zsk1vKoKhYDZfp1S3SUYCzBfqKKW4rT/B1v4kUVMxna/5yxFBE7sZlVQth5rt11bwgIABruf54YiGiu2CBtvx+NGJReq2n9dQrDm4WlOzHEoNxw9a0GgNWvnP9zQ0HI+P7R0iHDD5uc+8wNlMhbHFMkopFHBqvszdGzowlGLfeIalikfIVBycyuFqm
C/U2NKXxNN+y3Hb1TQMl9dm8uwbzzLUEePwTB6tYUtfgnDQoCMW4qfvGmRsqcLm3jiHpgrkqjZ3DS9rCXVZAdPgE/cOU647K5plWi0JJIQQ4ibieZq64xILvfX/PTccl28cmaNquXxoR3+rTXimYnFmocxIl18J8s1+eGKBg5N5NvbEeWKP33IoW7WoWA51W1FuOK3+DufsG8+Rr/rfjP9i32SrCNQvPzRKWzRIsW7zzaPzNByPg1N58hWbat0hHDAYaIvieR4PbuqmWLM5PJ0nPhsgaCruHe0kXWqwuSfO4ekCxbqNYWjGliqYhuK+0XZOzJcZ7oqzYyDJZK5GPGRSbCZx2q6HpzWWB/3tEZZKFu0Rk7mSBRpCWoOCoKnIVS10MzgImAau59FwPOq2h+tB3XbRGmqW5wcU+LMhjqsJGH4gYCiF5Wg6YiaWq9k2kCQSMDBNg3g4QCxkEjAUqWgIpfzZlyPTeVyt2T3Uju16tEWDvHw2TywU4FS6RKgZ0LVFg63S4eDP9gCs7/B/LpTq/OmPJ4iGTB7fva6VM1G3XYp1m97kxRt6BU1jTYMIkEBCCCFuGq6n+fz+KdLFOvdt7OTBTd2XfO7YYoWJJb+l9uGZQisH4qO7Big1nAtyEc53Kl1qne+4/k01XayjNThas1BsLOsy+fCWLl44s4SnNXcMppjK1UmEA62kx/HFCgen8jiux4n5IparsV1NX1uEaDDA+k4/B8FyPLb2xelNhuhNhlko+/0pnj+ToWo5GMrvf1G1HBTw5y9NUml4hAMZbFfjuJq+VKi1VDPcGedstkoybPLIlh4MpTg4mWW26O/aSIQC9CT9G/xUtkqjmUxZd1xcD2qWQ8PxsFyPoAGWpwmZoA3QHhRrDv/4CwcJGwY71yU5OltiqCPKz983zOvzJcp1m789NAvApt448VCAwY4It/enOLVQpj8ZZWKp6i+zKNjal6Q7ESYaMslXbbYNpBhoj1JpONzTTLS8lKMzBbIVCyp+PsXt/Snqtsv/98IElYZ72X8va0kCCSGEuElULId0sQ7A8bkiZ5pFlp7YM0jnm75VrmuL0nBcKpbLyHnZ+EopwgHjkoWkOmIh9k+k2buhg4Dp7+QYbI9yNlshZJqsa1/+zXa4M85//fgeNJrnTy9xIl0mEQ5gNgOJquX4BaFcr5W/AHD/aBc/sWsdmXKdf/KFw9iux0vj/k3dUIpCzV+aGOyI0puMMF+sEw0aWI5GKXBcFw+oWrq1E2OhZNGT9CtXns1WyVVtqnWbZ08ukqlatEcCfo6EBk97LJUaRIImnbEg6VKDVCRIuejPPrgaYiE/F6PhaFAaZRhEDYXjaUwDTs2XmkWounnv7VE6Y0G++MoM2WoDPN263qVSA09DpmyRab5nzXHojIdaO2KOzRboToT53Sd2YruaSNDgi69MU7Pci84gOa7fVyRftdm+LkXQVESCJoPtfr5DsW5TqNnYjtf6d3MjSCBxkxn5F1+/5q858R8/cs1fUwhx7SXDATytOTFf4s7hdoo1fwr/xHyJBzZ1XfDcYt3GdTWm8pczRnv84986OsfxuRLb16X44I43ml+li3UiAZNspcGmnjiVhtOakZjO1Vpll8eXKiyWGwykorTFgoDfZOuLB6ZwPU2mbHFstsh8ok7VcokETVyt6YiF8LTG8TysXI1QwOC2gSTrO2O8Pl8gX7VwXE1eW9iuh1KKfM1msC1K0FCEIyaeDtGwXUzDD4jaowGyFZt42K/0qIFY2GglK9ZtF09rag6kS3U8D6qGS9A0/OROpXA8j5rtULM1AdMgW623li4SIZOhzhipaJBnTi7iNfMpdo90kK/amIYiXfSXNNLFGo6nmM0rjs8V/JyUWJBkJIihwDAMLMvB8TTpcoNqw6UrlqI9FsT1NNmqTcVysQp1P29kXRsn5kuUmp1Mx5bKywKJmXyNsUW/iudiqcGnH92MoWgFialIkELVZrHU4K4NV55Dca1IICGEE
DeJYt2f3t82kMI0FIlwAFdrNvUsz3VIF+scnS3iac3WvlKrBsHJtF+34PRCmQ/u8J97ZLrA946nCRiKfM3i6EyRzb2J1ozElr4Ex2aLBE3FiXSRYzMlelNhnrxrkMWSxWSmwhf2TwNgGoqlcoOq5ZCvWnTGQ9wx2M49Ix3ULJdqb5yvHp4jEQ6Qr9r84dNnsG0XT4OHpj8VZrbQIGQauK7m2dNL9KXCLJX93QzxkInr+VtDFRAw/RmWoY4IhZrD1r4kZzNVDGCpYlF3/GWQiuXPDDieRzTkB2SRQICaqYkEDAKmv1SRCJtUm4FIJGBQsVyCptnaVeJqvz255XoEDJo3br/ypuv51193/CfXHY8n9gyhFOQrFoemc/QkwiQiQcIBk6rlkC1beFqTjJjULZdI3GSgOaOwoSvG+s4YVcth57o2ziyWSUWCrRmX3mSEtmiQUt1hU0+iNQP0xr8Xm65EmK5EmFrzmm4ECSSEEOImkQwHGO6MMZWrcveGzlaXy4stU6SiQUa6YtiepjcZ5vX5Iu3REPdv7OLITIE969/oUbFUadZd8DT5ik04YNCwvdaMRG8ywt97l9/H4Vf+ZB8n5ktEgiYz+SpVy8PzPM5mK6ChNxmiYXutJRSAYs3mRLpEw/FwHY+goVAaPr9/ikLNpisWxnI8HM9DGQrTUJimYjJboWF7TGUrWI5uzgj4Mw9oWCrbeIDt2nQnwv4sRrnB2YyfG3Ku8KbfLcP/6Xj4FTU1pML+2D0NnbEoNdsjYJpAs9lZrg6qjqloLZ0o/G//rtYkQ36AoYBCzSFfs4kEDQLKL8gVDZiU6v7MxfZ1baRLDQbbo+wYTJGr2NRth7GlClpDKBDgw3cMoJTi5HyJhVKDPevb+dlmRdAfn1nipbEspqF48s5B5go1tg+08XcfHMHx9EULivUmI9w72sl8oc5DNyg/AiSQEELcYuYKNWLBQGva/VYznauSLjbYsS61rFqhYSh+5u4hPE9ftrLkpp4EH9jZT93yb7zfPDKPoRS/8MAG7h29MHHv3pFOGra/E6Rct8lWLS5VWsJx/VLTrqc5PFXE0RoDjdf8yh40DVLRAMlwoBXgvHI2y6l02V/acDVVy6XhaA5PZrG1YtKocC5zYrG5VODpZhlqrQkZBp728xac5hdrzRs3d0/D+JJf2CldqLdeq3Hel3BD+bMJARPOfTlfKDXw8DuGVq2yvyvjvLLdbvONXH2ugoXv3PbSSnMmBQ25ioWroeq5pCImNduf6fjKQT/Z8p4NHZimwnY9+pJhqg2XXUOdzBbqeBo+sL2XfeM5bhtI8OJYFk9rFkoNfvXh0dbncjJdIho0+d+/cYxsxWaoI8p//fgeQm/xb+GhzTcugDhHAgkhxC3jwGSOp08sEjQVn7xvw7IExJtdoWbzpVdm/JtIsc6H7xi46PMuF0SAP8V+bqfGd4+lAfytkM2y1Zbj8YVXpsiULZ68c5AP7fTfa2KpwkSmyrqOKIZSrZoSS6U6QdNkS1+CTMWiPRqgarnUHY+waWAYBlprPK0p1G1cz6/DMLZYptqwmcpW8bRmuCtGWzRIyDSYyTm4zSBENb/1R4MGSxWbgGkQCfjbKU3l4eqLXibg3+Qt943fzzn/lHPnnz/D75z3hLrzFm9w3mt55x07L28Ut5lX4WoIBkxQmtlivXXe4ek80XCARLhGrtrAdmGx3OD2/hQN2+Vrh2Z5aSLH946Z3L+pi2zFai1HnbuwkGkQNA2mczZK+YHQrUACCSHELSNb9rf12a5fcOlWCyTO99a3tSvz8OZuokGTjniQ/jZ/18WJdJGvvDpNw9EETcWnmzUKTsyXKNZsTs4X+dMXJijUbHqTYb5+ZI6AYfCe23pgBPraIrw0nmEqV2O0P0m9GaCgtV+p0fH4o2fGqDkek5kKznlFooKmSSJsEgwoPEc3t5b6p88XLTzAcl2q/n+dVO2V3eQB3rpjyMqcWwY59/NSj0cCioarWwWu3OZjj
uPiakUsZFKo+5FLPBwgHDT9xz3/mseXKnzveBrtaVztV7e0HI+a5bKhK46B3+yr2nDoS4UZ6Y4TChg8elsPL01keXRrzzW42rUngYQQ4pZx38ZOHM8jFQneMi2Wz9cWDfLknYOkS3XuGGy7/AkrFA2ZPLzFn+J+9uQi3zw6x0hXjNOLVRzX49RCufXcmu2gFJTqLtmyhWEoXhzLUK77x799dJ7xTIXuRJhwwCAa9HdMPLipG63h+dNLlOsOluNxfK5IJBQgW2m0bsi27dCXjGIaYDn+ts3z0wDP/8a/GtcipVC/6eelRIIGjvYrZSo0rusvnYz2+LM2o10xXp7IohTsWJfi4FSOwY4oQ+1RDk8XGewwyFdsNNAZD2C7HrGQyd0bOijWHWKhAC+NZbBdj/dv7+OT9w0TC5kkI8FLzlZdiu167B/3cyzuGelc0azWtSKBhBDilpGMBFtT9NeT1v66/5srPq7GcFeM4csEQSvJkdBa8+JYlprtsKErztMnFumMh/i975ygVLcJBQxSkQCW45eVPmf7ujamcjV2DPjtryeWKtwx2MbphTKmYbBQrmMaiky5wcaeOA1Hc3t/ki19CTwPfnRyAcNQaKAnGaZqu8TDQaBZx8Awmc5VSUbMqw4abpRzAUau5octF+RPuHB4uoDjQb5iU3f8J7wwlsH2oDhf4WS6ggbmizUiQdVM2FREgoafmDmQ4o717UwslvnigSksR///7L13kF3neeb5+066ue/tnBNyIkAEEoyiKCqLlExTwbYky5rxSq6a2dmqnfGuJ+/YNVs766rdsmfGu6UZz9rWSLYSJVmZokQxiQQIEETOnXP3zenkb/843Q00EkEQIAjg/KpY1bz33IOvTwN9nvN+7/s83N2bWTYC2z+a4+B4ka096ZXbH1fg4HiBPcM5AJJRjc1d10+ovhmhkAgJCQl5E757YJLRbI27utO8f1P7DftzfF/y3QOTjOdrPLy2lZ2L3gC+LxnLBWFa6VjQZHpqtsLf7h3F8SS9TXFiukqx7pCt2tiuj+H59DYlqNkuG84Le/Kl5IHVLVQXmw4HWhKU6i4Prm5ZbJ70+cXxeTZ2ppgtmUwU6syWExTrwZN1Y9xgtmQS1zVKpkO2YlMwz+3lF6rBFEP1vKZGAH3JJEqAfwmFkYmqWK5P/U16GW4U52Y5LubCFTmL6y+YbrDVsjjFscRSv0bN8uhrjuNLSBgqhbqDQJBJGOiqoLjoIaEqwSjnEq+cDZw8XzmbXRYShZrNM8dmSRgaH9zcjq6u3OSJGefE4pvZq19vQiEREhJyR5Kv2hwYz9PTGGdde+qyxy1tDZTrDqfmytckJF45O8/fvTbOe9e18eSOYNyvbrv89SsjRFSVT+3qZiRbJxHRGMsFo43Hp0vLQuL50/O8MVYgoiv83gMDxA2NhYrFXCnYUuhvjqMIQSau05wwyNVsGhMG6ztSlE132QkRoLcpzjNHZ9jak+HAWIFsxWKgJc7PjsyiKIJH1rbQmDAo1GxGczV8H359ZoGK6SGRbO1pIB3VycQ1Dk8UsT25olHSh+UJj/NJRFTqtgfepbc3XMfDf5OU0hvJm22Z6CLo81AuOFZZvJ+f/y0rLPZYGCoNMR0pIaYpaErQV/Hy6Xm+uW+czoYoCgIpVgqBNW1Jjk+XWdt+LkDtwFiByXwdgNWtCTrTMZLRc+6im7vSJAwNTRX0NL6z236hkAgJCbkjefb4LBP5OocminSmo6Silx4nXXq6nsjXaU5GrunP+udPHyFbsXjpTJYPbGojGTX4by8N89VXRxFCsGckh+V4NCcMUjGdofkqT27v5hfHZ3E8Sc0OnlwtJ2jUixsa6ztSbO/LYDo+H97cGfg1JHX2Dac5MVuhpzHK0akSjudzerZMVyZG3FDZN5JjumhiOlks28fyfPYM5SjUbQSCnxyapOYGHg3puEHVchEo5BddNl85m8P1g4mCK01aXEixHoypXq5ZsrI0i/kuRdcUH
Me/SHC4l1BFjQmdmu1xV3cD3ZkEnpQcHM/jeJJ8zeYHB6eIGhrjuRo9TTHqjkdXOsb+0RzpmMHuwWaklOwePOdm2tsU49BEEUNTOD5d4idHZujOxPj0Pb3Lx1wqpO2dIBQSISEhdyTJxX6HiKauKBOfnCnjeD6buxoQItjfbktFaIwbZK7Ru8J0POxFW+iS6ZCrukwV61RMF4RgeL5CJm5QMWts7U1zd2+G/aO5ZZfKx7d2kjBU1rSliOgqp2fLpOM64/k6JdPhjfE8+cV0zlzNwXI9yqaH6/l4vmTfSI5nj88R0QWT+TrZqk1MV0lFVCqWh49cvCHK5bK96YFdtpGAc95exNL7b0VEBGcOuFX7JmrOm69cAKqAmuVguUHuxvs3dFC1HcbzVSYLdQxVxdAUslWLRETh8EQZ2/N59tgsHemgcnRsqkC26vCjQ9P8p8/uBJYfUSYAACAASURBVGBNW4rffziGpgq++sooEFhoO55/0TbHpXA9n5fOLCBl4D1xKYOrayUUEiEhIXckH9jUzpq2JG2pKLmqzXiuRlRX+OWJeSDogt/e14iqCJ7Y1sWZuQpbutNMFurU7cCy+HzHSdPxLjKYWuLhtc3sHS7Q2xjlq68Ebo+aIkhGA1OnR9e3MVMyGWxJLDpb1ulKx5aFxKnZMk2JCEcmC7x8Zp6RXJW4rrJ3OBdYT/s+iYhOMqIxVagH2Q5Vi46GKCXTZaFmcWauiopAETLYYgB8z8f0fIS8tCpYunVe6qk7ZCXNMZWqI0lGFBaqQfXm1FyVP/nxcQBa4iq+D1L1KS9OvUzkTQqLvSdn5yt0pGMIAcMLdeqOS81yeeboDIWaw6Mb2patsx9e28r+0TzrO5JXJSIAjk2XODBWAIJmzDdLG30rhEIiJCTktmKubGKoCpn4xR4TpuPx3Ik5hIBHN7Sxtj2F6Xh89dUJHC/wW1ji/Ftrf3OC/uYE08U639o3jpTw0NoW7upOoymCZ4/PcXy6xIaOFE0Jg7F8jY9v6yKiBcLiQ5u7aEvFSERUXjy9gC+DpsV7B5tRBGzryeCO5tjSneaxjUEPRsUM7JWD6kg62I5wff7utTFMxydhKCiKipSSsuWxUHZIRjV6G2OM5up0ZqKM5+o4nkQTQc6EQtDY5/hg112UxSZBQ7t5vQm3C7nFrRvzMqorW/PQVXBcyUi2iuNLMlGdTFzHcnwe29DO/WtayMR0Zoomp2fLdGaC7SmA10ZyfHRxJHR9R4r1HZfv67kUmZgRmIJJaLzOrrChkLgDCBNFQ+4Ujk0V+fqeMSK6ypfes2p5nA7A8yWHJ4ucmCkD0NYQZUdfI2I5TVHSmY6xsbMBx/NX+DzkqjZjuRqKgDOzFWzXIx3TePnMAsmIRtl08XzJi6fmeelsFs/3OTRe4J99cAO6JlCUwGmyNRWhvzlB2XRY05bkxdMLaIrgT585wWTB5FuvT/CdP3gABAzNVzg2HYRyre9IMpKt0pOOUrd9fKDu+Ay2xDEdDwXJTKmOVg1GPot1GyUflNZd38dZnIQIGiHPXa+lBkH7Jk1K3E5caJrlAxEFrMXrnYqqOJ4koqsUFrehinWHHQONmI5PS8rg+29M0tEQJRlRqTseLckIFcslX7V5z7oWaraLqohlgfpW6GuO8zu7+5CSFf8urgd3tJAQQvzfwC7gdSnl/3Sz1xMSEvL22DucXzZf+sXxWSQw2Jzg7w9OcXa+yse3dqAsbUdI+P4bk/Q2xVnVkuCNiQIbO1Ns6mpYcU7fl/zVy8NMFU1SEY2RbBXL8WhMGLQkIxR9B9P2ODRZpLMhgum4+L7kzFyF/+/Xw+iqwum5MsPzFdqzMf7J+9eSrdq8MZYnV7EQimAyX8P1g33sv3xpiLaGGKdny5yZqwCSb++bxJOS41MCXRPLN6TRbBVfSiw7KIMbmkKpHthXV20Xz1uZWRHyzrBU6Tm/OOG6HjFDpzkRRH8v/VxOTJWwPZ//uFBhIm+iKrCqLUlEV
Tg4USQd1anZLi+eWuCl01kMTeG37+0lEzfwfcnPjs4wUzJ534Y2+puv3GzZlrq+AmKJO1ZICCF2AAkp5cNCiP9HCHGPlPK1m72ukJCQa2ewJU5LMoKmCsZzdQxN4dB4gRdPLyAE/OjIDP/v53YiEPz1K8PsG8mTiGj0NcaIGhr7R/NsusDIx/clBycKmI6P43lM5IMehNmSyXzFIhMzKNaXGh1tepvilOoua9uTSBlUIl46PU+u6nBGr/ChLe0sVGyminVmyxZCCBKGRqHuomsKqWjwa3mmaOIuhj2M56s4HggBq5ui1D2BadlUFjseZ6tOYM3s+CCDp2F53pjl5aygQ24MSwWe8yc8ai7UXIdc7ZxfhJQwXjBXfNb3YSpfQ1EUWpIG5bqD40v2j+ZRVYGuKjy0poVM3GC+Yi1X2PaN5JeFxA8PTvH0gQnWtzfwhx9ah6Jcv8bKS3HHCgngfuDZxa+fBe4DQiEREnIL88CaFiK6SlQPQqgOT5ZY05bk0ESBqu2xqjW53DuRrdpYro+ULulEEI3d23Tx/L2iCDZ3pZkpmUhfkq86uL5EVQRbuoOYb3u+gu36NCUMdvYn8aVkY2cKy/WJ6sFkBAQplEcmSzTEdBYqNo2La/F8SSYuMFTBzv4mFEVBFZIz85XFz3lAkBrVlonT05jgjbEcC/VqsMbFIosAVDVI1lQWJ058KVHFuYRLX15ZVCyV5QXnnqxDEXJ9UQj+Xkl56Ysb1bVFLwhJIqJRsVxaGwxKdRdNEcvJrY1xg5akQbZqr/Cc+NmxGaqWx+tjeaZLJt2ZG+srcScLiQxwdvHrIrD5/DeFEF8CvgTQ19f3zq7sFuBG9F1A2HsR8vbQVYX7Vp2bvb9noJlkVOODmzsYmq/y4Jpz7z2xtQtNmWawJcmndvZgOv4lo8kVRfDFBwcZyVbpbYzxlReGKJkOv7Wrl5NzFTIxg4/e1cGJmTKbOhtIRDQKNYf1Halls6Ct3WkOTRZpS0VZ35FitmTxmXt6+fsDU6gqxA2VF09l6WqM8t51bUQMlXsHmyjUXVzfZ7pQ54XTCyQiGv/0g+spmi5PbOvkn37zIJbrs3tVE6+N5EkaKv3NCc7OV1jXnmSyYFI2Xe5f3cgLp7NENYU1rQkOTZboyUQ5NVfB8SR9mQiTJRsFwVPbuzg+W2F9R5KDE0VmiiZRVWF6MYkyaSjUHR9DgfolXJzC6sc5IgrEIxqeL2lLGZxZqKMp8P6NbZyeq7KpI8VPj83i+5J7+zMcnamQjul8+ZFB9g4XuG9VE6YTTHnc1dPA0ckSUV1dtlg3NIXP3deP7fkr+iZ2DzTxw8PTDDYnaE/emO2M8xHyMmM/tztCiH8EzEspvymE+E2gR0r555c6dteuXXLfvn0XvX6jbqYh15dQnNza7Nq1i0v9+7se2K6ProoVY5w3iolcjY6GCJqmLmdpmI6HIgSGplAxHRIR7bJrGZor094QIxE99/xXrjtYnkdLMkqhFnhDGJrCQsWiJRnBl2C7HjFDo1QP8jeiusp8yaQpYVC3PSYLddZ3NpAvm2iaSioW7MlHF29MVcsNTLLmSjieZH1nmvFchdZkDM9x2TOW430bOzkzWyRfsblndSvPHJtkU1uaprjCdw/N8Mnt3Zydr3Jqtsxv7Ozn50em6G6Msr4jzTf2jPHk9g4mizYHxnN86p5B/vOzJ2hLanzqvjX84d/t5X983yrqjuCbr43zr3/jbv7Lc8cRQvD7793Av/veAT6/ewBV8fiLX43yHz6zk2cOj5Ktuvz2fav5i58f5f2buuhtivKXLw7zjz6wkUOj85yarfCp3av4ycFJ1rbH6cnE+Nqr43z2vl4OjRf55ckp/ujx7bxwbIrWTIyNXY18Z98IH1jfhicFr0/keWxTF786Oo2uw4PrOpkr1mlMRHBsl2OzFXYNNmE5HlJKoobGbKlOYzyCoSmYtkfUUCnWbWqWS2cmaJzVVWVZgEJQrXL9QCg4no8qx
FUFctmu/7a9IoQQ+6WUu970uDtYSOwAviyl/LIQ4i+Av5JS7r3UsaGQCLmQUJy8c9xIIRESEnJ5QiFxFQgh/gzYARyUUv7jyx3X0tIiBwYG3rF1hYSEnGNkZITw319IyDvP/v37pZTyTcsad3KPBFc78jkwMBA+EYWE3CTCikRIyM1BCPH61Rx3Y2dCQkJCQkJCQm5rQiEREhISEhIScs2EQuIGUbNd7uT+k5CQkJCQO4M7ukfiRvGL47Mcmigy0BLnye09N3s5ISEhISEhN4xQSNwAzi660Y0s1HA9H+0qY17fLtPFOlMFk81dDZeNMw4JCQm5lQnN8N59hELiBvDA6hb2j+bZ0JF6x0REzXb59r4JXF8yka/xibu735E/NyQkJCTkziYUEtfAwfE8f/XrUbozMf7nD6y9KBBlS3eaLd3py3z6xnC+ZXvYmhESEhIS8k4RColr4DuvTzJVqDNVqHNwosj2vsa39HkpJbmqTSqqv20L0yUSEY0nt3czXQy2NkJuPbIVi2RUW+GZHxISEvJuJxQS18CW7jQnZ8qkYzqDLVfOf78Uvzo1zxtjBZoSBp/d3Xfdtj96m+KXTC8Meffz6zML7BnO0RDT+dx9faGYCAkJuWUIhcQ18OldvTy8poWi6XBkssTW3jQN0YtTAy/HVKEOQK5qU3c8UlcQErMlk9OzFdZ1JGmI6rw+lqcpYbChI6w6XAkpJQfGC0gp2d7beFUhNzeTycW/E6W6Q8V0iSRDIRESEnJrEAqJayQd1/nW/gk8XzJdrPOpXb1X/dn3rG3llaEsA80JUm8iQL53YJKa7XF8usSq1gSHJooAZGIGHekbHw97q3J0qsTzJ+cBUBWFu3szN3lFV+ahtS28dHqBrkyM5mTkZi8nJCQk5KoJhcQ1ogiBqgg8X162z6FYc/jBoSmEgCe2dS1XLd7KFoShKdRsD0NT0BcrF4oQaOq7+wn7zRhZqPKVF4cwbY/P39+/os/k2WOzjGSrPLC6hU1dDYznavz82CzNSYOP3dV5VVtBkfN+JvotcK0607G3JEZDQkJC3i2EQuIakFKyULH42F0dnJmvcnfPpSc0Ts6WmS9bAJyaKdGRjpGJGyQjV3/Zn9rZw+hCjYGWOHFDozlp4PqSt3NrdDyfmaJJW0Pkpu3F7x3OcXYu8Nt47uTcspAomw6HJ4Oqy/7RHJu6Gjg4UaBYdyjWHaaLJqoiSEQ00rHLV3PWtqf4+N0CKSVr2lLXff2W6zFXsmhviF63htmQkJCQW5FQSFwDz5+a58BYgbmySVPc4NRsmS/cP0DiAoEw0Bxn36iCQDBdtHjxdJaYofKF+weIGVd3A2+I6tx1nlDRVYVnjk6jCMEnd/XQnYm95fX//RtTjOVqtKQifP6+/rf8+evBpq4GfnnSwHI8dvSeq0YkDI3epjjjuRrrF/tA1ralODtXJRPXGcvV2Ducw9AUPre7n3T88mJidWvyhq3/O/snmS2Z9DSGlYSQkJA7m1BIXAPjuRpD8xWminWimkpEU3h4bQtbutLsG81TMV3uW9VMW0OUL79nNQL4+4NTANRtj6rtXrWQAJgvWxwYyzPYkiBbsQHwpSRftZeFxP7RPMW6zX2rmokbV/6x5qrBOfJVG9+XKIpY/vzuweaLBJHnS/YMZXF9yX2rmq/LE/iW7jT/6be340uJcV5VRFEEn9zZg+P5y1s56ztSrG5NoCqCnx2dBcB2fUqmc0UhcaMIxneDSlN28VqGhISE3KmEQuIacH2J50vqtk+h6hAzVPYO50lFdF46vRAcJODR9W2oi9MC71nXiq4qdKQjtLzFZrpnjs0wV7I4Pl3m9x7op2K5aKpgQ0dQsh/P1XjhVNBY6PnwgU3tVzzfBze3c2iiyIaOFIoimMhf+fPHp0vsGc4BENVV7h1sekvrvxxX6nXQL3hv6dj7VzcjpSQTN27aqKsQgg9v6eT4dIm73mHjsZCQkJB3G6GQuAaimsKZ+Qq5qk0mq
jJTcvjO/nFWtcTJ12xOz1U4NF7gK8+fZW17in/z+CaaEgYf29rJL0/M8ue/OM32vgwPr21dcd7vHpjg2/smGGxJ8G+f2Iy++OSfiurMlSzihkoiol10o09ENObKJiemy+wdyTFVqPGpXb0rKhN12+Pb+8cpWy5PbO3iiW1d504g4dBEAdvzWdcebAdMFep8/40pJvM1hBCUTGdxymTlXxkpJT8+PMPZ+QoPrmlmZ//1ERmXIx3T+chdnTfk3MMLVX58eJpMXOepHT0r8kqWrkdMV/jkrl7WtCVZ03bjtk5CQkJCbhXCLrFrIBHVqJoOpuORq9lE1GCC4+RMhfXtKda3J5ks1inUHA5NFJgtmUBw0z08UcLz5fIY5xK+L/nRoWnqjseJmTIvnp7HdDwAPrKlgye2dfFb9/Ze8im+KWHQENWI6iqe5zNTshjP1VccM1mosVCxsRyfEzPlFe+VTJe1bUnWtCZJLIqPk7NlTMfj7HwVX0raU1E+tauHlmSEM3OV5Yh00/E5NVvG8yWHL/ieroXJQp3RbPVtn+daOD5dwnZ95koWM0VzxXtL1yNfcxjL1m7K+kJCQkLejYRC4hoYmqswVbSoWB65msd0yWKqUGdzV4r7VzfT3hBjoDlOxXIQgOMFN10hBNt60xiawvYLfA1ePLOApgiyFRvPlxwYy/P9NyaBoMy/pi15Wc+JU7NlinUX2/VoiOl0Z6L0XVD272mM09YQIW6obOxcOcXQ3xyMo/Y0xtnQGTQ4buxoIBFRWd+RoiUZ4aG1LcQNjb/dO8YPDk7xytksAFFdYWNnCkNT2Po2vRpGFqp887Vxnn59kmNTpbd1rmthc1cDMUOlMx2lM7PSo2PperSkIvQ3h+6hISEhIUuEWxvXwIHRAitysWRQcs/XHFa1qbQkDWzPJ6prNCUizFcsRnM1SvVAWMQNhclCnb1DWaYXO/+rlsv6jgbWtadwPImiKFSsoCJRs12ePzlPVFd5z7rW5b6L18fyjOdqJCMa7Q1R2huidGWiRHWVsumsaOh8Y7xAKqrz4c0dFxkeJSIan79/gFeHsvzyxCySQFysaknS15TgvetbieoqU4U6nh985xXLBc71C1wLQ/MVDk8W2dgZfN9L5wSo2u4VPrmSvcM5ZkomD6xufsv9JxBUQfaN5BhsSfAHj6y+5DEd6Shfes+l3wsJCQm5kwmFxDVwcLKw4v/jhkJnOsqhiQIjCzUOTxY4OhlsYcR0hUPjBXI1m1zVpmZ5zFUsOtNRfnlijnXtKYbmK3zi7m5ihkpLwiBmqIxma2zuChr5Xh8tLG9HdKSjbOxsoFi3+cXxWTRFob0hwu5VTVRNj8OTBYQQ2K7Pk9u7kQTTGUsVBCklH9/WheX6K3oA5komr5zNcnKmhJTQENNJx3QSEY3GuM69g000JQzet6GNfM2+Lg2XPz82S832GMvWWNuWZFNnA1XLxfXlVTtRLlQsXj4TNLh6vs+T23tWvO/5El/Ki5o3IbgWluvz3Ik55ssWwwtV1rQl33TqJSQkJCTkHOFvzGsgbqgU6ueemG1Psmc4x6vDeRpjGghBue4SNxRcKXnm2AyWI2lK6nSmY1Rtl4Sh0hg3kFJydr7Cd16f4KE1LZyYKTORr7N7sGnZArslZQCgKoKmhIHr+Xz/jSkOTRTpTEfZ1pvmgdUtWK7HSLZKxXKJ6ApfeXEIKYMei5ihUrc9WhIRvrVvgslCnftXN3PfqmYAklGNmKFiaArDCzVyNZu1bSmS0cAE6+nXJxnL1dg10Mh717ddl+vYkows+1kIIRACdi+u52pJGBpxQ6Vme7QmV25HFGo233htHNv1+Y3t3RdNefzw0DRn5iq4no+mKjREdYzrFKAWEhIScqcQColrYFd/Iz84NIMEBEGjpC/BR1IyHXRVpa1BpzUZZaA5wd6RHJm4QWPM4Lfv7WVkocps2UJXBKtak8yW6uwfzXN6tkx/c4K4oXF6rkJnJkbNdlnVkqC/K
YahqTQlDIp1h/FcjbZUhK50FE0Ifn5slgdWN/Po+lbG8zUSEY2zc1Ucz+fls1ke39qJpij40ue/7xklYaicmjXYPdjEydkyqhB8/r5+jk6W+NmxGSKawurWJO9d30oiovGDg9MAnJ6tkInpnJqt8NiGNjIJ45quoel4dKWjgOTewUA8nJoNqi7r2i92onQ8n+PTJZqTkRUmXDFD5SN3dXBqtkJTQufoVJGNHQ0oimCyUKdmB9tDQwvVFULC9wMBB4GI+sTd3TQljOuWxPpuxHZ9fnJkmtZkhAfWtNzs5YSEhNwm3LFCQggRB74FJIAi8GkppXU1n33h1MJyj4SE5S0Cy/WQUmC5HrNlSV9TgpeHspTrDtmqTUc6wl++NMJkocZYtkZEV9nW3cCJ2Qpj2RpxQ8VyJY+sa6WvKc73DgTNlrbnc2A0jxCAgA9taidbsclWLRzf57WRPJ6UnJwp4foSKWFjZ4quTJRXh3L4vuSHh6b5Hx5exZ89e4qh+QqOJ3l4XStHJks8ezwweXp8ayfb+zNMl+oU6w4PrW1Z7qfY2d/ImbkKg61x/vRnJ7Fcn8OTRf63j2++puv/s6Mz/OTwDMW6zViuziPrWnl+yctii2Rj58p001+dnOfIZBFVEXzh/oFlIyrL9fjBwWnmy0HD68bOBmq2xz0DTaxuTdLbFExbXOj3oCiC3YPNHJ8usaO/ka5rcAi91fjrV4Z59tgcAImIyrbzHEVDQkJCrpU7VkgAHwb2SCn/WAjxLxf///tX80Ffrmi1pGJ7tCR0GhM6E3kzqFT4PhFNJbHY8FgxXWaLJtNFi7rtUrFcqpbLq0M5TNdbHqdsTUX42NZO/uaVEV4+s0BcVxEIaraHBH5+bIbDE0VGslVaUxEs22e6aBIzVGw3+PrETImxXJV/8/hmHE8yX7awHI9/8fQhfnR4GikljXGD4YUqbakIp+fKOJ7kwTXNrFVTfOLubhYqFl95YYi67fEPHgosvWOGiq6oQfVFSk7PlvnegUk+uLl9RV/BWLbGXzx3msmiyW/c3c1TO1f2LUDQuyClRC5+7XpyxXsXH+8vX3vvvOsv5VJFSC7/XBwvODaqq3zyEn/2Evevbub+1W9tK+VWxjnvGluufxNXEhIScjtxJwuJs8DOxa8zQPZqP7i9L8Pzp1cenqs5xBx3RaXC8X3+lw+t48eHZynUHWZLJs0JnbrlkomrVEwPx/NQhCAV1djQ0cCX3zPIq0NZRrM1HFcyUzfZ2d9E2XTQFEHN8njh1DwRTcF2PWKGytq2BJbr8zu7e/nff3wC15PMFE1ePDPP41s7OTZdYr5s8rd7x7CcQJBIJJbrMZ6rkYpoCCEom+f6Pl44Nc/B8aCp9HsHppZfL9Ydfv+hAV48s4CUgYnT4YkiO/obqZgujQmDZ47N8OpwDtv1+f4bk2zrTbO6NYkQ56LGPri5g0zMoGw6bO9rpL0hguV5pKI6m7sakFJSqDmkohqaqvDe9W1k4gatqQhN522nRHWVJ3d0M5aroSkCRYjrFhluuR6m418xHOxW4vceGCAZ0WhNRZa3k94t3G7XOiTkTuJOFhKngd1CiKPAHPC/nv+mEOJLwJcA+vr6Vnxw76Jd9IrjJShCAbzl114+nWUyb/L5+/v42ZFZxnI1huaruItP3MlI8JQvCG6IJcvlh4emuas7Q0NUQ9cU6g4cnizS3xQnXwsSMF1fktJVJvPmchLoQEucN8aL7OhrZLZkka857B3Osbo1yQOrW3j69YmgCgCkIhq9jQmOTJaomC7JiEYqpq/oIVjVkiBuqDieZENHinzN4fBkYXlE87fv7eOnR2bwJbQ1RPi7vWMsVGx29DcS0RTqtofl+tRtj+8dmGRTV5qPnudImYxovG9j0LTpej5f3ztGtmKza6ARIQS/OD7LoYki7Q1RfuueXqK6utwYeiE9jYEHxvWkZrt87dUxKpbLoxvarps4uZnEDY3fvX/gZi/jIqqWy9f2jFK1P
N63oY1tt8G1Dgm5k7iThcQXgJ9JKf9UCPHPgM8Bf7P0ppTyK8BXAHbt2rWi1l53Ly69J6MKtuOteE0Cs6U6T+8bZyRXx3Z9zqsuowr4rXv6sD2PF0/NU647nJmr0JmO8vi2LjZ3V9g/UmBovkKpbjGcrfPQmhaaEgaZmMbX9o6jCai7PoWaw7f3j/NvH9/Cpq40Pz40RclyOTJZYnNXmprt8Zs7eshXbT53fx+m4/P1PWM4no+Ukg49wlzZJBXVaIwbmK7PP350Db1NcdoaorieT0RTGM1WyVdtDk4U+ehdnbQ3RPGlZKFiU6o7PH9yjo0dDazvSOJ6Pn3NSVRF4Y3xQIR0paPcO7gy+KtqB5WRkzNlyqbDw2tbGc8F7pGzJTPw5FCuHHLmuj5PLxp4/ebd3WhvM1gsV7WXRdNEvnZbCIl3K7mqTXXRM2U8XwuFREjILcbt26L+5ghgqbSwALyt9KWi6VM/T0eoQEQD15Mcm65QtrwVIgKCsVHb9RnL1snVHPI1m4l8ndfHCuwZyuH5UDIdNFVwZKqC6XgcmSqyqi3Jy2dzlE2X2bKN4/kcmy5zYrrMf3lpiO29GSzPp1x3Gc9X8X3Jw2tbGGxJ8OSOHrb2NFKxXBzP59BEkYPjRb62d5yvvTrG069P8ovjc7xyNsuLZxaWza+C7YVWepvilC2XkYUqPzkyQ1RXSUV1tvdlmMgHhlXPn55nIm+yUHFob4jSEAuyQL6zf4K/3TvGq0Mrt4XSMZ3RbI3xfI0DYwXG8zUeXtdKZzrKw2tbVvhdXI4fHp7mO/sn+M7+CX5waPrt/CgB6ErH2NqTprsxdt1CykIuTXcmvNYhIbcyd3JF4uvAN4QQnwcc4DPX8+Q9jVEmCuaKxsALcXyfo9MFCjWXcs2h4vgUJopMFWps72vk8a1ddGdiy7HftuszulDj//rZKZJRlVRUY6kvsWbZ5Ks2r43keP7UHIMtCWaKJsPzNSzXo785QX9zAoBfn13guRNzJKM6PY0xzsxVKdUdaraLqggMNRAPAsGhiSJDC1W29aTZNdDEp3b14vmS4YUqp2bLfGvfOB+/u4v7VjWzbyTP4ckC47k6ru+TiUdY15bkobWtzBRNpgsmihBoirjoWvQ3xynWgz4QQ1XobYyzuvXqQ7HO93/Q1YvP/1ZRFMFjG6+conq7Mpat8YsTs7Q3RPnw5g6US/y8rid38rUOCbkduGOFhJSyAHzoRp1/Im/iXeF9QWAwZbsSFbD9c130+ZqDIgTSl2SrFh0NUYp1m1zZomx7VCyHdFznUzu7AMErZxcw1GCyI6ar/PrsApmYTlPCIKarDC1Ul10yPV+yZyiLpioUaxaPrm8Fgpt7f3OcT+/qJRXVaE9HaU5G+MsXh3B9kenEOQAAIABJREFUyVzJZF1HioaozhPbuvj2/gkkMF00OTNXYWtPhg0dKY5Nl0hFNRpiOlu703xwSwe6qvDFBwc5PFmgvSHKXd0ZpJRMF00ycZ24ofFPHlvLz47MsK4jRXtD9BJX7Mp8dGsn2qKA+ODmjuXXFyoWmiLIxM81aPq+ZKZk0pQwLqp21G2PfM2mMx1d0Rx6J7F/LEeh5lCoOezoa1w2RgsJCQm5FHeskLjRXElEQNA/oSsKVcul7vicfzuL6ionZkq8dHoe1wdPSgxVwfd9VCVIGr13sAnT9fnWvgnqjkdTXGd1a5KIpvDi6WzQ8Oh4dKZjzBTNZSGhKgLXl7xyNsts0eS1kRxdjXG292Z438b25YmIrT0ZXj6zwEtnFshXbVa1JvibX4/w2d39NCYM3r+pne++PoGmKsuNjhFdoWI6lE2X7X0ZPrG9e9mauisTW+HV8KuTcxwYK5CMaHz+/n5SUZ1P7up9W9f8fAEBgcHVjw5NoyqCT+/qXb4h/uLEHEcmizTEdH73/v7lNdquz9f2jFI2Xbb2pO/Yp+Q1rSlGszWaEwaNiXCKIiQk5MqEQuImElt6GpYSoSjo0idmKKgCy
jUb013yTgBNB0VT+ehdHdw70ISuqvzr7x+lULOJ6ArpuM7n7+9nNFvl2/snmC8HoV3b+zLUL2gCXduW4vBEkfmyuTjFodLdGEO54Al8dKFKY9zA8XzqjsdUwaRQd2hMGHRnYnz5kdUoQiz3UShCsGugicl8jU2dDay5zNbEZKHO8yfnyVYtBILZksl00cTzJfcONqGrCkPzFYYXqmztydCauvogrslCnRPTJdZ3pJgvB/5i3lJlZ1FILL1eqjtYrr8sJEzXWx6BXTrmUtiuz97hHLoquGeg6YaX/t9p7upJs64jia4ot933FhIScv0JhcQ7SFQVmOd1XJYtFysXzM+7izbbVctHEaAoCjFDxfeDAC2kpCsTQ1M1hFD4ry8NUazbSAmaEKxpS/HSmQXqtkfd8XA8n6SikYpqPLh6pR3yQ2tbsF2PdFTDlcGoZ65q88KpeZoTBgMtQS/FE9u6mCzUOTpdQlcEdcclHT33V+bCIKx7B5s4MllCCMFEvs6JmTKbulY6VAL88OAUtudzYrrM3X0ZfnRoetkgKaIpbO3J8MND03i+ZKpo8vn7+q/6Gv/o0BRVy+PkbJkvPjBI2XSIaCrrz7PdfnRDK3uHcww0J0hGzn0/DVGd965vZSxXu+yoKQSpq6+NBH26DTH9IhfO24GI9uYNriEhISEQCol3jGD7fmXjpen4WA4oInBolIAng/8U3ydh6NQ9n4rpENNVxvM1clWHyXwNQ1VQFYWoLnhkQxvdmRj7RvNYjkdHQwzXD8x9ntjaxdBClVf3jrG6NcmHt3TQkozw8bu7KZsu3z0wyf7RHK3JCO9Z18pPjk4jEDy+tZOWZITuxjgzJZOGqE5LKkLkChMUqajOYxvb+PmxIHPk7w9O8vpYnqd29KyINE9ENBqiGr1NcTrTMZoSBtNFc/k9VRHEdJWK5S47g14tcUOjankkI0EI2aUizjvTMT5xd/clP7+9r5HtfVe2jj5ffJz/dUhISMidSPhb8B3Ck+Bd0DihKUHHemNcJ1+1sc57P26IwOMBH18KXN/H9UHgMLJQYfeqZgaa4+iKwpaeNKPZGl0NEVRVYUd/Btv1sV2fY1Mlfn58FkMT2K7HQHOcllSEUt3hxEyZmWIwWVK2XJoTOmUrqAwcny6zqtVnNFslpqv0NMV4cnsPifNunPmqTa5mM9icQFGCjJGorvDhLe0cmigxVagzVzL59dkFdvU3LedjPLWjh/F8jbiuYroeq1qSTJdMPE/S1xz0W3zm3l5miib9zZc3mjIdj4l8je5MfFmoLJ1bEUFoV/dlMjTqtsdYrornS7ob48uOikvfb9sVGj63dKdJRTV0VbkjMjpCQq6VgT/60c1eQsg7QCgkbiKOD4aAQs3lwuiDqi2JaT6OL1E1hUREo277FOsuhZrDZGGShKEhkXz34BSaIojpGoOtcSr1FD89NsPIQhWJRCBILTpJ/uTIDKoi+OTOHta2JzkyVSRXsUgYGvNlm/jiDXJDR4pEJJj4qFouW7rTK6ypK5bL1/eOYbs+d/dleHR9G99/Y4rJfJ10TOf9G9v40eEZxnJV9o/mOTVb4YsPDhDVAzfPCxM+L7zhN0R1GqJXbvQLekEsWlMRPre4/REzVBQh+MHBwNb7iW2drGm7OE30W/vH2TOUo+4EAV9ffHCA49MlfnVyHkUIPnNP7xWnFZZGaUNCQkLudEIh8S7Adn0u5TbhyWBMNKYrqIogoivUHACB6fjYjo1QgmM8VcH1HHIVm5fOzjMyX6FouqhC0BDTaE9HsF2fM3MVYrrKM0dn+MTd3ewebOa5E3NULJeoofHYhna2LCZl5qs2mxb3/09Ml/jCf9vLg6ub+dIjqzEdj5rtMpatUTYdapbLofECTQmDM3NlWpIGn9zZw/On5hnPBV4WtudzYqa87BR5YrqMosDDa1sv6re4GkazVSbyder2ylLPkiMlsCI/5HzKpkvdcZdDzuq2u3ysLyUVy2VmcaplsCWxf
E1CQkJCQlYSCombjH2h3eUimgrOYgNmvuZQtT1cTxLVFRzXQ18c49RUBU0oaGpw7FTRxPF8qraHAHQV7u7JsGuwkVLdpVCzmS07dNdi/PDQFMWas5zBce9gI+vazk1aZOI6j25oZaFs83/+9ARV2+XsfIXP3NtLSzJCTybGWLbGdNHkxdMLJCIqiahGoe5wdr6K7Uke29DGvtE8PY0xpITnTgQx1kenSssmUo1xY7kvYSnJU1MVHM9HU8Ql/RyW0lJVRaBcoEG2dDVQtVyk5KL48CWe2NpFzXIRCBKGxkTe5N7BJjwpSRgaq1sTfH3vGHMli7PzFVa1JlYknIaEhISEBIS/GW8CmsJFWxkX4nhBpUERENEEqqLgeC4120NVxGIMNxiqQFcFvhSoQhLTNVRVWY7ojugaX3hwkM1dDXx9zxirWlPUbJeK6XJmtsyp+QoNEY0P3dXB6dkK+0YKfOSuDla3Jvn2/nGmCibvWddKRybK2bkKqahGTA36Ee4ZbGa2bDFbNInqKpmYwRNbu/jh4uRE4ENg8IFNgR+D5QZNkBXLpScTY74SjFg2J4LxzrLp8I3XxqnbHhs6UxydKtGSjPDpXb0rsjkAhBAMtiSJGxo9jSu3RTRV4cE1KydVLqSvOc6TO7r5wcFpFCFoTOhEdZVH17ctH9OciDBXskhF9RXOmSEhISEh57gthIQIHlk/C6ySUv6xEKIP6JBS7r3JS7sk8k1ExBKagEzC4P0bWhnL1zg9W6ZkuhiqwHQhqimkYwZSShIRjYGWBH/wyCq+s3+CmuWSr1m0pwwWKhbNyQiPbWzj2FSJVa0JJvI1POkznq8FCaRSkq85OJ7PM0emsX3J0ckim7vSnJot8yef2MRfvjjC49s6MRYbG9e0Jfns7n4m8zXKlstd3WkycYPP7u4nV7EpWw5j2dpyA2VEU/nsfX3kqjZd6RjZqo0ioDkZCImpgkmp7rBQsZnI1+hIx5gvW+Rr9iXdLp/a2c1cybpm58U1bSk+e5+OKsTyGs7ng5va2dzVQEsygnYHCAkpJadmKxiawmBL2AMSEhJyddwWQgL4C8AH3gf8MVAGvgPcczMXdTku5XopuHA4FBwJxZrNsyfmKVsutuMjCPbwhQi2RcqLseI12+MfPDjI9r4mDowVeflMFk8KSqbHL4/PsaOvkW/tn+DYVInuxhgf39ZJRzqKL4PzJSI6rSmNXx6fY7xQ5excFU0NKiFPbOviPz83xGi2xuTzw6zvSC9POViux3Mn54HAkfOegSYSEY2jUyVePrMAwKfv6V1upowb2vIWwYVGU/3NcWzPZ6pQp7UheG9de4rWS9zkIRAm50efXwttqcuLEEURb/v8txIHJ4rLW09Pbu9e9hMJCbkTuBETJiP/x8eu+znfjdwuQmK3lHKHEOIAgJQyL4Qw3uxD7xYu5x2oiKAHoGQ62IvR5XFDRVECDwpfSmqORyKisao1yVzZ5CeHp+lMR3lsYzu/PDFHOq5TNB2++uoIrw1nyVWDqsNXXxnjzGwZ25esa0vSlDBY357iG/vGmS6Y+FIS1TS29WQYaI5zdq7KeD5wujQdb1lIOOf1eDjn7dc4nn/J15fwfclzJ+co1Bwe3dBGY1znlbNZfClZ2x5sWTy1o4e+5jhVy+XZ47PoqsJjG9tCs6QbxPk/M9u7yrJZSEjIHc/tIiQcIYTK4kO9EKKVoEJxS3B+JWKpgK4qLBpCRTk2WUQBVBXWtSWYr1rUbQtPgi4EyYjK5s4kpuNzYCzPI+tbed/GNrZ0NzBdNBnL1jg1W6Zm+xha0Kx5eCJPoe6iqYJTcxX+/bo2/utLQ9iOF5hFNcbYPdjMP3x4kKGFGpmEzlxZpTUVIVsJthos16M1FeEDm9qpWi47+s8ZOd072ISmiOUtlwsZy9U4NFEEYO9wjs1dDbwxXsBQVXQ1SIPsa45jOh57hrMMzVcB6GmMsbUnc4N+Enc223uD62qoCmvbrj55NSQk5M7md
hESfw58F2gTQvx74JPAv7q5S7o2fM5tc3Q3xpCAj8BHogqBYahYRYmqCKQfbG9MFUyeP71AazISND+WTL78yBoeWN3CWLbKHz19mFOzZWK6SksqwmzJJFdz8PxgW6NkOuwdzrK6NcHeIQ3L9RloSdDfkiCqB+KhvymoDKxpS9HWEKFYd/jbvWOYjsdHtnSy+wJLaV1VLnrtfJqSQfKm6Xh0Z2I0JgzihkrN9nhkfRtbutPMly2+uW+chYoFEpqTxjUlg4ZcHZqqcM9A081eRkhIyC3GbSEkpJRfE0LsBx4juA//hpTy+E1e1jUjCSy1LccnX7NJxzRKpoeuCobnq0RUhfaGKLmqRdUO+iXyNQffl1Rsj+H5Cn/yg6M0JgyaEjqaAn1NcVqSBumYwVS+RjKiYTkuPgIhfU7PVfiHDw1yd2+Gb+ydoGQ6PHdyjuaEwb2rmvnDD21goWrRnDDI1xxeODVPseZgaIKfHp5m/2iOj2zpoDFxdQFbDVGdLz44gOl4yxHfX3hggIl8nYWKRa5qM1M0sV2fhqjO5u4Gdg80cWa+SqHmsL7jYpOpkJCQkJB3nlteSAghFOCQlHILcOJmr+d64fowtFBFVwO/iJiuULY8HE9iez6qEMtBX5oQ6IpgvmJjuT5Vy8X3QSJpiOq0N0RZ157Ek4GJU90OQr0cPxgjnSnbxA0VIQRdmTiPbmzlz549zXzZ4u9eGydfc3hqZw+9RrDV8N3XJ7Bcn4rl0qTqvDyURQgYzdb45x/deNXfY1RXiZ6X3RHVVX55Ypaq5XFsqsTv7O5jaKGC40nuW9XM4Ykie4eDsKy48fYbLUNCQkJC3j63vJCQUvpCiINCiD4p5djNXs/1pLro2NgY04JQLxl4Srh+0J4phEBZ9Iuo2S6eHxxju8FrArBcH4mkbnsMZ2tIKcnEdTxfMlu2ESKIMz8/M6InE0cCs2UT1/cxXZdXh7KcmC7Rk4nzxngBTQn6GNa1J/nJ0Rls18dyLzWPspIXTs0zvFDl/tXNF9lkA8tR5qoiiOrqinCt832pwnjrkJCQkHcHt7yQWKQTOCqE2AtUl16UUn785i3p2omoQZfEUshX3fYwdAVDFXQ3xulORxnPmxiq4NRsZTnQqzFuULU9HNdDiCBBc3VrnIiuc2a+iqJAY8JgR1+GofkqykwJy/V5z9oWehvj1G2PmKHy67MLgUlT3KCjIUZHKsYrZ7MAHJmapr8pzlzZoqMhiun63NWdpmy67FxstpwrmSSj2kVOkFXLZf9oHoA9Q9lLComndvQwtFBlTevFzX5dDTE2dqZY1566bBjXzaJue5RMJ+zhCAkJueO4XYTEv7vZC7ieuIuW1eeQlMzA8jqqqUgEhbqDKoKmRtv3URBk4gaKcCj4kpZkhLv7MvQ2xrFdD9Px8CTs7G/kS+9Zzb94+jCqotCdibJQcfhX3zvCPYON7B5s4q9/PcJ00aQ5abChM8X6zhRly2V4ocq2ngz7R/PMlS2+8uIQGzpSpOM6/c0JNnel2TOU5ddns8QMlc/f178iLTSmq3RnYkwW6qy6hFCAQOjsTFw8uXtwPM9/+OlJfCn5g0dWX/bzNwPT8fjvr45SsVzuGWjiobVXdtUMCQkJuZ24LYSElPJ5IUQ75wyo9kop527mmt4OS9YMugApIGKoWPWgPDGarbG2PUFEFcyWLZrjOhFdwfclTQmdVESlvzlOMqqTMlROTJfoTEd5amcPmipoTkSwPY+4obKjrxHL9Tg+XcZyPNIxjUxMR1MVuhtjfHBTO7/7wAARLRAAdccjbmgsVGwKdYexbJVcxWL3qmZ+595eYobGt/ZPUKw5tKYilE13hZBQFlNHLddfjv2+Ws7MVajbQVDYvpEc7z3PyvpmU7Hc5aCw2ZJ5k1cTEhIS8s5yWwgJIcSngT8FfkXQGvAfhRB/KKX89k1d2NvE0BU0VaGvMcaQV8VyfSDI06gshnLNV2x0VcHxfSYLJooQ2J6P5VSoW
R4+EkUoHJkqEtHUoGownCOqq2QrFnXHpWq5VGwXT0oeXNPCfNnG8yWf2tW7bP4khCBuaNRtj6rl4no++ZpDHoc9Q1m29zWiKgLX88lWLR5a23JJ62pFEW9ZRAB85K5OvntgCiHAl+D5wQjsu4GWZIQHVjczXTR5YM3lR15DQkJCbkduCyEB/EvgnqUqxKIh1bPALSMkLmWR3Z6K4ElIRnViukrV9lmoOnz3wCSZmI7t+iiKxPV8HFdSs13aUlEqJZeK5SGlRBECVQlMhjRVQVNFELylq7Q1RMhWLObLNoamYNoePz48w1M7e9gznOXb+yd4YlvXin1/RQmyMTZ2NjBbsoIAMMvjhVPz7BpoJG5obOlKXzI06+hUkRdOLdDXFOejd3VcMtXzcsQNjY9v62KyUCcV1VBEsKXw9OuT1GyXj23tpDN98/omruSZERISEnI7c7sICeWCrYws50wibwk0AZomqDuBnEjp0Jg0UIWCrq684dZtj9ZkhK09aRKGyutjBUzHI2ForG1LoiqCtOXSnYmzriPBlu4M/U1BDLYUklUtCcZydeKGSlPC4LkTsyxU/v/27jw8rrs89Pj3PbNv2iVLtizZsuMlXuM4e0jS0DaQAClpgbKE0tJS+tD19nJL7+UWnvZ5bmlpLw8UWppyWcoSmhYCAQoFAglkIYnJ5tixE++yZO3SjGZfznv/OGNH3mVZsqzx+3keP545Z8457xmPNa9+y/srcGAkB8DDu4dI5b2m+p1HvOW+U/kSXU1RQn4fb716KQOpPHdtWcKPdw9xaDRLWyJMNOjn9g0dNMWCx0poT/Vcb5J8qcJLg5PcuLKF+ujJrzkqmS0xni3S3Rw9lnDcvr6dbz7fz1XdTYgIh8ayx7oSdvan5jWRMMaYS1WtJBLfE5H/Au6tPn8L8N15jOeclRTKpVfaJDIlePZQknDAx+r2BJGg39uIV82yLuInFvKzrqOO/aMZjiRdRISxbJF8sYI4wg0rW/iNG5YdV6vhqJVTSiC/dsNicsUK923rJV0oc/3KZh7fO0a+XKG9LsyXfnaQsuvVcrhuRTMN0eCxIlLLW+LVcRFFLl9cx4ozDIK8fHEdw5MFljZFSIRP/9FLF8p86YmDFMsuV3Y3ctOqVgA+/ZN9PHVgjO/vGOTjv77Zq4gZDZApVk45A8QYY8zcq4lEQlXfLyJ3ATfi9RLco6r3z3NY52xq14Zb3ZAveWMSEmE/Dq8sIJItVuhuihD0OyxpiLIoEaY5HiJfqtAYC7K4IcI1K5r4z2rVyXIFIkGvbPVr13ec4tpKR32YaNDHriOTDKby3LGxnbpIgFypwqGxLD4Rru1pOq5LIhzwZmeo6lm7KjYvbWBTZ/1ZX5ctlimWXQaSeR5+aZj1S+ppigUZTRcA7z1J5kp0N8d41w3Lp3VtY4wxc6MmEgkRWQ78p6p+vfo8IiLLVPXAWY57J/AbgA94u6r2zXmwM3AkmccnSsjvkCu7+AR6x7IcHM1RLCtHkjm6m6Isa4lRqrjIcIag3+Hh3cM8fzjJiwMpssUy9WE/+4bTbF7acFI3wBP7xtjRn2I8W2RnfxKf4zA0WeDv3rSJ+kgAAXKlCgdHs6dchOtUX+SnGhA5nS/8tkSYK7sbuW9bL42xAA+/NMQbr+jkPTf1cN+2w6xuT9Dd/EoMM0kiVBVVK2xljDHnqyYSCeDfgeunPK9Ut1116peDiCwBblbVV89xbOdFgYBPKJaVUNBbGbNQ8QpQDaYKJCJ+QBjNlEiE/FzT08wPdg7iiNBeH+LASAa/I5TKLn0TBcazZQaS+ZMSicZq7YZIwEciHCBbrNASD+IIbF3WRLZYwe/IGbskjqq4yteePkz/RI6bV7VyRVfjWY850VXLmtjRnzpuLY7lrXH+7LVrzvlcJxrLFPn3bb24Cr+6ZQltVkTKGGNmrFYSCb+qFo8+UdWiiJxc1eh4twE+EXkQ2An8saoeq/EsIu8B3gPQ1dU1u8EC5dPtE
6q/KXtJRCTgY9WiOIlIEFTZ3FXP97cPUahUWLUoxorWOA7euAlXYW1HHWOZIm2JEJctSnBtTzMjkwU+8r0X2T+cRUT58s8O0hgNEgo47B/JUB8OsG5xHfuG04T9Dr9143IOjmbY2NmAiHBldyPt9WGiAd+xhONMUrkSfePewM1dA5MzSiQiQR/vuLaL8UyJpU2zO4hy/0iGbLX8+L6RjCUSxhhzHmolkRgWkTeo6gMAInInMHKWYxYBQVV9tYj8DXAn8PWjO1X1HuAegK1bt544M/O8nC6JcIB42E+mUKHi6rGaCZ2NUY4k8zREAkzmKtRFA2SLDsta4ly5rImdA5OMpouEAg4/eXmYZw9N4HOE9vow3c2x6liCHj754MuMpAvsGU7zwW9sZ11HPc/3JemoD9OaCPJsbxLwlpP+5XXtx8V2LiWpG6IB1rQn6B3PsmUGScRRiXCARPj0Mztm6rJFcV48ksJVZY2tImqMMeelVhKJ9wJfFpFP4g227AXeeZZjksDD1cc/ArbOXXjTEw4IHfVhDo/nyBUrxMJ+4iE/yVyZl4fSFMsua7JFrl7WTDjgUB8J8NWnDnFoNMPihghlV/m3J3s5MJKmrS7Mlq56BlJFVrTGuGNDBwdG0nzu0QOMpIvVwlUVXFXKrstk4ZX0JlM4++JbZyIivHbDyQM6LxZ14QDvuLZ7vsMwxpiaUBOJhKruBa4VkTggqjo5jcMeA36n+ngzsH+u4puuhmiQXMkrJBXwe90aPa1x9o2kGavOWOgdzXLH+g4CfuGxvWPsHpjEdZV8yWXbgVEOjmQYyRSJhwN8b8cgzbEQvWNZXFW+u30AR6BUcbl9QwfXrWjmyf1jtMZDXL+yhW8/348DvH7juScB5YpLoeweVxLbGGNM7auJn/oi8kfA54BJ4F9EZAvwAVX9/umOUdVnRSQnIg/hdYN8bC5j9Iu3hsaZ+kgm82UEyJZcHPEGWR4YyTCRLVKpzvssV5S+iRzRkNdaEQk6+B2H/okcR5I5XIW6iJ+O+jA9LXGSuRIt8SDdjTEaogEmskVWtCW4ZXUbY5kCP9w5iOMIK9vivPO6ZTO6t3ypwleeOEQyV+LWNW1sWtowo/MYY4xZeGoikQB+S1U/LiK3AW3Ab+IlFqdNJABU9b9fiOC8a0FD2CFfUXIlPWVJbMeBQnV2RDzkBxUSYR8iQWIhl4ZIgMZYgKDfx3imyPoldVy3opnmWIAPfuMFRCAR9HP3td286cqlNMaCDKcLNESCBP0On3zbFvYPp1neGqchGuRffjpI2VVwlRf6kqzpqJvRvY1liiRzXrGsA6MZSySMMeYSUiuJxNFiALcDn1PV5+Qiq1BUAcbz7rHnJyURQCpXQQQcUYoVl1WNYSou9LTFiQT8RII+tnY30jeR48BogR3PpGhLhGiOB1GFkgvNiSBv2LyEpngI8GoyHNUQDXJFd9Ox57dv6OClwTRBn8OtaxbN+N7a68KsW1zHcLrAVcuazn6AMcaYmlEricTPReT7wHLgz0UkwStFIC9aAohA2O/gVGs9lF3Fwds2mimytDHKe161gs88so+9Qyl+sGOAYsUlFPCRLZQZngzSVhfG5zi0xkMsSoT53KMHKJW9+guDqQLLW2PcuXnxcUkFwJKGKB/9tU3nfR+OIyfN8piOT/14D7sGUvzalk5uvoiWBTfGGDN9tZJIvBtvwOQ+Vc2KSDNe9wYAIrJOVXfMW3Qn8Akkwn4CjiACkYCDOD6iQYcjyQLFsovgTf1M5cvc+9RBnjucZDiVI19SHAecfJnGaJBsscKKlhj5YoWmWJDxbJHxg2MUyoqDEg76yRTKLG2M8pr1p/6yn8yXqLh6rPDThdA7nuUnLw0D8PVn+iyRMMaYBaomEglVdYGnpzwfxVsB9KgvAlsudFynU6kmCD4RBGVEwe8I4aAPVSi7StlVAtVaEPFQgEyhTKns1ZZQhWDARzzsp70+TGMsyOr2BAdGsxTKSmPUTyLsE
An6SOfLNEQDLGuJnjKWock8//ZkLxVVXrexg5VtF6auwqJEmPb6MAPJPOsXz2xshjHGmPlXE4nENFxU4yXAa5WoqOJ3qlWnAFTpbIggQCwU4DXr23n7Nd1849k+brqslf0jaTqbImSLLqsXJehpjfGGTUsoVlx29qf48a4hfA5s7W6ighIP+lnTniDo9xEJnrwCKMDwZMEbcAkMJAsXLJEI+h0++mubGM8WWWSVJY0xZsG6VBKJWa1Meb6ifmhOhCmUlGK5DEEh4HO4orOBW9a0sm80S9DvcNeWTsJBH69a1YoI9LTFWNoQJeAXShXlltVtBP0OQb9w0UjBAAAc50lEQVTDxs56Rqq1Jvw+4ZkDE4A3wHL1Gao3rlqU4PB4jmLZ5YquCzvbIuh3LIkwxpgF7lJJJOadg/cF3xgN0BANEg362D+SRREawn7CoQD1sSBj2RI+EfyO8OmH9/DSYJqelhhrOuoI+31s6W6kLhLggef6eeDZPt6waQn10QDhgI/bq9Ukn+udOHbdkN85Y1wBn8Nt69qZzJf48Ld2kMyW+L1bVpy0PsYPdw6ydzjN9Sta2NBZP+vvjzHGmIXpUkkkimd/yfTF/JA53YIZp+H3wdr2BH6fQ8l1OTiWRVEqqiBCrlDmucMTHB7Pki2WaYmH2D/iLQc+lMpTqiiNsSDPH06yqC7EQDLH8GSBxmiA121acty1Ni1tIBbyEfT56Go+9diIEz11YJxDo1kAfrBz8LhEIlessL3PW4fj5wfHLJEwxhhzTM0kEiJyF3AjXjfGI6p6/9F9qnrtbF7rXJMIgLIL6UKZYMCHI0JDJEiuWCYa8uM4wkgqT8l1GUzlcV3lSDJPpaIUXaUtEaI1EUQRVrd7xaS2P5hk/0iG/okcazrqWdkWP+565zrWYWNnHY3RAOlCmetXNB+3Lxxw6GmNsW84w9oZFq0yxhhTm2oikRCRfwRWAvdWN/2uiPyiqr5vHsM6TsARxjIl6iLKL1++iNvWd5DKl/jGM334RRiMB0nlShwu5nF8giNCNOKnJR5kRWuCOzYuZllzDMfxxo12NUfJlyq4ChPZ829waYmH+fTdWymXXfwndIeICHduXkLFVXzORTdu1RhjzDyqiUQCuBlYr6oKICJfALbP1cXa4n6G0tNvlmiOBSi7Lq66TObL9I7nWNYc48++9hypfJmyq1zWFifkLzCSKaGq1IV8dDRGWdtRx/Urm1neEmNqsc53Xb+M+5/pY2ljlI2dZx4kqapMt9DniUnEVHOZRJxLjMYYYy4eZx6Jt3DsBrqmPF8KPD9XF1u7uJFz+U4tlSs44lCqKJGgj2t6mvif92/n8b1j7B/O0NMSY93iejLFMmG/oMBotsxEtkgs5Gd5c/ykL9mVbQnef9safv3qLoJn+PJ/8MVBPv7gyzy0e2iGdzv3fn5wnE88uIdvPttHNRc0xhizQCzoREJEviUiDwDNwIsi8pCI/Bh4EWidq+uubY8hZ/m+m/q1nyq4BHxCazzI7//CSm66rJXnD09QF/GhKDdf1kJjNMCa9jrWdNTTEAnQHA/iqrfa566BFC8NTjKUynvny5d48UiKfKly7Br9Ezn2DKVP+iJ+oS+FKuzoT83W7c+6nf1JXFX2DWfIFCtnP8AYY8xFY6F3bfzdfFz0O9sHOdvX3Yl5xnimSDEc4NVrWvnqtj4ao0EGJ/MEfQ5ffrKXrd2NdDZGuawtTjzsZyiVZ2VrnETYT7pQ4TvPH8HnCHdf28XXnu5jMl9mSUOEN1+1lCPJHPdt60UVblrVwpVTFua6sruR5/sm2HwRr8i5eWkjj+wZYXlLlNhpCmcZY4y5OC3oREJVH56P6x5dMvt0/OKtGCZ4tSPKrhIK+IgFfTx1YJwd/UmuX9lMLOTjey8MMpouUFHld27qAeAdJ5zvey8MAFBxlXzJPdYSkS164zRyxQpHGyKyJ/xGf+NlLdx4Wct53e9c29BZb1NKjTFmgVrQiYSITHKGq
pWqOidzFV339AuLSjWgupBDxYVoyI+oEg35uaK7iZeGMtSHA7iu8qqVbezsnyRTLLN60SvTNXPFCiG/Q7HiEvA53LSqBZ8Di+rCdDREeN3GxewenDxWibKnNc4tq1vJFitsXdZ46sAuYtlimUjAZ4MtjTFmAVrQiYSqJgBE5C+BAbzFuQR4OzBni0b4fQ6nW6X8aFZTrHhLhPsch7KrdDRE+YXVrWw7OMHBsSyr2hP4HMgUygym8jy6d5Se1jiZQpkn9o9Rdl18IjTHQyxrjvJCX4qxTJEVLXF+tGuIVL5EZ2Pk2NLgJ1aiXCh+sHOQF/qS9LTGuHPzkrMfYIwx5qKyoBOJKW5T1WumPP8nEXkC+Nu5uNjSpggTfZOn3R/0ecnGsZoPTVHWtCdYt7ieZK6M3xFiIR/Dk0VWtMUplF18AvtHMkwWvO6KlwYmqY8GGMsUSee9rpT+iTy9E9ljXSv7RzKsW3zhugR2DaRwXVjbkZi11oN9w2nAuxfX1WPvmTHGmIWhVhKJioi8HfgqXqPAW+Gs4yFnTuVYF8Ypd+NQqlTwq0MkGmBtR4LNXY0sbYpygwgVVRbXR9i6rJGxbAFHhK6mKFd2N5IvVXhs7yhrF9fx/OEksaCfm1e30j+RZ1lLjMvaEuxdlGEsU2DLBWyF2DWQ4rvbvbEarirrl8xOAnPdimaePjjO2o46SyKMMWYBqpVE4m3Ax6t/FHi0um1OBHyn/8JzgEjAh6tKe12YUMBhJF2kMRYgmSvxxP5RWuMhcqUyH3pgB3VhP/FQgHypwkS2xIbOei5blODpQ+NIdRJpd3OMX7q8/dg17tjYcdJ1R9IFHnxxkLpwgF+6fFG1+2X2TB0WUnFnr9bDxs6GsxbUMsYYc/GqiURCVQ8Ad16o6wX9PnwCZeW4lgkf0N4QwnVhRVuMckVJF8qMZYp85WeH+MXL23hpcJJKRfn5oXEcEcazRVrjIRpjQYBjsxc2dzbgE8FVpSUeOm0s2WKZUkV5+uA4/RN5+smzqj3Bitb4aY+ZibUdCVxVrzXiAnanGGMujGUf+M58h2AWqJpIJERkOfAHwDKm3JOqvmEurpculqlUs4ejSYQjEA74qIsEKVdc9g5lCAd9lCtKxQVF+fxjBxhJF5nMlymVKwT9PrqbozTFg9RHgnQ3x45dw3GENR0JvvSzQzy0e5hre5q57oTFtEbTBb76VC+lisvajjpEvNaQ1sTpE4+ZEpFZ684wxhhTO2oikQC+Afw/4FucbjrFLHIQ/A6UXPAJqELQJ9RH/HTUh4mH/Dx/eAIBmuNBbl3dwpFkkRf6k4T9DmNll7DfoSke4sOvv5wVbd7gxUjA4Yl9o8RCftYvqWcyXyZVHVjZN5E7KY7hdIFi2bvddKHMuo56NnbWUxcOzPVbYIwxxgC1k0jkVfUTF+pixYpLZUq6IgKOCPmSy8hkkWLZZUVrnCPJPM2xIOs7G2mvL1IfDfD0wXEACmWXu7YsYePSRgLV8QyP7hnhyf1jAMRDfpa1xLh6eRP9EzluWNl8UhwrW+Os7agjUyhxYDiN4zgMTOa5+9ruuX8TjDHGGGonkfi4iHwI+D5QOLpRVZ8+24Ei8t+Au1T1xulerOK6x5o9KlodYBn0kQj5GUkXSOaKCBAJ+lnWHKM1HqY1HmY0U8DnCF3NMbZ0NfL7t1523HkDUwZIZoplvvj4AfIlF1Xl0T2jvH5TB8lsie++MEA06OP1mxbzmvXtVFzlMz/dR7ZYIXSGQZa5YoUHnuujUHa5Y0MHzWcYe2GMMcZMR60kEhuAu4FbeaVrQ6vPT0tEQsCmc71YOODDETg6ecHvg7DfoTEaIFOsMJou4vc5BPzKovowGzrr+ewj++kdy1GsKJ1NIVa2nTwY8qpljSTCfmJBPwfHMoyki+wdThMP+ckUK/SOZTk0lmUsU2QsAwdHs6xuT+BzhDdvXcrh8Rwr2mInnfeovcNp+ie8hb929Ke4adWcrWtmjDHmElEri
\n", - "text/plain": [ - "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "plot = pd.plotting.scatter_matrix(df, alpha=0.5, figsize=(8,5))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Honorable mentions are len(df)." - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [], - "source": [ - "# len(df), shape, value_counts, head, tail, max(), min(), mean, dtype, info(), \n", - "# describe(), memory_usage(), scatter matrix, corr, isnull, notnull, unique(), nlargest" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Selecting and computing" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "# - **Selecting and computing**: select subset of row and cols, .loc, .iloc, \n", - "# drop columns, assign, apply/map/applymap, multiindex" - ] - }, - { - "cell_type": "code", - "execution_count": 16, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name country \\\n", - "0 Avatar  James Cameron USA \n", - "1 Pirates of the Caribbean: At World's End  Gore Verbinski USA \n", - "2 Spectre  Sam Mendes UK \n", - "\n", - " content_rating imdb_score \n", - "0 PG-13 7.9 \n", - "1 PG-13 7.1 \n", - "2 PG-13 6.8 " - ] - }, - "execution_count": 16, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "columns = ['movie_title', 'director_name', 'country', 'content_rating', 'imdb_score']\n", - "df[columns].head(3)" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name country \\\n", - "100 The Fast and the Furious  Rob Cohen USA \n", - "101 The Curious Case of Benjamin Button  David Fincher USA \n", - "102 X-Men: First Class  Matthew Vaughn USA \n", - "\n", - " content_rating imdb_score \n", - "100 PG-13 6.7 \n", - "101 PG-13 7.8 \n", - "102 PG-13 7.8 " - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.loc[100:102, columns]" - ] - }, - { - "cell_type": "code", - "execution_count": 18, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/tommy/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: \n", - "Passing list-likes to .loc or [] with any missing label will raise\n", - "KeyError in the future, you can use .reindex() as an alternative.\n", - "\n", - "See the documentation here:\n", - "http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike\n", - " \"\"\"Entry point for launching an IPython kernel.\n", - "/home/tommy/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py:1367: FutureWarning: \n", - "Passing list-likes to .loc or [] with any missing label will raise\n", - "KeyError in the future, you can use .reindex() as an alternative.\n", - "\n", - "See the documentation here:\n", - "http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike\n", - " return self._getitem_tuple(key)\n" - ] - }, - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name country \\\n", - "0 Avatar  James Cameron USA \n", - "1 Pirates of the Caribbean: At World's End  Gore Verbinski USA \n", - "\n", - " content_rating imdb_score director_facebook_likes gross \n", - "0 PG-13 7.9 NaN 760505847.0 \n", - "1 PG-13 7.1 NaN 309404152.0 " - ] - }, - "execution_count": 18, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_cols = df.loc[:, columns + ['director_facebook_likes', 'gross']]\n", - "df_cols.head(2)" - ] - }, - { - "cell_type": "code", - "execution_count": 19, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name country \\\n", - "0 Avatar  James Cameron USA \n", - "1 Pirates of the Caribbean: At World's End  Gore Verbinski USA \n", - "\n", - " content_rating imdb_score gross \n", - "0 PG-13 7.9 760505847.0 \n", - "1 PG-13 7.1 309404152.0 " - ] - }, - "execution_count": 19, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_cols = df_cols.drop(columns=['director_facebook_likes'])\n", - "df_cols.head(2)" - ] - }, - { - "cell_type": "code", - "execution_count": 20, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name country \\\n", - "2765 Towering Inferno  John Blanchard Canada \n", - "1937 The Shawshank Redemption  Frank Darabont USA \n", - "3466 The Godfather  Francis Ford Coppola USA \n", - "\n", - " content_rating imdb_score gross \n", - "2765 NaN 9.5 NaN \n", - "1937 R 9.3 28341469.0 \n", - "3466 R 9.2 134821952.0 " - ] - }, - "execution_count": 20, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_cols.nlargest(3, columns=['imdb_score'], keep='first')" - ] - }, - { - "cell_type": "code", - "execution_count": 21, - "metadata": {}, - "outputs": [], - "source": [ - "df_cols = df_cols.assign(gross_log = lambda df: np.log(df.gross))" - ] - }, - { - "cell_type": "code", - "execution_count": 22, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " imdb_score gross gross_log\n", - "count 5,043 4,159 4,159\n", - "mean 6 48,468,408 16\n", - "std 1 68,452,990 2\n", - "min 2 162 5\n", - "50% 7 25,517,500 17\n", - "max 10 760,505,847 20" - ] - }, - "execution_count": 22, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_cols.describe(percentiles=[0.5]).applymap(lambda x: '{:,}'.format(round(x)))" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " imdb_score gross gross_log\n", - "imdb_score 1.000000 0.198021 0.074280\n", - "gross 0.198021 1.000000 0.616034\n", - "gross_log 0.074280 0.616034 1.000000" - ] - }, - "execution_count": 23, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_cols.corr()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Filtering and sorting" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name country \\\n", - "2 Spectre  Sam Mendes UK \n", - "4 Star Wars: Episode VII - The Force Awakens  ... Doug Walker NaN \n", - "9 Harry Potter and the Half-Blood Prince  David Yates UK \n", - "\n", - " content_rating imdb_score gross gross_log \n", - "2 PG-13 6.8 200074175.0 19.114199 \n", - "4 NaN 7.1 NaN NaN \n", - "9 PG 7.5 301956980.0 19.525795 " - ] - }, - "execution_count": 24, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_cols[df_cols.country != 'USA'].head(3)" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name \\\n", - "4498 The Good, the Bad and the Ugly  Sergio Leone \n", - "270 The Lord of the Rings: The Fellowship of the R... Peter Jackson \n", - "4029 City of God  Fernando Meirelles \n", - "\n", - " country content_rating imdb_score gross gross_log \n", - "4498 Italy Approved 8.9 6100000.0 15.623799 \n", - "270 New Zealand PG-13 8.8 313837577.0 19.564386 \n", - "4029 Brazil R 8.7 7563397.0 15.838831 " - ] - }, - "execution_count": 25, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "mask = ((df_cols.imdb_score > 8) & (df_cols.country != 'USA') & (df_cols.gross > 10**6))\n", - "df_cols[mask].nlargest(3, columns=['imdb_score'])" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [], - "source": [ - "# >=, AND, OR, ==, ~, str.contains, \n", - "# str.startswith, sort_values, sort_index, filtering on sorted/unsorted, isin()" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name \\\n", - "2765 Towering Inferno  John Blanchard \n", - "339 The Lord of the Rings: The Return of the King  Peter Jackson \n", - "270 The Lord of the Rings: The Fellowship of the R... Peter Jackson \n", - "340 The Lord of the Rings: The Two Towers  Peter Jackson \n", - "1196 The Conjuring 2  James Wan \n", - "\n", - " country content_rating imdb_score gross gross_log \n", - "2765 Canada NaN 9.5 NaN NaN \n", - "339 USA PG-13 8.9 377019252.0 19.747807 \n", - "270 New Zealand PG-13 8.8 313837577.0 19.564386 \n", - "340 USA PG-13 8.7 340478898.0 19.645864 \n", - "1196 USA R 7.8 102310175.0 18.443520 " - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_cols[df_cols.movie_title.str.lower().str.contains(\"ring\")].nlargest(5, columns=['imdb_score'])" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "USA 3807\n", - "UK 448\n", - "France 154\n", - "Canada 126\n", - "Germany 97\n", - "Name: country, dtype: int64" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.country.value_counts().head(5)" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "Afghanistan 1\n", - "Argentina 4\n", - "Aruba 1\n", - "Australia 55\n", - "Bahamas 1\n", - "Name: country, dtype: int64" - ] - }, - "execution_count": 29, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.country.value_counts().sort_index().head(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Split-apply-combine and pivots" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "- **Split-apply-combine and pivots**: groupby, dt.month, dt.year, groupby.mean(), agg, stack, unstack, pivot, melt, merge" - ] - }, - { - 
"cell_type": "markdown", - "metadata": {}, - "source": [ - "## Directors with the most movies" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title\n", - "director_name \n", - "Steven Spielberg 26\n", - "Woody Allen 22\n", - "Clint Eastwood 20\n", - "Martin Scorsese 20\n", - "Ridley Scott 16" - ] - }, - "execution_count": 30, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "(df_cols.groupby(df.director_name).nunique().movie_title.nlargest(5).to_frame())" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " imdb_score\n", - "director_name \n", - "John Blanchard 9.5\n", - "Cary Bell 8.7\n", - "Mitchell Altieri 8.7\n", - "Sadyk Sher-Niyaz 8.7\n", - "Charles Chaplin 8.6" - ] - }, - "execution_count": 31, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "(df_cols.groupby(df.director_name).mean().imdb_score.nlargest(5).to_frame())" - ] - }, - { - "cell_type": "code", - "execution_count": 32, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " movie_title director_name \\\n", - "0 Avatar  James Cameron \n", - "1 Pirates of the Caribbean: At World's End  Gore Verbinski \n", - "2 Spectre  Sam Mendes \n", - "3 The Dark Knight Rises  Christopher Nolan \n", - "4 Star Wars: Episode VII - The Force Awakens  ... Doug Walker \n", - "\n", - " country content_rating imdb_score gross gross_log \n", - "0 USA PG-13 7.9 760505847.0 20.449494 \n", - "1 USA PG-13 7.1 309404152.0 19.550159 \n", - "2 UK PG-13 6.8 200074175.0 19.114199 \n", - "3 USA PG-13 8.5 448130642.0 19.920595 \n", - "4 NaN NaN 7.1 NaN NaN " - ] - }, - "execution_count": 32, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df_cols.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 33, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " imdb_score gross_log movie_title\n", - "director_name \n", - "A. Raven Cruz 1.9 0.000000 1\n", - "Aaron Hann 6.0 0.000000 1\n", - "Aaron Schneider 7.1 16.032162 1" - ] - }, - "execution_count": 33, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "director_stats = (df_cols.groupby(df.director_name).agg({'imdb_score':np.mean, 'gross_log':np.sum, 'movie_title':'nunique'}))\n", - "\n", - "director_stats.head(3)" - ] - }, - { - "cell_type": "code", - "execution_count": 34, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " imdb_score gross_log movie_title\n", - "director_name \n", - "A. Raven Cruz -3.885663 -0.737455 -0.488133\n", - "Aaron Hann -0.213965 -0.737455 -0.488133\n", - "Aaron Schneider 0.771125 -0.321155 -0.488133" - ] - }, - "execution_count": 34, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "director_stats_norm = ((director_stats - director_stats.mean()) / director_stats.std())\n", - "director_stats_norm.head(3)" - ] - }, - { - "cell_type": "code", - "execution_count": 35, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " imdb_score gross_log movie_title score\n", - "director_name \n", - "Steven Spielberg 1.112117 11.419216 11.609254 24.140587\n", - "Woody Allen 0.689712 7.299573 9.673672 17.662958\n", - "Clint Eastwood 0.883067 8.041961 8.705881 17.630909" - ] - }, - "execution_count": 35, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "director_stats_norm.assign(score = lambda df: df.sum(axis = 1)).nlargest(3, 'score')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "code", - "execution_count": 36, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "/home/tommy/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py:858: FutureWarning: \n", - "Passing list-likes to .loc or [] with any missing label will raise\n", - "KeyError in the future, you can use .reindex() as an alternative.\n", - "\n", - "See the documentation here:\n", - "http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike\n", - " return self._getitem_lowerdim(tup)\n" - ] - }, - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - "Empty DataFrame\n", - "Columns: []\n", - "Index: []" - ] - }, - "execution_count": 36, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "(df.loc[:, ('title_year', 'content_rating', 'movie_title')]\n", - " .groupby(['title_year', 'content_rating']).nunique().movie_title).unstack(1).fillna(0).tail(5)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Pivot table" - ] - }, - { - "cell_type": "code", - "execution_count": 37, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " director_name gross movie_title \\\n", - "0 James Cameron 760505847.0 Avatar  \n", - "1 Gore Verbinski 309404152.0 Pirates of the Caribbean: At World's End  \n", - "2 Sam Mendes 200074175.0 Spectre  \n", - "\n", - " country content_rating imdb_score \n", - "0 USA PG-13 7.9 \n", - "1 USA PG-13 7.1 \n", - "2 UK PG-13 6.8 " - ] - }, - "execution_count": 37, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.head(3)" - ] - }, - { - "cell_type": "code", - "execution_count": 38, - "metadata": {}, - "outputs": [ - { - "ename": "KeyError", - "evalue": "'title_year'", - "output_type": "error", - "traceback": [ - "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", - "\u001b[0;31mKeyError\u001b[0m Traceback (most recent call last)", - "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'title_year'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mcolumns\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'content_rating'\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m aggfunc=pd.DataFrame.nunique)\n\u001b[0m\u001b[1;32m 7\u001b[0m \u001b[0;34m.\u001b[0m\u001b[0mfillna\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m .tail(5))\n", - "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36mpivot_table\u001b[0;34m(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)\u001b[0m\n\u001b[1;32m 4466\u001b[0m \u001b[0maggfunc\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0maggfunc\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfill_value\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mfill_value\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4467\u001b[0m 
\u001b[0mmargins\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmargins\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdropna\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 4468\u001b[0;31m margins_name=margins_name)\n\u001b[0m\u001b[1;32m 4469\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4470\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mstack\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdropna\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/core/reshape/pivot.py\u001b[0m in \u001b[0;36mpivot_table\u001b[0;34m(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)\u001b[0m\n\u001b[1;32m 79\u001b[0m \u001b[0mvalues\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 80\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 81\u001b[0;31m \u001b[0mgrouped\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroupby\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkeys\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 82\u001b[0m \u001b[0magged\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mgrouped\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0magg\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0maggfunc\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 83\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py\u001b[0m in \u001b[0;36mgroupby\u001b[0;34m(self, by, axis, level, as_index, sort, group_keys, squeeze, **kwargs)\u001b[0m\n\u001b[1;32m 
5160\u001b[0m return groupby(self, by=by, axis=axis, level=level, as_index=as_index,\n\u001b[1;32m 5161\u001b[0m \u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgroup_keys\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mgroup_keys\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msqueeze\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msqueeze\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 5162\u001b[0;31m **kwargs)\n\u001b[0m\u001b[1;32m 5163\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 5164\u001b[0m def asfreq(self, freq, method=None, how=None, normalize=False,\n", - "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py\u001b[0m in \u001b[0;36mgroupby\u001b[0;34m(obj, by, **kwds)\u001b[0m\n\u001b[1;32m 1846\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'invalid type: %s'\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0mtype\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1847\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1848\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mklass\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mby\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwds\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 1849\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1850\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, **kwargs)\u001b[0m\n\u001b[1;32m 514\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 515\u001b[0m 
\u001b[0msort\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0msort\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 516\u001b[0;31m mutated=self.mutated)\n\u001b[0m\u001b[1;32m 517\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 518\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mobj\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mobj\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;32m~/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py\u001b[0m in \u001b[0;36m_get_grouper\u001b[0;34m(obj, key, axis, level, sort, mutated, validate)\u001b[0m\n\u001b[1;32m 2932\u001b[0m \u001b[0min_axis\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mname\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgpr\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mFalse\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgpr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2933\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2934\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgpr\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2935\u001b[0m \u001b[0;32melif\u001b[0m \u001b[0misinstance\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mgpr\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mGrouper\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mgpr\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mkey\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2936\u001b[0m \u001b[0;31m# Add key to exclusions\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", - "\u001b[0;31mKeyError\u001b[0m: 'title_year'" - ] - } - ], - "source": [ - "(df\n", - " .pivot_table(\n", - " values='movie_title', \n", - " 
index='title_year', \n", - " columns='content_rating', \n", - " aggfunc=pd.DataFrame.nunique)\n", - ".fillna(0)\n", - ".tail(5))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "values : column to aggregate, optional\n", - "\n", - "index : column, Grouper, array, or list of the previous\n", - "\n", - " If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it is being used as the same manner as column values.\n", - "\n", - "columns : column, Grouper, array, or list of the previous\n", - "\n", - " If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table column. If an array is passed, it is being used as the same manner as column values.\n", - "\n", - "aggfunc : function or list of functions, default numpy.mean\n", - "\n", - " If list of functions passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves)\n", - "\n", - "fill_value : scalar, default None\n", - "\n", - " Value to replace missing values with\n", - "\n", - "margins : boolean, default False\n", - "\n", - " Add all row / columns (e.g. 
for subtotal / grand totals)\n", - "\n", - "dropna : boolean, default True\n", - "\n", - " Do not include columns whose entries are all NaN\n", - "\n", - "margins_name : string, default ‘All’\n", - "\n", - " Name of the row / column that will contain the totals when margins is True.\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Plotting" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "to_plot = df.groupby(df.title_year).agg({df.duration.name:{'m':np.mean, 'sdf':np.std}})\n", - "\n", - "to_plot.columns = to_plot.columns.droplevel()\n", - "\n", - "#to_plot = to_plot.assign(low = lambda df: df.mean - df.std)\n", - "\n", - "to_plot.plot()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# https://www.kaggle.com/zynicide/wine-reviews" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, 
- "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.4" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/tutorial/Tutorial.ipynb b/tutorial/Tutorial.ipynb new file mode 100644 index 0000000..560d697 --- /dev/null +++ b/tutorial/Tutorial.ipynb @@ -0,0 +1,7627 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "raw_mimetype": "-" + }, + "source": [ + "# Pandas for Data Analysis in Python" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Table of contents\n", + "\n", + "\n", + "- **The setup**: anaconda, Python, pandas, Jupyter\n", + "- **Importing data**: from csv (and options), from the web, creating from scratch, converting types, rename cols\n", + "- **Summarizing data**: len(df), shape, value_counts, head, tail, max(), min(), mean, dtype, info(), describe(), memory_usage(), scatter matrix, corr, isnull, notnull, unique(), nlargest\n", + "- **Selecting and computing**: select subset of row and cols, .loc, .iloc, drop columns, assign, apply/map/applymap, multiindex\n", + "- **Filtering and sorting**: >=, AND, OR, ==, ~, str.contains, str.startswith, sort_values, sort_index, filtering on sorted/unsorted, isin()\n", + "- **Split-apply-combine and pivots**: groupby, dt.month, dt.year, groupby.mean(), agg, stack, unstack, pivot, 
melt, merge\n", + "- **Time series manipulations**: downsampling, upsampling, rolling, mean, simple plotting\n", + "- **Plotting**: built-in plotting, advanced plotting, matplotlib, seaborn, styles, saving\n", + "- **Modeling and machine learning**: .value, feeding data, saving data\n", + "- **Misc tips and tricks**: pandas options, vectorization, timings with %%timeit, profiling with lprun\n", + "\n", + "**principles:** small examples, no more than 5 rows. one or two data sets, no more.\n", + "\n", + "\n", + "- (1) Setup\n", + " - Installation, packages, Jupyter Notebooks\n", + "- (2) Importing data\n", + " - (2.1) Importing .csv files\n", + " - (2.2) Other ways of creating DataFrames\n", + " - (2.3) Changing names and data types\n", + "- (3) Summarizing data\n", + " - (3.1) Peeking at the data\n", + " - (3.2) Null values and summary statistics\n", + " - (3.3) Unique values, value counts and sorting\n", + " - (3.4) Basic visualizations\n", + "- (4) Selecting and computing new columns\n", + " - (4.1) Accessing rows, columns and data\n", + " - (4.2) Selecting subsets of columns\n", + " - (4.3) Selecting subsets of rows\n", + " - (4.4) Selecting subsets of rows *and* columns\n", + " - (4.5) Creating new columns\n", + " - (4.6) Applying functions\n", + "- (5) Filtering and sorting\n", + " - (5.1) Equality, non-equality and logical operators\n", + " - (5.2) Group membership and string filtering\n", + "- (6) Split-apply-combine and pivots\n", + " - (6.1) The groupby operation\n", + " - (6.2) Several groups and aggregations\n", + " - (6.3) Unstacking and stacking" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---------------------------------" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (1) Setup" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Python and Anaconda\n", + "\n", + "If you haven't done it yet, start by installing Python.\n", + "The [Anaconda 
Distribution](https://www.anaconda.com/download/) is great; install version `3.X`.\n", + "- If you're on Windows, you will get a program called *Anaconda Prompt*. Open it and run `conda --version` to verify that everything works.\n", + "- If you're on Linux, open a terminal and run `conda --version`.\n", + "\n", + "## Pandas, NumPy and matplotlib\n", + "\n", + "![](https://indranilsinharoy.files.wordpress.com/2013/01/scientificpythonecosystemsi.png?w=584&h=442)\n", + "\n", + "*Image source: https://indranilsinharoy.com/2013/01/06/python-for-scientific-computing-a-collection-of-resources/*\n", + "\n", + "To install packages, run `conda install `. To upgrade, run `conda update --all`.\n", + "The Anaconda distribution comes with the three packages we will require, namely [pandas](https://pandas.pydata.org/), [NumPy](http://www.numpy.org/) and [matplotlib](https://matplotlib.org/).\n", + "\n", + "- **NumPy** implements $n$-dimensional arrays in Python for efficient computations. See the [arXiv](https://arxiv.org/pdf/1102.1523.pdf) paper for a nice introduction. To learn basic NumPy, consider doing these [100 NumPy exercises](https://github.com/rougier/numpy-100).\n", + "- **Matplotlib** is the most popular library for plotting in Python. See the beautiful [gallery](https://matplotlib.org/gallery.html) to get an overview of the capabilities of matplotlib.\n", + "- **Pandas** is a library for data analysis based on two objects, the [Series](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html) and the [DataFrame](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).\n", + "\n", + "## Jupyter\n", + "\n", + "The [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/) is an environment in which you can run Python code, display graphs and work with data interactively. Think of it as a tool between a simple terminal and a full-fledged IDE. 
Move to a directory using the `cd` command in the terminal, then run `jupyter notebook` to start up a notebook. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Importing packages" + ] + }, + { + "cell_type": "code", + "execution_count": 433, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import matplotlib # Needed for matplotlib.__version__ below\n", + "import matplotlib.pyplot as plt\n", + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To make this Jupyter Notebook more easily reproducible, we list versions of the libraries we will be using." + ] + }, + { + "cell_type": "code", + "execution_count": 435, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Today is 2018-12-14 15:51:54.076989\n", + "----------------------------------------------------------------\n", + "pandas version 0.23.4\n", + "numpy version 1.15.3\n", + "matplotlib version 2.2.3\n" + ] + } + ], + "source": [ + "import datetime\n", + "\n", + "print('Today is', datetime.datetime.utcnow())\n", + "print('-'*2**6)\n", + "\n", + "for lib in [pd, np, matplotlib]:\n", + " print(f'{lib.__name__.ljust(12)} version {lib.__version__}')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Using Jupyter Notebooks**:\n", + "\n", + "- Useful shortcuts.\n", + "- Executing terminal commands from within the notebook.\n", + "- Timing cells.\n", + "- Using markdown.\n", + "- Pitfalls when using notebooks: state, order of execution, tidiness." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (2) Importing data\n", + "\n", + "Prefixing a line with `!` lets us run terminal commands. The `head` command shows the first rows of the file. 
**This is a Unix-only command.**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (2.1) Importing `.csv` files" + ] + }, + { + "cell_type": "code", + "execution_count": 439, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "movie_metadata.csv wine_data.csv world_population_history.csv\r\n" + ] + } + ], + "source": [ + "!ls data/" + ] + }, + { + "cell_type": "code", + "execution_count": 440, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,actor_1_name,movie_title,num_voted_users,cast_total_facebook_likes,actor_3_name,facenumber_in_poster,plot_keywords,movie_imdb_link,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes\r", + "\r\n", + "Color,James Cameron,723,178,0,855,Joel David Moore,1000,760505847,Action|Adventure|Fantasy|Sci-Fi,CCH Pounder,Avatar ,886204,4834,Wes Studi,0,avatar|future|marine|native|paraplegic,http://www.imdb.com/title/tt0499549/?ref_=fn_tt_tt_1,3054,English,USA,PG-13,237000000,2009,936,7.9,1.78,33000\r", + "\r\n" + ] + } + ], + "source": [ + "!head data/movie_metadata.csv -n 2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The file has many columns, so we'll only load a few of them into a pandas DataFrame.\n", + "To familiarize ourselves with [magic commands](http://ipython.readthedocs.io/en/stable/interactive/magics.html), we'll use `%%time` to time the execution of the cell below."
+ ] + }, + { + "cell_type": "code", + "execution_count": 442, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Loaded data of size (5043, 6) into memory.\n", + "CPU times: user 18.8 ms, sys: 0 ns, total: 18.8 ms\n", + "Wall time: 17.7 ms\n" + ] + } + ], + "source": [ + "%%time\n", + "\n", + "cols_to_use = ['movie_title', 'director_name', 'country', 'content_rating', 'imdb_score', 'gross']\n", + "df = pd.read_csv(r'data/movie_metadata.csv', sep=',', usecols=cols_to_use)\n", + "print(f'Loaded data of size {df.shape} into memory.')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The df.shape gives the rows and columns of the DataFrame. \n", + "This leads us naturally to consider summarizations." + ] + }, + { + "cell_type": "code", + "execution_count": 443, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5043, 6)" + ] + }, + "execution_count": 443, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape # Alternatively, use len(df) for row count" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (2.2) Other ways of creating DataFrames\n", + "\n", + "**Creating a DataFrame from scratch**" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " name age\n", + "0 Max 31\n", + "1 Mark 25\n", + "2 Mia 38" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.DataFrame({'name':['Max', 'Mark', 'Mia'], 'age':[31, 25, 38]})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Reading a table from the web**" + ] + }, + { + "cell_type": "code", + "execution_count": 445, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " World ranking Name Citizenship \\\n", + "0 21 Georg Schaeffler Germany \n", + "1 37 Beate Heister (b. Albrecht) & Karl Albrecht Jr. Germany \n", + "2 46 Dieter Schwarz Germany \n", + "3 49 Theo Albrecht Jr. Germany \n", + "4 50 Michael Otto Germany \n", + "\n", + " Net worth (USD) Sources of wealth \n", + "0 26.9 billion Schaeffler Group \n", + "1 21.3 billion Aldi Süd \n", + "2 19.4 billion Schwarz Gruppe \n", + "3 19 billion Aldi Nord and Trader Joe's \n", + "4 18.1 billion Otto Group " + ] + }, + "execution_count": 445, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Read HTML tables into a list of DataFrame objects.\n", + "tables = pd.read_html(r'https://en.wikipedia.org/wiki/List_of_Germans_by_net_worth', header=0)\n", + "\n", + "df_net_worth = tables[0]\n", + "df_net_worth.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Reading from databases is also possible.**\n", + "\n", + "For example, `pd.read_sql(sql_code, connection)` can read from Microsoft SQL Server over a `pyodbc` connection." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---------\n", + "\n", + "**Gotcha.** Methods on DataFrames return a new instance by default. In other words, they behave like methods on *immutable* Python objects, and not like methods on *mutable* objects." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 449, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[1, 2, 4, 6, 9]\n", + "Tommy\n" + ] + } + ], + "source": [ + "# Lists are MUTABLE\n", + "scores = [6, 2, 4, 9, 1]\n", + "scores.sort() # Changes the object in-place, returns None\n", + "print(scores)\n", + "\n", + "# Strings are IMMUTABLE\n", + "my_name = 'tommy'\n", + "my_name = my_name.capitalize() # A new instance is returned\n", + "print(my_name)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (2.3) Changing names and data types" + ] + }, + { + "cell_type": "code", + "execution_count": 450, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " Director_name Gross Movie_title \\\n", + "0 James Cameron 760505847.0 Avatar  \n", + "1 Gore Verbinski 309404152.0 Pirates of the Caribbean: At World's End  \n", + "\n", + " Country Content_rating Imdb_score \n", + "0 USA PG-13 7.9 \n", + "1 USA PG-13 7.1 " + ] + }, + "execution_count": 450, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_net_worth = (df_net_worth\n", + " .rename(columns={'Net worth (USD)': 'net_worth',\n", + " 'World ranking': 'world_ranking',\n", + " 'Sources of wealth': 'wealth_source'}))\n", + "\n", + "\n", + "df.rename(columns=str.capitalize).head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 451, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " world_ranking Name Citizenship \\\n", + "0 21 Georg Schaeffler Germany \n", + "1 37 Beate Heister (b. Albrecht) & Karl Albrecht Jr. Germany \n", + "\n", + " net_worth wealth_source net_worth_num \n", + "0 26.9 billion Schaeffler Group 26.9 \n", + "1 21.3 billion Aldi Süd 21.3 " + ] + }, + "execution_count": 451, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_net_worth['net_worth_num'] = (df_net_worth['net_worth']\n", + " .str\n", + " .replace(' billion', '')\n", + " .apply(float))\n", + "\n", + "df_net_worth.head(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (3) Summarizing data\n", + "\n", + "This section summarizes some important functions, and shows how to summarize data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (3.1) Peeking at the data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Three methods that are useful when peeking at the data are `df.head`, `df.tail` and `df.sample`.\n", + "Head and tail are $\\mathcal{O}(1)$ operations, while sample is $\\mathcal{O}(n)$, where $n$ is the number of rows.\n", + "For small datasets, this makes no difference in practice. We'll use `df.sample` here." + ] + }, + { + "cell_type": "code", + "execution_count": 456, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title \\\n", + "0 James Cameron 760505847.0 Avatar  \n", + "1 Gore Verbinski 309404152.0 Pirates of the Caribbean: At World's End  \n", + "\n", + " country content_rating imdb_score \n", + "0 USA PG-13 7.9 \n", + "1 USA PG-13 7.1 " + ] + }, + "execution_count": 456, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(n=2) # df.tail(n=2) returns the last rows" + ] + }, + { + "cell_type": "code", + "execution_count": 457, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title country content_rating \\\n", + "3922 Steven Shainberg 4046737.0 Secretary  USA R \n", + "569 Judd Apatow 51814190.0 Funny People  USA R \n", + "\n", + " imdb_score \n", + "3922 7.1 \n", + "569 6.4 " + ] + }, + "execution_count": 457, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.sample(n=2, replace=False, weights=None, random_state=None)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (3.2) Null values and summary statistics" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We should make sure the data types are correct. To do so, we can use `df.dtypes`, or `df.info()` for some more information." + ] + }, + { + "cell_type": "code", + "execution_count": 458, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "RangeIndex: 5043 entries, 0 to 5042\n", + "Data columns (total 6 columns):\n", + "director_name 4939 non-null object\n", + "gross 4159 non-null float64\n", + "movie_title 5043 non-null object\n", + "country 5038 non-null object\n", + "content_rating 4740 non-null object\n", + "imdb_score 5043 non-null float64\n", + "dtypes: float64(2), object(4)\n", + "memory usage: 236.5+ KB\n" + ] + } + ], + "source": [ + "df.info(verbose=True, memory_usage=True, null_counts=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We have some null values. Let's count them by chaining `df.isnull()` and `df.sum()`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 459, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "director_name 104\n", + "gross 884\n", + "movie_title 0\n", + "country 5\n", + "content_rating 303\n", + "imdb_score 0\n", + "dtype: int64" + ] + }, + "execution_count": 459, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "null_values = df.isnull().sum()\n", + "null_values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The result of the above is not a DataFrame, but a Series." + ] + }, + { + "cell_type": "code", + "execution_count": 460, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pandas.core.series.Series" + ] + }, + "execution_count": 460, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(null_values)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![alt text](https://www.mathsisfun.com/algebra/images/scalar-vector-matrix.svg)\n", + "*Image source:* https://www.mathsisfun.com/algebra/scalar-vector-matrix.html\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can make the output prettier by converting null_values to a DataFrame using the `to_frame()` method, then transposing using `.T`, and finally renaming the first index." + ] + }, + { + "cell_type": "code", + "execution_count": 461, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title country content_rating \\\n", + "Missing values 104 884 0 5 303 \n", + "\n", + " imdb_score \n", + "Missing values 0 " + ] + }, + "execution_count": 461, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "null_values.to_frame().T.rename(index={0:'Missing values'})" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The above is called method chaining, and can be written like so:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title country content_rating \\\n", + "Missing values 104 884 0 5 303 \n", + "\n", + " imdb_score \n", + "Missing values 0 " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(df\n", + " .isnull() # Figure out whether every entry is null (missing), or not\n", + " .sum(axis=0) # Sum over each column, axis=0 is the default\n", + " .to_frame() # The result is a Series, convert to DataFrame\n", + " .T # Transpose (switch rows and columns)\n", + " .rename(index={0:'Missing values'}) # Rename the index and show it\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A tour of summarization would not be complete without `df.describe()`.\n", + "Calling `df.count()`, `df.nunique()`, `df.mean()`, `df.std()`, `df.min()`, `df.quantile()`, `df.max()` is also possible." + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title country content_rating \\\n", + "count 4939 4159 5043 5038 4740 \n", + "unique 2398 4917 65 18 \n", + "top Steven Spielberg King Kong  USA R \n", + "freq 26 3 3807 2118 \n", + "mean 4.84684e+07 \n", + "std 6.8453e+07 \n", + "min 162 \n", + "50% 2.55175e+07 \n", + "max 7.60506e+08 \n", + "\n", + " imdb_score \n", + "count 5043 \n", + "unique \n", + "top \n", + "freq \n", + "mean 6.44214 \n", + "std 1.12512 \n", + "min 1.6 \n", + "50% 6.6 \n", + "max 9.5 " + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.describe(percentiles=[0.5], include='all').fillna('')" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(4092, 6)" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dropna(axis=0, how='any').shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (3.3) Unique values, value counts and sorting" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['PG-13', nan, 'PG', 'G', 'R', 'TV-14', 'TV-PG', 'TV-MA', 'TV-G',\n", + " 'Not Rated', 'Unrated', 'Approved', 'TV-Y', 'NC-17', 'X', 'TV-Y7',\n", + " 'GP', 'Passed', 'M'], dtype=object)" + ] + }, + "execution_count": 23, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.content_rating.unique() # Not the same as: df.content_rating.is_unique" + ] + }, + { + "cell_type": "code", + "execution_count": 468, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['PG-13', nan, 'PG', 'G', 'R', 'TV-14', 'TV-PG', 'TV-MA', 'TV-G', 'Not Rated', 'Unrated', 'Approved', 'TV-Y', 'NC-17', 'X', 'TV-Y7', 'GP', 'Passed', 'M']\n" + ] + } + ], + "source": [ + 
"print(df.content_rating.drop_duplicates().tolist())" + ] + }, + { + "cell_type": "code", + "execution_count": 469, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "R 2118\n", + "PG-13 1461\n", + "PG 701\n", + "Not Rated 116\n", + "G 112\n", + "Unrated 62\n", + "Approved 55\n", + "TV-14 30\n", + "TV-MA 20\n", + "TV-PG 13\n", + "X 13\n", + "TV-G 10\n", + "Passed 9\n", + "NC-17 7\n", + "GP 6\n", + "M 5\n", + "TV-Y 1\n", + "TV-Y7 1\n", + "Name: content_rating, dtype: int64" + ] + }, + "execution_count": 469, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.content_rating.value_counts()" + ] + }, + { + "cell_type": "code", + "execution_count": 471, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title gross\n", + "0 Avatar  760505847.0\n", + "26 Titanic  658672302.0\n", + "29 Jurassic World  652177271.0" + ] + }, + "execution_count": 471, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[['movie_title', 'gross']].nlargest(3, 'gross')" + ] + }, + { + "cell_type": "code", + "execution_count": 473, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title \\\n", + "4735 Siddiq Barmak 1127331.0 Osama  \n", + "4000 Juan José Campanella 20167424.0 The Secret in Their Eyes  \n", + "4415 Fabián Bielinsky 1221261.0 Nine Queens  \n", + "4666 Jorge Gaggero NaN Live-In Maid  \n", + "4450 Lucrecia Martel 304124.0 The Holy Girl  \n", + "\n", + " country content_rating imdb_score \n", + "4735 Afghanistan PG-13 7.4 \n", + "4000 Argentina R 8.2 \n", + "4415 Argentina R 7.9 \n", + "4666 Argentina Unrated 7.2 \n", + "4450 Argentina R 6.7 " + ] + }, + "execution_count": 473, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Sort by country, then by IMDB_score. Put NA values last\n", + "df.sort_values(by=['country', 'imdb_score'], \n", + " ascending=[True, False], \n", + " na_position='last').head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (3.4) Basic visualizations\n", + "\n", + "Some quick visualizations." + ] + }, + { + "cell_type": "code", + "execution_count": 474, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " gross imdb_score\n", + "gross 1.000000 0.198021\n", + "imdb_score 0.198021 1.000000" + ] + }, + "execution_count": 474, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.corr(method='pearson')" + ] + }, + { + "cell_type": "code", + "execution_count": 475, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "df.imdb_score.plot.kde(bw_method=0.09, grid=True, title='IMDB score', lw=3);" + ] + }, + { + "cell_type": "code", + "execution_count": 476, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "plot = pd.plotting.scatter_matrix(df, alpha=0.5, figsize=(8,5))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (4) Selecting and computing new columns\n", + "\n", + "This section is about selecting subsets of a dataset, or creating new data from existing data, i.e.:\n", + "\n", + "- Selecting a single column, or a subset of columns\n", + "- Selecting a subset of rows, i.e. filtering\n", + "- Chaining the above operations to do both\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (4.1) Accessing rows, columns and data" + ] + }, + { + "cell_type": "code", + "execution_count": 486, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['director_name', 'gross', 'movie_title', 'country', 'content_rating',\n", + " 'imdb_score'],\n", + " dtype='object')" + ] + }, + "execution_count": 486, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": 487, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "RangeIndex(start=0, stop=5043, step=1)" + ] + }, + "execution_count": 487, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.index" + ] + }, + { + "cell_type": "code", + "execution_count": 488, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([['James Cameron', 760505847.0, 'Avatar\\xa0', 'USA', 'PG-13', 7.9],\n", + " ['Gore Verbinski', 309404152.0,\n", + " \"Pirates of the Caribbean: At World's End\\xa0\", 'USA', 'PG-13',\n", + " 7.1],\n", + " ['Sam Mendes', 200074175.0, 'Spectre\\xa0', 'UK', 'PG-13', 6.8],\n", + " ...,\n", + " ['Benjamin Roberds', nan, 'A Plague So Pleasant\\xa0', 'USA', nan,\n", + " 6.3],\n", + " ['Daniel Hsia', 10443.0, 'Shanghai Calling\\xa0', 'USA', 'PG-13',\n", + " 6.3],\n", + " 
['Jon 
Gunn', 85222.0, 'My Date with Drew\\xa0', 'USA', 'PG', 6.6]],\n", + " dtype=object)" + ] + }, + "execution_count": 488, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# This is very useful when piping data to other libraries\n", + "df.values" + ] + }, + { + "cell_type": "code", + "execution_count": 489, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([7.60505847e+08, 3.09404152e+08, 2.00074175e+08, ...,\n", + " 4.58400000e+03, 1.04430000e+04, 8.52220000e+04])" + ] + }, + "execution_count": 489, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.gross.dropna().values" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (4.2) Selecting subsets of columns" + ] + }, + { + "cell_type": "code", + "execution_count": 491, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['director_name', 'gross', 'movie_title', 'country', 'content_rating', 'imdb_score']\n" + ] + } + ], + "source": [ + "print(df.columns.tolist()) # Get the columns" + ] + }, + { + "cell_type": "code", + "execution_count": 492, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 James Cameron\n", + "1 Gore Verbinski\n", + "2 Sam Mendes\n", + "Name: director_name, dtype: object" + ] + }, + "execution_count": 492, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.director_name.head(3) # Alternatively, use df['director_name'].head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Select two columns." + ] + }, + { + "cell_type": "code", + "execution_count": 493, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title country\n", + "0 Avatar  USA\n", + "1 Pirates of the Caribbean: At World's End  USA" + ] + }, + "execution_count": 493, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[['movie_title', 'country']].head(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The most useful selection function is `df.loc[[row1, row2, ...], [col1, col2, ...]]`.\n", + "\n", + "- `df.loc[:, [col1, col2]]` selects every row, and columns `[col1, col2]`\n", + "- `df.loc[[row1, row2], :]` selects rows `[row1, row2]`, and every column" + ] + }, + { + "cell_type": "code", + "execution_count": 495, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " movie_title country\n", + "0 Avatar  USA\n", + "1 Pirates of the Caribbean: At World's End  USA" + ] + }, + "execution_count": 495, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[:, ['movie_title', 'country']].head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 496, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "\n" + ] + } + ], + "source": [ + "a = df.loc[:, 'gross'] # Returns a Series\n", + "b = df.loc[:, ['gross']] # Returns a DataFrame\n", + "\n", + "print(type(a))\n", + "print(type(b))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Instead of selecting which columns to *keep*, we can select a subset to *drop*." + ] + }, + { + "cell_type": "code", + "execution_count": 497, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " country content_rating imdb_score\n", + "0 USA PG-13 7.9\n", + "1 USA PG-13 7.1\n", + "2 UK PG-13 6.8" + ] + }, + "execution_count": 497, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.drop(columns=['director_name', 'gross', 'movie_title']).head(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 498, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title\n", + "1 Gore Verbinski 309404152.0 Pirates of the Caribbean: At World's End \n", + "2 Sam Mendes 200074175.0 Spectre " + ] + }, + "execution_count": 498, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Integer-location based indexing\n", + "df.iloc[1:3, [0, 1, 2]]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (4.3) Selecting subsets of rows" + ] + }, + { + "cell_type": "code", + "execution_count": 499, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title country content_rating imdb_score\n", + "0 James Cameron 760505847.0 Avatar  USA PG-13 7.9" + ] + }, + "execution_count": 499, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head(1)" + ] + }, + { + "cell_type": "code", + "execution_count": 500, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title country content_rating imdb_score\n", + "0 James Cameron 760505847.0 Avatar  USA PG-13 7.9" + ] + }, + "execution_count": 500, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[[0], :]" + ] + }, + { + "cell_type": "code", + "execution_count": 501, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title country content_rating imdb_score\n", + "0 James Cameron 760505847.0 Avatar  USA PG-13 7.9" + ] + }, + "execution_count": 501, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[[0]]" + ] + }, + { + "cell_type": "code", + "execution_count": 502, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title country \\\n", + "2765 John Blanchard NaN Towering Inferno  Canada \n", + "2824 NaN 447093.0 Dekalog  Poland \n", + "3207 NaN 447093.0 Dekalog  Poland \n", + "\n", + " content_rating imdb_score \n", + "2765 NaN 9.5 \n", + "2824 TV-MA 9.1 \n", + "3207 TV-MA 9.1 " + ] + }, + "execution_count": 502, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top three movies / TV-series not from the USA\n", + "df[df.country != 'USA'].nlargest(3, 'imdb_score')" + ] + }, + { + "cell_type": "code", + "execution_count": 503, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross \\\n", + "4498 Sergio Leone 6100000.0 \n", + "270 Peter Jackson 313837577.0 \n", + "4029 Fernando Meirelles 7563397.0 \n", + "\n", + " movie_title country \\\n", + "4498 The Good, the Bad and the Ugly  Italy \n", + "270 The Lord of the Rings: The Fellowship of the R... New Zealand \n", + "4029 City of God  Brazil \n", + "\n", + " content_rating imdb_score \n", + "4498 Approved 8.9 \n", + "270 PG-13 8.8 \n", + "4029 R 8.7 " + ] + }, + "execution_count": 503, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Best non-American films, with no missing information\n", + "mask = (df.country != 'USA') & (df.isnull().sum(axis=1) == 0)\n", + "df[mask].nlargest(3, 'imdb_score')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (4.4) Selecting subsets of rows *and* columns" + ] + }, + { + "cell_type": "code", + "execution_count": 505, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name movie_title country\n", + "1196 James Wan The Conjuring 2  USA\n", + "1562 Martin Scorsese Bringing Out the Dead  USA\n", + "2163 James Wan The Conjuring  USA\n", + "2765 John Blanchard Towering Inferno  Canada\n", + "2969 Peter Webber Girl with a Pearl Earring  UK\n", + "3419 NaN Wuthering Heights  UK\n", + "3858 Todd Solondz Life During Wartime  USA\n", + "4298 Lance Mungia Six-String Samurai  USA" + ] + }, + "execution_count": 505, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Above average movies, with the title containing 'ring'\n", + "row_mask = ((df.imdb_score > df.imdb_score.mean()) & \n", + " df.movie_title.str.contains('ring'))\n", + "df.loc[row_mask, ['director_name', 'movie_title', 'country']]" + ] + }, + { + "cell_type": "code", + "execution_count": 506, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name movie_title \\\n", + "0 James Cameron Avatar  \n", + "1 Gore Verbinski Pirates of the Caribbean: At World's End  \n", + "2 Sam Mendes Spectre  \n", + "3 Christopher Nolan The Dark Knight Rises  \n", + "4 Doug Walker Star Wars: Episode VII - The Force Awakens  ... \n", + "\n", + " content_rating imdb_score \n", + "0 PG-13 7.9 \n", + "1 PG-13 7.1 \n", + "2 PG-13 6.8 \n", + "3 PG-13 8.5 \n", + "4 NaN 7.1 " + ] + }, + "execution_count": 506, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Columns containing an underscore\n", + "cols = [c for c in df.columns if '_' in c]\n", + "df.loc[:, cols].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 507, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " gross imdb_score\n", + "0 760505847.0 7.9\n", + "1 309404152.0 7.1\n", + "2 200074175.0 6.8\n", + "3 448130642.0 8.5\n", + "4 NaN 7.1" + ] + }, + "execution_count": 507, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Columns with a float dtype (np.float was removed from NumPy, so compare against np.float64)\n", + "numeric_cols = df.dtypes[df.dtypes == np.float64].index.tolist()\n", + "df.loc[:, numeric_cols].head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (4.5) Creating new columns" + ] + }, + { + "cell_type": "code", + "execution_count": 508, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " director_name gross movie_title \\\n", + "0 James Cameron 760505847.0 Avatar  \n", + "1 Gore Verbinski 309404152.0 Pirates of the Caribbean: At World's End  \n", + "\n", + " country content_rating imdb_score log_gross \n", + "0 USA PG-13 7.9 8.881103 \n", + "1 USA PG-13 7.1 8.490526 " + ] + }, + "execution_count": 508, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp = df.copy() # Copy the DataFrame\n", + "\n", + "# Create a new column - based on the gross income\n", + "temp['log_gross'] = temp['gross'].apply(np.log10)\n", + "\n", + "temp.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 509, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "temp.plot.scatter(x='imdb_score', y='log_gross', alpha=0.2, s=3);" + ] + }, + { + "cell_type": "code", + "execution_count": 511, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " log_gross\n", + "country content_rating \n", + "Australia PG 8.22\n", + " PG-13 8.19\n", + " R 8.17\n", + "Canada PG-13 8.35\n", + " PG 8.05" + ] + }, + "execution_count": 511, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Equivalent to the above\n", + "(temp.assign(log_gross=lambda df:df.gross.apply(np.log10))).head()\n", + "\n", + "# One advantage is that method chaining can be used\n", + "(temp\n", + " .assign(log_gross=lambda df:df.gross.apply(np.log10)) # Create a new column\n", + " .loc[lambda df:df.log_gross > 8, ['country', 'content_rating', 'log_gross']] # Filter\n", + " .groupby(['country', 'content_rating']) # Group by and mean\n", + " .mean()\n", + " .reset_index() # Reset the index to sort\n", + " .sort_values(['country', 'log_gross'], ascending=[True, False]) # Sort the results\n", + " .set_index(['country', 'content_rating']) # Re-index\n", + " .assign(log_gross=lambda df:df.log_gross.round(2)) # Re-define the column and round it\n", + " .head(5)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (4.6) Applying functions\n", + "\n", + "On a `pd.Series`:\n", + "\n", + "- `pd.Series.map` applies an elementwise $f: \\mathbb{R} \\to \\mathbb{R}$ function (e.g. `str`, or `float`)\n", + "- `pd.Series.apply` applies a vectorized $f: \\mathbb{R}^n \\to \\mathbb{R}^n$ function (e.g. `log`, or `sin`)\n", + "- `pd.Series.aggregate` applies an aggregation $f: \\mathbb{R}^n \\to \\mathbb{R}$ function (e.g. 
`mean`, or `std`)\n", + "\n", + "On a `pd.DataFrame`:\n", + "\n", + "- `pd.DataFrame.applymap` applies an elementwise $f: \\mathbb{R} \\to \\mathbb{R}$ function\n", + "- `pd.DataFrame.apply` applies a vectorized $f: \\mathbb{R}^n \\to \\mathbb{R}^n$ function\n", + "- `pd.DataFrame.aggregate` applies an aggregation $f: \\mathbb{R}^n \\to \\mathbb{R}$ function" + ] + }, + { + "cell_type": "code", + "execution_count": 512, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 760505847\n", + "1 309404152\n", + "Name: gross, dtype: int64" + ] + }, + "execution_count": 512, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.gross.dropna().map(int).head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 513, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " content_rating\n", + "NaN 2881\n", + "inappropriate for children under 13 1461\n", + "may not be suitable for children 701" + ] + }, + "execution_count": 513, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# A dictionary also works as a mapping, but it is brittle: values missing from its keys map to NaN\n", + "(df.content_rating\n", + " .map({'PG-13':'inappropriate for children under 13', \n", + " 'PG': 'may not be suitable for children'}, na_action='ignore')\n", + " .value_counts(dropna=False)\n", + " .to_frame())" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 8.881103\n", + "1 8.490526\n", + "Name: gross, dtype: float64" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.gross.dropna().apply(np.log10).head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "48468407.52680933" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.gross.dropna().aggregate(np.mean)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---------------" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
grossimdb_score
020.4494942.066863
119.5501591.960095
\n", + "
" + ], + "text/plain": [ + " gross imdb_score\n", + "0 20.449494 2.066863\n", + "1 19.550159 1.960095" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[:, ['gross', 'imdb_score']].dropna(how='any').apply(np.log).head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
grossimdb_score
07605058477
13094041527
\n", + "
" + ], + "text/plain": [ + " gross imdb_score\n", + "0 760505847 7\n", + "1 309404152 7" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[:, ['gross', 'imdb_score']].dropna(how='any').applymap(int).head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 54, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "gross 4.846841e+07\n", + "imdb_score 6.469897e+00\n", + "dtype: float64" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[:, ['gross', 'imdb_score']].dropna(how='any').mean().head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 55, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "gross 4.846841e+07\n", + "imdb_score 6.469897e+00\n", + "dtype: float64" + ] + }, + "execution_count": 55, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[:, ['gross', 'imdb_score']].dropna(how='any').aggregate(np.mean, axis=0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (5) Filtering and sorting" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We've already seen basic filtering. \n", + "\n", + "- `==` defines equality\n", + "- `!=` defines inquality equality\n", + "- `~` negates logic, e.g. `True` -> `False`\n", + "- `&` represents elementwise `and`\n", + "- `|` represents elementwise `or`\n", + "\n", + "Remember to parenthesize expressions, write:\n", + "\n", + "> `(df.col_A > 5) & (df.col_B <= 5)`\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (5.1) Equality, non-equality and logical operators" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namegrossmovie_titlecountrycontent_ratingimdb_score
2594NaNNaNLilyhammerNorwayTV-MA8.1
3336Nils GaupNaNShipwreckedNorwayPG6.7
3690Morten Tyldum1196752.0HeadhuntersNorwayR7.6
\n", + "
" + ], + "text/plain": [ + " director_name gross movie_title country \\\n", + "2594 NaN NaN Lilyhammer  Norway \n", + "3336 Nils Gaup NaN Shipwrecked  Norway \n", + "3690 Morten Tyldum 1196752.0 Headhunters  Norway \n", + "\n", + " content_rating imdb_score \n", + "2594 TV-MA 8.1 \n", + "3336 PG 6.7 \n", + "3690 R 7.6 " + ] + }, + "execution_count": 56, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Movies and TV shows from Norway\n", + "df[df.country == 'Norway'].head(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namegrossmovie_titlecountrycontent_ratingimdb_score
4498Sergio Leone6100000.0The Good, the Bad and the UglyItalyApproved8.9
270Peter Jackson313837577.0The Lord of the Rings: The Fellowship of the R...New ZealandPG-138.8
4029Fernando Meirelles7563397.0City of GodBrazilR8.7
\n", + "
" + ], + "text/plain": [ + " director_name gross \\\n", + "4498 Sergio Leone 6100000.0 \n", + "270 Peter Jackson 313837577.0 \n", + "4029 Fernando Meirelles 7563397.0 \n", + "\n", + " movie_title country \\\n", + "4498 The Good, the Bad and the Ugly  Italy \n", + "270 The Lord of the Rings: The Fellowship of the R... New Zealand \n", + "4029 City of God  Brazil \n", + "\n", + " content_rating imdb_score \n", + "4498 Approved 8.9 \n", + "270 PG-13 8.8 \n", + "4029 R 8.7 " + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "mask = ((df.imdb_score > 8) & (df.country != 'USA') & (df.gross > 10**6))\n", + "df[mask].nlargest(3, columns=['imdb_score'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (5.2) Group membership and string filtering" + ] + }, + { + "cell_type": "code", + "execution_count": 58, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namegrossmovie_titlecountrycontent_ratingimdb_score
4747Akira Kurosawa269061.0Seven SamuraiJapanUnrated8.7
2373Hayao Miyazaki10049886.0Spirited AwayJapanPG8.6
2323Hayao Miyazaki2298191.0Princess MononokeJapanPG-138.4
98Hideaki AnnoNaNGodzilla ResurgenceJapanNaN8.2
204Hideaki AnnoNaNGodzilla ResurgenceJapanNaN8.2
\n", + "
" + ], + "text/plain": [ + " director_name gross movie_title country content_rating \\\n", + "4747 Akira Kurosawa 269061.0 Seven Samurai  Japan Unrated \n", + "2373 Hayao Miyazaki 10049886.0 Spirited Away  Japan PG \n", + "2323 Hayao Miyazaki 2298191.0 Princess Mononoke  Japan PG-13 \n", + "98 Hideaki Anno NaN Godzilla Resurgence  Japan NaN \n", + "204 Hideaki Anno NaN Godzilla Resurgence  Japan NaN \n", + "\n", + " imdb_score \n", + "4747 8.7 \n", + "2373 8.6 \n", + "2323 8.4 \n", + "98 8.2 \n", + "204 8.2 " + ] + }, + "execution_count": 58, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Top three movies from Japan or Hong Kong\n", + "df[df.country.isin(['Japan', 'Hong Kong'])].nlargest(5, 'imdb_score')" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namegrossmovie_titlecountrycontent_ratingimdb_score
0James Cameron760505847.0AvatarUSAPG-137.9
1Gore Verbinski309404152.0Pirates of the Caribbean: At World's EndUSAPG-137.1
2Sam Mendes200074175.0SpectreUKPG-136.8
\n", + "
" + ], + "text/plain": [ + " director_name gross movie_title \\\n", + "0 James Cameron 760505847.0 Avatar  \n", + "1 Gore Verbinski 309404152.0 Pirates of the Caribbean: At World's End  \n", + "2 Sam Mendes 200074175.0 Spectre  \n", + "\n", + " country content_rating imdb_score \n", + "0 USA PG-13 7.9 \n", + "1 USA PG-13 7.1 \n", + "2 UK PG-13 6.8 " + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Movies and TV shows NOT from scandinavia\n", + "df[~df.country.isin(['Norway, Sweden', 'Denmark'])].head(3)" + ] + }, + { + "cell_type": "code", + "execution_count": 60, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namegrossmovie_titlecountrycontent_ratingimdb_score
270Peter Jackson313837577.0The Lord of the Rings: The Fellowship of the R...New ZealandPG-138.8
339Peter Jackson377019252.0The Lord of the Rings: The Return of the KingUSAPG-138.9
340Peter Jackson340478898.0The Lord of the Rings: The Two TowersUSAPG-138.7
1170Andrew Niccol24127895.0Lord of WarUSAR7.6
1974Catherine Hardwicke11008432.0Lords of DogtownUSAPG-137.1
\n", + "
" + ], + "text/plain": [ + " director_name gross \\\n", + "270 Peter Jackson 313837577.0 \n", + "339 Peter Jackson 377019252.0 \n", + "340 Peter Jackson 340478898.0 \n", + "1170 Andrew Niccol 24127895.0 \n", + "1974 Catherine Hardwicke 11008432.0 \n", + "\n", + " movie_title country \\\n", + "270 The Lord of the Rings: The Fellowship of the R... New Zealand \n", + "339 The Lord of the Rings: The Return of the King  USA \n", + "340 The Lord of the Rings: The Two Towers  USA \n", + "1170 Lord of War  USA \n", + "1974 Lords of Dogtown  USA \n", + "\n", + " content_rating imdb_score \n", + "270 PG-13 8.8 \n", + "339 PG-13 8.9 \n", + "340 PG-13 8.7 \n", + "1170 R 7.6 \n", + "1974 PG-13 7.1 " + ] + }, + "execution_count": 60, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Contains the word 'lord'\n", + "mask = df.movie_title.str.lower().str.contains(\"lord\")\n", + "\n", + " # Gross better than 25 % of the movies\n", + "mask = mask & (df.gross > df.gross.quantile(q=[0.25]).values[0])\n", + "\n", + "df[mask]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (6) Split-apply-combine and pivots" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (6.1) The groupby operation" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](https://data36.com/wp-content/uploads/2017/06/SQL-GROUP-BY-clause-768x540.png)\n", + "\n", + "*Image source is https://data36.com/sql-functions-beginners-tutorial-ep3/*" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
countryimdb_score
0USA7.9
1USA7.1
2UK6.8
3USA8.5
4NaN7.1
\n", + "
" + ], + "text/plain": [ + " country imdb_score\n", + "0 USA 7.9\n", + "1 USA 7.1\n", + "2 UK 6.8\n", + "3 USA 8.5\n", + "4 NaN 7.1" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df[['country', 'imdb_score']].head()" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
grossimdb_score
country
Afghanistan1.127331e+067.4
Argentina7.230936e+067.5
Aruba1.007614e+074.8
\n", + "
" + ], + "text/plain": [ + " gross imdb_score\n", + "country \n", + "Afghanistan 1.127331e+06 7.4\n", + "Argentina 7.230936e+06 7.5\n", + "Aruba 1.007614e+07 4.8" + ] + }, + "execution_count": 62, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby(df.country).mean().head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Directors with the most movies." + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
movie_title
director_name
Steven Spielberg26
Woody Allen22
Clint Eastwood20
Martin Scorsese20
Ridley Scott16
\n", + "
" + ], + "text/plain": [ + " movie_title\n", + "director_name \n", + "Steven Spielberg 26\n", + "Woody Allen 22\n", + "Clint Eastwood 20\n", + "Martin Scorsese 20\n", + "Ridley Scott 16" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(df\n", + " .groupby(df.director_name)\n", + " .nunique()\n", + " .movie_title\n", + " .nlargest(5)\n", + " .to_frame()\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (6.2) Several groups and aggregations\n", + "\n", + "A group can be a combination of columns, e.g. [`country`, `content_rating`]." + ] + }, + { + "cell_type": "code", + "execution_count": 64, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
imdb_score
countrycontent_rating
AfghanistanPG-137.40
ArgentinaR7.60
Unrated7.20
ArubaR4.80
AustraliaG6.30
PG6.42
PG-136.55
R6.43
Unrated6.30
BahamasR4.40
\n", + "
" + ], + "text/plain": [ + " imdb_score\n", + "country content_rating \n", + "Afghanistan PG-13 7.40\n", + "Argentina R 7.60\n", + " Unrated 7.20\n", + "Aruba R 4.80\n", + "Australia G 6.30\n", + " PG 6.42\n", + " PG-13 6.55\n", + " R 6.43\n", + " Unrated 6.30\n", + "Bahamas R 4.40" + ] + }, + "execution_count": 64, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.groupby(['country', 'content_rating']).mean().imdb_score.round(2).to_frame().head(10)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Serveral aggregation functions may be used. \n", + "Below we see directors and their `average`, `max` and `min` imdb_scores." + ] + }, + { + "cell_type": "code", + "execution_count": 65, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
meanminmax
director_name
A. Raven Cruz1.91.91.9
Aaron Hann6.06.06.0
\n", + "
" + ], + "text/plain": [ + " mean min max\n", + "director_name \n", + "A. Raven Cruz 1.9 1.9 1.9\n", + "Aaron Hann 6.0 6.0 6.0" + ] + }, + "execution_count": 65, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(df\n", + " .groupby(df.director_name)\n", + " .agg(['mean', 'min', 'max'])\n", + " .imdb_score\n", + " .head(2)\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 66, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
meanspreadnunique
director_name
Adam McKay6.9166671.56
Adam Rifkin6.5000000.62
\n", + "
" + ], + "text/plain": [ + " mean spread nunique\n", + "director_name \n", + "Adam McKay 6.916667 1.5 6\n", + "Adam Rifkin 6.500000 0.6 2" + ] + }, + "execution_count": 66, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def spread(series):\n", + " \"\"\"Custom aggregation function.\"\"\"\n", + " return series.max() - series.min()\n", + "\n", + "\n", + "(df\n", + " .groupby(df.director_name)\n", + " .agg(['mean', spread, 'nunique'])\n", + " .imdb_score\n", + " .loc[lambda df:df['nunique'] > 1, :]\n", + " .head(2)\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
imdb_scoregross
meanstdmeanmax
countrycontent_rating
AfghanistanPG-137.400000NaN1.127331e+061127331.0
ArgentinaR7.6000000.7937257.230936e+0620167424.0
Unrated7.200000NaNNaNNaN
ArubaR4.800000NaN1.007614e+0710076136.0
AustraliaG6.3000000.7071074.245900e+0766600000.0
PG6.4181821.1443935.703676e+07257756197.0
PG-136.5454550.7216146.369501e+07174635000.0
R6.4307690.8422682.382520e+07153629485.0
Unrated6.300000NaN2.651070e+05265107.0
BahamasR4.400000NaNNaNNaN
BelgiumR5.3666671.5011111.357042e+061357042.0
BrazilR7.7666670.7737363.385652e+067563397.0
Unrated6.100000NaN2.026200e+0420262.0
BulgariaR6.100000NaNNaNNaN
CameroonNot Rated7.500000NaN3.263100e+0432631.0
\n", + "
" + ], + "text/plain": [ + " imdb_score gross \n", + " mean std mean max\n", + "country content_rating \n", + "Afghanistan PG-13 7.400000 NaN 1.127331e+06 1127331.0\n", + "Argentina R 7.600000 0.793725 7.230936e+06 20167424.0\n", + " Unrated 7.200000 NaN NaN NaN\n", + "Aruba R 4.800000 NaN 1.007614e+07 10076136.0\n", + "Australia G 6.300000 0.707107 4.245900e+07 66600000.0\n", + " PG 6.418182 1.144393 5.703676e+07 257756197.0\n", + " PG-13 6.545455 0.721614 6.369501e+07 174635000.0\n", + " R 6.430769 0.842268 2.382520e+07 153629485.0\n", + " Unrated 6.300000 NaN 2.651070e+05 265107.0\n", + "Bahamas R 4.400000 NaN NaN NaN\n", + "Belgium R 5.366667 1.501111 1.357042e+06 1357042.0\n", + "Brazil R 7.766667 0.773736 3.385652e+06 7563397.0\n", + " Unrated 6.100000 NaN 2.026200e+04 20262.0\n", + "Bulgaria R 6.100000 NaN NaN NaN\n", + "Cameroon Not Rated 7.500000 NaN 3.263100e+04 32631.0" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "funcs = {'imdb_score': [pd.Series.mean, pd.Series.std], 'gross': [pd.Series.mean, pd.Series.max]}\n", + "\n", + "df.groupby(['country', 'content_rating']).agg(funcs).head(15)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "https://github.com/highcharts/highcharts/blob/master/samples/data/world-population-history.csv" + ] + }, + { + "cell_type": "code", + "execution_count": 77, + "metadata": {}, + "outputs": [], + "source": [ + "df_world = pd.read_csv(f'data/world_population_history.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 81, + "metadata": {}, + "outputs": [], + "source": [ + "df_world = (df_world\n", + " .drop(columns=['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code'])\n", + " .dropna(axis=1, how='all'))" + ] + }, + { + "cell_type": "code", + "execution_count": 83, + "metadata": {}, + "outputs": [ + { + 
"data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Data SourceWorld Development Indicators19601961196219631964196519661967...2006200720082009201020112012201320142015
0ArubaABW56225.056695.057032.057360.057715.058055.058386.058726.0...101353.0101453.0101669.0102053.0102577.0103187.0103795.0104341.0104822.0105000.0
1AfghanistanAFG9345868.09533954.09731361.09938414.010152331.010372630.010604346.010854428.0...27294031.028004331.028803167.029708599.030696958.031731688.032758020.033736494.034656032.035530000.0
2AngolaAGO5866061.05980417.06093321.06203299.06309770.06414995.06523791.06642632.0...21759420.022549547.023369131.024218565.025096150.025998340.026920466.027859305.028813463.029784000.0
3AlbaniaALB1711319.01762621.01814135.01864791.01914573.01965598.02022272.02081695.0...2947314.02927519.02913021.02905195.02900401.02895092.02889104.02880703.02876101.02879000.0
4AndorraAND15370.016412.017469.018549.019647.020758.021890.023058.0...83861.084462.084449.083751.082431.080788.079223.078014.077281.077000.0
\n", + "

5 rows × 58 columns

\n", + "
" + ], + "text/plain": [ + " Data Source World Development Indicators 1960 1961 1962 \\\n", + "0 Aruba ABW 56225.0 56695.0 57032.0 \n", + "1 Afghanistan AFG 9345868.0 9533954.0 9731361.0 \n", + "2 Angola AGO 5866061.0 5980417.0 6093321.0 \n", + "3 Albania ALB 1711319.0 1762621.0 1814135.0 \n", + "4 Andorra AND 15370.0 16412.0 17469.0 \n", + "\n", + " 1963 1964 1965 1966 1967 ... \\\n", + "0 57360.0 57715.0 58055.0 58386.0 58726.0 ... \n", + "1 9938414.0 10152331.0 10372630.0 10604346.0 10854428.0 ... \n", + "2 6203299.0 6309770.0 6414995.0 6523791.0 6642632.0 ... \n", + "3 1864791.0 1914573.0 1965598.0 2022272.0 2081695.0 ... \n", + "4 18549.0 19647.0 20758.0 21890.0 23058.0 ... \n", + "\n", + " 2006 2007 2008 2009 2010 2011 \\\n", + "0 101353.0 101453.0 101669.0 102053.0 102577.0 103187.0 \n", + "1 27294031.0 28004331.0 28803167.0 29708599.0 30696958.0 31731688.0 \n", + "2 21759420.0 22549547.0 23369131.0 24218565.0 25096150.0 25998340.0 \n", + "3 2947314.0 2927519.0 2913021.0 2905195.0 2900401.0 2895092.0 \n", + "4 83861.0 84462.0 84449.0 83751.0 82431.0 80788.0 \n", + "\n", + " 2012 2013 2014 2015 \n", + "0 103795.0 104341.0 104822.0 105000.0 \n", + "1 32758020.0 33736494.0 34656032.0 35530000.0 \n", + "2 26920466.0 27859305.0 28813463.0 29784000.0 \n", + "3 2889104.0 2880703.0 2876101.0 2879000.0 \n", + "4 79223.0 78014.0 77281.0 77000.0 \n", + "\n", + "[5 rows x 58 columns]" + ] + }, + "execution_count": 83, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_world.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (6.3) Unstacking and stacking" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's load a data set from [https://github.com/highcharts/highcharts](https://github.com/highcharts/highcharts/blob/master/samples/data/world-population-history.csv), which is not tidy.\n", + "\n", + "> (1) Each variable you measure should be in one column. 
\n", + " (2) Each different observation of that variable should be in a different row. \n", + " (3) There should be one table for each \"kind\" of variable. \n", + " (4) If you have multiple tables, they should include a column in the table that allows them to be linked.\n", + " \n", + "\n", + "\n", + "\n", + "Read [\"Tidy Data\" by H. Wickham](https://www.jstatsoft.org/article/view/v059i10/v59i10.pdf) for more information." + ] + }, + { + "cell_type": "code", + "execution_count": 264, + "metadata": {}, + "outputs": [], + "source": [ + "df_world = pd.read_csv('data/world_population_history.csv')\n", + "\n", + "drop_cols = ['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code', 'World Development Indicators']\n", + "\n", + "df_world = (df_world\n", + " .drop(columns=drop_cols)\n", + " .dropna(axis=1, how='all'))" + ] + }, + { + "cell_type": "code", + "execution_count": 265, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Data Source1960196119621963
0Aruba56225.056695.057032.057360.0
1Afghanistan9345868.09533954.09731361.09938414.0
2Angola5866061.05980417.06093321.06203299.0
3Albania1711319.01762621.01814135.01864791.0
4Andorra15370.016412.017469.018549.0
\n", + "
" + ], + "text/plain": [ + " Data Source 1960 1961 1962 1963\n", + "0 Aruba 56225.0 56695.0 57032.0 57360.0\n", + "1 Afghanistan 9345868.0 9533954.0 9731361.0 9938414.0\n", + "2 Angola 5866061.0 5980417.0 6093321.0 6203299.0\n", + "3 Albania 1711319.0 1762621.0 1814135.0 1864791.0\n", + "4 Andorra 15370.0 16412.0 17469.0 18549.0" + ] + }, + "execution_count": 265, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_world.iloc[:5,:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 266, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CountryYearPopulation
0Aruba1960-01-0156225.0
1Aruba1961-01-0156695.0
2Aruba1962-01-0157032.0
3Aruba1963-01-0157360.0
4Aruba1964-01-0157715.0
\n", + "
" + ], + "text/plain": [ + " Country Year Population\n", + "0 Aruba 1960-01-01 56225.0\n", + "1 Aruba 1961-01-01 56695.0\n", + "2 Aruba 1962-01-01 57032.0\n", + "3 Aruba 1963-01-01 57360.0\n", + "4 Aruba 1964-01-01 57715.0" + ] + }, + "execution_count": 266, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_world_tidy = (df_world\n", + " .set_index(['Data Source'])\n", + " .stack(0)\n", + " .rename('Population')\n", + " .to_frame()\n", + " .reset_index()\n", + " .rename(columns={'level_1':'Year', 'Data Source':'Country'})\n", + " .assign(Year=lambda df:pd.to_datetime(df.Year)))\n", + "\n", + "df_world_tidy.iloc[:5,:5]" + ] + }, + { + "cell_type": "code", + "execution_count": 267, + "metadata": {}, + "outputs": [], + "source": [ + "to_plot = (df_world_tidy\n", + " .set_index(['Country', 'Year'])\n", + " .unstack(level=0)\n", + " .loc[:, (slice(None), ['Norway', 'Sweden'])])" + ] + }, + { + "cell_type": "code", + "execution_count": 268, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "to_plot.columns = to_plot.columns.droplevel(0)\n", + "\n", + "(to_plot / 10**6).plot(grid=True);\n", + "plt.ylabel('Population (millions)'); \n", + "plt.ylim([0, 11]); plt.show()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (6.4) Pivoting and melting\n", + "\n", + "We'll show how pivoting and meling can help us create data for plotting." + ] + }, + { + "cell_type": "code", + "execution_count": 269, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CountryYearPopulation
0Aruba1960-01-0156225.0
1Aruba1961-01-0156695.0
2Aruba1962-01-0157032.0
\n", + "
" + ], + "text/plain": [ + " Country Year Population\n", + "0 Aruba 1960-01-01 56225.0\n", + "1 Aruba 1961-01-01 56695.0\n", + "2 Aruba 1962-01-01 57032.0" + ] + }, + "execution_count": 269, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# The tidy data set\n", + "df_world_tidy.head(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `.pivot()` method is more powerful than unstack. Both move rows up to the columns." + ] + }, + { + "cell_type": "code", + "execution_count": 270, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CountryAfghanistanAlbaniaAlgeriaAmerican SamoaAndorra
Year
1960-01-019345868.01711319.011690153.021117.015370.0
1961-01-019533954.01762621.011985136.021882.016412.0
1962-01-019731361.01814135.012295970.022698.017469.0
1963-01-019938414.01864791.012626952.023520.018549.0
1964-01-0110152331.01914573.012980267.024321.019647.0
\n", + "
" + ], + "text/plain": [ + "Country Afghanistan Albania Algeria American Samoa Andorra\n", + "Year \n", + "1960-01-01 9345868.0 1711319.0 11690153.0 21117.0 15370.0\n", + "1961-01-01 9533954.0 1762621.0 11985136.0 21882.0 16412.0\n", + "1962-01-01 9731361.0 1814135.0 12295970.0 22698.0 17469.0\n", + "1963-01-01 9938414.0 1864791.0 12626952.0 23520.0 18549.0\n", + "1964-01-01 10152331.0 1914573.0 12980267.0 24321.0 19647.0" + ] + }, + "execution_count": 270, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Demonstrating the pivot method\n", + "df_world_pivot = df_world_tidy.pivot(index='Year', columns='Country', values='Population')\n", + "df_world_pivot.iloc[:5, :5]" + ] + }, + { + "cell_type": "code", + "execution_count": 271, + "metadata": {}, + "outputs": [], + "source": [ + "# Drop every country where there are any missing values\n", + "df_world_pivot = df_world_pivot.dropna(how='any', axis=1)" + ] + }, + { + "cell_type": "code", + "execution_count": 272, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CountryAfghanistanAlbaniaAlgeriaAmerican SamoaAndorra
Year
1960-01-019345868.01711319.011690153.021117.015370.0
1961-01-019533954.01762621.011985136.021882.016412.0
1962-01-019731361.01814135.012295970.022698.017469.0
1963-01-019938414.01864791.012626952.023520.018549.0
1964-01-0110152331.01914573.012980267.024321.019647.0
\n", + "
" + ], + "text/plain": [ + "Country Afghanistan Albania Algeria American Samoa Andorra\n", + "Year \n", + "1960-01-01 9345868.0 1711319.0 11690153.0 21117.0 15370.0\n", + "1961-01-01 9533954.0 1762621.0 11985136.0 21882.0 16412.0\n", + "1962-01-01 9731361.0 1814135.0 12295970.0 22698.0 17469.0\n", + "1963-01-01 9938414.0 1864791.0 12626952.0 23520.0 18549.0\n", + "1964-01-01 10152331.0 1914573.0 12980267.0 24321.0 19647.0" + ] + }, + "execution_count": 272, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Compute the relative change since 1960\n", + "df_world_rel = (df_world_pivot / df_world_pivot.iloc[0, :])\n", + "df_world_pivot.iloc[:5, :5]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `.melt()` method is more powerful than stack. Both move columns up to the index.\n", + "\n", + "> `unstack` and `stack` are inverses. \n", + " `pivot` and `melt` are inverses." + ] + }, + { + "cell_type": "code", + "execution_count": 273, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
YearCountryPopulation
01960-01-01Afghanistan1.000000
11961-01-01Afghanistan1.020125
21962-01-01Afghanistan1.041247
31963-01-01Afghanistan1.063402
41964-01-01Afghanistan1.086291
\n", + "
" + ], + "text/plain": [ + " Year Country Population\n", + "0 1960-01-01 Afghanistan 1.000000\n", + "1 1961-01-01 Afghanistan 1.020125\n", + "2 1962-01-01 Afghanistan 1.041247\n", + "3 1963-01-01 Afghanistan 1.063402\n", + "4 1964-01-01 Afghanistan 1.086291" + ] + }, + "execution_count": 273, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_world_rel = df_world_rel.reset_index().melt(id_vars='Year', value_name='Population')\n", + "df_world_rel.iloc[:5, :5]" + ] + }, + { + "cell_type": "code", + "execution_count": 281, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "top_n = 5\n", + "countries = (df_world_rel\n", + " .groupby('Country')\n", + " .max()\n", + " .Population\n", + " .nlargest(top_n)\n", + " .index)\n", + "\n", + "df_world_rel.pivot(index='Year', columns='Country', values='Population').loc[:, countries].plot()\n", + "plt.legend(bbox_to_anchor=(1,1));" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (6.4) Merging" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "![](https://www.dofactory.com/Images/sql-joins.png)\n", + "\n", + "*Image source is https://www.dofactory.com/sql/join*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Add a column showing every directors average imdb score." + ] + }, + { + "cell_type": "code", + "execution_count": 284, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_imdb_score
director_name
A. Raven Cruz1.9
\n", + "
" + ], + "text/plain": [ + " director_imdb_score\n", + "director_name \n", + "A. Raven Cruz 1.9" + ] + }, + "execution_count": 284, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "director_means = (df.groupby(df.director_name).mean().round(1)\n", + " .loc[:, ['imdb_score']]\n", + " .rename(columns={'imdb_score':'director_imdb_score'}))\n", + "director_means.head(1)" + ] + }, + { + "cell_type": "code", + "execution_count": 285, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
director_namegrossmovie_titlecountrycontent_ratingimdb_scoredirector_imdb_score
0James Cameron760505847.0AvatarUSAPG-137.97.9
1Gore Verbinski309404152.0Pirates of the Caribbean: At World's EndUSAPG-137.17.0
2Sam Mendes200074175.0SpectreUKPG-136.87.5
3Christopher Nolan448130642.0The Dark Knight RisesUSAPG-138.58.4
4Doug WalkerNaNStar Wars: Episode VII - The Force Awakens  ...NaNNaN7.17.1
\n", + "
" + ], + "text/plain": [ + " director_name gross \\\n", + "0 James Cameron 760505847.0 \n", + "1 Gore Verbinski 309404152.0 \n", + "2 Sam Mendes 200074175.0 \n", + "3 Christopher Nolan 448130642.0 \n", + "4 Doug Walker NaN \n", + "\n", + " movie_title country content_rating \\\n", + "0 Avatar  USA PG-13 \n", + "1 Pirates of the Caribbean: At World's End  USA PG-13 \n", + "2 Spectre  UK PG-13 \n", + "3 The Dark Knight Rises  USA PG-13 \n", + "4 Star Wars: Episode VII - The Force Awakens  ... NaN NaN \n", + "\n", + " imdb_score director_imdb_score \n", + "0 7.9 7.9 \n", + "1 7.1 7.0 \n", + "2 6.8 7.5 \n", + "3 8.5 8.4 \n", + "4 7.1 7.1 " + ] + }, + "execution_count": 285, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.merge(director_means, how='left', left_on='director_name', right_index=True).head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The inner join, or `merge(how='inner')` can be used if you want the intersection." + ] + }, + { + "cell_type": "code", + "execution_count": 286, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
imdb_score_USAimdb_score_Canada
director_name
Adam Shankman6.25.5
Andrzej Bartkowiak5.83.7
Atom Egoyan6.37.0
Bennett Miller7.67.4
Bob Clark5.36.2
\n", + "
" + ], + "text/plain": [ + " imdb_score_USA imdb_score_Canada\n", + "director_name \n", + "Adam Shankman 6.2 5.5\n", + "Andrzej Bartkowiak 5.8 3.7\n", + "Atom Egoyan 6.3 7.0\n", + "Bennett Miller 7.6 7.4\n", + "Bob Clark 5.3 6.2" + ] + }, + "execution_count": 286, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# For every director which has made movies in the USA and Cananda\n", + "# get average imdb scores for both locations\n", + "\n", + "american_directors = (df[df.country == 'USA'].groupby('director_name')\n", + " .mean().imdb_score.to_frame())\n", + "canadian_directors = (df[df.country == 'Canada'].groupby('director_name')\n", + " .mean().imdb_score.to_frame())\n", + "\n", + "(american_directors.merge(canadian_directors, how='inner', \n", + " left_index=True, right_index=True, \n", + " suffixes=('_USA', '_Canada'))\n", + " .round(1)\n", + " .head())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (7) Plotting" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (7.1) Built-int `plot()` methods\n", + "\n", + "Excellent for creating plots quickly. The downside is less control." + ] + }, + { + "cell_type": "code", + "execution_count": 371, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "# Plot countries by number of occurences\n", + "df.country.value_counts().head(n=10).sort_values(ascending=True).to_frame().plot.barh();" + ] + }, + { + "cell_type": "code", + "execution_count": 372, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "(df.country\n", + " .value_counts()\n", + " .apply(np.log10)\n", + " .head(n=10)\n", + " .sort_values(ascending=True)\n", + " .to_frame()\n", + " .plot\n", + " .barh(grid=True, color='green', fontsize=15, title='Number of movies (log) per country'));" + ] + }, + { + "cell_type": "code", + "execution_count": 373, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 373, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "countries = set(['Norway', 'Sweden', 'Denmark'])\n", + "df_world_pivot.loc[:, countries].plot(linewidth=3, title='Population growth', fontsize=14, figsize=(8, 4))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (7.2) Using matplotlib\n", + "\n", + "Greater control. Allows creating of publication-quality plots. \n", + "Does require more code and knowledge of matplotlib." + ] + }, + { + "cell_type": "code", + "execution_count": 414, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "fig, ax = plt.subplots(figsize=(8, 4))\n", + "\n", + "ax = (df_world_pivot.loc[:, countries] / 10**6).plot(ax=ax, lw=3, zorder=50)\n", + "\n", + "ax.set_title('Population growth', fontsize=17)\n", + "ax.legend(fontsize=14, loc='upper left')\n", + "ax.tick_params(axis='both', which='both', labelsize=14)\n", + "ax.set_ylabel('Population', fontsize=14)\n", + "ax.set_xlabel(ax.get_xlabel(), fontsize=14)\n", + "ax.grid(True, zorder=-50, ls='--')\n", + "ax.set_ylim([3, 11]);\n", + "\n", + "#plt.savefig('my_figure.pdf')" + ] + }, + { + "cell_type": "code", + "execution_count": 415, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "fig, ax = plt.subplots(figsize=(8, 4))\n", + "\n", + "# Compute kernel density estimates\n", + "from KDEpy import FFTKDE\n", + "data = df[['imdb_score', 'gross']].dropna(how='any')\n", + "\n", + "kde = FFTKDE(bw='ISJ', kernel='gaussian')\n", + "\n", + "x, y = kde.fit(data.imdb_score.values).evaluate()\n", + "ax.plot(x, y, label='imdb_score', lw=2)\n", + "\n", + "y = kde.fit(data.imdb_score.values, weights=data.gross.values).evaluate(x)\n", + "ax.plot(x, y, label='imdb_score weighted by gross', lw=2)\n", + "\n", + "ax.set_title('Score distribution', fontsize=17)\n", + "ax.legend(fontsize=14, loc='upper left')\n", + "ax.tick_params(axis='x', which='both', labelsize=14)\n", + "ax.set_yticklabels([])\n", + "ax.set_xlabel('Score', fontsize=14)\n", + "ax.grid(True, zorder=-50, ls='--')\n", + "ax.set_xlim([0, 10]);\n", + "\n", + "#plt.savefig('my_figure.pdf')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (8) Time series manipulations" + ] + }, + { + "cell_type": "code", + "execution_count": 419, + "metadata": {}, + "outputs": [], + "source": [ + "df_stocks = pd.read_csv(r'https://raw.githubusercontent.com/vega/datalib/master/test/data/stocks.csv')" + ] + }, + { + "cell_type": "code", + "execution_count": 420, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
symboldateprice
0MSFTJan 1 200039.81
1MSFTFeb 1 200036.35
2MSFTMar 1 200043.22
3MSFTApr 1 200028.37
4MSFTMay 1 200025.45
\n", + "
" + ], + "text/plain": [ + " symbol date price\n", + "0 MSFT Jan 1 2000 39.81\n", + "1 MSFT Feb 1 2000 36.35\n", + "2 MSFT Mar 1 2000 43.22\n", + "3 MSFT Apr 1 2000 28.37\n", + "4 MSFT May 1 2000 25.45" + ] + }, + "execution_count": 420, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_stocks.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 421, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
symboldateprice
0MSFT2000-01-0139.81
1MSFT2000-02-0136.35
2MSFT2000-03-0143.22
3MSFT2000-04-0128.37
4MSFT2000-05-0125.45
\n", + "
" + ], + "text/plain": [ + " symbol date price\n", + "0 MSFT 2000-01-01 39.81\n", + "1 MSFT 2000-02-01 36.35\n", + "2 MSFT 2000-03-01 43.22\n", + "3 MSFT 2000-04-01 28.37\n", + "4 MSFT 2000-05-01 25.45" + ] + }, + "execution_count": 421, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_stocks = df_stocks.assign(date=lambda df:pd.to_datetime(df.date))\n", + "df_stocks.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 427, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
symboldatepriceyear
0MSFT2000-01-0139.812000
1MSFT2000-02-0136.352000
2MSFT2000-03-0143.222000
3MSFT2000-04-0128.372000
4MSFT2000-05-0125.452000
\n", + "
" + ], + "text/plain": [ + " symbol date price year\n", + "0 MSFT 2000-01-01 39.81 2000\n", + "1 MSFT 2000-02-01 36.35 2000\n", + "2 MSFT 2000-03-01 43.22 2000\n", + "3 MSFT 2000-04-01 28.37 2000\n", + "4 MSFT 2000-05-01 25.45 2000" + ] + }, + "execution_count": 427, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_stocks.assign(year=lambda df:df.date.dt.year).head()" + ] + }, + { + "cell_type": "code", + "execution_count": 432, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
price
meanstd
symboldate
AAPL200021.7483339.622822
200110.1758331.380932
20029.4083332.155993
20039.3475001.709099
200418.7233337.723485
200548.17166712.035228
200672.04333310.322289
2007133.35333340.334396
2008138.48083333.998239
2009150.39333342.152117
2010206.56666715.571530
AMZN200043.93083317.159926
200111.7391673.664516
200216.7233332.799420
200339.01666712.326313
\n", + "
" + ], + "text/plain": [ + " price \n", + " mean std\n", + "symbol date \n", + "AAPL 2000 21.748333 9.622822\n", + " 2001 10.175833 1.380932\n", + " 2002 9.408333 2.155993\n", + " 2003 9.347500 1.709099\n", + " 2004 18.723333 7.723485\n", + " 2005 48.171667 12.035228\n", + " 2006 72.043333 10.322289\n", + " 2007 133.353333 40.334396\n", + " 2008 138.480833 33.998239\n", + " 2009 150.393333 42.152117\n", + " 2010 206.566667 15.571530\n", + "AMZN 2000 43.930833 17.159926\n", + " 2001 11.739167 3.664516\n", + " 2002 16.723333 2.799420\n", + " 2003 39.016667 12.326313" + ] + }, + "execution_count": 432, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_stocks.groupby([df_stocks.symbol, df_stocks.date.dt.year]).agg([pd.Series.mean, pd.Series.std]).head(15)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# https://www.kaggle.com/zynicide/wine-reviews" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (9) Machine Learning and modeling" + ] + }, + { + "cell_type": "code", + "execution_count": 632, + "metadata": {}, + "outputs": [], + "source": [ + "pd.set_option('display.max_rows', 2**6)\n", + "pd.set_option('display.max_columns', 2**6)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (9.1) Dummy variables for categorical data" + ] + }, + { + "cell_type": "code", + "execution_count": 633, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + 
"output_type": "stream", + "text": [ + "Loaded data of size (5043, 6) into memory.\n" + ] + } + ], + "source": [ + "cols_to_use = ['movie_title', 'duration', 'genres', 'content_rating', 'budget', 'gross']\n", + "df_model = pd.read_csv(r'data/movie_metadata.csv', sep=',', usecols=cols_to_use)\n", + "print(f'Loaded data of size {df_model.shape} into memory.')" + ] + }, + { + "cell_type": "code", + "execution_count": 634, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(3840, 6)" + ] + }, + "execution_count": 634, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Drop any row with missing information\n", + "df_model = df_model.dropna(how='any')\n", + "df_model.shape" + ] + }, + { + "cell_type": "code", + "execution_count": 635, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
ApprovedGGPMNC-17Not RatedPGPG-13PassedRUnratedX
0000000010000
1000000010000
\n", + "
" + ], + "text/plain": [ + " Approved G GP M NC-17 Not Rated PG PG-13 Passed R Unrated X\n", + "0 0 0 0 0 0 0 0 1 0 0 0 0\n", + "1 0 0 0 0 0 0 0 1 0 0 0 0" + ] + }, + "execution_count": 635, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dummies = pd.get_dummies(df_model.content_rating)\n", + "dummies.head(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 636, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
durationgrossgenresmovie_titlebudgetApprovedGGPMNC-17Not RatedPGPG-13PassedRUnratedX
0178.0760505847.0Action|Adventure|Fantasy|Sci-FiAvatar237000000.0000000010000
1169.0309404152.0Action|Adventure|FantasyPirates of the Caribbean: At World's End300000000.0000000010000
\n", + "
" + ], + "text/plain": [ + " duration gross genres \\\n", + "0 178.0 760505847.0 Action|Adventure|Fantasy|Sci-Fi \n", + "1 169.0 309404152.0 Action|Adventure|Fantasy \n", + "\n", + " movie_title budget Approved G GP M \\\n", + "0 Avatar  237000000.0 0 0 0 0 \n", + "1 Pirates of the Caribbean: At World's End  300000000.0 0 0 0 0 \n", + "\n", + " NC-17 Not Rated PG PG-13 Passed R Unrated X \n", + "0 0 0 0 1 0 0 0 0 \n", + "1 0 0 0 1 0 0 0 0 " + ] + }, + "execution_count": 636, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_model = df_model.merge(dummies, how='left', left_index=True, right_index=True)\n", + "df_model = df_model.drop(columns='content_rating', errors='ignore')\n", + "df_model.head(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "-----------------\n", + "\n", + "A more advanced example follows." + ] + }, + { + "cell_type": "code", + "execution_count": 637, + "metadata": {}, + "outputs": [], + "source": [ + "from functools import reduce\n", + "\n", + "# Split the genres, take the union over every set to get every genre in the data set\n", + "genres_sets = df_model.genres.str.split('|').apply(set)\n", + "genres = reduce(set.union, genres_sets)" + ] + }, + { + "cell_type": "code", + "execution_count": 638, + "metadata": {}, + "outputs": [], + "source": [ + "# For every genre, add a dummy column\n", + "for genre in genres:\n", + " df_model[genre] = np.where(df_model.genres.str.contains(genre), 1, 0)\n", + " \n", + "df_model = df_model.drop(columns='genres', errors='ignore')" + ] + }, + { + "cell_type": "code", + "execution_count": 639, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
durationgrossmovie_titlebudgetApprovedGGPMNC-17Not RatedPGPG-13PassedRUnratedXWesternFamilyMusicSportRomanceAdventureFilm-NoirBiographyMysteryActionComedyHorrorFantasyDramaHistoryCrimeThrillerMusicalWarDocumentarySci-FiAnimation
0178.0760505847.0Avatar237000000.00000000100000000010001001000000010
1169.0309404152.0Pirates of the Caribbean: At World's End300000000.00000000100000000010001001000000000
\n", + "
" + ], + "text/plain": [ + " duration gross movie_title \\\n", + "0 178.0 760505847.0 Avatar  \n", + "1 169.0 309404152.0 Pirates of the Caribbean: At World's End  \n", + "\n", + " budget Approved G GP M NC-17 Not Rated PG PG-13 Passed R \\\n", + "0 237000000.0 0 0 0 0 0 0 0 1 0 0 \n", + "1 300000000.0 0 0 0 0 0 0 0 1 0 0 \n", + "\n", + " Unrated X Western Family Music Sport Romance Adventure Film-Noir \\\n", + "0 0 0 0 0 0 0 0 1 0 \n", + "1 0 0 0 0 0 0 0 1 0 \n", + "\n", + " Biography Mystery Action Comedy Horror Fantasy Drama History Crime \\\n", + "0 0 0 1 0 0 1 0 0 0 \n", + "1 0 0 1 0 0 1 0 0 0 \n", + "\n", + " Thriller Musical War Documentary Sci-Fi Animation \n", + "0 0 0 0 0 1 0 \n", + "1 0 0 0 0 0 0 " + ] + }, + "execution_count": 639, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_model.head(2)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## (9.2) Training a model" + ] + }, + { + "cell_type": "code", + "execution_count": 640, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LinearRegression\n", + "from sklearn.model_selection import train_test_split" + ] + }, + { + "cell_type": "code", + "execution_count": 646, + "metadata": {}, + "outputs": [], + "source": [ + "df_model['gross_log'] = df_model['gross'].apply(np.log1p)\n", + "df_model['budget_log'] = df_model['budget'].apply(np.log1p)" + ] + }, + { + "cell_type": "code", + "execution_count": 647, + "metadata": {}, + "outputs": [], + "source": [ + "linreg = LinearRegression()\n", + "X = df_model.drop(columns=['movie_title', 'gross', 'budget', 'gross_log']).values\n", + "y = df_model.gross_log.values\n", + "\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)" + ] + }, + { + "cell_type": "code", + "execution_count": 648, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,\n", + " 
normalize=False)" + ] + }, + "execution_count": 648, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "linreg.fit(X_train, y_train)" + ] + }, + { + "cell_type": "code", + "execution_count": 649, + "metadata": {}, + "outputs": [], + "source": [ + "y_pred = linreg.predict(X_test)" + ] + }, + { + "cell_type": "code", + "execution_count": 659, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.6093703080115105" + ] + }, + "execution_count": 659, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.Series(y_test.mean() - y_test).abs().mean()" + ] + }, + { + "cell_type": "code", + "execution_count": 660, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1.1842132628414064" + ] + }, + "execution_count": 660, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pd.Series(y_pred - y_test).abs().mean()" + ] + }, + { + "cell_type": "code", + "execution_count": 658, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 658, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "pd.Series(y_test.mean() - y_test).plot()" + ] + }, + { + "cell_type": "code", + "execution_count": 655, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 655, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvIxREBQAAIABJREFUeJztnXm4HEXV/781c7fc7DshJLkQkBDWhMsSwr7JpuirKIiCKz8WX0VRDOCuKKCyKYLIor7IJqAoSwSSEAKEbISQkISE7Htu9pDlLjP1+6O7eqqrq6qre3ru9Az1eZ773Ht7uqvPVFefOnXq1ClCKYXFYrFYqpdMuQWwWCwWS2mxit5isViqHKvoLRaLpcqxit5isViqHKvoLRaLpcqxit5isViqHKvoLRaLpcqxit5isViqHKvoLRaLpcqpKcdN+/XrR5uamspxa4vFYqlYZs2atYlS2j/qdWVR9E1NTZg5c2Y5bm2xWCwVCyFkRZzrrOvGYrFYqhyr6C0Wi6XKsYreYrFYqhyr6C0Wi6XKsYreYrFYqhyr6C0Wi6XKsYreYrFYqpyqUPT5PMWTM1ehI5cvtygWi8WSOqpC0T8+YxWuf+pdPPzG8nKLYrFYLKmjKhT91t1tAIAt7m+LxWKxFKgKRW+xWCwWNVbRWywWS5VTVYqe0nJLYLFYLOmjqhS9xWKxWIIYK3pCyEOEkI2EkHncsT6EkJcJIYvd371LI6apjOW8u8VisaSTKBb9XwCcIxwbB2ACpfQgABPc/8uGdd1YLBZLEGNFTyl9DcAW4fCFAP7q/v1XAJ9KSC6LxWKxJESxPvqBlNJ1AOD+HlC8SPGxrhuLxWIJ0mmTsYSQKwghMwkhM1taWkpyD+u6sVgsliDFKvoNhJBBAOD+3qg6kVJ6P6W0mVLa3L9/5L1tLRaLxRKTYhX9vwFc7v59OYBniyzPYrFYLAkTJbzyMQBTARxMCFlNCPkagFsAnEUIWQzgLPf/Tsf65i0Wi0VNjemJlNJLFB+dkZAssbG+eYvFYlFjV8ZaLBZLlVMVit66biwWi0VNVSh6i8Visaixit5isViqHKvoLRaLpcqxit5isViqHKvoLRaLpcqxit5isViqnKpS9BR25ZTFYrGIVJWit1gsFksQq+gtFoulyqkKRU9gl8ZaLBaLiqpQ9NY3b7FYLGqqQtFbLBaLRU1VKHrrurFYLBY1VaHoLRaLxaLGKnqLxWKpcqyit1gslionEUVPCPkOIeQ9Qsg8QshjhJCGJMq1WCwWS/EUregJIYMBfAtAM6X0MABZABcXW67FYrFYkiEp100NgC6EkBoAjQDWJlSuxWKxWIqkaEVPKV0D4LcAVgJYB2A7pfSlYsu1WCwWSzIk4brpDeBCAPsD2BdAV0LIFyXnXUEImUkImdnS0lLsbS0Wi8ViSBKumzMBLKOUtlBK2wE8A+AE8SRK6f2U0mZKaXP//v0TuK0EmwnBYrFYAiSh6FcCOJ4Q0kgIIQDOALAggXItFovFkgBJ+OinAXgKwNsA5rpl3l9subGwmRAsFoslQE0ShVBKfwLg
J0mUVZwg5RbAYrFY0oddGWuxWCxVTnUpeuu6sVgslgDVpeit68ZisVgCVJeit1gsFksAq+gtFoulyqkKRU+sb77q2NXagbXb9pRbDIulKqgKRU+tb77quOi+qTjhlonlFsNiqQqqQtFbqo/563aUWwSLpWqoCkVvXTcWi8WipioUvcVisVjUWEVv+Uhz3ZNz8ONn55VbjFQxZ9U2tOfy5RbDkiBW0Vs+0jz99mr8beqKcouRGj7YuBMX3vMGfv3CwnKLYkkQq+gtFovHpg/bAADz1m4vsySWJLGK3mKxWKocq+gtFoulyrGK3mKxWKocq+gtFoulyqkqRW8zIVgsFkuQRBQ9IaQXIeQpQshCQsgCQsiYJMq1WKhNZBRKLk9tPVm0JGXR3wVgPKV0BIAjASxIqNxI2EwI1Uc+hfrrmbdXY8uutnKL4TH8xhfwg6ffLbcYlhRTtKInhPQAcDKABwGAUtpGKd1WbLlxSKFOKIpZK7Zg/Lz15RajrKTNUl21ZTe+++QcXP33WeUWxceTM1cnW2C6qt1SJElY9AcAaAHwMCFkNiHkAUJI1wTKLRmLN+zE/LXpz474mXun4spH0qVQkua5d9fi/fU7lZ+nzaJv7XBSA2zc2VpmSUqDHRVXJ0ko+hoAowHcSykdBWAXgHHiSYSQKwghMwkhM1taWhK4bXzOuuM1nHf3lEjXzFi+BU/PSthqsuCbj87Gx+98Tfk5taZlWbD1Xl0koehXA1hNKZ3m/v8UHMXvg1J6P6W0mVLa3L9//wRuW6AzrJCL7puK6/4xpxPuZOFJmecG1e7TIDbnd1VStKKnlK4HsIoQcrB76AwA84stN5IMnXkzS6eSPkVvsVQeNQmV878A/k4IqQOwFMBXEirX8hEnnzpNby1eS+WRiKKnlL4DoDmJsuJgX73qJW1qPo0SWSxhVMXKWPvqVS/ps+gdrHFhqSSqQtFbqpe06fm0yWNJFzOWb0FHCnfnqgpFb62r6iVtC6YYaYlO4evn1vELkU/bwoOPEG+v3IqL7puKO19ZXG5RAlSFomekVSlY4mMfqR6+fu59dQlmrthaPmFKxGPTV+K6J9Mf2rxxx14AwKIN6gWA5aKqFL2l+kibjz5d0gTlSVt9JcENz8zF02/bxYrFUFWKPi3DaUtypEFt7W3P4dbxC7G3PecdS0tLs6NYiwlVpeg7q9EvbfmwU+6TNO25PFZt2V1uMSKRBgv1wdeX4d5Xl+DB15eVW5QAYu2koLo+8qTR3qwqRd8ZjJ+3Dqf/bnJFZpX85XPzcdJtk7DpwwpKyJUCxcUSmbXn8qlTpFHlWdLyoedLtiRL2toGj1X0EWFZL3UZF9PK5EVOMrkde9pjXf/EjJX43J+mJilSKGkKIuFf5LRYbVGTj53xu8k49lcTcOCNL2Dl5soa3VUKJDWOvQJVoejT8tKlHaYSMjEr7AdPz8X0ZVuSE8iANGRR5GsrDfLwxLUiO/IUz89dl6wwCXPPpA8qIp24SNraCFAlir4zh0zpe4TmMH93JXWMabDoZSLIrLa12/bgqkdmYU9bTnKFn4079uLk2yZhxeZdCUhYfVBK8Zv/vo9P/uH1cotSFVSFok+SP776Ac67S52rnnUqlaQsGZ7sKRxaqqikqJJfvbAAL85bj5fmh8/f/HvOWqzcsht/fXNFJ0gmJ42WJ4M99o409PQRSeP7VRWKPkmle9v49zF/nXq4yF6O9D3KcCqxk0qDnve5bjTyZDPOmVEihYpVtqWsn1ff34imcc9j/fbOn7yNUoe7Wjtwy4sL0doRPpIqJSloqkqqQtGXorEvafkQTeOex5tLNkk/ryRlyaAV6LpJg6Ln0bm/2NxH3iDVCVvzUez3K5VVTinwyFsrAQBzVnf+FtBRvtXdExfjvslL8OSMVSWTJwppfL+qQtGXgreWbgYA/GdOuiesolDsZGzSmLhl0hBHz6MTx1P0BjIn9QTEW0VR/DIxWdOgKK/CivLcW9tZ+Gu6
2kqaqApFX9oG6W88KdM7kUjbZKyJ+zVtil6H67npVJmTvpNM9HI8ggp67BVBVSj6UqCaUGHtrxLTLbCXJy0vkYlFn5Somz9sxcadxfmaKfRKnPnoOzNLbdKT1WmZ/E6JGFWDVfQRqeQGSIXfsctRVMKmD1tx/t1TsHqr2UIcEzmSUjxH//IVHHvzhKLL0bpuIkzGJmUnJG7Rc38T6dHOIdKEdkpGqmnWDYkpekJIlhAymxDyXFJlpgHVwyt3o4pDwaIvTaTHv2avwXtrd+DhN5bHLuf5d9dh2aZd2nPKiUwBrd66G8s27YrluinVs0iivHK28ZQ99kikUTcktTk4AHwbwAIAPRIss2yk8WEVj/P6FKsc8pQiI3FtFeL0TaUJCnLNo297CtO5VwwBE4ZvCzJxTrx1EgDg8jHDAAA5A6FZkXG+3uPTV+KRaSvw3P+elLhGlD2TcnS2lTQ3w0jzuoRELHpCyH4AzgfwQBLlpZlCHH3l9QR5z6JPphwRr24Mq0YlB19+Gl4eXk4z1014mcXM8Yx7Zi7mrXHWeiReP7xFX8Y2HqeNdra0C9btwANTlnbyXeORlOvmTgDXAyjLZoml7PwDZVfgoiMGcxMUvUhHcX3eqxuzyjF5biYx6Z1JwR8c/I6FOPpOjLpRtM/Y5RV3eWKUalK4tSOH65+ak0gGz3PvmoJfPr8gAalKT9GKnhByAYCNlNJZIeddQQiZSQiZ2dLSUuxtS45KVaXlRYhDUha96vokXDdxzik1Ya4bhhd1E8lHH1MoA3nC723mpinHE4hSL1HkGz9vPZ6cubpiFHRSJGHRjwXwSULIcgCPAzidEPKIeBKl9H5KaTOltLl///4J3JYrO9HSQu7FLLpOvGdSFCz6YstRHEc0TW/yMqfKVUtpcgumkoq6Ee9VZLl8x+otnqoQH73JSJLNn2QSfIHFZ5BGt27Rip5SegOldD9KaROAiwFMpJR+sWjJYslSgjIFtViOycGkhrFeeGWR5alewqhJ08zCKw2F4njo9WV4fbE8dYWOHXvbQ0ND85qO3ou6idBIihmxUCq5OsHRWnlXxgaPJfEesHKTXB2etvUpMqoijr4zK7gcq0uT+n5egyyynDBry9RaKlUKhJ8/Nx9ffHBa5OvOvXOKF0GjwiSpmcmCKS/qpoiHUQoXXFp0Vamifwrvb3IvcCVECCUZXglK6asAXk2yTKP7Roz2MCvT/S08w3Kk+k2qGXmumxL5hZklaxx1k9A5SbFm2x7pcf5Z6xbnMCsxio++GPIhrqQ4SP32ZVD/UTqhSP589+RsgiZuGkKAw6gKi56RZKNXWb/l6L2Tct0UGmSRUTcKi5WVauy6MbB802At8YrOZDK2s9II5GnySpgvjT3HpL/Os++swQJNKnDVPcPagomBUQrXTUCu9Lnoq0PRd4ZvnlEO101SFgP7TkksmJKW74VXRpOncH1phusm7G03y2Wun4x1fpssmGKVVMzXy1MaKCBKedJz/Zpefz2l+MbfZuK1RdGi6L79+Ds4V7O5DyBvY2l13ajkahr3PO54eVFi9ymGqlD0pcBkMU9UFm/Yif/MWRtdloSstqR89MohNKK9RCq3mP9Y52j6n/1nvvIzfoSi6+iJF3UTfr8k1IxEzxdfZoQSWzvyeHn+Bnz9rzMTlkKu6JUGRgSZ8yWIumFyyaS4a8Li5G5UBFbRK1D5s4vRO2fd8Rr+97HZMWSJf09ZOVHK27G3HXdPWOyzUlUvnLdgylQe7u+NO/dij8Sq7izHTZgrAXBkSdp1U8yzLY2PXnIs7KISjG6L+V7b97TjlhcXol0yK94prpsUkuhkbLkohdWnnvhJfugXKkvSrpsI6vPm5xbgiZmr8LGB3cLlCXFr7dzbjrteWYzvffxgNNRmfc/t2Jsn4PDBPQPXdNYq0yQihaK4bgp1FP/75SmVuL/k5+5py2Hxxp1G5zqfUS4ySH4i+56leBPi+OgZt7y4EI9NX4kR+3THp0YN
lpaRTdCkL1UoaJJUhaKPC6VUqbDVrpsyTMYmZNfGWRm727WyWzsK1pGqEYdNxv5+4gd44PVlGNq3EZeNaQp8q7lrtivLNCXuC2Zi4RHoo646P+om+CxV7fO7T76DF+eFb1ruj6PX1wn7nqXYsSxKeGXhmTiwvWNlG4sX0nQUKyF/f/99CJKbV0uKqnDdxH2vdA+j4HejwnHnd2dOrCfnuok+GSuL91YmNQt5idrczuKPk5Zg2tLNRp1m1I7VaCJUgsmL77huwsvvLFtAtmBKdet3Vpnt+xpF9FxOP4IrBqmVbHpxyEgFSKZzYkUUDKhw92a5qA5FH/c6zcMIs+g7N+omYYs+Qo0V9hAtXKO6nh0PGxWv37EXn7//rZIE0susOBNMXXEsyZrsdPaczNIUFx+66Fj0ouumuLYS5fpSjlxkciTxHiSZAoEVIcpFCIltcJSKqlD0cdE9Cxr4w/23DM8v6VuWyqKPnL3S4Jyo70tsi97wPF3prI54Bbhzbzs+c++bWNryof9+CSiaXD44GVtsW+GvD1u9mzfw0V9035s47Cf/jSyH3O8dchGrVI1ASU7GFqKsgp1tygz66lD0cStVZ9mqLJuy+OjLmKo3IwkZVE2QRq0ak/Ojzk9Esej5Z6x7733ZK3WjQPc3Xz8TF27ErBVbcccr8jC7YpqTfN2B+bPR+cFNxDLx0c9YvhUftnYYlCbKYf7dgieqP0oyjl7XEVrXTYqI8yzYNZ05Mkt89WMUiz5CRsaocurOF/2fpsgs+ttfel9+f+5UUwtPN0fjzetIZBbPL+ww5T95w469+Nx9U7H5w1YjWZIM/2USeTK6QrbsbMUtLy4M1G1HLvqEFa+spy7ZjJ8r1i/IvoZyMtb89pyPPsJFCgpt1P/cCSFW0ZeCuIpQ9yxUiyAKD7XzHmTisdIxfPT8JcpGzDpBhXYWdamRRR/xy3dIdiq5e+IH8rK5v82H8vJ2AfDWcPwH9uDryzB9+RY8NWt16LlRwivj7PrFLrn5hQW4b/ISvP6BPyOoKpOn41IKH1lc8ue38NAby6TnRVkwJcqrIwnXTVtHHpc9NB3tbkcnm/tKmYu+ShR9zErVNZywUK7O7LCTtg5i+ej5yVilj975wHSSTndW3OyOUXz0pq4bHpPii3nJmZVsEuctTRMg/D9hwQYcc/MrvvBY71ypO0dNu1CGqq6H3/gCLr7/LelnMpk7JAubosomg9Xgtt1tuOPlRcjnaWEytgiTfuH6Hb60D8yw8Sbq0bmGoAlVoejjolX0IdcUo3yjNgLx7N1tHXhgytLYi4miyC710SvnL9zfhnKZnBfZR5+LoOi5v3mfrc4aFWO2/Z+r24bYkag29ci52sJE0fPyMMR7/+K5+WjZ2Yotu9pCy5PJwyPOf+j83dOWbZGWIXvkbdIVrDIfvVo2Gez0Hz/7Hu6asBivLtqYiNFUK6S+lM1rWIu+BFDhd9TrZKhcNLq8FqZEbQRi27z1xYX45fML8NL88AUw0vIinCtTSCr5mbWURCP35gYiTkRHs+jNjvNqTNfx5CM0RNWiMqZMawwtevFWxeox/w5TfhnEumX6OYpxLFO0re2GFn2EiWae3W3uAqoc9eq3GItbVPQy48/66EtIwLrJU0x6f6O6gWiUiNo9wX7Hf5BRrxXlb3En6tojWK/+8szPlcXRqzQZUwTGrhvNaaoYZZ7VW3cHJi0jRd0ovkcgXM73GRMwqN2o0DYopcoc97KygUId1hgkTJfF0Ysl6mpD9pnMR+/JJtyLzYdEiWCRKvoE3EqOHML/kuuZm6iYOHdxtOV9J4NRb7moDkWvqNSH31yOrzw8Ay/MlVu+cSbNkti8I7KiF/5v63CO1NXEfXzm95dlZFS9Ix2eRW+o6E1WmSqO5/IUJ946Cc03vxI4bopKqWmVo9bd5x/R3Dd5KW4b/36gfPkBB1aHWQPlKbPoix1N6S7PCcMr3iftXR/y7GXyyVJEx5mMVcFXZUcCo05R
DtlkbMr0fHUo+sKI2V+7q7Y4+39u2LFXep12wZQy6sb/eRwi+xqF85lPM66ij3L/jOe6CR+WMkWgjLoRtJvWopfcl9Gey2P4jS9Iy5BF3UQlsNLR+DrnN7v6zSXR961lHZWZj16yYErjdjJB167F+Q/ZyE1mnfMYW/RS2VSlmjXo309cjK3uXEWSc2wF143zPyFVaNETQoYQQiYRQhYQQt4jhHw7CcHiELVu40XdFG8RFCtnm5u0qT6bwZMzVuHCP7we7f4RzpUt1VfJH9Va0p2mSxGgm3CNbdET+XER3ebg7MK85wfW3Vx+L89HnzWx6LmCvGLNXTehuCKosnJ62Su5ygvbwEXmLmVJyHiStOgZ89bswL/ecfaDiBrI0NqRw1/eWIZcngb2BJatn6jGydgOANdRSg8BcDyAawghIxMo15i4z1//QuuPl9N1wyygTIbg+qffxZzVwayP2vIi3F5mWcvkX7NtDzbucPzlpspWWw/ufX/5/AKM+vlLvo90Lp+4Pnr/SxrdAHDKczD5/qqU0fkIFr0sH33Rk7GS61nklVi3BUVfOLZXMrEKOJvuzFm1rTgffYzvNmP5FumIPqoi/tPkpfjpf+bjHzNXBV03eVYm946kTNMXregppesopW+7f+8EsADAYP1VyVLwjVI89+5a40o28beqwtfKORnLskDqvmcuT/H7CYuly8+juJ3YS85fIbt87C0TvQU1puWbnLZm2x5s3d3uO6Z7vKKS1dUR/5G5RR88v/CZ8yFzaegiWNSjIufZGkXd5GWuxeIUjKwTZTHnj05b6Tsuy0evsujPuuM1XHjPGxGiboLnhUVJyT6/6L6peFdiCEWtJ/Yebd/THmxjlOK1RS344b/mxS6/1CTqoyeENAEYBWCa5LMrCCEzCSEzW1qi7TGp48mZqzBxoVPeY9NX4puPzsZjM1aGXOWgHVkrPsx5w/IifHxRzxcuYDvn8D5SUZ4X5q7D715ehFteXFD0/QG/Ugz76uZZDdXn6dSc7iUS3Tp665xTxAbXUKhXfDrludcbuG6o8MfC9TvQNO55LN7gJD8zWblZ9A5TIcqUuc+YJPOFXbhkmVxlu4TxyP36MteNRFy3su59dQlWbN4l+dycqIqYjbA6ZInkKPC3qSu8/6s6Hz0hpBuApwFcSykN7MtGKb2fUtpMKW3u379/UrfF9U+9620Dx8INN2z3D9VU74ypEuDxFH1UQfmyOQNm4869eGDKUiMFwmAWvX97P/85rDPY1SrZnq8ErhseyfoXKbFdbpryA35k7ahNjlFWU81nJh2deMoTM1YBAJZuCiowXRmmKRCiQmnh2as6HZmLirfop0sWTcnkk7tu5J1Qy85W3Dp+IS5/aLryGpMqiBpeWesq+vZcPvB8w+YTmsY9j7vLvHdsIoqeEFILR8n/nVL6TBJlFoP4DFWN3+SFVg2Nk3LdfPPR2fjl8wuweOOHRucDnOuGOy42XDHhEk+UsFLPdcNb9CHX8C/pd554B/e+usQnE0PnT9cZtFqLXoi60VrVqs+01wTdFaJcsnkc8XzxO4grV81SLQSty6JdNxKZVfMFhdzuBPPWbMdL7633WfRffjiojOU+ejOLPk+pZ8DI5gKifPWoFjdb19CRo5LwSlmn5D92+8uLot0wYZKIuiEAHgSwgFJ6e/EiFY/5gh1zK5rRYTAsF/nb1OWYx22TxzeM7a7/OUo6hjZ35MJbzqKil6UuUBaoISPpMPKU4vaXF3kKXISX5Z+z1+DW8Qul50VJV8Cjqyu2CjLs3FyeYkmLvHMNbiRR+Fv73EXXjaaiRUMimKIgvG6KnYw1bRqqTpf30V/w+9dxxf/Nwod7HV+2ao5B1h5l7UCaQpm7Z0eeomnc83hy5iqubPMvH3WylEVBtefzkvBKSfnCsST3qI1DEnvGjgXwJQBzCSHvuMdupJS+kEDZsdC9qDz6STe96yZKO/nxs+8JZQfvo1oSD4S7kQB15yYdAivvFIRI
J2OpNxS96tThgWtM66ZdE/Ouqw9d+UzRMMQOkO0T/Oj0lfgRN3nGT5YGN5Lg/lbfOhB1o+8U2DnO73gWPfD5+6dKZRBuY0zogqc89SZnWZvj626HW/9d62u8kad4feCY4cQrpYWkZDv2OAbS7S8twikf6y+9Ju6oUEZtpmDRi65Jp03pyy+znk8k6uZ1SimhlB5BKT3K/SmbkgfMe2vds84JL6J3PIHJ2BueeRdN4553ynGP6Rul/3+ZW0Zl0ctfGHNZZVkkw643fYmi+En5Z6qr+51udASzoMRbPO76wrcKitU/ByHcW9YxSx6YNgWEmKJZ+HiXEB1lUoWUUuzcK15nXqfStsH97bUzrgJ4d5vsPdu511HAXeuycqtcZtFHUP7s3NosCZwnXqFrXtFdN27nlpe5bvznEkICOZpKsYF6FKpiZayI6UPUKSRVZ5HEZOwrCzZ6f+t8vgUMLPo8xXtrt+NT97yBXa0dihw16mNKpK4b/SWmil7nutH583X3Z4qmsTbrnOue3LdrHQDghmfmAgAGdK9Xln/Mza/gNs7dZLq03RvtKdrIm0s2IZeneGvpZm+zbtGFwzCpQ2lkSjENUyYIgHahnXl/S6JuWMfTWF8jlUXWCcreNdm1eVqYg6l1V4X7o8FEBax5vyNWVA03GSvKa7K4qxpcN6kjiQU7qiLEl7lYCha9znUjPy5a9L94bj7eWbUN76zaVvDRS7wjUdp4oRwzi5rJYoIuXYFu8lL33ETXDTtXXGmqWvzDeOiNZbj+nBEACvVFqb9jnr92B/a0F+7Hx9GzERtjyuJNeObtNbj+nIO9/DfedXnquSJMvqPuHLETj2pI+mL/UbBiGc4zy/qO+xV9u3ut3DyR5qM3tehBPeOAKV6+LQbnKxJU9L7JWH1ZRCKLSe6iUlKVit54wY7mM5XlnoRF75PBoCD1Kl2/8mNhavU1GeVWdc4xc1g5Od+99NeovpN4XB91o1bMupeUWZRMXnaZ+KKJZYiKnvlkAb/VzWcMPe/uKdIyZEZAy05n1fCKTbsDn90yfmFgUZgJJr5tXfsyda2II0fxb34+xVf/0hFH8KDMMJB2Enk+d31hnoBfMOkvV1KI5p46WPvpyNPAqEQ++vAfLGajkySoSteN6F//2X/m+3aEYcTp8QsKxHDUENKgqCCr9Byl64b/m3Jhl/Ksk+I9o5CERS/WWZSom5zv/urz2ArGwqS58zsrWvRigi5BZn4EwGcs/flz8j1O/fdUy5cR3jhKgWffWRM4z6R9hfnY48BfLzNC+Y6Ofd+VWwqd1y436imfD25zCMjrRqroFRY92+VKln8nYJTp3inJRx25PJrGPY9H3loRPN8tvSOfDzybW15cGBo+W27XTVUqellj+tULkhWimrdClmMaMIyokJWj/NwtN0KjZE0mL1hXzKLf055T7mAERFMG7FzfKtyQa1TfOWhxRXDdGFr0LCabneIpek5rOYmphE4noOg5i541hdBnSb3yVQRSIUARnqexRolEyYkyxEXR7D1yIc+hQ7IbSUKxAAAgAElEQVRqO0w+2chOdvn5d7+OXW1ORy5r3+I1YSlCRFgndeuLwXBgdnpHjgbKnbtmO94I7Kfrv77ck7FV6bphD4J/qaRREpqXQjXsixp1o7vHp//4hhdtoXuxw0YXTC6m5Pa25/QjhQi6gN3b1HWilVdo/VE2TjGdjGVl5jRKl6U41slWmyG44+VFOP+IQVwCMj2szejqR+arjRoCy/zfRq6biDa+Pz9P8HN+XkWmoL0MpnmFS0PSzuUdlly+JRudlcNMcTobkbNr/BfFnoyVfO9C/H5eKtsuYf2G+EyzGSd1w5mHDMBBA7ur710iqtSiDyq5QpggpzC0O0zJG4JpKt5lm3bhkvvfCoS/8cxeuQ2bFfmx+fs/8tZKbNxZSOsgs+hytOC62due85SOTEzZy//K/A14fLo/R9DC9Tvw4JRljnyGilaUiyfML+5DE5ese0nb
hR2EwixUhqi02nIUd01YjM/e+6b3fcNdVuG7F8lG8FEtc6IJnZ24cGOsPPiMMPvFZ9FL5ObrX1aU7HtF+f7MovcUvdZYU3+2pOVDTFy4AQDw/vqdTu4czXdnz162MjYACd57594O3Dp+Ib7215n6a0tEVSp6mTWeyQCLNuzEibdO8o7lKcUn//A6/jAxmIeiMOkqWAmC71fFLS8uwNSlmzFp4UbtebwsW3a14UsPTsOqLbt9L9xj01fiqkfell7Dy8v+3dOW084lyET/+t9mYpwbesi44O7XPQXI12kgjC0QbhYsn1/swrj2iXeCJyr430dnY8riFuX9x89bjwdfX+YpGnZe3LkUFqe9Y2+HV198HcizV7LzIrhuaDCKgx0PlVlyzsSFG/GFPxdyCuoWnskIu6vMR8/DMlHmqDwBnKmiV319ceWz33DzG3i6KlzSsgtf/ctM5PMUH7/zNZzym1e5xYtBmIgTFm7Eq++Hv9PiV2Jy1xrsM1AKqlLRyxoYAcGfJi/17eGZpxTvrt6O3760CBt37PWFxKkshUIKWj3shTDZQAJwGvvUJZsxZfEm/PTf7wVeiOASeXU0xN72nDaDIt8IH522ErdIfJKA38rlh+ximSZJnlo75ENeFWKtTV++BV96cLpbflDOKx+ZhV88N9+niDokvngV4ipd3qfKOns+j8uM5VsDZbB60K34lflqTTtjhjdHE80rI72HuOeu/8bqRWEAIPO8MfdhXmnRB4/xbjaWFE3V0TFXZ2HRoHotgkkqlHVcrnrT0cFj01cpz2Oo5B/YoyH02lJQlYpe9sAIcfxkPHyjm7p0s+8z1TPP5YJuIRnMsjSdhMlToGeXWgDAe2t3hPhonTL/+15hL9xcnnr32tOeD0Sd8PCy3/jPubhvsjxnDQ+/nF18hWX5uUVkC0106NYVPDdnre//P3Hy8ytMnVWMrswR3U2+6Bj3o7Ddk1jbyGnmHkTXDVXIZlJVRpE5mpb08vwNOPqXr+Atru2zs1vbc3hsejDdN9/hy57nXp9FH7ynbAKe1f01f38bI3403ieHCLOMeR/9Njc0ld2PcJ+FsaylkC1UNrfnfRZxklt16/7CQr3OoioVvayS3129Hcs3+2OY+Yaweuse6Wfi8zVNatZumqvXJU+p9xLJc14Hb/jW0kIa2BwXzra3Pae1TsThrwm8khPfVdUWc+Ix8zz1+oU+d0/8wPf/77jMgCu4UL+OvLnrRpVCAii85HtC6i3vWfQaRS/R9KYuDoZuP90osIypfDphVuTC9Tul1/gXT8lGbrnAeTyyCXgW6jqeM1xU33+3EHUDAK8s2OC7xiRkmcEWeAGF7yNre5GqmqrlrxWtzU6iIhX9zr3t+OpfZmD9dtWm3xSvvr8Rf+U2AwCC+bH5h/GBkCY4zHUTpkBYgw7bLNmTxedmCCqosHaWo4XwSt51I5Pz2ifewXtr5dsPrt6626sLPvaX/x6BeHgDH30UN0ox8COPjlwh5jlsYCWmveWjY7y5jxCL3otQ0nTyohwU8g6wrSPv61xfW9SCpnHPYxmXrz6p6uRHQeEj1cLnKhcdoA5tlK2dEMtp68grjRG2TkI2UhZLNtlWso17VrIds1Qy6qCa84vsm2NTkYr+X7PXYOLCjfjDJHkyf0opvvXY7NByeKtb7DRUDb6gQPVlM4XTGqIcGDlKvZcoTmPIc/7N1o48t2hMfv7yTbsx7ul3A8dPvHUSzrx9MgC/smv1uW6C95b9z9ehLBmUjiSmrPjOJezWYgoCX2pi93eYomeKRRc2KvfRB8/7+XPzPTcG4KR7BoC3V2z1XHfFxsx3rXNSGezkFX3INap5IQZrJypDSTbSFVNh7GnL4fqngm0TKKy8lbUPVQJCHXzHM801BKOGYsvkKJdCV1GRcfTshaoRlxm65PLUt+BFWQ73kDfv8k9KJeW6UW2WLEKpP71C0HVT+Fu1YpHdq7UjHxoddO/kDzBvTWAjMB+ZDABXt/HW
pfhCBS36YGcY1aLX+ehN4SORwmgTFBCRWfShrhvnt85tJ5PHxMXg5djhqsXE2Ngt2WGM0Vhfg11tOcGi15fJK2XdDlOq+WixnmXl7G5XhyR74coG7hUTi57/Pt/7xxzleVEUt86iLxcVadEzBa2a6MxTsyXHC7g9MDd/GC0f+N72nM+/J1JQ9IYWfb7go48SFujJw+3S09qRC8aRC+WFKS3A35H6XTdB2X3/0+DxXI7G3mgkKnVuZsP2XHC5ehw8H32Y68b9vjoFI8t1b9L/sVMIgafkwkatF/1pqrdOQwYbsYkpknUUa9HL2sBj01dh3prtXuihbPtLBp80TSQsSEBGm0QeqesmgpGSN3ymnUllKvo8U/Tyz3N5qtzhhufXXFih+EIU9p+UP7Hn567D4T99SVk2a+d7JdukycjTgiKk0A+hZe8Qr7jbOvKesqVUfo3JSIOvQn67t9DwSslooiOfl1pzKoqx57vUFrIrqtZDSO/Ju2s42dl3l215x+Plo9cpeslnUfLakAiR8XPcVMgqmGHxYQTXDa+oVXMLgLoOVBlLL/j962iocZ4bm3CVwTY2MVlNa2JYyOZTZPZjVNeN0kefWDrEaFSconcW3rihiwpl3p7LK+PXmRIII2pOG8BZScqyFLK3cU+bmXJ7ef4GXOcOHSkNWuB8A5E1Ov5l5V037DrxmjClBfjzvbS28xa9YDkJL1TBdeO3/qJGIsWloZZZ9PLFSCr4iAj+sqVuCJ4ubBIwUwaiPGu27TFqY7xFP6RPY/gFBjCDaWnLLnzm3jexccfeUDdS2MYjYejmL+prmaJXt03WkYgRdIDEANEtfffkMQyWiOi6KTYiKmkqStH/4rn5OOm2SYUwKMV57bm81H8//tqT0MfdgCKMOEOvc+6cgo/f+ZpPtrDhPuPv0/wxy+L9+XYja0T/ml3IgNjGTcbm3U7jir/5l16bWfSFGuZHJgFFr5gEE330UXLbFOOiZ535mbdPxvefUvtdReolicwAYKPbeYf5fE0UX1vO3x7eXS2PfhIp+OhJYkqEWbwbd7Zi1oqteOD1Zcpz69y6yYX46MPQKdZ61+Wms+h1UEqxbXeb17ZNfPTyNmmWj4hx1JBewsnA3RM+kJ9cJv2fyGQsIeQcAHfB2ZHgAUrpLUmUK1Jfk8H67Xu9xqKasFu9dU8gLh5wfM6m6UKj+HZv+udcHHdAXwCFFaxMNtOoGx5Kqdbilr1gbPPx+poMWjtyeNB9aakbdjnpfX+a5rC5gz1tOWziVk3yFr34crCX/7SD+2PzrjbvXNGi14UdipgokSOH9JK6J7rUFZo1s8ZNHB51NRnA/cq87Ky9MYWvlNmgzazfri9DxuRFLXju3XUA1Bt6xEF0oyzasBP9u8kX9NTVZNCWy/uUZxR3hndPTRtgI7E46zwAp16O+vnL3v8mbchkrkpXVq/G2sBuZc/PXacup0yWftEWPSEkC+AeAOcCGAngEkLIyGLLldGvWz068tSbOFXpbJmSB5xdaUx890Dhwb40fwO271FPur63djv+Pm1lYGKM3cXUR89DKQLREqu37nESL0E+2miozaJXYy2O3b8Pduzp8FxIjkUfPD/M2jnkx+N9//NKjo9Vf+StFd7k2UXNQ7Bf7y6F0YSwuKYtRzGsbyO+cdL+2nsDZiOhkw7sJz3OFAaPiW+UTeICfv+uqj2JmFj0bHFPFC5/aHrhHgmG7ok+7I6cPIc8UKgbn48+adeN66Nfu82svkXCosFkyNqZLo+RSC5PIy2C6oy1JDKScN0cC+ADSulSSmkbgMcBXJhAuQHY8uF1bsx71BzP2Qwxtuj5BzJ7ZTCvCYNfwMJYvGGnZw2bhlfyUBSy9PGcc6ezo5FMobTl8qjJENRlM1jFrQ6l0G/ZF4c2rvP64b/m4bb/OpPa2QxBhhBpeGUun0d7Lo/D9u2J7551cOg9VPXGv8yqF0zcTtC5Tn0vlnqCLy/OfEJnWGsdOXmysFhlSdqR
qmjmuvFZ9Em7btwO+lcvyHMvhSGK89Ss1aHXSBW9+/u+yUvQNO55tHbklHWez1PjfFZAZSv6wQD4LD+r3WM+CCFXEEJmEkJmtrQEd3syoZ87rGSLm6K+WLXZeK6bNRoLQ7YF3Fl3vOYdn2iYvZKHUiodvrJGqYp2yGYI6mszvgUw+XzyYY1i9MyGHU6nliWuopdEn7TnnMnY2izxWc5RmcP5tFXlrJBM1OlsgkZ34RBfXixF3wlzzap86CacdJB8BMRQ2/OFuuF99Cr3Zr3m+e7QhCSzqJu4vKhxmajYK3nPNu5sxaSFG/HHSY6ffXdrTqmgc5Qq1/NIz69gRS8PaRUPUHo/pbSZUtrcv3//WDdilteW3Y7rps0wvQAjmyGRskkyfvZv9fZxm0L8tnGg8EfRMNgko+wFa+twJqDrBCu3XfCrJoFY70wpZrPOiIndTlwZ296Rj9TZyvjUPW94f6sU/Q3njQgc09kEDW698nUXp3NMasN4HtGSbHfdK4N7dYlcltg2RNo68sp6YspbtOh7NdbizEMG+s5t0ES2ietVeKJYxowDB3Tz/l6rSImi45nZwW0cAeArf5nhSwyoerSUwtgdzMoqB0ko+tUAhnD/7wdgreLcomANgfmfo1pdNRmCrGHvyyt6Xfw3vyFIUuQpla5oZK4rmSemI08di16wimTb5hWLqOiZUqzJEBBu0wXe/fKtx2Zj04dtRiuWTVEp+sMH9wwcG9JbHZLIOlBe0USJ+WeUwnUjPjonCyhw3AF9cM1pwyOVFeZLnrF8q3LxVGO9M8HNd4AdeYqudTXo180fySabI2FsEtMic8Rpp18Z2xT5GlNYp6ZLjkdptA6qki36GQAOIoTsTwipA3AxgH8nUG4A0RJ8fEZ4Xmjf9dkIk7GGL+3Tb8stgmLY257Hmm1B9wMb0agaHXPd8HRwK26TojWnsOgJQdb10e9tz+Hk30zyztm8qw1tuTzqEtx4oV6huGSdybVnHhRQSIyu9Y6i5yWLMwoqhUUvGjM/+898rNm2BxlC0LdrtJS3tQYusz9MkocFNrqdYUeeYndbB/a0OYnzshkSWM8iGhtAYcONTRqLPo4SHLFPj2B4Y0Iwec6/e4oykoZCPRl7zqH7BMssU3hl0YqeUtoB4JsA/gtgAYAnKaXvFVuuDFFJR3Xd1GYy0j07ZYTlgYkrgymyCSnPT6pR9D0aan3HOvL5RKwIvtomLvDPO7CIHDbZnadU6noCkk3TqrLoZZ35jr0dOO/wQdLzmauBr6U4dWZqHIiuDh2qUStBdFdHmOtGB5vHmLBgA0b++L847levIEfdZy68UzKLvnej08mqMs4C5tFNPF1qs/jXNWMjX2cC6+w3fdhWWAgpoHPd/ODcoAvRZBFXKUjkraOUvkAp/RildDil9OYkypRRjG+XXV9sGeXE8aGqw+tqMiRgtXbkaCI+el5JLBZSOjOyGQJCCHJ5dXwysyonf//UomVSdRqyZ3zovj2Uzz6Kj1VkYI966R6+Ok4fMcC4fFU9ZtyJbxWXPvBW4FhdTfzv2cVV9G8ucTYp2bHXseozJFjf/DoGxqCezs5K63eoFb0u6EFFhHlQj2yG4IHLmotalMegkI8gAWD/fl0DxyrZddNpRJndll8vV/SmaRFMSKLxMESl3daR90XUiGQzBN0Fi37d9r14d7U+54kJJpa405E6E4iqOHhWjs5nborKohf35bzh3BHYt1cX5aKpYkYZ+/RowENfPgaAeQK7KB3LLoWiJ0QfXvzGB5sDx1Tf85XvnhwqR1eJ8l7a8iFqMplAm+9eHzx3n57xt9DT7cpkOkLnyVOKM0cOxH69o09oi1BKI0WRvbV0C+avNfMWJElFKXoTPX/lKeoJqoxC0esmj6KS5HzcPj0bvEVhZ40ciNaOHI7QJFJTdWTfecI8DYAKo9w4mQwyhGDzrjZlylfmo1flKYoC/4LxL6044c46P9UtTXzXKiiAbq5i2ymJ35cRxeUy
Q9gsh+Eo+uDxg7goFBGVoj9wQHfc+fmjtHIwi55nxZbdzjslKNtuEkU/qGd8pXr1qep3Ok46a/aORl2HIy0L+nBSGbL9n0tNRSn6MIt+8vdPDVXaMmsqSYs+Sfbp0YCnrjoBXzhuKHo01IbOB2QzBIfv50Sc3Hvp6EQjEkzy1GQyhZdHlcMlUR89V9aXT2gCABw9rHfgGXepc87r0cU/2pGVEwfmvzbdTSxK5NH1ks1hAEfByfTU8W4qDpFshmjr/lOjBqN7gzojisxqzeUpspmg66abpJxvnHyAsuwwdBazib1w8TFDpMfZpUOLSBJHqb79yAIAZJ1mqakoRR/mXx/Wt2uoIpFb9GYVH7XnjguLrunZpQ6jh/bGrz59OBpqM6GKJJshGN6/GxbffC7OPXxQUb7nODCLXnuOgZIzHQrzzzJPKd7/5Tl44orjA8+YRYH0LIGip1RuweqoFeS79syDIt+XQG7NquZjsoTgyP2CYac8uvdA9VyzJBh1I9bHiH26Y19udBoVXTsO0wn/M2owfvrJQ6WfsfqL817z1SG6Cnlm/vAsfOv0A33HymFYVpSiN1FcYQ9NNmyuN6x48RU6UDNMBoBhfQuWgsxvqYJ1VswSBRzlJ4tk2adHwffJGj27PsmYdRMc605/ziCJr1asG1O/KwuLBJx48/qaLGqymcAzZqO8Hl3kz6DWnaRs1FhaP/2EOn1T14iKnn8uy285H5ceNyzS9YCjeKNskJHNEJx7+CBMuO4UZZm6d0cV/SNz3fQQLHpKHaUq6xBN3mldQroww+L2zx+Fhtos7vnCaOU5I/ftESqDyCNfO877uy5kRa84wtG1s1JRUYreJGJGDC8MlhH8yl1MffTCO/T5Zv+QcIwwbL7ylOHe4p1Bvcwno5gfm+/562uyUov+S2MKSkJ0bSVl0X/v7I95f+saadbAoh81NBjz3CCUaeo6bayrwddPdBKk8WsLBnRvwD+vPgF93ZTUzKJv6huMggAKHaOu7Xx5rDoRm2yiUofYEeksQhWERNvb9FBXmQ3vrzZOeIv+lI/5V6+rFH2NJI5e1fHtL7n3xwZ2V8rD0LUH9tlFR++nLeP8I4KhtazYY/fvEyoDz31fHO0bdYaNQLvV+9uVVfQhGCl6xfCcIXupTF03LBNIj4Ya3PfF0Rg+wK84zjnMv0CiLpuJNdHLLD5e0ZtEmIh5vKNGKb0x7nT87qIjA8e/eXrBtaDL5y976Xl+dMFI6aSc6DoxVXsZUqgX0ZIdNbR3YGg+amhvPPaN45X3FyOWTKCgaKjNRHJL1ArPpVdjHf546ehIQ/ooFn1jXRYPupFBOniL/v7LjvZ9ppofyhDJZKxo0bvvjdh5mKKbcGU64ZLjhhqXJ7aB2kwm4F7RMWpob9/zDlX0Qn2Ihk1nUFmK3sDUU/lhGTILwljRu+9Qn651OOewQQFLrq8w8dK9oUa6SjAMFmvLNwjVsJofoby90h9GGXVBzeBeXUKv0U6MZYhW4YkdIUO8p/hiqyxePpZc5rFgVj7/DMYMD05WHryP0yZko40wmFsiSni0rI7PO3yQNoxQhumWd6d8rH/oewEA73Fhf2K7VaWEYCG1PKoOc5RkBatJtelaJHv+UeZZvDbgFpzJkEhx0RlCfG007N6ia7LR+uj1mITkiX5YcdWcbFWiqdXNGiV7qcXZc3FJeveGWhw00Bmu9mo029kKKCh6v+vGfBWoyWcqVKOmSd87FROuO0XbcdVI/LW+soXPnr1mLJ644vjABLpYgm6CvcldlCKLiWabXIQ932P374OJ152CSw2tQp0v20ThqDquKNF+9bXB+HVAvVdBsagselkKhK7Ce8H6npM/1h/fO/tj+J/RgeS2WnT14in6IgIlsploexSzdNyMsIVookXf2XNnQIUpehNEK1vMg3HggG6Y/aOzYpUtbhguKiCZRX/DuYfggcuaA/57kaOH9fb+ZlaZietG1/nFWQUsdg5sMc3+/bpieP9u
WiWXkURg+D4XLj1ySC8cd0DfYIck/KtT9J8ZPRiPfuM4fHpUUHmwDlPsnG77zBE4pqlQ3zWZDA7o3w2EEJw1ciC+dqLaHw8Ab91whjcZJxrQFzXrfcXsfjLE5/WsZml/Q01WOh8ic92YGjKylZyqz5gyz2aC25WrfPTZDME3Tz8It3/uKNx9ySgA+i362OStXtE7v+OE7bJiM4pQVRVOOu7C/3XZQvv69hnBCKqoUVmloOoUfR9F4iqeXo3+oaVpvhrWJFnbFIfaoo+1R0Mt6moyOHPkwECyMcYdnz8Sk79/qi8qwLPoNa4bFp+rs9rjLAjhXUHTbzoDBw7wu7p0llMNZ+kc29QnoDBV1r5YpqlFz9wmJwzvpw01FOv+c8cMwT+uPKEgF1eHf76sGT+6YKS2Xrs31HhtSFRTuhEP+54q95jou9d11DIrvTZLpEvsD91XH1bJ+MWFh0mPP/qN4/AtQYH1dudqspySvGzMMDx7zVhv3wgdYTZINkPQ1M+JWjOJuokzoc3fy2SrSe+eGQgWfeG5XXJscFRoFX0RsMr9vWsZMHo01GLuT88GoJ5NF5XCQa7f/qyRA/GHL4ySXQKgoODZ737d6jH1htO9hi6+wPwCFJUC+PSo/TCsb1c0uqGCGVKIcGhQWPQvfedkL1+KzsJetGGn8jMVzFKrzRIM6B6MFDpLk5CL5aMHHCtsiOBOUSmuMGtMlfEybHvAgkUffRGdTsnqOlBVhw4Uoi1Mc/To7tOlNhNox2+OO8Pnoz/poH54+qoxxi4pVQd0wvB+AZlZWG9PzmhqqM3iyCG9AvUpe0qs0zfJz04I8KXjh0ndc2wEGWcujNVfNqpFn/Gfz7+bsjrk9cCL3z4pspxJULGKvsGtXNkL2b2hFotvPtebXT9u/z6BDXwZC35+jmeJD+3TiPOFDIfnHR6cQOSHm4N6dvEatm6FYJiyyXK+RpmPnh8e9mioxWVjmvDVsfsrJzgB4LIxTTh9xAD0bjSPJmGWmmol7NdP2h8nKvZq5V+YDCH40pgm3HVxYWm9yq0jWmOiAlONIsImQJnSC/NRZyUvp67zyXAvuuh60D1nNgmnGi2I9SC2p6F9GnHqwU7kSkNtNmCD1tVkfK6buy4ehaOH9TFOExDFKh7mhqp2qc3iqCG90K9bPY5p6iOVWwa/B+3BkgAJPnkfIQS/+NRh+MZJwdW17KupUlCbEBZEMKhnAx6/4njvuYkJ5fh5GXFUBvhdWYcMih6znwQVq+izXKXL4HcyeuL/jcH0m86Unldfk8H5RwzCxw8diG+dflDgpfjjpUdj2o1n+I6pFAzve92nR4NPWYTlt+hSm8Woob3w+0tGe9sI8pESvALp0aUGhw3uiR9/YiQaNTHcI/ftgYe+fEykieC+mvBJwHnpVLsb8SlrM+7S+AuPKvjOVa4bUak29W2Ufv4zYYVj2N6p+7qhnGETpDLFy1tmsrBH5p4QQwZ1liVzxalWr4qTdMEVvhnvvl3qsgErNJshPos+6hxNlHBcZqXmqZN2YeYPz8RZI53Rnvg8Zc+JKfr2fB5PXjlGOx/BvoXM+mcdGyEEw/t3jRS5xMrNClE0PCMH9cBTV53gSy2h8un36VonteiTTPsRl/I7j2LiDbsyBE9dOQartgY36jAh46YN+NOXmpXnBIei6tWHA3vUY8OOVtxzqX8l3qdHDcaarXvw+gebpOlYMxmCf17tb+z8Ah/eqo26hDpK9E3vEEUvysLDRyPIfJ4mrpumvo0YuW8P396wtV6cuxDOGrLxxpNXjsHslVtDo7VkcrE6e/aasdKFVgN7NOCNcaf7ViYDelfEmOF9saRll/L5iQpRJtcPzhmB+poMzjlsH0wQ9gXIEr+PPqrfOkrHcO5h++BvU5fjCxKftEl/wTrf9g6Knl1qcZhkVzCmTNlvsX/84fmH+IyY8dc6gQMH3fRiuAAcuu/9maP38wybP146Gve/thS1Wb9FP6xvI4b2acSv/+fw
WNshdgYVq+jZs8lmgOamPmhuira6LQrMEjugf1csbdmlzQc/tE8jNuxoRauQsnZIn0bc+tkjcPYdk43vy/s/eYs+asa+2z57BD79xzeln/VoqMEOLuuiifWh9DFzUTcyEVWjL74jchZUyV0Y7HePhhpMv+nMUJfM4F5djPZWlVmy7NiAHvW+5yCWL6JKQTC4Vxf85BOH4nPNQzCkTyP6dK0LjPJES18cARHiBADc/OnDnf+Fe7BNX/j/dYw7dwTWckaH+Fwf+nKzb9vKh79yDL7y8AwAwAH9u2Hpr8+Xlmuyt4bnunFP1snKjAbWEX6+eQi+f87BgUnfqJaz52bMBC30Mw8ZiFcWbMBobm3F2Yfug7PdXaN4cbvW1+C1608DUL5882FUrKJnzTxOmtKoNNbV4B9XjkGfrnU443eTA5YFe7eyGYKLjxmKGcu3KoeQJpEwD1zWjOWbd/mOFRMnPGpob+VnU/JPPhMAABCKSURBVH5wOo78mT/18W8+e4Q3QS2jVhE37OQ9UcuhepnPPXwfvDR/g/e/aNmyUQa/E1QSseEMmVjMMou6SbjK4Jhy/WnIZAiO2M9RHBOvOwU79vhXMov3EvsfcZQkc93wZcj8xTxiSm/RGj19hH/i/bSDCxum6Cad+3Stw+BeXXDp8UNx2/j3peNf1p75iLeHv3yMl+Kav6Zg0bsr07vUGEX2mJIlwaib4w/ogz996Whlm1XpHdX5g3o2aNNPlJqiFD0h5DcAPgGgDcASAF+hlBa/y4UBnkVfhKI3WS3IOKapj7dx8tmH+l+AgwZ2x4J1O5DNEHzm6P1w1qEDQ3Pu6DhzZDCyxUTRm1ivPN0baqTJ1i5qlqd19WTRWE4swZPMhaEy2j49aj/k88B1/5gDQoLX3vG5I/HsO2u9NRFJ78Ete2mH9G7E6q17IufNr80SHD2sN2at2Oo7LpbTq7EuMHci7u0b5jMX5c4Qf91Elj2Cj167aKwmgzfGnY4PNn6I28a/Lw27qfcs+sKHp40YgKlLghumFHz0zu8k8sg75Rbmk8QiM0S/G12YDOKk69QbzlCc2TkUa9G/DOAGSmkHIeRWADcA+EHxYoXD6jnu1oAvf+dkDOgRbdebrvU1mH7TGegjvKCPfO1YzF+3wyg5VlxMwsei7p1JEG8DEN0QmWXclEXt6EZffCZKUZH37VaPr564P7a6rg6TkLxiuefS0XhtUUvkzjOTIbHTWYuum4BFL1SfGG5ICDHet1ZGFP+yyQpgFu11mmTrRBZFJiZLY+8z/zVEiz7pUXxWkjcoTK/oPh5/7UlFbbRSCoqaDqaUvuRuDg4AbwEIXxaYEF5vHPOhHzSweySLnjGge0MgOqJvt3qcdJBZwqa4jZQpD12oZNRcKXHRKvpa+QscBl8vKmXFdoLqDEXfp2sdPiVZbRtGTYZ46av3jbh9nui6yRKCIX3UCuPQfXsGJoOL2S0tyqS9STvu260e0248Azeed0jgMy/qRnRXccXy4ZUAcMkxQ3HSQf3w1RObjOXUwRuLAYs+VNGrPx+xT49YuqWUJBn381UA0aa7i6BYi75cxJWWfc+jh8WbdL7q1OFSCzAOLJrjk0fu68vLDRR851F927wkusluIDx+3pQR+4SnyI1KNkNw43mH4N5LR2PKD06PdK04kZfNEEz47ql46MvqiLArT/HHlt/+Of2WgDpKkYNlYI8G6TuqckXK2iQ70rtrHf7va8dJF/IVA5H46MPUSidMDSZK6JMlhLxCCJkn+bmQO+cmAB0A/q4p5wpCyExCyMyWlpaiBfdiYMsfohqJuA1k315d8ODlzb4FSFH4wTkjvPCzYmEvac8utThyiD8sLq5F74VlSnz0jPqaDM4aORAPXKZWfFF46qoTMMWNlkiKbIagoTaLcw8fFLlTF+sskyGoq8loJ54vd7dQZAyM6I7kMXHdfPF483TAOpS5myQilCrggg/RZu4lNgoLm/tLap6gswhVk5TSMymlh0l+
ngUAQsjlAC4AcCnVrGChlN5PKW2mlDb37x8vLzUAvPCtkzDhulO8h1RpFV4MZxwyMPJuRjxJuTyY66Y9lw9Ya0wpqdLaqiisNFVb7IQQ/PmyZpwcM6+5SLf6GgwpYr9QGbz7I2rTPFgYYfArMZ3yJNYuIThXszpahsq/bjIZ+8tPHY7lt8jDKpOQQWb9J/WGd63LSl0qWeK425bfcj5Gu8kFi3HdpJFio27OgTP5egqlNN6KpYiwbb9Ym6y0Ci+nuN3ra/CJI/fFiQf2xQ+enhtbFvaStkkUPVtzENeiBzrHB18q+KRwUS3RP3xhNOat2Y6r//42tu9p9+okrI3f84XRvnmNKdef5ouPF5l+0xnY2x58Pp3pBtXtMcA4/4hBeG/tDgzoEW3u6bNH74enZq0OHH/nJ2f7/md34qtX7FxVVJjaKTrq5g8A6gG87DbqtyilVxYtlQHMp1Z5PvryyUsIwe8vGYUVQox+VNi+r/v16hIIAewS10dPCr/DUhukmWLDfcce2A/PXH0Cxs9b742OwtyTmQxBhmtXQ/o0akcqqpQYxWSAjIqqE+QPX33qgfji8cMiR7H99qIjpYo+sO+BRATWUYcZGx8pRU8pNd9/K2H45FmVRBrELaQpiMcJB/bDX796LE4Y3tfzqV5xsjMpWF9TcOtEgX/xTVZWphVZgrSoDO/fDdecVni1OmNRYGfeh+ewwf54c/F9LkWosgiv01lnF7bCtdL0TsWujC1MxlZWhZdC2guP2jeSBR0ndl6ET+a17NfneX+rwuZCZeJEqmTXTVIbsvMU2zGnlTfGnY5egs88yfd5ZEimSLlF7xxUJZ5jWEXfSRRmzMssSFRK0EDuulidQ1+G+C4N7dOIsYrUwybwlmCPhlr8z6jBuLSI6IxKVvSlUADseVWYbglFthgtKT0//cYzAlv4qeCTFLKOOhcyIq0w+7KSFX25JYhHGsQW/civJRhimMkQ3P75+LHcQHJx8uVAtOh/dMFIHKfYAMeUcs7rdDZJuY9MVr3LM6wGUzNIr60wBVS5it79XWnGH2sfV586XLmBR+llUIfrpYG0TsY++o3j0LKzVXuO6KMP2382bfzuoiMxYlDyC8lMKYdLhG9uNcY++lJKlDwVq+hZg6g064+1j9NHDChpamWtDMwVUJa76yEgiOje7zROGB7eMRcTdRNGZ+jAzxzdaVlMpHSmAh17YD/MXbMd/bjUIdXqo680D7eHt8AmZN/QtMGs6HJKnVKD2eMzo6PnmEkLpQgOqLQ2XgydqUC///GDMeX603xzBbVM0YdYG5Wm6CvWoi9sRlBmQSyh/OuasdiyS+/y4LnwqMG48KjB+OOrH8Ta9LmclCLqxkvulcoxWLIkERFmSjZDAusN2P1zITG+FabnK1fRnzZiAN7fsDN0j9O0UalzC8XA8shH5epTy7ZMIzalDPetNOUSh3L7vlmuoLAtNSvtWVSsov/+xw/G5ScMi5xTvtwUcrqUT9P36VqH5mG98a0zDiqbDCLV0vGldYK7Uii3S+TzzUPQpTaLTxy5r/a8cssZlYr10WczpGTJ/f+bUJZHGWxZezkVQjZD8NRVJySWHMzi7MtbirTHQHnnczqbcivQTIbgU6MGG2w8UlmKvmIt+lIiZhFMkt9edCQefmM5moep93G1VB6fax6Cz4VswVgslaVa4lFu140plSInwyp6BT/5xEhMWbwp8XIH9mjAuHNHJF5utVBhhlKnkNZ1BaWgUizlSnPRWUWv4Ctj98dXxlbWYpdq4COk04zp7WabPHy/niFnVj6VougrDavoLZaU09SvK569ZiwOCUnSVQ0Y7H1iiYFV9JZUYQ06OUfGDFGtNKxFXxps/2lJBdZjYwGsoi8VVtFbLJbUYPV8abCK3mKxpAZr0ZeGRBQ9IeR7hBBKCClP3l2LxVIVVNqOcZVC0YqeEDIEwFkAVhYvjsVi+Shj9XxpSMKivwPA9bDzaRaLpUgqbSFSpVCUoieE
fBLAGkrpnITksXxEqXV39ulaZyN+LZVj2VeKqyn0rSKEvAJgH8lHNwG4EcDZJjcihFwB4AoAGDo0/sbRlupkzAF98f2PH4wvHGvbxkedH10wEmMP7FtuMUJ57BvHY0if0iRWTBoSN48GIeRwABMA7HYP7QdgLYBjKaXrddc2NzfTmTNnxrqvxWKxfFQhhMyilDZHvS72OJlSOhfAAE6A5QCaKaXJZwKzWCwWS2xsHL3FYrFUOYnNfFFKm5Iqy2KxWCzJYS16i8ViqXKsordYLJYqxyp6i8ViqXKsordYLJYqxyp6i8ViqXJiL5gq6qaEtABYEfPyfgDSHKtv5YtPmmUDrHzFkGbZgMqRbxiltH/Ui8ui6IuBEDIzzsqwzsLKF580ywZY+YohzbIB1S+fdd1YLBZLlWMVvcVisVQ5lajo7y+3ACFY+eKTZtkAK18xpFk2oMrlqzgfvcVisViiUYkWvcVisVgiUFGKnhByDiHkfULIB4SQcWWS4SFCyEZCyDzuWB9CyMuEkMXu797ucUIIuduV911CyOgSyzaEEDKJELKAEPIeIeTbKZOvgRAynRAyx5XvZ+7x/Qkh01z5niCE1LnH693/P3A/byqlfO49s4SQ2YSQ51Io23JCyFxCyDuEkJnusVQ8W/eevQghTxFCFrptcEwa5COEHOzWGfvZQQi5Ng2ycTJ+x30n5hFCHnPfleTaHqW0In4AZAEsAXAAgDoAcwCMLIMcJwMYDWAed+w2AOPcv8cBuNX9+zwALwIgAI4HMK3Esg0CMNr9uzuARQBGpkg+AqCb+3ctgGnufZ8EcLF7/D4AV7l/Xw3gPvfviwE80QnP97sAHgXwnPt/mmRbDqCfcCwVz9a9518BfN39uw5ArzTJ5943C2A9gGFpkQ3AYADLAHTh2tyXk2x7Ja/YBCtjDID/cv/fAOCGMsnSBL+ifx/AIPfvQQDed//+E4BLZOd1kpzPAjgrjfIBaATwNoDj4CwEqRGfM4D/Ahjj/l3jnkdKKNN+cHZNOx3Ac+6LngrZ3PssR1DRp+LZAujhKiuSRvm4+5wN4I00yQZH0a8C0MdtS88B+HiSba+SXDesMhir3WNpYCCldB0AuL/Zzltlk9kdzo2CYzWnRj7XNfIOgI0AXoYzSttGKe2QyODJ536+HUApNxO9E8D1APLu/31TJBsAUAAvEUJmEWcPZiA9z/YAAC0AHnZdXw8QQrqmSD7GxQAec/9OhWyU0jUAfgtgJYB1cNrSLCTY9ipJ0cu2W097yFBZZCaEdAPwNIBrKaU7dKdKjpVUPkppjlJ6FBzr+VgAh2hk6DT5CCEXANhIKZ3FH9bcvxzPdiyldDSAcwFcQwg5WXNuZ8tXA8eleS+ldBSAXXDcISo6vf5cH/cnAfwj7FTJsZLJ5s4NXAhgfwD7AugK5xmrZIgsXyUp+tUAhnD/s83I08AGQsggAHB/b3SPd7rMhJBaOEr+75TSZ9ImH4NSug3Aq3B8oL0IIWy3M14GTz73854AtpRIpLEAPkmcvY8fh+O+uTMlsgEAKKVr3d8bAfwTTkeZlme7GsBqSuk09/+n4Cj+tMgHOMrzbUrpBvf/tMh2JoBllNIWSmk7gGcAnIAE214lKfoZAA5yZ6Lr4AzB/l1mmRj/BnC5+/flcHzj7Phl7iz+8QC2s6FiKSCEEAAPAlhAKb09hfL1J4T0cv/uAqeBLwAwCcBnFfIxuT8LYCJ1HZNJQym9gVK6H3W2xLzYvdelaZANAAghXQkh3dnfcHzN85CSZ0spXQ9gFSHkYPfQGQDmp0U+l0tQcNswGdIg20oAxxNCGt13mNVdcm2v1JMfCU9anAcnkmQJgJvKJMNjcPxo7XB61q/B8Y9NALDY/d3HPZcAuMeVdy6A5hLLdiKcIdy7AN5xf85LkXxHAJjtyjcPwI/d4wcAmA7gAzjD6nr3eIP7/wfu5wd00jM+FYWom1TI5soxx/15j7X/tDxb955HAZjpPt9/Aeid
FvngTP5vBtCTO5YK2dx7/gzAQve9+D8A9Um2Pbsy1mKxWKqcSnLdWCwWiyUGVtFbLBZLlWMVvcVisVQ5VtFbLBZLlWMVvcVisVQ5VtFbLBZLlWMVvcVisVQ5VtFbLBZLlfP/Abj6Bvzx9c81AAAAAElFTkSuQmCC\n", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "pd.Series(y_pred - y_test).plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# (10) Misc tips and tricks" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Performance" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There might exists built in methods which are optimal for your use case. Finding the maximum is an $\\mathcal{O}(n)$ operation, while sorting is $\\mathcal{O}(n \\log n)$. Finding the top $k$ rows can be done in $\\mathcal{O}(k + n \\log k)$ time." + ] + }, + { + "cell_type": "code", + "execution_count": 151, + "metadata": {}, + "outputs": [], + "source": [ + "# Create a series with a million entries\n", + "ser = pd.Series(np.random.randn(1000000))" + ] + }, + { + "cell_type": "code", + "execution_count": 155, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 155, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(ser.nlargest(50) == ser.sort_values(ascending=False).head(50)).all()" + ] + }, + { + "cell_type": "code", + "execution_count": 161, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "4.53 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "ser.max()" + ] + }, + { + "cell_type": "code", + "execution_count": 152, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "18.8 ms ± 873 µs per loop (mean ± std. dev. 
of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "ser.nlargest(50)" + ] + }, + { + "cell_type": "code", + "execution_count": 153, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "168 ms ± 6.99 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "ser.sort_values(ascending=False).head(50)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "-------------" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It's generally better to apply filters *before* computations, especially if the computations are expensive." + ] + }, + { + "cell_type": "code", + "execution_count": 162, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "57 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "df.mean().imdb_score" + ] + }, + { + "cell_type": "code", + "execution_count": 163, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "39.8 µs ± 1.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "df.imdb_score.mean()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "-----------------" + ] + }, + { + "cell_type": "code", + "execution_count": 302, + "metadata": {}, + "outputs": [], + "source": [ + "df = df.sort_values('gross').dropna(how='any')\n", + "temp = df.set_index('director_name').sort_index()" + ] + }, + { + "cell_type": "code", + "execution_count": 303, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "495 µs ± 7.89 µs per loop (mean ± std. dev. 
of 7 runs, 1000 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "temp.loc[lambda df: df.index > 'o', :]" + ] + }, + { + "cell_type": "code", + "execution_count": 304, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "628 µs ± 20.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "df.loc[lambda df: df.director_name > 'o', :]" + ] + }, + { + "cell_type": "code", + "execution_count": 305, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "129 µs ± 1.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "temp.loc[slice('o', None), :]" + ] + }, + { + "cell_type": "code", + "execution_count": 306, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 306, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "temp.index.dropna().is_monotonic_increasing" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "------------------------" + ] + }, + { + "cell_type": "code", + "execution_count": 387, + "metadata": {}, + "outputs": [], + "source": [ + "import math\n", + "\n", + "temp = df.gross.dropna()" + ] + }, + { + "cell_type": "code", + "execution_count": 388, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "923 µs ± 4.03 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "temp.map(math.log)" + ] + }, + { + "cell_type": "code", + "execution_count": 389, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "141 µs ± 1.2 µs per loop (mean ± std. dev. 
of 7 runs, 10000 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "temp.apply(np.log)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "------------------" + ] + }, + { + "cell_type": "code", + "execution_count": 401, + "metadata": {}, + "outputs": [], + "source": [ + "import statistics\n", + "\n", + "vector = np.arange(1000000)" + ] + }, + { + "cell_type": "code", + "execution_count": 402, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1.88 ms ± 85.3 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "np.mean(vector[1:] - vector[:-1])" + ] + }, + { + "cell_type": "code", + "execution_count": 403, + "metadata": {}, + "outputs": [], + "source": [ + "vector = list(vector)" + ] + }, + { + "cell_type": "code", + "execution_count": 404, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "649 ms ± 7.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n" + ] + } + ], + "source": [ + "%%timeit\n", + "\n", + "statistics.mean((i - j for (i, j) in zip(vector[1:], vector[:-1])))\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.6" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}