Initial commit to fork

master
Taylor Smith 2018-05-31 15:45:00 -05:00
parent 95b7eb85a6
commit 13781004a7
84 changed files with 9789 additions and 0 deletions

7
.coveragerc 100644
@@ -0,0 +1,7 @@
[run]
source = packtml
include = */packtml/*
omit =
    */packtml/setup.py
    */packtml/utils/plotting.py
    */setup.py

119
.gitignore vendored 100644
@@ -0,0 +1,119 @@
# scratch code
scratch/
# Any data unpackaged by tensorflow
MNIST_data/
# In-progress word docs
~$*.doc*
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# Ignore PyCharm stuff...
.idea/
# Mac stuff
.DS_Store
# C extensions
*.so
# Testing
.pytest_cache/
# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
# Translations
*.mo
*.pot
# Django stuff:
*.log
local_settings.py
# Flask stuff:
instance/
.webassets-cache
# Scrapy stuff:
.scrapy
# Sphinx documentation
docs/_build/
# PyBuilder
target/
# Jupyter Notebook
.ipynb_checkpoints
# pyenv
.python-version
# celery beat schedule file
celerybeat-schedule
# SageMath parsed files
*.sage.py
# dotenv
.env
# virtualenv
.venv
venv/
ENV/
# Spyder project settings
.spyderproject
.spyproject
# Rope project settings
.ropeproject
# mkdocs documentation
/site
# mypy
.mypy_cache/

28
.travis.yml 100644
@@ -0,0 +1,28 @@
language: python
sudo: required

cache:
  apt: true
  directories:
    - $HOME/.cache/pip
    - $HOME/.ccache

before_install:
  - source build_tools/travis/before_install.sh

env:
  global:
    - TEST_DIR=/tmp/packtml

matrix:
  include:
    - os: linux
      dist: trusty
      env: PYTHON_VERSION="3.6"
    - os: linux
      dist: trusty
      env: PYTHON_VERSION="2.7"

install: source build_tools/travis/install.sh
before_script: bash build_tools/travis/before_script.sh
script: bash build_tools/travis/test_script.sh

1
MANIFEST.in 100644
@@ -0,0 +1 @@
recursive-include packtml *

140
README.md
@@ -1,2 +1,142 @@
# Hands-on-Supervised-Machine-Learning-with-Python
The code repository for *Hands-on Supervised Machine Learning with Python*, published by Packt.
### Learn the underpinnings of many supervised learning algorithms, and develop rich Python coding practices in the process.
*Supervised learning—help teach a machine to think for itself!*
## Overview
These days machine learning is everywhere, and it's here to stay. Understanding the core principles that drive how a machine “learns” is a critical skill for any would-be practitioner or consumer alike. This course will introduce you to supervised machine learning, guiding you through the implementation and nuances of many popular machine learning algorithms while facilitating a deep understanding along the way.
In this course, we'll cover parametric models such as linear and logistic regression, non-parametric methods such as decision trees and boosting, various clustering techniques, and we'll wrap up with a brief foray into neural networks.
This video course highlights clean coding techniques, object-oriented class design, and general best practices in machine learning.
## Target audience
This course is designed for those who would like to understand supervised machine learning algorithms at a deeper level. If you're interested in understanding how and why an algorithm works rather than simply how to call its API, this course might be for you. Intermediate Python knowledge and at least an intermediate understanding of mathematical concepts are assumed. While notions in this course will be broken down into bits as granular as absolutely possible, terms and ideas such as “matrix transposition,” “gradient,” “dot product,” and “time complexity” are assumed to be understood without further explanation.
## What you will learn
* Understand the fundamental and theoretical differences between parametric and non-parametric models, and why you might opt for one over the other.
* Discover how a machine can learn a concept and generalize its understanding to new data.
* Implement and grok several well-known supervised learning algorithms from scratch; build out your GitHub portfolio and show off what you're capable of!
* Learn about model families like recommender systems, which are immediately applicable in domains such as e-commerce and marketing.
* Become a much stronger Python developer.
### Project layout
All **[source code](packtml/)** is within the `packtml` folder, which serves as the Python
package for this course. Within the [examples](examples/) directory, you'll find a
number of short Python scripts that serve to demonstrate how various classes in the `packtml`
submodules work. Each respective folder inside the `examples/` directory corresponds to a
submodule inside the `packtml` Python package.
### Getting started
To get your environment set up, make sure you have Anaconda installed and on your path.
Then simply run the following:
```bash
$ conda env create -f environment.yml
```
To activate your environment on Unix platforms:
```bash
$ source activate packt-sml
```
In a Windows environment:
```
activate packt-sml
```
### Set up the Python package (in your activated environment):
```bash
(packt-sml) $ python setup.py install
```
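
Once that completes, you can sanity-check the install from a Python shell; the version string is read from [packtml/VERSION](packtml/VERSION):

```python
import packtml
print(packtml.__version__)  # e.g. "1.0.3"
```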
## What you'll learn
In this course and within this package, you'll learn to implement a number of
commonly-used supervised learning algorithms, and when best to use one type of
model over another. Below you'll find in-action examples of the various algorithms
we implement within this package.
### Regression
The classic introduction to machine learning: not only will we learn about linear regression,
we'll also code one from scratch so you really understand what's happening
[under the hood](packtml/regression/simple_regression.py). Then we'll
[apply one in practice](examples/regression/example_linear_regression.py) so you can see
how you might use it.
<img src="img/regression/example_linear_regression.png" alt="KNN" width="50%"/>
Next, we'll dive into logistic regression, which is linear regression's classification cousin. See
the full logistic regression example [here](examples/regression/example_logistic_regression.py)
or the algorithm's [source code](packtml/regression/simple_logistic.py) if you're interested.
<img src="img/regression/example_logistic_regression.png" alt="KNN" width="50%"/>
### KNN clustering
During our exploration of non-parametric models, we'll also touch on clustering.
The `packtml` package implements a simple but effective k-Nearest Neighbor classifier.
Here is its output on the iris dataset. For the full code example, head to the
[examples directory](examples/clustering/example_knn_classifier.py) and then to the
[source code](packtml/clustering/knn.py) to see how it's implemented.
<img src="img/clustering/example_knn_classifier.png" alt="KNN" width="50%"/>
### Decision trees
In this course, we'll also implement a CART decision tree from scratch (for both
regression and classification). Our classification tree's performance and potential
are shown at varying tree depths in the images below. The classification tree example
is located [here](examples/decision_tree/example_classification_decision_tree.py), and
the source code can be found [here](packtml/decision_tree/cart.py).
<img src="img/decision_tree/example_classification_decision_tree.png" alt="CART clf" width="75%"/>
In addition to classification, we can build a tree as a non-linear regression
model, as shown below. The regression tree example is located
[here](examples/decision_tree/example_regression_decision_tree.py). Check out the
[source code](packtml/decision_tree/cart.py) to understand how it works.
<img src="img/decision_tree/example_regression_decision_tree.png" alt="CART reg" width="75%"/>
### Deep learning
Among the hottest topics in machine learning right now are deep learning and neural
networks. In this course, we'll learn how to code a multi-layer perceptron classifier
from scratch. The full example code is located [here](examples/neural_net/example_mlp_classifier.py)
and this is the [source code](packtml/neural_net/mlp.py).
<img src="img/neural_net/example_mlp_classifier.png" alt="MLP" width="75%"/>
Next, we'll show how we can use the weights the MLP has learned on previous data to
learn new classification labels via transfer learning. For further implementation
details, check out the [example code](examples/neural_net/example_transfer_learning.py)
or the [source code](packtml/neural_net/transfer.py).
<img src="img/neural_net/example_transfer_learning.png" alt="MLP transfer" width="75%"/>
### Recommendation algorithms
These days, everything is available for purchase online. E-commerce sites have devoted
lots of research to algorithms that can learn your preferences. In this course, we'll
learn two such algorithms:
* [Item-to-item](packtml/recommendation/itemitem.py) collaborative filtering
* [Alternating least squares](packtml/recommendation/als.py) (matrix factorization)
The [example ALS code](examples/recommendation/example_als_recommender.py) shows how
the training error decreases with each iteration:
<img src="img/recommendation/example_als_recommender.png" alt="ALS" width="50%"/>

@@ -0,0 +1,3 @@
# CI/CD build tools
The scripts contained here are simply used for building the CI/CD pipeline.

@@ -0,0 +1,5 @@
#!/bin/bash
# only build on linux for travis, so this will work
set -e
sudo apt-get -qq update

@@ -0,0 +1,7 @@
#!/bin/bash
set -e
export DISPLAY=:99.0
sh -e /etc/init.d/xvfb start
sleep 5 # give xvfb some time to start by sleeping for 5 seconds

@@ -0,0 +1,41 @@
#!/bin/bash
# This script is meant to be called by the "install" step defined in
# .travis.yml. See http://docs.travis-ci.com/ for more details.
# The behavior of the script is controlled by environment variables defined
# in the .travis.yml in the top level folder of the project.
set -e
echo 'List files from cached directories'
echo 'pip:'
ls $HOME/.cache/pip
# for caching
export CC=/usr/lib/ccache/gcc
export CXX=/usr/lib/ccache/g++
# Useful for debugging how ccache is used
# export CCACHE_LOGFILE=/tmp/ccache.log
# ~60M is used by .ccache when compiling from scratch at the time of writing
ccache --max-size 100M --show-stats
# Deactivate the travis-provided virtual environment and setup a
# conda-based environment instead.
deactivate || echo "No virtualenv or condaenv to deactivate"
# install conda
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
MINICONDA_PATH=/home/travis/miniconda
# append the path, update conda
chmod +x miniconda.sh && ./miniconda.sh -b -p $MINICONDA_PATH
export PATH=$MINICONDA_PATH/bin:$PATH
conda update --yes conda
# Create the conda env and install the requirements
conda create -n testenv --yes python=${PYTHON_VERSION}
source activate testenv
pip install -r requirements.txt
pip install pytest pytest-cov
# set up the package
python setup.py install

@@ -0,0 +1,17 @@
#!/bin/bash
set -e
run_tests() {
    oldpwd=`pwd`

    # Move to another directory to test
    cd ..
    mkdir -p ${TEST_DIR} && cd ${TEST_DIR}

    pytest --cov packtml

    # move back to original dir
    cd ${oldpwd}
}
run_tests

BIN
curriculum.docx 100644

Binary file not shown.

9
environment.yml 100644
@@ -0,0 +1,9 @@
name: packt-sml
dependencies:
  - python=3.6
  - numpy
  - scipy
  - scikit-learn
  - pandas
  - matplotlib

File diff suppressed because one or more lines are too long

@@ -0,0 +1,53 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.clustering import KNNClassifier
from packtml.utils.plotting import add_decision_boundary_to_axis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
from matplotlib import pyplot as plt
from matplotlib.colors import ListedColormap
import numpy as np
import sys
# #############################################################################
# Create a classification sub-dataset using iris
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
# split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# #############################################################################
# Fit a k-nearest neighbor model and get predictions
k=10
clf = KNNClassifier(X_train, y_train, k=k)
pred = clf.predict(X_test)
clf_accuracy = accuracy_score(y_test, pred)
print("Test accuracy: %.3f" % clf_accuracy)
# #############################################################################
# Visualize difference in classes (this is from the scikit-learn KNN
# plotting example:
# http://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html#sphx-glr-auto-examples-neighbors-plot-classification-py)
xx, yy, _ = add_decision_boundary_to_axis(estimator=clf, axis=plt,
nclasses=3, X_data=X_test)
# Plot also the training points
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test,
cmap=ListedColormap(['#FF0000', '#00FF00', '#0000FF']),
edgecolor='k', s=20)
plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.title("3-Class classification (k=%i)" % k)
# if we're supposed to save it, do so INSTEAD OF showing it
if len(sys.argv) > 1:
    plt.savefig(sys.argv[1])
else:
    plt.show()

@@ -0,0 +1,3 @@
# Demo data
Cached data for the ML demo goes here.

File diff suppressed because it is too large

@@ -0,0 +1,63 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.decision_tree import CARTClassifier
from packtml.utils.plotting import add_decision_boundary_to_axis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import sys
# #############################################################################
# Create a classification dataset
rs = np.random.RandomState(42)
covariance = [[1, .75], [.75, 1]]
n_obs = 500
x1 = rs.multivariate_normal(mean=[0, 0], cov=covariance, size=n_obs)
x2 = rs.multivariate_normal(mean=[1, 3], cov=covariance, size=n_obs)
X = np.vstack((x1, x2)).astype(np.float32)
y = np.hstack((np.zeros(n_obs), np.ones(n_obs)))
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# #############################################################################
# Fit a simple decision tree classifier and get predictions
shallow_depth = 2
clf = CARTClassifier(X_train, y_train, max_depth=shallow_depth, criterion='gini',
random_state=42)
pred = clf.predict(X_test)
clf_accuracy = accuracy_score(y_test, pred)
print("Test accuracy (depth=%i): %.3f" % (shallow_depth, clf_accuracy))
# Fit a deeper tree and show accuracy increases
clf2 = CARTClassifier(X_train, y_train, max_depth=25, criterion='gini',
random_state=42)
pred2 = clf2.predict(X_test)
clf2_accuracy = accuracy_score(y_test, pred2)
print("Test accuracy (depth=25): %.3f" % clf2_accuracy)
# #############################################################################
# Visualize difference in classification ability
fig, axes = plt.subplots(1, 2, figsize=(12, 8))
add_decision_boundary_to_axis(estimator=clf, axis=axes[0],
nclasses=2, X_data=X_test)
axes[0].scatter(X_test[:, 0], X_test[:, 1], c=pred, alpha=0.4)
axes[0].set_title("Shallow tree (depth=%i) performance: %.3f"
% (shallow_depth, clf_accuracy))
add_decision_boundary_to_axis(estimator=clf2, axis=axes[1],
nclasses=2, X_data=X_test)
axes[1].scatter(X_test[:, 0], X_test[:, 1], c=pred2, alpha=0.4)
axes[1].set_title("Deep tree (depth=25) performance: %.3f" % clf2_accuracy)
# if we're supposed to save it, do so INSTEAD OF showing it
if len(sys.argv) > 1:
    plt.savefig(sys.argv[1])
else:
    plt.show()

@@ -0,0 +1,23 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.decision_tree.cart import RandomSplitter
from packtml.decision_tree.metrics import InformationGain
import numpy as np
# #############################################################################
# Build the example from the slides (3.3)
X = np.array([[21, 3], [ 4, 2], [37, 2]])
y = np.array([1, 0, 1])
# this is the splitting class; we'll use gini as the criterion
random_state = np.random.RandomState(42)
splitter = RandomSplitter(random_state=random_state,
criterion=InformationGain('gini'),
n_val_sample=3)
# find the best:
best_feature, best_value, best_gain = splitter.find_best(X, y)
print("Best feature=%i, best value=%r, information gain: %.3f"
% (best_feature, best_value, best_gain))

@@ -0,0 +1,19 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.decision_tree.metrics import gini_impurity, InformationGain
import numpy as np
# #############################################################################
# Build the example from the slides
y = np.array([0, 0, 0, 1, 1, 1, 1])
uncertainty = gini_impurity(y)
print("Initial gini impurity: %.4f" % uncertainty)
# now get the information gain of the split from the slides
directions = np.array(["right", "left", "left", "left",
"right", "right", "right"])
mask = directions == "left"
print("Information gain from the split we created: %.4f"
% InformationGain("gini")(target=y, mask=mask, uncertainty=uncertainty))

@@ -0,0 +1,53 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.decision_tree import CARTRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import sys
# #############################################################################
# Create a regression dataset
rs = np.random.RandomState(42)
X = np.sort(5 * rs.rand(80, 1), axis=0)
y = np.sin(X).ravel()
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# #############################################################################
# Fit a simple decision tree regressor and get predictions
clf = CARTRegressor(X_train, y_train, max_depth=3, random_state=42)
pred = clf.predict(X_test)
clf_mse = mean_squared_error(y_test, pred)
print("Test MSE (depth=3): %.3f" % clf_mse)
# Fit a deeper tree and show the test MSE decreases
clf2 = CARTRegressor(X_train, y_train, max_depth=10, random_state=42)
pred2 = clf2.predict(X_test)
clf2_mse = mean_squared_error(y_test, pred2)
print("Test MSE (depth=10): %.3f" % clf2_mse)
# #############################################################################
# Visualize difference in learning ability
x = X_train.ravel()
xte = X_test.ravel()
fig, axes = plt.subplots(1, 2, figsize=(12, 8))
axes[0].scatter(x, y_train, alpha=0.25, c='r')
axes[0].scatter(xte, pred, alpha=1.)
axes[0].set_title("Shallow tree (depth=3) test MSE: %.3f" % clf_mse)
axes[1].scatter(x, y_train, alpha=0.4, c='r')
axes[1].scatter(xte, pred2, alpha=1.)
axes[1].set_title("Deeper tree (depth=10) test MSE: %.3f" % clf2_mse)
# if we're supposed to save it, do so INSTEAD OF showing it
if len(sys.argv) > 1:
    plt.savefig(sys.argv[1])
else:
    plt.show()

@@ -0,0 +1,78 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.neural_net import NeuralNetClassifier
from packtml.utils.plotting import add_decision_boundary_to_axis
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import sys
# #############################################################################
# Create a classification dataset
rs = np.random.RandomState(42)
covariance = [[1, .75], [.75, 1]]
n_obs = 1000
x1 = rs.multivariate_normal(mean=[0, 0], cov=covariance, size=n_obs)
x2 = rs.multivariate_normal(mean=[1, 5], cov=covariance, size=n_obs)
X = np.vstack((x1, x2)).astype(np.float32)
y = np.hstack((np.zeros(n_obs), np.ones(n_obs))).astype(int)
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rs)
# #############################################################################
# Fit a simple neural network
n_iter = 4
hidden = (10,)
clf = NeuralNetClassifier(X_train, y_train, hidden=hidden, n_iter=n_iter,
learning_rate=0.001, random_state=42)
print("Loss per training iteration: %r" % clf.train_loss)
pred = clf.predict(X_test)
clf_accuracy = accuracy_score(y_test, pred)
print("Test accuracy (hidden=%s): %.3f" % (str(hidden), clf_accuracy))
# #############################################################################
# Fit a more complex neural network
n_iter2 = 150
hidden2 = (25, 25)
clf2 = NeuralNetClassifier(X_train, y_train, hidden=hidden2, n_iter=n_iter2,
learning_rate=0.001, random_state=42)
pred2 = clf2.predict(X_test)
clf_accuracy2 = accuracy_score(y_test, pred2)
print("Test accuracy (hidden=%s): %.3f" % (str(hidden2), clf_accuracy2))
# #############################################################################
# Visualize difference in classification ability
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
add_decision_boundary_to_axis(estimator=clf, axis=axes[0, 0],
nclasses=2, X_data=X_test)
axes[0, 0].scatter(X_test[:, 0], X_test[:, 1], c=pred, alpha=0.4)
axes[0, 0].set_title("Shallow (hidden=%s @ %i iter) test accuracy: %.3f"
% (str(hidden), n_iter, clf_accuracy))
add_decision_boundary_to_axis(estimator=clf2, axis=axes[0, 1],
nclasses=2, X_data=X_test)
axes[0, 1].scatter(X_test[:, 0], X_test[:, 1], c=pred2, alpha=0.4)
axes[0, 1].set_title("Deeper (hidden=%s @ %i iter): test accuracy: %.3f"
% (str(hidden2), n_iter2, clf_accuracy2))
# show the training loss for each
axes[1, 0].plot(np.arange(len(clf.train_loss)), clf.train_loss)
axes[1, 0].set_title("Training loss by iteration")
axes[1, 1].plot(np.arange(len(clf2.train_loss)), clf2.train_loss)
axes[1, 1].set_title("Training loss by iteration")
# if we're supposed to save it, do so INSTEAD OF showing it
if len(sys.argv) > 1:
    plt.savefig(sys.argv[1])
else:
    plt.show()

@@ -0,0 +1,104 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.neural_net import NeuralNetClassifier, TransferLearningClassifier
from packtml.utils.plotting import add_decision_boundary_to_axis
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
import numpy as np
import sys
# #############################################################################
# Create a classification dataset. This dataset differs from other datasets
# we've created in that there are two majority classes, and a third (tiny)
# class that we'll train the transfer learner over
rs = np.random.RandomState(42)
covariance = [[1, .75], [.75, 1]]
# these are the majority classes
n_obs = 1250
x1 = rs.multivariate_normal(mean=[0, 0], cov=covariance, size=n_obs)
x2 = rs.multivariate_normal(mean=[1, 5], cov=covariance, size=n_obs)
# this is the minority class
x3 = rs.multivariate_normal(mean=[0.85, 3.25], cov=[[1., .5], [1.25, 0.85]],
size=n_obs // 3)
# this is what the FIRST network will be trained on
n_first = int(0.8 * n_obs)
X = np.vstack((x1[:n_first], x2[:n_first])).astype(np.float32)
y = np.hstack((np.zeros(n_first), np.ones(n_first))).astype(int)
# this is what the SECOND network will be trained on
X2 = np.vstack((x1[n_first:], x2[n_first:], x3)).astype(np.float32)
y2 = np.hstack((np.zeros(n_obs - n_first),
np.ones(n_obs - n_first),
np.ones(x3.shape[0]) * 2)).astype(int)
# split the data up
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rs)
X2_train, X2_test, y2_train, y2_test = train_test_split(X2, y2,
random_state=rs)
# #############################################################################
# Fit the first neural network
hidden = (25, 25)
n_iter = 75
clf = NeuralNetClassifier(X_train, y_train, hidden=hidden, n_iter=n_iter,
learning_rate=0.001, random_state=42)
pred = clf.predict(X_test)
clf_accuracy = accuracy_score(y_test, pred)
print("Test accuracy (hidden=%s): %.3f" % (str(hidden), clf_accuracy))
# #############################################################################
# Fit the transfer network - train one more layer with a new class
t_hidden = (15,)
t_iter = 25
transfer = TransferLearningClassifier(X2_train, y2_train, pretrained=clf,
hidden=t_hidden, n_iter=t_iter,
random_state=42)
t_pred = transfer.predict(X2_test)
trans_accuracy = accuracy_score(y2_test, t_pred)
print("Test accuracy (hidden=%s): %.3f" % (str(hidden + t_hidden),
trans_accuracy))
# #############################################################################
# Visualize how the models learned the classes
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
add_decision_boundary_to_axis(estimator=clf, axis=axes[0, 0],
nclasses=2, X_data=X_test)
axes[0, 0].scatter(X_test[:, 0], X_test[:, 1], c=pred, alpha=0.4)
axes[0, 0].set_title("MLP network (hidden=%s @ %i iter): %.3f"
% (str(hidden), n_iter, clf_accuracy))
add_decision_boundary_to_axis(estimator=transfer, axis=axes[0, 1],
nclasses=3, X_data=X2_test)
axes[0, 1].scatter(X2_test[:, 0], X2_test[:, 1], c=t_pred, alpha=0.4)
axes[0, 1].set_title("Transfer network (hidden=%s @ %i iter): "
"%.3f" % (str(hidden + t_hidden), t_iter,
trans_accuracy))
# show the training loss for each
axes[1, 0].plot(np.arange(len(clf.train_loss)), clf.train_loss)
axes[1, 0].set_title("Training loss by iteration")
# concat the two training losses together for this plot
trans_train_loss = clf.train_loss + transfer.train_loss
axes[1, 1].plot(np.arange(len(trans_train_loss)), trans_train_loss)
axes[1, 1].set_title("Training loss by iteration")
# Add a vertical line for where the transfer learning begins
axes[1, 1].axvline(x=n_iter, ls="--")
# if we're supposed to save it, do so INSTEAD OF showing it
if len(sys.argv) > 1:
    plt.savefig(sys.argv[1])
else:
    plt.show()

@@ -0,0 +1,54 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.recommendation import ALS
from packtml.recommendation.data import get_completely_fabricated_ratings_data
from packtml.metrics.ranking import mean_average_precision
from matplotlib import pyplot as plt
import numpy as np
import sys
# #############################################################################
# Use our fabricated data set
R, titles = get_completely_fabricated_ratings_data()
# #############################################################################
# Fit an ALS recommender, predict for user 0
n_iter = 25
rec = ALS(R, factors=5, n_iter=n_iter, random_state=42, lam=0.01)
user0_rec, user_0_preds = rec.recommend_for_user(
R, user=0, filter_previously_seen=True,
return_scores=True)
# print some info about user 0
top_rated = np.argsort(-R[0, :])[:3]
print("User 0's top 3 rated movies are: %r" % titles[top_rated].tolist())
print("User 0's top 3 recommended movies are: %r"
% titles[user0_rec[:3]].tolist())
# #############################################################################
# We can score our recommender as well, to determine how well it actually did
# first, get all user recommendations (top 10, not filtered)
recommendations = list(rec.recommend_for_all_users(
R, n=10, filter_previously_seen=False,
return_scores=False))
# get the TRUE items they've rated (in order)
ground_truth = np.argsort(-R, axis=1)
mean_avg_prec = mean_average_precision(
predictions=recommendations, labels=ground_truth)
print("Mean average precision: %.3f" % mean_avg_prec)
# plot the error
plt.plot(np.arange(n_iter), rec.train_err)
plt.xlabel("Iteration")
plt.ylabel("MSE")
plt.title("Train error by iteration")
# if we're supposed to save it, do so INSTEAD OF showing it
if len(sys.argv) > 1:
    plt.savefig(sys.argv[1])
else:
    plt.show()

@@ -0,0 +1,39 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.recommendation import ItemItemRecommender
from packtml.recommendation.data import get_completely_fabricated_ratings_data
from packtml.metrics.ranking import mean_average_precision
import numpy as np
# #############################################################################
# Use our fabricated data set
R, titles = get_completely_fabricated_ratings_data()
# #############################################################################
# Fit an item-item recommender, predict for user 0
rec = ItemItemRecommender(R, k=3)
user0_rec, user_0_preds = rec.recommend_for_user(
R, user=0, filter_previously_seen=True,
return_scores=True)
# print some info about user 0
top_rated = np.argsort(-R[0, :])[:3]
print("User 0's top 3 rated movies are: %r" % titles[top_rated].tolist())
print("User 0's top 3 recommended movies are: %r"
% titles[user0_rec[:3]].tolist())
# #############################################################################
# We can score our recommender as well, to determine how well it actually did
# first, get all user recommendations (top 10, not filtered)
recommendations = list(rec.recommend_for_all_users(
R, n=10, filter_previously_seen=False,
return_scores=False))
# get the TRUE items they've rated (in order)
ground_truth = np.argsort(-R, axis=1)
mean_avg_prec = mean_average_precision(
predictions=recommendations, labels=ground_truth)
print("Mean average precision: %.3f" % mean_avg_prec)

@@ -0,0 +1,53 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.regression import SimpleLinearRegression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
import numpy as np
import sys
# #############################################################################
# Create a data-set that perfectly models the linear relationship:
# y = 2a + 1.5b + 0
random_state = np.random.RandomState(42)
X = random_state.rand(500, 2)
y = 2. * X[:, 0] + 1.5 * X[:, 1]
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y,
random_state=random_state)
# #############################################################################
# Fit a simple linear regression, produce predictions
lm = SimpleLinearRegression(X_train, y_train)
predictions = lm.predict(X_test)
print("Test sum of residuals: %.3f" % (y_test - predictions).sum())
assert np.allclose(lm.theta, [2., 1.5])
# #############################################################################
# Show that our solution is similar to scikit-learn's
lr = LinearRegression(fit_intercept=True)
lr.fit(X_train, y_train)
assert np.allclose(lm.theta, lr.coef_)
assert np.allclose(predictions, lr.predict(X_test))
# #############################################################################
# Fit another on ONE feature so we can show the plot
X_train = X_train[:, np.newaxis, 0]
X_test = X_test[:, np.newaxis, 0]
lm = SimpleLinearRegression(X_train, y_train)
# create the predictions & plot them as the line
preds = lm.predict(X_test)
plt.scatter(X_test[:, 0], y_test, color='black')
plt.plot(X_test[:, 0], preds, linewidth=3)
# if we're supposed to save it, do so INSTEAD OF showing it
if len(sys.argv) > 1:
    plt.savefig(sys.argv[1])
else:
    plt.show()

@@ -0,0 +1,57 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.regression import SimpleLogisticRegression
from packtml.utils.plotting import add_decision_boundary_to_axis
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from matplotlib import pyplot as plt
import sys
# #############################################################################
# Create an almost perfectly linearly-separable classification set
X, y = make_classification(n_samples=100, n_features=2, random_state=42,
n_redundant=0, n_repeated=0, n_classes=2,
class_sep=1.0)
# split data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
# #############################################################################
# Fit a simple logistic regression, produce predictions
lm = SimpleLogisticRegression(X_train, y_train, n_steps=50)
predictions = lm.predict(X_test)
acc = accuracy_score(y_test, predictions)
print("Test accuracy: %.3f" % acc)
# Show that our solution is similar to scikit-learn's
lr = LogisticRegression(fit_intercept=True, C=1e16) # almost no regularization
lr.fit(X_train, y_train)
print("Sklearn test accuracy: %.3f" % accuracy_score(y_test,
lr.predict(X_test)))
# #############################################################################
# Plot the data and the boundary we learned.
add_decision_boundary_to_axis(estimator=lm, axis=plt,
nclasses=2, X_data=X_test)
# We have to break this into two plot calls, one for each class to
# have different markers...
c0_mask = y_test == 0
plt.scatter(X_test[c0_mask, 0], X_test[c0_mask, 1],
c=~predictions[c0_mask], marker='o')
plt.scatter(X_test[~c0_mask, 0], X_test[~c0_mask, 1],
c=~predictions[~c0_mask], marker='x')
plt.title("Logistic test performance: %.4f (o=true 0, x=true 1)" % acc)
# if we're supposed to save it, do so INSTEAD OF showing it
if len(sys.argv) > 1:
    plt.savefig(sys.argv[1])
else:
    plt.show()

@@ -0,0 +1,56 @@
# -*- coding: utf-8 -*-
#
# This script is not intended to be run by students (or anyone, for that
# matter). It is intended to be run by me (Taylor) just to automate the
# population of the img/ directory with the output of the example plots.
# Hence its poor documentation and sheer hackiness.
from __future__ import absolute_import
import os
import sys
import subprocess
# determine where the user is calling this from...
here = os.listdir(".")
if "examples" in here:
    cwd = "examples"
    img_dir = "img"
elif "clustering" in here:
    cwd = "."
    img_dir = "../img"
else:
    raise ValueError("Call this from top-level or from within "
                     "the examples dir")

# iterate all py files
for root, dirs, files in os.walk(cwd, topdown=False):
    for fil in files:
        # Only run the ones with the appropriate prefix
        if not fil.startswith("example_"):
            continue

        # Get the module root
        module_root = root.split(os.sep)[1]

        # If it's "data" we don't want that! That's where we cache the data
        # for the demo
        if module_root in ("data", ".ipynb_checkpoints"):
            print("Skipping dir: %s" % module_root)
            continue

        # Otherwise create its corresponding path in ../img
        image_root = os.path.join(img_dir, module_root)  # ../img/clustering

        # create the directory in the image dir if it's not there
        if not os.path.exists(image_root):
            os.mkdir(image_root)

        # run it
        dest = os.path.join(image_root, fil[:-3] + ".png")
        filexec = os.path.join(root, fil)
        print("Running %s" % filexec)
        subprocess.Popen([sys.executable, filexec, dest])

sys.exit(0)

5
img/README.md 100644
@@ -0,0 +1,5 @@
# img
Within this directory, you'll find the output of the various example scripts.
The rendering of these images is automated by the
[examples/run_all_examples.py](../examples/run_all_examples.py) script.

8 binary image files added under img/ (the rendered example plots; 19-175 KiB each). Previews not shown.
1
packtml/VERSION 100644
@@ -0,0 +1 @@
1.0.3

@@ -0,0 +1,32 @@
# -*- coding: utf-8 -*-
import os
# global namespace:
from packtml import clustering
from packtml import decision_tree
from packtml import metrics
from packtml import neural_net
from packtml import recommendation
from packtml import regression
from packtml import utils
# set the version
packtml_location = os.path.abspath(os.path.dirname(__file__))
with open(os.path.join(packtml_location, "VERSION")) as vsn:
    __version__ = vsn.read().strip()
# remove from global namespace
del os
del packtml_location
del vsn
__all__ = [
'clustering',
'decision_tree',
'metrics',
'neural_net',
'recommendation',
'regression',
'utils'
]

42
packtml/base.py 100644
@@ -0,0 +1,42 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from abc import ABCMeta, abstractmethod
from sklearn.externals import six
__all__ = [
'BaseSimpleEstimator'
]
class BaseSimpleEstimator(six.with_metaclass(ABCMeta)):
    """Base class for packt estimators.

    The estimators in the Packt package do not behave exactly like
    scikit-learn estimators (by design). They are made to perform the model
    fit immediately upon class instantiation. Moreover, many of the
    hyper-parameter options are limited to promote readability and avoid
    confusion.

    The constructor of every Packt estimator should resemble the
    following::

        def __init__(self, X, y, *args, **kwargs):
            ...

    where ``X`` is the training matrix, ``y`` is the training target
    variable, and ``*args`` and ``**kwargs`` are varargs that will differ
    for each estimator.
    """
    @abstractmethod
    def predict(self, X):
        """Form predictions based on new data.

        This function must be implemented by subclasses to generate
        predictions given the model fit.

        Parameters
        ----------
        X : array-like, shape=(n_samples, n_features)
            The test array. Should be only finite values.
        """

@@ -0,0 +1,5 @@
# -*- coding: utf-8 -*-
from .knn import *
__all__ = [s for s in dir() if not s.startswith("_")]

@@ -0,0 +1,99 @@
# -*- coding: utf-8 -*-
#
# Author: Taylor Smith <taylor.smith@alkaline-ml.com>
#
# An implementation of k-nearest neighbors (kNN) classification. Note that
# this was written to maximize readability. To use kNN in a true project
# setting, you may wish to use a more highly optimized library, such as
# scikit-learn.
from __future__ import absolute_import
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.utils.validation import check_X_y
from sklearn.utils.multiclass import check_classification_targets
from scipy.stats import mode
import numpy as np
from ..base import BaseSimpleEstimator
__all__ = [
'KNNClassifier'
]
class KNNClassifier(BaseSimpleEstimator):
    """Classify points using k-Nearest Neighbors.

    The kNN algorithm computes the distances between points in a matrix and
    identifies the nearest "neighboring" points to each observation. The
    idea is that neighboring points share similar attributes. Therefore, if
    a neighbor is of some class, an unknown observation may likely belong
    to the same class.

    There are several caveats to kNN:

    * We have to retain all of the training data, which is expensive.
    * Computing the pairwise distance matrix is also expensive.
    * You should make sure you've standardized your data (mean 0, stddev 1)
      prior to fitting a kNN model.

    Parameters
    ----------
    X : array-like, shape=(n_samples, n_features)
        The training array. Should be a numpy array or array-like structure
        with only finite values.

    y : array-like, shape=(n_samples,)
        The target vector.

    k : int, optional (default=10)
        The number of neighbors to identify. The higher the ``k``
        parameter, the more likely you are to *under*-fit your data. The
        lower the ``k`` parameter, the more likely you are to *over*-fit
        your model.

    Notes
    -----
    This is a very rudimentary implementation of kNN. It does not permit
    tuning of distance metrics, optimization of the search algorithm or any
    other parameters. It is written to be as simple as possible to maximize
    readability. For a more optimal solution, see
    ``sklearn.neighbors.KNeighborsClassifier``.
    """
    def __init__(self, X, y, k=10):
        # check the input array
        X, y = check_X_y(X, y, accept_sparse=False, dtype=np.float32,
                         copy=True)

        # make sure we're performing classification here
        check_classification_targets(y)

        # Save the K hyper-parameter so we can use it later
        self.k = k

        # kNN is a special case where we have to save the training data in
        # order to make predictions in the future
        self.X = X
        self.y = y

    def predict(self, X):
        # Compute the pairwise distances between each observation in
        # the dataset and the training data. This can be relatively
        # expensive for very large datasets!!
        dists = euclidean_distances(X, self.X)

        # Arg sort to find the shortest distance for each row. This sorts
        # elements in each row (independent of other rows) to determine the
        # order required to sort the rows.
        # I.e:
        #   >>> P = np.array([[4, 5, 1], [3, 1, 6]])
        #   >>> np.argsort(P, axis=1)
        #   array([[2, 0, 1],
        #          [1, 0, 2]])
        nearest = np.argsort(dists, axis=1)

        # We only care about the top K, really, so get sorted and then
        # truncate
        predicted_labels = self.y[nearest][:, :self.k]

        # We want the most common along the rows as the predictions
        return mode(predicted_labels, axis=-1)[0].ravel()

@@ -0,0 +1,3 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import

@@ -0,0 +1,33 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.clustering import KNNClassifier
from sklearn.datasets import load_iris
from numpy.testing import assert_array_equal
import numpy as np
iris = load_iris()
X = iris.data[:, :2]
y = iris.target
def test_knn():
    # show we can fit
    knn = KNNClassifier(X, y)

    # show we can predict
    knn.predict(X)


def test_knn2():
    X2 = np.array([[0., 0., 0.5],
                   [0., 0.5, 0.],
                   [0.5, 0., 0.],
                   [5., 5., 6.],
                   [6., 5., 5.]])
    y2 = [0, 0, 0, 1, 1]

    knn = KNNClassifier(X2, y2, k=3)
    preds = knn.predict(X2)
    assert_array_equal(preds, y2)

@@ -0,0 +1,6 @@
# -*- coding: utf-8 -*-
from .cart import *
from .metrics import *
__all__ = [s for s in dir() if not s.startswith("_")]

@@ -0,0 +1,493 @@
# -*- coding: utf-8 -*-
#
# Author: Taylor G Smith <taylor.smith@alkaline-ml.com>
#
# A simplified version of Classification and Regression Trees. This file
# is intended to maximize readability and understanding of how CART trees work.
# For very fast or customizable decision tree solutions, use scikit-learn.
#
# The best order in which to read & understand the contents to best
# grok the entire concept:
#
# 1. metrics.InformationGain & metrics.VarianceReduction
# 2. RandomSplitter
# 3. LeafNode
# 4. BaseCART
from __future__ import absolute_import, division
from sklearn.utils.validation import check_X_y, check_random_state, check_array
from sklearn.utils.multiclass import check_classification_targets
from sklearn.base import ClassifierMixin, RegressorMixin, is_classifier
import numpy as np
from ..base import BaseSimpleEstimator
from .metrics import InformationGain, VarianceReduction
__all__ = [
'CARTRegressor',
'CARTClassifier'
]
try:
xrange
except NameError: # py3
xrange = range
class RandomSplitter(object):
"""Evaluate a split via random values in a feature.
Every feature in the dataset needs to be evaluated in a CART tree. Since
that in itself can be expensive, the random splitter allows us to look at
only a random number of row splits per feature in order to make the best
splitting decision.
Parameters
----------
random_state : np.random.RandomState
The random state for seeding the choices
criterion : callable
The metric used for evaluating the "goodness" of a split. Either
``InformationGain`` (with entropy or Gini) for classification, or
``VarianceReduction`` for regression.
n_val_sample : int, optional (default=25)
The number of values per feature to sample as a splitting point.
"""
def __init__(self, random_state, criterion, n_val_sample=25):
self.random_state = random_state
self.criterion = criterion # BaseCriterion from metrics
self.n_val_sample = n_val_sample
def find_best(self, X, y):
criterion = self.criterion
rs = self.random_state
# keep track of the best info gain
best_gain = 0.
# keep track of best feature and best value on which to split
best_feature = None
best_value = None
# get the current state of the uncertainty (gini or entropy)
uncertainty = criterion.compute_uncertainty(y)
# iterate over each feature
for col in xrange(X.shape[1]):
feature = X[:, col]
# get all values in the feature
# values = np.unique(feature)
seen_values = set()
# the number of values to sample. Should be defined as the min
# between the prescribed n_val_sample value and the number of
# unique values in the feature.
n_vals = min(self.n_val_sample, np.unique(feature).shape[0])
# For each of n_val_sample iterations, select a random value
# from the feature and create a split. We store whether we've seen
# the value before; if we have, continue. Continue until we've seen
# n_vals unique values. This allows us to more likely select values
# that are high frequency (retains distributional data implicitly)
for v in rs.permutation(feature):
# if we've hit the limit of the number of values we wanted to
# examine, break out
if len(seen_values) == n_vals:
break
# if we've already tried this value, continue
elif v in seen_values: # O(1) lookup
continue
# otherwise, it's a new value we've never tried splitting on.
# add it to the set.
seen_values.add(v)
# create the mask (these values "go left")
mask = feature >= v # type: np.ndarray
# skip this step if this doesn't divide the dataset
if np.unique(mask).shape[0] == 1: # all True or all False
continue
# compute how good this split was
gain = criterion(y, mask, uncertainty=uncertainty)
# if the gain is better, we keep this feature & value &
# update the best gain we've seen so far
if gain > best_gain:
best_feature = col
best_value = v
best_gain = gain
# if best feature is None, it means we never found a viable split...
# this is likely because all of our labels were perfect. In this case,
# we could select any feature and the first value and define that as
# our left split and nothing will go right.
if best_feature is None:
best_feature = 0
best_value = np.squeeze(X[:, best_feature])[0]
best_gain = 0.
# we need to know the best feature, the best value, and the best gain
return best_feature, best_value, best_gain
class LeafNode(object):
"""A tree node class.
Tree node that store the column on which to split and the value above
which to go left vs. right. Additionally, it stores the target statistic
related to this node. For instance, in a classification scenario:
>>> X = np.array([[ 1, 1.5 ],
... [ 2, 0.5 ],
... [ 3, 0.75]])
>>> y = np.array([0, 1, 1])
>>> node = LeafNode(split_col=0, split_val=2, split_gain=0.,
... class_statistic=_most_common(y))
This means if ``node`` were a terminal node, it would generate predictions
of 1, since that was the most common value in the pre-split ``y``. The
class statistic will differ for splits in the tree, where the most common
value in ``y`` for records in ``X`` that go left is 1, and 0 for that which
goes to the right.
The class statistic is computed for each split as the tree recurses.
Parameters
----------
split_col : int
The column on which to split.
split_val : float or int
The value above which to go left.
split_gain : float
The gain (information gain or variance reduction) achieved by this split.
class_statistic : float or int
The summary statistic of the target at this node: the mode for
classification, or the mean for regression.
"""
def __init__(self, split_col, split_val, split_gain, class_statistic):
self.split_col = split_col
self.split_val = split_val
self.split_gain = split_gain
# the class statistic is the mode or the mean of the targets for
# this split
self.class_statistic = class_statistic
# if these remain None, it's a terminal node
self.left = None
self.right = None
def create_split(self, X, y):
"""Split the next X, y.
Returns
-------
X_left : np.ndarray, shape=(n_samples, n_features)
Rows where ``split_col >= split_val``.
X_right : np.ndarray, shape=(n_samples, n_features)
Rows where ``split_col < split_val``.
y_left : np.ndarray, shape=(n_samples,)
Target where ``split_col >= split_val``.
y_right : np.ndarray, shape=(n_samples,)
Target where ``split_col < split_val``.
"""
# If values in the split column are greater than or equal to the
# split value, we go left.
left_mask = X[:, self.split_col] >= self.split_val
# Otherwise we go to the right
right_mask = ~left_mask # type: np.ndarray
# If the left mask is all False or all True, it means we've achieved
# a perfect split.
all_left = left_mask.all()
all_right = right_mask.all()
# create the left split. If it's all right side, we'll return None
X_left = X[left_mask, :] if not all_right else None
y_left = y[left_mask] if not all_right else None
# create the right split. If it's all left side, we'll return None.
X_right = X[right_mask, :] if not all_left else None
y_right = y[right_mask] if not all_left else None
return X_left, X_right, y_left, y_right
def is_terminal(self):
"""Determine whether the node is terminal.
If there is no left node and no right node, it's a terminal node.
If either is non-None, it is a parent to something.
"""
return self.left is None and self.right is None
def __repr__(self):
"""Get the string representation of the node."""
return "Rule: Go left if x%i >= %r else go right (gain=%.3f)" \
% (self.split_col, self.split_val, self.split_gain)
def predict_record(self, record):
"""Find the terminal node in the tree and return the class statistic"""
# First base case, this is a terminal node:
has_left = self.left is not None
has_right = self.right is not None
if not has_left and not has_right:
return self.class_statistic
# Otherwise, determine whether the record goes right or left
go_left = record[self.split_col] >= self.split_val
# if we go left and there is a left node, delegate the recursion to the
# left side
if go_left and has_left:
return self.left.predict_record(record)
# if we go right, delegate to the right
if not go_left and has_right:
return self.right.predict_record(record)
# if we get here, it means one of two things:
# 1. we were supposed to go left and didn't have a left
# 2. we were supposed to go right and didn't have a right
# for both of these, we return THIS class statistic
return self.class_statistic
def _most_common(y):
# This is essentially just a "mode" function to compute the most
# common value in a vector.
cls, cts = np.unique(y, return_counts=True)
order = np.argsort(-cts)
return cls[order][0]
class _BaseCART(BaseSimpleEstimator):
def __init__(self, X, y, criterion, min_samples_split, max_depth,
n_val_sample, random_state):
# make sure max_depth > 1
if max_depth < 2:
raise ValueError("max depth must be > 1")
# check the input arrays, and if it's classification validate the
# target values in y
X, y = check_X_y(X, y, accept_sparse=False, dtype=None, copy=True)
if is_classifier(self):
check_classification_targets(y)
# hyper parameters so we can later inspect attributes of the model
self.min_samples_split = min_samples_split
self.max_depth = max_depth
self.n_val_sample = n_val_sample
self.random_state = random_state
# create the splitting class
random_state = check_random_state(random_state)
self.splitter = RandomSplitter(random_state, criterion, n_val_sample)
# grow the tree depth first
self.tree = self._find_next_split(X, y, 0)
def _target_stat(self, y):
"""Given a vector, ``y``, decide what value to return as the leaf
node statistic (mean for regression, mode for classification)
"""
def _find_next_split(self, X, y, current_depth):
# base case 1: current depth is the limit, the parent node should
# be a terminal node (child = None)
# base case 2: n samples in X <= min_samples_split
if current_depth == self.max_depth or \
X.shape[0] <= self.min_samples_split:
return None
# create the next split
split_feature, split_value, gain = \
self.splitter.find_best(X, y)
# create the next node based on the best split feature and value
# that we just found. Also compute the "target stat" (mode of y for
# classification problems or mean of y for regression problems) and
# pass that to the node in case it is the terminal node (i.e., the
# decision maker)
node = LeafNode(split_feature, split_value, gain, self._target_stat(y))
# Create the splits based on the criteria we just determined, and then
# recurse down left, right sides
X_left, X_right, y_left, y_right = node.create_split(X, y)
# if either the left or right is None, it means we've achieved a
# perfect split. It is then a terminal node and will remain None.
if X_left is not None:
node.left = self._find_next_split(X_left, y_left,
current_depth + 1)
if X_right is not None:
node.right = self._find_next_split(X_right, y_right,
current_depth + 1)
return node
def predict(self, X):
# Check the array
X = check_array(X, dtype=np.float32) # type: np.ndarray
# For each record in X, find its leaf node in the tree (O(log N))
# to get the predictions. This makes the prediction operation
# O(N log N) runtime complexity
predictions = [self.tree.predict_record(row) for row in X]
return np.asarray(predictions)
class CARTRegressor(_BaseCART, RegressorMixin):
"""Decision tree regression.
Builds a decision tree to solve a regression problem using the CART
algorithm. The estimator builds a binary tree structure, evaluating each
feature at each iteration to recursively split along the best value and
progress down the tree until each leaf node reaches parsimony.
The regression tree uses "variance reduction" to assess the "goodness"
of a split, selecting the split and feature that maximizes the value.
To make predictions, each record is evaluated at each node of the tree
until it reaches a leaf node. For regression, predictions are made by
returning the training target's mean for the leaf node.
Parameters
----------
X : array-like, shape=(n_samples, n_features)
The training array. Should be a numpy array or array-like structure
with only finite values.
y : array-like, shape=(n_samples,)
The target vector.
max_depth : int, optional (default=5)
The maximum depth to which the tree will grow. Note that the tree is
not guaranteed to reach this depth and may stop growing early if the
``min_samples_split`` terminal criterion is met first.
min_samples_split : int, optional (default=1)
A terminal criterion used to halt the growth of a tree. If a leaf
node's split contains <= ``min_samples_split``, it will not grow
any further.
n_val_sample : int, optional (default=25)
The method by which we evaluate splits differs a bit from highly
optimized libraries like scikit-learn, which may evaluate for the
globally optimal split for each feature. We use random splitting
which evaluates a number of unique values for each feature at each
split. The ``n_val_sample`` is the maximum number of values per
feature that will be evaluated as a potential splitting point at
each iteration.
random_state : int, None or RandomState, optional (default=None)
The random state used to seed the RandomSplitter.
Attributes
----------
splitter : RandomSplitter
The feature splitting class. Used for determining optimal splits at
each node.
tree : LeafNode
The actual tree. Each node contains data on the class statistic (i.e.,
mode or mean of the training target at that split), best feature and
best value.
"""
def __init__(self, X, y, max_depth=5, min_samples_split=1,
n_val_sample=25, random_state=None):
super(CARTRegressor, self).__init__(
X, y, criterion=VarianceReduction(),
min_samples_split=min_samples_split, max_depth=max_depth,
n_val_sample=n_val_sample, random_state=random_state)
def _target_stat(self, y):
"""Given a vector, ``y``, get the mean"""
return y.mean()
class CARTClassifier(_BaseCART, ClassifierMixin):
"""Decision tree classication.
Builds a decision tree to solve a classification problem using the CART
algorithm. The estimator builds a binary tree structure, evaluating each
feature at each iteration to recursively split along the best value and
progress down the tree until each leaf node reaches parsimony.
The classification tree uses "information gain" to assess the "goodness"
of a split, selecting the split and feature that maximizes the value.
To make predictions, each record is evaluated at each node of the tree
until it reaches a leaf node. For classification, predictions are made by
returning the training target's mode for the leaf node.
Parameters
----------
X : array-like, shape=(n_samples, n_features)
The training array. Should be a numpy array or array-like structure
with only finite values.
y : array-like, shape=(n_samples,)
The target vector.
criterion : str or unicode, optional (default='gini')
The splitting criterion used for classification problems. CART trees
typically use "gini" but their cousins, C4.5 trees, use "entropy". Both
metrics are extremely similar and will likely not change your tree
structure by much.
max_depth : int, optional (default=5)
The maximum depth to which the tree will grow. Note that the tree is
not guaranteed to reach this depth and may stop growing early if the
``min_samples_split`` terminal criterion is met first.
min_samples_split : int, optional (default=1)
A terminal criterion used to halt the growth of a tree. If a node
contains <= ``min_samples_split`` samples, it will not be split
any further.
n_val_sample : int, optional (default=25)
The method by which we evaluate splits differs a bit from highly
optimized libraries like scikit-learn, which may search for the
globally optimal split for each feature. We use random splitting,
which evaluates a number of unique values for each feature at each
split. ``n_val_sample`` is the maximum number of values per
feature that will be evaluated as potential splitting points at
each iteration.
random_state : int, None or RandomState, optional (default=None)
The random state used to seed the RandomSplitter.
Attributes
----------
splitter : RandomSplitter
The feature splitting class. Used for determining optimal splits at
each node.
tree : LeafNode
The actual tree. Each node contains data on the class statistic (i.e.,
mode or mean of the training target at that split), best feature and
best value.
"""
def __init__(self, X, y, criterion='gini', max_depth=5,
min_samples_split=1, n_val_sample=25, random_state=None):
super(CARTClassifier, self).__init__(
X, y, criterion=InformationGain(criterion), max_depth=max_depth,
min_samples_split=min_samples_split,
n_val_sample=n_val_sample, random_state=random_state)
def _target_stat(self, y):
"""Given a vector, ``y``, get the mode"""
return _most_common(y)
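# A minimal usage sketch, assuming the toy arrays used in the unit tests
# further down: both estimators fit at instantiation, so construction and
# training happen in a single step.
#
#   >>> import numpy as np
#   >>> X = np.array([[0., 1., 2.], [1., 2., 3.], [2., 3., 4.]])
#   >>> y = np.array([0, 1, 1])
#   >>> clf = CARTClassifier(X, y, max_depth=3, random_state=42)
#   >>> class_preds = clf.predict(X)     # array of three class labels
#   >>> reg = CARTRegressor(X, y.astype(float), random_state=42)
#   >>> value_preds = reg.predict(X)     # array of three real values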


@ -0,0 +1,145 @@
# -*- coding: utf-8 -*-
#
# Author: Taylor Smith <taylor.smith@alkaline-ml.com>
#
# Metrics used for determining how to split a feature in a decision tree.
from __future__ import absolute_import
import numpy as np
__all__ = [
'entropy',
'gini_impurity',
'InformationGain',
'VarianceReduction'
]
def _clf_metric(y, metric):
"""Internal helper. Since this is internal, so no validation performed"""
# get unique classes in y
y = np.asarray(y)
C, cts = np.unique(y, return_counts=True)
# a base case is that there is only one class label
if C.shape[0] == 1:
return 0.
pr_C = cts.astype(float) / y.shape[0] # P(Ci)
# 1 - sum(P(Ci)^2)
if metric == 'gini':
return 1. - pr_C.dot(pr_C) # np.sum(pr_C ** 2)
elif metric == 'entropy':
return np.sum(-pr_C * np.log2(pr_C))
# shouldn't ever get to this point since it is internal
else:
raise ValueError("metric should be one of ('gini', 'entropy'), "
"but encountered %s" % metric)
def entropy(y):
"""Compute the entropy of class labels.
This computes the entropy of the training samples' class labels. High
entropy indicates a relatively uniform class distribution, while low
entropy indicates a distribution concentrated on just a few classes.
References
----------
.. [1] http://www.cs.csi.cuny.edu/~imberman/ai/Entropy%20and%20Information%20Gain.htm
"""
return _clf_metric(y, 'entropy')
def gini_impurity(y):
"""Compute the Gini index on a target variable.
The Gini index gives an idea of how mixed the classes are within a leaf
node. A perfect class separation will result in a Gini impurity of 0 (i.e.,
"perfectly pure").
"""
return _clf_metric(y, 'gini')
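# A quick sanity check on both metrics, assuming the functions above (the
# values mirror the unit tests further down):
#
#   >>> import numpy as np
#   >>> gini_impurity(np.array([0, 0]))   # a pure node
#   0.0
#   >>> gini_impurity(np.array([0, 1]))   # maximal two-class mixing
#   0.5
#   >>> round(entropy(np.asarray(9 * [0] + 5 * [1])), 2)   # 9/14 vs. 5/14
#   0.94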
class BaseCriterion(object):
"""Splitting criterion.
Base class for InformationGain and VarianceReduction. WARNING - do
not invoke this class directly. Use derived classes only! This is a
loosely-defined abstract class used to prescribe a common interface
for sub-classes.
"""
def compute_uncertainty(self, y):
"""Compute the uncertainty for a vector.
A subclass should override this function to compute the uncertainty
(i.e., entropy or gini) of a vector.
"""
class VarianceReduction(BaseCriterion):
"""Compute the variance reduction after a split.
Variance reduction is a splitting criterion used by CART trees in the
context of regression. It examines the variance in a target before and
after a split to determine whether we've reduced the variability in the
target.
"""
def compute_uncertainty(self, y):
"""Compute the variance of a target."""
return np.var(y)
def __call__(self, target, mask, uncertainty):
left, right = target[mask], target[~mask]
return uncertainty - (self.compute_uncertainty(left) +
self.compute_uncertainty(right))
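# A worked example, assuming only numpy: split a target with two well-
# separated groups. The parent variance of [1, 2, 3, 10, 11, 12] is ~20.92
# and each child's variance is ~0.67, so the reduction is ~19.58. (Note
# this simplified criterion sums the child variances unweighted, whereas
# many CART references weight each child by its share of the samples.)
#
#   >>> import numpy as np
#   >>> y = np.array([1., 2., 3., 10., 11., 12.])
#   >>> mask = np.array([True, True, True, False, False, False])
#   >>> vr = VarianceReduction()
#   >>> round(vr(y, mask, vr.compute_uncertainty(y)), 4)
#   19.5833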
class InformationGain(BaseCriterion):
"""Compute the information gain after a split.
The information gain metric is used by CART trees in a classification
context. It measures the difference in the gini or entropy before and
after a split to determine whether the split "taught" us anything.
Parameters
----------
metric : str or unicode
The name of the metric to use. Either "gini" (Gini impurity)
or "entropy".
"""
def __init__(self, metric):
# let fail out with a KeyError if an improper metric
self.crit = {'gini': gini_impurity,
'entropy': entropy}[metric]
def compute_uncertainty(self, y):
"""Compute the uncertainty for a vector.
This method computes either the Gini impurity or entropy of a target
vector using the prescribed method.
"""
return self.crit(y)
def __call__(self, target, mask, uncertainty):
"""Compute the information gain of a split.
Parameters
----------
target : np.ndarray
The target feature
mask : np.ndarray
The value mask
uncertainty : float
The gini or entropy of rows pre-split
"""
left, right = target[mask], target[~mask]
p = float(left.shape[0]) / float(target.shape[0])
crit = self.crit # type: callable
return uncertainty - p * crit(left) - (1 - p) * crit(right)
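# A worked example, assuming the classes above (the numbers match the unit
# tests further down): split y = [0, 0, 1, 1, 2] on a mask that selects
# only the first sample. The parent Gini is 0.64, the left child is pure
# (Gini 0.0) and the right child has Gini 0.625, so the gain is
# 0.64 - 0.2 * 0.0 - 0.8 * 0.625 = 0.14.
#
#   >>> import numpy as np
#   >>> y = np.array([0, 0, 1, 1, 2])
#   >>> mask = np.array([True, False, False, False, False])
#   >>> ig = InformationGain('gini')
#   >>> round(ig(y, mask, gini_impurity(y)), 2)
#   0.14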


@ -0,0 +1,3 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import


@ -0,0 +1,119 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from numpy.testing import assert_array_equal, assert_almost_equal
import numpy as np
from packtml.decision_tree.metrics import InformationGain
from packtml.decision_tree.cart import (CARTClassifier, CARTRegressor,
RandomSplitter, LeafNode, _most_common)
X = np.array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4]])
y = np.array([0, 1, 1])
X2 = np.array([[0, 1, 2],
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[4, 5, 6],
[5, 6, 7]])
y2 = np.array([0, 0, 1, 1, 1, 1])
# a regression dataset
rs = np.random.RandomState(42)
Xreg = np.sort(5 * rs.rand(100, 1), axis=0)
yreg = np.sin(Xreg).ravel()
def test_most_common():
assert _most_common(y) == 1
assert _most_common([1]) == 1
def test_terminal_leaf_node():
node = LeafNode(split_col=0, split_val=1.,
class_statistic=_most_common(y),
split_gain=np.inf)
# show that there are no children
assert node.is_terminal()
# show that the splitting works as expected
X_left, X_right, y_left, y_right = node.create_split(X, y)
assert_array_equal(X_left, X[1:, :])
assert_array_equal(X_right, X[:1, :])
assert_array_equal(y_left, [1, 1])
assert_array_equal(y_right, [0])
# show that predictions work as expected
assert [node.predict_record(r) for r in X] == [1, 1, 1]
def test_complex_leaf_node():
node = LeafNode(split_col=0, split_val=3.,
class_statistic=_most_common(y2),
split_gain=np.inf)
# create the split
X_left, X_right, y_left, y_right = node.create_split(X2, y2)
# show it worked as expected
assert_array_equal(X_left, X2[3:, :])
assert_array_equal(X_right, X2[:3, :])
assert_array_equal(y_left, [1, 1, 1])
assert_array_equal(y_right, [0, 0, 1])
# show that if we CURRENTLY predicted on the basis of node being the
# terminal leaf, we'd get all 1s.
get_preds = (lambda: [node.predict_record(r) for r in X2])
assert get_preds() == [1, 1, 1, 1, 1, 1]
# add a sub node to the right side
right_node = LeafNode(split_col=0, split_val=2.,
class_statistic=_most_common(y_right),
split_gain=np.inf)
assert right_node.class_statistic == 0.
# attach to the original node and assert it's not terminal anymore
node.right = right_node
assert not node.is_terminal()
# now our predictions should differ!
assert get_preds() == [0, 0, 0, 1, 1, 1]
def test_fit_classifier():
# show we can fit a classifier
clf = CARTClassifier(X, y)
# show we can predict
clf.predict(X)
def test_fit_regressor():
# show we can fit a regressor
reg = CARTRegressor(Xreg, yreg)
# show we can predict
reg.predict(Xreg)
def test_random_splitter():
pre_X = np.array([[21, 3], [4, 2], [37, 2]])
pre_y = np.array([1, 0, 1])
# this is the splitting class; we'll use gini as the criteria
random_state = np.random.RandomState(42)
splitter = RandomSplitter(random_state=random_state,
criterion=InformationGain('gini'),
n_val_sample=3)
# find the best:
best_feature, best_value, best_gain = splitter.find_best(pre_X, pre_y)
assert best_feature == 0
assert best_value == 21
assert_almost_equal(best_gain, 0.4444444444, decimal=8)


@ -0,0 +1,52 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.decision_tree.metrics import (entropy, gini_impurity,
InformationGain)
import numpy as np
from numpy.testing import assert_almost_equal
def test_entropy():
events = np.asarray(9 * [0] + 5 * [1]) # 9/14, 5/14
ent = entropy(events)
assert round(ent, 2) == 0.94, round(ent, 2)
def test_gini_impurity():
x = np.asarray([0] * 10 + [1] * 10)
assert gini_impurity(x) == 0.5
assert gini_impurity(x[:10]) == 0.
# show that no mixing of gini yields 0.0
assert gini_impurity(np.array([0, 0])) == 0.
# with SOME mixing we get 0.5
assert gini_impurity(np.array([0, 1])) == 0.5
# with a lot of mixing we get a number close to 0.8
gi = gini_impurity([0, 1, 2, 3, 4])
assert_almost_equal(gi, 0.8)
def test_information_gain():
X = np.array([
[0, 3],
[1, 3],
[2, 1],
[2, 1],
[1, 3]
])
y = np.array([0, 0, 1, 1, 2])
uncertainty = gini_impurity(y)
assert_almost_equal(uncertainty, 0.63999999)
mask = X[:, 0] == 0
# compute the info gain for this mask
infog = InformationGain("gini")
ig = infog(y, mask, uncertainty)
assert_almost_equal(ig, 0.1399999)


@ -0,0 +1,5 @@
# -*- coding: utf-8 -*-
from .ranking import *
__all__ = [s for s in dir() if not s.startswith("_")]


@ -0,0 +1,266 @@
# -*- coding: utf-8 -*-
#
# Author: Taylor G Smith
#
# Recommender system ranking metrics derived from Spark source for use with
# Python-based recommender systems. See the full gist here:
# https://gist.github.com/tgsmith61591/d8aa96ac7c74c24b33e4b0cb967ca519
from __future__ import absolute_import, division
import numpy as np
import warnings
__all__ = [
'mean_average_precision',
'ndcg_at',
'precision_at',
]
try:
xrange
except NameError: # python 3 does not have an 'xrange'
xrange = range
def _require_positive_k(k):
"""Helper function to avoid copy/pasted code for validating K"""
if k <= 0:
raise ValueError("ranking position k should be positive")
def _mean_ranking_metric(predictions, labels, metric):
"""Helper function for precision_at_k and mean_average_precision"""
# do not zip, as this will require an extra pass of O(N). Just assert
# equal length and index (compute in ONE pass of O(N)).
# if len(predictions) != len(labels):
# raise ValueError("dim mismatch in predictions and labels!")
# return np.mean([
# metric(np.asarray(predictions[i]), np.asarray(labels[i]))
# for i in xrange(len(predictions))
# ])
# Actually probably want lazy evaluation in case preds is a
# generator, since preds can be very dense and could blow up
# memory... but how to assert lengths equal? FIXME
return np.mean([
metric(np.asarray(prd), np.asarray(labels[i]))
for i, prd in enumerate(predictions) # lazy eval if generator
])
def _warn_for_empty_labels():
"""Helper for missing ground truth sets"""
warnings.warn("Empty ground truth set! Check input data")
return 0.
def precision_at(predictions, labels, k=10, assume_unique=True):
"""Compute the precision at K.
Compute the average precision of all the queries, truncated at
ranking position k. If for a query, the ranking algorithm returns
n (n is less than k) results, the precision value will be computed
as #(relevant items retrieved) / k. This formula also applies when
the size of the ground truth set is less than k.
If a query has an empty ground truth set, zero will be used as
precision together with a warning.
Parameters
----------
predictions : array-like, shape=(n_predictions,)
The prediction array. The items that were predicted, in descending
order of relevance.
labels : array-like, shape=(n_ratings,)
The labels (positively-rated items).
k : int, optional (default=10)
The rank at which to measure the precision.
assume_unique : bool, optional (default=True)
Whether to assume the items in the labels and predictions are each
unique. That is, the same item is not predicted multiple times or
rated multiple times.
Examples
--------
>>> # predictions for 3 users
>>> preds = [[1, 6, 2, 7, 8, 3, 9, 10, 4, 5],
... [4, 1, 5, 6, 2, 7, 3, 8, 9, 10],
... [1, 2, 3, 4, 5]]
>>> # labels for the 3 users
>>> labels = [[1, 2, 3, 4, 5], [1, 2, 3], []]
>>> precision_at(preds, labels, 1)
0.33333333333333331
>>> precision_at(preds, labels, 5)
0.26666666666666666
>>> precision_at(preds, labels, 15)
0.17777777777777778
"""
# validate K
_require_positive_k(k)
def _inner_pk(pred, lab):
# need to compute the count of the number of values in the predictions
# that are present in the labels. We'll use numpy in1d for this (set
# intersection in O(1))
if lab.shape[0] > 0:
n = min(pred.shape[0], k)
cnt = np.in1d(pred[:n], lab, assume_unique=assume_unique).sum()
return float(cnt) / k
else:
return _warn_for_empty_labels()
return _mean_ranking_metric(predictions, labels, _inner_pk)
def mean_average_precision(predictions, labels, assume_unique=True):
"""Compute the mean average precision on predictions and labels.
Returns the mean average precision (MAP) of all the queries. If a query
has an empty ground truth set, the average precision will be zero and a
warning is generated.
Parameters
----------
predictions : array-like, shape=(n_predictions,)
The prediction array. The items that were predicted, in descending
order of relevance.
labels : array-like, shape=(n_ratings,)
The labels (positively-rated items).
assume_unique : bool, optional (default=True)
Whether to assume the items in the labels and predictions are each
unique. That is, the same item is not predicted multiple times or
rated multiple times.
Examples
--------
>>> # predictions for 3 users
>>> preds = [[1, 6, 2, 7, 8, 3, 9, 10, 4, 5],
... [4, 1, 5, 6, 2, 7, 3, 8, 9, 10],
... [1, 2, 3, 4, 5]]
>>> # labels for the 3 users
>>> labels = [[1, 2, 3, 4, 5], [1, 2, 3], []]
>>> mean_average_precision(preds, labels)
0.35502645502645497
"""
def _inner_map(pred, lab):
if lab.shape[0]:
# compute the number of elements within the predictions that are
# present in the actual labels, and get the cumulative sum weighted
# by the index of the ranking
n = pred.shape[0]
# Scala code from Spark source:
# var i = 0
# var cnt = 0
# var precSum = 0.0
# val n = pred.length
# while (i < n) {
# if (labSet.contains(pred(i))) {
# cnt += 1
# precSum += cnt.toDouble / (i + 1)
# }
# i += 1
# }
# precSum / labSet.size
arange = np.arange(n, dtype=np.float32) + 1. # this is the denom
present = np.in1d(pred[:n], lab, assume_unique=assume_unique)
prec_sum = np.ones(present.sum()).cumsum()
denom = arange[present]
return (prec_sum / denom).sum() / lab.shape[0]
else:
return _warn_for_empty_labels()
return _mean_ranking_metric(predictions, labels, _inner_map)
def ndcg_at(predictions, labels, k=10, assume_unique=True):
"""Compute the normalized discounted cumulative gain at K.
Compute the average NDCG value of all the queries, truncated at ranking
position k. The discounted cumulative gain at position k is computed as:
DCG@k = sum_{i=1}^{k} (2^{rel_i} - 1) / log2(i + 1)
and the NDCG is obtained by dividing the DCG value by the ideal (maximum
attainable) DCG for the ground truth set.
In the current implementation, the relevance value is binary.
If a query has an empty ground truth set, zero will be used as
NDCG together with a warning.
Parameters
----------
predictions : array-like, shape=(n_predictions,)
The prediction array. The items that were predicted, in descending
order of relevance.
labels : array-like, shape=(n_ratings,)
The labels (positively-rated items).
k : int, optional (default=10)
The rank at which to measure the NDCG.
assume_unique : bool, optional (default=True)
Whether to assume the items in the labels and predictions are each
unique. That is, the same item is not predicted multiple times or
rated multiple times.
Examples
--------
>>> # predictions for 3 users
>>> preds = [[1, 6, 2, 7, 8, 3, 9, 10, 4, 5],
... [4, 1, 5, 6, 2, 7, 3, 8, 9, 10],
... [1, 2, 3, 4, 5]]
>>> # labels for the 3 users
>>> labels = [[1, 2, 3, 4, 5], [1, 2, 3], []]
>>> ndcg_at(preds, labels, 3)
0.3333333432674408
>>> ndcg_at(preds, labels, 10)
0.48791273434956867
References
----------
.. [1] K. Jarvelin and J. Kekalainen, "IR evaluation methods for
retrieving highly relevant documents."
"""
# validate K
_require_positive_k(k)
def _inner_ndcg(pred, lab):
if lab.shape[0]:
# if we do NOT assume uniqueness, the set is a bit different here
if not assume_unique:
lab = np.unique(lab)
n_lab = lab.shape[0]
n_pred = pred.shape[0]
n = min(max(n_pred, n_lab), k) # min(min(p, l), k)?
# similar to mean_avg_prcsn, we need an arange, but this time +2
# since python is zero-indexed, and the denom typically needs +1.
# Also need the log base2...
arange = np.arange(n, dtype=np.float32) # length n
# since we are only interested in the arange up to n_pred, truncate
# if necessary
arange = arange[:n_pred]
denom = np.log2(arange + 2.) # length n
gains = 1. / denom # length n
# compute the gains where the prediction is present in the labels
dcg_mask = np.in1d(pred[:n], lab, assume_unique=assume_unique)
dcg = gains[dcg_mask].sum()
# the max DCG is sum of gains where the index < the label set size
max_dcg = gains[arange < n_lab].sum()
return dcg / max_dcg
else:
return _warn_for_empty_labels()
return _mean_ranking_metric(predictions, labels, _inner_ndcg)
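# A worked check of the k=3 doctest above, assuming binary relevance:
# user 0 hits ranks 1 and 3 of [1, 6, 2], so DCG = 1/log2(2) + 1/log2(4)
# = 1.5 against an ideal DCG of 1 + 0.6309 + 0.5 ~ 2.1309, giving
# NDCG ~ 0.7039. User 1 hits only rank 2 (NDCG ~ 0.6309 / 2.1309 ~ 0.2961),
# and user 2 has an empty label set (0.0, with a warning). The mean is
# (0.7039 + 0.2961 + 0.0) / 3 = 1/3, which matches the doctest output.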


@ -0,0 +1,3 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import


@ -0,0 +1,45 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.metrics.ranking import (mean_average_precision, ndcg_at,
precision_at)
from numpy.testing import assert_almost_equal
import warnings
preds = [[1, 6, 2, 7, 8, 3, 9, 10, 4, 5],
[4, 1, 5, 6, 2, 7, 3, 8, 9, 10],
[1, 2, 3, 4, 5]]
labels = [[1, 2, 3, 4, 5], [1, 2, 3], []]
def assert_warning_caught(func):
def test_wrapper(*args, **kwargs):
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter("always")
# execute the fxn
func(*args, **kwargs)
assert len(w) # assert there's something there...
return test_wrapper
@assert_warning_caught
def test_map():
assert_almost_equal(
mean_average_precision(preds, labels), 0.35502645502645497)
@assert_warning_caught
def test_pak():
assert_almost_equal(precision_at(preds, labels, 1), 0.33333333333333331)
assert_almost_equal(precision_at(preds, labels, 5), 0.26666666666666666)
assert_almost_equal(precision_at(preds, labels, 15), 0.17777777777777778)
@assert_warning_caught
def test_ndcg():
assert_almost_equal(ndcg_at(preds, labels, 3), 0.3333333432674408)
assert_almost_equal(ndcg_at(preds, labels, 10), 0.48791273434956867)


@ -0,0 +1,6 @@
# -*- coding: utf-8 -*-
from .mlp import *
from .transfer import *
__all__ = [s for s in dir() if not s.startswith("_")]


@ -0,0 +1,33 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from sklearn.externals import six
from abc import ABCMeta, abstractmethod
import numpy as np
__all__ = [
'tanh',
'NeuralMixin'
]
def tanh(X):
"""Hyperbolic tangent.
Compute the tan-h (Hyperbolic tangent) activation function.
This is a very easily-differentiable activation function.
Parameters
----------
X : np.ndarray, shape=(n_samples, n_features)
The transformed X array (X * W + b).
"""
return np.tanh(X)
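# What makes tanh "very easily-differentiable" is that its derivative can
# be computed from its own output: d/dx tanh(x) = 1 - tanh(x)^2. This is
# exactly the ``1. - np.power(layer_res, 2.)`` term that shows up in the
# MLP's back-propagation step. A quick numerical sketch, assuming only
# numpy:
#
#   >>> import numpy as np
#   >>> x, h = 0.5, 1e-6
#   >>> numeric = (np.tanh(x + h) - np.tanh(x - h)) / (2. * h)
#   >>> analytic = 1. - np.tanh(x) ** 2
#   >>> bool(np.isclose(numeric, analytic))
#   True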
class NeuralMixin(six.with_metaclass(ABCMeta)):
"""Abstract interface for neural network classes."""
@abstractmethod
def export_weights_and_biases(self, output_layer=True):
"""Return the weights and biases of the network"""


@ -0,0 +1,273 @@
# -*- coding: utf-8 -*-
#
# Author: Taylor G Smith <taylor.smith@alkaline-ml.com>
#
# A simple multilayer perceptron classifier. If you find yourself struggling
# to follow the derivation of the back-propagation, check out this great
# refresher on scalar & matrix calculus + differential equations.
# http://parrt.cs.usfca.edu/doc/matrix-calculus/index.html
from __future__ import absolute_import, division
from sklearn.utils.validation import check_X_y, check_random_state
from sklearn.utils.multiclass import check_classification_targets
import numpy as np
from ..base import BaseSimpleEstimator
from .base import NeuralMixin, tanh
__all__ = [
'NeuralNetClassifier'
]
try:
xrange
except NameError: # py3
xrange = range
def _calculate_loss(truth, preds, weights, l2):
"""Compute the log loss.
Calculate the log loss between the true class labels and the predictions
generated by the softmax layer in our neural network.
Parameters
----------
truth : np.ndarray, shape=(n_samples,)
The true labels
preds : np.ndarray, shape=(n_samples, n_classes)
The predicted class probabilities
weights : list
The list of weights matrices. Used for computing the loss
with the L2 regularization.
l2 : float
The regularization parameter
"""
# get the log probs of the prediction for the true class labels
n_samples = truth.shape[0]
logprobs = -np.log(preds[range(n_samples), truth])
# compute the sum of log probs
sum_logprobs = logprobs.sum()
# add the L2 regularization term
sum_logprobs += l2 / 2. * sum(np.square(W).sum() for W in weights)
return 1. / n_samples * sum_logprobs
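# In math terms, with N samples, predicted class probabilities p and true
# labels y, this computes the L2-regularized categorical cross-entropy:
#
#   L = (1 / N) * [ sum_i -log(p_{i, y_i}) + (l2 / 2) * sum_W ||W||_F^2 ]
#
# Note that the regularization term is scaled by 1 / N along with the sum
# of the log probabilities, matching the code above.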
def softmax(X):
"""Apply the softmax function.
The softmax function squashes an N-dimensional vector into a K-dimensional
vector whose elements add up to 1, and whose elements are bound in (0, 1).
Parameters
----------
X : np.ndarray, shape=(n_samples, n_features)
The matrix over which to apply softmax along the rows.
"""
# first compute the exponential. This is a step that would take place
# in the sigmoid (logistic) function as well. We can already begin to see
# where this is going to resemble logistic regression...
X_exp = np.exp(X)
return X_exp / np.sum(X_exp, axis=1, keepdims=True)
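# One caveat: np.exp can overflow for very large logits. A common
# numerically stable variant (a sketch, not used by the class below, which
# favors minimal code) subtracts the row max first; softmax is invariant
# to adding a constant to every element of a row:
#
#   def softmax_stable(X):
#       # shift each row so its max is 0 before exponentiating
#       X_shift = X - X.max(axis=1, keepdims=True)
#       X_exp = np.exp(X_shift)
#       return X_exp / X_exp.sum(axis=1, keepdims=True)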
class NeuralNetClassifier(BaseSimpleEstimator, NeuralMixin):
"""A neural network classifier.
Create a multi-layer perceptron classifier. Note that this is a very
simple implementation of an MLP with only fully-connected layers and
very few tunable parameters. It is designed for readability. For more
optimized neural network code, look into TensorFlow, Keras or other
libraries.
This implementation of a neural net uses the tanh activation function
*only*, and does not allow early stopping; it will always run for the
full ``n_iter`` iterations. Many other parameters that would typically
be tunable in a network (dropout, momentum, alternative activations,
etc.) are left out of this implementation to keep it simple.
Parameters
----------
X : array-like, shape=(n_samples, n_features)
The training array. Should be a numpy array or array-like structure
with only finite values.
y : array-like, shape=(n_samples,)
The target vector.
hidden : iterable, optional (default=(25,))
An iterable indicating the number of units per hidden layer.
n_iter : int, optional (default=10)
The default number of iterations to perform.
learning_rate : float, optional (default=0.001)
The rate at which we descend the gradient.
random_state : int, None or RandomState, optional (default=42)
The random state for initializing the weights matrices.
"""
def __init__(self, X, y, hidden=(25,), n_iter=10, learning_rate=0.001,
regularization=0.01, random_state=42):
self.hidden = hidden
self.random_state = random_state
self.n_iter = n_iter
self.learning_rate = learning_rate
self.regularization = regularization
# initialize weights, biases, etc.
X, y, weights, biases = self._init_weights_biases(
X, y, hidden, random_state, last_dim=None)
# we can keep track of the loss for each iter
train_loss = []
# for each iteration, feed X through the network, compute the loss,
# and back-propagate the error to correct the weights.
for _ in xrange(n_iter):
# compute the product of X on the hidden layers (the output of
# the network)
out, layer_results = self._forward_step(X, weights, biases)
# compute the loss on the output
loss = _calculate_loss(truth=y, preds=out, weights=weights,
l2=self.regularization)
train_loss.append(loss)
# now back-propagate to correct the weights and biases via
# gradient descent
self._back_propagate(y, out, layer_results, weights,
biases, learning_rate,
self.regularization)
# save the weights, biases and loss as instance attributes
self.weights = weights
self.biases = biases
self.train_loss = train_loss
@staticmethod
def _init_weights_biases(X, y, hidden, random_state, last_dim=None):
# make sure dims all match in X, y and that we have appropriate
# classification targets
X, y = check_X_y(X, y, copy=False)
check_classification_targets(y)
random_state = check_random_state(random_state)
# initialize the weights and biases. For each layer, we create a new
# matrix of dimensions [last_layer_col_dim, new_col_dim]. This ensures
# we can compute matrix products across the layers and that the
# dimensions all match up. The biases will each be a vector of ones
# in this example, though in other networks that can be initialized
# differently
weights = []
biases = []
# if last dim is undefined, use the column shape of the input data.
# this argument is used to simplify the initialization of weights/
# biases in the transfer learning class...
if last_dim is None:
last_dim = X.shape[1]
for layer_size in hidden:
# initialize to extremely small values
w = random_state.rand(last_dim, layer_size) * 0.01
b = np.ones(layer_size)
last_dim = layer_size
weights.append(w)
biases.append(b)
# we need to add one more layer (the output layer) that is the size of
# the expected output probabilities. We'll apply the softmax function
# to the output of this layer.
n_outputs = np.unique(y).shape[0]
weights.append(random_state.rand(last_dim, n_outputs))
biases.append(np.ones(n_outputs))
return X, y, weights, biases
@staticmethod
def _forward_step(X, weights, biases):
# track the intermediate products
intermediate_results = [X]
# progress through all the layers EXCEPT the very last one.
for w, b in zip(weights[:-1], biases[:-1]):
# apply the activation function to the product of X and the weights
# (after adding the bias vector)
X = tanh(X.dot(w) + b)
# append this layer result
intermediate_results.append(X)
# we handle the very last layer a bit differently, since it's our
# output layer. First compute the product...
X = X.dot(weights[-1]) + biases[-1]
# then rather than apply the activation function (tanh), we apply
# the softmax, which is essentially generalized logistic regression.
return softmax(X), intermediate_results
@staticmethod
def _back_propagate(truth, probas, layer_results, weights,
biases, learning_rate, l2):
# the probabilities are our first delta. Subtract 1 from the
# TRUE labels' probabilities in the predictions
n_samples = truth.shape[0]
# subtract 1 from true idcs. initial deltas are: (y_hat - y)
probas[range(n_samples), truth] -= 1.
# iterate back through the layers computing the deltas (derivatives)
last_delta = probas
for next_weights, next_biases, layer_res in \
zip(weights[::-1], biases[::-1], layer_results[::-1]):
# the gradient for this layer is equivalent to the previous delta
# multiplied by the intermediate layer result
d_W = layer_res.T.dot(last_delta)
# column sums of the (just-computed) delta is the derivative
# of the biases
d_b = np.sum(last_delta, axis=0)
# set the next delta for the next iter
last_delta = last_delta.dot(next_weights.T) * \
(1. - np.power(layer_res, 2.))
# update the weights gradient with the L2 regularization term
d_W += l2 * next_weights
# update the weights in this layer. The learning rate governs how
# quickly we descend the gradient
next_weights += -learning_rate * d_W
next_biases += -learning_rate * d_b
def predict(self, X):
# compute the probabilities and then get the argmax for each class
probas = self.predict_proba(X)
# we want the argmaxes of each row
return np.argmax(probas, axis=1)
def predict_proba(self, X):
# simply compute a forward step (we don't care about idx 1 of the
# tuple, which is just the intermediate products)
return self._forward_step(X, self.weights, self.biases)[0]
def export_weights_and_biases(self, output_layer=True):
w, b = self.weights, self.biases
if output_layer:
return w, b
return w[:-1], b[:-1]
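# A minimal usage sketch, assuming scikit-learn's iris data (the same data
# the unit tests further down use). One loss value is tracked per
# iteration, so ``train_loss`` has exactly ``n_iter`` entries:
#
#   >>> from sklearn.datasets import load_iris
#   >>> iris = load_iris()
#   >>> clf = NeuralNetClassifier(iris.data, iris.target, n_iter=25,
#   ...                           random_state=42)
#   >>> len(clf.train_loss)
#   25
#   >>> preds = clf.predict(iris.data)   # array of predicted class labels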


@ -0,0 +1,3 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import


@ -0,0 +1,15 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.neural_net import NeuralNetClassifier
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target
def test_mlp():
# show we can fit and predict
clf = NeuralNetClassifier(X, y, random_state=42)
clf.predict(X)


@ -0,0 +1,52 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.neural_net import NeuralNetClassifier, TransferLearningClassifier
import numpy as np
def test_transfer_learner():
rs = np.random.RandomState(42)
covariance = [[1, .75], [.75, 1]]
# these are the majority classes
n_obs = 500
x1 = rs.multivariate_normal(mean=[0, 0], cov=covariance, size=n_obs)
x2 = rs.multivariate_normal(mean=[1, 5], cov=covariance, size=n_obs)
# this is the minority class
x3 = rs.multivariate_normal(mean=[0.85, 3.25],
cov=[[1., .5], [1.25, 0.85]],
size=150)
# this is what the FIRST network will be trained on
n_first = 400
X = np.vstack((x1[:n_first], x2[:n_first])).astype(np.float32)
y = np.hstack((np.zeros(n_first), np.ones(n_first))).astype(int)
# this is what the SECOND network will be trained on
X2 = np.vstack((x1[n_first:], x2[n_first:], x3)).astype(np.float32)
y2 = np.hstack((np.zeros(n_obs - n_first),
np.ones(n_obs - n_first),
np.ones(x3.shape[0]) * 2)).astype(int)
# Fit the first neural network
clf = NeuralNetClassifier(X, y, hidden=(25, 25), n_iter=50,
learning_rate=0.001, random_state=42)
# Fit the transfer network - train one more layer with a new class
transfer = TransferLearningClassifier(X2, y2, pretrained=clf, hidden=(15,),
n_iter=10, random_state=42)
# show we can predict
transfer.predict(X2)
# show we can use a transfer learner on an existing transfer learner
transfer2 = TransferLearningClassifier(X2, y2, pretrained=transfer,
hidden=(25,),
random_state=15)
# and show we can still predict
transfer2.predict(X2)


@ -0,0 +1,154 @@
# -*- coding: utf-8 -*-
#
# Author: Taylor G Smith <taylor.smith@alkaline-ml.com>
#
# A simple transfer learning classifier. If you find yourself struggling
# to follow the derivation of the back-propagation, check out this great
# refresher on scalar & matrix calculus + differential equations.
# http://parrt.cs.usfca.edu/doc/matrix-calculus/index.html
from __future__ import absolute_import
import numpy as np
from .base import NeuralMixin, tanh
from ..base import BaseSimpleEstimator
from .mlp import NeuralNetClassifier, _calculate_loss
__all__ = [
'TransferLearningClassifier'
]
try:
xrange
except NameError:
xrange = range
def _pretrained_forward_step(X, pt_weights, pt_biases):
"""Complete a forward step from the pre-trained model"""
# progress through all the layers (the output was already trimmed off)
for w, b in zip(pt_weights, pt_biases):
X = tanh(X.dot(w) + b)
return X
class TransferLearningClassifier(BaseSimpleEstimator, NeuralMixin):
"""A transfer learning classifier.
Create a multi-layer perceptron classifier that learned from a
previously-trained network. No fine-tuning is performed, and no
prior-trained layers can be retrained (i.e., they remain frozen).
Parameters
----------
X : array-like, shape=(n_samples, n_features)
The training array. Should be a numpy array or array-like structure
with only finite values.
y : array-like, shape=(n_samples,)
The target vector.
pretrained : NeuralNetClassifier, TransferLearningClassifier
The pre-trained MLP. The transfer learner leverages the features
extracted from the pre-trained network (the trained weights without
the output layer) and uses them to transform the input data before
training the new layers.
hidden : iterable, optional (default=(25,))
An iterable indicating the number of units per hidden layer.
n_iter : int, optional (default=10)
The default number of iterations to perform.
learning_rate : float, optional (default=0.001)
The rate at which we descend the gradient.
random_state : int, None or RandomState, optional (default=42)
The random state for initializing the weights matrices.
"""
def __init__(self, X, y, pretrained, hidden=(25,), n_iter=10,
regularization=0.01, learning_rate=0.001, random_state=42):
# initialize via the NN static method
self.hidden = hidden
self.random_state = random_state
self.n_iter = n_iter
self.learning_rate = learning_rate
self.regularization = regularization
# this is the previous model
self.model = pretrained
# assert that it's a neural net or we'll break down later
assert isinstance(pretrained, NeuralMixin), \
"Pre-trained model must be a neural network!"
# initialize weights, biases, etc. for THE TRAINABLE LAYERS ONLY!
pt_w, pt_b = pretrained.export_weights_and_biases(output_layer=False)
X, y, weights, biases = NeuralNetClassifier._init_weights_biases(
X, y, hidden, random_state,
# use as the last dim the column dimension of the last weights
# (the ones BEFORE the output layer, that is)
last_dim=pt_w[-1].shape[1])
# we can train this in a similar fashion to the plain MLP we designed:
# for each iteration, feed X through the network, compute the loss,
# and back-propagate the error to correct the weights.
train_loss = []
for _ in xrange(n_iter):
# first, pass the input data through the pre-trained model's
# hidden layers. Do not pass it through the last layer, however,
# since we don't want its output from the softmax layer.
X_transform = _pretrained_forward_step(X, pt_w, pt_b)
# NOW we complete a forward step on THIS model's
# untrained weights/biases
out, layer_results = NeuralNetClassifier._forward_step(
X_transform, weights, biases)
# compute the loss on the output
loss = _calculate_loss(truth=y, preds=out, weights=pt_w + weights,
l2=self.regularization)
train_loss.append(loss)
# now back-propagate to correct THIS MODEL's weights and biases via
# gradient descent. NOTE we do NOT adjust the pre-trained model's
# weights!!!
NeuralNetClassifier._back_propagate(
truth=y, probas=out, layer_results=layer_results,
weights=weights, biases=biases,
learning_rate=learning_rate,
l2=self.regularization)
# save the weights, biases
self.weights = weights
self.biases = biases
self.train_loss = train_loss
def predict(self, X):
# compute the probabilities and then get the argmax for each class
probas = self.predict_proba(X)
# we want the argmaxes of each row
return np.argmax(probas, axis=1)
def predict_proba(self, X):
# Compute a forward step with the pre-trained model first:
pt_w, pt_b = self.model.export_weights_and_biases(output_layer=False)
X_transform = _pretrained_forward_step(X, pt_w, pt_b)
# and then complete a forward step with the trained weights and biases
return NeuralNetClassifier._forward_step(
X_transform, self.weights, self.biases)[0]
def export_weights_and_biases(self, output_layer=True):
pt_weights, pt_biases = \
self.model.export_weights_and_biases(output_layer=False)
w = pt_weights + self.weights
b = pt_biases + self.biases
if output_layer:
return w, b
return w[:-1], b[:-1]


@ -0,0 +1,7 @@
# -*- coding: utf-8 -*-
from .als import *
from .data import *
from .itemitem import *
__all__ = [s for s in dir() if not s.startswith("_")]


@ -0,0 +1,202 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from sklearn.utils.validation import check_random_state, check_array
from numpy.linalg import solve
import numpy as np
from .base import RecommenderMixin
from ..base import BaseSimpleEstimator
__all__ = [
'ALS'
]
try:
xrange
except NameError: # py3 does not have xrange
xrange = range
def mse(R, X, Y, W):
"""Compute the reconstruction MSE. This is our loss function"""
return ((W * (R - X.dot(Y))) ** 2).sum()
class ALS(BaseSimpleEstimator, RecommenderMixin):
r"""Alternating Least Squares for explicit ratings matrices.
Computes the ALS user factors and item factors for explicit ratings
systems. This solves:
R' = XY
where ``X`` is an (m x f) matrix of user factors, and ``Y`` is an
(f x n) matrix of item factors. Note that for very large ratings matrices,
this can quickly grow outside the scope of what will fit into memory!
Parameters
----------
R : array-like, shape=(n_users, n_items)
The ratings matrix. This must be an explicit ratings matrix where
0 indicates an item that a user has not yet rated.
factors : int or float, optional (default=0.25)
The number of factors to learn. Default is ``0.25 * n_items``.
n_iter : int, optional (default=10)
The number of iterations to perform. The larger the number, the
smaller the train error, but the more likely to overfit.
lam : float, optional (default=0.001)
The L2 regularization parameter. The higher ``lam``, the more
regularization is performed, and the more robust the solution. However,
extra iterations are typically required.
random_state : int, None or RandomState, optional (default=None)
The random state for seeding the initial item factors matrix, ``Y``.
Attributes
----------
X : np.ndarray, shape=(n_users, factors)
The user factors
Y : np.ndarray, shape=(factors, n_items)
The item factors
train_err : list
The list of training MSE for each iteration performed
lam : float
The lambda (regularization) value.
Notes
-----
If you plan to use a very large matrix, consider using a sparse CSR matrix
to preserve memory, but you'll have to amend the ``recommend_for_user``
function, which expects dense output.
"""
def __init__(self, R, factors=0.25, n_iter=10, lam=0.001,
random_state=None):
# check the array
R = check_array(R, dtype=np.float32) # type: np.ndarray
n_users, n_items = R.shape
# get the random state
random_state = check_random_state(random_state)
# get the number of factors. If it's a float, compute it
if isinstance(factors, float):
factors = min(np.ceil(factors * n_items).astype(int), n_items)
# the weight matrix is used as a masking matrix when computing the MSE.
# it allows us to only compute the reconstruction MSE on the rated
# items, and not the unrated ones.
W = (R > 0.).astype(np.float32)
# initialize the first array, Y, and X to None
Y = random_state.rand(factors, n_items)
X = None
# the identity matrix (times lambda) is added to the X^T X or Y Y^T
# product at each iteration.
I = np.eye(factors) * lam
# this list will store all of the training errors
train_err = []
# for each iteration, iteratively solve for X, Y, and compute the
# updated MSE
for i in xrange(n_iter):
X = solve(Y.dot(Y.T) + I, Y.dot(R.T)).T
Y = solve(X.T.dot(X) + I, X.T.dot(R))
# update the training error
train_err.append(mse(R, X, Y, W))
# now we have X, Y, which are our user factors and item factors
self.X = X
self.Y = Y
self.train_err = train_err
self.n_factors = factors
self.lam = lam
def predict(self, R, recompute_users=False):
"""Generate predictions for the test set.
Computes the predicted product of ``XY`` given the fit factors.
If recomputing users, will learn the new user factors given the
existing item factors.
"""
R = check_array(R, dtype=np.float32, copy=False) # type: np.ndarray
Y = self.Y # item factors
n_factors, _ = Y.shape
# we can re-compute user factors on their updated ratings, if we want.
# (not always advisable, but can be useful for offline recommenders)
if recompute_users:
I = np.eye(n_factors) * self.lam
X = solve(Y.dot(Y.T) + I, Y.dot(R.T)).T
else:
X = self.X
return X.dot(Y)
def recommend_for_user(self, R, user, n=10, recompute_user=False,
filter_previously_seen=False,
return_scores=True):
"""Generate predictions for a single user.
Parameters
----------
R : array-like, shape=(n_users, n_items)
The test ratings matrix. This must be an explicit ratings matrix
where 0 indicates an item that a user has not yet rated.
user : int
The user index for whom to generate predictions.
n : int or None, optional (default=10)
The number of recommendations to return. Default is 10. For all,
set to None.
recompute_user : bool, optional (default=False)
Whether to recompute the user factors given the test set.
Not always advisable, as it can be considered leakage, but can
be useful in an offline recommender system where refits are
infrequent.
filter_previously_seen : bool, optional (default=False)
Whether to filter out previously-rated items.
return_scores : bool, optional (default=True)
Whether to return the computed scores for the recommended items.
Returns
-------
items : np.ndarray
The top ``n`` items recommended for the user.
scores (optional) : np.ndarray
The corresponding scores for the top ``n`` items for the user.
Only returned if ``return_scores`` is True.
"""
R = check_array(R, dtype=np.float32, copy=False)
# compute the new user vector. Squeeze to make sure it's a vector
user_vec = self.predict(R, recompute_users=recompute_user)[user, :]
item_indices = np.arange(user_vec.shape[0])
# if we are filtering previously seen, remove the prior-rated items
if filter_previously_seen:
rated_mask = R[user, :] != 0.
user_vec = user_vec[~rated_mask]
item_indices = item_indices[~rated_mask]
order = np.argsort(-user_vec)[:n] # descending order of computed scores
items = item_indices[order]
if return_scores:
return items, user_vec[order]
return items
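# The two ``solve`` calls in ``__init__`` are the alternating normal
# equations. Holding Y fixed, minimizing ||R - XY||^2 + lam * ||X||^2
# over X gives
#
#   X^T = (Y Y^T + lam * I)^{-1} Y R^T
#
# and, symmetrically, holding X fixed,
#
#   Y = (X^T X + lam * I)^{-1} X^T R
#
# Note that the loss tracked by ``mse`` is masked by W (rated cells only),
# while these closed-form updates solve the unmasked problem -- a
# simplification that keeps the implementation readable.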


@ -0,0 +1,42 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from sklearn.externals import six
from abc import ABCMeta, abstractmethod
__all__ = [
'RecommenderMixin'
]
try:
xrange
except NameError: # py3
xrange = range
class RecommenderMixin(six.with_metaclass(ABCMeta)):
"""Mixin interface for recommenders.
This class should be inherited by recommender algorithms. It provides an
abstract interface for generating recommendations for a user, and a
function for creating recommendations for all users.
"""
@abstractmethod
def recommend_for_user(self, R, user, n=10, filter_previously_seen=False,
return_scores=True, **kwargs):
"""Generate recommendations for a user.
A method that should be overridden by subclasses to create
recommendations via their own prediction strategy.
"""
def recommend_for_all_users(self, R, n=10,
filter_previously_seen=False,
return_scores=True, **kwargs):
"""Create recommendations for all users."""
return (
self.recommend_for_user(
R, user, n=n, filter_previously_seen=filter_previously_seen,
return_scores=return_scores, **kwargs)
for user in xrange(R.shape[0]))


@ -0,0 +1,77 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
import numpy as np
__all__ = [
'get_completely_fabricated_ratings_data'
]
def get_completely_fabricated_ratings_data():
"""Disclaimer: this is a made-up data set.
Get a ratings data set for use with one of the packtml recommenders.
This data set is a completely made-up ratings matrix consisting of
cult classics, all of which are awesome (seriously, if there are any
you haven't seen, you should).
(Please
don't
sue
me......)
The data contains 5 users and 15 items (movies). Movies:
0) Ghost Busters
1) Ghost Busters 2
2) The Goonies
3) Big Trouble in Little China
4) The Rocky Horror Picture Show
5) A Clockwork Orange
6) Pulp Fiction
7) Bill & Ted's Excellent Adventure
8) Weekend at Bernie's
9) Dumb and Dumber
10) Clerks
11) Jay & Silent Bob Strike Back
12) Tron
13) Total Recall
14) The Princess Bride
Notes
-----
Seriously, I fabricated all of these ratings semi-haphazardly. Don't
take this as me bashing any movies.
"""
return (np.array([
# user 0 is a classic 30-yo millennial who is nostalgic for the 90s
[5.0, 3.5, 5.0, 0.0, 0.0, 0.0, 4.5, 3.0,
0.0, 2.5, 4.0, 4.0, 0.0, 1.5, 3.0],
# user 1 is a 40-yo who only likes action
[1.5, 0.0, 0.0, 1.0, 0.0, 4.0, 5.0, 0.0,
2.0, 0.0, 3.0, 3.5, 0.0, 4.0, 0.0],
# user 2 is a 12-yo whose parents are strict about what she watches.
[4.5, 4.0, 5.0, 0.0, 0.0, 0.0, 0.0, 4.0,
3.5, 5.0, 0.0, 0.0, 0.0, 0.0, 5.0],
# user 3 has just about seen it all, and doesn't really care for
# the goofy stuff. (but seriously, who rates the Goonies 2/5???)
[2.0, 1.0, 2.0, 1.0, 2.5, 4.5, 4.5, 0.5,
1.5, 1.0, 2.0, 2.5, 3.5, 3.5, 2.0],
# user 4 has just opened a netflix account and hasn't had a chance
# to watch too much
[0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0,
0.0, 0.0, 0.0, 1.5, 4.0, 0.0, 0.0],
]), np.array(["Ghost Busters", "Ghost Busters 2",
"The Goonies", "Big Trouble in Little China",
"The Rocky Horror Picture Show", "A Clockwork Orange",
"Pulp Fiction", "Bill & Ted's Excellent Adventure",
"Weekend at Bernie's", "Dumb and Dumber", "Clerks",
"Jay & Silent Bob Strike Back", "Tron", "Total Recall",
"The Princess Bride" ]))


@ -0,0 +1,140 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from sklearn.utils.validation import check_array
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np
from .base import RecommenderMixin
from ..base import BaseSimpleEstimator
__all__ = [
'ItemItemRecommender'
]
try:
xrange
except NameError: # py3
xrange = range
class ItemItemRecommender(BaseSimpleEstimator, RecommenderMixin):
"""Item-to-item recommendation system using cosine similarity.
A collaborative filtering recommender algorithm that computes the cosine
similarity between each item and generates recommendations for users'
highly rated items by returning similar items.
Parameters
----------
R : array-like, shape=(n_users, n_items)
The ratings matrix. This must be an explicit ratings matrix where
0 indicates an item that a user has not yet rated.
k : int, optional (default=10)
The number of most-similar items to retain for each item in the
similarity matrix; all other similarities are zeroed out.
Attributes
----------
similarity : np.ndarray, shape=(n_items, n_items)
The similarity matrix.
Notes
-----
This implementation is very rudimentary and does not allow tuning of
hyper-parameters apart from ``k``. No similarity metrics apart from cosine
similarity may be used. It is largely written to optimize readability. For
a very highly optimized version, try the "implicit" library.
"""
def __init__(self, R, k=10):
# check the array, but don't copy if not needed
R = check_array(R, dtype=np.float32, copy=False) # type: np.ndarray
# save the hyperparameter for later use
self.k = k
# compute the cosine similarity between all the ITEMS (note the
# transpose below: the rows of R.T are items)
sim = cosine_similarity(R.T)
# Only keep the similarities of the top K, setting all others to zero
# (negative since we want descending)
not_top_k = np.argsort(-sim, axis=1)[:, k:] # shape=(n_items, k)
if not_top_k.shape[1]: # only if there are cols (k < n_items)
# now we have to set these to zero in the similarity matrix
row_indices = np.repeat(range(not_top_k.shape[0]),
not_top_k.shape[1])
sim[row_indices, not_top_k.ravel()] = 0.
self.similarity = sim
def recommend_for_user(self, R, user, n=10,
filter_previously_seen=False,
return_scores=True, **kwargs):
"""Generate predictions for a single user.
Parameters
----------
R : array-like, shape=(n_users, n_items)
The test ratings matrix. This must be an explicit ratings matrix
where 0 indicates an item that a user has not yet rated.
user : int
The user index for whom to generate predictions.
n : int or None, optional (default=10)
The number of recommendations to return. Default is 10. For all,
set to None.
filter_previously_seen : bool, optional (default=False)
Whether to filter out previously-rated items.
return_scores : bool, optional (default=True)
Whether to return the computed scores for the recommended items.
**kwargs : keyword args
Ignored. Present to match super signature.
Returns
-------
items : np.ndarray
The top ``n`` items recommended for the user.
recommendations (optional) : np.ndarray
The corresponding scores for the top ``n`` items for the
user. Only returned if ``return_scores`` is True.
"""
# check the array and get the user vector
R = check_array(R, dtype=np.float32, copy=False)
user_vector = R[user, :]
# compute the dot product between the user vector and the similarity
# matrix
recommendations = user_vector.dot(self.similarity) # shape=(n_items,)
# if we're filtering previously-seen items, now is the time to do that
item_indices = np.arange(recommendations.shape[0])
if filter_previously_seen:
rated_mask = user_vector != 0.
recommendations = recommendations[~rated_mask]
item_indices = item_indices[~rated_mask]
# now arg sort descending (most similar items first)
order = np.argsort(-recommendations)[:n]
items = item_indices[order]
if return_scores:
return items, recommendations[order]
return items
def predict(self, R):
"""Generate predictions for the test set.
Computes the predicted product of users' rated vectors on the
pre-computed similarity matrix.
"""
R = check_array(R, dtype=np.float32, copy=False) # type: np.ndarray
# compute the product R*sim
return R.dot(self.similarity)
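# A minimal usage sketch, assuming the fabricated ratings helper defined in
# ``packtml.recommendation.data``: recommend a few unseen movies for user 0.
#
#   >>> from packtml.recommendation import (
#   ...     ItemItemRecommender, get_completely_fabricated_ratings_data)
#   >>> R, titles = get_completely_fabricated_ratings_data()
#   >>> rec = ItemItemRecommender(R, k=5)
#   >>> items, scores = rec.recommend_for_user(
#   ...     R, user=0, n=3, filter_previously_seen=True)
#   >>> unseen_picks = titles[items]   # three titles user 0 hasn't rated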


@ -0,0 +1,3 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import


@ -0,0 +1,44 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.recommendation import ALS
# make up a ratings matrix...
R = [[1., 0., 3.5, 2., 0., 0., 0., 1.5],
[0., 2., 3., 0., 0., 2.5, 0., 0. ],
[3.5, 4., 2., 0., 4.5, 3.5, 0., 2. ],
[3., 3.5, 0., 2.5, 3., 0., 0., 0. ]]
def test_als_simple_fit():
als = ALS(R, factors=3, n_iter=5, random_state=42)
assert len(als.train_err) == 5, als.train_err
assert als.n_factors == 3, als.n_factors
# assert all errors are decreasing over time
errs = list(zip(als.train_err[:-1], als.train_err[1:]))
assert all(new_err < last_err for last_err, new_err in errs), errs
def test_als_predict():
als = ALS(R, factors=4, n_iter=8, random_state=42)
user0, scr = als.recommend_for_user(R, 0, filter_previously_seen=True,
return_scores=True)
# assert previously-rated items not present
rated = (0, 2, 3, 7)
for r in rated: # previously-rated
assert r not in user0
# show the score lengths are the same
assert scr.shape[0] == user0.shape[0]
# now if we do NOT filter, assert those are present again (also, recompute)
user0, scr = als.recommend_for_user(R, 0, filter_previously_seen=False,
return_scores=True,
recompute_user=True)
for r in rated:
assert r in user0
assert user0.shape[0] == scr.shape[0]


@ -0,0 +1,67 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.recommendation import ItemItemRecommender
import numpy as np
from numpy.testing import assert_array_almost_equal
from types import GeneratorType
# make up a ratings matrix...
R = np.array([[1., 0., 3.5, 2., 0., 0., 0., 1.5],
[0., 2., 3., 0., 0., 2.5, 0., 0. ],
[3.5, 4., 2., 0., 4.5, 3.5, 0., 2. ],
[3., 3.5, 0., 2.5, 3., 0., 0., 0. ]])
def test_itemitem_simple():
rec = ItemItemRecommender(R, k=3)
# assert on the similarity
expected = np.array([
[ 1. , 0.91461057, 0. , 0. , 0.9701687 ,
0. , 0. , 0. ],
[ 0.91461057, 1. , 0. , 0. , 0.92793395,
0. , 0. , 0. ],
[ 0. , 0. , 1. , 0. , 0. ,
0.6708902 , 0. , 0.73632752],
[ 0.62906665, 0.48126166, 0. , 1. , 0. ,
0. , 0. , 0. ],
[ 0.9701687 , 0.92793395, 0. , 0. , 1. ,
0. , 0. , 0. ],
[ 0. , 0.77786258, 0. , 0. , 0.67706717,
1. , 0. , 0. ],
[ 0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. ],
[ 0.72079856, 0. , 0.73632752, 0. , 0. ,
0. , 0. , 1. ]])
assert_array_almost_equal(expected, rec.similarity)
# show we can generate recommendations
rec0, scores0 = rec.recommend_for_user(R, 0)
# we didn't filter, so the rated items should still be present
assert np.in1d([0, 2, 3, 7], rec0).all()
# re-compute and show the previously-rated are not present
rec0_filtered, scores0_filtered = rec.recommend_for_user(
R, 0, filter_previously_seen=True)
assert len(rec0_filtered) == 4, rec0_filtered
assert rec0_filtered.tolist() == [5, 1, 4, 6]
# test the prediction, which is just a big product...
pred = rec.predict(R)
assert pred.shape == R.shape
# get recommendations for ALL users
recommendations = rec.recommend_for_all_users(R, return_scores=False,
filter_previously_seen=False)
assert isinstance(recommendations, GeneratorType)
recs = list(recommendations)
assert len(recs) == 4
assert all(len(x) == 8 for x in recs)


@ -0,0 +1,7 @@
# -*- coding: utf-8 -*-
from .simple_regression import *
from .simple_logistic import *
__all__ = [s for s in dir() if not s.startswith("_")]


@ -0,0 +1,123 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from sklearn.utils.validation import check_X_y, check_array
import numpy as np
from ..utils.extmath import log_likelihood, logistic_sigmoid
from ..utils.validation import assert_is_binary
from ..base import BaseSimpleEstimator
__all__ = [
'SimpleLogisticRegression'
]
try:
xrange
except NameError: # py 3 doesn't have an xrange
xrange = range
class SimpleLogisticRegression(BaseSimpleEstimator):
"""Simple logistic regression.
This class provides a very simple example of straightforward logistic
regression with an intercept. There are few tunable parameters aside from
the number of iterations and the learning rate, and the model is fit upon
class initialization.
Parameters
----------
X : array-like, shape=(n_samples, n_features)
The array of predictor variables. This is the array we will use
to regress on ``y``.
y : array-like, shape=(n_samples,)
This is the target array on which we will regress to build
our model. It should be binary (0, 1).
n_steps : int, optional (default=100)
The number of iterations to perform.
learning_rate : float, optional (default=0.001)
The learning rate.
loglik_interval : int, optional (default=5)
How frequently to compute the log likelihood. This is an expensive
operation, so computing it too frequently will slow down training.
Attributes
----------
theta : array-like, shape=(n_features,)
The coefficients
intercept : float
The intercept term
log_likelihood : list
A list of the iterations' log-likelihoods
"""
def __init__(self, X, y, n_steps=100, learning_rate=0.001,
loglik_interval=5):
X, y = check_X_y(X, y, accept_sparse=False, # keep dense for example
y_numeric=True)
# we want to make sure y is binary since that's all our example covers
assert_is_binary(y)
# X should be centered/scaled for logistic regression, much like
# with linear regression
means, stds = X.mean(axis=0), X.std(axis=0)
X = (X - means) / stds
# since we're going to learn an intercept, we can cheat and set the
# intercept to be a new feature that we'll learn with everything else
X_w_intercept = np.hstack((np.ones((X.shape[0], 1)), X))
# initialize the coefficients as zeros
theta = np.zeros(X_w_intercept.shape[1])
# now for each step, we compute the inner product of X and the
# coefficients, transform the predictions with the sigmoid function,
# and adjust the weights by the gradient
ll = []
for iteration in xrange(n_steps):
preds = logistic_sigmoid(X_w_intercept.dot(theta))
residuals = y - preds # The error term
gradient = X_w_intercept.T.dot(residuals)
# update the coefficients
theta += learning_rate * gradient
# you may not always want to do this, since it's expensive. Tune
# ``loglik_interval`` to compute it more or less frequently
if (iteration + 1) % loglik_interval == 0:
ll.append(log_likelihood(X_w_intercept, y, theta))
# recall that our theta includes the intercept, so we need to pop
# that off and store it
self.intercept = theta[0]
self.theta = theta[1:]
self.log_likelihood = ll
self.column_means = means
self.column_std = stds
def predict_proba(self, X):
"""Generate the probabilities that a sample belongs to class 1"""
X = check_array(X, accept_sparse=False, copy=False) # type: np.ndarray
# make sure dims match
theta = self.theta
if theta.shape[0] != X.shape[1]:
raise ValueError("Dim mismatch in predictors!")
# scale the data appropriately
X = (X - self.column_means) / self.column_std
# creates a copy
return logistic_sigmoid(np.dot(X, theta.T) + self.intercept)
def predict(self, X):
return np.round(self.predict_proba(X)).astype(int)


@ -0,0 +1,100 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from sklearn.utils.validation import check_X_y, check_array
import numpy as np
from numpy.linalg import lstsq
from ..base import BaseSimpleEstimator
__all__ = [
'SimpleLinearRegression'
]
class SimpleLinearRegression(BaseSimpleEstimator):
"""Simple linear regression.
This class provides a very simple example of straightforward OLS
regression with an intercept. There are no tunable parameters, and
the model fit happens directly on class instantiation.
Parameters
----------
X : array-like, shape=(n_samples, n_features)
The array of predictor variables. This is the array we will use
to regress on ``y``.
y : array-like, shape=(n_samples,)
This is the target array on which we will regress to build
our model.
Attributes
----------
theta : array-like, shape=(n_features,)
The least-squares solution (the coefficients)
rank : int
The rank of the predictor matrix, ``X``
singular_values : array-like, shape=(n_features,)
The singular values of ``X``
X_means : array-like, shape=(n_features,)
The column means of the predictor matrix, ``X``
y_mean : float
The mean of the target variable, ``y``
intercept : float
The intercept term
"""
def __init__(self, X, y):
# First check X, y and make sure they are of equal length, no NaNs
# and that they are numeric
X, y = check_X_y(X, y, y_numeric=True,
accept_sparse=False) # keep it simple
# Next, we want to scale all of our features so X is centered
# We will do the same with our target variable, y
X_means = np.average(X, axis=0)
y_mean = y.mean(axis=0)
# don't do in place, so we get a copy
X = X - X_means
y = y - y_mean
# Let's compute the least squares on X wrt y
# Least squares solves the equation `a x = b` by computing a
# vector `x` that minimizes the Euclidean 2-norm `|| b - a x ||^2`.
theta, _, rank, singular_values = lstsq(X, y)
# finally, we compute the intercept values as the mean of the target
# variable MINUS the inner product of the X_means and the coefficients
intercept = y_mean - np.dot(X_means, theta.T)
# ... and set everything as an instance attribute
self.theta = theta
self.rank = rank
self.singular_values = singular_values
# we have to retain some of the statistics around the data too
self.X_means = X_means
self.y_mean = y_mean
self.intercept = intercept
def predict(self, X):
"""Compute new predictions for X"""
# copy, make sure numeric, etc...
X = check_array(X, accept_sparse=False, copy=False) # type: np.ndarray
# make sure dims match
theta = self.theta
if theta.shape[0] != X.shape[1]:
raise ValueError("Dim mismatch in predictors!")
# creates a copy
return np.dot(X, theta.T) + self.intercept
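Because the fit happens on instantiation, typical usage is construct-then-predict. A quick sketch on noise-free synthetic data (the coefficients and intercept here are illustrative, but OLS should recover them almost exactly):

import numpy as np
from packtml.regression import SimpleLinearRegression

rs = np.random.RandomState(42)
X = rs.rand(100, 2)
y = 2. * X[:, 0] + 1.5 * X[:, 1] + 3.

lm = SimpleLinearRegression(X, y)
print(lm.theta)      # ~[2.0, 1.5]
print(lm.intercept)  # ~3.0
preds = lm.predict(X)  # predictions on the training data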

View File

@ -0,0 +1,3 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import

View File

@ -0,0 +1,27 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.regression import SimpleLogisticRegression
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
import numpy as np
X, y = make_classification(n_samples=100, n_features=2, random_state=42,
n_redundant=0, n_repeated=0, n_classes=2,
class_sep=1.0)
def test_simple_logistic():
lm = SimpleLogisticRegression(X, y, n_steps=50, loglik_interval=10)
assert np.allclose(lm.theta, np.array([ 1.32320936, -0.03926072]))
# test that we can predict
preds = lm.predict(X)
# show we're better than chance
assert accuracy_score(y, preds) > 0.5
# show that we only computed the log likelihood 5 times
assert len(lm.log_likelihood) == 5, lm.log_likelihood

View File

@ -0,0 +1,21 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.regression import SimpleLinearRegression
import numpy as np
from numpy.testing import assert_almost_equal
def test_simple_linear_regression():
# y = 2a + 1.5b + 0
random_state = np.random.RandomState(42)
X = random_state.rand(100, 2)
y = 2. * X[:, 0] + 1.5 * X[:, 1]
lm = SimpleLinearRegression(X, y)
predictions = lm.predict(X)
residuals = y - predictions
assert_almost_equal(residuals.sum(), 0.)
assert np.allclose(lm.theta, [2., 1.5])

View File

@ -0,0 +1,8 @@
# -*- coding: utf-8 -*-
from .extmath import *
from .linalg import *
from .plotting import *
from .validation import *
__all__ = [s for s in dir() if not s.startswith("_")]

View File

@ -0,0 +1,60 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
import numpy as np
__all__ = [
'log_likelihood',
'logistic_sigmoid'
]
def log_likelihood(X, y, w):
"""Compute the log-likelihood function.
Computes the log-likelihood function over the training data.
The key to the log-likelihood is that the log of the product of
likelihoods becomes the sum of logs. That is (in pseudo-code),
np.log(np.product([f(i) for i in range(N)]))
is equivalent to:
np.sum([np.log(f(i)) for i in range(N)])
The log-likelihood function is used in computing the gradient for
our loss function since the derivative of the sum (of logs) is equivalent
to the sum of derivatives, which simplifies all of our math.
Parameters
----------
X : np.ndarray, shape=(n_samples, n_features)
The training data.
y : np.ndarray, shape=(n_samples,)
The target vector of 1s or 0s.
w : np.ndarray, shape=(n_features,)
The vector of feature weights (coefficients)
References
----------
.. [1] For a very thorough explanation of the log-likelihood function, see
https://www.coursera.org/learn/ml-classification/lecture/1ZeTC/very-optional-expressing-the-log-likelihood
"""
weighted = X.dot(w)
return (y * weighted - np.log(1. + np.exp(weighted))).sum()
def logistic_sigmoid(x):
"""The logistic function.
Compute the logistic (sigmoid) function over a vector, ``x``.
Parameters
----------
x : np.ndarray, shape=(n_samples,)
A vector to transform.
"""
return 1. / (1. + np.exp(-x))
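Both helpers are easy to sanity-check numerically. A small sketch, assuming the module is importable as packtml.utils.extmath (per the utils ``__init__`` above):

import numpy as np
from packtml.utils.extmath import log_likelihood, logistic_sigmoid

# the sigmoid squashes the real line into (0, 1), centered at 0.5
print(logistic_sigmoid(np.array([-10., 0., 10.])))  # ~[0., 0.5, 1.]

# the docstring's identity: log of a product == sum of logs
vals = np.array([0.2, 0.5, 0.9])
print(np.log(np.prod(vals)), np.sum(np.log(vals)))  # equal

# weights that separate the classes yield a higher log likelihood
rs = np.random.RandomState(42)
X = rs.rand(20, 2) - 0.5
y = (X[:, 0] > 0).astype(int)
print(log_likelihood(X, y, np.array([5., 0.])) >
      log_likelihood(X, y, np.zeros(2)))  # True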

View File

@ -0,0 +1,28 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from numpy import linalg as la
__all__ = [
'l2_norm'
]
def l2_norm(X, axis=0):
"""Compute the L2 (Euclidean) norm of a matrix.
Computes the L2 norm along the specified axis. If axis is 0,
computes the norms along the columns. If 1, computes along the
rows.
Parameters
----------
X : array-like, shape=(n_samples, n_features)
The matrix on which to compute the norm.
axis : int, optional (default=0)
The axis along which to compute the norm. 0 is for columns,
1 is for rows.
"""
return la.norm(X, ord=None, axis=axis)
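For a concrete read on the ``axis`` argument, a tiny sketch (the import style mirrors the test below):

import numpy as np
from packtml.utils import linalg

X = np.array([[3., 0.],
              [4., 0.],
              [0., 5.]])
print(linalg.l2_norm(X, axis=0))  # [5. 5.]  column norms
print(linalg.l2_norm(X, axis=1))  # [3. 4. 5.]  row norms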

View File

@ -0,0 +1,160 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from matplotlib.colors import ListedColormap
from matplotlib import pyplot as plt
from .validation import learning_curve
import numpy as np
__all__ = [
'add_decision_boundary_to_axis',
'plot_learning_curve'
]
def add_decision_boundary_to_axis(estimator, axis, nclasses,
X_data, stepsize=0.02,
colors=('#FFAAAA', '#AAFFFA', '#AAAAFF')):
"""Plot a classification decision boundary on an axis.
Estimates lots of values from a classifier and adds the color map
mesh to an axis. WARNING - use PRIOR to applying scatter values on the
axis!
Parameters
----------
estimator : BaseSimpleEstimator
An estimator that implements ``predict``.
axis : matplotlib.Axis
The axis we're plotting on.
nclasses : int
The number of classes present in the data
X_data : np.ndarray, shape=(n_samples, n_features)
        The X data used to fit the estimator, and over which to plot.
        Only the first two features will be used for plotting.
stepsize : float, optional (default=0.02)
The size of the steps in the values on which to predict.
colors : tuple or iterable, optional
The color map
Returns
-------
    xx : np.ndarray
        The meshgrid of x coordinates
    yy : np.ndarray
        The meshgrid of y coordinates
    axis : matplotlib.Axis
        The axis, now with the color mesh added
"""
x_min, x_max = X_data[:, 0].min() - 1, X_data[:, 0].max() + 1
y_min, y_max = X_data[:, 1].min() - 1, X_data[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, stepsize),
np.arange(y_min, y_max, stepsize))
Z = estimator.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
axis.pcolormesh(xx, yy, Z, cmap=ListedColormap(list(colors[:nclasses])))
return xx, yy, axis
def plot_learning_curve(model, X, y, n_folds, metric, train_sizes,
seed=None, trace=False, y_lim=None, **kwargs):
"""Fit and plot a CV learning curve.
Fits the model with ``n_folds`` of cross-validation over various
training sizes and computes arrays of scores for the train samples
and the validation fold samples, then plots them.
Parameters
----------
model : BaseSimpleEstimator
The model class that should be fit.
X : array-like, shape=(n_samples, n_features)
The training matrix.
y : array-like, shape=(n_samples,)
The training labels/ground-truth.
metric : callable
The scoring metric
train_sizes : iterable
The size of the training set for each fold.
    n_folds : int
        The number of CV folds
seed : int or None, optional (default=None)
The random seed for cross validation.
trace : bool, optional (default=False)
Whether to print to stdout after each set of folds is fit
for a given train size.
y_lim : iterable or None, optional (default=None)
The y-axis limits
**kwargs : keyword args or dict
The keyword args to pass to the estimator.
Returns
-------
    plt : module
        The ``matplotlib.pyplot`` module itself, so the caller can
        show or further amend the plot
References
----------
.. [1] Based on the scikit-learn example:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_learning_curve.html
"""
# delegate the model fits to the function in .validation
train_scores, val_scores = learning_curve(
model, X, y, train_sizes=train_sizes,
metric=metric, seed=seed, trace=trace,
n_folds=n_folds, **kwargs)
# compute the means/stds of each scores list
train_scores_mean = np.mean(train_scores, axis=1)
val_scores_mean = np.mean(val_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
val_scores_std = np.std(val_scores, axis=1)
# plot the learning curves
plt.figure()
plt.title("Learning curve (model=%s, train sizes=%s)"
% (model.__name__, str(train_sizes)))
plt.xlabel("Training sizes")
plt.ylabel("Score (%s)" % metric.__name__)
plt.grid()
# define the y-axis limit if necessary
if y_lim is not None:
plt.ylim(y_lim)
plt.fill_between(train_sizes, train_scores_mean - train_scores_std,
train_scores_mean + train_scores_std, alpha=0.1,
color="r")
plt.fill_between(train_sizes, val_scores_mean - val_scores_std,
val_scores_mean + val_scores_std, alpha=0.1,
color="g")
plt.plot(train_sizes, train_scores_mean, 'o-', color="r",
label="Training score")
plt.plot(train_sizes, val_scores_mean, 'o-', color="g",
label="Validation score")
plt.legend(loc="best")
return plt
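Putting the learning-curve plot together with an estimator from this commit, a typical call might look like the sketch below (the dataset and hyperparameters are illustrative; the estimator and metric are the same ones exercised in the tests). Since the function returns pyplot itself, ``.show()`` can be chained directly; and per the warning above, call add_decision_boundary_to_axis before scattering points on the same axis.

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from packtml.regression import SimpleLogisticRegression
from packtml.utils.plotting import plot_learning_curve

bc = load_breast_cancer()
X, y = bc.data, bc.target

# fit 3 shuffle-split folds at each of three training sizes, then plot
plot_learning_curve(
    SimpleLogisticRegression, X, y, n_folds=3,
    metric=accuracy_score, train_sizes=(100, 250, 400),
    seed=42, n_steps=30, loglik_interval=30).show()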

View File

@ -0,0 +1,3 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import

View File

@ -0,0 +1,23 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from sklearn.datasets import load_iris
from packtml.utils import linalg
from numpy.testing import assert_array_almost_equal
import numpy as np
iris = load_iris()
X, y = iris.data, iris.target
def test_l2_norm():
means = np.average(X, axis=0)
X_centered = X - means
norms = linalg.l2_norm(X_centered, axis=0)
assert_array_almost_equal(
norms,
np.array([ 10.10783524, 5.29269308,
21.53749599, 9.31556404]))

View File

@ -0,0 +1,37 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from packtml.utils import validation as val
from packtml.regression import SimpleLogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_breast_cancer
bc = load_breast_cancer()
X, y = bc.data, bc.target
def test_is_iterable():
assert val.is_iterable([1, 2, 3])
assert val.is_iterable((1, 2, 3))
assert val.is_iterable({1, 2, 3})
assert val.is_iterable({1: 'a', 2: 'b'})
assert not val.is_iterable(123)
assert not val.is_iterable(None)
assert not val.is_iterable("a string")
def test_learning_curves():
train_scores, val_scores = \
val.learning_curve(
SimpleLogisticRegression, X, y,
metric=accuracy_score,
train_sizes=(100, 250, 400),
n_folds=3, seed=42, trace=True,
# kwargs:
n_steps=20, loglik_interval=20)
assert train_scores.shape == (3, 3)
assert val_scores.shape == (3, 3)

View File

@ -0,0 +1,169 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
from sklearn.externals import six
from sklearn.utils.validation import check_random_state
from sklearn.model_selection import ShuffleSplit
import numpy as np
__all__ = [
'assert_is_binary',
'is_iterable',
'learning_curve'
]
def assert_is_binary(y):
"""Validate that a vector is binary.
Checks that a vector is binary. This utility is used by all of
the simple classifier estimators to validate the input target.
Parameters
----------
y : np.ndarray, shape=(n_samples,)
The target vector
"""
# validate that y is in (0, 1)
unique_y = np.unique(y) # type: np.ndarray
if unique_y.shape[0] != 2 or [0, 1] != unique_y.tolist():
raise ValueError("y must be binary, but got unique values of %s"
% str(unique_y))
def is_iterable(x):
"""Determine whether an item is iterable.
    In Python 3, strings implement ``__iter__``, so a naive
    ``hasattr(x, '__iter__')`` check would treat them as iterables.
    This function determines whether an object is a *non-string*
    iterable, given the presence of the ``__iter__`` method and that
    the object is not a string.
Parameters
----------
x : int, object, str, iterable, None
The object in question. Could feasibly be any type.
"""
if isinstance(x, six.string_types):
return False
return hasattr(x, "__iter__")
def learning_curve(model, X, y, metric, train_sizes, n_folds=3,
seed=None, trace=False, **kwargs):
"""Fit a CV learning curve.
Fits the model with ``n_folds`` of cross-validation over various
training sizes and returns arrays of scores for the train samples
and the validation fold samples.
Parameters
----------
model : BaseSimpleEstimator
The model class that should be fit.
X : array-like, shape=(n_samples, n_features)
The training matrix.
y : array-like, shape=(n_samples,)
The training labels/ground-truth.
metric : callable
The scoring metric
train_sizes : iterable
The size of the training set for each fold.
n_folds : int, optional (default=3)
The number of CV folds
seed : int or None, optional (default=None)
The random seed for cross validation.
trace : bool, optional (default=False)
Whether to print to stdout after each set of folds is fit
for a given train size.
**kwargs : keyword args or dict
The keyword args to pass to the estimator.
Returns
-------
train_scores : np.ndarray, shape=(n_trials, n_folds)
The scores for the train samples. Each row represents a
trial (new train size), and each column corresponds to the
fold of the trial, i.e., for ``n_folds=3``, there will be
3 columns.
val_scores : np.ndarray, shape=(n_trials, n_folds)
The scores for the validation folds. Each row represents a
trial (new train size), and each column corresponds to the
fold of the trial, i.e., for ``n_folds=3``, there will be
3 columns.
"""
# Each of these lists will be a 2d array. A row will represent a
# trial for a particular train size, and each column will
# correspond with a fold.
train_scores = []
val_scores = []
# The number of samples in the dataset
n_samples = X.shape[0]
# If the input is a pandas frame, make it a numpy array for indexing
if hasattr(X, "iloc"):
X = X.values
# We need to validate that all of the sizes within the train_sizes
# are less than the number of samples in the dataset!
assert all(s < n_samples for s in train_sizes), \
"All train sizes (%s) must be less than n_samples (%i)" \
% (str(train_sizes), n_samples)
# For each training size, we're going to initialize a new KFold
# cross validation instance and fit the K folds...
for train_size in train_sizes:
cv = ShuffleSplit(n_splits=n_folds,
train_size=train_size,
test_size=n_samples - train_size,
random_state=seed)
# This is the inner list (row) that will represent the
# scores for this train size
inner_train_scores = []
inner_val_scores = []
# get our splits
for train_indices, test_indices in cv.split(X, y):
# get the training samples
train_X = X[train_indices, :]
train_y = y.take(train_indices)
# fit the model
m = model(train_X, train_y, **kwargs)
# score the model on the train set
inner_train_scores.append(
metric(train_y, m.predict(train_X)))
# score the model on the validation set
inner_val_scores.append(
metric(y.take(test_indices),
m.predict(X[test_indices, :])))
# Now attach the inner lists to the outer lists
train_scores.append(inner_train_scores)
val_scores.append(inner_val_scores)
if trace:
print("Completed fitting %i folds for train size=%i"
% (n_folds, train_size))
# Make our train/val arrays into numpy arrays
train_scores = np.asarray(train_scores)
val_scores = np.asarray(val_scores)
return train_scores, val_scores
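The return shapes follow the docstring: one row per train size, one column per fold. A sketch mirroring the package's own test of this function:

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from packtml.regression import SimpleLogisticRegression
from packtml.utils.validation import learning_curve

bc = load_breast_cancer()
X, y = bc.data, bc.target

train_scores, val_scores = learning_curve(
    SimpleLogisticRegression, X, y, metric=accuracy_score,
    train_sizes=(100, 250, 400), n_folds=3, seed=42,
    n_steps=20, loglik_interval=20)  # kwargs forwarded to the estimator

print(train_scores.shape, val_scores.shape)  # (3, 3) (3, 3)
print(val_scores.mean(axis=1))  # mean validation accuracy per train size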

5
requirements.txt 100644
View File

@ -0,0 +1,5 @@
numpy>=1.11
scipy>=0.19
scikit-learn>=0.18
pandas
matplotlib

54
setup.py 100644
View File

@ -0,0 +1,54 @@
# -*- coding: utf-8 -*-
from __future__ import absolute_import
import sys
import setuptools
with open("packtml/VERSION", 'r') as vsn:
VERSION = vsn.read().strip()
# Permitted args: "install" only, basically.
UNSUPPORTED_COMMANDS = { # this is a set literal, not a dict
'develop', 'release', 'bdist_egg', 'bdist_rpm',
'bdist_wininst', 'install_egg_info', 'build_sphinx',
'egg_info', 'easy_install', 'upload', 'bdist_wheel',
'--single-version-externally-managed', 'test', 'build_ext'
}
intersect = UNSUPPORTED_COMMANDS.intersection(set(sys.argv))
if intersect:
msg = "The following arguments are unsupported: %s. " \
"To install, please use `python setup.py install`." \
% str(list(intersect))
# if "test" is in the arguments, make sure the user knows how to test.
if "test" in intersect:
msg += " To test, make sure pytest is installed, and after " \
"installation run `pytest packtml`"
raise ValueError(msg)
# get requirements
with open("requirements.txt") as req:
REQUIREMENTS = req.read().strip().split("\n")
py_version_tag = '-%s.%s' % sys.version_info[:2]
setuptools.setup(name="packtml",
description="Hands-on Supervised Learning - teach a machine "
"to think for itself!",
author="Taylor G Smith",
author_email="taylor.smith@alkaline-ml.com",
                 packages=['packtml',
                           'packtml.clustering',
                           'packtml.decision_tree',
                           'packtml.metrics',
                           'packtml.neural_net',
                           'packtml.recommendation',
                           'packtml.regression',
                           'packtml.utils'],
zip_safe=False,
include_package_data=True,
install_requires=REQUIREMENTS,
package_data={"packtml": ["*"]},
version=VERSION)