update docs

pull/56/head
nyanp 2020-02-23 22:57:26 +09:00
parent 83e319d8c0
commit e03e6bbbae
9 changed files with 85 additions and 20 deletions


@ -2,11 +2,17 @@
![GitHub Actions CI Status](https://github.com/nyanp/nyaggle/workflows/Python%20package/badge.svg)
![Python Versions](https://img.shields.io/pypi/pyversions/nyaggle.svg?logo=python&logoColor=white)
**nyaggle** is a utility library for Kaggle and offline competitions,
particularly focused on experiment logging, feature engineering and validation.
[**Documentation**](https://nyaggle.readthedocs.io/en/latest/index.html)
| [**Slide (Japanese)**](https://docs.google.com/presentation/d/1jv3J7DISw8phZT4z9rqjM-azdrQ4L4wWJN5P-gKL6fA/edit?usp=sharing)
- [documentation](https://nyaggle.readthedocs.io/en/latest/index.html)
- [slide (Japanese)](https://docs.google.com/presentation/d/1jv3J7DISw8phZT4z9rqjM-azdrQ4L4wWJN5P-gKL6fA/edit?usp=sharing)
**nyaggle** is a utility library for Kaggle and offline competitions,
particularly focused on experiment tracking, feature engineering and validation.
- **nyaggle.experiment** - Experiment tracking
- **nyaggle.feature_store** - Lightweight feature storage using feather-format
- **nyaggle.feature** - sklearn-compatible features
- **nyaggle.hyper_parameters** - Collection of GBDT hyper-parameters used in past Kaggle competitions
- **nyaggle.validation** - Adversarial validation & sklearn-compatible CV splitters
## Installation
You can install nyaggle via pip:
@ -16,7 +22,7 @@ $pip install nyaggle
## Examples
### Experiment Logging
### Experiment Tracking
`run_experiment()` is a high-level API for running experiments with cross-validation.
It outputs parameters, metrics, out-of-fold predictions, test predictions,
feature importance and submission.csv under the specified directory.
@ -63,6 +69,28 @@ result = run_experiment(params,
with_mlflow=True)
```
nyaggle also has a low-level API with an interface similar to
[mlflow tracking](https://www.mlflow.org/docs/latest/tracking.html) and [wandb](https://www.wandb.com/).
```python
from nyaggle.experiment import Experiment

# `predicted` (ndarray) and `sub` (DataFrame) are assumed to be defined earlier
with Experiment(logging_directory='./output/') as exp:
    # log key-value pair as a parameter
    exp.log_param('lr', 0.01)
    exp.log_param('optimizer', 'adam')

    # log text
    exp.log('blah blah blah')

    # log metric
    exp.log_metric('CV', 0.85)

    # log numpy ndarray, pandas DataFrame and any artifacts
    exp.log_numpy('predicted', predicted)
    exp.log_dataframe('submission', sub, file_format='csv')
    exp.log_artifact('path-to-your-file')
```
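The context-manager pattern used above can be illustrated with a small, self-contained sketch. `MiniExperiment` below is a hypothetical stand-in written only for illustration; it is not nyaggle's `Experiment` and makes no claim about how nyaggle stores its logs. It assumes that parameters and metrics are simple key-value pairs that are written to JSON files when the `with` block exits.

```python
import json
import os
import tempfile


class MiniExperiment:
    """Hypothetical stand-in (NOT nyaggle's Experiment): buffers logs, writes them on exit."""

    def __init__(self, logging_directory):
        self.logging_directory = logging_directory
        self.params = {}
        self.metrics = {}

    def __enter__(self):
        os.makedirs(self.logging_directory, exist_ok=True)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Persist everything that was logged, even if the body raised.
        for name, data in (('params', self.params), ('metrics', self.metrics)):
            path = os.path.join(self.logging_directory, f'{name}.json')
            with open(path, 'w') as f:
                json.dump(data, f, indent=2)
        return False  # never swallow exceptions

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, name, score):
        self.metrics[name] = score


# Usage: log into a temporary directory, then read the metrics back.
demo_dir = tempfile.mkdtemp()
with MiniExperiment(logging_directory=demo_dir) as exp:
    exp.log_param('lr', 0.01)
    exp.log_metric('CV', 0.85)

with open(os.path.join(demo_dir, 'metrics.json')) as f:
    saved_metrics = json.load(f)
```

Writing on exit keeps the logging calls cheap and guarantees that a run leaves its files behind even when training fails partway through.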
### Feature Engineering


@ -6,10 +6,10 @@ You can install nyaggle via pip:
.. code-block:: bash
pip install nyaggle
pip install nyaggle # Install core parts of nyaggle
nyaggle does not install the following packages by pip:
nyaggle does not install the following packages by default:
- catboost
- lightgbm
@ -17,12 +17,17 @@ nyaggle does not install the following packages by pip:
- mlflow
- pytorch
You need to install these packages if you want to use them through nyaggle API.
For example, you need to install xgboost before calling ``run_experiment`` with ``algorithm_type='xgb'``.
To use :code:`nyaggle.nlp.BertSentenceVectorizer`, you first need to install PyTorch.
Please refer to the `PyTorch installation page <https://pytorch.org/get-started/locally/#start-locally>`_
to install PyTorch in your environment.
Modules that depend on these packages won't work until you install them.
For example, ``run_experiment`` with ``algorithm_type='xgb'``, ``'lgbm'`` or ``'cat'`` won't work
until you install xgboost, lightgbm or catboost, respectively.
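One way to see which optional packages are available before calling nyaggle is to probe for them with the standard library. This is a minimal sketch, not part of nyaggle: the helper name ``missing_packages`` is hypothetical, and the list covers only the packages named in this section, using their import names (PyTorch is imported as ``torch``).

```python
from importlib.util import find_spec

# Import names of the optional packages named in this section.
OPTIONAL_PACKAGES = ['catboost', 'lightgbm', 'xgboost', 'mlflow', 'torch']


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(OPTIONAL_PACKAGES)
if missing:
    print('Not installed:', ', '.join(missing))
```

``find_spec`` only locates a package without importing it, so the check stays fast even for heavy libraries.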
To install nyaggle together with every optional package, run:

.. code-block:: bash

    pip install nyaggle[all]  # Install everything

If you use the :code:`lang=ja` option in :code:`BertSentenceVectorizer`,
you also need to install MeCab and the mecab-python3 package in your environment.


@ -1,4 +1,4 @@
experiment
nyaggle.experiment
-----------------------
.. automodule:: nyaggle.experiment


@ -1,4 +1,4 @@
feature_store
nyaggle.feature_store
---------------------------
.. automodule:: nyaggle.feature_store


@ -1,4 +1,4 @@
feature
nyaggle.feature
----------------------------------------
.. automodule:: nyaggle.feature.category_encoder


@ -1,4 +1,4 @@
hyper_parameters
nyaggle.hyper_parameters
--------------------------
.. automodule:: nyaggle.hyper_parameters


@ -1,4 +1,4 @@
util
nyaggle.util
-----------------------
.. automodule:: nyaggle.util


@ -1,4 +1,4 @@
validation
nyaggle.validation
--------------------------
.. automodule:: nyaggle.validation


@ -55,8 +55,8 @@ If you are familiar with mlflow tracking, you may notice that these APIs are sim
Log extra parameters to run_experiment
---------------------------------------
Logging extra parameters to run_experiment
-------------------------------------------
With the ``inherit_experiment`` parameter, you can combine any additional logging with the results that ``run_experiment`` creates.
In the following example, nyaggle records the result of ``run_experiment`` under the same experiment as
@ -74,3 +74,35 @@ the parameter and metrics written outside of the function.
exp.log_metrics('my extra metrics', 0.999)
Tracking seed averaging experiment
---------------------------------------
If you train several models with different seeds to ensemble them, tracking each model individually with mlflow
fills the GUI with these results and makes them difficult to manage.
mlflow's nested run functionality is useful for displaying multiple models together under one result.
.. code-block:: python

    import mlflow
    from nyaggle.experiment import average_results, run_experiment

    # params, X_train, y_train and X_test are assumed to be defined beforehand
    mlflow.start_run()
    base_logging_dir = './seed-avg/'
    results = []

    for i in range(3):
        mlflow.start_run(nested=True)  # use a nested run to place each experiment under the parent run
        params['seed'] = i
        result = run_experiment(params,
                                X_train,
                                y_train,
                                X_test,
                                logging_directory=base_logging_dir+f'seed_{i}',
                                with_mlflow=True)
        results.append(result)
        mlflow.end_run()  # close the nested run for this seed

    average_results([base_logging_dir+f'seed_{i}' for i in range(3)], base_logging_dir+'sub.csv')
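The idea behind seed averaging can be sketched in plain Python. The helper ``average_predictions`` below is hypothetical and only illustrates element-wise averaging of per-seed predictions; it is not how ``average_results`` is implemented, and the prediction values are made up for the example.

```python
def average_predictions(predictions_per_seed):
    """Average predictions element-wise across seeds (one inner list per seed)."""
    n_seeds = len(predictions_per_seed)
    return [sum(values) / n_seeds for values in zip(*predictions_per_seed)]


# Illustrative predictions for three test rows from three seeds.
seed_predictions = [
    [0.2, 0.8, 0.5],  # seed 0
    [0.4, 0.6, 0.5],  # seed 1
    [0.3, 0.7, 0.5],  # seed 2
]
averaged = average_predictions(seed_predictions)
```

Averaging over seeds reduces the variance that comes from random initialization and sampling, which is why the ensemble is usually more stable than any single run.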