update docs

parent: 83e319d8c0
commit: e03e6bbbae

README.md
![GitHub Actions CI Status](https://github.com/nyanp/nyaggle/workflows/Python%20package/badge.svg)
![Python Versions](https://img.shields.io/pypi/pyversions/nyaggle.svg?logo=python&logoColor=white)
**nyaggle** is a utility library for Kaggle and offline competitions,
particularly focused on experiment tracking, feature engineering and validation.

- [Documentation](https://nyaggle.readthedocs.io/en/latest/index.html)
- [Slide (Japanese)](https://docs.google.com/presentation/d/1jv3J7DISw8phZT4z9rqjM-azdrQ4L4wWJN5P-gKL6fA/edit?usp=sharing)
- **nyaggle.experiment** - Experiment tracking
- **nyaggle.feature_store** - Lightweight feature storage using feather-format
- **nyaggle.features** - sklearn-compatible features
- **nyaggle.hyper_parameters** - Collection of GBDT hyper-parameters used in past Kaggle competitions
- **nyaggle.validation** - Adversarial validation & sklearn-compatible CV splitters
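The idea behind a feature store — saving each engineered feature to its own file so it can be reloaded and reused across experiments instead of being recomputed — can be sketched with the standard library. This is an illustrative sketch only, not nyaggle's actual API (nyaggle uses feather-format via pandas; the function names below are hypothetical):

```python
import os
import pickle
import tempfile

def save_feature(values, name, directory):
    """Persist one engineered feature (a column of values) to its own file."""
    with open(os.path.join(directory, f"{name}.pkl"), "wb") as f:
        pickle.dump(values, f)

def load_feature(name, directory):
    """Reload a previously saved feature by name."""
    with open(os.path.join(directory, f"{name}.pkl"), "rb") as f:
        return pickle.load(f)

# compute once, reuse across experiments
with tempfile.TemporaryDirectory() as d:
    save_feature([0.5, 1.2, 3.4], "price_mean_by_user", d)
    print(load_feature("price_mean_by_user", d))  # [0.5, 1.2, 3.4]
```

Because each feature lives in its own file, independent experiments can load only the features they need.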
## Installation

You can install nyaggle via pip:

```bash
pip install nyaggle
```
## Examples

### Experiment Tracking

`run_experiment()` is a high-level API for running an experiment with cross validation.
It outputs parameters, metrics, out-of-fold predictions, test predictions,
feature importance and submission.csv under the specified directory.
```python
result = run_experiment(params,
                        X_train,
                        y_train,
                        X_test,
                        with_mlflow=True)
```

nyaggle also provides a low-level API with an interface similar to
[mlflow tracking](https://www.mlflow.org/docs/latest/tracking.html) and [wandb](https://www.wandb.com/).
```python
from nyaggle.experiment import Experiment

with Experiment(logging_directory='./output/') as exp:
    # log key-value pairs as parameters
    exp.log_param('lr', 0.01)
    exp.log_param('optimizer', 'adam')

    # log text
    exp.log('blah blah blah')

    # log a metric
    exp.log_metric('CV', 0.85)

    # log numpy ndarrays, pandas dataframes and any other artifacts
    exp.log_numpy('predicted', predicted)
    exp.log_dataframe('submission', sub, file_format='csv')
    exp.log_artifact('path-to-your-file')
```
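Under the hood, this style of logger is little more than structured writes to a directory. A minimal stdlib sketch of the pattern — a toy stand-in, not nyaggle's actual implementation — looks like:

```python
import json
import os
import tempfile

class MiniExperiment:
    """Toy experiment logger: collects params and metrics, writes them on exit."""
    def __init__(self, logging_directory):
        self.dir = logging_directory
        self.params = {}
        self.metrics = {}

    def __enter__(self):
        os.makedirs(self.dir, exist_ok=True)
        return self

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

    def __exit__(self, exc_type, exc, tb):
        # flush everything to the logging directory when the block ends
        with open(os.path.join(self.dir, "params.json"), "w") as f:
            json.dump(self.params, f)
        with open(os.path.join(self.dir, "metrics.json"), "w") as f:
            json.dump(self.metrics, f)

with tempfile.TemporaryDirectory() as d:
    out = os.path.join(d, "output")
    with MiniExperiment(out) as exp:
        exp.log_param("lr", 0.01)
        exp.log_metric("CV", 0.85)
    with open(os.path.join(out, "metrics.json")) as f:
        print(json.load(f))  # {'CV': 0.85}
```

The context-manager shape is what guarantees the run is persisted even if you forget an explicit "stop" call.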

### Feature Engineering
You can install nyaggle via pip:

.. code-block:: bash

    pip install nyaggle  # Install core parts of nyaggle


nyaggle does not install the following packages by default:

- catboost
- lightgbm
- mlflow
- pytorch

Modules that depend on these packages won't work until you also install them.
For example, ``run_experiment`` with the ``algorithm_type='xgb'``, ``'lgbm'`` and ``'cat'`` options won't work
until you also install xgboost, lightgbm and catboost respectively.

To use :code:`nyaggle.nlp.BertSentenceVectorizer`, you first need to install PyTorch.
Please refer to the `PyTorch installation page <https://pytorch.org/get-started/locally/#start-locally>`_
to install PyTorch in your environment.
|
||||
If you want to install everything required in nyaggle, This command can be used:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
pip install nyaggle[all] # Install everything
|
||||
|
||||
|
||||
If you use :code:`lang=ja` option in :code:`BertSentenceVecorizer`,
|
||||
you also need to intall MeCab and mecab-python3 package to your environment.
|
||||

nyaggle.experiment
------------------

.. automodule:: nyaggle.experiment


nyaggle.feature_store
---------------------

.. automodule:: nyaggle.feature_store


nyaggle.feature
---------------

.. automodule:: nyaggle.feature.category_encoder


nyaggle.hyper_parameters
------------------------

.. automodule:: nyaggle.hyper_parameters


nyaggle.util
------------

.. automodule:: nyaggle.util


nyaggle.validation
------------------

.. automodule:: nyaggle.validation
If you are familiar with mlflow tracking, you may notice that these APIs are similar.


Logging extra parameters to run_experiment
------------------------------------------

By using the ``inherit_experiment`` parameter, you can mix any additional logging with the results ``run_experiment`` creates.
In the following example, nyaggle records the result of ``run_experiment`` under the same experiment as
the parameters and metrics written outside of the function.

.. code-block:: python

    exp.log_metrics('my extra metrics', 0.999)

Tracking seed averaging experiments
-----------------------------------

If you train a bunch of models with different seeds and ensemble them, tracking each individual model with mlflow
will fill the GUI with results and make them hard to manage.
mlflow's nested-run functionality is useful for displaying multiple models together under one parent run.

.. code-block:: python

    import mlflow
    from nyaggle.experiment import average_results

    mlflow.start_run()
    base_logging_dir = './seed-avg/'
    results = []

    for i in range(3):
        mlflow.start_run(nested=True)  # use nested runs to place each experiment under the parent run
        params['seed'] = i

        result = run_experiment(params,
                                X_train,
                                y_train,
                                X_test,
                                logging_directory=base_logging_dir + f'seed_{i}',
                                with_mlflow=True)
        results.append(result)

        mlflow.end_run()

    average_results([base_logging_dir + f'seed_{i}' for i in range(3)], base_logging_dir + 'sub.csv')
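The averaging step itself is just an element-wise mean over the per-seed prediction vectors. A simplified pure-Python sketch of what ``average_results`` conceptually does (nyaggle's real function also reads the per-seed outputs from disk and writes a submission file):

```python
def average_predictions(per_seed_predictions):
    """Element-wise mean over prediction vectors from differently-seeded models."""
    n = len(per_seed_predictions)
    return [sum(column) / n for column in zip(*per_seed_predictions)]

# predictions for the same two test rows from three differently-seeded models
preds = [
    [0.25, 0.75],
    [0.50, 0.50],
    [0.75, 0.25],
]
print(average_predictions(preds))  # [0.5, 0.5]
```

Averaging over seeds reduces the variance introduced by random initialization and row/column subsampling, which is why the individual runs are worth keeping grouped under one parent mlflow run.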