nyaggle


nyaggle is a utility library for Kaggle and offline competitions, particularly focused on experiment tracking, feature engineering, and validation. See the documentation for details.

  • Feature Engineering
    • K-Fold Target Encoding
    • BERT Sentence Vectorization
  • Model Validation
    • CV with OOF
    • Adversarial Validation
    • sklearn compatible time series splitter
  • Experiment
    • Experiment logging
    • High-level API for logging gradient boosting experiment
  • Ensemble
    • Blending

Installation

You can install nyaggle via pip:

$ pip install nyaggle

Examples

Experiment Logging

run_experiment() is a high-level API for running experiments with cross validation. It outputs parameters, metrics, out-of-fold predictions, test predictions, feature importance, and submission.csv under the specified directory.

It can be combined with mlflow tracking.

Feature Engineering

Target Encoding with K-Fold

Text Vectorization using BERT

You need to install PyTorch in your virtual environment to use BertSentenceVectorizer. MeCab and mecab-python3 are also required if you use a Japanese BERT model.
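An illustrative sketch of BertSentenceVectorizer usage (column names are hypothetical, and the constructor arguments are assumptions from the nyaggle documentation); running it downloads a pretrained BERT model, so PyTorch and a network connection are required.

```python
import pandas as pd
from nyaggle.feature.nlp import BertSentenceVectorizer

train = pd.DataFrame({
    "title": ["simple title", "another title"],
    "body": ["a short body text", "another body text"],
})

# Encodes each listed text column into a fixed-length BERT sentence vector.
bv = BertSentenceVectorizer(text_columns=["title", "body"])
vectors = bv.fit_transform(train)
```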

Model Validation

cross_validate() is a handy API to compute K-fold CV scores, out-of-fold predictions, and test predictions in one call. You can pass LGBMClassifier/LGBMRegressor or any other sklearn-compatible model.