nyaggle


nyaggle is a utility library for Kaggle and offline competitions, particularly focused on feature engineering and validation. See the documentation for details.

  • Feature Engineering
    • K-Fold Target Encoding
    • BERT Sentence Vectorization
  • Model Validation
    • CV with OOF
    • Adversarial Validation
    • sklearn-compatible time-series splitter
  • Experiment
    • Experiment logging
    • High-level API for logging gradient boosting experiment
  • Ensemble
    • Blending

Installation

You can install nyaggle via pip:

$ pip install nyaggle

Examples

Experiment Logging

experiment_gbdt() is a high-level API for cross validation with gradient boosting. It writes parameters, metrics, out-of-fold predictions, test predictions, feature importance, and submission.csv under the specified directory.

It can be combined with mlflow tracking.
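The exact signature of experiment_gbdt() is in the nyaggle documentation. Purely to illustrate the kind of artifacts such a run leaves behind, here is a hand-rolled logging sketch; the function name, file names, and arguments below are hypothetical, not nyaggle's API:

```python
import json
import tempfile
from pathlib import Path

def log_experiment(directory, params, metrics, oof, test_pred):
    """Persist one experiment's artifacts to a directory (hypothetical helper)."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    (out / "params.json").write_text(json.dumps(params, indent=2))
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (out / "oof_prediction.csv").write_text("\n".join(map(str, oof)))
    (out / "test_prediction.csv").write_text("\n".join(map(str, test_pred)))
    return out

run_dir = log_experiment(
    Path(tempfile.mkdtemp()) / "run1",
    params={"objective": "binary", "max_depth": 8},  # model hyperparameters
    metrics={"auc": 0.812},                          # e.g. mean CV score
    oof=[0.1, 0.9, 0.4],                             # one out-of-fold prediction per train row
    test_pred=[0.2, 0.7],                            # fold-averaged test predictions
)
```

Keeping every run's parameters and predictions on disk this way is what makes later comparison and ensembling of experiments cheap.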

Feature Engineering

Target Encoding with K-Fold
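nyaggle provides a transformer for this (see the documentation for the actual API). To illustrate the idea only, here is a from-scratch sketch of K-fold target encoding; the function name and details are hypothetical. Each row's category is replaced by the target mean computed on out-of-fold rows, so a row's own label never leaks into its feature:

```python
import numpy as np

def kfold_target_encode(cat, target, n_splits=5, seed=42):
    """Encode each category by the mean target over out-of-fold rows only."""
    cat = np.asarray(cat)
    target = np.asarray(target, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(cat)), n_splits)
    global_mean = target.mean()  # fallback for categories unseen outside the fold
    encoded = np.empty(len(cat))
    for fold in folds:
        mask = np.ones(len(cat), dtype=bool)
        mask[fold] = False  # "training" part = everything outside this fold
        means = {c: target[mask][cat[mask] == c].mean()
                 for c in np.unique(cat[mask])}
        encoded[fold] = [means.get(c, global_mean) for c in cat[fold]]
    return encoded

cat = ["a", "a", "b", "b", "a", "b"]
y = [1, 0, 1, 1, 1, 0]
enc = kfold_target_encode(cat, y, n_splits=3)
```

Without the fold split, each row would see its own label through the category mean, which inflates validation scores.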

Text Vectorization using BERT

You need to install PyTorch in your virtual environment to use BertSentenceVectorizer. MeCab and mecab-python3 are also required if you use a Japanese BERT model.

Model Validation

cross_validate() is a handy API that computes K-fold CV scores, out-of-fold predictions, and test predictions in one call. You can pass LGBMClassifier/LGBMRegressor as well as any other sklearn-compatible model.
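cross_validate()'s actual signature is in the documentation. To show the mechanics such a helper wraps (per-fold fit, OOF assembly, fold-averaged test prediction), here is a minimal hand-rolled version with a toy mean-predicting model; all names below are hypothetical, not nyaggle's API:

```python
import numpy as np

class MeanRegressor:
    """Toy sklearn-style model: always predicts the training-set mean."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

def cross_validate_oof(model_factory, X, y, X_test, n_splits=3):
    """Return per-fold MSE scores, OOF predictions, and the test
    prediction averaged over the fold models."""
    fold_ids = np.arange(len(X)) % n_splits
    oof = np.zeros(len(X))
    test_pred = np.zeros(len(X_test))
    scores = []
    for k in range(n_splits):
        val = fold_ids == k
        model = model_factory().fit(X[~val], y[~val])
        oof[val] = model.predict(X[val])           # predict only held-out rows
        test_pred += model.predict(X_test) / n_splits  # average across folds
        scores.append(float(np.mean((oof[val] - y[val]) ** 2)))
    return scores, oof, test_pred

X = np.arange(6).reshape(-1, 1)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
scores, oof, test_pred = cross_validate_oof(MeanRegressor, X, y, X_test=np.zeros((2, 1)))
```

The OOF vector lines up row-for-row with the training data, which is what makes it directly usable for stacking and blending.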