nyaggle


nyaggle is a utility library for Kaggle and offline competitions, particularly focused on feature engineering, model validation, and experiment tracking. See the documentation for details.

  • Feature Engineering
    • K-Fold Target Encoding
    • BERT Sentence Vectorization
  • Model Validation
    • CV with OOF
    • Adversarial Validation
    • sklearn-compatible time series splitter
  • Experiment
    • Experiment logging
    • High-level API for logging gradient boosting experiment
  • Ensemble
    • Blending

Installation

You can install nyaggle via pip:

$ pip install nyaggle

Examples

Experiment Logging

experiment_gbdt() is a high-level API for cross validation with a gradient boosting algorithm. It outputs parameters, metrics, out-of-fold predictions, test predictions, feature importance, and submission.csv under the specified directory.

It can be combined with mlflow tracking.
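The artifact layout it produces can be illustrated with a plain-Python logger (a minimal stdlib sketch of the idea only; the helper and file names here are assumptions, not nyaggle's actual implementation):

```python
import json
from pathlib import Path

def log_experiment(directory, params, metrics, oof, test_pred):
    """Persist one experiment's artifacts under a directory,
    similar in spirit to what experiment_gbdt() writes out."""
    d = Path(directory)
    d.mkdir(parents=True, exist_ok=True)
    (d / "params.json").write_text(json.dumps(params, indent=2))
    (d / "metrics.json").write_text(json.dumps(metrics, indent=2))
    (d / "oof_prediction.json").write_text(json.dumps(list(oof)))
    (d / "test_prediction.json").write_text(json.dumps(list(test_pred)))

log_experiment("my_experiment",
               params={"max_depth": 8, "learning_rate": 0.1},
               metrics={"auc": 0.91},
               oof=[0.2, 0.8, 0.5],
               test_pred=[0.4, 0.6])
```

Keeping every run in its own directory like this is what makes later comparison (or syncing to an mlflow tracking server) straightforward.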

Feature Engineering

Target Encoding with K-Fold
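The technique itself can be sketched from scratch with pandas and scikit-learn (an illustration of K-fold target encoding, not nyaggle's implementation; the helper name is invented):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def kfold_target_encode(train_col, target, n_splits=5, seed=0):
    """Encode each row with the target mean computed on the other folds,
    so a row never leaks its own target value into its feature."""
    encoded = pd.Series(np.nan, index=train_col.index, dtype=float)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, enc_idx in kf.split(train_col):
        # per-category target means estimated on the fitting folds only
        means = target.iloc[fit_idx].groupby(train_col.iloc[fit_idx]).mean()
        encoded.iloc[enc_idx] = train_col.iloc[enc_idx].map(means).to_numpy()
    # categories unseen in a fitting fold fall back to the global mean
    return encoded.fillna(target.mean())

df = pd.DataFrame({'city': ['a', 'a', 'b', 'b', 'a', 'b'] * 5,
                   'y':    [1, 0, 1, 1, 0, 1] * 5})
df['city_te'] = kfold_target_encode(df['city'], df['y'])
```

The out-of-fold averaging is the point: encoding with the full-data target mean would leak the label into the feature.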

Text Vectorization using BERT

You need to install PyTorch in your virtual environment to use BertSentenceVectorizer. MeCab and mecab-python3 are also required if you use a Japanese BERT model.
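Usage looks roughly like the following (a hedged sketch; the import path and parameter names are assumptions based on the project's examples, so check the documentation for the exact signature):

```python
import pandas as pd
from nyaggle.feature.nlp import BertSentenceVectorizer

df = pd.DataFrame({'title': ['nyaggle is a utility library',
                             'BERT encodes whole sentences']})

# Turn each text column into a fixed-length BERT sentence embedding
bv = BertSentenceVectorizer(text_columns=['title'])
text_vector = bv.fit_transform(df)
```

Note that the first call downloads the pretrained BERT weights, so expect it to be slow and to require network access.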

Model Validation

cross_validate() is a handy API that computes K-fold CV scores, out-of-fold predictions, and test predictions in one call. You can pass LGBMClassifier/LGBMRegressor or any other sklearn-compatible model.
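The OOF/test-prediction loop that such an API wraps can be sketched with plain scikit-learn (an illustration of the pattern, not nyaggle's code; a linear model stands in for LightGBM to keep it self-contained):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test = X[:150], X[150:]
y_train = y[:150]

oof = np.zeros(len(X_train))       # out-of-fold predictions for the train set
test_pred = np.zeros(len(X_test))  # test predictions averaged over folds

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for trn_idx, val_idx in skf.split(X_train, y_train):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train[trn_idx], y_train[trn_idx])
    # each train row is predicted by the one model that never saw it
    oof[val_idx] = model.predict_proba(X_train[val_idx])[:, 1]
    test_pred += model.predict_proba(X_test)[:, 1] / skf.n_splits

print('CV AUC:', roc_auc_score(y_train, oof))
```

The OOF array doubles as an unbiased CV score and as a ready-made input for stacking or blending.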