A utility library for Kaggle and offline competitions, particularly focused on experiment tracking, feature engineering, and validation.
Taiga Noumi a8e8f48145 add option to capturing stdout 2020-02-27 23:15:13 +09:00

nyaggle

Documentation | Slide (Japanese)

nyaggle is a utility library for Kaggle and offline competitions, particularly focused on experiment tracking, feature engineering and validation.

  • nyaggle.ensemble - Averaging & stacking
  • nyaggle.experiment - Experiment tracking
  • nyaggle.feature_store - Lightweight feature storage using feather-format
  • nyaggle.features - sklearn-compatible features
  • nyaggle.hyper_parameters - Collection of GBDT hyper-parameters used in past Kaggle competitions
  • nyaggle.validation - Adversarial validation & sklearn-compatible CV splitters

Installation

You can install nyaggle via pip:

$ pip install nyaggle

Examples

Experiment Tracking

run_experiment() is a high-level API for running experiments with cross-validation. It writes parameters, metrics, out-of-fold predictions, test predictions, feature importance and submission.csv to the specified directory.

It can be combined with mlflow tracking.
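
To make "out-of-fold predictions" and "test predictions" concrete, here is a minimal, library-independent sketch of the cross-validation loop such an experiment runs. It is not nyaggle's implementation: `kfold_indices`, `MeanRegressor` and `cross_validate` are hypothetical names, and the toy model simply predicts the training mean.

```python
from statistics import mean


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds."""
    sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
             for i in range(n_splits)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


class MeanRegressor:
    """Toy stand-in model (an assumption): predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def cross_validate(X_train, y, X_test, n_splits=3):
    oof = [0.0] * len(X_train)         # one prediction per training row
    test_pred = [0.0] * len(X_test)    # averaged over the fold models
    fold_scores = []
    for train_idx, valid_idx in kfold_indices(len(X_train), n_splits):
        model = MeanRegressor().fit([X_train[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        valid_pred = model.predict([X_train[i] for i in valid_idx])
        # Each row is predicted by a model that never saw it during fitting.
        for i, p in zip(valid_idx, valid_pred):
            oof[i] = p
        # Mean absolute error on the held-out fold.
        fold_scores.append(mean(abs(y[i] - p)
                                for i, p in zip(valid_idx, valid_pred)))
        for j, p in enumerate(model.predict(X_test)):
            test_pred[j] += p / n_splits
    return {"oof": oof, "test": test_pred, "fold_scores": fold_scores}


X_train = [[0], [1], [2], [3], [4], [5]]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X_test = [[6], [7]]
result = cross_validate(X_train, y, X_test, n_splits=3)
```

The out-of-fold list has one entry per training row, which is what makes it usable later for stacking or for estimating a CV score on the full training set.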

nyaggle also has a low-level API with an interface similar to mlflow tracking and wandb.
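
As a rough illustration of that interface style, the sketch below shows a context manager that collects parameters and metrics and writes them to JSON files on exit. `MiniExperiment` is a hypothetical class written for this example, not nyaggle's API; only the method names `log_param` and `log_metric` are borrowed from mlflow, and the JSON file layout is an assumption.

```python
import json
import os
import tempfile


class MiniExperiment:
    """Hypothetical tracker: buffers params/metrics, writes JSON on exit."""

    def __init__(self, logging_directory):
        self.logging_directory = logging_directory
        self.params = {}
        self.metrics = {}

    def __enter__(self):
        os.makedirs(self.logging_directory, exist_ok=True)
        return self

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

    def __exit__(self, exc_type, exc_value, traceback):
        # Persist whatever was logged, even if the body raised.
        for name, payload in (("params", self.params),
                              ("metrics", self.metrics)):
            path = os.path.join(self.logging_directory, f"{name}.json")
            with open(path, "w") as f:
                json.dump(payload, f, indent=2)
        return False  # do not swallow exceptions


output_dir = os.path.join(tempfile.mkdtemp(), "exp-001")
with MiniExperiment(output_dir) as exp:
    exp.log_param("learning_rate", 0.01)
    exp.log_metric("cv_score", 0.85)

with open(os.path.join(output_dir, "metrics.json")) as f:
    saved_metrics = json.load(f)
```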

Feature Engineering

Target Encoding with K-Fold
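
The idea behind K-fold target encoding is that each row's category is replaced by the mean target computed only from rows in the other folds, so a row never sees its own label. The following is a dependency-free sketch of that idea, not nyaggle's encoder: the function name `kfold_target_encode` and the interleaved fold assignment (`i % n_splits`) are assumptions made to keep the example short.

```python
from collections import defaultdict


def kfold_target_encode(categories, targets, n_splits=2):
    """Return out-of-fold target means for each row's category."""
    n = len(categories)
    global_mean = sum(targets) / n
    # Fallback for categories that never appear in the fitting folds.
    encoded = [global_mean] * n
    for fold in range(n_splits):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Accumulate target statistics from every row outside this fold.
        for i in range(n):
            if i % n_splits != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
        # Encode the rows inside this fold using those statistics only.
        for i in range(n):
            if i % n_splits == fold and counts.get(categories[i], 0) > 0:
                encoded[i] = sums[categories[i]] / counts[categories[i]]
    return encoded


categories = ["a", "a", "b", "b", "a", "b"]
targets = [1, 0, 1, 1, 1, 0]
encoded = kfold_target_encode(categories, targets, n_splits=2)
```

Note that the first row (category "a", target 1) is encoded from the single "a" row in the other fold, whose target is 0. A plain, non-folded target encoding would have leaked the row's own label into its feature.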

Text Vectorization using BERT

BertSentenceVectorizer requires PyTorch to be installed in your virtual environment. MeCab and mecab-python3 are also required if you use a Japanese BERT model.

Adversarial Validation
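
Adversarial validation labels training rows 0 and test rows 1, fits a classifier to tell them apart, and reads the cross-validated AUC: a value near 0.5 suggests the two sets look alike, while a value near 1.0 suggests a distribution shift. The sketch below illustrates the technique with the standard library only. It is not nyaggle's implementation: a simple nearest-centroid score stands in for the gradient-boosting model typically used, and all function names are hypothetical.

```python
import random


def auc(labels, scores):
    """ROC AUC by pairwise comparison; ties count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adversarial_auc(train_rows, test_rows, n_splits=5, seed=0):
    rows = train_rows + test_rows
    labels = [0] * len(train_rows) + [1] * len(test_rows)  # 1 = "is test"
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    scores = [0.0] * len(rows)
    for fold in range(n_splits):
        valid = order[fold::n_splits]
        valid_set = set(valid)
        fit = [i for i in order if i not in valid_set]
        c_train = centroid([rows[i] for i in fit if labels[i] == 0])
        c_test = centroid([rows[i] for i in fit if labels[i] == 1])
        # Higher score = closer to the test centroid than the train centroid.
        for i in valid:
            scores[i] = sq_dist(rows[i], c_train) - sq_dist(rows[i], c_test)
    return auc(labels, scores)


rng = random.Random(42)
train = [[rng.gauss(0, 1)] for _ in range(200)]
similar_test = [[rng.gauss(0, 1)] for _ in range(200)]
shifted_test = [[rng.gauss(3, 1)] for _ in range(200)]

auc_similar = adversarial_auc(train, similar_test)  # expected to sit near 0.5
auc_shifted = adversarial_auc(train, shifted_test)  # expected to be close to 1
```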

Validation Splitters

nyaggle provides a set of validation splitters that are compatible with the sklearn interface.
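
In practice, "compatible with the sklearn interface" means an object that exposes `split(X, y=None, groups=None)`, yielding pairs of train and validation indices, and `get_n_splits(X=None, y=None, groups=None)`. As a minimal sketch of that contract, here is a hypothetical `ExpandingWindowSplit` for time-ordered data. It is written for this example and is not one of nyaggle's splitters; it yields plain lists rather than the NumPy arrays sklearn's own splitters produce, to stay dependency-free.

```python
class ExpandingWindowSplit:
    """Hypothetical splitter: train on all rows before each validation block."""

    def __init__(self, n_splits=3):
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        block = n_samples // (self.n_splits + 1)
        if block == 0:
            raise ValueError("not enough samples for the requested n_splits")
        for k in range(1, self.n_splits + 1):
            train_idx = list(range(0, k * block))
            # The last validation block absorbs any leftover rows.
            stop = n_samples if k == self.n_splits else (k + 1) * block
            valid_idx = list(range(k * block, stop))
            yield train_idx, valid_idx


X = list(range(10))
folds = list(ExpandingWindowSplit(n_splits=3).split(X))
```

Because validation rows always come after the training rows, no fold trains on data from the future of its validation block.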

Other Awesome Repositories

Here is a list of awesome repositories that provide general utility functions for data science competitions. Please let me know if you have another one :)