
nyaggle


Documentation | Slide (Japanese)

nyaggle is a utility library for Kaggle and offline competitions, particularly focused on experiment tracking, feature engineering and validation.

  • nyaggle.ensemble - Averaging & stacking
  • nyaggle.experiment - Experiment tracking
  • nyaggle.feature_store - Lightweight feature storage using feather-format
  • nyaggle.features - sklearn-compatible features
  • nyaggle.hyper_parameters - Collection of GBDT hyper-parameters used in past Kaggle competitions
  • nyaggle.validation - Adversarial validation & sklearn-compatible CV splitters

Installation

You can install nyaggle via pip:

$ pip install nyaggle

Examples

Experiment Tracking

run_experiment() is a high-level API for running experiments with cross-validation. It writes parameters, metrics, out-of-fold predictions, test predictions, feature importance and submission.csv to the specified directory.

It can be combined with mlflow tracking.
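To make the outputs listed above concrete, here is a minimal, library-free sketch of the kind of cross-validation loop a high-level API such as run_experiment() automates. It does not call nyaggle: `k_fold_indices`, `run_cv` and the mean-predicting "model" are hypothetical stand-ins chosen only to show how out-of-fold predictions, fold-averaged test predictions and per-fold metrics come about.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


def run_cv(y_train, n_test, n_splits=3):
    """Hypothetical helper: cross-validate a baseline that predicts the mean."""
    oof = [0.0] * len(y_train)       # out-of-fold predictions
    test_pred = [0.0] * n_test       # test predictions averaged over folds
    fold_mse = []                    # one metric value per fold
    for train_idx, valid_idx in k_fold_indices(len(y_train), n_splits):
        # "Fit" the stand-in model on the training part of the fold.
        prediction = sum(y_train[i] for i in train_idx) / len(train_idx)
        # Each training row is predicted by a model that never saw it.
        for i in valid_idx:
            oof[i] = prediction
        for j in range(n_test):
            test_pred[j] += prediction / n_splits
        errors = [(y_train[i] - prediction) ** 2 for i in valid_idx]
        fold_mse.append(sum(errors) / len(errors))
    return {"oof_prediction": oof, "test_prediction": test_pred, "metrics": fold_mse}


result = run_cv([1, 2, 3, 4, 5, 6], n_test=2, n_splits=3)
```

A real experiment would swap the mean predictor for a GBDT or other estimator and persist `result` to disk; the bookkeeping stays the same.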

nyaggle also has a low-level API with an interface similar to mlflow tracking and wandb.
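The shape of such a low-level, mlflow-like interface can be illustrated with a small stand-in. `MiniExperiment`, its method names and the `params.json` / `metrics.json` file names are assumptions made for this sketch; it is not nyaggle's class and only uses the standard library.

```python
import json
import os
import tempfile


class MiniExperiment:
    """Hypothetical tracker: collects params and metrics, writes them as JSON."""

    def __init__(self, logging_directory):
        self.logging_directory = logging_directory
        self.params = {}
        self.metrics = {}

    def __enter__(self):
        os.makedirs(self.logging_directory, exist_ok=True)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Persist everything when the block ends, even if it raised.
        self._dump("params.json", self.params)
        self._dump("metrics.json", self.metrics)
        return False  # do not swallow exceptions

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, name, score):
        self.metrics[name] = score

    def _dump(self, filename, payload):
        path = os.path.join(self.logging_directory, filename)
        with open(path, "w") as f:
            json.dump(payload, f, indent=2)


def demo():
    """Log one parameter and one metric, then read the metrics file back."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = os.path.join(tmp, "output")
        with MiniExperiment(out_dir) as exp:
            exp.log_param("lr", 0.01)
            exp.log_metric("CV", 0.85)
        with open(os.path.join(out_dir, "metrics.json")) as f:
            return json.load(f)
```

Using a context manager means the logs are flushed at a single, predictable point, which is the pattern mlflow-style trackers generally follow.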

Feature Engineering

Target Encoding with K-Fold
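The idea behind K-fold target encoding is that each row's category is replaced by the mean target computed on the other folds only, so a row never sees its own label. The sketch below shows that idea in plain Python; `target_encode_oof`, the interleaved fold assignment and the fallback to the training-fold mean for unseen categories are illustrative assumptions, not nyaggle's implementation.

```python
def target_encode_oof(categories, targets, n_splits=2):
    """Out-of-fold target encoding with interleaved folds (row i -> fold i % n_splits)."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_splits):
        train_idx = [i for i in range(n) if i % n_splits != fold]
        valid_idx = [i for i in range(n) if i % n_splits == fold]
        # Per-category target sums and counts, from the training folds only.
        sums, counts = {}, {}
        for i in train_idx:
            c = categories[i]
            sums[c] = sums.get(c, 0.0) + targets[i]
            counts[c] = counts.get(c, 0) + 1
        # Categories unseen in the training folds fall back to the overall mean.
        fallback = sum(targets[i] for i in train_idx) / len(train_idx)
        for i in valid_idx:
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if c in counts else fallback
    return encoded


encoded = target_encode_oof(["a", "a", "b", "b", "a", "c"], [1, 0, 1, 1, 0, 1])
```

In practice the folds would come from a shuffled or stratified splitter, and smoothing is often added for rare categories.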

Text Vectorization using BERT

You need to install PyTorch in your virtual environment to use BertSentenceVectorizer. MeCab and mecab-python3 are also required if you use the Japanese BERT model.

Adversarial Validation
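Adversarial validation labels training rows 0 and test rows 1, fits a classifier to tell them apart, and reads the resulting AUC: a value near 0.5 suggests the two sets look alike, while a value near 1.0 signals a distribution shift. The sketch below is a deliberately reduced, single-feature version that uses the raw feature value as the classifier score. The function names are hypothetical, and a real run would normally train a cross-validated model on all features.

```python
def auc(labels, scores):
    """ROC AUC by pairwise comparison; ties count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def adversarial_auc(train_column, test_column):
    """Score how well one feature separates train (label 0) from test (label 1)."""
    labels = [0] * len(train_column) + [1] * len(test_column)
    scores = list(train_column) + list(test_column)
    raw = auc(labels, scores)
    # The direction of the shift does not matter, only its strength.
    return max(raw, 1.0 - raw)


same = adversarial_auc([1, 2, 3, 4], [1, 2, 3, 4])
shifted = adversarial_auc([1, 2, 3, 4], [5, 6, 7, 8])
```

Features with a high per-feature score are candidates for removal or transformation before building the validation scheme.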

Validation Splitters

nyaggle provides a set of validation splitters that are compatible with the sklearn interface.
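"Compatible with the sklearn interface" here means, roughly, exposing `split(X, y, groups)` that yields (train indices, test indices) pairs and `get_n_splits(...)` that reports the number of folds. The class below is a hypothetical example of that protocol, not one of nyaggle's splitters. It yields plain lists to stay dependency-free, whereas sklearn's own splitters yield NumPy index arrays.

```python
class ExpandingWindowSplit:
    """Hypothetical time-ordered splitter: train on the past, test on the next block."""

    def __init__(self, n_splits=3):
        if n_splits < 1:
            raise ValueError("n_splits must be at least 1")
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        fold_size = n_samples // (self.n_splits + 1)
        if fold_size == 0:
            raise ValueError("too few samples for the requested number of splits")
        for k in range(1, self.n_splits + 1):
            train_end = k * fold_size
            # The last fold absorbs any leftover rows.
            test_end = n_samples if k == self.n_splits else train_end + fold_size
            yield list(range(train_end)), list(range(train_end, test_end))


folds = list(ExpandingWindowSplit(n_splits=3).split(list(range(8))))
```

Because the object follows the splitter protocol, it could in principle be passed wherever sklearn accepts a `cv` argument.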

Other Awesome Repositories

Here is a list of awesome repositories that provide general utility functions for data science competitions. Please let me know if you have another one :)