It is a utility library for Kaggle and offline competitions, with a particular focus on experiment tracking, feature engineering, and validation.
Taiga Noumi 174cc19b53 change params/metrics to json format, replace overwrite to if_exists 2020-02-20 23:42:04 +09:00

nyaggle


nyaggle is a utility library for Kaggle and offline competitions, particularly focused on experiment logging, feature engineering and validation.

Installation

You can install nyaggle via pip:

$ pip install nyaggle

Examples

Experiment Logging

run_experiment() is a high-level API for running experiments with cross-validation. It writes parameters, metrics, out-of-fold predictions, test predictions, feature importance, and submission.csv to the specified directory.

It can also be combined with MLflow tracking.
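
To make the output layout more concrete, here is a minimal standard-library sketch of the general idea: run K-fold cross-validation, then write parameters, metrics, and out-of-fold predictions to a directory. It is not nyaggle's implementation. The function name `log_cv_experiment`, the file names, and the toy mean-predictor "model" are all assumptions chosen for illustration.

```python
import csv
import json
import tempfile
from pathlib import Path
from statistics import mean


def log_cv_experiment(params, y, n_splits, out_dir):
    """Run a toy K-fold CV and write params, metrics and OOF predictions to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    oof = [None] * len(y)  # one out-of-fold prediction per training row
    fold_scores = []
    for fold in range(n_splits):
        valid_idx = [i for i in range(len(y)) if i % n_splits == fold]
        train_idx = [i for i in range(len(y)) if i % n_splits != fold]
        # Toy "model" (an assumption): predict the mean target of the training folds.
        pred = mean(y[i] for i in train_idx)
        for i in valid_idx:
            oof[i] = pred
        fold_scores.append(mean((y[i] - pred) ** 2 for i in valid_idx))

    metrics = {"fold_mse": fold_scores, "mean_fold_mse": mean(fold_scores)}
    (out / "params.json").write_text(json.dumps(params, indent=2))
    (out / "metrics.json").write_text(json.dumps(metrics, indent=2))
    with open(out / "oof_prediction.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["index", "prediction"])
        writer.writerows(enumerate(oof))
    return metrics


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
out_dir = Path(tempfile.mkdtemp()) / "experiment"
metrics = log_cv_experiment({"n_splits": 3}, y, n_splits=3, out_dir=out_dir)
print(sorted(p.name for p in out_dir.iterdir()))
```

Keeping every artifact of a run in one directory makes it easy to compare runs later or to hand the same files to a tracking tool.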

Feature Engineering

Target Encoding with K-Fold
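
As a rough illustration of the technique itself (not of nyaggle's encoder), the sketch below computes out-of-fold target means in plain Python: each row is encoded with the mean target of its category, calculated only from rows in the other folds, so a row's own label never leaks into its feature. The function name `kfold_target_encode`, the modulo-based fold assignment, and the fallback to the training-fold mean for unseen categories are assumptions.

```python
from collections import defaultdict


def kfold_target_encode(categories, targets, n_splits=2):
    """Out-of-fold mean target encoding; unseen categories get the training-fold mean."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_splits):
        # Simple interleaved folds (an assumption; a real splitter would be passed in).
        train_idx = [i for i in range(n) if i % n_splits != fold]
        valid_idx = [i for i in range(n) if i % n_splits == fold]

        # Per-category target sums and counts, computed on the training folds only.
        sums, counts = defaultdict(float), defaultdict(int)
        for i in train_idx:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1
        fallback = sum(targets[i] for i in train_idx) / len(train_idx)

        # Encode the held-out fold with statistics it did not contribute to.
        for i in valid_idx:
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if c in counts else fallback
    return encoded


categories = ["a", "a", "b", "b", "a", "b"]
targets = [1, 0, 1, 1, 0, 1]
print(kfold_target_encode(categories, targets, n_splits=2))
```

Note that the first row has target 1 but is encoded from the other fold's "a" rows only, which is the point of doing the encoding fold by fold.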

Text Vectorization using BERT

You need to install PyTorch in your virtual environment to use BertSentenceVectorizer. MeCab and mecab-python3 are also required if you use the Japanese BERT model.

Adversarial Validation
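
Adversarial validation labels each row by its origin (train = 0, test = 1) and measures how well the two sets can be told apart; an AUC near 0.5 suggests similar distributions, while an AUC near 0 or 1 signals drift. This is normally done by training a classifier on all features. The dependency-free sketch below is a deliberately simplified, per-feature variant that uses each feature's raw value as the score. It is not nyaggle's implementation, and the names `separation_auc` and `adversarial_report` are hypothetical.

```python
def separation_auc(train_values, test_values):
    """ROC AUC for telling test rows (label 1) from train rows (label 0) by one feature."""
    wins = 0.0
    for t in test_values:
        for r in train_values:
            if t > r:
                wins += 1.0
            elif t == r:
                wins += 0.5  # ties count as half a win
    return wins / (len(train_values) * len(test_values))


def adversarial_report(train, test):
    """Map each feature name to its train-vs-test AUC, most drifted feature first."""
    scores = {name: separation_auc(train[name], test[name]) for name in train}
    return dict(sorted(scores.items(), key=lambda kv: abs(kv[1] - 0.5), reverse=True))


# Toy data (made up): "year" separates the two sets perfectly, "age" barely does.
train = {"age": [20, 30, 40, 50], "year": [2015, 2016, 2017, 2018]}
test = {"age": [25, 35, 45, 55], "year": [2019, 2020, 2021, 2022]}
report = adversarial_report(train, test)
print(report)
```

Features at the top of such a report are candidates for removal or transformation, because a model can use them to distinguish the test period from the training period.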

Validation Splitters

nyaggle provides a set of validation splitters that are compatible with the scikit-learn interface.
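
In practice, being compatible with the scikit-learn interface means exposing `split(X, y=None, groups=None)`, which yields pairs of train and validation indices, and `get_n_splits()`. The class below is a hypothetical example written for this illustration, not one of nyaggle's splitters. It assumes rows are already in time order and uses plain lists where scikit-learn's own splitters return NumPy arrays.

```python
class ExpandingWindowSplit:
    """Hypothetical splitter: train on the earliest rows, validate on the next block."""

    def __init__(self, n_splits=3):
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        n = len(X)
        block = n // (self.n_splits + 1)
        for k in range(1, self.n_splits + 1):
            train_idx = list(range(0, k * block))
            # The last fold absorbs any leftover rows at the end.
            valid_end = n if k == self.n_splits else (k + 1) * block
            yield train_idx, list(range(k * block, valid_end))


X = list(range(8))  # eight rows, assumed to be sorted by time
folds = list(ExpandingWindowSplit(n_splits=3).split(X))
for train_idx, valid_idx in folds:
    print(train_idx, valid_idx)
```

An object with these two methods can generally be passed wherever scikit-learn accepts a `cv` argument, which is what makes custom splitters interchangeable with the built-in ones.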

Other Awesome Repositories

Here is a list of awesome repositories that provide general utility functions for data science competitions. Please let me know if you have another one :)