nyaggle

Documentation | Slide (Japanese)

nyaggle is a utility library for Kaggle and offline competitions. It focuses on experiment tracking, feature engineering, and validation.

  • nyaggle.ensemble - Averaging & stacking
  • nyaggle.experiment - Experiment tracking
  • nyaggle.feature_store - Lightweight feature storage using feather-format
  • nyaggle.features - sklearn-compatible features
  • nyaggle.hyper_parameters - Collection of GBDT hyper-parameters used in past Kaggle competitions
  • nyaggle.validation - Adversarial validation & sklearn-compatible CV splitters

Installation

You can install nyaggle via pip by running `pip install nyaggle`.

Examples

Experiment Tracking

run_experiment() is a high-level API for experiments with cross-validation. It writes parameters, metrics, out-of-fold predictions, test predictions, feature importance, and submission.csv to the specified directory.

To enable mlflow tracking, include the optional with_mlflow=True parameter.
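
A call to the high-level API described above might look like the sketch below. Only the function name run_experiment(), its package (nyaggle.experiment), and the with_mlflow flag come from this README. The argument order (parameters, training features, target, test features), the logging_directory keyword, the wrapper name run_lgbm_experiment, and the LightGBM parameter values are assumptions for illustration, so check them against the documentation before relying on them.

```python
def run_lgbm_experiment(X_train, y, X_test, output_dir="output/baseline", with_mlflow=False):
    """Run one cross-validated experiment and return whatever run_experiment returns."""
    # Imported lazily so this snippet can be loaded where nyaggle is not installed.
    from nyaggle.experiment import run_experiment

    # Hypothetical LightGBM parameters; tune them for your competition.
    lightgbm_params = {"max_depth": 8}

    # ASSUMPTION: positional order and the `logging_directory` keyword.
    return run_experiment(
        lightgbm_params,
        X_train,
        y,
        X_test,
        logging_directory=output_dir,
        with_mlflow=with_mlflow,  # set True to also record the run in mlflow
    )
```

X_train and X_test are expected to be pandas DataFrames with the target column already removed, and y the target series.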

nyaggle also has a low-level API with an interface similar to mlflow tracking and wandb.
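
As a rough sketch of the low-level API, the snippet below logs one parameter, one metric, and a note from a hand-written training loop. It assumes that nyaggle.experiment exposes an Experiment context manager that takes a logging_directory argument and offers mlflow-style log_param() and log_metric() methods plus a free-text log(). Those names are assumptions, and log_manual_run is a hypothetical helper, so verify them against the documentation.

```python
def log_manual_run(output_dir, learning_rate, cv_score):
    """Record one parameter, one metric, and a note for a custom training run."""
    # Imported lazily so this snippet can be loaded where nyaggle is not installed.
    from nyaggle.experiment import Experiment

    # ASSUMPTION: Experiment is a context manager with mlflow-style logging methods.
    with Experiment(logging_directory=output_dir) as exp:
        exp.log_param("lr", learning_rate)   # key-value parameter
        exp.log_metric("CV", cv_score)       # numeric metric
        exp.log("trained with a custom loop")  # free-form text
```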

Feature Engineering

Target Encoding with K-Fold
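
To show the idea behind K-fold target encoding, here is a dependency-free sketch. It is not nyaggle's implementation: kfold_target_encode is a hypothetical helper, the folds are simple interleaved index groups, and unseen categories fall back to the mean of the training folds. Each row is encoded with the target mean of its category computed only from the other folds, so a row never sees its own label.

```python
from collections import defaultdict


def kfold_target_encode(categories, targets, n_splits=3):
    """Out-of-fold mean target encoding for a single categorical column."""
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_splits):
        # Accumulate per-category statistics on rows outside the held-out fold.
        sums = defaultdict(float)
        counts = defaultdict(int)
        total, seen = 0.0, 0
        for i in range(n):
            if i % n_splits != fold:
                sums[categories[i]] += targets[i]
                counts[categories[i]] += 1
                total += targets[i]
                seen += 1
        fallback = total / seen  # used for categories absent from the training folds
        # Encode the held-out rows (every n_splits-th row starting at `fold`).
        for i in range(fold, n, n_splits):
            c = categories[i]
            encoded[i] = sums[c] / counts[c] if counts[c] else fallback
    return encoded


cats = ["a", "a", "a", "b", "b", "c"]
ys = [1, 0, 1, 1, 0, 1]
print(kfold_target_encode(cats, ys))
```

The same procedure applies to test data by fitting the category means on the full training set instead of on folds.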

Text Vectorization using BERT

You need to install PyTorch in your virtual environment to use BertSentenceVectorizer. MeCab and mecab-python3 are also required if you use the Japanese BERT model.
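
A small preflight check for the requirements above could look like this. It is only a sketch: missing_bert_dependencies is a hypothetical helper, it relies on PyTorch being importable as torch and mecab-python3 as MeCab, and it cannot detect whether the MeCab binary itself is installed. Other packages the vectorizer may need are not covered.

```python
import importlib.util


def missing_bert_dependencies(japanese=False):
    """Return the import names from the note above that are not installed."""
    # PyTorch is imported as `torch`; mecab-python3 is imported as `MeCab`.
    required = ["torch"]
    if japanese:
        required.append("MeCab")
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in required if importlib.util.find_spec(name) is None]


missing = missing_bert_dependencies(japanese=True)
if missing:
    print("Install before using BertSentenceVectorizer:", ", ".join(missing))
```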

Adversarial Validation
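
Adversarial validation, in general, asks how easily a classifier can tell training rows from test rows: an AUC near 0.5 suggests similar distributions, while an AUC near 1.0 signals a shift. The sketch below illustrates that idea for a single numeric feature without any model, using the raw value as the score. It is a deliberate simplification, not nyaggle's implementation, and feature_shift_auc is a hypothetical name.

```python
def feature_shift_auc(train_values, test_values):
    """AUC of using the raw feature value as a score for 'this row is from test'."""
    wins = 0.0
    for t in test_values:
        for s in train_values:
            if t > s:
                wins += 1.0   # test row ranked above train row
            elif t == s:
                wins += 0.5   # ties count as half
    return wins / (len(train_values) * len(test_values))


same = feature_shift_auc([1, 2, 3], [1, 2, 3])      # identical distributions
shifted = feature_shift_auc([1, 2, 3], [2, 3, 4])   # test values shifted upward
print(same, shifted)
```

A full adversarial validation run would replace the raw value with the predictions of a classifier trained on all features, which also yields feature importances that point to the drifting columns.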

Validation Splitters

nyaggle provides a set of validation splitters that are compatible with sklearn.
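
Being compatible with sklearn here means following the splitter protocol that sklearn's cross-validation utilities expect: a split() method that yields (train indices, test indices) pairs and a get_n_splits() method. The class below is a minimal, dependency-free sketch of that protocol for time-ordered data. TimeWindowSplit and its constructor arguments are hypothetical and are not part of nyaggle.

```python
from datetime import date


class TimeWindowSplit:
    """Each fold trains on rows before `train_end` and tests on [train_end, test_end)."""

    def __init__(self, times, folds):
        self.times = list(times)   # one timestamp per row
        self.folds = list(folds)   # (train_end, test_end) pairs

    def get_n_splits(self, X=None, y=None, groups=None):
        return len(self.folds)

    def split(self, X=None, y=None, groups=None):
        for train_end, test_end in self.folds:
            train_idx = [i for i, t in enumerate(self.times) if t < train_end]
            test_idx = [i for i, t in enumerate(self.times) if train_end <= t < test_end]
            yield train_idx, test_idx


# Usage: two expanding-window folds over six daily rows.
days = [date(2019, 1, d) for d in range(1, 7)]
splitter = TimeWindowSplit(
    days,
    [(date(2019, 1, 3), date(2019, 1, 5)), (date(2019, 1, 5), date(2019, 1, 7))],
)
for train_idx, test_idx in splitter.split():
    print(train_idx, test_idx)
```

Keeping the test window strictly after the training rows avoids leaking future information into the model, which is the usual reason to prefer a time-aware splitter over a shuffled K-fold.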

Other Awesome Repositories

Here is a list of awesome repositories that provide general utility functions for data science competitions. Please let me know if you have another one :)