A utility library for Kaggle and offline competitions, particularly focused on experiment tracking, feature engineering, and validation.
Latest commit: 86a9db4375 by nyanp, 2023-07-22 23:15:41 +09:00. Merge pull request #113 from wakame1367/bugfix/lightgbm_v4 (Temporary Fix for Issue #112)

nyaggle

Documentation | Slide (Japanese)

nyaggle is a utility library for Kaggle and offline competitions. It is particularly focused on experiment tracking, feature engineering, and validation.

  • nyaggle.ensemble - Averaging & stacking
  • nyaggle.experiment - Experiment tracking
  • nyaggle.feature_store - Lightweight feature storage using the Feather format
  • nyaggle.features - sklearn-compatible features
  • nyaggle.hyper_parameters - Collection of GBDT hyper-parameters used in past Kaggle competitions
  • nyaggle.validation - Adversarial validation & sklearn-compatible CV splitters

Installation

You can install nyaggle via pip:

pip install nyaggle

Examples

Experiment Tracking

run_experiment() is a high-level API for experiments with cross-validation. It writes the parameters, metrics, out-of-fold predictions, test predictions, feature importance, and submission.csv to the specified directory.

To enable MLflow tracking, pass the optional with_mlflow=True parameter.
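
The core loop that run_experiment() automates can be illustrated with a minimal, standard-library sketch: split the training data into K folds, fit on K-1 folds, predict the held-out fold (which yields out-of-fold predictions), and average every fold model's test predictions. The names kfold_indices, run_cv and fit_mean are hypothetical, and the toy model simply predicts the training-target mean. This is not nyaggle's implementation, which trains real models and writes its outputs to disk.

```python
import math
from typing import Callable, List, Sequence, Tuple

Rows = Sequence[Sequence[float]]
# Hypothetical model interface: fit(X, y) returns a predict(X) function.
Predict = Callable[[Rows], List[float]]
Fit = Callable[[Rows, Sequence[float]], Predict]


def kfold_indices(n: int, n_splits: int) -> List[Tuple[List[int], List[int]]]:
    """Contiguous, unshuffled K-fold split of range(n) into (train_idx, valid_idx)."""
    folds = []
    start = 0
    for k in range(n_splits):
        # The first n % n_splits folds get one extra row.
        size = n // n_splits + (1 if k < n % n_splits else 0)
        valid = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        folds.append((train, valid))
        start += size
    return folds


def run_cv(fit: Fit, X_train: Rows, y_train: Sequence[float], X_test: Rows, n_splits: int = 5):
    """Return out-of-fold predictions, fold-averaged test predictions and OOF RMSE."""
    oof = [0.0] * len(X_train)
    test_sum = [0.0] * len(X_test)
    for train_idx, valid_idx in kfold_indices(len(X_train), n_splits):
        predict = fit([X_train[i] for i in train_idx], [y_train[i] for i in train_idx])
        # Each training row is predicted once, by the model that did not see it.
        for i, p in zip(valid_idx, predict([X_train[i] for i in valid_idx])):
            oof[i] = p
        # Every fold model also predicts the test set; predictions are averaged below.
        for j, p in enumerate(predict(X_test)):
            test_sum[j] += p
    test_pred = [s / n_splits for s in test_sum]
    rmse = math.sqrt(sum((y - p) ** 2 for y, p in zip(y_train, oof)) / len(y_train))
    return oof, test_pred, rmse


def fit_mean(X: Rows, y: Sequence[float]) -> Predict:
    """Toy baseline model: always predicts the mean of its training targets."""
    mean = sum(y) / len(y)
    return lambda X_new: [mean] * len(X_new)


if __name__ == "__main__":
    X = [[float(i)] for i in range(6)]
    y = [float(i) for i in range(6)]
    oof, test_pred, rmse = run_cv(fit_mean, X, y, X_test=[[0.0], [1.0]], n_splits=3)
    print(oof)        # [3.5, 3.5, 2.5, 2.5, 1.5, 1.5]
    print(test_pred)  # [2.5, 2.5]
    print(rmse)       # 2.5
```

Averaging the fold models' test predictions, rather than refitting once on all data, keeps the test predictions consistent with the models that produced the out-of-fold score.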

nyaggle also has a low-level API with an interface similar to MLflow Tracking and wandb.
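
To show what a logger with an MLflow-like interface looks like, here is a hypothetical MiniExperiment class (not nyaggle's own class) whose method names mirror MLflow's log_param/log_metric and which flushes everything to JSON files when its with block exits. The file names params.json, metrics.json and log.txt are assumptions of this sketch.

```python
import json
import tempfile
from pathlib import Path


class MiniExperiment:
    """Hypothetical, minimal experiment logger (not nyaggle's API).

    Parameters and metrics are kept in dicts and written to JSON files in
    `logging_directory` when the `with` block exits.
    """

    def __init__(self, logging_directory: str):
        self.dir = Path(logging_directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.params = {}
        self.metrics = {}

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Flush even if the block raised, so partial runs are still recorded.
        (self.dir / "params.json").write_text(json.dumps(self.params, indent=2))
        (self.dir / "metrics.json").write_text(json.dumps(self.metrics, indent=2))
        return False  # never swallow exceptions

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value

    def log(self, text: str):
        # Append free-form text to a log file, one message per line.
        with open(self.dir / "log.txt", "a", encoding="utf-8") as f:
            f.write(text + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with MiniExperiment(tmp) as exp:
            exp.log_param("lr", 0.01)
            exp.log_metric("CV", 0.85)
            exp.log("fold 0 done")
        print(json.loads((Path(tmp) / "metrics.json").read_text()))  # {'CV': 0.85}
```

Using a context manager means the results are persisted when the block ends, even if training fails partway through.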

Feature Engineering

Target Encoding with K-Fold
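
K-fold target encoding replaces each category with the mean target of that category, but computes the mean for each training row only from the other folds, so a row's own label never leaks into its feature; test rows are encoded with statistics from the full training set. Below is a minimal plain-Python sketch of the idea. kfold_target_encode is a hypothetical helper, not nyaggle's encoder, and it omits the smoothing toward the prior that practical encoders often add.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def _category_means(cats: Sequence[Hashable], ys: Sequence[float]) -> Tuple[Dict[Hashable, float], float]:
    """Mean target per category, plus the global mean used for unseen categories."""
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for c, y in zip(cats, ys):
        sums[c] += y
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}, sum(ys) / len(ys)


def kfold_target_encode(
    train_cats: Sequence[Hashable],
    train_y: Sequence[float],
    test_cats: Sequence[Hashable],
    folds: Sequence[Tuple[List[int], List[int]]],
) -> Tuple[List[float], List[float]]:
    """Out-of-fold encoding for train rows, full-training-set encoding for test rows.

    `folds` is a list of (fit_idx, encode_idx) pairs; every train row should appear
    in exactly one encode_idx so that its own target is never used to encode it.
    """
    train_enc = [0.0] * len(train_cats)
    for fit_idx, enc_idx in folds:
        means, prior = _category_means(
            [train_cats[i] for i in fit_idx], [train_y[i] for i in fit_idx]
        )
        for i in enc_idx:
            train_enc[i] = means.get(train_cats[i], prior)
    means, prior = _category_means(train_cats, train_y)
    test_enc = [means.get(c, prior) for c in test_cats]
    return train_enc, test_enc


if __name__ == "__main__":
    train_enc, test_enc = kfold_target_encode(
        ["a", "a", "b", "b"], [1.0, 0.0, 1.0, 1.0], ["a", "c"],
        folds=[([1, 3], [0, 2]), ([0, 2], [1, 3])],
    )
    print(train_enc)  # [0.0, 1.0, 1.0, 1.0]
    print(test_enc)   # [0.5, 0.75]  ("c" is unseen, so it gets the global mean)
```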

Text Vectorization using BERT

You need to install PyTorch in your virtual environment to use BertSentenceVectorizer. MeCab and mecab-python3 are also required if you use the Japanese BERT model.

Adversarial Validation
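
Adversarial validation labels training rows 0 and test rows 1 and checks how well a classifier can separate them. An AUC near 0.5 suggests the two sets come from similar distributions, while an AUC near 1.0 signals a shift. The sketch below is a deliberately simplified univariate variant that uses each raw feature value as the classifier score instead of training a model. The function names are hypothetical, and the pairwise AUC is O(n*m), so it only suits small examples.

```python
from typing import Dict, List, Sequence


def rank_auc(scores_neg: Sequence[float], scores_pos: Sequence[float]) -> float:
    """ROC AUC by pairwise comparison (Mann-Whitney U); ties count as 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def adversarial_auc_per_feature(
    train_rows: Sequence[Sequence[float]],
    test_rows: Sequence[Sequence[float]],
    feature_names: List[str],
) -> Dict[str, float]:
    """Univariate adversarial validation: train rows are class 0, test rows class 1."""
    result = {}
    for j, name in enumerate(feature_names):
        auc = rank_auc([r[j] for r in train_rows], [r[j] for r in test_rows])
        # Direction does not matter: a feature that is lower in test is just as telling.
        result[name] = max(auc, 1.0 - auc)
    return result


if __name__ == "__main__":
    train = [[1, 5], [2, 6], [3, 5]]
    test = [[7, 5], [8, 6], [9, 5]]
    print(adversarial_auc_per_feature(train, test, ["id", "x"]))
    # {'id': 1.0, 'x': 0.5}
```

Here the "id" column perfectly separates train from test (a typical leak or time-ordering artifact), while "x" has the same distribution in both sets.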

Validation Splitters

nyaggle provides a set of validation splitters that are compatible with sklearn.
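
scikit-learn should accept any object with split() and get_n_splits() methods as the cv argument of functions such as cross_validate. As an illustration of that protocol, here is a hypothetical interval-based time-series splitter and a wrapper that keeps only the first n folds of another splitter. Both class names are made up for this sketch, and the intervals are treated as half-open [start, end).

```python
from datetime import date
from typing import Iterator, List, Sequence, Tuple

Interval = Tuple[date, date]  # half-open: start <= t < end
Fold = Tuple[List[int], List[int]]


class IntervalTimeSeriesSplit:
    """Hypothetical sklearn-style splitter: each fold is a (train, test) pair of intervals."""

    def __init__(self, times: Sequence[date]):
        self.times = list(times)
        self.folds: List[Tuple[Interval, Interval]] = []

    def add_fold(self, train_interval: Interval, test_interval: Interval) -> None:
        self.folds.append((train_interval, test_interval))

    def get_n_splits(self, X=None, y=None, groups=None) -> int:
        return len(self.folds)

    def split(self, X=None, y=None, groups=None) -> Iterator[Fold]:
        for (tr_start, tr_end), (te_start, te_end) in self.folds:
            train_idx = [i for i, t in enumerate(self.times) if tr_start <= t < tr_end]
            test_idx = [i for i, t in enumerate(self.times) if te_start <= t < te_end]
            yield train_idx, test_idx


class TakeFirst:
    """Hypothetical wrapper: yield only the first n folds of another splitter."""

    def __init__(self, n: int, base):
        self.n = n
        self.base = base

    def get_n_splits(self, X=None, y=None, groups=None) -> int:
        return min(self.n, self.base.get_n_splits(X, y, groups))

    def split(self, X=None, y=None, groups=None) -> Iterator[Fold]:
        for k, fold in enumerate(self.base.split(X, y, groups)):
            if k >= self.n:
                break
            yield fold


if __name__ == "__main__":
    days = [date(2019, 1, d) for d in range(1, 11)]  # one row per day, Jan 1 to Jan 10
    ts = IntervalTimeSeriesSplit(days)
    ts.add_fold((date(2019, 1, 1), date(2019, 1, 5)), (date(2019, 1, 5), date(2019, 1, 7)))
    ts.add_fold((date(2019, 1, 3), date(2019, 1, 7)), (date(2019, 1, 7), date(2019, 1, 9)))
    for train_idx, test_idx in TakeFirst(1, ts).split():
        print(train_idx, test_idx)  # [0, 1, 2, 3] [4, 5]
```

Wrapping an existing splitter instead of subclassing it keeps the wrapper reusable with any object that follows the same split()/get_n_splits() protocol.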

Other Awesome Repositories

Here is a list of awesome repositories that provide general utility functions for data science competitions. Please let me know if you have another one :)