Commit Graph

195 Commits (master)

Author SHA1 Message Date
Andrej eba36e8464
Merge pull request #309 from ho2103/master
Fix AssertionError on macOS - need to check CUDA availability for bf16
2023-06-22 08:24:17 -07:00
o 1eaceae193 Fix AssertionError on macOS - need to check CUDA availability for bf16 2023-06-19 18:05:09 -04:00
Andrej 4eb7a96b07
Merge pull request #305 from okuvshynov/fix_osx_dataload
nanogpt: fix multiprocessing in load_dataset on os x
2023-06-17 20:26:35 -07:00
Oleksandr Kuvshynov 542ac51d1f nanogpt: fix multiprocessing in load_dataset on os x
The issue seems to be that _fixup_main_from_path in Python's multiprocessing
module is unable to find the entry point, so this change adds the guard
```
if __name__ == '__main__':
```
2023-06-17 20:35:38 -04:00
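As a rough illustration of the guard this commit describes, the sketch below uses the standard-library `multiprocessing.Pool` as a stand-in for the dataset loader's worker processes. The names `count_tokens` and `main` are hypothetical and not taken from the repository. Under the spawn start method (the default on macOS), each worker re-imports the main module, so work that starts processes should sit behind the guard rather than at module top level.

```python
import multiprocessing


def count_tokens(text):
    # Hypothetical per-example work; a real script would tokenize here.
    return len(text.split())


def main():
    docs = ["to be or not to be", "that is the question"]
    # Under the spawn start method, workers import this module, so they
    # must not reach this point again while being started.
    with multiprocessing.Pool(processes=2) as pool:
        counts = pool.map(count_tokens, docs)
    print(counts)


if __name__ == "__main__":
    # Only the process started as a script reaches this call.
    main()
```

Keeping the pool creation inside `main()` also means the module can be imported elsewhere without side effects.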
Andrej 41d7014f7d
Merge pull request #301 from okuvshynov/master
[easy] allow multithreading in load_dataset
2023-06-16 18:30:03 -07:00
Oleksandr Kuvshynov bb7e96754a nanogpt: allow multithreading in load dataset 2023-06-16 20:00:17 -04:00
Andrej Karpathy 7339b904ef Use WORLD_SIZE instead of device_count. This supports both the case where the number of GPUs we train on is smaller than the number of GPUs available and multinode training. May be a bugfix. 2023-06-14 23:33:07 +00:00
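The distinction in this commit can be sketched as follows. `WORLD_SIZE` is the environment variable that launchers such as `torchrun` set to the total number of participating processes. The helper name `get_world_size` and its default of 1 are assumptions made for illustration, not the repository's code.

```python
import os


def get_world_size(env=None):
    # Hypothetical helper: read the process count chosen by the launcher
    # instead of counting the GPUs visible on this machine.
    env = os.environ if env is None else env
    return int(env.get("WORLD_SIZE", 1))


# Two processes launched on a machine that may have more GPUs than that:
print(get_world_size({"WORLD_SIZE": "2"}))
# No launcher involved (plain single-process run):
print(get_world_size({}))
```

Reading the launcher's value covers runs that use only some of the local GPUs as well as runs that span several machines, where a local device count would give the wrong total.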
Andrej f08abb45bd
Merge pull request #274 from apivovarov/gelu
Use nn.GELU - 1.27x faster training
2023-06-14 16:25:15 -07:00
Andrej 18ee6b62b6
Merge pull request #275 from apivovarov/rm_unsqueeze
Remove pos unsqueeze(0)
2023-06-14 15:38:45 -07:00
Andrej ed7887c888
Merge pull request #270 from LaihoE/master
fix np.sum overflows on windows
2023-06-14 15:36:26 -07:00
Andrej 8020bb582b
Merge pull request #276 from apivovarov/gitign
Add more files to .gitignore
2023-06-14 15:30:39 -07:00
Andrej 0f06d9b889
Merge pull request #277 from apivovarov/is_bf16_supported
Use bf16 only if supported
2023-06-14 15:29:50 -07:00
Andrej cf4835ed6f
Merge pull request #286 from ctjlewis/master
docs: simplify dependencies installation
2023-06-14 15:21:04 -07:00
Lewis eeac8732b9
docs: simplify dependencies installation
Adds a `pip install ...` command that will install all necessary dependencies, while retaining original dependency notes. Added quick description of `tqdm` as well.
2023-05-31 23:04:08 -05:00
Alexander Pivovarov eb33b8bf1c Use bf16 only if supported 2023-05-17 03:26:48 +00:00
Alexander Pivovarov b120c421bf Add more files to .gitignore 2023-05-17 02:50:22 +00:00
Alexander Pivovarov 39ae397a93 Remove pos unsqueeze(0) 2023-05-17 02:30:18 +00:00
Alexander Pivovarov 594068e7ae Use nn.GELU 2023-05-17 00:53:35 +00:00
Laiho 6649b299eb np.sum overflows on windows 2023-05-09 16:36:59 +03:00
Andrej Karpathy 7fe4a099ad simplify configure_optimizers by a lot 2023-05-06 14:40:28 +00:00
Andrej 196160b849
Merge pull request #247 from gnobre/macbook-run-instructions
Macbook run instructions
2023-04-17 20:16:31 -07:00
Andrej 21f9bff7e4
Merge pull request #225 from otaviogood/grad_accum
Fix for gradient_accumulation_steps training slow
2023-04-17 20:11:25 -07:00
Andrej a6a708c7f1
Merge branch 'master' into grad_accum 2023-04-17 20:11:00 -07:00
Guilherme Nobre e30c8fda23
Merge branch 'karpathy:master' into macbook-run-instructions 2023-04-15 09:50:58 +01:00
Guilherme 4732c43af3 add macbook specific instructions to generate samples 2023-04-15 09:49:38 +01:00
Andrej d9f4735f5e
Merge pull request #10 from LaihoE/master
batch file write
2023-04-13 00:39:41 -07:00
Andrej b288f4cfb2
Merge pull request #146 from lutzroeder/master
Add .gitignore
2023-04-12 22:48:37 -07:00
Andrej 079df20748
Merge pull request #74 from venusatuluri/fix_decode
Small fix to decode fn in shakespeare_char/prepare.py
2023-04-12 22:45:01 -07:00
Andrej 01e48ec1ab
Merge pull request #240 from YassineYousfi/master
don't dropout in eval mode
2023-04-12 22:43:59 -07:00
Andrej 7840a66859
Merge pull request #54 from MicroPanda123/luv
Give tqdm some love :)
2023-04-12 22:25:18 -07:00
Andrej 8abe215fba
Merge pull request #128 from abrahamsangha/fix-typo
fix typo
2023-04-12 22:24:41 -07:00
Andrej ad62003d7a
Merge pull request #142 from kovkev/patch-1
Fix the position of a comma
2023-04-12 22:24:06 -07:00
Andrej ea24604b29
Merge pull request #220 from python273/patch-1
Fix GPT.crop_block_size when flash attention is available
2023-04-12 22:13:01 -07:00
Andrej 8aeea6d970
Merge pull request #224 from SnehalRaj/patch-1
fix small typo
2023-04-12 22:12:26 -07:00
Andrej 2457471c9c
Merge pull request #236 from ymurenko/master
fix "cuda out of memory" when resuming training
2023-04-12 22:09:42 -07:00
Andrej Karpathy 553f949f46 Fix minor bug where we have to scale the loss to account for gradient accumulation, which sums before backprop. Note that this is not a major bug because AdamW is scale invariant. However, this did affect gradient clipping. 2023-04-13 04:59:11 +00:00
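A small pure-Python sketch of why the scaling matters, assuming a toy loss (w - x)^2 with a hand-written gradient in place of autograd. All names are illustrative. Accumulation sums the micro-batch gradients, so unless each loss is divided by the number of accumulation steps, the summed gradient is that many times larger than the full-batch mean gradient, and a fixed clipping threshold behaves differently.

```python
def grad_mean_sq_error(w, xs):
    # d/dw of mean((w - x)^2) over the batch xs
    return sum(2.0 * (w - x) for x in xs) / len(xs)


w = 0.5
micro_batches = [[1.0, 2.0], [3.0, 4.0]]  # equal-sized micro-batches
steps = len(micro_batches)

full_batch = [x for mb in micro_batches for x in mb]
reference = grad_mean_sq_error(w, full_batch)

# Unscaled: gradients are simply summed across micro-batches.
unscaled = sum(grad_mean_sq_error(w, mb) for mb in micro_batches)

# Scaled: each micro-batch loss (and hence its gradient) is divided by `steps`.
scaled = sum(grad_mean_sq_error(w, mb) / steps for mb in micro_batches)

print(reference, unscaled, scaled)
```

With equal-sized micro-batches, the scaled sum matches the full-batch gradient, while the unscaled sum is `steps` times larger.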
Yassine Yousfi 7399dfe39d don't always dropout! 2023-04-10 22:56:22 -07:00
ymurenko 4ac2e8ce3a fix "cuda out of memory" when resuming training 2023-04-05 17:28:55 -04:00
Snehal Raj c58fc4605c
fix small typo 2023-03-25 20:36:46 +01:00
Otavio Good 978d4fe538 Fix for gradient_accumulation_steps training slow 2023-03-25 00:04:45 -07:00
Kirill c3f254844d
Fix GPT.crop_block_size when flash attention is available 2023-03-24 14:51:02 +03:00
Andrej a82b33b525
Merge pull request #199 from ChristianOrr/patch-1
bugfix in decode function
2023-03-12 13:40:20 -07:00
Christian Orr 36c7db8c44
bugfix in decode function
Return was left out of the decoder, so it didn't work.
2023-03-08 10:16:19 +02:00
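For illustration, a minimal character-level decoder of the kind this message describes, built on a hypothetical vocabulary. The names `chars`, `stoi`, `itos`, `encode`, and `decode` are assumed here rather than quoted from the repository. A Python function without a `return` statement gives back `None`, so a decoder missing it never produces a string.

```python
chars = sorted(set("hello world"))
stoi = {ch: i for i, ch in enumerate(chars)}  # character -> integer id
itos = {i: ch for i, ch in enumerate(chars)}  # integer id -> character


def encode(s):
    # Map each character to its integer id.
    return [stoi[c] for c in s]


def decode(ids):
    # The `return` is the important part: omitting it makes decode() give None.
    return "".join(itos[i] for i in ids)


print(decode(encode("hello")))
```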
Andrej 0d8fbd11ae
Merge pull request #195 from drisspg/enable_sdpa_with_nonzero_dropout
Enable sdpa for nonzero dropout
2023-03-06 21:47:20 -08:00
Driss Guessous 6170531b8a enable sdpa for nonzero dropout 2023-03-05 19:29:29 +00:00
Andrej ae3a8d5fdd
Merge pull request #145 from otaviogood/gradAccumStability
fix for training stability on single GPU
2023-02-14 18:48:54 -08:00
Lutz Roeder 10046a2ec0 Add .gitignore 2023-02-13 13:57:20 -08:00
Otavio Good 086ebe1822 fix for training stability on single GPU 2023-02-13 10:42:44 -08:00
kovkev c2531159c7
Fix the position of a comma 2023-02-11 17:13:24 -08:00
Andrej Karpathy 55c5069696 fix misinformation in readme 2023-02-10 16:34:46 +00:00