path: root/csrc
Age  Commit message  Author
2022-11-06  Added blocksizes 2048, 1024, and 512 to blockwise quant.  Tim Dettmers
2022-09-13  Fixed cpu blockwise quantization for small input tensors.  Tim Dettmers
2022-09-11  Fixed 2^31 max size issue for cpu blockwise quant.  Tim Dettmers
2022-08-23  Fixed issue where Pascal was not displaying proper error.  Tim Dettmers
2022-08-16  Enhanced error handling in CUDA SETUP failures.  Tim Dettmers
2022-08-16  Added fused bias in dequant_mm.  Tim Dettmers
2022-08-16  Removed storage() from get_ptr; added boilerplate for bias dequant_mm.  Tim Dettmers
2022-08-06  Removed faulty asserts.  Tim Dettmers
2022-08-04  Merge branch 'extract_outliers' into debug  Tim Dettmers
2022-08-03  Added fixes for the case that matmullt dim A is zero, e.g. [0, 768].  Tim Dettmers
2022-08-03  Added CUDA block assert and is_on_gpu check.  Tim Dettmers
2022-07-26  Merge branch 'patch_merge' into extract_outliers  Tim Dettmers
2022-07-26  Added col_ampere outlier extraction kernel.  Tim Dettmers
2022-07-26  Working outlier extraction for Turing.  Tim Dettmers
2022-07-26  Boilerplate and test for extract_outliers.  Tim Dettmers
2022-07-26  Fixed cpuonly build.  Tim Dettmers
2022-07-25  Some progress on build script; added multi-cuda install script.  Tim Dettmers
2022-07-25  Fixed makefile; fixed Ampere igemmlt_8 bug.  Tim Dettmers
2022-07-22  Fixed rowcol synchronization bug.  Tim Dettmers
2022-07-22  Most tests passing.  Tim Dettmers
2022-07-01  Reduce diff  Max Ryabinin
2022-07-01  Reduce diff  Max Ryabinin
2022-07-01  Reduce diff  Max Ryabinin
2022-07-01  Reduce diff  Max Ryabinin
2022-07-01  Add a CPU-only build option  Max Ryabinin
2021-11-28  Added AdamW. #10 #13  Tim Dettmers
2021-11-10  Added adagrad with tests (no clipping).  Tim Dettmers
2021-10-21  Added compilation from source instructions; easier compilation.  Tim Dettmers
2021-10-20  Added skip_zeros; tests are passing.  Tim Dettmers
2021-10-20  Initial plumbing for skip_zeros.  Tim Dettmers
2021-10-05  Initial commit  Tim Dettmers