From 5f95b5253f4936080479c909724601b342da1c18 Mon Sep 17 00:00:00 2001
From: Tim Dettmers
Date: Thu, 7 Oct 2021 09:54:34 -0700
Subject: Updated readme with latest changes.

---
 howto_config_override.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)
 create mode 100644 howto_config_override.md

diff --git a/howto_config_override.md b/howto_config_override.md
new file mode 100644
index 0000000..11e9d49
--- /dev/null
+++ b/howto_config_override.md
@@ -0,0 +1,26 @@
+# How to override config hyperparameters for particular weights/parameters
+
+If you want to optimize some unstable parameters with 32-bit Adam and others with 8-bit Adam, you can use the `GlobalOptimManager`. With this, we can also configure specific hyperparameters for particular layers, such as embedding layers. To do that, we need two things: (1) register the parameters while they are still on the CPU, and (2) override the config with the new desired hyperparameters (anytime, anywhere).
+
+```python
+import torch
+import bitsandbytes as bnb
+
+mng = bnb.optim.GlobalOptimManager.get_instance()
+
+model = MyModel()
+mng.register_parameters(model.parameters()) # 1. register parameters while still on CPU
+
+model = model.cuda()
+# use 8-bit optimizer states for all parameters
+adam = bnb.optim.Adam(model.parameters(), lr=0.001, optim_bits=8)
+
+# 2a. override: the parameter model.fc1.weight now uses 32-bit Adam
+mng.override_config(model.fc1.weight, 'optim_bits', 32)
+
+# 2b. override: the two special layers use
+# sparse optimization + different learning rate + different Adam betas
+mng.override_config([model.special.weight, model.also_special.weight],
+                    key_value_dict={'is_sparse': True, 'lr': 1e-5, 'betas': (0.9, 0.98)})
+```
+Possible options for the config override are: `betas, eps, weight_decay, lr, optim_bits, min_8bit_size, percentile_clipping, block_wise, max_unorm`.
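
The snippet below is a minimal sketch of combining several of the override keys listed in the file above in a single `override_config` call, assuming a toy `torch.nn.Sequential` model with an embedding layer; the model, layer sizes, and chosen values are illustrative placeholders, not part of this patch.

```python
import torch
import bitsandbytes as bnb

# Toy placeholder model: an embedding layer followed by a linear layer.
model = torch.nn.Sequential(
    torch.nn.Embedding(10000, 512),
    torch.nn.Linear(512, 512),
)

mng = bnb.optim.GlobalOptimManager.get_instance()
mng.register_parameters(model.parameters())  # register while parameters are still on the CPU

model = model.cuda()
# 8-bit optimizer states for all parameters by default
adam = bnb.optim.Adam(model.parameters(), lr=1e-3, optim_bits=8)

# Override: keep the embedding weights in 32-bit optimizer state and adjust their
# clipping/quantization settings; all other parameters keep the 8-bit defaults.
mng.override_config(model[0].weight,
                    key_value_dict={'optim_bits': 32,
                                    'percentile_clipping': 5,
                                    'block_wise': False})
```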