
Add Automatic Mixed Precision option for training and evaluation. #199

Merged: 8 commits merged into huggingface:main from use_amp on May 20, 2024

Conversation

@alexander-soare (Collaborator) commented May 20, 2024

What this does

  • As titled.
  • Side change: some tweaks to the end-to-end test params (reduced model sizes) to compensate for the extra CI time added here.
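
For reference, a minimal sketch of what a use_amp flag typically gates in a PyTorch training step (torch.autocast for the forward pass plus gradient scaling for the backward pass). This is an illustration, not this PR's actual code; policy, batch, and optimizer are placeholders:

import torch

use_amp = True
device = torch.device("cuda")
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

def training_step(policy, batch, optimizer):
    # Forward pass runs in mixed precision when AMP is enabled.
    with torch.autocast(device_type=device.type, enabled=use_amp):
        loss = policy(batch)
    optimizer.zero_grad()
    # Scale the loss before backward to avoid fp16 gradient underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # unscales gradients, then applies the optimizer update
    scaler.update()         # adjusts the scale factor for the next step
    return loss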

How it was tested

CI += end-to-end testing for training and eval with AMP.

How to checkout & try? (for the reviewer)

Try training with use_amp = false/true. I tried this with ACT/Aloha and saw a reduction in training time and memory usage.
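
For example, a training run might be launched like this (the policy/env override names here are an assumption based on the repo's Hydra configs; adjust to your setup):

python lerobot/scripts/train.py policy=act env=aloha use_amp=true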

Try evaluating with AMP:

python lerobot/scripts/eval.py -p lerobot/diffusion_pusht eval.n_episodes=50 eval.batch_size=50 eval.use_async_envs=true env.episode_length=300 +use_amp=true

For me, it took 61 seconds without AMP and 48 seconds with AMP.
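
At a high level, AMP evaluation amounts to wrapping inference in autocast; a rough sketch (not the PR's exact code, with policy and observation as placeholders):

import torch

use_amp = True
device = torch.device("cuda")

@torch.no_grad()
def select_action(policy, observation):
    # Run the policy's forward pass in mixed precision. No gradient scaling is
    # needed at eval time since there is no backward pass.
    with torch.autocast(device_type=device.type, enabled=use_amp):
        return policy(observation)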

Finally, I trained ACT/aloha_sim_transfer_cube_human with the same recipe as https://huggingface.co/lerobot/act_aloha_sim_transfer_cube_human but only to 25k iters. I evaluated 50 episodes and matched the baseline's success rate at 30k iters, i.e. ~56% (wandb run: https://wandb.ai/alexander-soare/lerobot/runs/vrrckq4p?nw=nwuseralexandersoare ; note that I used the wrong simulation env during training).

@alexander-soare alexander-soare marked this pull request as draft May 20, 2024 12:02
@alexander-soare alexander-soare added the ⚡️ Performance Performance-related label May 20, 2024
@alexander-soare alexander-soare marked this pull request as ready for review May 20, 2024 13:42
@@ -10,6 +10,9 @@ hydra:
name: default

device: cuda # cpu
# `use_amp` determines whether to use Automatic Mixed Precision (AMP) for training and evaluation. With AMP,
# automatic gradient scaling is used.
use_amp: false
Collaborator

Should we set it to true by default?

Suggested change
use_amp: false
use_amp: true

Collaborator

Maybe in a second PR once all our models on the hub are fp16 checkpoints?

Collaborator Author

Yes I agree we should do it next. Good suggestion.

Comment on lines -392 to +416
- pin_memory=cfg.device != "cpu",
+ pin_memory=device.type != "cpu",
Collaborator

Nice!
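
A standalone sketch of the pattern in the changed line above (dataset and batch size are placeholders): the DataLoader's pin_memory is keyed off the resolved torch.device's .type rather than the raw config string.

import torch
from torch.utils.data import DataLoader, TensorDataset

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dataset = TensorDataset(torch.randn(16, 3))
# Pinning host memory only makes sense when batches will be copied to an accelerator.
dataloader = DataLoader(dataset, batch_size=4, pin_memory=device.type != "cpu")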

@Cadene (Collaborator) left a comment

:D

@alexander-soare alexander-soare merged commit b6c216b into huggingface:main May 20, 2024
5 checks passed
@alexander-soare alexander-soare deleted the use_amp branch May 20, 2024 17:58
HalvardBariller pushed a commit to HalvardBariller/lerobot that referenced this pull request May 21, 2024