OOM during reproduction #5

@heiseon

I hit an OOM (out of memory) error while reproducing the results. What might be causing it, and how should I set the parameters?

Details:
GPU: 2 × RTX 4090
Base model: Qwen/Qwen2.5-0.5B
Data: xiaodongguaAIGC/X-R1-750
Config: X_R1_zero_0dot5B_config.yaml, with num_processes changed to 1 and everything else unchanged
Command:
ACCELERATE_LOG_LEVEL=info \
accelerate launch \
--config_file recipes/zero1.yaml \
--num_processes=1 \
src/x_r1/grpo.py \
--config recipes/X_R1_zero_0dot5B_config.yaml
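
For context on the setup above, here is a back-of-envelope sketch of how much memory the model states alone might need. It is an estimate, not a measurement from this run. It assumes mixed-precision Adam training at roughly 16 bytes per parameter, a nominal 0.5B parameter count, and no optimizer sharding because only one training process is launched. The function name `model_state_gib` is made up for illustration.

```python
# Rough estimate of model-state memory for mixed-precision Adam training.
# Assumption: about 16 bytes per parameter
#   (2 for bf16 weights + 2 for bf16 gradients + 12 for fp32 master
#    weights, momentum, and variance).
BYTES_PER_PARAM_MIXED_ADAM = 16


def model_state_gib(num_params: float,
                    bytes_per_param: int = BYTES_PER_PARAM_MIXED_ADAM) -> float:
    """Return the estimated model-state memory in GiB (hypothetical helper)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    nominal_params = 0.5e9  # assumed nominal size of a 0.5B model
    print(f"model states: ~{model_state_gib(nominal_params):.1f} GiB")
```

Under these assumptions the model states come to well under the 24 GB of a single 4090. If that holds, the OOM would more plausibly come from activations, long generated completions, or memory reserved for generation than from the weights and optimizer, but I have not verified this.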
