Target hardware: an NVIDIA RTX 4090 (24 GB VRAM) running Linux (Ubuntu/Pop!_OS)
with a driver supporting CUDA 12.8 or newer. Python 3.12 in a dedicated venv.
Disk: ~7 GB for the venv plus ~35 GB for the two BF16 model variants.
1. Clone the upstream repo
git clone https://github.com/HiDream-ai/HiDream-O1-Image.git
cd HiDream-O1-Image
2. Create a dedicated Python 3.12 venv
python3.12 -m venv venv
source venv/bin/activate
pip install --upgrade pip
3. Install torch (pin to <2.9)
PyTorch 2.9.x ships a Qwen3-VL regression (QwenLM/Qwen3-VL#1811) that the
upstream README explicitly warns about, so pin below 2.9:
pip install "torch<2.9" --index-url https://download.pytorch.org/whl/cu128
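After installing, it is worth sanity-checking that pip actually resolved a pre-2.9 build. A minimal helper (`torch_version_ok` is ours, not upstream's); note the `+cu128` local build suffix has to be stripped before comparing:

```python
def torch_version_ok(version: str) -> bool:
    # torch versions look like "2.8.1+cu128"; drop the local build
    # suffix and compare the (major, minor) pair against (2, 9).
    core = version.split("+")[0]
    major, minor = (int(p) for p in core.split(".")[:2])
    return (major, minor) < (2, 9)

# In the venv:  import torch; assert torch_version_ok(torch.__version__)
assert torch_version_ok("2.8.1+cu128")
assert not torch_version_ok("2.9.0+cu128")
```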
4. Install the upstream requirements
pip install -r requirements.txt
pip install "huggingface-hub>=0.24" safetensors accelerate
5. (Optional) flash-attn
Flash attention is preferred but optional. On CUDA 13 hosts it often fails
to build; the pipeline falls back to standard attention with a small (~15%)
speed penalty.
pip install flash-attn --no-build-isolation \
|| sed -i 's/"use_flash_attn": True/"use_flash_attn": False/' models/pipeline.py
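An alternative to patching the file on disk is probing for flash-attn at runtime. A sketch of that approach (the helper is ours; the `use_flash_attn` key mirrors the one the sed command above edits):

```python
def attention_config() -> dict:
    # Prefer flash-attn when the wheel imports cleanly; otherwise fall
    # back to standard attention (the ~15% slower path noted above).
    try:
        import flash_attn  # noqa: F401
        return {"use_flash_attn": True}
    except ImportError:
        return {"use_flash_attn": False}

print(attention_config())
```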
6. Download BF16 weights
Use drbaph’s republished BF16 weights — roughly 17.6 GB per variant.
Pick a cache root with plenty of free space; two variants at that size
will overflow a small home partition:
export HF_HOME=/big/disk/hugging_face_cache
huggingface-cli download drbaph/HiDream-O1-Image-Dev-BF16
huggingface-cli download drbaph/HiDream-O1-Image-BF16
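The downloader stores weights in the standard Hugging Face cache layout, `$HF_HOME/hub/models--{org}--{name}/snapshots/<revision>/`. A small convenience helper (ours, not part of the repo) to compute the snapshot root you will need for `--model_path` in step 8:

```python
from pathlib import Path

def snapshot_dir(repo_id: str, hf_home: str) -> Path:
    # Hugging Face cache layout: the "/" in the repo id becomes "--".
    org, name = repo_id.split("/")
    return Path(hf_home) / "hub" / f"models--{org}--{name}" / "snapshots"

print(snapshot_dir("drbaph/HiDream-O1-Image-Dev-BF16",
                   "/big/disk/hugging_face_cache"))
```

List that directory to find the revision hash to append.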
Do not use the FP8 variants (drbaph/HiDream-O1-Image-FP8 /
-Dev-FP8). They’re packaged for the Saganaki22 ComfyUI custom node,
which dequantizes Float8_e4m3fn to BFloat16 at runtime.
The upstream pipeline does no such dequant and crashes inside torch.where
with “Promotion for Float8 Types is not supported, attempted to promote
BFloat16 and Float8_e4m3fn”.
7. Verify CUDA
python -c "import torch; assert torch.cuda.is_available(); print(torch.cuda.get_device_name(0))"
8. Run inference
The upstream inference.py is an argparse wrapper around
models.pipeline.generate_image(). You can use it directly:
python inference.py \
--model_path /big/disk/hugging_face_cache/hub/models--drbaph--HiDream-O1-Image-Dev-BF16/snapshots/<rev>/ \
--model_type dev \
--prompt "a photo of a cat on a wooden floor" \
--output_image cat.png
For editing or IP generation, pass --model_type full together with
--ref_images path1 path2 .... The upstream README lists complete
examples for all five modes.
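For scripting several prompts it can be simpler to wrap the CLI than to import the pipeline. A minimal batch-driver sketch (the helper and the second prompt are ours; the flag names are copied from the invocation above):

```python
import subprocess

def inference_cmd(model_path: str, model_type: str,
                  prompt: str, output_image: str) -> list[str]:
    # Builds the same argv as the manual inference.py invocation above.
    return ["python", "inference.py",
            "--model_path", model_path,
            "--model_type", model_type,
            "--prompt", prompt,
            "--output_image", output_image]

prompts = ["a photo of a cat on a wooden floor",
           "a watercolor fox in a snowy forest"]
for i, prompt in enumerate(prompts):
    cmd = inference_cmd("/big/disk/.../snapshots/<rev>/", "dev",
                        prompt, f"out_{i}.png")
    # subprocess.run(cmd, check=True)  # uncomment to actually generate
    print(" ".join(cmd))
```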