BNB 4-bit quantization
Converts full-precision FP16 checkpoints to bitsandbytes (BNB) NF4 4-bit on CPU, processing either layer by layer or all at once. A quantized 4B model fits in roughly 5 GB of VRAM; use `--low-vram` to fit a 9B model on a 7.7 GB card.
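To make the conversion concrete, here is a minimal pure-Python sketch of blockwise NF4 quantization, the scheme bitsandbytes applies per weight block. The 16-entry codebook is the NormalFloat4 table from the QLoRA paper; block handling and bit-packing are simplified for illustration, and the helper names are hypothetical, not this toolkit's API.

```python
# NF4 codebook (NormalFloat4 levels from the QLoRA paper).
NF4_CODES = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def quantize_block(values):
    """Quantize one block: absmax scaling, then nearest NF4 code per value."""
    absmax = max(abs(v) for v in values) or 1.0
    codes = [min(range(16), key=lambda i: abs(NF4_CODES[i] - v / absmax))
             for v in values]
    return codes, absmax

def dequantize_block(codes, absmax):
    """Recover approximate weights from 4-bit code indices and the block scale."""
    return [NF4_CODES[c] * absmax for c in codes]

# Toy weight block standing in for one slice of a layer.
weights = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.0, 0.45]
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
```

Each weight is stored as a 4-bit index plus one shared scale per block, which is where the roughly 4x memory saving over FP16 comes from.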
Qwen3.5 models are often distributed as full-precision VLM checkpoints. This toolkit prepares text-only variants for training and inference: BNB conversion, visual tower stripping, verification, and upload.
| Repo | Purpose |
|---|---|
| qwen35-toolkit (this repo) | Model prep — BNB quantization, visual tower strip, verify, upload |
| qwen-qlora-train | LoRA training, adapter inference, CPU merge |
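The visual-tower strip step above amounts to dropping the vision-encoder tensors from the checkpoint so only text-model weights remain. A hypothetical sketch, assuming the vision tensors share a `visual.` key prefix (the actual prefix depends on the checkpoint layout):

```python
# Hypothetical sketch: remove vision-encoder weights from a state dict.
# The "visual." prefix is an assumption about the checkpoint's key naming.

def strip_visual_tower(state_dict, visual_prefix="visual."):
    """Return a text-only state dict plus the names of the removed tensors."""
    removed = [k for k in state_dict if k.startswith(visual_prefix)]
    text_only = {k: v for k, v in state_dict.items() if k not in removed}
    return text_only, removed

# Toy state dict standing in for a real VLM checkpoint.
ckpt = {
    "model.embed_tokens.weight": "...",
    "model.layers.0.self_attn.q_proj.weight": "...",
    "visual.patch_embed.proj.weight": "...",
    "visual.blocks.0.attn.qkv.weight": "...",
    "lm_head.weight": "...",
}
text_ckpt, dropped = strip_visual_tower(ckpt)
```

A verification pass would then confirm that no vision keys survive and that the remaining tensors match the text model's expected shapes.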