Assistant-only loss masking
Gradients only where they belong. Masking is computed in character space before tokenization — works correctly with any subword vocabulary.
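A minimal sketch of what character-space masking can look like. This is an illustrative assumption, not the repo's actual implementation: assistant spans are recorded as `(start, end)` character offsets while the chat text is assembled, then projected onto token offsets after tokenization, so the mask stays correct regardless of how the subword vocabulary splits the text. The `<|role|>` template and helper names are hypothetical.

```python
IGNORE_INDEX = -100  # standard label ignored by PyTorch cross-entropy

def build_text_and_spans(messages):
    """Concatenate chat messages; record (start, end) char spans of assistant content."""
    text, spans = "", []
    for msg in messages:
        chunk = f"<|{msg['role']}|>{msg['content']}\n"
        if msg["role"] == "assistant":
            # Only the assistant's content characters receive loss,
            # not the role tag or the trailing newline.
            start = len(text) + len(f"<|{msg['role']}|>")
            spans.append((start, start + len(msg["content"])))
        text += chunk
    return text, spans

def mask_labels(token_ids, offsets, spans):
    """Keep a token's label only if its char offsets fall inside an assistant span."""
    labels = []
    for tid, (tok_start, tok_end) in zip(token_ids, offsets):
        inside = any(tok_start >= s and tok_end <= e for s, e in spans)
        labels.append(tid if inside else IGNORE_INDEX)
    return labels
```

With a real tokenizer the `offsets` would come from its offset mapping (e.g. `return_offsets_mapping=True` in Hugging Face fast tokenizers); because the comparison happens in character space, the same span list works for any vocabulary.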
A practical QLoRA workflow for smaller setups, focused on clean dataset handling, controlled truncation, and reproducible training and inference steps.
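One reading of "controlled truncation", sketched under assumptions (the repo's actual policy may differ): truncate to a fixed length, but drop any example whose supervised tokens are all cut away, so no training step is spent on a sample with zero learnable labels. `MAX_LEN` and the function name are hypothetical.

```python
MAX_LEN = 2048  # hypothetical sequence cap

def truncate(input_ids, labels, max_len=MAX_LEN):
    """Right-truncate; discard the example if no supervised labels survive."""
    input_ids, labels = input_ids[:max_len], labels[:max_len]
    if all(l == -100 for l in labels):
        return None  # everything left is masked out; nothing to learn from
    return input_ids, labels
```

Filtering these degenerate examples out up front also keeps loss curves comparable across runs, which helps with the reproducibility goal.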
| Repo | Purpose |
|---|---|
| qwen35-toolkit | Model prep — BNB quantization, visual tower strip, verify, upload |
| qwen-qlora-train (this repo) | LoRA training, adapter inference, CPU merge |
⚠️ Validated training on RTX 3070 8 GB currently covers Qwen3 1.7B and 4B (see Quickstart). Qwen3 8B OOMs on unsloth 2026.3.4+, and sizes above 4B should be treated as experimental on this hardware class.