Skip to content

Post-merge workflow

After qlora-merge, you have a standalone fp16 model directory. Use this workflow to produce GGUF artifacts and publish them.

Prerequisites

  • A successful qlora-merge output directory.
  • qwen35-toolkit installed.
  • llama.cpp tools available for GGUF conversion/quantization.

Steps

Step 1 — Convert merged fp16 model to GGUF f16

Create a GGUF base file from merged weights.

StepToolWhat it produces
Convert to GGUFllama.cpp/convert_hf_to_gguf.pymodel-F16.gguf

Success criteria: model-F16.gguf exists.

Step 2 — Quantize GGUF

Generate inference-friendly quant files.

StepToolWhat it produces
Quantizellama-quantizeQ4_K_M · Q5_K_M · Q6_K · Q8_0

Success criteria: at least one quantized GGUF file (commonly Q4_K_M) is created.

Step 3 — Upload artifacts to Hugging Face Hub

Publish GGUF output directory.

StepToolWhat it produces
Uploadqwen35-uploadHuggingFace Hub repo

Success criteria: Hub repo contains expected GGUF files.

Expected result

  • Merged fp16 model is converted to GGUF f16.
  • Quantized GGUF variants are generated.
  • Final artifacts are published to Hub.

Common failures

  • Conversion fails on wrong input path -> verify merge output directory first.
  • llama-quantize not found -> build llama.cpp binaries before running step 2.
  • Hub push fails -> rerun hf auth login or pass --hf-token.

GGUF conversion command reference

See qwen35-toolkit -> GGUF conversion for exact command lines.

Released under the Apache 2.0 License.