Skip to content

Hardware

Purpose

Provide practical RAM/VRAM requirements for the toolkit workflow. Numbers are based on runs performed on an RTX 3070 Laptop (7.7 GB VRAM).

When to use

  • Before selecting model size and conversion path.
  • Before deciding whether to use --low-vram.
  • When checking whether an operation is CPU-only or GPU-required.

Resource requirements

OperationVRAMRAM
BNB quantization (4B, default)~5 GB~18 GB
BNB quantization (--low-vram)~200-400 MB peak~18 GB
f16 model load for strip/verifynone~18 GB
GGUF conversion (f16 -> GGUF)none~18 GB
GGUF Q4_K_M inference (4B)~5 GB

Minimum guidance by operation

OperationMinimum VRAMMinimum RAM
Quantize 0.8B / 2B / 4B~5 GB~18 GB
Quantize 9B~200-400 MB peak (--low-vram)~36 GB
Strip / GGUF convertnoneapproximately model size x 2
Verify 4B text-only~3 GB~8 GB

Interpretation notes

  • RAM is often the primary constraint because full-precision source weights are staged in CPU memory.
  • VRAM is mainly needed for quantization and GPU inference.
  • --low-vram trades runtime for reduced VRAM peak.

Typical hardware scenarios

Less than 8 GB VRAM

  • Use --low-vram for quantization.
  • Strip and GGUF conversion remain CPU/RAM operations.

No GPU

  • BNB quantization requires CUDA.
  • Strip, verify (CPU fallback), and GGUF conversion can run on CPU.

More than 8 GB VRAM

  • Default quantization path is usually simpler/faster.
  • Larger model sizes still require adequate RAM.

Edge cases / limitations

WARNING

Even with low VRAM usage, insufficient RAM can block quantization/convert flows. Check RAM first when planning 9B operations.

  • Verify can fall back to CPU when CUDA move fails.
  • Published numbers are empirical and depend on driver/runtime overhead.