
Config reference

Purpose

Reference for all TrainConfig fields and defaults. Any YAML field not set explicitly falls back to the default listed here.

When to use

  • While creating a new training config.
  • When checking which settings are safe on limited hardware.
  • When validating reasoning/truncation/loss settings.

Config load behavior

```text
1. Token resolution order:
   - `--hf_token` CLI flag
   - `HF_TOKEN` environment variable
   - YAML `hf_token` field
2. Default resolution:
   - A default is applied only when the field is omitted in the YAML.
```
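The resolution order above can be sketched in Python. The helper name `resolve_hf_token` is illustrative, not the project's actual API:

```python
import os

def resolve_hf_token(cli_token=None, yaml_token=None):
    """Return the first available token, in the documented precedence:
    CLI flag, then HF_TOKEN environment variable, then YAML field."""
    return cli_token or os.environ.get("HF_TOKEN") or yaml_token
```

Note that an empty string is treated like a missing value here; the real loader may be stricter.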

Schema (YAML shape)

```yaml
run_name: "run"
model_name: "unsloth/Qwen3-4B-bnb-4bit"
dataset_id: "your-hf-username/your-dataset"
max_length: 2048
```
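How omitted fields fall back to defaults can be sketched with a dataclass. This subset of fields and the `load_config` helper are illustrative, not the project's actual loader:

```python
from dataclasses import dataclass, fields

@dataclass
class TrainConfig:
    # Hypothetical subset of the schema; names and defaults match the tables below.
    run_name: str = "run"
    model_name: str = "unsloth/Qwen3-4B-bnb-4bit"
    dataset_id: str = ""
    max_length: int = 2048

def load_config(yaml_dict):
    # A default applies only when the key is absent from the YAML mapping.
    known = {f.name for f in fields(TrainConfig)}
    return TrainConfig(**{k: v for k, v in yaml_dict.items() if k in known})
```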

Options by section

Identity

| Field | Default | Description |
| --- | --- | --- |
| `run_name` | `"run"` | Subdirectory name under `output_dir/` and `adapter_base_dir/` |
| `output_dir` | `"outputs"` | Root directory for trainer checkpoints and logs |
| `adapter_base_dir` | `"adapters"` | Root directory for saved LoRA adapters |
| `hf_token` | `null` | HF access token (prefer the `HF_TOKEN` environment variable) |

Model

| Field | Default | Description |
| --- | --- | --- |
| `model_name` | `"unsloth/Qwen3-4B-bnb-4bit"` | HF repo id or local path |
| `chat_template` | `"qwen3"` | Unsloth template key (`qwen3` works for Qwen3 and Qwen3.5) |

Commonly used `chat_template` keys:

| Key | Use for |
| --- | --- |
| `"qwen3"` | Qwen3 and Qwen3.5 |
| `"qwen3-thinking"` | Qwen3 with an explicit thinking template |
| `"qwen3-instruct"` | Qwen3 instruct without thinking |
| `"qwen-2.5"` | Qwen2.5 family |

Dataset

| Field | Default | Description |
| --- | --- | --- |
| `dataset_id` | `""` | HF dataset id or local path (required) |
| `dataset_split` | `"train"` | Split passed to `load_dataset` |
| `messages_field` | `"messages"` | Column containing the conversation list |
| `dataset_schema` | `"auto"` | `auto` / `messages` / `prompt_response` |
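A minimal sketch of how `dataset_schema: auto` might pick a schema per example. The `prompt`/`response` column names and the helper are assumptions, not the project's actual detection logic:

```python
def detect_schema(example, messages_field="messages"):
    """Guess the schema for one dataset row: prefer a conversation
    column, else fall back to prompt/response pairs."""
    if messages_field in example:
        return "messages"
    if "prompt" in example and "response" in example:
        return "prompt_response"
    raise ValueError("cannot infer dataset schema from columns: %r" % sorted(example))
```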

Reasoning / thinking

| Field | Default | Description |
| --- | --- | --- |
| `reasoning_field` | `"reasoning_content"` | Canonical field for `<think>` content |
| `reasoning_keys` | `null` | Alternative keys normalized into `reasoning_field` |
| `extract_think_tags` | `true` | Extract inline `<think>...</think>` from assistant content |
| `think_mode` | `"keep"` | `keep` preserves reasoning; `drop` removes it |
| `think_max_tokens` | `0` | Cap tokens per reasoning block (`0` = no cap) |
| `think_role` | `"think"` | Role name for datasets with separate think messages |
| `think_loss` | `"all"` | `all` / `answer_only` / `answer_plus_think` |
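A sketch of what `extract_think_tags: true` does to an assistant message. The regex and helper name are illustrative; the real implementation may differ:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)

def extract_think(content):
    """Split inline <think>...</think> reasoning out of assistant content,
    returning (reasoning, answer). Reasoning is None when no tags are present."""
    m = THINK_RE.search(content)
    if not m:
        return None, content
    reasoning = m.group(1).strip()
    answer = THINK_RE.sub("", content, count=1).strip()
    return reasoning, answer
```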

think_loss behavior:

| Value | Gradient scope |
| --- | --- |
| `all` | Full assistant span (`<think>` + answer) |
| `answer_only` | Tokens after `</think>` |
| `answer_plus_think` | Think content + answer (excluding the literal tags) |
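The three `think_loss` modes can be illustrated with a toy per-token mask. String tokens stand in for tokenizer ids here; this is not the actual masking code:

```python
def loss_mask(tokens, mode):
    """Return per-token loss flags for an assistant span such as
    ["<think>", "t1", "</think>", "a1"]."""
    try:
        close = tokens.index("</think>")
    except ValueError:
        close = -1  # no think block: every mode trains on the full span
    mask = []
    for i, tok in enumerate(tokens):
        if mode == "all":
            keep = True
        elif mode == "answer_only":
            keep = i > close
        else:  # answer_plus_think
            keep = tok not in ("<think>", "</think>")
        mask.append(keep)
    return mask
```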

Sequence / truncation

| Field | Default | Description |
| --- | --- | --- |
| `max_seq_length` | `2048` | Positional allocation; set >= `max_length` |
| `max_length` | `2048` | Max tokens per sample; the primary VRAM lever |
| `truncate_side` | `"left"` | Side for fallback token-level truncation |

Precision / hardware

| Field | Default | Description |
| --- | --- | --- |
| `load_in_4bit` | `true` | BitsAndBytes NF4 quantization |
| `attn_implementation` | `"sdpa"` | `sdpa` or `flash_attention_2` |
| `fp16` | `true` | Float16 path (commonly Qwen3) |
| `bf16` | `false` | BFloat16 path (commonly Qwen3.5) |

LoRA

| Field | Default | Description |
| --- | --- | --- |
| `lora_r` | `16` | LoRA rank |
| `lora_alpha` | `32` | Scaling factor (commonly `2 * lora_r`) |
| `lora_dropout` | `0.0` | LoRA dropout |
| `lora_target_modules` | `null` | `null` -> all 7 projection layers |
| `gradient_checkpointing` | `"unsloth"` | `unsloth` / `true` / `false` |

Training

| Field | Default | Description |
| --- | --- | --- |
| `per_device_train_batch_size` | `1` | Keep at 1 on limited-VRAM setups |
| `gradient_accumulation_steps` | `8` | Effective batch = batch size x accumulation |
| `learning_rate` | `2e-4` | Peak LR for AdamW |
| `warmup_ratio` | `0.05` | Fraction of steps used for LR warmup |
| `max_steps` | `1000` | Total optimizer steps |
| `logging_steps` | `20` | Log interval |
| `save_steps` | `200` | Checkpoint interval |
| `seed` | `3407` | Random seed |
| `optim` | `"adamw_8bit"` | `adamw_8bit` or `adamw_torch` |
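With the defaults above, the effective batch size and warmup length work out as:

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
max_steps = 1000
warmup_ratio = 0.05

# One optimizer step consumes batch_size * accumulation samples.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps  # 8
# LR warmup covers the first warmup_ratio fraction of max_steps.
warmup_steps = int(warmup_ratio * max_steps)  # 50
```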

Loss masking

| Field | Default | Description |
| --- | --- | --- |
| `assistant_roles` | `null` | Roles that carry loss (`null` -> `["assistant"]`) |
| `drop_if_no_assistant` | `true` | Drop samples with no assistant turn |

Validation rules

```text
1. Required dataset source:
   - `dataset_id` must be non-empty.
2. Precision flags:
   - `fp16` and `bf16` must not be enabled at the same time.
3. Sequence budget:
   - A high `max_length` increases VRAM pressure and can trigger OOM on 8 GB GPUs.
4. Positional allocation:
   - `max_seq_length` should be >= `max_length`.
```
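The rules above can be sketched as a standalone checker. This is a hedged sketch over a plain dict, not the project's actual validator:

```python
def validate(cfg):
    """Collect violations of the documented validation rules.
    cfg is a mapping of field name -> value; defaults mirror the tables above."""
    errors = []
    if not cfg.get("dataset_id"):
        errors.append("dataset_id must be non-empty")
    if cfg.get("fp16") and cfg.get("bf16"):
        errors.append("enable only one of fp16 / bf16")
    if cfg.get("max_seq_length", 2048) < cfg.get("max_length", 2048):
        errors.append("max_seq_length should be >= max_length")
    return errors
```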

Edge cases / limitations

WARNING

`dataset_id` is required; an empty `dataset_id` cannot be resolved into a training dataset.

  • `fp16` and `bf16` should not be enabled simultaneously.
  • An aggressive `max_length` on 8 GB of VRAM can cause OOM.

Released under the Apache 2.0 License.