Reasoning / thinking control
Purpose
Control how reasoning (<think>...</think>) is kept, removed, and trained. This page covers think_mode, think_loss, and think_max_tokens behavior.
When to use
- You want answer-only behavior for agent/tool datasets.
- You want to preserve full reasoning traces for reasoning datasets.
- You need to cap reasoning token length for memory/perf constraints.
Syntax (config fields)
yaml
think_mode: keep # keep | drop
think_loss: answer_only # all | answer_only | answer_plus_think
think_max_tokens: 0 # 0 = no capOptions
think_mode
| Value | Effect |
|---|---|
keep | Keep reasoning content (with optional cap) |
drop | Remove reasoning content entirely |
think_loss
| Value | Gradient scope |
|---|---|
all | Full assistant span: think + answer |
answer_only | Tokens after </think> only |
answer_plus_think | Think content + answer, excluding literal tags |
think_max_tokens
0means no cap.- Positive values cap reasoning token count per message.
Decision flow
text
1. Decide content retention (`think_mode`):
- `keep` to preserve reasoning content.
- `drop` for answer-only datasets/behavior.
2. Decide gradient scope (`think_loss`):
- `all`, `answer_only`, or `answer_plus_think`.
3. Decide cap (`think_max_tokens`):
- `0` for unlimited.
- positive value to constrain long reasoning traces.
4. Validate against eval target:
- train/eval behavior should match (answer-only vs full reasoning).Examples
yaml
# Agent/tool training: keep think text, train only final answer
think_mode: keep
think_loss: answer_onlyyaml
# Full reasoning training
think_mode: keep
think_loss: allyaml
# Pure answer-style behavior
think_mode: drop
think_loss: allyaml
# Reasoning cap example
think_max_tokens: 512Edge cases / limitations
WARNING
Aggressive think_max_tokens can truncate reasoning mid-thought and teach abrupt stop patterns. Prefer cleaner datasets over heavy truncation when possible.
think_losshas no practical effect on reasoning text whenthink_mode: drop.- Keep reasoning settings aligned with your evaluation target (answer-only vs full reasoning).