Complex · advanced

Tree-of-thought reasoning with branch scoring

Explore three reasoning paths, score each against criteria, pick the winner — all in one prompt.

Use this for problems with multiple plausible approaches (debugging, design choices, ambiguous puzzles), where the first path the model commits to isn't always the best. Explicit branching forces exploration before commitment.

The prompt

Copy this verbatim. Replace the {{ … }} placeholders with your values.

<instructions>
For the problem in <problem>, explore three distinct approaches before committing.

Output the following tags in order:

<approach_1>
  <reasoning>your step-by-step thinking for this approach</reasoning>
  <conclusion>what this approach would yield</conclusion>
</approach_1>

<approach_2>...</approach_2>
<approach_3>...</approach_3>

<scoring>
For each approach, score on:
- correctness (1–5): does it actually solve the problem?
- simplicity (1–5): how easy is it to verify / explain?
- robustness (1–5): does it hold up on edge cases?

Show scores as a table.
</scoring>

<choice>
State which approach wins and why, in one paragraph.
</choice>

<final>
Produce the chosen approach's final answer, polished.
</final>

Rules:
- Approaches must be genuinely distinct, not minor variations of one idea.
- Score each approach independently; don't anchor on approach 1.
- If two approaches tie, prefer simplicity over cleverness.
</instructions>

<problem>{{ problem_text }}</problem>
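
If you fill the placeholder programmatically rather than by hand, a small substitution helper is enough. The sketch below is one possible way to do it in Python; `PROMPT_TEMPLATE` is a shortened stand-in for the full prompt above, and `fill_template` is a hypothetical helper name, not part of any library.

```python
import re

# Shortened stand-in for the full prompt above (assumption: paste the real text here).
PROMPT_TEMPLATE = """<instructions>
For the problem in <problem>, explore three distinct approaches before committing.
</instructions>

<problem>{{ problem_text }}</problem>"""


def fill_template(template, values):
    """Replace every {{ name }} placeholder; raise if a value is missing."""

    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        # A function replacement is inserted literally, so backslashes
        # in the problem text are not treated as escape sequences.
        return values[name]

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


prompt = fill_template(
    PROMPT_TEMPLATE,
    {"problem_text": "Our service returns 500 errors intermittently under load."},
)
```

Raising on a missing value, instead of leaving the braces in place, avoids sending the model a prompt that still contains an unfilled placeholder.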

Sample input

Problem: Our service returns 500 errors intermittently under load. We have logs, metrics, and traces. What's the most efficient debugging approach?

Expected output

<approach_1>
<reasoning>Start with logs. Grep for 500-correlated stack traces; group by exception type...</reasoning>
<conclusion>Identifies the failing code path but doesn't tell us why it fails only under load.</conclusion>
</approach_1>
<approach_2>
<reasoning>Start with metrics. Plot 500 rate vs. concurrency, GC pause, connection pool size...</reasoning>
<conclusion>Surfaces the systemic trigger (likely connection exhaustion) but not the failing call path.</conclusion>
</approach_2>
<approach_3>
<reasoning>Start with traces. Find a sampled trace of a failing request; walk every span...</reasoning>
<conclusion>Ties the failing call path to the systemic trigger in one artifact.</conclusion>
</approach_3>
<scoring>
| Approach   | Correctness | Simplicity | Robustness |
|------------|-------------|------------|------------|
| Logs       | 3           | 5          | 3          |
| Metrics    | 3           | 4          | 4          |
| Traces     | 5           | 3          | 5          |
</scoring>
<choice>
Approach 3 wins: it gives both the where (call path) and the why (systemic conditions) in one
artifact, which the other two only give in isolation. Slightly harder to set up if tracing
isn't already on, but available here.
</choice>
<final>
[full debugging playbook using traces as the primary tool]
</final>
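
Because the response follows a fixed tag order, downstream code can split it into sections. The following is a minimal sketch, assuming the model emits each top-level tag at most once and does not nest a tag inside itself; `extract_tag` and `split_response` are hypothetical helper names, and a regex is used here instead of an XML parser because the response is not guaranteed to be well-formed XML.

```python
import re

# Top-level tags the prompt asks for, in order.
EXPECTED_TAGS = ["approach_1", "approach_2", "approach_3", "scoring", "choice", "final"]


def extract_tag(text, tag):
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    pattern = rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>"
    match = re.search(pattern, text, flags=re.DOTALL)  # DOTALL: content may span lines
    return match.group(1).strip() if match else None


def split_response(response):
    """Map each expected tag to its content (None when the tag is absent)."""
    return {tag: extract_tag(response, tag) for tag in EXPECTED_TAGS}


# Abbreviated sample response, for illustration only.
sample = """<approach_1>
<reasoning>Start with logs.</reasoning>
<conclusion>Identifies the failing code path.</conclusion>
</approach_1>
<choice>
Approach 3 wins.
</choice>"""

sections = split_response(sample)
conclusion = extract_tag(sections["approach_1"], "conclusion")
```

A `None` value is a cheap signal that the model skipped a required section, which is worth checking before using the `<final>` content.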

Notes & tuning tips

  • The forced "three distinct approaches" constraint is what stops the model from collapsing into one path with three rewordings.
  • Score table makes the choice auditable — useful when a human reviewer wants to overrule the model.
  • Costs ~3× a direct-answer prompt in tokens. Reserve for genuinely ambiguous problems.
  • Pair with chain-of-thought inside each approach for harder problems.
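
To make the audit concrete, a reviewer could parse the score table and recompute a winner independently of the model's stated choice. The sketch below is one way to do that in Python. It assumes the table is a Markdown pipe table like the one in the expected output, and it assumes an unweighted sum of the three scores as the decision rule; the prompt itself only fixes the tie-break (prefer simplicity). `parse_score_table` and `pick_winner` are hypothetical names.

```python
def parse_score_table(table):
    """Parse a Markdown pipe table into {approach: {criterion: score}}."""
    rows = [line.strip() for line in table.strip().splitlines()
            if line.strip().startswith("|")]
    header = [cell.strip().lower() for cell in rows[0].strip("|").split("|")]
    scores = {}
    for row in rows[2:]:  # skip the header row and the |---| separator row
        cells = [cell.strip() for cell in row.strip("|").split("|")]
        scores[cells[0]] = {name: int(value)
                            for name, value in zip(header[1:], cells[1:])}
    return scores


def pick_winner(scores):
    """Highest unweighted total wins (assumption); ties go to higher simplicity."""
    return max(scores, key=lambda name: (sum(scores[name].values()),
                                         scores[name]["simplicity"]))


# Score table copied from the expected output.
table = """
| Approach   | Correctness | Simplicity | Robustness |
|------------|-------------|------------|------------|
| Logs       | 3           | 5          | 3          |
| Metrics    | 3           | 4          | 4          |
| Traces     | 5           | 3          | 5          |
"""

scores = parse_score_table(table)
winner = pick_winner(scores)
```

If the recomputed winner disagrees with the `<choice>` section, that is a prompt for human review rather than proof the model is wrong, since the model may be weighting the criteria differently.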

What this example uses

Tags: <instructions>

Patterns: chain of thought

Cite this page
Tree-of-thought reasoning with branch scoring. claudexml.com. https://claudexml.com/examples/tree-of-thought/