Complex · advanced

Extract → validate → transform pipeline in one call

Four-stage data pipeline: extract raw fields, validate against rules, transform to target shape, emit errors.

Production ETL where you can't afford a downstream parse failure: extract, validate at the point of extraction, transform, and surface validation errors as structured data rather than as malformed output.

The prompt

Copy this verbatim. Replace the {{ … }} placeholders with your values.

<instructions>
Process the input through four stages. Output each in its own tag.

1. <extracted>
   Raw fields pulled verbatim from <input>. Use null when a field is absent.
   Shape: { "<field_name>": "verbatim value or null" }

2. <validation>
   Apply the rules in <rules>. Output a JSON array with one object per rule:
   { "rule": "rule_id", "passed": true|false, "field": "field_name", "note": "string or null" }

3. <transformed>
   Only if all required-tier rules passed: emit the canonical shape in <target_schema>.
   Otherwise: emit null.

4. <errors>
   Output a JSON array with one object per failed rule, whatever its tier:
   { "rule": "rule_id", "field": "field_name", "message": "human-readable message" }
   Empty array if validation fully passed.

Output the four tags in order. Do not skip any. Do not include prose outside the tags.
</instructions>

<rules>
{{ validation_rules }}
</rules>

<target_schema>
{{ target_schema }}
</target_schema>

<input>
{{ input_data }}
</input>
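
If you fill the placeholders programmatically, a small helper along these lines may be enough. This is a minimal sketch, not part of the prompt: fill_prompt and PLACEHOLDER are hypothetical names, the template below is abbreviated to the three parameter tags, and the email address is a stand-in.

```python
import re

# Matches {{ name }} with optional inner whitespace (hypothetical helper, not a library API).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def fill_prompt(template: str, values: dict) -> str:
    """Replace every {{ name }} placeholder; raise if a value is missing."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return values[name]
    # A function replacement keeps backslashes in the values literal.
    return PLACEHOLDER.sub(substitute, template)

# Abbreviated template: in practice, paste the full prompt text here.
template = (
    "<rules>\n{{ validation_rules }}\n</rules>\n\n"
    "<target_schema>\n{{ target_schema }}\n</target_schema>\n\n"
    "<input>\n{{ input_data }}\n</input>"
)

prompt = fill_prompt(template, {
    "validation_rules": "R1 (required) email present and valid format",
    "target_schema": "{ user_id (uuid), email (lowercased), age (int), phone (E.164 or null) }",
    "input_data": "Sarah, sarah@example.com, age 29, phone 415-555-0142",  # stand-in address
})
print(prompt)
```

Raising on a missing value, rather than leaving the placeholder in place, keeps a half-filled prompt from reaching the model.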

Sample input

rules: R1 (required) email present and valid format; R2 (required) age is integer 13–120; R3 (warn) phone E.164 format
target_schema: { user_id (uuid), email (lowercased), age (int), phone (E.164 or null) }
input: 'Sarah, [email protected], age 29, phone 415-555-0142'

Expected output

<extracted>
{ "name": "Sarah", "email": "[email protected]", "age": "29", "phone": "415-555-0142" }
</extracted>
<validation>
[
  { "rule": "R1", "passed": true,  "field": "email", "note": null },
  { "rule": "R2", "passed": true,  "field": "age",   "note": "converted from string" },
  { "rule": "R3", "passed": false, "field": "phone", "note": "not E.164; missing +1 prefix" }
]
</validation>
<transformed>
{ "user_id": "(server-generated)", "email": "[email protected]", "age": 29, "phone": null }
</transformed>
<errors>
[ { "rule": "R3", "field": "phone", "message": "Phone '415-555-0142' is not E.164. Stored as null." } ]
</errors>
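
On the consuming side, the four tags can be split and decoded before anything touches <transformed>. The following is a sketch under assumptions: parse_stages and gate_is_consistent are hypothetical helpers, the set of required rule IDs is supplied by the caller, and the email address in the sample response is a stand-in.

```python
import json
import re

STAGES = ("extracted", "validation", "transformed", "errors")

def parse_stages(text: str) -> dict:
    """Decode the JSON inside each of the four stage tags."""
    stages = {}
    for tag in STAGES:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        if match is None:
            raise ValueError(f"response is missing the <{tag}> block")
        stages[tag] = json.loads(match.group(1))  # raises on malformed JSON
    return stages

def gate_is_consistent(stages: dict, required_rules: set) -> bool:
    """True when <transformed> is null exactly when a required rule failed."""
    required_failed = any(
        not check["passed"]
        for check in stages["validation"]
        if check["rule"] in required_rules
    )
    return (stages["transformed"] is None) == required_failed

# Sample response mirroring the expected output; the address is a stand-in.
response = """<extracted>
{ "name": "Sarah", "email": "sarah@example.com", "age": "29", "phone": "415-555-0142" }
</extracted>
<validation>
[
  { "rule": "R1", "passed": true,  "field": "email", "note": null },
  { "rule": "R2", "passed": true,  "field": "age",   "note": "converted from string" },
  { "rule": "R3", "passed": false, "field": "phone", "note": "not E.164; missing +1 prefix" }
]
</validation>
<transformed>
{ "user_id": "(server-generated)", "email": "sarah@example.com", "age": 29, "phone": null }
</transformed>
<errors>
[ { "rule": "R3", "field": "phone", "message": "Phone '415-555-0142' is not E.164. Stored as null." } ]
</errors>"""

stages = parse_stages(response)
# Accept the record only when the model's gate agrees with its own validation results.
record = stages["transformed"] if gate_is_consistent(stages, {"R1", "R2"}) else None
```

Checking the gate in code as well as in the prompt is a design choice: it treats a non-null <transformed> alongside a failed required rule as a rejected record rather than trusting either half of the response.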

Notes & tuning tips

  • Putting validation between extract and transform lets the model self-gate: if required rules fail, <transformed> is null and your downstream code never sees garbage.
  • Passing rules and target schema in <rules> / <target_schema> tags makes the prompt reusable across record types: swap only the parameters.
  • Don't ask the model to be the source of truth for generated values (UUIDs, timestamps); generate those server-side after a successful transform.
  • For very high-volume pipelines, this is ~3× the cost of bare extraction. Worth it when the alternative is silent data corruption.
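
The server-side ID tip could look roughly like this. It is a sketch, assuming the decoded <transformed> value arrives as a dict or None; finalize_record is a hypothetical name and created_at is a hypothetical field that is not part of the sample target schema.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

def finalize_record(transformed: Optional[dict]) -> Optional[dict]:
    """Attach server-generated values; pass None through when the gate closed."""
    if transformed is None:
        return None
    record = dict(transformed)  # copy so the parsed model output is left untouched
    record["user_id"] = str(uuid.uuid4())  # overwrite whatever the model emitted
    # Hypothetical extra field, shown only to illustrate a server-side timestamp.
    record["created_at"] = datetime.now(timezone.utc).isoformat()
    return record

model_output = {
    "user_id": "(server-generated)",
    "email": "sarah@example.com",  # stand-in address
    "age": 29,
    "phone": None,
}
final = finalize_record(model_output)
```

Because the function overwrites user_id unconditionally, a model that invents an ID anyway cannot leak it into storage.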

What this example uses

Tags: <instructions> <format>

Patterns: structured output

Cite this page
Extract → validate → transform pipeline in one call. claudexml.com. https://claudexml.com/examples/extract-validate-transform/