Most engineers blame the AI when output is wrong. The actual problem is almost always incomplete context. The key skill is context construction, not prompt phrasing. This is a framework for training engineering teams to use AI systematically — not ad hoc.
You are reviewing a framework, not a course. We want disagreement, gaps, and pushback — not agreement.
When AI produces wrong output, the instinct is to rephrase the prompt. The actual failure is almost always upstream: missing constraints, absent boundaries, relevant code never shared, prior decisions never stated. Fixing the prompt without fixing the context produces the same failure, differently worded.
When AI is wrong, the problem is almost never the model. It is the context handed to it.
The skill is assembling the right context — constraints, boundaries, relevant code, prior decisions — not finding the right wording.
Not every task needs the same depth of preparation. Exploratory tasks need light context; production-bound tasks require a full context pack.
AI output requires evaluation before action — always. Treating it as a solution skips the most important step.
AI used randomly produces random results. It must be integrated into how engineers actually work — PRs, planning, debugging, architecture.
Training should focus on context preparation, task structuring, and output evaluation. Not prompt techniques.
This is not about prompt length. It is about what the AI needs to not make dangerous assumptions.
An unstructured prompt: "Add input validation to the registration endpoint. Reject invalid emails and short passwords."
The same request, restructured as a context pack:
Task: Add input validation to registration endpoint.
Relevant code: [controller] [validation utility] [error schema]
Constraints: Use existing utility. Follow error schema. Don't touch user model.
Boundary: Controller layer only. Auth handled downstream.
Prior decisions: Standardised on utility in v2.1. Custom validators rejected.
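To make the contrast concrete, here is a minimal sketch of output that would satisfy the context pack above. The names (`is_valid_email`, `make_error`, `validate_registration`) and the inlined stand-ins for the shared validation utility and error schema are hypothetical, invented for illustration; a real team would import its own utility rather than redefine it.

```python
import re

# Hypothetical stand-in for the team's shared validation utility
# ("Standardised on utility in v2.1") -- illustrative, not a real API.
def is_valid_email(value: str) -> bool:
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None

# Hypothetical stand-in for the agreed error schema: {field, message}.
def make_error(field: str, message: str) -> dict:
    return {"field": field, "message": message}

def validate_registration(payload: dict) -> list[dict]:
    """Controller-layer validation only; auth and persistence are downstream."""
    errors = []
    if not is_valid_email(payload.get("email", "")):
        errors.append(make_error("email", "Invalid email address"))
    if len(payload.get("password", "")) < 8:
        errors.append(make_error("password", "Password must be at least 8 characters"))
    return errors
```

The point is not the code itself but what it omits: it delegates to the existing utility, emits errors in the agreed shape, and never touches the user model or auth, because the context pack ruled those out before the prompt was sent.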
Structured context eliminates the most common failure modes before the prompt is sent.
Output that respects boundaries and existing patterns clears review faster and with fewer cycles.
Two minutes of preparation replaces thirty minutes of debugging. The overhead moves; it does not increase.
Engineers don't work in a clean sequence. They jump in mid-task, hit legacy code with no docs, or pick up context from another engineer. These layers are not steps — they are lenses applied based on where you are in a task.
Understanding, assembling, and calibrating the context AI needs to operate within your system — not a generic system. Too little context produces incorrect assumptions. Too much buries the signal.
This layer also covers diagnosis: when output is wrong, the first check is whether the failure is a context problem, a task structure problem, or a model limitation.
Converting assembled context into a scoped, executable task. This is where context becomes a prompt — not before. A poorly structured task produces bad output even with complete context.
Context failures are not only individual — they are organisational. Half the problem is context that lives in people's heads. This layer addresses maintaining architectural alignment and documenting non-obvious decisions so AI remains coherent across sessions, team members, and time.
Output evaluation is continuous — not a final step. Every layer involves assessing whether AI output is a valid proposal before acting on it.
This is not prompt engineering. It is not a set of prompting tips. It is a structured approach to integrating AI into engineering workflows that already exist — PRs, planning, debugging, architecture decisions.
Think of it this way: you are briefing a highly capable contractor who has never seen your system. They are fast, but they only know what you tell them. If your brief is incomplete, their output will be confidently wrong.
Which files are directly involved?
Not adjacent files. Not the whole module. The specific code this task touches.
What constraints apply?
What must not change. What patterns must be followed. What the system cannot do.
What has already been decided that the AI should not question?
Prior decisions, rejected alternatives, architectural commitments.
What are the boundaries of what can be changed?
What this task owns. What it hands off to. What it must not touch.
What can I safely exclude?
What is unrelated. What has already been established in this conversation.
Context preparation is also iterative. When output is wrong, diagnose what was missing or unclear before re-prompting — do not just rephrase the same request.
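The five questions above can be sketched as a data structure. This is an illustrative shape, not a standard; the class and field names (`ContextPack`, `exclusions`, `to_prompt`) are invented here to show how the answers map onto a reusable prompt preamble.

```python
from dataclasses import dataclass, field

@dataclass
class ContextPack:
    """One field per question; empty fields are simply omitted from the prompt."""
    task: str
    files: list[str] = field(default_factory=list)        # directly involved code
    constraints: list[str] = field(default_factory=list)  # what must not change
    decisions: list[str] = field(default_factory=list)    # settled, not up for debate
    boundaries: list[str] = field(default_factory=list)   # what this task owns
    exclusions: list[str] = field(default_factory=list)   # deliberately left out

    def to_prompt(self) -> str:
        """Render the pack as the structured preamble of a prompt."""
        sections = [
            ("Task", [self.task]),
            ("Relevant code", self.files),
            ("Constraints", self.constraints),
            ("Prior decisions", self.decisions),
            ("Boundary", self.boundaries),
        ]
        return "\n".join(
            f"{name}: " + "; ".join(items) for name, items in sections if items
        )
```

Writing the pack down, rather than holding it in your head, is what makes the iterative step possible: when output is wrong, you diff the pack against the failure instead of rephrasing from memory.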
This is a framework validation exercise. The goal is to find gaps, weaknesses, and real-world failure points — before investing in full content or build. Sharp pushback is more valuable than polite endorsement.
What do you fundamentally disagree with in the core thesis?
What is missing from the framework that would make it incomplete or wrong?
Where would this fail in your team or organisation?
Does this feel meaningfully different from prompt engineering — or is that a distinction without a difference?
Would you adopt this framework for training your engineers? What would need to change before you would?