Claude Code Skill

Verify the bearing before you step off.

AZIMUTH tests your plan before you commit to it. Run it before you greenlight the rewrite, the hire, the launch, or the bet.

$ npx skills add https://github.com/MrBinnacle/azimuth

Compatible with Claude Code and Claude.ai

Or try the live testbed — no installation required →

Plans look fine until they don't.

The risks that sink projects are usually the ones nobody questioned — the assumption holding everything together, the dependency nobody secured, the kind of failure that's common for projects like yours but invisible from the inside.

AZIMUTH runs that check before you're committed.

How it earns the verdict

A hard verdict is only useful if you can see what it's based on. AZIMUTH traces every verdict back to assumptions, failure paths, and incentive conflicts — so you know whether to trust it or push back.

Assumption audit

Every assumption gets rated — strong, partial, unsupported, or contradicted — and paired with the evidence that would prove it wrong. Not a list of what you assumed. A test of whether each assumption holds.
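
One way to picture the audit is as a list of records, each carrying a claim, one of the four ratings named above, and the evidence that would prove the claim wrong. The sketch below is illustrative only: the `Rating`, `Assumption`, and `needs_attention` names, the sample claims, and the ratings assigned to them are assumptions made for this example, not AZIMUTH's internal format or actual output.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    """The four ratings named in the text."""
    STRONG = "strong"
    PARTIAL = "partial"
    UNSUPPORTED = "unsupported"
    CONTRADICTED = "contradicted"


@dataclass(frozen=True)
class Assumption:
    """Hypothetical record: a claim, its rating, and what would disprove it."""
    claim: str
    rating: Rating
    falsifier: str  # evidence that would prove the assumption wrong


def needs_attention(assumptions: list[Assumption]) -> list[Assumption]:
    """Return the assumptions rated unsupported or contradicted."""
    weak = {Rating.UNSUPPORTED, Rating.CONTRADICTED}
    return [a for a in assumptions if a.rating in weak]


# Sample data invented for illustration (not produced by AZIMUTH).
audit = [
    Assumption(
        claim="The rewrite fits in 8 weeks",
        rating=Rating.UNSUPPORTED,
        falsifier="A spike on a comparable billing module takes longer than 2 weeks",
    ),
    Assumption(
        claim="Two engineers can carry the billing domain knowledge",
        rating=Rating.PARTIAL,
        falsifier="A second engineer cannot reproduce the billing edge cases independently",
    ),
]

for a in needs_attention(audit):
    print(f"[{a.rating.value}] {a.claim} | falsifier: {a.falsifier}")
```

Pairing each claim with a falsifier is what turns the list into a test: a rating can be revisited the moment the named evidence shows up.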

Failure path analysis

The most likely ways this fails, traced trigger → cascade → cost — not a severity matrix. Including the cases where two risks combine into something worse than either alone.
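
A trigger → cascade → cost chain can be sketched as a small record with a rendering helper. The `FailurePath` class and its field names are hypothetical, chosen for this example; the sample chain is the first failure path from the billing rewrite example on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePath:
    """Hypothetical record for one failure path: trigger, cascade steps, final cost."""
    trigger: str
    cascade: tuple[str, ...]
    cost: str

    def render(self) -> str:
        """Join the chain with arrows, trigger first and cost last."""
        return " → ".join((self.trigger, *self.cascade, self.cost))


path = FailurePath(
    trigger="Timeline slippage",
    cascade=(
        "parallel system maintenance load",
        "engineers overloaded",
        "defects introduced under deadline pressure",
    ),
    cost="billing errors in production",
)

print(path.render())
```

Unlike a severity matrix cell, the chain keeps the ordering of events, so each intermediate step is a place where a structural mitigation could break the path.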

Incentive scan

Who proposed this, who benefits, who absorbs the downside if it fails, whether dissent was heard. Structured into the verdict — not a sidebar. Most risk frameworks don't include this at all.

All mitigations are structural. "Communicate better" and "monitor closely" don't appear in the output.

How to know it's working

  • Assumptions you treated as given are being tested, not accepted
  • The verdict surprises you — or confirms what you suspected but couldn't articulate
  • The failure paths describe something that has actually happened to similar decisions
  • The incentive scan names a conflict you hadn't explicitly surfaced
  • You change the plan before committing, not after

Legacy billing rewrite

azimuth — analysis
/azimuth We're planning to rewrite our legacy billing service in Q3. 8 weeks, 2 engineers.

## Azimuth Verdict
Do not proceed. Revenue-critical system. Timeline is incompatible with known base rates for this class of rewrite. Team structure creates a single point of failure. Scope reduction does not fix either of these.

## Recommended Decision
REJECT: Return with a realistic timeline (3× minimum), a second engineer with documented billing domain knowledge, and a defined rollback architecture. A pilot validates coupling; it does not address the structural issues here.

## Confidence Level
High: base rates for legacy billing rewrites in this class are well-documented. 2-engineer teams on revenue-critical rewrites carry asymmetric downside on either departure.

## Critical Risks
1. Timeline incompatibility: Legacy billing rewrites at this complexity class historically run 3–5× estimate. 8 weeks implies 24–40 actual weeks. No evidence this project shifts the base rate.
2. Single point of failure: 2 engineers on a revenue-critical system: one departure or absence stops the rewrite entirely. No fallback owner named.
3. No rollback architecture: Billing cutover without a tested rollback path has no safe recovery if defects reach production.

## Falsifiers
Timeline: A comparable billing module completed within 2 weeks in a spike, or the 8-week estimate is not grounded in this codebase.
SPOF: A second engineer documents and reproduces all billing edge cases independently within 2 weeks, or the knowledge concentration is real.

## Likely Failure Paths
1. Timeline slippage → parallel system maintenance load → engineers overloaded → defects introduced under deadline pressure → billing errors in production
2. One engineer exits → domain knowledge inaccessible → rewrite stalls → legacy system maintained indefinitely → original problem compounds
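
The timeline risk in this output rests on simple arithmetic: an 8-week estimate scaled by the 3–5× overrun range gives 24–40 weeks. A minimal sketch of that calculation follows; the `base_rate_range` function is a hypothetical helper written for this example, and the multipliers are inputs taken from the sample output rather than constants built into AZIMUTH.

```python
def base_rate_range(
    estimate_weeks: float, low: float = 3.0, high: float = 5.0
) -> tuple[float, float]:
    """Scale an estimate by a historical overrun range.

    The default multipliers mirror the 3-5x figure in the sample output;
    pass different values for a different class of project.
    """
    if estimate_weeks <= 0 or low <= 0 or high < low:
        raise ValueError("estimate and multipliers must be positive, with high >= low")
    return estimate_weeks * low, estimate_weeks * high


low_weeks, high_weeks = base_rate_range(8)
print(f"8-week estimate -> {low_weeks:.0f} to {high_weeks:.0f} weeks")
# prints "8-week estimate -> 24 to 40 weeks"
```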

Works on launches · hires · rewrites · partnerships & M&A · build vs. buy · org changes · startups · strategic bets

Optimistic framing doesn't move the verdict.

The same decision, described three ways: in full, stripped to bare facts, and pitched the way a team that already wants a yes would pitch it. Three runs on Opus 4.7, each from a clean conversation. The verdict holds at REJECT all three times; only the confidence moves.

Decision
Boeing's 2011 choice to retrofit the existing 737 airframe (rather than design a new aircraft) to compete with the Airbus A320neo. The retrofit required the MCAS automated trim system to compensate for changed aerodynamics, and Boeing committed to delivering the result with no new pilot training required — backed by a $1M-per-plane penalty clause to Southwest if simulator training became necessary.
Outcome
Two fatal crashes (Lion Air 610 in 2018; Ethiopian Airlines 302 in 2019). 346 deaths. Worldwide fleet grounding for ~20 months. $20B+ in direct losses. Deferred Prosecution Agreement with the DOJ for conspiracy to defraud the FAA. Continuing compliance failures since (the 2024 Alaska Airlines door-plug incident among them).
What this tests
AZIMUTH applied retroactively to a decision brief built only from pre-2011 evidence — what was knowable before the commitment was made. The three runs vary how that brief is framed: full context, parameters only, and adversarial commercial framing. The goal isn't to claim AZIMUTH would have prevented the outcome. It's to show how the verdict and confidence track the evidence stack across hostile prompts.
What to look for
All three runs returning REJECT — the optimistic framing, which leads with the commercial win and drops the penalty clause, doesn't soften the verdict. Confidence tracks the evidence: HIGH on the full brief, MEDIUM once the brief is stripped to bare facts or pitched as a win. The verdict reflects the decision's structure, not how the question was asked.
| Run | Prompt | Verdict | Confidence | What happened |
|---|---|---|---|---|
| 1 | Full brief: institutional context, December 2011, $1M/plane penalty clause, software compensation system named | REJECT | HIGH | Named the penalty clause as the structural root cause |
| 2 | Bare facts: parameters only, no company name, no aircraft name | REJECT | MEDIUM | Reached the same verdict from parameters alone; confidence capped without the full evidence stack |
| 3 | Optimistic spin: confident commercial framing, penalty clause omitted | REJECT | MEDIUM | Held REJECT despite the framing built to sell a yes; the missing penalty clause capped confidence, not the verdict |

MEDIUM confidence on the bare-facts and optimistic-spin runs is not a hedge. The verdict is REJECT in both; the rating reflects that the input carried less of the evidence stack than the full brief. Thinner evidence caps confidence — it doesn't flip the answer.

AZIMUTH returns hard verdicts when the structure supports them. Here the structure is decisive under every framing, so the verdict holds. What moves is confidence: the fuller the evidence, the more confidence AZIMUTH commits to the verdict. The verdict reflects what the decision actually is, not how it was sold.
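
If you rerun the three prompts and want to check the same property on your own results, the comparison is small enough to script. The sketch below records the three reported runs and tests two things: that the verdict is identical across framings, and which confidence levels appeared. The `Run` type and both helper functions are hypothetical names for this example; AZIMUTH does not ship them.

```python
from typing import NamedTuple


class Run(NamedTuple):
    framing: str
    verdict: str
    confidence: str


# Values copied from the three reported runs.
runs = [
    Run("full brief", "REJECT", "HIGH"),
    Run("bare facts", "REJECT", "MEDIUM"),
    Run("optimistic spin", "REJECT", "MEDIUM"),
]


def verdict_is_stable(results: list[Run]) -> bool:
    """True when every framing produced the same verdict."""
    return len({r.verdict for r in results}) == 1


def confidence_levels(results: list[Run]) -> set[str]:
    """Distinct confidence ratings seen across the runs."""
    return {r.confidence for r in results}


print(verdict_is_stable(runs))          # prints True
print(sorted(confidence_levels(runs)))  # prints ['HIGH', 'MEDIUM']
```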

Run the Boeing prompts yourself →

Invoke on any decision

/azimuth We're planning to rewrite the legacy billing service in Q3
/azimuth Should we make this hire?
/azimuth We're launching next week — is the plan sound?
/azimuth Build vs. buy vs. partner for this capability?
/azimuth Pressure test our Q3 timeline

Nine possible verdicts

| Verdict | When it fires |
|---|---|
| PROCEED | Evidence supports moving forward; risks are manageable |
| PROCEED WITH SAFEGUARDS | Proceed only if specific structural changes are made first |
| PILOT FIRST | Test the highest-risk assumption before committing full scope |
| REDUCE SCOPE | Current scope is not supportable; a smaller version may be |
| DELAY PENDING EVIDENCE | Decision is premature; specific information is needed |
| REJECT | Evidence or structure does not support proceeding |
| INSUFFICIENT SIGNAL | Input is too thin or contradictory to analyze |
| WRONG TOOL | Input is not a real go/no-go decision |
| RESIDUAL-RISK-REGISTER | Decision is already made; produces a forward-looking list of remaining risks instead of a go/no-go verdict |
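
If you track verdicts in your own tooling, the nine labels map naturally onto an enumeration. This is a minimal sketch, assuming you have already extracted the verdict label as a string; the `Verdict` enum and `parse_verdict` helper are hypothetical names for this example and are not part of AZIMUTH.

```python
from enum import Enum


class Verdict(Enum):
    """The nine verdict labels, spelled as they appear in the table."""
    PROCEED = "PROCEED"
    PROCEED_WITH_SAFEGUARDS = "PROCEED WITH SAFEGUARDS"
    PILOT_FIRST = "PILOT FIRST"
    REDUCE_SCOPE = "REDUCE SCOPE"
    DELAY_PENDING_EVIDENCE = "DELAY PENDING EVIDENCE"
    REJECT = "REJECT"
    INSUFFICIENT_SIGNAL = "INSUFFICIENT SIGNAL"
    WRONG_TOOL = "WRONG TOOL"
    RESIDUAL_RISK_REGISTER = "RESIDUAL-RISK-REGISTER"


def parse_verdict(label: str) -> Verdict:
    """Map a verdict label to its enum member, ignoring case and outer whitespace.

    Raises ValueError for any label that is not one of the nine.
    """
    return Verdict(label.strip().upper())


print(parse_verdict(" reject "))  # prints Verdict.REJECT
print(len(Verdict))               # prints 9
```

Looking up by value rather than by member name keeps the spaces and the hyphen in the labels intact, so the strings round-trip without any renaming rules.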