AI Newsletter Digest improvements: fixed QP soft line break decoding, URL extraction, and content cleaning

2026-03-04 13:29:22 +00:00
parent 29a98137a7
commit 57dd294675
13706 changed files with 2114953 additions and 237629 deletions
--- a/archive/inactive-skills/codex-conductor/references/codex-runbook.md
+++ b/archive/inactive-skills/codex-conductor/references/codex-runbook.md
@@ -0,0 +1,79 @@
+# Coding-Agent Runbook (PTY + Background)
+
+This orchestrator MUST delegate implementation tasks to a coding agent.
+Do not hand-code feature work directly when the skill is active.
+
+Supported agents:
+- `codex`
+- `claude`
+- `opencode`
+- `pi`
+
+## First Rule
+
+At skill start, ask:
+1) Which coding agent should run tasks?
+2) Which fallback agent should be used if the primary fails?
+
+## Launch Patterns
+
+### Codex
+```bash
+codex exec --full-auto "<gate task prompt>"
+```
+
+### Claude
+```bash
+claude "<gate task prompt>"
+```
+
+### OpenCode
+```bash
+opencode run "<gate task prompt>"
+```
+
+### Pi
+```bash
+pi -p "<gate task prompt>"
+```
+
+OpenClaw execution recommendation:
+- `pty:true` for interactive CLIs
+- `background:true` for long-running work
+- `workdir:<project-root>`
+
+## Required Orchestration Loop
+
+1. Generate gate prompt (`generate_gate_prompt.py`).
+2. Execute selected coding agent with that prompt (`agent_exec.py` or equivalent).
+3. Require coding agent to update docs immediately after task completion:
+   - docs/tasks.md
+   - docs/progress.md
+   - docs/change-log.md
+   - docs/traceability.md
+   - docs/test-results.md
+   - docs/agent-handoff.md
+4. OpenClaw agent runs verification itself:
+   - CLI checks in terminal
+   - Browser/manual checks for web journeys
+5. If validation fails:
+   - summarize issue clearly with command/flow + output
+   - re-spawn coding agent with fix prompt (same task/spec)
+   - require docs updates again
+   - re-test
+6. Only then update gate status.
+
+## Manual Review Responsibility
+
+Even in autonomous mode, the OpenClaw agent performs manual verification itself:
+- Web/UI flows: run in browser tools, test critical journeys.
+- CLI flows: run required commands in terminal and inspect outputs.
+
+If checks fail, send concrete failure details to the coding agent, request a fix, and retest.
+
+## Completion Wake Pattern
+
+For long runs, require the coding agent's wake message to include the task summary and the verification handoff.
+
+"When fully done, run:
+openclaw gateway wake --text 'Done: <gate> | task: <summary> | handoff: see docs/agent-handoff.md for CLI+Browser checks' --mode now"
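Note on `codex-runbook.md`: the Required Orchestration Loop above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the skill's actual `agent_exec.py`. The names `run_orchestration_loop` and `Verification`, the injected callables, and the `max_attempts` cap (one initial run plus two repairs) are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import Callable, List

# Docs the runbook requires the coding agent to update after each task.
REQUIRED_DOCS = [
    "docs/tasks.md",
    "docs/progress.md",
    "docs/change-log.md",
    "docs/traceability.md",
    "docs/test-results.md",
    "docs/agent-handoff.md",
]


@dataclass
class Verification:
    """Result of the orchestrator's own CLI/browser checks (hypothetical type)."""
    passed: bool
    details: str = ""  # command/flow + output, reused in the fix prompt


def run_orchestration_loop(
    gate_prompt: str,
    run_agent: Callable[[str], None],
    docs_updated: Callable[[List[str]], bool],
    verify: Callable[[], Verification],
    max_attempts: int = 3,  # assumption: 1 initial run + 2 repairs
) -> str:
    """Return "PASS" once verification succeeds, otherwise "FAIL"."""
    prompt = gate_prompt
    for _ in range(max_attempts):
        run_agent(prompt)                    # step 2: execute coding agent
        if not docs_updated(REQUIRED_DOCS):  # step 3: docs must be current
            prompt = gate_prompt + "\n\nFix: update all required docs."
            continue
        result = verify()                    # step 4: orchestrator verifies
        if result.passed:
            return "PASS"                    # step 6: gate status may change
        # step 5: re-spawn with concrete failure details, same task/spec
        prompt = gate_prompt + "\n\nFix this failure:\n" + result.details
    return "FAIL"
```

Passing the agent runner and the verifier in as callables keeps the sketch independent of any particular CLI (`codex`, `claude`, `opencode`, `pi`). A real orchestrator would presumably wrap the launch patterns listed in the runbook.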
--- a/archive/inactive-skills/codex-conductor/references/gate-checklists.md
+++ b/archive/inactive-skills/codex-conductor/references/gate-checklists.md
@@ -0,0 +1,76 @@
+# Gate Checklists
+
+## G0 Intake Complete
+- Planning questionnaire started
+- Mission, scope, journeys captured
+- project_mode and execution_mode selected
+
+## G1 Planning Approved
+- Requirements testable and clear
+- **Specs created for all v1 features** (in `docs/specs/` or `docs/requirements.md`)
+- **Each spec has acceptance criteria (testable, not vague)**
+- Definition of Done captured
+- Acceptance tests drafted
+- Risks and assumptions listed
+
+## G2 Architecture Approved
+Common:
+- architecture doc updated
+- ADR-0001 completed with alternatives
+- **ADRs reference relevant specs**
+- test strategy and security baseline included
+- **`docs/specs/` directory exists with feature specs**
+
+Greenfield preconditions:
+- bootstrap architecture is complete
+- **At least one feature spec approved**
+
+Brownfield preconditions:
+- as-is architecture complete
+- system inventory + dependency map complete
+- characterization baseline exists
+- migration plan + compatibility matrix complete
+- **Existing behavior documented before changes specced**
+
+## G3 Slice-1 Build Verified
+- **Task references spec section** (e.g., `Spec: specs/auth.md#login`)
+- first vertical slice implemented
+- **Implementation matches spec acceptance criteria**
+- unit tests pass for slice
+- integration test for key path passes
+- manual smoke path passes
+- docs updated
+
+## G4 Full Build Verified
+- **All tasks in g4-task-plan.md have spec references**
+- lint/type/build pass
+- unit + integration suite pass
+- e2e critical paths pass
+- contract checks pass (if API boundaries exist)
+- migration checks pass (brownfield)
+- **All spec acceptance criteria verified**
+
+## G5 Security & Quality Verified
+- secret scanning baseline recorded
+- dependency vulnerability baseline recorded
+- authentication/authorization checks pass
+- input validation checks pass
+- error handling/logging checks pass
+- performance smoke checks pass
+
+## G6 Release Candidate Verified
+- release checklist complete
+- rollback instructions tested/validated
+- monitoring/alerts configured
+- open risks acknowledged
+
+## G7 Production/Handover Complete
+- post-deploy smoke passes
+- handover notes complete
+- incident/runbook notes complete
+- backlog of follow-ups created
+
+## State Transitions
+Allowed: PENDING -> IN_PROGRESS -> PASS/FAIL/BLOCKED
+- FAIL requires evidence + remediation plan
+- BLOCKED requires owner + unblock condition
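Note on `gate-checklists.md`: the State Transitions rule above lends itself to a small guard function. The sketch below is one possible reading, not code from this commit. The field names (`evidence`, `remediation_plan`, `owner`, `unblock_condition`) are hypothetical, and treating PASS/FAIL/BLOCKED as having no outgoing transitions is an assumption, since the checklist does not say how a failed or blocked gate is reopened.

```python
# Allowed moves, taken from "PENDING -> IN_PROGRESS -> PASS/FAIL/BLOCKED".
ALLOWED = {
    "PENDING": {"IN_PROGRESS"},
    "IN_PROGRESS": {"PASS", "FAIL", "BLOCKED"},
}

# Extra information each outcome must carry (field names are hypothetical).
REQUIRED_FIELDS = {
    "FAIL": ("evidence", "remediation_plan"),
    "BLOCKED": ("owner", "unblock_condition"),
}


def transition(current, target, **fields):
    """Validate a gate status change and return the new status record."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"transition {current} -> {target} is not allowed")
    missing = [key for key in REQUIRED_FIELDS.get(target, ()) if not fields.get(key)]
    if missing:
        raise ValueError(f"{target} requires: {', '.join(missing)}")
    return {"status": target, **fields}
```

For example, `transition("IN_PROGRESS", "BLOCKED", owner="alice", unblock_condition="API key issued")` returns a record, while the same call without `owner` raises `ValueError`.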
--- a/archive/inactive-skills/codex-conductor/references/gate-prompts.md
+++ b/archive/inactive-skills/codex-conductor/references/gate-prompts.md
@@ -0,0 +1,168 @@
+# Gate Prompt Templates (Codex)
+
+Copy/adapt these templates per gate. Keep prompts explicit and evidence-oriented.
+
+## Common Prompt Header
+
+```text
+You are implementing Gate <GATE_ID> for this project.
+
+Constraints:
+- Follow AGENTS.md workflow rules exactly.
+- Update documentation after every meaningful change.
+- Run required validations and report evidence.
+- Do not claim completion without test outputs.
+- Do not assume requirements; if unclear, stop and ask.
+- If spec reference is missing for implementation work, return BLOCKED and do not code.
+
+Output contract (mandatory):
+- STATUS: DONE | BLOCKED
+- TASK: <single task>
+- SPEC_REF: <reference or BLOCKED reason>
+- FILES_CHANGED: <list>
+- VALIDATION_RUN: <commands + outcomes>
+- OPENCLAW_VERIFY: <cli checks + browser checks or N/A>
+- RISKS: <list or NONE>
+
+Required docs updates:
+- docs/tasks.md
+- docs/progress.md
+- docs/change-log.md
+- docs/traceability.md
+- docs/test-results.md
+
+When fully done, run:
+openclaw gateway wake --text "Done: <GATE_ID> completed with evidence" --mode now
+```
+
+## G1 Planning Prompt
+
+```text
+Objective: Complete planning artifacts.
+
+Tasks:
+1) Finalize requirements with testable acceptance criteria.
+2) Capture Definition of Done.
+3) List assumptions and risks.
+4) If research_mode=true, produce docs/research-notes.md with options and recommendation.
+
+Validations:
+- Ensure requirements are testable and unambiguous.
+- Ensure acceptance criteria map to at least one test each.
+
+Done condition:
+- docs/requirements.md complete
+- docs/plan.md updated
+- docs/progress.md updated for G1
+```
+
+## G2 Architecture Prompt
+
+```text
+Objective: Complete architecture baseline and ADR.
+
+Tasks:
+1) Update docs/architecture.md (components, data flow, deployment, security baseline).
+2) Update docs/adr/ADR-0001-initial-architecture.md with alternatives and trade-offs.
+3) For brownfield, ensure as-is architecture + migration artifacts are current.
+
+Validations:
+- Architecture supports must-have journeys.
+- ADR includes at least 2 alternatives.
+
+Done condition:
+- G2 artifacts complete and cross-linked in docs/traceability.md
+```
+
+## G3 Slice-1 Prompt
+
+```text
+Objective: Deliver and verify first vertical slice.
+
+Tasks:
+1) Implement first slice for the top priority user journey.
+2) Add unit and integration tests for this slice.
+3) Execute manual smoke test for the slice.
+
+Validations:
+- unit tests pass
+- integration tests pass
+- manual smoke scenario recorded in docs/test-results.md
+
+Done condition:
+- slice-1 works end-to-end with evidence
+```
+
+## G4 Full Build Prompt
+
+```text
+Objective: Complete full build and baseline verification.
+
+Tasks:
+1) Implement remaining in-scope v1 features.
+2) Run full validation suite.
+3) Resolve failures or document blockers.
+
+Validations:
+- lint/type/build pass
+- unit/integration/e2e pass
+- contract checks pass if API boundaries exist
+
+Done condition:
+- all in-scope features implemented and verified
+```
+
+## G5 Security & Quality Prompt
+
+```text
+Objective: Execute security and quality gate.
+
+Tasks:
+1) Run dependency/secret baseline checks.
+2) Verify auth/input validation/error handling.
+3) Run performance smoke checks.
+
+Validations:
+- no unresolved critical/high issues
+- mitigation plan logged for medium/low issues
+
+Done condition:
+- security and quality evidence logged
+```
+
+## G6 Release Candidate Prompt
+
+```text
+Objective: Prepare and verify release candidate.
+
+Tasks:
+1) Complete release checklist.
+2) Validate rollback instructions.
+3) Confirm monitoring/alerts baseline.
+
+Validations:
+- release-checklist complete
+- rollback approach validated
+- docs versioned and coherent
+
+Done condition:
+- RC ready for approval/deployment
+```
+
+## G7 Handover Prompt
+
+```text
+Objective: Complete handover and close orchestration.
+
+Tasks:
+1) Execute post-deploy smoke tests.
+2) Finalize handover notes + runbook pointers.
+3) Create next-iteration backlog.
+
+Validations:
+- critical journeys pass in deployed environment
+- unresolved risks have owners
+
+Done condition:
+- docs/progress.md reaches 100% and project is handover-ready
+```
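Note on `gate-prompts.md`: the mandatory output contract in the Common Prompt Header is easy to check mechanically. The following is a minimal sketch that assumes the agent reports each field on a single line as `KEY: value` (optionally with a leading `-`). The function name `parse_output_contract` is hypothetical, and multi-line values are not handled.

```python
import re

# Keys listed under "Output contract (mandatory)" in the prompt header.
REQUIRED_KEYS = (
    "STATUS", "TASK", "SPEC_REF", "FILES_CHANGED",
    "VALIDATION_RUN", "OPENCLAW_VERIFY", "RISKS",
)

# Matches "- KEY: value" or "KEY: value"; assumes one field per line.
FIELD_LINE = re.compile(r"^\s*-?\s*([A-Z_]+):\s*(.*)$")


def parse_output_contract(report):
    """Extract contract fields from an agent report, or raise ValueError."""
    fields = {}
    for line in report.splitlines():
        match = FIELD_LINE.match(line)
        if match and match.group(1) in REQUIRED_KEYS:
            fields[match.group(1)] = match.group(2).strip()
    missing = [key for key in REQUIRED_KEYS if not fields.get(key)]
    if missing:
        raise ValueError("missing contract fields: " + ", ".join(missing))
    if fields["STATUS"] not in ("DONE", "BLOCKED"):
        raise ValueError("STATUS must be DONE or BLOCKED")
    return fields
```

Rejecting a report with missing fields gives the orchestrator a concrete reason to re-spawn the agent, instead of accepting a completion claim without evidence.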
--- a/archive/inactive-skills/codex-conductor/references/manual-test-templates.md
+++ b/archive/inactive-skills/codex-conductor/references/manual-test-templates.md
@@ -0,0 +1,72 @@
+# Manual Test Templates
+
+Use these templates for human-verifiable checks. Record all runs in `docs/test-results.md`.
+
+## Mandatory Orchestrator Behavior
+
+- The orchestrator itself performs manual verification after coding agent changes.
+- For web/UI systems: run real browser checks.
+- For CLI systems: run actual commands and inspect outputs.
+- If verification fails: orchestrator re-spawns coding agent with a fix prompt, then re-tests.
+
+## Web App Manual Tests
+
+### WT-001: Auth Login Journey (if auth exists)
+- Preconditions: test user account exists
+- Steps:
+  1. Open login page in a real browser
+  2. Submit valid credentials
+  3. Confirm landing on authenticated area
+- Expected: login succeeds, no console/server errors
+
+### WT-002: Core CRUD Journey
+- Steps:
+  1. Create an entity
+  2. View it in listing/detail
+  3. Edit it
+  4. Delete it
+- Expected: data lifecycle works end-to-end
+
+### WT-003: Failure Path
+- Steps:
+  1. Trigger invalid input
+  2. Trigger API/server failure scenario
+- Expected: graceful errors, no crash, clear recovery path
+
+### WT-004: Payment Journey (if payments exist)
+- Steps:
+  1. Execute success path
+  2. Execute failure/cancel path
+- Expected: both handled correctly with consistent state
+
+## CLI Manual Tests
+
+### CT-001: Happy Path Command
+- Steps: run primary command with valid inputs
+- Expected: success exit code and expected output
+
+### CT-002: Invalid Input Handling
+- Steps: run command with malformed/missing args
+- Expected: clear error, non-zero exit, no crash
+
+### CT-003: Config Handling
+- Steps: run once with the expected config and once with the config missing
+- Expected: documented behavior in both cases, with clear guidance when the config is missing
+
+### CT-004: Output Contract
+- Steps: verify stdout/stderr format against docs
+- Expected: output consistent and parseable if required
+
+## Brownfield Migration Tests
+
+### BT-001: Legacy/Modern Parity Check
+- Steps: run same scenario against old and new path
+- Expected: equivalent behavior for supported scope
+
+### BT-002: Rollback Rehearsal
+- Steps: deploy migration slice then execute rollback procedure
+- Expected: service restored cleanly to prior known-good state
+
+### BT-003: Contract Compatibility
+- Steps: verify consumer/provider boundary contracts
+- Expected: no breaking contract changes
--- a/archive/inactive-skills/codex-conductor/references/modes.md
+++ b/archive/inactive-skills/codex-conductor/references/modes.md
@@ -0,0 +1,42 @@
+# Modes
+
+## 1) Project Mode
+
+### greenfield
+Use for new systems from scratch.
+
+Expected pre-architecture outputs:
+- requirements baseline
+- architecture baseline
+- ADR-0001
+- initial CI/test strategy
+
+### brownfield
+Use for onboarding and evolving existing systems.
+
+Expected pre-architecture outputs:
+- as-is architecture
+- system inventory
+- dependency map
+- legacy risk register
+- characterization test baseline
+- migration strategy with rollback points
+- compatibility matrix
+
+## 2) Execution Mode
+
+### autonomous
+- proceed automatically when gate checks pass
+- auto-repair up to configured retries (default 2)
+- pause only on persistent failures/blockers
+
+### gated
+- pause at every gate
+- present pass/fail evidence
+- require explicit user go-ahead to proceed
+
+## Recommended Defaults
+
+- Unknown/new domain → `gated`
+- High-risk brownfield migration → `gated`
+- Well-understood internal greenfield project → `autonomous`
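Note on `modes.md`: the difference between the two execution modes can be expressed as a small decision function. This is an illustrative sketch only. The function name `next_action` and the returned action labels are hypothetical. The default of 2 retries comes from the autonomous mode description above.

```python
def next_action(execution_mode, checks_passed, repairs_used=0,
                max_retries=2, blocked=False):
    """Decide what the orchestrator does after a gate run.

    Action labels are hypothetical; they only name the behaviors
    described for the `autonomous` and `gated` modes.
    """
    if execution_mode not in ("autonomous", "gated"):
        raise ValueError(f"unknown execution mode: {execution_mode!r}")
    if execution_mode == "gated":
        # Gated: always stop, present evidence, wait for explicit go-ahead.
        return "PAUSE_FOR_APPROVAL"
    if blocked:
        return "PAUSE"            # blockers always pause autonomous runs
    if checks_passed:
        return "PROCEED"          # gate checks passed, continue automatically
    if repairs_used < max_retries:
        return "AUTO_REPAIR"      # retry within the configured budget
    return "PAUSE"                # persistent failure
```

With the defaults, an autonomous run that keeps failing gets two `AUTO_REPAIR` rounds and then pauses, while a gated run pauses at every gate regardless of the result.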
--- a/archive/inactive-skills/codex-conductor/references/planning-questionnaire.md
+++ b/archive/inactive-skills/codex-conductor/references/planning-questionnaire.md
@@ -0,0 +1,65 @@
+# Planning Questionnaire (Mandatory)
+
+Ask these in order. Do not start implementation until critical answers are provided.
+
+## 0) Coding Agent Selection (Ask First)
+1. Which coding agent should run implementation tasks? (`codex` | `claude` | `opencode` | `pi`)
+2. What is the fallback coding agent if the primary fails repeatedly?
+
+## A) Outcome and Scope
+3. What are we building (one-sentence mission)?
+4. Who are the target users?
+5. What is in scope for v1?
+6. What is explicitly out of scope?
+7. What is the deadline (if any)?
+
+## B) User Journeys and Success
+8. What are the top 3 user journeys?
+9. What must work on day one (must-have features)?
+10. What metrics define success (adoption, conversion, latency, reliability)?
+11. What does “Definition of Done” mean for this project?
+
+## C) Product and Compliance Constraints
+12. Any legal/compliance constraints (privacy, data residency, PCI, HIPAA, etc.)?
+13. Any accessibility level target (e.g., WCAG baseline)?
+14. Any browser/device/platform constraints?
+15. Any third-party integrations required?
+
+## D) Technical Constraints
+16. Preferred stack (frontend/backend/database/infra)?
+17. Existing repo or greenfield?
+18. Required hosting target (Cloudflare, Vercel, AWS, on-prem, etc.)?
+19. Required CI/CD platform?
+20. Auth requirements (roles, SSO, OAuth providers)?
+21. Payments/subscriptions needed?
+22. Data model complexity and expected scale?
+
+## E) Quality and Operations
+23. Required test levels (unit/integration/e2e/perf/security)?
+24. Availability target/SLO?
+25. Logging/monitoring/alerting requirements?
+26. Rollback expectations?
+27. Backup and disaster recovery expectations?
+
+## F) Orchestration Preferences
+28. Mode: `autonomous` or `gated`?
+29. Should `research_mode` run during planning? (`true/false`)
+30. In gated mode, who approves each gate?
+31. In autonomous mode, should the orchestrator auto-repair failures, up to 2 retries? (`true/false`)
+32. Preferred progress update frequency?
+
+## G) Acceptance and Sign-off
+33. What are the exact acceptance tests for launch?
+34. What evidence is required at each gate?
+35. Final approver for release?
+
+## Minimum Inputs Required to Start Build
+- Primary coding agent choice
+- Mission
+- Top user journeys
+- v1 scope
+- Hosting target
+- Stack preference (or explicit “recommend one”)
+- Mode (`autonomous` or `gated`)
+- Definition of Done
+- Acceptance tests
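Note on `planning-questionnaire.md`: the "Minimum Inputs Required to Start Build" list can be enforced before any build gate starts. The sketch below assumes the answers are collected into a plain dictionary. The key names are hypothetical, since the commit does not show how answers are stored.

```python
# One hypothetical key per item in "Minimum Inputs Required to Start Build".
MINIMUM_INPUTS = (
    "primary_agent",
    "mission",
    "top_journeys",
    "v1_scope",
    "hosting_target",
    "stack_preference",   # may be the explicit answer "recommend one"
    "execution_mode",
    "definition_of_done",
    "acceptance_tests",
)


def missing_minimum_inputs(answers):
    """Return the keys that are absent, empty, or invalid in `answers`."""
    problems = []
    for key in MINIMUM_INPUTS:
        value = answers.get(key)
        if isinstance(value, str):
            value = value.strip()
        invalid_mode = key == "execution_mode" and value not in ("autonomous", "gated")
        if not value or invalid_mode:
            problems.append(key)
    return problems
```

An empty return value means the build may start. Anything else is the list of questions to ask again.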
--- a/archive/inactive-skills/codex-conductor/references/research-playbook.md
+++ b/archive/inactive-skills/codex-conductor/references/research-playbook.md
@@ -0,0 +1,39 @@
+# Research Playbook
+
+Use during planning when `research_mode=true`.
+
+## Goals
+- Reduce architecture risk before implementation
+- Provide transparent option comparison
+- Tie decisions to requirements and constraints
+
+## Research Procedure
+1. Restate research questions from planning gaps.
+2. Define decision criteria (cost, complexity, speed, security, scale, lock-in).
+3. Generate 2-4 viable options per major decision:
+   - app architecture
+   - data layer
+   - deployment model
+   - auth model
+   - testing strategy
+4. For each option, record:
+   - fit for requirements
+   - trade-offs
+   - operational burden
+   - risk profile
+5. Recommend one option with confidence score (low/medium/high).
+6. Convert recommendation into ADR draft.
+
+## Output Template (`docs/research-notes.md`)
+- Questions
+- Decision Criteria
+- Options Compared
+- Recommendation
+- Risks and Mitigations
+- Follow-up Questions
+
+## Quality Rules
+- Prefer primary docs and well-established references.
+- Avoid single-source decisions for critical architecture choices.
+- Mark unknowns explicitly.
+- Do not present uncertain conclusions as facts.
--- a/archive/inactive-skills/codex-conductor/references/spec-driven-development.md
+++ b/archive/inactive-skills/codex-conductor/references/spec-driven-development.md
@@ -0,0 +1,185 @@
+# Spec-Driven Development (Non-Negotiable)
+
+This is the governing principle of the orchestrator: **no code without a spec**.
+
+## Core Rule
+
+The coding agent MUST NOT write implementation code until a written, approved spec exists for what it is about to build. This prevents:
+
+- Guessing at requirements
+- Making assumptions about behavior
+- Building features the user didn't ask for
+- Architectural drift from undocumented decisions
+
+## What Counts as a Spec
+
+A spec is a written document (in `docs/` or inline in a task file) that includes:
+
+1. **What** is being built (feature/component/fix)
+2. **Why** it's needed (user story, problem statement)
+3. **Acceptance criteria** (testable conditions for "done")
+4. **Constraints** (tech stack, performance, security, compatibility)
+5. **Out of scope** (what this does NOT do)
+
+Minimum viable spec for a single task:
+```markdown
+## Task: [Name]
+**Goal:** [One sentence]
+**Acceptance Criteria:**
+- [ ] Criterion 1
+- [ ] Criterion 2
+**Constraints:** [Any limits]
+**Out of Scope:** [What we're not doing]
+```
+
+## Spec Lifecycle
+
+### 1. Spec Creation (Before G2)
+- Orchestrator (or user) writes the spec
+- Spec is stored in `docs/specs/` or embedded in `docs/requirements.md`
+- For brownfield: existing behavior must be documented first
+
+### 2. Spec Approval (Before Implementation)
+- User reviews and approves (in gated mode)
+- Or orchestrator validates completeness (in autonomous mode)
+- Spec is marked APPROVED in `docs/specs/` or `status.json`
+
+### 3. Spec → Task Mapping (G3/G4)
+- Each task in `docs/g4-task-plan.md` MUST reference a spec section
+- Format: `Spec: requirements.md#feature-name` or `Spec: specs/auth.md`
+- Tasks without spec references are BLOCKED
+
+### 4. Implementation (Coding Agent)
+- Agent receives: spec + task description + context
+- Agent MUST NOT invent features not in spec
+- Agent MUST flag spec gaps and request clarification (not guess)
+
+### 5. Verification Against Spec
+- Orchestrator checks implementation against acceptance criteria
+- Deviation from spec = FAIL (not creative license)
+
+## Enforcement Points
+
+### Gate G1 (Planning Approved)
+- `docs/requirements.md` must exist with testable requirements
+- Acceptance criteria must be explicit, not vague
+
+### Gate G2 (Architecture Approved)
+- `docs/specs/` directory must exist with at least one spec file
+- Or `docs/requirements.md` must have spec-level detail for v1 features
+- ADRs must reference the specs their decisions relate to
+
+### Gate G3/G4 (Build)
+- Each task prompt MUST include:
+  - Spec reference
+  - Acceptance criteria from spec
+  - Explicit boundaries
+- `run_gate.py` blocks tasks without `--spec-ref` argument
+
+### Coding Agent Prompt Template
+All coding agent prompts MUST include this preamble:
+
+```text
+## SPEC-DRIVEN RULES
+1. You are implementing ONLY what is specified below.
+2. Do NOT add features, abstractions, or "improvements" not in spec.
+3. If the spec is unclear or incomplete, STOP and ask for clarification.
+4. Do NOT guess at requirements. Ever.
+5. Your output will be verified against the acceptance criteria below.
+
+## SPEC
+[Insert spec section here]
+
+## ACCEPTANCE CRITERIA
+[Insert criteria here]
+
+## TASK
+[Insert specific task]
+```
+
+## Red Flags (Auto-Fail)
+
+The following trigger automatic gate failure:
+
+- Task executed without spec reference
+- Coding agent added unrequested features
+- Acceptance criteria missing or vague ("should work well")
+- Implementation diverged from spec without change request
+- Assumptions documented as facts
+
+## Change Requests
+
+If requirements change mid-build:
+
+1. Run `change_impact.py` to assess impact
+2. Update spec documents
+3. Re-approve affected specs
+4. Update traceability matrix
+5. Only then resume implementation
+
+No "I'll just add this quickly" — all changes go through spec update.
+
+## Spec Templates
+
+### Feature Spec (`docs/specs/feature-name.md`)
+```markdown
+# Feature: [Name]
+
+## Overview
+[1-2 sentences]
+
+## User Story
+As a [user type], I want [goal] so that [benefit].
+
+## Acceptance Criteria
+- AC-1: Given [context], when [action], then [result]
+- AC-2: Given [context], when [action], then [result]
+
+## Allowed Scope Files
+- src/path/to/feature/**
+- tests/path/to/feature/**
+
+## Technical Constraints
+- [Stack/performance/security constraints]
+
+## Dependencies
+- [Other features, APIs, services]
+
+## Out of Scope
+- [What this feature explicitly does NOT do]
+
+## Open Questions
+- [Anything needing clarification before implementation]
+```
+
+### API Endpoint Spec
+```markdown
+# Endpoint: [Method] [Path]
+
+## Purpose
+[What this endpoint does]
+
+## Request
+- Method: [GET/POST/etc]
+- Path: [/api/v1/resource]
+- Auth: [Required/None/Scope]
+- Body: [Schema or example]
+
+## Response
+- Success: [Status + schema]
+- Errors: [Status codes + meanings]
+
+## Validation Rules
+- [Field validations]
+
+## Side Effects
+- [Database changes, events emitted, etc]
+```
+
+## Summary
+
+**Spec → Approve → Implement → Verify**
+
+No shortcuts. No guessing. No "I assumed you wanted..."
+
+The spec is the contract. Deviate = Fail.
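Note on `spec-driven-development.md`: the rule "Tasks without spec references are BLOCKED" can be checked with a short pattern match on the documented `Spec: <file>.md[#section]` format. This is a sketch of the idea only. It is not the `run_gate.py` mentioned above, whose implementation is not shown in this diff. The names `SPEC_REF` and `task_status` and the `READY` label are hypothetical.

```python
import re

# Matches "Spec: requirements.md#feature-name", "Spec: specs/auth.md",
# or "Spec: specs/auth.md#login" (the formats shown in the docs).
SPEC_REF = re.compile(r"Spec:\s*(?P<path>[\w./-]+\.md)(?:#(?P<anchor>[\w-]+))?")


def task_status(task_text):
    """Return ("READY", (path, anchor)) or ("BLOCKED", None) for a task."""
    match = SPEC_REF.search(task_text)
    if match is None:
        return ("BLOCKED", None)  # no spec reference: do not code
    return ("READY", (match.group("path"), match.group("anchor")))
```

The check only confirms that a reference is present and well-formed. Whether the referenced spec exists and is marked APPROVED would need a separate lookup.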
--- a/archive/inactive-skills/codex-conductor/references/testing-matrix.md
+++ b/archive/inactive-skills/codex-conductor/references/testing-matrix.md
@@ -0,0 +1,71 @@
+# Testing Matrix (Gate-Based)
+
+Apply this matrix on every project. Expand when domain-specific risks appear.
+
+## Gate G1 (Planning)
+- Validate requirements clarity
+- Validate acceptance criteria are testable
+- Validate risks and assumptions listed
+
+## Gate G2 (Architecture)
+- Validate architecture supports all must-have journeys
+- Validate threat model baseline exists
+- Validate ADR exists with alternatives and trade-offs
+
+## Gate G3 (Slice-1 Build)
+- Unit tests for first slice pass
+- Integration test for key flow passes
+- Manual smoke test of one critical journey passes
+- Docs updated for slice
+
+## Gate G4 (Full Build)
+- Lint/type/build clean
+- Unit/integration suite pass
+- E2E critical paths pass
+- API contract checks pass (if relevant)
+- Data migration checks pass (if relevant)
+
+## Gate G5 (Security & Quality)
+- Secret scanning baseline recorded
+- Dependency vulnerability scan baseline recorded
+- AuthN/AuthZ checks pass
+- Input validation checks pass
+- Error handling/logging checks pass
+- Performance smoke checks pass
+
+## Gate G6 (Release Candidate)
+- Release checklist complete
+- Rollback steps tested or validated
+- Monitoring/alerts configured
+- Versioned docs complete
+
+## Gate G7 (Production/Handover)
+- Post-deploy smoke tests pass
+- Incident runbook available
+- Handover notes complete
+- Open risks tracked with owners
+
+## Manual Testing Requirements
+
+For Web Projects:
+- Login flow (if auth exists)
+- Core create/read/update/delete journey
+- Payment happy path + failure path (if payments exist)
+- Error page and recovery behavior
+
+For CLI Projects:
+- Core command success path
+- Invalid input handling
+- Config loading behavior
+- Output format consistency
+
+## Evidence Format
+
+For every gate, record in `docs/test-results.md`:
+- test name
+- command or steps
+- expected result
+- actual result
+- pass/fail
+- evidence link/snippet
+- timestamp
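Note on `testing-matrix.md`: the Evidence Format above maps naturally onto a small record type that renders one entry for `docs/test-results.md`. The sketch below is one possible layout, not a format defined by this commit. The class name `EvidenceEntry`, its attribute names, and the Markdown shape are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _utc_now():
    # ISO 8601 UTC timestamp, e.g. 2026-03-04T13:29:22+00:00
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass
class EvidenceEntry:
    """One test record with the seven fields listed under Evidence Format."""
    test_name: str
    command_or_steps: str
    expected: str
    actual: str
    passed: bool
    evidence: str
    timestamp: str = field(default_factory=_utc_now)

    def to_markdown(self):
        # Hypothetical layout: a heading plus one bullet per remaining field.
        return "\n".join([
            f"### {self.test_name}",
            f"- command or steps: {self.command_or_steps}",
            f"- expected result: {self.expected}",
            f"- actual result: {self.actual}",
            f"- pass/fail: {'PASS' if self.passed else 'FAIL'}",
            f"- evidence: {self.evidence}",
            f"- timestamp: {self.timestamp}",
        ])
```

Appending `entry.to_markdown()` to `docs/test-results.md` after each check keeps the field order identical across gates, which makes later review simpler.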