---
title: "feat: Unclear Feedback Handling — Conditional Enricher, Clarity Rubric, Button Clarification Loop"
type: feat
status: active
date: 2026-04-04
origin: docs/brainstorms/2026-04-04-unclear-feedback-handling-requirements.md
---

# Unclear Feedback Handling — Conditional Enricher, Clarity Rubric, Button Clarification Loop

## Overview

Today, when the comparator returns `unclear`, the pipeline marks the queue entry complete, posts a ❓ thread ack, and drops the message. The enricher (`apps/feedback-pipeline/src/pipeline/enricher.ts`) is never called on this path. Per-message raw-feedback filenames also fragment thread provenance.

This plan wires a **conditional enricher** (runs when comparator confidence <0.9) into the pipeline, introduces a **Clarity Rubric** (3-of-4 attributes + confidence ≥0.7 gate), creates a durable **needs-review** store keyed by Slack thread root, adds **Approve/Reject buttons with note modals** on every bot-posted message, and runs both **real-time** (thread-reply) and **scheduled** (30-min) clarification loops. Supporting infra (checkpointing, multi-insight splitting, Review Queue, eval harness) is structured into later phases.

Ships behind three feature flags (R29) enabled in sequence: `ENABLE_ENRICHMENT_ON_UNCLEAR` → `ENABLE_CLARIFICATION_POST` → `ENABLE_AUTO_PROMOTION`.
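
A minimal sketch of how the staged flags could be evaluated, assuming a flag is only honored when every flag before it in the sequence is also on. That cascade rule, the `isFlagEnabled` helper, and the `Env` type are illustrative assumptions, not existing pipeline code; only the three flag names come from this plan.

```typescript
// Flag names come from the plan; the helper and the cascade rule are assumptions.
const FLAG_SEQUENCE = [
  "ENABLE_ENRICHMENT_ON_UNCLEAR",
  "ENABLE_CLARIFICATION_POST",
  "ENABLE_AUTO_PROMOTION",
] as const;

type FlagName = (typeof FLAG_SEQUENCE)[number];
type Env = Record<string, string | undefined>;

// A flag is on only when it and every earlier flag in the sequence equal "true".
// Missing or non-"true" values degrade to the original silent-drop behavior.
function isFlagEnabled(flag: FlagName, env: Env): boolean {
  const position = FLAG_SEQUENCE.indexOf(flag);
  return FLAG_SEQUENCE.slice(0, position + 1).every((name) => env[name] === "true");
}
```

Passing the environment in as a parameter, rather than reading `process.env` inside the helper, keeps the check easy to unit-test.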

## Problem Frame

Two problems stem from the current unclear path (see origin: `docs/brainstorms/2026-04-04-unclear-feedback-handling-requirements.md`):

1. **Data loss** — real product feedback is silently discarded because it lacked context at receipt time.
2. **No resolution path** — humans can't clarify, and the pipeline never retries.

A related issue: raw-feedback is keyed by message `ts`, so follow-up messages in the same Slack thread create separate files and fragment provenance.

## Requirements Trace

This plan covers requirements R1–R16, R18–R24, R27, R29–R31, R43–R45, R49–R51, R54, R55 (core unclear-handling loop). Supporting infra requirements R25–R26 (multi-insight splitting), R28 (Linear cache), R32–R42 (Review Queue + eval harness), R46–R48, R52–R53 (checkpointing, outbox, DLQ, two-phase search) are scoped into later phases — see Phased Delivery.

**Per-thread raw-feedback storage:** R1, R2, R3
**Needs-review store:** R4, R5, R6
**Conditional enricher + rubric:** R7, R8, R9, R18, R19, R20
**Approve/Reject buttons (4-variant):** R21, R22, R23, R24, R54, R55
**Wrong-enrichment archive:** R43, R44, R45
**Clarification loop:** R10, R11, R12, R15, R16
**Promotion:** R13, R14
**Customer dimension:** R27
**Rollout & observability:** R29, R30, R31
**Tooling discipline:** R49 (enricher toolshed), R50 (qmd collection layout), R51 (directory-scoped rules)

## Scope Boundaries

**In scope:**
- Core enrichment → rubric → clarification → promotion loop.
- Per-thread raw-feedback + needs-review storage.
- Block Kit Approve/Reject buttons with note modal, 4-variant handling.
- Feature-flagged staged rollout.
- Minimal good/bad example storage (file-level, no dashboard — the storage format + collection path are landed so the Review Queue can consume them later).
- Wrong-enrichment history archived in the needs-review file.

**Out of scope — deferred to follow-on plans:**
- Multi-insight splitting (R25, R26) — requires comparator signature refactor and insight-level re-comparison; split into its own plan.
- Review Queue dashboard in feedback-ui (R32, R33, R35, R36, R37, R38) — separate feedback-ui plan.
- AI eval harness (R39–R42) — separate plan, consumes the tuning-examples storage this plan creates.
- Automated prompt tuning (R42) — Phase 2 per origin doc.
- Linear cache sync job (R28) — separate plan; this plan launches enricher with a toolshed that omits Linear until the cache lands.
- Customer profile backfill — parallel feature; this plan only reads `knowledges/customers/` if present.
- Checkpointing table (R46, R47), transactional outbox (R52), dead-letter table (R53), two-phase vault search (R48) — separate durability plan.
- Batch bootstrap (`apps/feedback-pipeline/src/batch/bootstrap.ts`) — unchanged.
- `@mention` interactive handler (`processMessageInteractive`) — unchanged for this plan.
- No backfill of existing per-message raw-feedback files (leave-as-is decision).
- No changes to `customer-insights/` insight schema or category index format.

## Context & Research

### Relevant Code and Patterns

- `apps/feedback-pipeline/src/app.ts` — Slack Bolt App (Socket Mode). `app.message` handler (lines 288–369) receives all channel messages including thread replies (Slack fires `message` events with `thread_ts`). `processQueue()` (lines 373–429) drops on `unclear` today. `setInterval` queue-polling pattern at line 580 (template for scheduler).
- `apps/feedback-pipeline/src/pipeline/enricher.ts` — `enrich()` returns `{ resolved, enrichedContext?, escalationMessage?, sourcesFetched }`. Already accepts a tools interface with `readSlackThread` + optional `searchLinear`. Needs extension for adjacent-messages, vault-search, customer-profile reads (R49 toolshed).
- `apps/feedback-pipeline/src/pipeline/comparator.ts` — returns a single `ComparisonResult` with `confidence`. R7 gates enrichment on `confidence < 0.9`. Multi-insight refactor (array return) is explicitly deferred (R25 follow-on plan).
- `apps/feedback-pipeline/src/pipeline/ingestion.ts` — `ingest()` writes per-message raw-feedback. Needs thread-root keying and `messages[]` accumulation.
- `apps/feedback-pipeline/src/knowledge/format.ts` — gray-matter serialize/parse. Extend schemas with `messages` + `threadRootTs`.
- `apps/feedback-pipeline/src/knowledge/knowledge-store.ts` — `writeRawFeedback` overwrites today (line 90). Needs append-or-update.
- `apps/feedback-pipeline/src/knowledge/types.ts` — zod schemas for `RawFeedback`, `InsightEntry`, `ComparisonResult`, `EnrichmentResult`, `PipelineEvent`. Add `NeedsReviewSchema`, extend `RawFeedbackSchema`, add new `PipelineEventType`s.
- `apps/feedback-pipeline/src/lib/slack-service.ts` — `getThreadReplies()`, `postMessage()`, `getPermalink()`. Need to add Block Kit posting + modal open helpers.
- `apps/feedback-pipeline/src/lib/escalation.ts` — existing `postEscalation` + `MAX_ENRICHMENT_ATTEMPTS` config. Retry cap drops from 5 → 3 per origin R15.
- `apps/feedback-pipeline/src/lib/state.ts` — `PipelineState` (better-sqlite3). No schema change required for this plan; needs-review state lives in markdown.
- `apps/feedback-pipeline/src/lib/event-journal.ts` — already supports `feedback.escalated`, `feedback.enrichment_resolved`. Extend `PipelineEventType` for new events (R30).
- `apps/feedback-ui/lib/search-store.ts:246` — qmd indexes `raw-feedback` with `**/*.md` (confirmed). Needs-review must NOT live under `raw-feedback/` or it pollutes search. Use top-level `knowledges/needs-review/` (qmd collections R50).
- `apps/feedback-pipeline/src/lib/agent-backend.ts` — `createStageBackend(stage)` factory. Add `'enrich'` and `'rubric'` stages.

### Institutional Learnings

- `docs/solutions/best-practices/qmd-sdk-hybrid-search-integration-2026-04-03.md` — qmd singleton + collection configuration. Confirms R50 collection layout is feasible.
- `docs/solutions/best-practices/ai-eval-promptfoo-feedback-pipeline-2026-04-01.md` — eval patterns; `feedback.enrichment_resolved` vs `feedback.escalated` already fire, so resolution rate is measurable from day one (R31).
- `.claude/rules/nats.md` — NATS publish on every vault file write; `knowledges/needs-review/` must be synced.
- `.claude/rules/testing.md` — write tests alongside code; run full suite before commit.

### External References

Brainstorm research-synthesis section references Sierra Agent Harness, Letta Four-Layer Memory, Stripe Minions Part 1/2, Slope Durable Workflows, Mintlify ChromaFs, HAL, Cognition Devin, Loop Temporal+Outbox. The patterns carried forward into this plan's in-scope phases:
- **Stripe Minions — bounded iteration**: validates R15 3-retry cap.
- **Stripe Minions Part 2 — Blueprints**: our pipeline IS a blueprint; staged flags match shift-left philosophy.
- **Stripe Minions Part 2 — directory-scoped rules**: R51 `_rules.md` auto-loading.
- **Stripe Minions Part 2 — Toolshed**: R49 enricher tool allowlist.
- **Letta Core/Recall memory**: needs-review `history[]` archive (R43) = recall memory.
- **HAL graduated-access rollout**: maps to staged feature flags (R29).

Patterns from research that inform later phases (deferred): Slope checkpointing (R46/R47), Loop outbox/DLQ (R52/R53), Mintlify two-phase search (R48), Slope three-stage feedback pipeline (eval harness R39–R41).

## Key Technical Decisions

- **Needs-review at top-level `knowledges/needs-review/`**: qmd indexes `raw-feedback/**/*.md`, so a subfolder of `raw-feedback/` would pollute search. This also aligns with R50's three-collection layout (customer-insights, raw-feedback, tuning-examples): needs-review sits outside all three collections and is therefore excluded from search.
- **Per-thread keyed files**: filename pattern `<iso-date-of-thread-root>-<slug>.md`. `threadRootTs = message.thread_ts ?? message.ts`. Same scheme in raw-feedback and needs-review so promotion can deterministically find the target filename.
- **Frontmatter `messages[]` + body-is-first-message**: every message appended to frontmatter `messages: [{ts, user, text}]`. The first message remains in the markdown body so qmd indexes the most natural searchable text. Idempotent on `ts`.
- **Append-on-thread-match semantics**: `writeRawFeedback` reads existing file (if present), appends new message to `messages[]`, updates `updated`, atomic-writes. Preserves provenance.
- **Needs-review state is markdown, not SQLite**: frontmatter = source of truth (status, retryCount, lastEnrichmentAt, escalation history, `history[]` archive of superseded enrichments). Editable by humans, diffable in git, no schema migration. Scheduler scans the folder. A follow-on plan may add a SQLite index if the folder grows past ~5k entries.
- **Conditional enricher gate at `confidence < 0.9`**: enricher runs whenever comparator confidence is below 0.9, covering BOTH `unclear` returns and low-confidence `new`/`merge` returns. High-confidence cases bypass enrichment entirely (cost saving per R7).
- **Clarity Rubric as YAML config**: `apps/feedback-pipeline/config/clarity-rubric.yaml` defines the 4 attributes + thresholds. Rubric evaluator is a discrete pipeline step (R9) that scores present/absent on each attribute using a dedicated LLM call with a fixed prompt. Config-driven so curators can tune without code changes.
- **Retry cap = 3** (down from 5 in v1 brainstorm, per R15 updated decision). After 3, `status: needs-human-review`; pipeline keeps **absorbing** subsequent thread messages (R16) but only posts minimal "📥 received" ack — no more clarifying questions.
- **Buttons on every bot message, 4-variant** (R21–R24, R54): Approve / Reject buttons with optional note via modal. Button `value` encodes `{threadRootTs, stage, outputId, action}` JSON. Modal submission drives 4-variant routing. Approve advances workflow; Approve-with-note runs ONE extra enrichment refinement using the note; Reject increments retry with generic context; Reject-with-note uses the note as strong re-enrichment context AND as eval-harness ground truth.
- **Wrong-enrichment archive in needs-review frontmatter `history[]`**: every superseded enrichment gets archived (not deleted), capturing `{ts, sources, summary, rejected, reason}`. Never overwritten. Builds the eval harness dataset (R44).
- **Good/bad examples write to `knowledges/tuning/examples/<stage>/<label>/<id>.md`** (R34): minimal storage in this plan — the Review Queue consumes it later. Written directly by the button-callback handler when an `approve`/`reject` includes stage context.
- **Enricher toolshed as an allowlist config**: declared tools: `readSlackThread`, `readAdjacentMessages`, `searchVaultInsights`, `readCustomerProfile`. `searchLinearCache` + `fetchLinearLive` are registered but no-op until the Linear cache feature lands (R28 deferred). Toolshed defined in `apps/feedback-pipeline/config/enricher-tools.yaml`.
- **Scheduled retry via `setInterval`**: matches existing `pollTimer` pattern; no separate cron service. 10-min interval, 30-min staleness filter, 24h re-post guard.
- **Thread-reply auto-trigger reuses existing `app.message` handler**: branch on `thread_ts` + `findByThreadRootTs` lookup; zero new Slack event subscriptions.
- **In-memory lock Set keyed by `threadRootTs`**: prevents concurrent re-enrichment from scheduled + thread-reply paths on the same needs-review file.
- **Directory-scoped `_rules.md` (R51)**: enricher + rubric evaluator check for `knowledges/customer-insights/<category>/_rules.md` when they have a working category hypothesis, and prepend the file contents to the system prompt. Empty dirs use default rules.
- **Customer dimension best-effort (R27)**: enricher looks up Slack user in `knowledges/customers/` (if present) and extracts `@company` mentions from thread; writes `customerName` or `null`. Never blocks promotion.
- **Feature flags gate delivery phases** (R29): flags are env vars, checked at handler entry points. Missing/false → original silent-drop behavior (safe degrade).
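
The in-memory lock from the decisions above can be sketched as follows. The `withThreadLock` name and its skip-instead-of-wait behavior are assumptions for illustration; the plan only specifies a `Set` keyed by `threadRootTs`.

```typescript
// In-memory lock keyed by threadRootTs. Single-process only, as the plan notes.
const inFlightThreads = new Set<string>();

type LockResult<T> = { ran: true; value: T } | { ran: false };

// Runs `work` unless the same thread is already being re-enriched.
// Assumption: a contended caller skips rather than waits.
async function withThreadLock<T>(
  threadRootTs: string,
  work: () => Promise<T>,
): Promise<LockResult<T>> {
  if (inFlightThreads.has(threadRootTs)) return { ran: false };
  inFlightThreads.add(threadRootTs);
  try {
    return { ran: true, value: await work() };
  } finally {
    inFlightThreads.delete(threadRootTs); // always release, even when work throws
  }
}
```

Both the scheduled path and the thread-reply path would wrap their re-enrichment call in the same helper, so whichever arrives second becomes a no-op for that thread.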

## Open Questions

### Resolved During Planning

- **needs-review location** → top-level `knowledges/needs-review/`. (qmd indexes `raw-feedback/**/*.md` at `apps/feedback-ui/lib/search-store.ts:246`.)
- **thread-reply subscription** → existing `app.message` handler already receives them; branch on `thread_ts` + needs-review lookup. No new event type.
- **scheduled retry** → `setInterval` in `start()`, following `pollTimer` pattern.
- **existing raw-feedback migration** → no backfill; new scheme applies to net-new writes only.
- **button callback routing** → encode `{threadRootTs, stage, outputId, action}` JSON in Block Kit button `value`, decode in action handler, correlate to needs-review file via `threadRootTs` + `findByThreadRootTs`.
- **Linear cache readiness** → enricher launches with a smaller toolshed; `searchLinearCache` is registered but no-ops until the cache feature ships.
- **rubric attribute storage** → YAML config file at `apps/feedback-pipeline/config/clarity-rubric.yaml`, hot-reloadable.
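
A minimal sketch of the selection logic a scheduled-retry tick might apply. How the 10-minute interval, 30-minute staleness filter, and 24h re-post guard interact is an assumption here, as are the `SchedulableEntry` shape and all function names; field names follow the directional needs-review frontmatter.

```typescript
type ReviewStatus = "pending-clarification" | "re-enriching" | "needs-human-review";

interface SchedulableEntry {
  threadRootTs: string;
  status: ReviewStatus;
  lastEnrichmentAt: string; // ISO 8601
  lastEscalationPostedAt?: string; // ISO 8601
}

const TICK_INTERVAL_MS = 10 * 60 * 1000; // how often the scheduler wakes up
const STALE_AFTER_MS = 30 * 60 * 1000; // entry must be idle this long before a retry
const REPOST_GUARD_MS = 24 * 60 * 60 * 1000; // minimum gap between clarifying posts

// Assumption: only pending entries are retried; capped or in-flight ones are skipped.
function isDueForRetry(entry: SchedulableEntry, nowMs: number): boolean {
  if (entry.status !== "pending-clarification") return false;
  return nowMs - Date.parse(entry.lastEnrichmentAt) >= STALE_AFTER_MS;
}

// Assumption: the guard limits how often the bot re-asks in the same thread.
function mayRepostQuestion(entry: SchedulableEntry, nowMs: number): boolean {
  if (entry.lastEscalationPostedAt === undefined) return true;
  return nowMs - Date.parse(entry.lastEscalationPostedAt) >= REPOST_GUARD_MS;
}

// One scheduler tick: pick the entries worth re-enriching right now.
function selectDueEntries(entries: SchedulableEntry[], nowMs: number): SchedulableEntry[] {
  return entries.filter((entry) => isDueForRetry(entry, nowMs));
}

// Wiring (not executed here; runTick is hypothetical):
// setInterval(() => runTick(Date.now()), TICK_INTERVAL_MS)
```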

### Deferred to Implementation

- **Exact prompt for rubric evaluator**: draft in Unit 4, tune against a small fixture set at implementation time.
- **Adjacent-messages window precise bounds (R8)**: 24h window + same channel is the starting point; tune during rollout. Exact query semantics (before-only vs before-and-after) resolved when wiring `readAdjacentMessages` to Slack history.
- **Vault-search integration surface**: this plan's enricher calls qmd directly from the pipeline process (no HTTP hop to feedback-ui). The exact import (same `@tobilu/qmd` singleton vs new instance) resolved during implementation.
- **Modal Block Kit layout details**: 1 text-input field + submit button is enough; exact label copy decided during implementation.
- **Thread-level concurrency lock granularity**: in-memory `Set<threadRootTs>`. If Railway runs >1 instance, revisit with a distributed lock in a later durability plan.
- **`outputId` generation scheme for button values**: deterministic hash of `{threadRootTs, stage, postedAt}` so re-posts of the same stage don't collide.
- **Rubric threshold initial values**: 3/4 attributes + 0.7 confidence per origin R19; tune during Phase 1 rollout based on observed resolution rate.
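
One possible shape for the deterministic `outputId`, sketched under assumptions: the plan does not name a hash function, so a dependency-free FNV-1a-style hash stands in here, and the 8-hex-character width simply mirrors the `hash8` placeholder used in the button value example. `makeOutputId` and `fnv1a32` are hypothetical names.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units (inputs here are ASCII).
// Stand-in only: the plan does not prescribe a hash function.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Same {threadRootTs, stage, postedAt} always yields the same id;
// a re-post of the same stage has a different postedAt, so it gets a new id.
function makeOutputId(threadRootTs: string, stage: string, postedAt: string): string {
  const hex = fnv1a32(`${threadRootTs}|${stage}|${postedAt}`).toString(16);
  return ("00000000" + hex).slice(-8);
}
```

A 32-bit id is short enough for a button value but can collide across a large corpus; if ids ever need to be globally unique, a longer digest would be the safer choice.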

## High-Level Technical Design

> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*

**End-to-end flow:**

```mermaid
flowchart TB
    A[Slack message] --> A1{thread_ts AND<br/>needs-review file<br/>exists?}
    A1 -- yes --> RE[trigger re-enrich]
    A1 -- no --> CLS[classifier]
    CLS -- not feedback --> SKIP[skip]
    CLS -- feedback --> CMP[comparator]
    CMP --> G1{confidence >= 0.9?}
    G1 -- yes --> ING[ingest:<br/>write per-thread<br/>raw-feedback + insight]
    G1 -- no --> ENR[enricher<br/>toolshed: thread,<br/>adjacent, vault,<br/>customer]
    ENR --> RUB[clarity rubric<br/>4 attrs from config]
    RUB --> G2{>=3/4 attrs AND<br/>confidence >=0.7?}
    G2 -- yes --> CMP2[retry comparator<br/>w/ enriched context]
    CMP2 --> G3{confidence >= 0.9?}
    G3 -- yes --> ING
    G3 -- no --> NR[write needs-review +<br/>post summary +<br/>Approve/Reject btns]
    G2 -- no --> NR
    RE --> ENR
    NR --> WAIT{wait for:<br/>button click<br/>thread reply<br/>30-min timer}
    WAIT -- approve --> PROMOTE[promote:<br/>write raw-feedback +<br/>delete needs-review +<br/>ingest]
    WAIT -- approve-with-note --> ENR2[extra enrichment<br/>pass with note] --> PROMOTE
    WAIT -- reject or reply --> CAP{retryCount<br/>< 3?}
    CAP -- yes --> ENR
    CAP -- no --> HR[status:<br/>needs-human-review<br/>absorb further msgs]
    ING --> BTN[post ingest ack<br/>with buttons]
    PROMOTE --> BTN
```

**Needs-review frontmatter shape (directional):**

```yaml
threadRootTs: "1743800000.000100"
channel: "#i-user-feedback (C08...)"
channelId: "C08..."
threadPermalink: "https://..."
status: pending-clarification  # pending-clarification | re-enriching | needs-human-review
retryCount: 1
customerName: null
lastEnrichmentAt: "2026-04-04T17:05:00Z"
lastEscalationPostedAt: "2026-04-04T17:05:02Z"
escalationMessageTs: "1743800120.000800"
currentSummary: "User reports CSV export timing out for large reports"
escalationMessage: "Is this about the reports page CSV export specifically?"
rubricScore:
  featureSurface: true
  painPoint: true
  userRole: false
  desiredOutcome: true
  attributesMet: 3
  comparatorConfidence: 0.6
  passed: false  # confidence below 0.7 gate
messages:
  - ts: "1743800000.000100"
    user: "Jane (U123)"
    text: "export is broken"
  - ts: "1743800300.000200"
    user: "Jane (U123)"
    text: "the one on the reports page"
history:
  - ts: "2026-04-04T17:00:00Z"
    sources: [slack-thread]
    summary: "User reports an export is broken"
    superseded: true
    reason: "Insufficient detail — rubric 2/4 attributes"
```

**Button value encoding (R55):**

```json
{"threadRootTs":"1743800000.000100","stage":"enrichment","outputId":"hash8","action":"approve"}
```

Action handler decodes, opens modal (optional note), modal submission routes to 4-variant handler.
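
A minimal sketch of the encode/decode step and the 4-variant classification, assuming the payload fields shown above. `encodeButtonValue`, `decodeButtonValue`, and `classifyVariant` are hypothetical helper names, and treating a whitespace-only note as "no note" is an assumption.

```typescript
type ButtonAction = "approve" | "reject";
type Variant = "approve" | "approve-with-note" | "reject" | "reject-with-note";

interface ButtonValue {
  threadRootTs: string;
  stage: string;
  outputId: string;
  action: ButtonAction;
}

function encodeButtonValue(value: ButtonValue): string {
  return JSON.stringify(value);
}

// Returns null instead of throwing so a malformed payload cannot crash the handler.
function decodeButtonValue(raw: string): ButtonValue | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;
  const { threadRootTs, stage, outputId, action } = parsed as Record<string, unknown>;
  if (typeof threadRootTs !== "string" || typeof stage !== "string" || typeof outputId !== "string") {
    return null;
  }
  if (action !== "approve" && action !== "reject") return null;
  return { threadRootTs, stage, outputId, action };
}

// Assumption: a blank or whitespace-only modal note counts as no note.
function classifyVariant(action: ButtonAction, note?: string): Variant {
  const hasNote = note !== undefined && note.trim().length > 0;
  if (action === "approve") return hasNote ? "approve-with-note" : "approve";
  return hasNote ? "reject-with-note" : "reject";
}
```

Slack caps a button `value` at 2,000 characters, so keeping the payload to these four short fields leaves plenty of headroom.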

**qmd collection layout (R50):**

```text
knowledges/
  customer-insights/     # collection: customer-insights (user-facing search)
    <category>/
      _rules.md          # R51 directory-scoped rules (optional)
      <insight>.md
  raw-feedback/          # collection: raw-feedback (enricher-only, NOT user-facing)
    <thread-ts>-<slug>.md
  needs-review/          # NEW, NOT in any qmd collection (excluded from search)
    <thread-ts>-<slug>.md
  tuning/
    examples/            # collection: tuning-examples (eval-harness-only)
      classifier/
        good/  bad/
      comparator/
        good/  bad/
      enricher/
        good/  bad/
  customers/             # read-only by enricher (separate feature)
```

## Implementation Units

### Unit Dependency Graph

```mermaid
flowchart TB
    subgraph P1[Phase 1: ENABLE_ENRICHMENT_ON_UNCLEAR - silent]
        U1[U1: per-thread<br/>format + helpers]
        U2[U2: writeRawFeedback<br/>append-or-update]
        U3[U3: needs-review<br/>store module]
        U4[U4: clarity rubric<br/>config + evaluator]
        U5[U5: enricher toolshed<br/>+ adjacent/vault/customer tools]
        U6[U6: unclear-handler<br/>enrich->rubric->retry-compare]
        U7[U7: qmd collection<br/>reconfig + _rules loader]
    end
    subgraph P2[Phase 2: ENABLE_CLARIFICATION_POST - buttons]
        U8[U8: Block Kit<br/>post-with-buttons helper]
        U9[U9: button action<br/>+ modal handler]
        U10[U10: 4-variant<br/>button router]
        U11[U11: good/bad example<br/>writer]
    end
    subgraph P3[Phase 3: ENABLE_AUTO_PROMOTION - loop]
        U12[U12: thread-reply<br/>auto trigger]
        U13[U13: scheduled<br/>30-min retry]
        U14[U14: promotion<br/>write+delete+ingest]
        U15[U15: needs-human-review<br/>absorb mode]
    end
    U1 --> U2
    U1 --> U3
    U3 --> U6
    U2 --> U6
    U4 --> U6
    U5 --> U6
    U7 --> U5
    U6 --> U8
    U8 --> U9
    U9 --> U10
    U10 --> U11
    U10 --> U12
    U12 --> U14
    U13 --> U14
    U14 --> U15
```

---

### Phase 1 — `ENABLE_ENRICHMENT_ON_UNCLEAR` (silent needs-review, no Slack post)

- [ ] **Unit 1: Per-thread filename + `messages` frontmatter**

**Goal:** Key raw-feedback files by Slack thread root `ts`; add `messages[]` and `threadRootTs` to frontmatter schema.

**Requirements:** R1, R3

**Dependencies:** none

**Files:**
- Modify: `apps/feedback-pipeline/src/knowledge/format.ts`
- Modify: `apps/feedback-pipeline/src/knowledge/types.ts`
- Test: `apps/feedback-pipeline/tests/knowledge/format.test.ts`

**Approach:**
- Add `generateThreadKeyedFilename(threadRootTs, firstText)`.
- Extend `RawFeedbackSchema` with `threadRootTs: z.string().optional()` and `messages: z.array(z.object({ ts, user, text })).default([])`.
- `serializeRawFeedback` emits both; `parseRawFeedback` back-compat reads old files (missing fields → defaults).
- Body = first-message text so qmd indexes natural content.
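
A minimal sketch of the filename helper, assuming the ISO date is taken in UTC from the seconds part of the Slack `ts`. The slug rules (lowercase, hyphen-separated, 40-character cap, `untitled` fallback) and the `threadRootTsOf` helper are illustrative assumptions, not settled behavior.

```typescript
// threadRootTs = thread_ts when the message is a reply, else its own ts.
function threadRootTsOf(message: { ts: string; thread_ts?: string }): string {
  return message.thread_ts ?? message.ts;
}

// Builds `<iso-date-of-thread-root>-<slug>.md`.
// Assumptions: UTC date; slug is lowercase alphanumerics joined by hyphens.
function generateThreadKeyedFilename(threadRootTs: string, firstText: string): string {
  const seconds = Number(threadRootTs.split(".")[0]);
  const isoDate = new Date(seconds * 1000).toISOString().slice(0, 10);
  const slug =
    firstText
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .slice(0, 40)
      .replace(/^-+|-+$/g, "") || "untitled";
  return `${isoDate}-${slug}.md`;
}
```

With these assumptions, `generateThreadKeyedFilename("1743800000.000100", "export broken")` yields `2025-04-04-export-broken.md`. Two threads that start on the same day with the same opening text would map to the same name, so a real implementation may need an extra disambiguator.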

**Execution note:** test-first — failing format tests for new fields + filename.

**Patterns to follow:** existing gray-matter serialization in `format.ts`; zod schemas in `types.ts`.

**Test scenarios:**
- Happy path: `generateThreadKeyedFilename("1743800000.000100","export broken")` uses thread ts's ISO date as prefix.
- Happy path: serialize→parse round-trip preserves `messages` array of length 2 and `threadRootTs`.
- Edge case: parsing a legacy file without `messages`/`threadRootTs` yields `messages: []`, `threadRootTs: undefined`, no throw.
- Edge case: body = `messages[0].text` after round-trip; single-message write → `messages.length === 1`.

**Verification:** `format.test.ts` passes; no regressions in existing tests.

---

- [ ] **Unit 2: `writeRawFeedback` append-or-update semantics**

**Goal:** On existing per-thread file, read → append new message (dedupe on `ts`) → bump `updated` → atomic-write. Otherwise create fresh.

**Requirements:** R2

**Dependencies:** U1

**Files:**
- Modify: `apps/feedback-pipeline/src/knowledge/knowledge-store.ts`
- Modify: `apps/feedback-pipeline/src/pipeline/ingestion.ts` (call-site: pass `threadRootTs` + one `messages[]` entry)
- Test: `apps/feedback-pipeline/tests/knowledge/knowledge-store.test.ts`

**Approach:**
- `writeRawFeedback({ threadRootTs, messages, ...fields })`. Compute filename from `threadRootTs`. If file exists → read, append to `messages[]` skipping duplicates by `ts`, update `updated`, write atomically. Else → write fresh.
- `ingestion.ts` computes `threadRootTs = message.threadTs ?? message.ts` and builds `messages[0] = { ts, user, text }`.
- NATS publish fires on every write.
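
The append-or-update step can be sketched as a pure merge, leaving file I/O, `atomicWrite`, and NATS publishing to the caller. The `ThreadFile` shape and `mergeThreadMessage` name are hypothetical, and returning the existing record untouched on a duplicate `ts` (so `updated` is not bumped) is an assumption.

```typescript
interface ThreadMessage {
  ts: string;
  user: string;
  text: string;
}

interface ThreadFile {
  threadRootTs: string;
  created: string; // ISO 8601
  updated: string; // ISO 8601
  messages: ThreadMessage[];
}

// existing === null means no per-thread file was found on disk.
function mergeThreadMessage(
  existing: ThreadFile | null,
  threadRootTs: string,
  incoming: ThreadMessage,
  nowIso: string,
): ThreadFile {
  if (existing === null) {
    return { threadRootTs, created: nowIso, updated: nowIso, messages: [incoming] };
  }
  // Idempotent on ts: a replayed message changes nothing (assumption: no `updated` bump).
  const alreadyStored = existing.messages.some((m) => m.ts === incoming.ts);
  if (alreadyStored) return existing;
  return { ...existing, updated: nowIso, messages: [...existing.messages, incoming] };
}
```

Keeping the merge free of side effects means the dedupe and timestamp rules can be tested without a temp directory.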

**Patterns to follow:** `atomicWrite` + `publishFile` guarded by `isNatsSyncEnabled()`.

**Test scenarios:**
- Happy path: first call creates file with 1 message; second call same thread appends → 2 messages.
- Happy path: `updated > created` after second write.
- Edge case: duplicate `ts` write is idempotent (no dupes).
- Integration: NATS publish fires with full file content on each write.
- Error path: atomic-write temp-file failure leaves original file intact.

**Verification:** `knowledge-store.test.ts` passes; round-trip via `parseRawFeedback` returns accumulated `messages`.

---

- [ ] **Unit 3: Needs-review store module**

**Goal:** CRUD module for `knowledges/needs-review/<thread-root-ts>-<slug>.md` with status lifecycle, retry counting, and `history[]` archive.

**Requirements:** R4, R5, R6, R15, R43

**Dependencies:** U1

**Files:**
- Create: `apps/feedback-pipeline/src/knowledge/needs-review-store.ts`
- Modify: `apps/feedback-pipeline/src/knowledge/types.ts` (add `NeedsReviewSchema`, `NeedsReviewHistoryEntrySchema`)
- Create: `apps/feedback-pipeline/tests/knowledge/needs-review-store.test.ts`

**Approach:**
- `NeedsReviewEntry`: `{ threadRootTs, channel, channelId, threadPermalink, status, retryCount, customerName?, lastEnrichmentAt, lastEscalationPostedAt?, escalationMessageTs?, currentSummary, escalationMessage?, rubricScore?, messages[], history[] }`.
- Functions: `writeNeedsReview`, `readNeedsReview`, `updateNeedsReview(threadRootTs, patch)`, `archiveEnrichmentToHistory(threadRootTs, entry)`, `deleteNeedsReview`, `listNeedsReview({ skipHumanReview? })`, `findByThreadRootTs`, `appendMessage(threadRootTs, message)`.
- Top-level folder: `knowledges/needs-review/`.
- `findByThreadRootTs` scans folder for filenames prefixed by `<thread-root-iso-date>-`.
- NATS publish on create/update/delete.
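
A minimal sketch of the two in-memory operations behind `updateNeedsReview` and `archiveEnrichmentToHistory`, with persistence left out. The trimmed `NeedsReviewRecord` shape and the helper names `applyPatch` and `appendHistory` are assumptions; history entries follow the directional frontmatter shown earlier.

```typescript
type NeedsReviewStatus = "pending-clarification" | "re-enriching" | "needs-human-review";

interface HistoryEntry {
  ts: string;
  sources: string[];
  summary: string;
  superseded: boolean;
  reason?: string;
}

// Trimmed to the fields this sketch needs.
interface NeedsReviewRecord {
  threadRootTs: string;
  status: NeedsReviewStatus;
  retryCount: number;
  currentSummary: string;
  history: HistoryEntry[];
}

// History and identity are excluded from patches so they cannot be overwritten.
type NeedsReviewPatch = Partial<Omit<NeedsReviewRecord, "threadRootTs" | "history">>;

// Shallow merge: fields absent from the patch keep their current values.
function applyPatch(record: NeedsReviewRecord, patch: NeedsReviewPatch): NeedsReviewRecord {
  return { ...record, ...patch };
}

// Append-only: existing entries are copied, never replaced.
function appendHistory(record: NeedsReviewRecord, entry: HistoryEntry): NeedsReviewRecord {
  return { ...record, history: [...record.history, entry] };
}
```

Note that `applyPatch` does not flip the status when `retryCount` reaches the cap; as the test scenarios below state, the caller enforces that transition.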

**Execution note:** test-first, tmp dir per test.

**Patterns to follow:** `knowledge-store.ts` atomicWrite + NATS publish pattern.

**Test scenarios:**
- Happy path: `writeNeedsReview` creates file with `status: pending-clarification`, `retryCount: 0`, `history: []`.
- Happy path: `updateNeedsReview(threadRootTs, { status: 're-enriching', retryCount: 1 })` preserves untouched fields.
- Happy path: `archiveEnrichmentToHistory` appends entry, never overwrites existing history.
- Happy path: `listNeedsReview({ skipHumanReview: true })` filters out status `needs-human-review`.
- Happy path: `findByThreadRootTs` returns path if file exists, else null.
- Happy path: `appendMessage` is idempotent by `ts`.
- Edge case: `updateNeedsReview` on missing file throws typed error.
- Edge case: incrementing retryCount to 3 does NOT auto-flip to `needs-human-review` (caller enforces).
- Integration: NATS publish fires on write/update/delete when enabled.
- Error path: atomic write failure leaves original file intact.

**Verification:** `needs-review-store.test.ts` passes; round-trip yields identical values.

---

- [ ] **Unit 4: Clarity Rubric — config file + evaluator**

**Goal:** YAML-backed rubric config + a rubric evaluator that scores the 4 attributes present/absent and computes the combined pass/fail gate.

**Requirements:** R9, R18, R19, R20

**Dependencies:** none (parallelizable with U1–U3)

**Files:**
- Create: `apps/feedback-pipeline/config/clarity-rubric.yaml`
- Create: `apps/feedback-pipeline/src/pipeline/rubric.ts`
- Modify: `apps/feedback-pipeline/src/knowledge/types.ts` (`RubricScore` schema)
- Modify: `apps/feedback-pipeline/src/lib/agent-backend.ts` (add `'rubric'` stage)
- Create: `apps/feedback-pipeline/tests/pipeline/rubric.test.ts`

**Approach:**
- YAML config declares the 4 attributes, a short natural-language definition for each, the minimum-attributes threshold (3), and the min comparator confidence (0.7).
- `evaluateRubric(message, enrichedContext, comparatorConfidence, backend): Promise<RubricScore>` calls a dedicated LLM with a fixed prompt asking for a JSON `{featureSurface: bool, painPoint: bool, userRole: bool, desiredOutcome: bool, reasoning: string}`.
- Combine with `comparatorConfidence` → `passed: attributesMet >= 3 && comparatorConfidence >= 0.7`.
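- Config is validated once at module load (a missing file fails fast at startup) and re-read on each call so edits take effect without a restart (hot-reload). The `RUBRIC_CONFIG_PATH` env var overrides the default path.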
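
The combined gate can be sketched as a pure function over the LLM's attribute verdicts. The `RubricConfig` field names and the `scoreRubric` helper are assumptions for illustration; the thresholds (3 attributes, 0.7 confidence) come from the plan, and YAML loading plus the LLM call are left out.

```typescript
// Assumed config shape; the real YAML keys are decided at implementation time.
interface RubricConfig {
  minAttributes: number;
  minComparatorConfidence: number;
}

interface RubricAttributes {
  featureSurface: boolean;
  painPoint: boolean;
  userRole: boolean;
  desiredOutcome: boolean;
}

interface RubricScore extends RubricAttributes {
  attributesMet: number;
  comparatorConfidence: number;
  passed: boolean;
}

const DEFAULT_RUBRIC: RubricConfig = { minAttributes: 3, minComparatorConfidence: 0.7 };

// Both gates must hold: enough attributes present AND enough comparator confidence.
function scoreRubric(
  attrs: RubricAttributes,
  comparatorConfidence: number,
  config: RubricConfig = DEFAULT_RUBRIC,
): RubricScore {
  const attributesMet = [
    attrs.featureSurface,
    attrs.painPoint,
    attrs.userRole,
    attrs.desiredOutcome,
  ].filter(Boolean).length;
  const passed =
    attributesMet >= config.minAttributes &&
    comparatorConfidence >= config.minComparatorConfidence;
  return { ...attrs, attributesMet, comparatorConfidence, passed };
}
```

Separating the deterministic gate from the LLM call means threshold changes can be verified without a model in the loop.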

**Execution note:** test-first; mock backend returns stub JSON.

**Patterns to follow:** system-prompt shape from `enricher.ts` / `comparator.ts`.

**Test scenarios:**
- Happy path: 4/4 attrs + confidence 0.8 → `passed: true, attributesMet: 4`.
- Happy path: 3/4 attrs + confidence 0.75 → `passed: true`.
- Edge case: 3/4 attrs + confidence 0.69 → `passed: false` (confidence gate).
- Edge case: 2/4 attrs + confidence 0.95 → `passed: false` (attrs gate).
- Error path: LLM returns malformed JSON → throws typed error, caller handles.
- Integration: config file edit changes threshold on next invocation (no restart).
- Integration: missing config file throws typed error at startup.

**Verification:** `rubric.test.ts` passes; changing thresholds in YAML file alters pass/fail without restart.

---

- [ ] **Unit 5: Enricher toolshed — new tools + tool allowlist config**

**Goal:** Expand `EnrichmentTools` with adjacent-messages, vault-search, customer-profile reads. Declare the allowlist in YAML. Register all tools but no-op Linear until the cache feature lands.

**Requirements:** R8, R27, R49, R51

**Dependencies:** U7

**Files:**
- Create: `apps/feedback-pipeline/config/enricher-tools.yaml`
- Modify: `apps/feedback-pipeline/src/pipeline/enricher.ts` (expand `EnrichmentTools` interface + prompt)
- Create: `apps/feedback-pipeline/src/pipeline/enricher-tools.ts` (tool implementations + allowlist loader)
- Modify: `apps/feedback-pipeline/src/lib/slack-service.ts` (add `getAdjacentMessages(channel, ts, windowMs)`)
- Create: `apps/feedback-pipeline/src/lib/customer-store.ts` (read `knowledges/customers/`, resolve Slack userId → customerName)
- Create: `apps/feedback-pipeline/tests/pipeline/enricher-tools.test.ts`

**Approach:**
- Tools: `readSlackThread` (existing), `readAdjacentMessages` (NEW: channel history ±24h same channel), `searchVaultInsights` (NEW: qmd hybrid search over `customer-insights` collection, top-3 by default), `readCustomerProfile` (NEW: read `knowledges/customers/` markdown files and match by Slack userId), `searchLinearCache`/`fetchLinearLive` (registered as no-ops).
- Allowlist YAML lists each tool's name + enabled flag + any config (e.g. `adjacentWindowMs`, `vaultSearchTopK`).
- Enricher system prompt enumerates the allowlisted tools dynamically; disabled tools are omitted from the prompt.
- `_rules.md` auto-load (R51): enricher checks `knowledges/customer-insights/<category>/_rules.md` when it forms a category hypothesis, prepends file contents to system prompt. Empty/missing → default prompt.
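
A minimal sketch of assembling the enricher system prompt from the allowlist and an optional `_rules.md`. The `ToolEntry` shape, the `buildSystemPrompt` helper, and the prompt layout are all assumptions; the YAML parsing and file reads that would feed it are omitted.

```typescript
// Assumed in-memory form of one allowlist row from enricher-tools.yaml.
interface ToolEntry {
  name: string;
  enabled: boolean;
  description: string;
}

// Disabled tools are left out of the prompt entirely.
// Category rules, when present and non-blank, are prepended ahead of the base prompt.
function buildSystemPrompt(
  basePrompt: string,
  tools: ToolEntry[],
  categoryRules?: string,
): string {
  const toolLines = tools
    .filter((tool) => tool.enabled)
    .map((tool) => `- ${tool.name}: ${tool.description}`);
  const sections: string[] = [];
  if (categoryRules !== undefined && categoryRules.trim() !== "") {
    sections.push(categoryRules.trim());
  }
  sections.push(basePrompt);
  sections.push(["Available tools:", ...toolLines].join("\n"));
  return sections.join("\n\n");
}
```

Because the tool list is derived from the allowlist on every call, toggling an `enabled` flag changes what the model is told it can use without touching the prompt template.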

**Execution note:** test-first, mock qmd store + Slack client + filesystem.

**Patterns to follow:** `EnrichmentTools` interface shape; `SlackService.getChannelHistory` for adjacent lookups.

**Test scenarios:**
- Happy path: `readAdjacentMessages` returns messages within 24h window, same channel, excluding the original message.
- Happy path: `searchVaultInsights` returns top-3 insights by qmd rank.
- Happy path: `readCustomerProfile` resolves user ID to customer name when present; returns null when absent or store missing.
- Happy path: allowlist with `searchLinearCache: enabled=false` → enricher prompt omits the tool.
- Happy path: `_rules.md` in category dir is prepended to enricher system prompt.
- Edge case: `knowledges/customers/` directory missing → `readCustomerProfile` returns null, no throw.
- Edge case: qmd returns no results → `searchVaultInsights` returns [].
- Edge case: `_rules.md` missing → default prompt used, no error.

**Verification:** `enricher-tools.test.ts` passes; toggling YAML enabled flag changes enricher prompt.

---

- [ ] **Unit 6: `unclear-handler` — conditional enricher → rubric → retry-compare → needs-review**

**Goal:** Replace the silent-drop branch in `processQueue()`. New flow: if `comparatorConfidence < 0.9`, run enrich → rubric. If the rubric passes, retry the comparator and ingest when the retry confidence is ≥0.9. In every other case (rubric fail, or retry confidence still <0.9), write needs-review (silent under flag 1).

**Requirements:** R7, R9, R13 (without delete), R27, R43 (history), R30

**Dependencies:** U2, U3, U4, U5

**Files:**
- Create: `apps/feedback-pipeline/src/pipeline/unclear-handler.ts`
- Modify: `apps/feedback-pipeline/src/app.ts` (call `handleUnclear` from `processQueue`)
- Modify: `apps/feedback-pipeline/src/knowledge/types.ts` (add `PipelineEventType`s: `feedback.enriched`, `feedback.rubric_passed`, `feedback.rubric_failed`, `feedback.needs_human_review`, `feedback.promoted`, `feedback.clarification_posted`)
- Create: `apps/feedback-pipeline/tests/pipeline/unclear-handler.test.ts`

**Approach:**
- `handleUnclear(message, comparison, deps) → {kind:'ingested'|'needs-review', ...}`.
- Feature-flag gate: `ENABLE_ENRICHMENT_ON_UNCLEAR=true` required; else fall back to current silent-drop.
- Steps: enrich → `evaluateRubric` → if pass, retry compare with `text + "\n\n[Enriched: " + summary + "]"` → if confidence≥0.9 call `ingest`. Else write needs-review with `currentSummary`, `rubricScore`, initial history entry (not yet superseded).
- Customer dimension: enricher + `readCustomerProfile` populate `customerName`; written into both raw-feedback (on ingest) and needs-review.
- Emit pipeline events at each transition.
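
The steps above can be sketched as a single function with injected dependencies. This is a minimal sketch, not the planned implementation: the `UnclearDeps` shape, the simplified `text`-only message, and the extra `dropped` outcome for the flag-off path are assumptions made for illustration, and event emission is omitted.

```typescript
// Hypothetical shapes for illustration; the real pipeline types may differ.
type Comparison = { confidence: number };
type Enrichment = { summary: string };
type RubricResult = { passed: boolean; score: number };
type Outcome = { kind: "ingested" } | { kind: "needs-review" } | { kind: "dropped" };

interface UnclearDeps {
  flagEnabled: boolean; // mirrors ENABLE_ENRICHMENT_ON_UNCLEAR
  enrich(text: string): Promise<Enrichment>;
  evaluateRubric(e: Enrichment): Promise<RubricResult>;
  compare(text: string): Promise<Comparison>;
  ingest(text: string, c: Comparison): Promise<void>;
  writeNeedsReview(entry: { currentSummary: string; rubricScore: number }): Promise<void>;
}

const CONFIDENCE_GATE = 0.9;

async function handleUnclear(text: string, comparison: Comparison, deps: UnclearDeps): Promise<Outcome> {
  // Confident comparisons bypass the enricher entirely.
  if (comparison.confidence >= CONFIDENCE_GATE) {
    await deps.ingest(text, comparison);
    return { kind: "ingested" };
  }
  // Flag off: keep the original silent-drop behavior (assumed outcome name).
  if (!deps.flagEnabled) return { kind: "dropped" };

  let summary = "Need more context"; // fallback summary if the enricher throws
  let rubric: RubricResult = { passed: false, score: 0 };
  try {
    const enrichment = await deps.enrich(text);
    summary = enrichment.summary;
    try {
      rubric = await deps.evaluateRubric(enrichment);
    } catch {
      // A throwing rubric evaluator is treated as a rubric failure.
    }
  } catch {
    // Enricher failure: fall through with the fallback summary.
  }

  if (rubric.passed) {
    const retry = await deps.compare(`${text}\n\n[Enriched: ${summary}]`);
    if (retry.confidence >= CONFIDENCE_GATE) {
      await deps.ingest(text, retry);
      return { kind: "ingested" };
    }
  }
  await deps.writeNeedsReview({ currentSummary: summary, rubricScore: rubric.score });
  return { kind: "needs-review" };
}
```

Keeping every backend behind `deps` is what lets the integration test mock the enricher, comparator, and rubric evaluator independently.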

**Execution note:** failing integration test first; mock all deps (enricher backend, compare backend, rubric backend, vault paths).

**Patterns to follow:** `createStageBackend('enrich')` factory for DI; option-bag signatures.

**Test scenarios:**
- Happy path: comparator confidence 0.95 → bypasses enricher entirely, ingests.
- Happy path: confidence 0.5 (unclear) → enrich → rubric 4/4 pass → retry-compare 0.92 → ingest; no needs-review written.
- Happy path: confidence 0.5 → enrich → rubric 2/4 fail → needs-review written with rubric score + current summary; no ingest.
- Happy path: rubric passes but retry-compare still <0.9 → needs-review written.
- Edge case: feature flag off → original silent-drop behavior (no enricher call).
- Edge case: non-threaded message → `threadRootTs = message.ts`.
- Error path: enricher throws → caught, needs-review written with fallback summary "Need more context".
- Error path: rubric evaluator throws → treated as fail; needs-review written.
- Integration: emits `feedback.enriched`, `feedback.rubric_passed`/`_failed`, `feedback.ingested` or equivalent.
- Integration: needs-review `history[]` contains 1 entry after first enrichment.
- Integration: customer name populated when `readCustomerProfile` resolves user.

**Verification:** `unclear-handler.test.ts` passes; local smoke with `ENABLE_ENRICHMENT_ON_UNCLEAR=true` creates needs-review files for ambiguous messages.

---

- [ ] **Unit 7: qmd collection reconfiguration + directory-scoped rules loader**

**Goal:** Update `feedback-ui/lib/search-store.ts` qmd collection config to match R50 layout. Add `_rules.md` loader used by enricher + rubric.

**Requirements:** R50, R51

**Dependencies:** none (parallelizable with U1–U4)

**Files:**
- Modify: `apps/feedback-ui/lib/search-store.ts` (add `ignore: ["needs-review/**", "tuning/**"]` to raw-feedback; add `tuning-examples` collection; needs-review stays out of qmd)
- Create: `apps/feedback-pipeline/src/lib/scoped-rules.ts` (`loadScopedRules(categorySlug, vaultRoot)` → string | null)
- Modify or create: `apps/feedback-ui/tests/search-store.test.ts` (modify if present, otherwise create)
- Create: `apps/feedback-pipeline/tests/lib/scoped-rules.test.ts`

**Approach:**
- Add `ignore` to `raw-feedback` collection (defense-in-depth even though needs-review is at top level).
- Add `tuning-examples` collection at `knowledges/tuning/examples/` with pattern `**/*.md`, NOT user-facing (enricher/eval-harness only).
- `loadScopedRules` reads `knowledges/customer-insights/<slug>/_rules.md` and returns trimmed content; returns null if missing.
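
As a rough illustration of the loader described above, the following sketch reads the rules file synchronously and maps both "missing" and "empty after trimming" to `null`. It assumes `vaultRoot` is the directory that contains `knowledges/`; the `normalizeRules` helper is a hypothetical name introduced here.

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: empty or whitespace-only content counts as "no rules".
function normalizeRules(raw: string): string | null {
  const trimmed = raw.trim();
  return trimmed.length > 0 ? trimmed : null;
}

// Assumes vaultRoot is the parent directory of `knowledges/`.
function loadScopedRules(categorySlug: string, vaultRoot: string): string | null {
  const rulesPath = join(vaultRoot, "knowledges", "customer-insights", categorySlug, "_rules.md");
  try {
    return normalizeRules(readFileSync(rulesPath, "utf8"));
  } catch (err) {
    // A missing file or directory is expected; anything else is a real error.
    if ((err as { code?: string }).code === "ENOENT") return null;
    throw err;
  }
}
```

A caller would prepend the returned string to the enricher system prompt only when it is non-null, which keeps the default prompt path untouched.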

**Patterns to follow:** existing collection config block in `search-store.ts:242-255`.

**Test scenarios:**
- Happy path: collection config includes all three collections with correct paths.
- Happy path: `loadScopedRules("csv-export", vaultRoot)` returns file contents when present.
- Edge case: missing `_rules.md` → returns null.
- Edge case: empty `_rules.md` → returns null (trimmed empty string).
- Integration: qmd store creation still succeeds after config change.

**Verification:** search-store tests pass; `scoped-rules.test.ts` passes.

---

### Phase 2 — `ENABLE_CLARIFICATION_POST` (buttons posted to Slack, not yet auto-promoting)

- [ ] **Unit 8: Block Kit post-with-buttons helper**

**Goal:** A reusable helper that posts a bot message to a Slack thread with Approve + Reject buttons carrying encoded `value` JSON. Used for every bot-posted pipeline message.

**Requirements:** R21, R55

**Dependencies:** U6

**Files:**
- Create: `apps/feedback-pipeline/src/lib/slack-buttons.ts`
- Modify: `apps/feedback-pipeline/src/lib/slack-service.ts` (add `postWithButtons(channel, threadTs, blocks, buttonPayload)`)
- Create: `apps/feedback-pipeline/tests/lib/slack-buttons.test.ts`

**Approach:**
- `buildButtonPayload(threadRootTs, stage, outputId, action): string` returns JSON string ≤ Slack's 2000-char limit.
- `buildApproveRejectBlocks(text, payloadBase)` returns Block Kit blocks: section + actions with 2 buttons (`approve`, `reject`), each carrying a serialized `{...payloadBase, action}` value.
- `outputId = sha256(threadRootTs + stage + postedAt).slice(0,8)` — deterministic, collision-resistant for this scale.
- `postWithButtons` wraps `chat.postMessage` with blocks + fallback text.
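
A possible shape for these helpers is sketched below. The block and button field names follow Slack's Block Kit format; `makeOutputId`, the tuple return type, and the `primary`/`danger` button styles are illustrative choices rather than decisions made in this plan.

```typescript
import { createHash } from "node:crypto";

type ButtonAction = "approve" | "reject";
interface PayloadBase { threadRootTs: string; stage: string; outputId: string }
interface ButtonElement {
  type: "button";
  text: { type: "plain_text"; text: string };
  action_id: ButtonAction;
  style: "primary" | "danger";
  value: string;
}
interface SectionBlock { type: "section"; text: { type: "mrkdwn"; text: string } }
interface ActionsBlock { type: "actions"; elements: ButtonElement[] }

const SLACK_BUTTON_VALUE_MAX = 2000; // max length of a button `value`

// Hypothetical helper name: first 8 hex chars of sha256 over the concatenated inputs.
function makeOutputId(threadRootTs: string, stage: string, postedAt: string): string {
  return createHash("sha256").update(threadRootTs + stage + postedAt).digest("hex").slice(0, 8);
}

function buildButtonPayload(threadRootTs: string, stage: string, outputId: string, action: ButtonAction): string {
  const value = JSON.stringify({ threadRootTs, stage, outputId, action });
  if (value.length > SLACK_BUTTON_VALUE_MAX) {
    throw new Error(`button value is ${value.length} chars, over the ${SLACK_BUTTON_VALUE_MAX}-char limit`);
  }
  return value;
}

function buildApproveRejectBlocks(text: string, base: PayloadBase): [SectionBlock, ActionsBlock] {
  const button = (action: ButtonAction, label: string, style: "primary" | "danger"): ButtonElement => ({
    type: "button",
    text: { type: "plain_text", text: label },
    action_id: action,
    style,
    value: buildButtonPayload(base.threadRootTs, base.stage, base.outputId, action),
  });
  return [
    { type: "section", text: { type: "mrkdwn", text } },
    { type: "actions", elements: [button("approve", "Approve", "primary"), button("reject", "Reject", "danger")] },
  ];
}
```

Failing loudly when the value exceeds the limit, instead of truncating, keeps a malformed payload from reaching the action handler.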

**Patterns to follow:** existing Block Kit helpers in `app.ts` (`buildResultBlocks`, `buildLoadingBlocks`).

**Test scenarios:**
- Happy path: blocks contain 2 action buttons with distinct `value` JSON (action=approve vs reject).
- Happy path: decoded `value` round-trips `threadRootTs`, `stage`, `outputId`, `action`.
- Edge case: payload with long `threadRootTs` still fits under 2000 chars.
- Edge case: `outputId` is deterministic for identical inputs.
- Integration: `postWithButtons` calls `chat.postMessage` with expected args.

**Verification:** `slack-buttons.test.ts` passes; manual smoke posts a button pair to a test Slack thread.

---

- [ ] **Unit 9: Button action handler + modal opener**

**Goal:** Register a Slack Bolt `app.action` handler for `approve`/`reject` button clicks that opens a Block Kit modal with a single optional note field.

**Requirements:** R21, R55

**Dependencies:** U8

**Files:**
- Modify: `apps/feedback-pipeline/src/app.ts` (register `app.action('approve')`, `app.action('reject')`, `app.view('button_note_modal')`)
- Create: `apps/feedback-pipeline/src/lib/button-modal.ts` (builds modal, decodes private_metadata)
- Create: `apps/feedback-pipeline/tests/lib/button-modal.test.ts`

**Approach:**
- `app.action('approve')` ack()s, parses `action.value` JSON, opens modal via `client.views.open` with private_metadata = value JSON (so modal submission preserves full context).
- Modal: title = "Add a note (optional)" + single multi-line text input ("note", optional) + submit button labeled "Submit".
- `app.view('button_note_modal')` ack()s, reads `private_metadata` + note value, routes to 4-variant handler (U10).
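
The modal builder and metadata decoder could look roughly like this. It is a sketch under assumptions: the `block_id`/`action_id` of `note`, the `Note` label, and the `toVariant` helper are illustrative names, and the Bolt handler wiring (`ack()`, `client.views.open`) is left out.

```typescript
type ButtonAction = "approve" | "reject";
type Variant = "approve" | "approve-with-note" | "reject" | "reject-with-note";
interface ButtonContext { threadRootTs: string; stage: string; outputId: string; action: ButtonAction }

// Builds the modal view; private_metadata carries the button's value JSON unchanged.
function buildNoteModal(privateMetadata: string) {
  return {
    type: "modal" as const,
    callback_id: "button_note_modal",
    private_metadata: privateMetadata,
    title: { type: "plain_text" as const, text: "Add a note (optional)" },
    submit: { type: "plain_text" as const, text: "Submit" },
    blocks: [
      {
        type: "input" as const,
        block_id: "note", // assumed id; the submitted value is read back by this id
        optional: true,
        label: { type: "plain_text" as const, text: "Note" },
        element: { type: "plain_text_input" as const, action_id: "note", multiline: true },
      },
    ],
  };
}

// Returns null on malformed input so the caller can log the error and no-op.
function decodePrivateMetadata(raw: string): ButtonContext | null {
  try {
    const v = JSON.parse(raw);
    const ok =
      typeof v?.threadRootTs === "string" &&
      typeof v?.stage === "string" &&
      typeof v?.outputId === "string" &&
      (v?.action === "approve" || v?.action === "reject");
    return ok ? { threadRootTs: v.threadRootTs, stage: v.stage, outputId: v.outputId, action: v.action } : null;
  } catch {
    return null;
  }
}

// An empty, whitespace-only, or null note selects the plain variant.
function toVariant(action: ButtonAction, note: string | null | undefined): Variant {
  const hasNote = (note ?? "").trim().length > 0;
  if (!hasNote) return action;
  return action === "approve" ? "approve-with-note" : "reject-with-note";
}
```

Validating the decoded fields, and not only the JSON syntax, covers the "fails to decode" edge case without letting a partial payload reach the router.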

**Patterns to follow:** existing `app.event('app_mention')` handler in `app.ts:198` for Bolt handler shape.

**Test scenarios:**
- Happy path: `app.action('approve')` calls `views.open` with modal blocks + encoded private_metadata.
- Happy path: modal submission with empty note → routes to `approve` variant.
- Happy path: modal submission with note → routes to `approve-with-note` variant.
- Edge case: `private_metadata` fails to decode → logs error, no-op.
- Integration: `app.view` handler invokes the 4-variant router with `{action, note?, threadRootTs, stage, outputId}`.

**Verification:** `button-modal.test.ts` passes; manual smoke shows modal opens on button click.

---

- [ ] **Unit 10: 4-variant button router**

**Goal:** A pure function that routes decoded button events to the right pipeline action: approve / approve-with-note / reject / reject-with-note.

**Requirements:** R22, R23, R24, R54

**Dependencies:** U9

**Files:**
- Create: `apps/feedback-pipeline/src/pipeline/button-router.ts`
- Create: `apps/feedback-pipeline/tests/pipeline/button-router.test.ts`

**Approach:**
- `routeButtonClick({action, note?, threadRootTs, stage, outputId}, deps)`:
  - `approve` (no note): log `good-example` via U11, trigger promotion (U14) if needs-review exists and Phase 3 flag enabled.
  - `approve-with-note`: log `good-example` with note, run ONE extra enricher pass using `note` as extra context, then promote.
  - `reject` (no note): log `bad-example`, increment retryCount, trigger re-enrichment with generic "user marked this wrong" context.
  - `reject-with-note`: log `bad-example` with `note` as reasoning, increment retryCount, trigger re-enrichment with note as strong context, store note as eval ground truth (write to tuning-examples with `groundTruth: true`).
- All variants respect the retry cap: a reject that raises `retryCount` to 3 sets status `needs-human-review` and triggers no further re-enrichment.
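
One way to keep the router pure is to have it return a plan of steps that the caller executes. The sketch below assumes that design; the `Step` shape, `RouterState`, and `planButtonClick` are hypothetical names, and it applies the "no needs-review means log only" rule to both approve variants, which the plan only states for the plain approve case.

```typescript
type Action = "approve" | "reject";
type Step =
  | { do: "log-example"; label: "good" | "bad"; note?: string; groundTruth?: boolean }
  | { do: "enrich"; context: string }
  | { do: "promote" }
  | { do: "set-status"; status: "needs-human-review" };

interface RouterState {
  needsReviewExists: boolean;
  retryCount: number;
  autoPromotionEnabled: boolean; // mirrors ENABLE_AUTO_PROMOTION
}

const MAX_RETRIES = 3;

function planButtonClick(
  action: Action,
  note: string | undefined,
  state: RouterState,
): { steps: Step[]; retryCount: number } {
  const noteText = note?.trim() || undefined; // empty notes count as "no note"

  if (action === "approve") {
    const steps: Step[] = [{ do: "log-example", label: "good", note: noteText }];
    if (state.needsReviewExists && state.autoPromotionEnabled) {
      // With a note: exactly one extra enricher pass, never a loop.
      if (noteText) steps.push({ do: "enrich", context: noteText });
      steps.push({ do: "promote" });
    }
    return { steps, retryCount: state.retryCount };
  }

  const retryCount = state.retryCount + 1;
  const steps: Step[] = [
    { do: "log-example", label: "bad", note: noteText, groundTruth: noteText ? true : undefined },
  ];
  if (retryCount >= MAX_RETRIES) {
    // Cap reached: hand off to a human, no further enrichment.
    steps.push({ do: "set-status", status: "needs-human-review" });
  } else {
    steps.push({ do: "enrich", context: noteText ?? "user marked this wrong" });
  }
  return { steps, retryCount };
}
```

Returning data instead of performing side effects makes each of the four variants checkable with a plain equality assertion.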

**Execution note:** test-first, pure-logic unit.

**Patterns to follow:** `handleUnclear` DI shape (U6).

**Test scenarios:**
- Happy path: approve (no note) → `good-example` written, promotion triggered.
- Happy path: approve-with-note → one extra enricher pass runs, note in enricher context, then promote.
- Happy path: reject (no note) → `bad-example` written, re-enrichment queued, retryCount+1.
- Happy path: reject-with-note → `bad-example` with note, eval ground truth flagged, re-enrichment with note in enricher system prompt.
- Edge case: reject at retryCount=2 → next increment hits 3 → status `needs-human-review`, no further enrichment.
- Edge case: approve clicked when no needs-review exists (e.g. on ingest-ack) → logs good-example only, no promotion attempt.
- Integration: correct events emitted (`feedback.enriched` / `feedback.needs_human_review` / `feedback.promoted`).

**Verification:** `button-router.test.ts` passes.

---

- [ ] **Unit 11: Good/bad example writer**

**Goal:** Write approve/reject feedback as markdown files under `knowledges/tuning/examples/<stage>/<label>/<id>.md` so the future eval harness can consume them.

**Requirements:** R34 (storage only)

**Dependencies:** U7 (tuning-examples collection)

**Files:**
- Create: `apps/feedback-pipeline/src/knowledge/tuning-store.ts`
- Create: `apps/feedback-pipeline/tests/knowledge/tuning-store.test.ts`

**Approach:**
- `writeTuningExample({stage, label, threadRootTs, input, output, reasoning?, curatorId, groundTruth?, timestamp})`.
- Filename: `<timestamp-iso>-<threadRootTs-slug>.md`, atomic-write, NATS publish.
- Markdown body = human-readable capture; frontmatter holds structured fields.
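
The path and frontmatter conventions above might be implemented along these lines. Treat it as a sketch: the list of valid stage names, the slug rule for `threadRootTs`, and the error class name are assumptions, and the atomic write plus NATS publish are omitted.

```typescript
const STAGES = ["classifier", "comparator", "enricher", "rubric"] as const; // assumed stage names
const LABELS = ["good", "bad"] as const;

class InvalidTuningExampleError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidTuningExampleError";
  }
}

interface TuningExampleMeta {
  stage: string;
  label: string;
  threadRootTs: string;
  curatorId: string;
  timestamp: Date;
  reasoning?: string;
  groundTruth?: boolean;
}

function tuningExamplePath(ex: TuningExampleMeta): string {
  if (!STAGES.some((s) => s === ex.stage)) throw new InvalidTuningExampleError(`unknown stage: ${ex.stage}`);
  if (!LABELS.some((l) => l === ex.label)) throw new InvalidTuningExampleError(`unknown label: ${ex.label}`);
  // Assumed slug rule: collapse anything non-alphanumeric (e.g. the "." in a Slack ts) to "-".
  const slug = ex.threadRootTs.replace(/[^0-9A-Za-z]+/g, "-");
  return `knowledges/tuning/examples/${ex.stage}/${ex.label}/${ex.timestamp.toISOString()}-${slug}.md`;
}

function renderFrontmatter(ex: TuningExampleMeta): string {
  const lines = [
    "---",
    `stage: ${ex.stage}`,
    `label: ${ex.label}`,
    `threadRootTs: "${ex.threadRootTs}"`,
    `curatorId: ${ex.curatorId}`,
    `timestamp: ${ex.timestamp.toISOString()}`,
  ];
  // JSON.stringify yields a double-quoted scalar, which keeps free-text notes on one line.
  if (ex.reasoning !== undefined) lines.push(`reasoning: ${JSON.stringify(ex.reasoning)}`);
  if (ex.groundTruth) lines.push("groundTruth: true");
  lines.push("---");
  return lines.join("\n");
}
```

Validating `stage` and `label` before building the path keeps malformed input from creating stray directories under `tuning/examples/`.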

**Patterns to follow:** `knowledge-store.ts` atomicWrite + NATS publish.

**Test scenarios:**
- Happy path: writes to correct path `tuning/examples/enricher/good/...`.
- Happy path: includes reasoning in frontmatter when provided.
- Happy path: `groundTruth: true` field preserved.
- Edge case: invalid stage/label → typed error.
- Integration: NATS publish fires when enabled.

**Verification:** `tuning-store.test.ts` passes; files appear in correct subtree.

---

### Phase 3 — `ENABLE_AUTO_PROMOTION` (thread reply + scheduler + promotion)

- [ ] **Unit 12: Thread-reply auto-trigger**

**Goal:** Detect thread replies that belong to an open needs-review thread and trigger re-enrichment (not the normal queue).

**Requirements:** R10

**Dependencies:** U6 (reuse unclear-handler's enrich+rubric step), U10 (optional — Phase 3 may ship before or after Phase 2 flags)

**Files:**
- Modify: `apps/feedback-pipeline/src/app.ts` (branch in `app.message` handler)
- Create: `apps/feedback-pipeline/src/pipeline/reenrich-handler.ts`
- Create: `apps/feedback-pipeline/tests/pipeline/reenrich-handler.test.ts`

**Approach:**
- In `app.message`, after channel filter: if `message.thread_ts` set AND `findByThreadRootTs(thread_ts)` exists AND status ∈ {pending-clarification, re-enriching}: append the reply to needs-review `messages[]`, call `reenrichHandler.attempt`, return (skip normal enqueue).
- `reenrichHandler.attempt(threadRootTs, newMessage?, deps)`:
  1. Set status → `re-enriching`, append message.
  2. Archive previous enrichment to `history[]` (mark `superseded: true`).
  3. Build synthetic `SlackMessage` combining thread text.
  4. Call enricher → rubric → retry-compare (same as U6).
  5. If pass + categorized: signal `promote` (U14).
  6. If still failing: `updateNeedsReview({ status: 'pending-clarification', lastEnrichmentAt: now, currentSummary, rubricScore })`.
- In-memory `Set<threadRootTs>` prevents concurrent re-enrichment.
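
The in-memory guard can be as small as the sketch below. The `withThreadLock` wrapper and its `"locked"` sentinel are hypothetical names; the guard only works within a single Node.js process.

```typescript
// Thread roots with a re-enrichment currently in flight (single-process only).
const inFlight = new Set<string>();

// Runs fn unless another attempt for the same thread is already running.
async function withThreadLock<T>(threadRootTs: string, fn: () => Promise<T>): Promise<T | "locked"> {
  if (inFlight.has(threadRootTs)) return "locked";
  inFlight.add(threadRootTs); // check and add run with no await in between
  try {
    return await fn();
  } finally {
    inFlight.delete(threadRootTs); // always release, even if fn throws
  }
}
```

Because the check and the `add` happen synchronously before the first `await`, a scheduler tick and a thread reply arriving in the same process cannot both pass the check for one thread.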

**Patterns to follow:** `handleUnclear` signature (U6).

**Test scenarios:**
- Happy path: reply arrives → needs-review exists → enricher resolves → retry-compare categorizes → promotion queued.
- Happy path: reply arrives but no needs-review for thread → normal enqueue runs (no re-enrichment).
- Happy path: status `needs-human-review` → reply is absorbed (appended) but NOT re-enriched (see U15).
- Edge case: concurrent scheduler + thread-reply on same file → second attempt short-circuits on lock.
- Edge case: enrichment resolves but retry-compare still <0.9 → status stays pending-clarification, retryCount incremented, history entry added.
- Error path: promotion fails mid-flight → status stays pending-clarification (not lost).
- Integration: `history[]` contains both first enrichment (superseded) and current enrichment.

**Verification:** `reenrich-handler.test.ts` passes; local smoke with seeded needs-review + synthetic reply updates the file.

---

- [ ] **Unit 13: Scheduled 30-min retry + 24h re-post guard + retry cap**

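**Goal:** `setInterval` job that scans `knowledges/needs-review/` every 10 minutes (`REENRICH_INTERVAL_MS`), re-enriches entries whose last enrichment is more than 30 minutes old (`REENRICH_STALENESS_MS`), enforces the retry cap, and respects the 24h re-post guard.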

**Requirements:** R11, R12, R15

**Dependencies:** U12 (reuses `reenrichHandler.attempt`)

**Files:**
- Create: `apps/feedback-pipeline/src/pipeline/reenrich-scheduler.ts`
- Modify: `apps/feedback-pipeline/src/app.ts` (start/stop scheduler in `start()` / `shutdown()`)
- Create: `apps/feedback-pipeline/tests/pipeline/reenrich-scheduler.test.ts`

**Approach:**
- `runScheduledReenrichment(deps, nowFn)`:
  1. `listNeedsReview({skipHumanReview: true})`.
  2. Filter: `status === 'pending-clarification' && now - lastEnrichmentAt > 30min`.
  3. For each: if `retryCount >= 3` → set `status: 'needs-human-review'`, emit `feedback.needs_human_review`, skip.
  4. Else call `reenrichHandler.attempt(threadRootTs, undefined, deps)` (no new message, enricher re-reads thread via Slack).
  5. If still failing AND `now - lastEscalationPostedAt > 24h`: re-post escalation (reuses U8 post-with-buttons), update `lastEscalationPostedAt`.
- Env: `REENRICH_INTERVAL_MS=600000`, `REENRICH_STALENESS_MS=1800000`, `ESCALATION_REPOST_MS=86400000`, `NEEDS_REVIEW_MAX_RETRIES=3`.
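
The per-entry decision in steps 2, 3, and 5 can be isolated as pure functions, which makes the injected `nowFn` tests straightforward. In this sketch, timestamps are assumed to be epoch milliseconds, the entry shape is hypothetical, and a never-posted escalation (`null`) is assumed to be eligible for posting.

```typescript
interface NeedsReviewEntry {
  status: "pending-clarification" | "re-enriching" | "needs-human-review";
  retryCount: number;
  lastEnrichmentAt: number; // epoch ms (assumed representation)
  lastEscalationPostedAt: number | null; // null = never posted (assumed)
}

interface SchedulerConfig { intervalMs: number; stalenessMs: number; repostMs: number; maxRetries: number }

// Defaults match the env values listed above.
function readSchedulerConfig(env: Record<string, string | undefined>): SchedulerConfig {
  return {
    intervalMs: Number(env.REENRICH_INTERVAL_MS ?? 600000),
    stalenessMs: Number(env.REENRICH_STALENESS_MS ?? 1800000),
    repostMs: Number(env.ESCALATION_REPOST_MS ?? 86400000),
    maxRetries: Number(env.NEEDS_REVIEW_MAX_RETRIES ?? 3),
  };
}

type Decision = "skip" | "flip-to-human-review" | "reenrich";

function decide(entry: NeedsReviewEntry, now: number, cfg: SchedulerConfig): Decision {
  // Step 2: only stale pending-clarification entries are candidates.
  if (entry.status !== "pending-clarification") return "skip";
  if (now - entry.lastEnrichmentAt <= cfg.stalenessMs) return "skip";
  // Step 3: the retry cap wins over another enrichment attempt.
  if (entry.retryCount >= cfg.maxRetries) return "flip-to-human-review";
  return "reenrich"; // step 4
}

// Step 5: re-post only if the last escalation is older than the guard window.
function shouldRepostEscalation(entry: NeedsReviewEntry, now: number, cfg: SchedulerConfig): boolean {
  if (entry.lastEscalationPostedAt === null) return true;
  return now - entry.lastEscalationPostedAt > cfg.repostMs;
}
```

The scheduler loop would then call `decide` per entry inside a `try`/`catch`, so a single failing entry does not stop the rest of the scan.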

**Execution note:** test-first with injected `nowFn`.

**Patterns to follow:** `pollTimer` setInterval lifecycle in `app.ts`.

**Test scenarios:**
- Happy path: entry older than 30min + retryCount=1 → re-enrich called.
- Happy path: entry within 30min → skipped.
- Happy path: retryCount=3 → flipped to `needs-human-review`, no enrichment.
- Happy path: still-unresolved after re-enrich + last escalation 25h ago → re-posted.
- Edge case: last escalation 23h ago → enrich runs, no re-post.
- Edge case: status `needs-human-review` → always skipped.
- Error path: single entry throws → loop continues; other entries processed.
- Error path: Slack re-post fails → `lastEscalationPostedAt` NOT advanced; retries next tick.
- Integration: scheduler starts with `start()`, clears on `shutdown()`.

**Verification:** `reenrich-scheduler.test.ts` passes.

---

- [ ] **Unit 14: Promotion — write raw-feedback + delete needs-review + ingest + promotion ack**

**Goal:** When re-enrichment resolves a needs-review item, promote it with an idempotent write-then-delete sequence (not transactional): write raw-feedback with accumulated messages via ingest, delete needs-review, post ✅ promotion ack with buttons.

**Requirements:** R13, R14

**Dependencies:** U2, U3, U6, U8

**Files:**
- Create: `apps/feedback-pipeline/src/pipeline/promotion-handler.ts`
- Create: `apps/feedback-pipeline/tests/pipeline/promotion-handler.test.ts`

**Approach:**
- `promote(threadRootTs, enrichedContext, categoryDecision, deps)`:
  1. Read needs-review entry.
  2. Construct synthetic `SlackMessage` with accumulated text (first message text + any reply bodies joined).
  3. Call `ingest(message, comparison, vaultRoot)` — ingestion now uses per-thread writeRawFeedback (U2) so all `messages[]` are persisted.
  4. Delete needs-review file via `deleteNeedsReview`.
  5. Post ✅ promotion ack text matching R14: "Got it — <enrichedContext>. Filing this as feedback now." with Approve/Reject buttons (U8).
  6. Emit `feedback.promoted`.
- Idempotency: if raw-feedback file already exists (crash-recovery), `writeRawFeedback` merges instead of failing.
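
The write-then-delete ordering can be sketched as follows. The `PromoteDeps` interface, the simplified `ingest` signature, the `"\n\n"` join, and the return values are assumptions for illustration; the ack string follows the R14 wording quoted in step 5.

```typescript
interface NeedsReviewEntry { threadRootTs: string; messages: { ts: string; text: string }[] }

// Hypothetical dependency bag; the real ingest takes (message, comparison, vaultRoot).
interface PromoteDeps {
  findByThreadRootTs(threadRootTs: string): Promise<NeedsReviewEntry | null>;
  ingest(threadRootTs: string, text: string, categorySlug: string): Promise<void>;
  deleteNeedsReview(threadRootTs: string): Promise<void>;
  postAckWithButtons(threadRootTs: string, text: string): Promise<void>;
  emit(eventType: string): void;
  warn(message: string): void;
}

async function promote(
  threadRootTs: string,
  enrichedContext: string,
  categorySlug: string,
  deps: PromoteDeps,
): Promise<"promoted" | "nothing-to-promote"> {
  // Guard against double promotion: act only while the needs-review entry exists.
  const entry = await deps.findByThreadRootTs(threadRootTs);
  if (!entry) return "nothing-to-promote";

  // Write first. A crash after this point leaves the entry in place, so a re-run
  // repeats the ingest (assumed to merge by thread root) and then deletes.
  const text = entry.messages.map((m) => m.text).join("\n\n");
  await deps.ingest(threadRootTs, text, categorySlug);

  try {
    await deps.deleteNeedsReview(threadRootTs);
  } catch {
    deps.warn(`delete failed for ${threadRootTs}; will retry on next trigger`);
  }

  try {
    await deps.postAckWithButtons(threadRootTs, `Got it — ${enrichedContext}. Filing this as feedback now.`);
  } catch {
    deps.warn(`promotion ack failed for ${threadRootTs}`); // Slack failures never block vault writes
  }

  deps.emit("feedback.promoted");
  return "promoted";
}
```

Ordering the write before the delete means the worst crash outcome is a leftover needs-review file, never a lost message.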

**Test scenarios:**
- Happy path: writes raw-feedback file with full `messages[]` from needs-review, deletes needs-review, posts ack with correct text.
- Happy path: promotion ack text exactly matches R14 format.
- Edge case: crash after writeRawFeedback before delete → next promotion retries deletion (idempotent), no duplicate raw-feedback.
- Error path: delete fails → raw-feedback already written; logs warning; retries on next trigger via findByThreadRootTs.
- Integration: category evidence count incremented (ingest path), insight file updated.

**Verification:** `promotion-handler.test.ts` passes.

---

- [ ] **Unit 15: Needs-human-review absorb mode**

**Goal:** When thread receives more messages after status becomes `needs-human-review`, absorb them (append to needs-review `messages[]`) and post a minimal "📥 received" ack — no re-enrichment, no clarifying questions.

**Requirements:** R16

**Dependencies:** U12, U14

**Files:**
- Modify: `apps/feedback-pipeline/src/pipeline/reenrich-handler.ts` (absorb branch)
- Modify: `apps/feedback-pipeline/src/app.ts` (ack formatting)
- Modify: `apps/feedback-pipeline/tests/pipeline/reenrich-handler.test.ts`

**Approach:**
- In `reenrichHandler.attempt`, if `status === 'needs-human-review'`: append message only, post "📥 received" ack, return early (no enricher call, no rubric).
- Absorbed messages build historical training signal for curators/eval harness.

**Test scenarios:**
- Happy path: reply arrives, status=needs-human-review → message appended, minimal ack posted, no enricher call.
- Happy path: multiple absorbed messages all appended in order.
- Edge case: curator manually edits file back to `pending-clarification` in vault → next reply triggers normal re-enrichment.
- Integration: `feedback.needs_human_review` emitted once; subsequent absorbs do NOT re-emit.

**Verification:** `reenrich-handler.test.ts` extended case passes.

---

## System-Wide Impact

- **Interaction graph:** touches `app.message`, `app.action`, `app.view` Slack handlers; `processQueue`; `ingest`; `writeRawFeedback`; NATS publish; Slack `postMessage` + `views.open`; qmd collection config. The `@mention` handler (`processMessageInteractive`) is intentionally unchanged — remains synchronous best-effort.
- **Error propagation:** needs-review writes are atomic + NATS fire-and-forget. Enricher/rubric exceptions caught → needs-review with fallback summary. Slack API failures never block vault writes. Button action handler acks immediately even if downstream routing fails (Bolt 3s ack requirement).
- **State lifecycle risks:** promotion = write-then-delete (not transactional). Idempotency via `findByThreadRootTs` + deterministic per-thread raw-feedback filename (overwrite-safe through append semantics). Concurrent scheduler+thread-reply guarded by in-memory lock. The in-memory lock does not hold across processes, so it is unsafe if Railway runs >1 instance (noted in deferred questions).
- **API surface parity:** qmd collections reconfigured (`search-store.ts`). User-facing search continues to hit only `customer-insights`; `raw-feedback` + `tuning-examples` are enricher/eval-only; `needs-review` is NOT indexed at all.
- **Integration coverage:** end-to-end tests in `tests/e2e/pipeline-flow.test.ts` must cover: feedback<0.9 → enrich → rubric-pass → ingest; feedback<0.9 → enrich → rubric-fail → needs-review → button-approve → promote; reply-triggered re-enrichment; scheduler catches missed replies; 3-retry cap → needs-human-review → absorb.
- **Unchanged invariants:** `customer-insights/` insight schema, `category-index.json` format, insight-level `rawFeedbackRef` provenance field (still resolvable because per-thread filename is deterministic), `@mention` handler behavior, batch bootstrap flow, SQLite `message_queue` schema, comparator output shape (single result — multi-insight deferred).
- **Feature-flag safety:** each phase guarded by its flag; flags off → original behavior preserved. Safe rollback by toggling env var.
- **Cost exposure:** enricher runs on every `<0.9` message, then rubric evaluator runs, then retry-compare if rubric passes. Upper bound = 3 LLM calls per enrichment attempt; re-enrichment attempts repeat the sequence and are capped by `NEEDS_REVIEW_MAX_RETRIES`. R31 cost-per-message metric gates against runaway.
- **Concurrency:** in-memory lock `Set<threadRootTs>` prevents double-enrichment. File writes are atomic.

## Risks & Dependencies

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Enricher cost blows up on verbose/noisy channels | Med | Med | Flag 1 lands silently first; measure per-message cost + resolution rate before enabling flag 2. R31 budget guardrail. |
| Rubric 3/4 + 0.7 thresholds too strict → every msg stuck | Med | High | YAML config hot-reloadable; tune during Phase 1 rollout. Track `feedback.rubric_passed` rate from day 1. |
| Button `value` JSON hits Slack's 2000-char limit | Low | Med | Deterministic short `outputId` hash (8 chars); `threadRootTs` is ~20 chars; total well under limit. |
| Double-promotion on scheduler+reply race | Low | Med | `Set<threadRootTs>` in-memory lock; promotion checks `findByThreadRootTs` before acting. |
| Multi-instance deploy breaks in-memory lock | Low (today) | Med | Railway currently single-instance; documented as deferred to durability plan. |
| Promotion crash between write-raw-feedback and delete-needs-review | Low | Low | Deterministic filename + append-semantics makes re-run idempotent. |
| Approve-with-note "one extra pass" introduces unbounded loops | Low | Med | Hard-coded single-pass guard; no recursion; counts as one retry. |
| qmd ignore-pattern regression makes needs-review searchable | Low | High | Needs-review at TOP level (outside raw-feedback entirely); `ignore` is defense-in-depth. Integration test asserts. |
| Customer profile store missing at launch | High | Low | Best-effort read; `customerName: null` never blocks promotion. |
| Linear cache feature slips | High | Low | Toolshed launches with Linear no-op; enricher still functional via thread+adjacent+vault+customer tools. |
| Slack Bolt action handler 3s ack timeout | Med | High | Handler calls `ack()` immediately, does routing async. Standard Bolt pattern. |
| Legacy per-message raw-feedback files split provenance | High | Low | Accepted; "no backfill" decision. Old files remain searchable. |
| `needs-human-review` items lost without a UI | Med | Med | Files persist in vault, grep-able; Review Queue is next plan. Status transition emits event to journal. |

## Phased Delivery

### Phase 1: Silent Enrichment (flag `ENABLE_ENRICHMENT_ON_UNCLEAR`)
**Units 1–7.** Enricher runs on <0.9 confidence, rubric evaluates, retry-compare runs, needs-review files written silently. No Slack interaction. Goal: validate resolution rate + rubric calibration before adding user-facing noise.

**Ready criteria:** `feedback.rubric_passed` rate ≥ 50% in production; enricher cost per unclear message < $0.10; zero lost messages (every unclear writes needs-review or ingests).

### Phase 2: Clarification Buttons (flag `ENABLE_CLARIFICATION_POST`)
**Units 8–11.** Bot posts clarification summaries + Approve/Reject buttons on every pipeline output. Curator can hand-tune via buttons. Good/bad examples written to `tuning-examples` collection.

**Ready criteria:** Button click latency < 2s; modal open success rate > 99%; example writes validated.

### Phase 3: Auto-Promotion Loop (flag `ENABLE_AUTO_PROMOTION`)
**Units 12–15.** Thread replies and Approve-clicks promote needs-review to raw-feedback. Scheduler catches missed replies. 3-retry cap flips to `needs-human-review` with absorb mode.

**Ready criteria:** Promotion end-to-end < 5s from button click; scheduler processes backlog in under 5 min; no duplicate raw-feedback files created.

### Follow-on plans (NOT covered here)
- **Multi-insight splitting** (R25, R26): comparator refactor to return `ComparisonResult[]`; insight-level re-comparison on new thread messages.
- **Review Queue dashboard** (R32–R38): feedback-ui page listing all pipeline decisions, flag UI, threshold/rubric/taxonomy editing, audit trail, backfill scopes.
- **AI eval harness** (R39–R42): consumes `tuning-examples`, runs on-demand + nightly + CI.
- **Durability layer** (R46, R47, R52, R53): checkpointing table, transactional outbox, DLQ.
- **Two-phase vault search** (R48): qmd coarse → in-memory fine filter.
- **Linear cache sync** (R28): batch job populating `knowledges/raw-data/linear/`.
- **Customer profile backfill** (R27): Stripe/Attio/Supabase/Fillout/Fathom sync into `knowledges/customers/`.

## Success Metrics

Track from Phase 1 onward via `EventJournal` + `/admin/logs` + `/health` endpoint:

- **Resolution rate:** % unclear resolved by enricher alone, % resolved after clarification, % stuck in `needs-human-review`. Target ≥70% resolved before needs-human-review (origin doc).
- **Cost per message:** LLM spend per enrichment run, per rubric eval, per retry-compare, per promoted item. Alert threshold: > $0.50 per promoted item.
- **Time-to-promotion:** median time from message-received to promotion for needs-review items. Target ≤ 30 min (scheduled retry cycle) or seconds (thread-reply trigger).
- **Data completeness:** % threads captured (no drops) vs baseline. Target 100%.
- **Rubric attribute fill rates:** per-attribute %, detect which attributes are hardest to infer (signal for enricher tool improvements).

## Documentation Plan

- Update `apps/feedback-pipeline/README.md` with new env vars, feature flags, vault layout.
- Add `apps/feedback-pipeline/config/README.md` describing `clarity-rubric.yaml` + `enricher-tools.yaml` config surfaces.
- Add `knowledges/README.md` (or extend) documenting directory-scoped `_rules.md` convention.
- Document the promotion + absorb workflow in `docs/solutions/best-practices/` after Phase 3 hits prod.
- Update `/health` endpoint docs to include `pendingNeedsReview`, `needsHumanReview` counts.

## Operational / Rollout Notes

- **Flag rollout sequence** (matches R29): enable `ENABLE_ENRICHMENT_ON_UNCLEAR` first, observe for ≥3 days. Then `ENABLE_CLARIFICATION_POST`. Then `ENABLE_AUTO_PROMOTION`.
- **Observability:** add Railway logs filters for `feedback.rubric_passed`, `feedback.rubric_failed`, `feedback.needs_human_review`, `feedback.promoted`.
- **Backoff strategy:** if enricher cost exceeds budget, flip `ENABLE_ENRICHMENT_ON_UNCLEAR=false` — safe degrade to silent drop (original behavior).
- **Manual triage path:** curator edits a `knowledges/needs-review/*.md` file directly (set status to `pending-clarification` and reset `retryCount` to 0) to re-enter the loop, until Review Queue lands.
- **NATS sync:** verify `packages/nats-sync/` accepts `knowledges/needs-review/**` and `knowledges/tuning/**` paths.

## Sources & References

- **Origin document:** [docs/brainstorms/2026-04-04-unclear-feedback-handling-requirements.md](../brainstorms/2026-04-04-unclear-feedback-handling-requirements.md)
- **Diagrams:** `docs/plans/subplans/unclear-feedback-diagrams/` (excalidraw files, supplementary)
- Related code:
  - `apps/feedback-pipeline/src/app.ts`
  - `apps/feedback-pipeline/src/pipeline/{classifier,comparator,enricher,ingestion}.ts`
  - `apps/feedback-pipeline/src/knowledge/{format,knowledge-store,types}.ts`
  - `apps/feedback-pipeline/src/lib/{slack-service,escalation,state,event-journal}.ts`
  - `apps/feedback-ui/lib/search-store.ts` (qmd indexing confirmation)
- Related institutional learnings:
  - `docs/solutions/best-practices/qmd-sdk-hybrid-search-integration-2026-04-03.md`
  - `docs/solutions/best-practices/ai-eval-promptfoo-feedback-pipeline-2026-04-01.md`
- Research prior-art (see origin doc Research Synthesis):
  - Sierra Agent Harness / Ghostwriter / ADP
  - Letta four-layer memory
  - Stripe Minions Part 1 & Part 2 (Blueprints, Toolshed, directory-scoped rules)
  - Slope durable workflows + operational ownership + feedback pipeline
  - Mintlify ChromaFs two-phase retrieval
  - Cognition Devin Session Insights
  - HAL graduated access
  - Loop Temporal + outbox/DLQ patterns
