Technical Intelligence Brief: Agentic Coding / Harness Engineering (PASS/PARTIAL)

1. Executive Snapshot

Harness engineering is moving from buzzword to discipline: 3 independent sources (O'Reilly/HN/arXiv) plus 119 scanned candidates → Fabbi needs its own test harness for coding agents, not just prompts.
Benchmark pressure is rising: SWE-bench/Terminal-Bench appear in ≥7 signals; Mini-SWE-agent claims up to 74% on SWE-bench and a subscription-token agent claims 97% on SWE-bench Verified → an internal benchmark is needed to guard against overfitting.
The agent security gap is clear: the 28-CVE CWE-Bench comparison plus AgentToolBench-Code → SYNCA needs policy and sandboxing before rolling out code-writing agents.
Context/code graph is a new layer: Rig, Repowise, local techdocs, and codebase intelligence appear in ≥5 signals → FARE has an opportunity to become the memory/code-understanding layer.
OSS/CLI fragmentation: Claude Code/Codex/Cursor/Gemini/OpenCode launchers appear in ≥4 signals → NEXA should standardize on an adapter/harness layer rather than lock in to a single IDE.
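
One way to picture the adapter layer suggested for NEXA is a single interface that every agent CLI sits behind, so the harness never depends on one tool. The sketch below is illustrative only: `AgentAdapter`, `AgentResult`, `EchoAdapter`, and `run_task` are hypothetical names, not an existing Fabbi or vendor API, and no real agent CLI is invoked.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class AgentResult:
    agent: str       # which adapter produced the result
    patch: str       # proposed change, e.g. a unified diff
    cost_usd: float  # reported spend for the run


class AgentAdapter(Protocol):
    """Hypothetical common interface for any coding-agent backend."""

    name: str

    def run(self, task: str, repo_path: str) -> AgentResult: ...


class EchoAdapter:
    """Stand-in adapter; a real one would wrap a specific agent CLI."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, task: str, repo_path: str) -> AgentResult:
        # Returns a placeholder patch instead of calling an agent.
        return AgentResult(agent=self.name, patch=f"# TODO: {task}", cost_usd=0.0)


def run_task(adapters: list[AgentAdapter], task: str, repo_path: str) -> list[AgentResult]:
    """Run the same task through every adapter so results are comparable."""
    return [adapter.run(task, repo_path) for adapter in adapters]
```

Because the harness only sees `AgentAdapter`, swapping or adding an agent would mean writing one new adapter class rather than changing the evaluation code.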

2. KOL/OG Feed Watch

Platform	Author	Time	Engagement	URL	Signal
HN	peterneyra	2026-05-29T01:18:58Z	2 pts/0 cmt	link	Dis Dat – Loom for AI coding agents
HN	aanet	2026-05-28T22:46:14Z	1 pts/0 cmt	link	Clawd-on-Desk: a pixel desktop pet watching your AI coding agents
HN	SVI	2026-05-28T21:03:24Z	30 pts/25 cmt	link	Protestware for Coding Agents
HN	akashi_dev	2026-05-28T20:44:37Z	2 pts/0 cmt	link	Show HN: Rig – Local-first code graph for coding agents, in one npx command
HN	vbutsomesayw	2026-05-27T04:01:44Z	3 pts/0 cmt	link	Bill Gates AI on AI (one month later)
HN	zameermfm	2026-04-16T02:33:36Z	2 pts/4 cmt	link	Ask HN: We dont need a programming language now?
HN	wolfsir	2026-04-06T10:52:09Z	2 pts/1 cmt	link	Show HN: I built a self-writing book on agentic coding
HN	cyrusradfar	2026-04-01T18:32:05Z	59 pts/31 cmt	link	Functional programming accelerates agentic feature development
HN	cobblr_mosaic	2026-05-26T17:38:55Z	3 pts/0 cmt	link	Agentic Harness Engineering
HN	ramayac	2026-05-20T04:31:50Z	2 pts/0 cmt	link	Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible
HN	redbell	2026-05-18T12:17:04Z	159 pts/17 cmt	link	Learn Harness Engineering
HN	Garbage	2026-05-16T04:59:11Z	3 pts/0 cmt	link	Agent Harness Engineering
HN	geopsist	2026-05-28T12:39:46Z	5 pts/1 cmt	link	We Benchmarked Claude Code, Codex, Semgrep, CodeQL, Trent on 28 CWE-Bench CVEs
HN	fittingopposite	2026-05-28T05:05:59Z	2 pts/0 cmt	link	Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code
HN	kimjune01	2026-05-24T18:03:28Z	2 pts/0 cmt	link	Show HN: 97% on SWE-bench Verified with subscription-token agents
HN	Sushrutkm	2026-05-19T10:02:03Z	2 pts/0 cmt	link	Bito's AI Architect Boosts Claude Opus's task success rate by 35%
HN	neversettles	2026-05-03T03:40:04Z	1 pts/2 cmt	link	The Terminal Bench 3.0 community is looking for task contributors
HN	gk1	2026-04-29T18:16:23Z	4 pts/0 cmt	link	ForgeCode: Top open source coding agent in Terminal-Bench 2.0

3. CTO Evaluation Matrix

Signal	Thesis	Evidence	Counter-signal	Fabbi implication	Confidence	Decision	Next validation
Harness Engineering	Agent SDLC needs a harness, much like CI/CD	3+ sources; 119 candidates	Facebook/X data partial	NEXA+SYNCA baseline	82%	trial	10-repo pilot / 2 weeks
Benchmark inflation	Public SWE-bench is easy to overfit	74%, 97%, 35% uplift claims	claims not normalized for cost	Fabbi needs a private eval	76%	trial	30 internal issues
Code graph/memory	Context layer quyết định agent success	≥5 codebase-intel signals	repo maturity mixed	FARE roadmap	74%	watch/trial	measure retrieval hit-rate
Security/sandbox	Code-writing agents create CVE/regression risk	28 CWE-Bench CVEs	tool results vary	SYNCA governance module	80%	adopt guardrails	policy test suite
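
The Security/sandbox row calls for guardrails backed by a policy test suite. A minimal sketch of what a diff-risk gate could look like follows, assuming agent output arrives as a unified diff. The path prefixes, regex patterns, weights, and threshold are hypothetical placeholders, not actual SYNCA policy.

```python
import re
from dataclasses import dataclass, field

# Hypothetical policy values; a real policy would be tuned per project.
SENSITIVE_PATHS = ("auth/", "infra/", ".github/workflows/")
RISKY_PATTERNS = {
    r"\beval\(": 3,
    r"subprocess\..*shell=True": 3,
    r"verify\s*=\s*False": 2,
}
BLOCK_THRESHOLD = 3


@dataclass
class GateResult:
    score: int = 0
    reasons: list = field(default_factory=list)

    @property
    def blocked(self) -> bool:
        return self.score >= BLOCK_THRESHOLD


def score_diff(diff_text: str) -> GateResult:
    """Score a unified diff by touched paths and added lines."""
    result = GateResult()
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # File header of the new version, e.g. "+++ b/auth/login.py".
            path = line[4:].removeprefix("b/")
            if path.startswith(SENSITIVE_PATHS):
                result.score += 2
                result.reasons.append(f"sensitive path: {path}")
        elif line.startswith("+"):
            # Added line: check it against each risky pattern.
            for pattern, weight in RISKY_PATTERNS.items():
                if re.search(pattern, line):
                    result.score += weight
                    result.reasons.append(f"risky pattern: {pattern}")
    return result
```

A policy test suite could then be a set of known-bad and known-good diffs with the expected `blocked` value for each, run in CI whenever the policy values change.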

Repo/source	Metric	Time	Move
Clawd-on-Desk: a pixel desktop pet watching your AI coding agents	1 pts/0 cmt	2026-05-28T22:46:14Z	Trial/Watch
Show HN: Rig – Local-first code graph for coding agents, in one npx command	2 pts/0 cmt	2026-05-28T20:44:37Z	Trial/Watch
Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible	2 pts/0 cmt	2026-05-20T04:31:50Z	Trial/Watch
Show HN: 97% on SWE-bench Verified with subscription-token agents	2 pts/0 cmt	2026-05-24T18:03:28Z	Trial/Watch
Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview	393 pts/148 cmt	2026-04-27T12:35:55Z	Trial/Watch
Show HN: AI Skill to port PostgreSQL extensions to MySQL	3 pts/0 cmt	2026-05-28T15:18:45Z	Trial/Watch
Windows computer-use: synthetic cursors for background agents	3 pts/0 cmt	2026-05-27T18:48:20Z	Trial/Watch
Superpowers: An Agentic Skills Framework for AI Coding Workflows	2 pts/0 cmt	2026-05-28T09:20:45Z	Trial/Watch
Show HN: VAEN – Package and import portable AI coding-agent Harnesses	8 pts/3 cmt	2026-05-27T20:52:31Z	Trial/Watch
Show HN: I built a tool to auto-accept AI slop and bigtech devs loves it	17 pts/2 cmt	2026-05-26T20:45:37Z	Trial/Watch
jazzyalex/agent-sessions	588 stars/36 forks/1 issues	2026-05-29T03:49:21Z	Trial/Watch
jarrodwatts/claude-hud	23981 stars/1079 forks/14 issues	2026-05-29T03:49:07Z	Trial/Watch
shareAI-lab/learn-claude-code	63362 stars/10358 forks/107 issues	2026-05-29T03:49:23Z	Trial/Watch
colbymchenry/codegraph	32054 stars/1900 forks/202 issues	2026-05-29T03:49:24Z	Trial/Watch

6. Impact Coverage

Domain	Now 0-2w	Next 1-2m	Later 3-6m	Decision
FARE	Test code graph over 10 repos	Memory quality score	Codebase copilot layer	trial
NEXA	Adapter for Claude/Codex/OpenCode	Private SWE harness 30 tasks	Agent orchestration runtime	adopt
SYNCA	Sandbox + diff risk gate	CWE regression suite	Enterprise agent governance	adopt
DOMUS	Monitor low direct impact	Use for internal automation	Agent ops templates	monitor
Japan/VN/Global	JP: governance-first; VN: productivity pilot; Global: benchmark arms race	2-4 customer pilots	Packaged AI-SDLC offering	trial
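
For the FARE row, one candidate input to a memory quality score is retrieval hit-rate at k: the share of queries for which at least one known-relevant file appears in the top k results returned by the code graph. The sketch below assumes that definition; `hit_rate_at_k` and the shape of its inputs are hypothetical, and other definitions (recall, MRR) may suit the pilot better.

```python
from __future__ import annotations


def hit_rate_at_k(
    retrieved: dict[str, list[str]],
    relevant: dict[str, set[str]],
    k: int = 5,
) -> float:
    """Fraction of queries with at least one relevant file in the top k.

    retrieved: query id -> ranked list of file paths from the code graph
    relevant:  query id -> set of file paths judged relevant (gold labels)
    """
    if not relevant:
        return 0.0
    hits = 0
    for query_id, gold in relevant.items():
        top_k = retrieved.get(query_id, [])[:k]
        if gold.intersection(top_k):
            hits += 1
    return hits / len(relevant)


# Toy data, not pilot results.
retrieved = {"q1": ["a.py", "b.py"], "q2": ["c.py"]}
relevant = {"q1": {"b.py"}, "q2": {"z.py"}}
print(hit_rate_at_k(retrieved, relevant, k=5))  # 0.5
```

Queries missing from `retrieved` count as misses, which keeps the score honest when the graph returns nothing for a query.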

7. CTO Recommendations

DO THIS WEEK

Build a private 30-task coding-agent eval. ROI 15-25%, risk 2/5, owner AI Platform Lead, TTV 2 weeks, validate pass@1 + cost.
Add the SYNCA sandbox/diff-risk gate. ROI 10-20%, risk 2/5, owner Security Lead, TTV 1 week, validate with a CWE regression suite.
Pilot the FARE code graph on 10 repos. ROI 12-18%, risk 3/5, owner Tech Lead, TTV 2 weeks, validate retrieval hit-rate.
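
The first item validates on pass@1 plus cost. With a single attempt per task, pass@1 reduces to the fraction of tasks solved, and cost can be reported per solved task so that expensive agents do not look better than they are. The sketch below assumes that setup; `TaskRun` and `summarize` are hypothetical names and the numbers are toy data.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    task_id: str
    passed: bool     # did the agent's single attempt pass the task's tests?
    cost_usd: float  # spend for that attempt


def summarize(runs):
    """Return pass@1 (one attempt per task) and cost metrics."""
    if not runs:
        raise ValueError("no runs to summarize")
    solved = sum(run.passed for run in runs)
    total_cost = sum(run.cost_usd for run in runs)
    return {
        "pass_at_1": solved / len(runs),
        "total_cost_usd": total_cost,
        # Infinite when nothing was solved, so it sorts last in comparisons.
        "cost_per_solved_usd": total_cost / solved if solved else float("inf"),
    }


# Toy data, not measured results.
runs = [
    TaskRun("t1", True, 0.5),
    TaskRun("t2", False, 0.25),
    TaskRun("t3", True, 0.25),
]
summary = summarize(runs)
print(summary["cost_per_solved_usd"])  # 0.5
```

Reporting both numbers side by side is one way to address the counter-signal that public benchmark claims are not normalized for cost.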

WATCH 2-4 WEEKS

Track Mini-SWE/Terminal-Bench claims; ROI 5-10%, risk 3/5, owner R&D, validate reproducibility.

IGNORE LOW SIGNAL

UI/pet wrappers with <3 pts/no enterprise metric; ROI <3%, risk 1/5.

Data Quality / Scan Health Appendix

Scanned 119 candidates. Counts: HN 40, GitHub 45, GitHub_ERR 1, arXiv_ERR 5, Reddit_ERR 10, Product 10, X_ATTEMPT 5, FacebookPublic_ATTEMPT 3. X/FB public metrics are partial or blocked by public access limits, so confidence is reduced by 8 pp. Major linked claims use direct URLs; unavailable engagement is marked N/A/low-confidence.
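
The counts above can be checked with a short tally. The sketch below confirms they sum to the 119 scanned candidates and computes the share coming from degraded sources. Treating the `_ERR` and `_ATTEMPT` suffixes as "degraded" is an assumption made here for illustration; the brief does not define those labels, and `scan_health` is a hypothetical helper.

```python
# Counts as reported in the appendix.
counts = {
    "HN": 40,
    "GitHub": 45,
    "GitHub_ERR": 1,
    "arXiv_ERR": 5,
    "Reddit_ERR": 10,
    "Product": 10,
    "X_ATTEMPT": 5,
    "FacebookPublic_ATTEMPT": 3,
}


def scan_health(source_counts):
    """Total candidates and the share from errored or attempted-only sources."""
    total = sum(source_counts.values())
    # Assumption: *_ERR and *_ATTEMPT buckets are degraded sources.
    degraded = sum(
        n for name, n in source_counts.items()
        if name.endswith(("_ERR", "_ATTEMPT"))
    )
    return {
        "total": total,
        "degraded": degraded,
        "degraded_share": degraded / total if total else 0.0,
    }


health = scan_health(counts)
print(health["total"], health["degraded"])  # 119 24
```

Under that assumption, 24 of the 119 candidates (about 20%) come from degraded sources.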
