FABBI CTO INTELLIGENCE

Technical Intelligence Brief — Agentic Coding / Harness Engineering

2026-05-29 20:38 · Gate: QUALITY_GATE_PARTIAL · 119 candidates · Social-first + GitHub + paper/product

119 candidates scanned · 40 HN/dev_web · 59 GitHub-linked signals · 0 social signals captured · 78% confidence (adjusted)

1. Executive Snapshot

  • Harness engineering is moving from buzzword to discipline: 3 independent sources (O'Reilly/HN/arXiv) + 119 signals → Fabbi needs its own test harness for coding agents, not just prompts.
  • Benchmark pressure is rising: SWE-bench/Terminal-Bench appear in ≥7 signals; Mini-SWE-agent claims 74%, a subscription-token agent claims 97% → an internal benchmark is needed to guard against overfitting.
  • The agent security gap is clear: the 28-CVE CWE-Bench comparison + AgentToolBench-Code → SYNCA needs policy/sandbox controls before rolling out code-writing agents.
  • Context/code graph is a new layer: Rig, Repowise, local techdocs, and codebase intelligence appear in ≥5 signals → FARE has an opportunity to become the memory/code-understanding layer.
  • OSS/CLI fragmentation: Claude Code/Codex/Cursor/Gemini/OpenCode launchers appear in ≥4 signals → NEXA should standardize on an adapter/harness rather than lock in to one IDE.

2. KOL/OG Feed Watch

Platform | Author | Time | Engagement | URL | Signal
HN | peterneyra | 2026-05-29T01:18:58Z | 2 pts/0 cmt | link | Dis Dat – Loom for AI coding agents
HN | aanet | 2026-05-28T22:46:14Z | 1 pts/0 cmt | link | Clawd-on-Desk: a pixel desktop pet watching your AI coding agents
HN | SVI | 2026-05-28T21:03:24Z | 30 pts/25 cmt | link | Protestware for Coding Agents
HN | akashi_dev | 2026-05-28T20:44:37Z | 2 pts/0 cmt | link | Show HN: Rig – Local-first code graph for coding agents, in one npx command
HN | vbutsomesayw | 2026-05-27T04:01:44Z | 3 pts/0 cmt | link | Bill Gates AI on AI (one month later)
HN | zameermfm | 2026-04-16T02:33:36Z | 2 pts/4 cmt | link | Ask HN: We dont need a programming language now?
HN | wolfsir | 2026-04-06T10:52:09Z | 2 pts/1 cmt | link | Show HN: I built a self-writing book on agentic coding
HN | cyrusradfar | 2026-04-01T18:32:05Z | 59 pts/31 cmt | link | Functional programming accelerates agentic feature development
HN | cobblr_mosaic | 2026-05-26T17:38:55Z | 3 pts/0 cmt | link | Agentic Harness Engineering
HN | ramayac | 2026-05-20T04:31:50Z | 2 pts/0 cmt | link | Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible
HN | redbell | 2026-05-18T12:17:04Z | 159 pts/17 cmt | link | Learn Harness Engineering
HN | Garbage | 2026-05-16T04:59:11Z | 3 pts/0 cmt | link | Agent Harness Engineering
HN | geopsist | 2026-05-28T12:39:46Z | 5 pts/1 cmt | link | We Benchmarked Claude Code, Codex, Semgrep, CodeQL, Trent on 28 CWE-Bench CVEs
HN | fittingopposite | 2026-05-28T05:05:59Z | 2 pts/0 cmt | link | Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code
HN | kimjune01 | 2026-05-24T18:03:28Z | 2 pts/0 cmt | link | Show HN: 97% on SWE-bench Verified with subscription-token agents
HN | Sushrutkm | 2026-05-19T10:02:03Z | 2 pts/0 cmt | link | Bito's AI Architect Boosts Claude Opus's task success rate by 35%
HN | neversettles | 2026-05-03T03:40:04Z | 1 pts/2 cmt | link | The Terminal Bench 3.0 community is looking for task contributors
HN | gk1 | 2026-04-29T18:16:23Z | 4 pts/0 cmt | link | ForgeCode: Top open source coding agent in Terminal-Bench 2.0

3. CTO Evaluation Matrix

Signal | Thesis | Evidence | Counter-signal | Fabbi implication | Confidence | Decision | Next validation
Harness Engineering | Agent SDLC needs a harness the way it needs CI/CD | 3+ sources; 119 candidates | Facebook/X partial | NEXA+SYNCA baseline | 82% | trial | 10-repo pilot / 2 weeks
Benchmark inflation | Public SWE-bench is easy to overfit | 74%, 97%, 35% uplift claims | claims not yet normalized for cost | Fabbi needs a private eval | 76% | trial | 30 internal issues
Code graph/memory | Context layer determines agent success | ≥5 codebase-intel signals | repo maturity mixed | FARE roadmap | 74% | watch/trial | measure retrieval hit-rate
Security/sandbox | Agent write-code creates CVE/regression risk | 28 CWE-Bench CVEs | tool results vary | SYNCA governance module | 80% | adopt guardrails | policy test suite
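
The "measure retrieval hit-rate" validation for the code graph/memory row can be made concrete with a small metric. The sketch below is one possible definition (hit-rate@k: the share of queries whose top-k retrieved files include at least one file a human marked as relevant). The issue IDs, file paths, and the choice of k are hypothetical placeholders, not data from this scan.

```python
from typing import Dict, List, Set


def hit_rate_at_k(
    retrieved: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Fraction of queries whose top-k retrieved files contain a relevant file."""
    if not relevant:
        return 0.0
    hits = 0
    for query_id, gold_files in relevant.items():
        # A query with no retrieval results simply counts as a miss.
        top_k = retrieved.get(query_id, [])[:k]
        if any(path in gold_files for path in top_k):
            hits += 1
    return hits / len(relevant)


# Hypothetical example: ranked files returned by a code graph for three issues.
retrieved = {
    "issue-1": ["src/auth.py", "src/db.py"],
    "issue-2": ["README.md", "src/cli.py"],
    "issue-3": [],
}
# Hypothetical ground truth: files actually touched by the accepted fix.
relevant = {
    "issue-1": {"src/auth.py"},
    "issue-2": {"src/parser.py"},
    "issue-3": {"src/db.py"},
}

score = hit_rate_at_k(retrieved, relevant, k=2)
print(f"hit-rate@2 = {score:.2f}")
```

Using the files changed by the merged fix as ground truth is an assumption; it keeps labeling cheap but undercounts files that were useful to read without being edited.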

6. Impact Coverage

Domain | Now 0-2w | Next 1-2m | Later 3-6m | Decision
FARE | Test code graph over 10 repos | Memory quality score | Codebase copilot layer | trial
NEXA | Adapter for Claude/Codex/OpenCode | Private SWE harness 30 tasks | Agent orchestration runtime | adopt
SYNCA | Sandbox + diff risk gate | CWE regression suite | Enterprise agent governance | adopt
DOMUS | Monitor low direct impact | Use for internal automation | Agent ops templates | monitor
Japan/VN/Global | JP: governance-first; VN: productivity pilot; Global: benchmark arms race | 2-4 customer pilots | Packaged AI-SDLC offering | trial
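
The SYNCA "Sandbox + diff risk gate" item could start as a simple static check on agent-produced diffs. The following is a minimal sketch under stated assumptions: the sensitive paths and risky patterns are illustrative rules invented for this example, not an established policy, and a real gate would sit alongside a sandbox and a CWE regression suite rather than replace them.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical policy: paths and patterns chosen only for illustration.
SENSITIVE_PATHS = (".github/workflows/", "Dockerfile", ".env")
RISKY_PATTERNS = {
    "shell execution": re.compile(r"\b(os\.system|subprocess\.|eval\(|exec\()"),
    "possible hardcoded secret": re.compile(
        r"(?i)(api[_-]?key|secret|password)\s*=\s*['\"]"
    ),
}


@dataclass
class Finding:
    path: str
    reason: str


def scan_diff(diff_text: str) -> List[Finding]:
    """Scan a unified diff and flag sensitive files and risky added lines."""
    findings: List[Finding] = []
    current_path = ""
    for line in diff_text.splitlines():
        if line.startswith("+++ "):
            # Header of the new file version, e.g. "+++ b/app/run.py".
            current_path = line[4:].removeprefix("b/")
            if any(marker in current_path for marker in SENSITIVE_PATHS):
                findings.append(Finding(current_path, "touches sensitive path"))
        elif line.startswith("+"):
            # Added line: check its content against each risky pattern.
            for reason, pattern in RISKY_PATTERNS.items():
                if pattern.search(line[1:]):
                    findings.append(Finding(current_path, reason))
    return findings


def gate(diff_text: str, max_findings: int = 0) -> bool:
    """Return True when the diff may proceed without human review."""
    return len(scan_diff(diff_text)) <= max_findings


sample_diff = """\
--- a/app/run.py
+++ b/app/run.py
@@ -1,2 +1,3 @@
 import os
+os.system("rm -rf build")
 print("done")
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -1 +1,2 @@
 name: ci
+  run: echo hi
"""

for finding in scan_diff(sample_diff):
    print(f"{finding.path}: {finding.reason}")
print("auto-merge allowed:", gate(sample_diff))
```

The sketch requires Python 3.9+ for `str.removeprefix`. Pattern matching on added lines is deliberately coarse; it is meant to route diffs to review, not to prove them safe.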

7. CTO Recommendations

DO THIS WEEK
  1. Build private 30-task coding-agent eval. ROI 15-25%, risk 2/5, owner AI Platform Lead, TTV 2 weeks, validate pass@1+cost.
  2. Add SYNCA sandbox/diff-risk gate. ROI 10-20%, risk 2/5, owner Security Lead, TTV 1 week, validate CWE regression.
  3. Pilot FARE code graph on 10 repos. ROI 12-18%, risk 3/5, owner Tech Lead, TTV 2 weeks, validate retrieval hit-rate.
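
For item 1, "validate pass@1+cost" can be reported as two numbers per agent: the share of tasks solved on the first attempt and the spend per solved task. The sketch below shows one way to compute both; the `TaskResult` fields, task IDs, and dollar figures are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskResult:
    task_id: str
    passed: bool     # did the single first attempt pass the task's tests?
    cost_usd: float  # spend attributed to that attempt


def summarize(results: List[TaskResult]) -> Dict[str, float]:
    """Compute pass@1 and cost metrics over one attempt per task."""
    total = len(results)
    solved = sum(r.passed for r in results)
    total_cost = sum(r.cost_usd for r in results)
    return {
        "pass_at_1": solved / total if total else 0.0,
        "total_cost_usd": total_cost,
        # Failed attempts still cost money, so they count toward the numerator.
        "cost_per_solved_usd": total_cost / solved if solved else float("inf"),
    }


# Hypothetical results for four internal tasks.
results = [
    TaskResult("bug-101", True, 0.50),
    TaskResult("bug-102", True, 0.25),
    TaskResult("bug-103", False, 1.00),
    TaskResult("bug-104", True, 0.25),
]

summary = summarize(results)
print(f"pass@1: {summary['pass_at_1']:.2f}")
print(f"cost per solved task: ${summary['cost_per_solved_usd']:.2f}")
```

Reporting cost per solved task next to pass@1 addresses the counter-signal that public claims are not normalized for cost: an agent that solves slightly more tasks at several times the spend shows up clearly.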
WATCH 2-4 WEEKS
  1. Track Mini-SWE/Terminal-Bench claims; ROI 5-10%, risk 3/5, owner R&D, validate reproducibility.
IGNORE LOW SIGNAL
  1. UI/pet wrappers with <3 pts/no enterprise metric; ROI <3%, risk 1/5.

8. Paper / Benchmark / Product Watch

Agentic Harness Engineering, SWE-bench, Terminal-Bench 3.0, CWE-Bench, Claude Code/Codex/Cursor/OpenCode/Gemini CLI all cited in appendix. Product move: standardize adapter+telemetry before wide rollout.
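
One way to read "standardize adapter+telemetry" is a thin interface that every coding agent is wrapped in, with a harness that records each run. The sketch below assumes a hypothetical `AgentAdapter` protocol and uses a stand-in `EchoAdapter` instead of invoking any real CLI, since the actual command-line flags of Claude Code, Codex, or OpenCode are not covered in this brief.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


@dataclass
class AgentRun:
    """Telemetry record for one agent invocation."""
    adapter: str
    task: str
    output: str
    duration_s: float


class AgentAdapter(Protocol):
    """Hypothetical common interface each agent wrapper would implement."""
    name: str

    def run(self, task: str) -> str: ...


class EchoAdapter:
    """Stand-in adapter; a real one would call the agent's CLI or API."""

    def __init__(self, name: str) -> None:
        self.name = name

    def run(self, task: str) -> str:
        return f"[{self.name}] would handle: {task}"


@dataclass
class Harness:
    adapters: Dict[str, AgentAdapter] = field(default_factory=dict)
    telemetry: List[AgentRun] = field(default_factory=list)

    def register(self, adapter: AgentAdapter) -> None:
        self.adapters[adapter.name] = adapter

    def run(self, adapter_name: str, task: str) -> AgentRun:
        adapter = self.adapters[adapter_name]
        started = time.perf_counter()
        output = adapter.run(task)
        record = AgentRun(adapter_name, task, output, time.perf_counter() - started)
        self.telemetry.append(record)  # every run is logged, regardless of agent
        return record


harness = Harness()
harness.register(EchoAdapter("agent-a"))
harness.register(EchoAdapter("agent-b"))
for name in ("agent-a", "agent-b"):
    print(harness.run(name, "fix failing unit test").output)
print("runs recorded:", len(harness.telemetry))
```

Keeping telemetry in the harness rather than in each adapter means the same fields are captured for every agent, which makes cross-agent comparison possible without per-tool instrumentation.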

Source Appendix

S01 HN · Dis Dat – Loom for AI coding agents
peterneyra · 2026-05-29T01:18:58Z · 2 pts/0 cmt

dev discussion

S02 HN · Clawd-on-Desk: a pixel desktop pet watching your AI coding agents
aanet · 2026-05-28T22:46:14Z · 1 pts/0 cmt

dev discussion

S03 HN · Protestware for Coding Agents
SVI · 2026-05-28T21:03:24Z · 30 pts/25 cmt

dev discussion

S04 HN · Show HN: Rig – Local-first code graph for coding agents, in one npx command
akashi_dev · 2026-05-28T20:44:37Z · 2 pts/0 cmt

dev discussion

S05 HN · Bill Gates AI on AI (one month later)
vbutsomesayw · 2026-05-27T04:01:44Z · 3 pts/0 cmt

dev discussion

S06 HN · Ask HN: We dont need a programming language now?
zameermfm · 2026-04-16T02:33:36Z · 2 pts/4 cmt

dev discussion

S07 HN · Show HN: I built a self-writing book on agentic coding
wolfsir · 2026-04-06T10:52:09Z · 2 pts/1 cmt

dev discussion

S08 HN · Functional programming accelerates agentic feature development
cyrusradfar · 2026-04-01T18:32:05Z · 59 pts/31 cmt

dev discussion

S09 HN · Agentic Harness Engineering
cobblr_mosaic · 2026-05-26T17:38:55Z · 3 pts/0 cmt

dev discussion

S10 HN · Show HN: GoPOSIX – a Go-native POSIX userland, ~97% BusyBox-compatible
ramayac · 2026-05-20T04:31:50Z · 2 pts/0 cmt

dev discussion

S11 HN · Learn Harness Engineering
redbell · 2026-05-18T12:17:04Z · 159 pts/17 cmt

dev discussion

S12 HN · Agent Harness Engineering
Garbage · 2026-05-16T04:59:11Z · 3 pts/0 cmt

dev discussion

S13 HN · We Benchmarked Claude Code, Codex, Semgrep, CodeQL, Trent on 28 CWE-Bench CVEs
geopsist · 2026-05-28T12:39:46Z · 5 pts/1 cmt

dev discussion

S14 HN · Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code
fittingopposite · 2026-05-28T05:05:59Z · 2 pts/0 cmt

dev discussion

S15 HN · Show HN: 97% on SWE-bench Verified with subscription-token agents
kimjune01 · 2026-05-24T18:03:28Z · 2 pts/0 cmt

dev discussion

S16 HN · Bito's AI Architect Boosts Claude Opus's task success rate by 35%
Sushrutkm · 2026-05-19T10:02:03Z · 2 pts/0 cmt

dev discussion

S17 HN · The Terminal Bench 3.0 community is looking for task contributors
neversettles · 2026-05-03T03:40:04Z · 1 pts/2 cmt

dev discussion

S18 HN · ForgeCode: Top open source coding agent in Terminal-Bench 2.0
gk1 · 2026-04-29T18:16:23Z · 4 pts/0 cmt

dev discussion

S19 HN · Open-weight 27B hits 38% on Terminal-Bench 2.0 (Opus 4.1 hit 38% in Aug 2025)
ubermon · 2026-04-28T19:11:57Z · 6 pts/9 cmt

dev discussion

S20 HN · Show HN: OSS Agent I built topped the TerminalBench on Gemini-3-flash-preview
GodelNumbering · 2026-04-27T12:35:55Z · 393 pts/148 cmt

dev discussion

S21 HN · Claude Code – Everything You Can Configure That the Docs Don't Tell You
ankitg12 · 2026-05-29T02:13:20Z · 2 pts/0 cmt

dev discussion

S22 HN · Claude Opus 4.8: "a modest but tangible improvement"
GavinAnderegg · 2026-05-29T00:51:35Z · 1 pts/1 cmt

dev discussion

S23 HN · Ask HN: About Claude Code's New Feature: Dynamic Workflows
sofumel · 2026-05-29T00:51:18Z · 1 pts/0 cmt

dev discussion

S24 HN · Show HN: Daily vibe-coding video games, day 45: "Elementary"
pzxc · 2026-05-29T00:24:53Z · 1 pts/0 cmt

dev discussion

S25 HN · Show HN: Free open source coding models in Slack
ramonga · 2026-05-28T16:11:13Z · 3 pts/0 cmt

dev discussion

S26 HN · First thing you see when Googling "OpenAI Codex app" is a fake malware website
vashchylau · 2026-05-28T13:49:02Z · 3 pts/0 cmt

dev discussion

S27 HN · Show HN: AG2B – Run the agent loop in the browser, expose your tools via WebMCP
notmedia · 2026-05-28T12:44:45Z · 2 pts/0 cmt

dev discussion

S28 HN · Building self-improving tax agents with Codex
dnw · 2026-05-27T15:48:40Z · 2 pts/0 cmt

dev discussion

S29 HN · Show HN: AI Skill to port PostgreSQL extensions to MySQL
deesix · 2026-05-28T15:18:45Z · 3 pts/0 cmt

dev discussion

S30 HN · Show HN: Multiplayer, a debugging agent to run locally next to your coding agent
tomjohnson3 · 2026-05-28T14:16:13Z · 6 pts/1 cmt

dev discussion

S31 HN · Windows computer-use: synthetic cursors for background agents
frabonacci · 2026-05-27T18:48:20Z · 3 pts/0 cmt

dev discussion

S32 HN · Show HN: GridPath – Faster and Better Agent for Spreadsheets (Tauri, Rust)
pixelmash13 · 2026-05-27T15:14:11Z · 1 pts/0 cmt

dev discussion

S33 HN · Elixir and Phoenix Context for Claude Code, Codex, OpenCode and Pi
jamiecurle · 2026-05-28T17:45:29Z · 4 pts/0 cmt

dev discussion

S34 HN · Manage Claude Code, Codex, OpenCode, Gemini CLI sessions in one terminal view
hoangnnguyen · 2026-05-28T14:15:53Z · 2 pts/0 cmt

dev discussion

S35 HN · OpenPowerlifting – creating a public-domain archive of powerlifting history
andre9317 · 2026-05-28T05:35:46Z · 3 pts/0 cmt

dev discussion

S36 HN · Building OpenCode with Dax Raad [video]
kesor · 2026-05-27T16:55:05Z · 1 pts/0 cmt

dev discussion

S37 HN · Superpowers: An Agentic Skills Framework for AI Coding Workflows
v-mdev · 2026-05-28T09:20:45Z · 2 pts/0 cmt

dev discussion

S38 HN · Show HN: VAEN – Package and import portable AI coding-agent Harnesses
sjhalani7 · 2026-05-27T20:52:31Z · 8 pts/3 cmt

dev discussion

S39 HN · Show HN: I built a tool to auto-accept AI slop and bigtech devs loves it
alcray · 2026-05-26T20:45:37Z · 17 pts/2 cmt

dev discussion

S40 HN · Show HN: Chunk sidecars for validating agent-generated code before pushing to CI
olafmol · 2026-05-26T15:41:32Z · 1 pts/2 cmt

dev discussion

S41 GitHub · jazzyalex/agent-sessions
jazzyalex · 2026-05-29T03:49:21Z · 588 stars/36 forks/1 issues

repo momentum

S42 GitHub · jarrodwatts/claude-hud
jarrodwatts · 2026-05-29T03:49:07Z · 23981 stars/1079 forks/14 issues

repo momentum

S43 GitHub · shareAI-lab/learn-claude-code
shareAI-lab · 2026-05-29T03:49:23Z · 63362 stars/10358 forks/107 issues

repo momentum

S44 GitHub · colbymchenry/codegraph
colbymchenry · 2026-05-29T03:49:24Z · 32054 stars/1900 forks/202 issues

repo momentum

S45 GitHub · esengine/DeepSeek-Reasonix
esengine · 2026-05-29T03:47:58Z · 12993 stars/755 forks/315 issues

repo momentum

S46 GitHub · vercel-labs/zerolang
vercel-labs · 2026-05-29T03:48:20Z · 4667 stars/298 forks/123 issues

repo momentum

S47 GitHub · mackerelio/mackerel-agent
mackerelio · 2026-05-29T02:15:02Z · 428 stars/90 forks/4 issues

repo momentum

S48 GitHub · superradcompany/microsandbox
superradcompany · 2026-05-29T02:35:21Z · 6346 stars/308 forks/50 issues

repo momentum

S49 GitHub · mochilang/mochi
mochilang · 2026-05-28T22:38:11Z · 328 stars/14 forks/51 issues

repo momentum

S50 GitHub · barnum-circus/barnum
barnum-circus · 2026-05-28T21:45:06Z · 106 stars/4 forks/3 issues

repo momentum

S51 GitHub · deusyu/harness-engineering
deusyu · 2026-05-29T03:39:34Z · 3245 stars/295 forks/0 issues

repo momentum

S52 GitHub · bgdnvk/clanker
bgdnvk · 2026-05-29T02:32:00Z · 320 stars/17 forks/6 issues

repo momentum

S53 GitHub · ai-hpc/ai-hardware-engineer-roadmap
ai-hpc · 2026-05-28T19:41:24Z · 122 stars/24 forks/0 issues

repo momentum

S54 GitHub · ai-boost/awesome-harness-engineering
ai-boost · 2026-05-29T02:47:29Z · 1273 stars/131 forks/37 issues

repo momentum

S55 GitHub · ModelEngine-Group/nexent
ModelEngine-Group · 2026-05-29T00:59:58Z · 4743 stars/612 forks/234 issues

repo momentum

S56 GitHub · Human-Agent-Society/CORAL
Human-Agent-Society · 2026-05-29T03:43:15Z · 676 stars/90 forks/8 issues

repo momentum

S57 GitHub · sipyourdrink-ltd/bernstein
sipyourdrink-ltd · 2026-05-29T03:38:17Z · 503 stars/41 forks/18 issues

repo momentum

S58 GitHub · hwfengcs/DM-Code-Agent
hwfengcs · 2026-05-28T14:46:55Z · 138 stars/12 forks/2 issues

repo momentum

S59 GitHub · smallcloudai/refact
smallcloudai · 2026-05-28T23:46:15Z · 3551 stars/314 forks/0 issues

repo momentum

S60 GitHub · china-qijizhifeng/agentic-harness-engineering
china-qijizhifeng · 2026-05-29T03:00:37Z · 466 stars/54 forks/0 issues

repo momentum

Data Quality / Scan Health Appendix

Scanned 119 candidates. Counts by source: HN 40, GitHub 45, GitHub_ERR 1, arXiv_ERR 5, Reddit_ERR 10, Product 10, X_ATTEMPT 5, FacebookPublic_ATTEMPT 3. X and Facebook public metrics were partial or blocked by public access limits, so confidence was reduced by 8 percentage points. Major linked claims use direct URLs; unavailable engagement is marked N/A/low-confidence.
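
The per-source counts above can be checked against the headline total with a few lines. This sketch copies the counts from the appendix; grouping keys by their `_ERR` and `_ATTEMPT` suffixes into "failed" and "attempted" buckets is an assumption about what those suffixes mean, not something the scan output states.

```python
# Counts copied from the scan health appendix.
counts = {
    "HN": 40,
    "GitHub": 45,
    "GitHub_ERR": 1,
    "arXiv_ERR": 5,
    "Reddit_ERR": 10,
    "Product": 10,
    "X_ATTEMPT": 5,
    "FacebookPublic_ATTEMPT": 3,
}

total = sum(counts.values())

# Assumed meaning of suffixes: *_ERR = fetch failed, *_ATTEMPT = partial/blocked.
failed = sum(v for k, v in counts.items() if k.endswith("_ERR"))
attempted = sum(v for k, v in counts.items() if k.endswith("_ATTEMPT"))
captured = total - failed - attempted

print(f"total candidates: {total}")
print(f"captured: {captured}, failed: {failed}, attempted only: {attempted}")
```

The total reconciles with the 119 candidates reported in the header.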