GolemBot Architecture Design
Why "GolemBot"? A Golem is a body shaped from clay in legend — write a true name (shem) on it and it comes to life. Our project works the same way: inject a Coding Agent as the soul into a directory, and it becomes your AI assistant.
This document is the core architecture reference for the GolemBot project. All implementations should stay consistent with this document. When implementation conflicts with the architecture, or changes and optimizations are needed, this document must be updated first, before modifying code.
1. Product Positioning
GolemBot is a local-first personal AI assistant — it uses the Coding Agent you already have as its brain, enabling it to not just chat, but actually get things done.
The use case aligns with OpenClaw (personal assistant, multi-channel access, skill extensions), with the core difference being the brain:
- OpenClaw: directly calls LLM APIs + custom tool system → the brain is a stateless API
- GolemBot: uses Coding Agent CLI as the engine → the brain is a stateful Coding Agent that lives inside a directory
This leads to a key experience difference: OpenClaw's assistant is global, while GolemBot's assistant is directory-bound. Because a Coding Agent naturally works within a directory — skills, scripts, memory, and work artifacts all live inside it.
Unique advantages:
- More capable: Coding Agents can natively read/write files, run code, operate browsers, and perform multi-step reasoning
- Choice of engine: Cursor / Claude Code / Codex — use whichever you prefer
- Fully transparent: just
lsthe assistant directory to see what it knows, what it can do, and what it has done - Version-controllable: the entire assistant can be managed, shared, and cloned via git
2. Core Philosophy
Coding Agent = Soul, GolemBot = Body of Clay.
GolemBot does not implement LLM reasoning, tool invocation, or context management. It only does three things:
- Prepare the workspace (inject skills)
- Invoke the Coding Agent CLI (pass in the user's message)
- Return the response to the user
All the complex work (decision-making, planning, execution) is delegated to the Coding Agent.
Library-First, CLI Is Just a Thin Shell
GolemBot's core is an importable TypeScript library, not a CLI tool. The CLI is just one consumer of this library.
golembot core library (index.ts)
createAssistant() → Assistant
/ | \ \
CLI thin library HTTP Bot
shell import wrapper wrapper
(cli.ts) (3rd-party) (3rd-party) (3rd-party)All invocation methods share the same core logic with zero duplication. This makes GolemBot easy to embed in any scenario, serving as the core engine for Agent solutions across various domains.
3. Two Core Concepts
The entire framework has only two concepts — no more:
1. Assistant Directory
A directory is an assistant (a GolemBot). Directory structure:
~/my-assistant/
├── golem.yaml # Assistant config (only engine type and name)
├── skills/ # Skills directory — whatever is here gets loaded
│ ├── general/
│ │ └── SKILL.md # General assistant skill
│ └── ops-xhs/
│ ├── SKILL.md # Xiaohongshu operations skill
│ ├── xhs.py # Script bundled with the skill
│ └── brand-voice.md # Supporting knowledge document
├── AGENTS.md # Auto-generated context document by Golem
├── .golem/ # Internal state (sessions, etc., gitignored)
└── ... # Any files produced during Agent work (notes, data, reports, etc.)The config file golem.yaml is minimal — only configures the engine, not skills:
name: my-assistant
engine: cursor # cursor | claude-code | opencode
model: claude-sonnet # Optional, preferred model
# Optional: IM channel configuration (gateway connects whichever channels are configured)
channels:
feishu:
appId: ${FEISHU_APP_ID}
appSecret: ${FEISHU_APP_SECRET}
dingtalk:
clientId: ${DINGTALK_CLIENT_ID}
clientSecret: ${DINGTALK_CLIENT_SECRET}
# Optional: gateway service configuration
gateway:
port: 3000
token: ${GOLEM_TOKEN}Sensitive fields support ${ENV_VAR} placeholder references to environment variables (resolved by resolveEnvPlaceholders() at load time). Both channels and gateway are optional — when not configured, behavior is identical to pure CLI mode.
Skills are not declared in the config. The skills/ directory is the single source of truth — whatever skills are inside, the assistant has those capabilities. Want to add a skill? Drop the folder in. Want to remove one? Delete the folder. ls skills/ shows the assistant's full capability set.
2. Skill
A Skill is the carrier of an assistant's capabilities. A Skill is a directory containing:
SKILL.md: knowledge and instructions (required)- Any supporting files: knowledge documents, scripts, config templates, etc. (optional)
The format is fully compatible with Cursor's .cursor/skills/, Claude Code's .claude/skills/, and OpenClaw's SKILL.md.
There is no separate Tool concept. Scripts are placed directly in the Skill directory, and SKILL.md describes how to invoke them. Coding Agents can natively execute any script — no framework-level "registration" is needed.
4. Usage Methods
All methods internally share the same core logic (createAssistant → assistant.chat()).
Method 1: CLI (Quickest Start)
npm install -g golembot
mkdir ~/my-assistant && cd ~/my-assistant
golembot init # Interactive: choose engine, pick a name, generate config, copy default skills
golembot run # REPL conversationWant to add a skill? Just drop the Skill folder into the skills/ directory — no commands needed.
Method 2: Library Import (Developer Embedding)
import { createAssistant } from 'golembot';
const assistant = createAssistant({ dir: './my-agent' });
for await (const event of assistant.chat('Analyze the competitor data')) {
if (event.type === 'text') process.stdout.write(event.content);
}Method 3: Embedding in Various Scenarios
Slack Bot:
import { createAssistant } from 'golembot';
const assistant = createAssistant({ dir: './slack-agent' });
slackApp.message(async ({ message, say }) => {
let reply = '';
for await (const event of assistant.chat(message.text)) {
if (event.type === 'text') reply += event.content;
}
await say(reply);
});HTTP API:
import { createAssistant } from 'golembot';
const agent = createAssistant({ dir: './api-agent' });
app.post('/api/chat', async (req, res) => {
for await (const event of agent.chat(req.body.message)) {
res.write(JSON.stringify(event) + '\n');
}
res.end();
});Electron desktop apps, Telegram bots, cron scheduled tasks... all follow the same pattern: create assistant → call chat() → handle the event stream.
Method 4: Gateway (IM + HTTP Unified Service)
golembot gateway # Reads channels config from golem.yaml, starts IM adapters + HTTP service
golembot gateway --port 3000 # Override port
golembot gateway --verbose # Verbose loggingGateway is a long-running service that internally reuses createAssistant() + server.ts, with an IM channel adapter layer on top. Whichever channels are configured in golem.yaml, Gateway automatically connects to the corresponding IM platforms at startup:
- Feishu (Lark): WebSocket long-connection mode (no public IP required)
- DingTalk: Stream mode (WebSocket, no public IP required)
- WeCom (WeChat Work): Webhook callback mode (requires a public URL)
IM channels and HTTP API work in parallel. One channel crashing does not affect the others.
Method 5: Onboard Setup Wizard
mkdir my-bot && cd my-bot
golembot onboardInteractive guided setup (7 steps): choose engine → pick a name → select IM channels → configure channel credentials → choose scenario template → generate config → start Gateway. Automatically generates golem.yaml, .env, .env.example, and .gitignore.
5. Architecture
Source File Structure
Core and CLI are separated:
Core library (importable):
index.ts— Public API:createAssistant(opts) → Assistant. Coordinates workspace, engine, and session. Includes a concurrency lock: for the same assistant instance and same sessionKey, only onechat()can execute at a time; different sessionKeys can run in parallel.engine.ts— Engine interface + Cursor / Claude Code / OpenCode three-engine implementation. Process management and skill injection are internal engine details.workspace.ts— Readsgolem.yaml(includingchannelsandgatewayfields), scans theskills/directory, and auto-generatesAGENTS.md.resolveEnvPlaceholders()resolves${ENV_VAR}placeholders.session.ts— Multi-user session storage:.golem/sessions.json, indexed bysessionKey.server.ts— HTTP service:createServer(assistant, opts) → http.Server.POST /chat(SSE),POST /reset,GET /health.channel.ts—ChannelAdapterinterface andChannelMessagetype definitions.buildSessionKey()generates channel-level session keys,stripMention()removes @mentions.channels/feishu.ts— Feishu (Lark) adapter (@larksuiteoapi/node-sdkWebSocket long-connection mode).channels/dingtalk.ts— DingTalk adapter (dingtalk-streamStream mode).channels/wecom.ts— WeCom adapter (@wecom/crypto+xml2jsWebhook callback mode).gateway.ts— Gateway long-running service: reads golem.yaml → creates Assistant → starts HTTP service → iterates channels config to start adapters → routes messages toassistant.chat()→ replies.
CLI thin shell:
cli.ts— Command entry points:init,run,serve,gateway,onboard,status,skill. Automatically loads.envfiles.onboard.ts— Guided configuration wizard (7-step interactive). Generates golem.yaml, .env, .env.example, .gitignore, and optionally installs scenario templates.
IM SDK dependency strategy: @larksuiteoapi/node-sdk, dingtalk-stream, @wecom/crypto, and xml2js are all optional peerDependencies. Gateway dynamically imports the corresponding SDK at startup, providing installation prompts if missing.
Core API
export function createAssistant(opts: {
dir: string; // Assistant directory path
engine?: string; // Override engine config from golem.yaml
model?: string; // Override model config
apiKey?: string; // Agent API key (Cursor: CURSOR_API_KEY, Claude Code: ANTHROPIC_API_KEY, OpenCode: depends on Provider)
}): Assistant;
export interface Assistant {
chat(message: string, opts?: { sessionKey?: string }): AsyncIterable<StreamEvent>;
init(opts: { engine: string; name: string }): Promise<void>;
resetSession(sessionKey?: string): Promise<void>;
}
export type StreamEvent =
| { type: 'text'; content: string }
| { type: 'tool_call'; name: string; args: string }
| { type: 'tool_result'; content: string }
| { type: 'error'; message: string }
| { type: 'done'; sessionId?: string; durationMs?: number; costUsd?: number; numTurns?: number };costUsd and numTurns are provided by the Claude Code engine; they are undefined for the Cursor engine.
The API is minimal: createAssistant + chat + init + resetSession — that's it.
Session Routing (sessionKey)
chat() accepts an optional sessionKey parameter for session isolation in multi-user scenarios:
// Single user (Phase 1 mode, fully backward compatible)
assistant.chat("Hello")
// Equivalent to assistant.chat("Hello", { sessionKey: "default" })
// Multi-user — each user gets an independent engine session
assistant.chat("Hello", { sessionKey: "feishu:user_123" })
assistant.chat("Look up the data", { sessionKey: "slack:U456" })Storage structure changes:
// Phase 1: .golem/sessions.json
{ "engineSessionId": "abc-123" }
// Phase 2: .golem/sessions.json
{
"default": { "engineSessionId": "abc-123" },
"feishu:user_123": { "engineSessionId": "def-456" },
"slack:U456": { "engineSessionId": "ghi-789" }
}Design principles:
- Default key is
"default"— single-user scenarios need not be aware of sessionKey - Different sessionKeys map to different engine sessions, but share the same skills and working directory
- Lock granularity changed from "one global lock" to per-sessionKey locking: same key queues, different keys run in parallel
resetSession(key?)clears the session for the specified key; omitting it clears"default"
HTTP Service (golembot serve)
Built-in lightweight HTTP service, allowing any IM webhook to connect:
golembot serve --port 3000 --token my-secretEndpoint design:
POST /chat
Headers: Authorization: Bearer <token>
Body: { "message": "Hello", "sessionKey": "feishu:user_123" }
Response: text/event-stream (SSE)
data: {"type":"text","content":"Hello"}
data: {"type":"tool_call","name":"readFile","args":"{}"}
data: {"type":"done","sessionId":"xxx"}
POST /reset
Headers: Authorization: Bearer <token>
Body: { "sessionKey": "feishu:user_123" }
Response: 200 { "ok": true }
GET /health
Response: 200 { "status": "ok", "name": "my-assistant" }Design principles:
- Uses Node.js built-in
node:http, zero extra dependencies - SSE protocol (
text/event-stream), consumable by all languages/platforms - Bearer token authentication (
--tokenparameter orGOLEM_TOKENenvironment variable) server.tsexports acreateServer()factory, usable by both CLI and third parties
Typical flow for connecting IM:
Feishu webhook → your 3-line forwarding script → POST localhost:3000/chat → SSE → push back to Feishu
Slack event → your 3-line forwarding script → POST localhost:3000/chat → SSE → push back to SlackYou can also import the library directly (no HTTP):
import { createAssistant } from 'golembot';
const bot = createAssistant({ dir: './my-bot' });
slackApp.message(async ({ message, say }) => {
let reply = '';
for await (const ev of bot.chat(message.text, { sessionKey: `slack:${message.user}` })) {
if (ev.type === 'text') reply += ev.content;
}
await say(reply);
});init Workflow
golembot init (or assistant.init())
1. Check if directory already has golem.yaml → error if yes
2. Interactive prompts: engine type (cursor/claude-code/opencode), assistant name
3. Create golem.yaml
4. Create skills/ directory
5. Copy built-in skills to skills/ (general + im-adapter)
6. Create .golem/ directory (internal state)
7. Generate AGENTS.md
8. Generate .gitignore (ignoring .golem/)Gateway Workflow
golembot gateway
1. Auto-load .env (at CLI entry layer)
2. Read golem.yaml (channels + gateway config)
3. resolveEnvPlaceholders() resolves ${ENV_VAR} placeholders
4. Create Assistant instance
5. Start HTTP service (preserving /chat, /reset, /health endpoints)
6. Iterate channels config, dynamically import + start corresponding adapters
7. Adapter receives message → buildSessionKey() → stripMention() → assistant.chat() → reply
8. Streaming response concatenated into complete text before sending at once (IM platforms don't support streaming)
9. SIGINT/SIGTERM → gracefully shut down all adapters and HTTP serviceData Flow: One Conversation Turn
Whether invoked via CLI or library import, the internal flow is exactly the same:
Caller → assistant.chat(message)
→ index.ts: acquire concurrency lock
→ workspace.ts: ensureReady() — scan skills/ + generate AGENTS.md
→ engine.ts: invoke(message, skillList, sessionId?)
→ Inject Skills:
Cursor → .cursor/skills/ symlink
Claude → .claude/skills/ symlink + CLAUDE.md
OpenCode → .opencode/skills/ symlink + opencode.json
→ Start Agent process:
Cursor → child_process.spawn
Claude → child_process.spawn
OpenCode → child_process.spawn
→ Parse output line by line → yield StreamEvent
→ index.ts: save session + release lock
→ Caller receives StreamEvent streamAGENTS.md Auto-Generation
workspace.ts auto-generates AGENTS.md in the assistant directory (note: this is in the assistant directory, not this project's) during each ensureReady() call, based on scanning the skills/ directory:
# Assistant Context
## Installed Skills
- general: General personal assistant capabilities
- ops-xhs: Xiaohongshu operations assistant (includes xhs.py script)
## Directory Structure
- skills/ — Skills directory (each subdirectory is a skill, containing SKILL.md and optional scripts)
- AGENTS.md — This file, auto-generated by GolemBot
## Conventions
- Information that needs to be remembered persistently should be written to notes.md
- Generated reports/files go into the corresponding directoryThis lets the Coding Agent understand its environment and capabilities from the moment it starts.
6. Multi-Turn Interaction
- Engine-native first: Multi-turn context relies on the Coding Agent CLI's native session mechanism (both Cursor and Claude Code support
--resume) - Engines without resume support are treated as single-turn: Files in the workspace serve as the only cross-turn memory
- No TTL: GolemBot does not proactively expire sessions; the engine manages session lifecycle itself
- Auto-fallback on resume failure: If the engine's
--resumefails (engine-side expiration/corruption), a new session is automatically started, transparent to the user - Manual reset: Users can explicitly clear sessions via
/reset(CLI) orassistant.resetSession()(API) - Session storage:
.golem/sessions.json, stores only{ engineSessionId }
7. Engine Interface
interface AgentEngine {
invoke(prompt: string, opts: {
workspace: string; // Assistant directory path
skillPaths: string[]; // Absolute paths of skill directories under skills/
sessionId?: string;
model?: string;
}): AsyncIterable<StreamEvent>;
}
type StreamEvent =
| { type: 'text'; content: string }
| { type: 'tool_call'; name: string; args: string }
| { type: 'tool_result'; content: string }
| { type: 'error'; message: string }
| { type: 'done'; sessionId?: string; durationMs?: number; costUsd?: number; numTurns?: number };Three engines are implemented, created via the createEngine(type) factory function:
CursorEngine — Invokes agent CLI via child_process.spawn:
class CursorEngine implements AgentEngine {
async *invoke(prompt, opts) {
// 1. Inject Skills: symlink skillPaths to .cursor/skills/
// 2. spawn: agent -p <prompt> --output-format stream-json --stream-partial-output ...
// 3. stripAnsi + parse stream-json line by line → yield StreamEvent
// 4. segmentAccum deduplication (Cursor's summary events)
}
}ClaudeCodeEngine — Invokes claude CLI via child_process.spawn:
class ClaudeCodeEngine implements AgentEngine {
async *invoke(prompt, opts) {
// 1. Inject Skills: symlink skillPaths to .claude/skills/ + generate CLAUDE.md
// 2. spawn: claude -p <prompt> --output-format stream-json --verbose --dangerously-skip-permissions ...
// 3. Parse stream-json line by line → parseClaudeStreamLine() → yield StreamEvent[]
// 4. No ANSI stripping or deduplication needed
}
}OpenCodeEngine — Invokes opencode CLI via child_process.spawn:
class OpenCodeEngine implements AgentEngine {
async *invoke(prompt, opts) {
// 1. Inject Skills: symlink skillPaths to .opencode/skills/
// 2. Generate/update opencode.json (permission config + model config)
// 3. spawn: opencode run "prompt" --format json [--model provider/model] [--session ses_xxx]
// 4. Parse NDJSON line by line → parseOpenCodeStreamLine() → yield StreamEvent[]
// 5. Multi-Provider API Key inferred via resolveOpenCodeEnv()
}
}Key differences between the three engines:
| CursorEngine | ClaudeCodeEngine | OpenCodeEngine | |
|---|---|---|---|
| Spawn method | child_process.spawn | child_process.spawn | child_process.spawn |
| Output format | stream-json (with ANSI) | stream-json (pure JSON) | NDJSON (--format json) |
| Skill injection | .cursor/skills/ symlink | .claude/skills/ + CLAUDE.md | .opencode/skills/ + opencode.json |
| Session resume | --resume <uuid> | --resume <uuid> | --session <ses_xxx> |
| API Key | CURSOR_API_KEY | ANTHROPIC_API_KEY | Depends on Provider |
| Permission bypass | --force --trust --sandbox disabled | --dangerously-skip-permissions | opencode.json permission |
But the externally exposed StreamEvent is completely consistent.
8. Key Design Decisions
- Library-first, CLI is a thin shell: The core is an importable library (
createAssistant); the CLI is just one consumer. This makes GolemBot embeddable in any scenario. - Directory is the assistant: The assistant lives in a directory, and the directory is the single source of truth. Coding Agents naturally work within directories — this is a deliberate difference from OpenClaw (global config).
- Directory is the skill list: Whatever is in
skills/gets loaded — skills are not declared in a config file, eliminating config-vs-reality mismatches. - Only two concepts: assistant directory + Skill. No Tools, no Blueprints, no Registry.
- Skill is capability: Knowledge (Markdown) and tools (scripts) both live in the Skill directory, not separated. Coding Agents can natively execute scripts — no framework "registration" needed.
- Skill injection is the engine's responsibility: Each engine decides how to inject Skills (Cursor →
.cursor/skills/symlink, Claude Code →.claude/skills/symlink +CLAUDE.md, OpenCode →.opencode/skills/symlink +opencode.json) — the core layer doesn't care. - Don't do anything the Agent should do: No context management, no tool scheduling, no decision-making, no session TTL. Everything is delegated to the Coding Agent.
- Concurrency safety: The same assistant instance allows only one
chat()execution at any given time, preventing multiple requests from simultaneously operating on the workspace. - TypeScript: Mature channel ecosystem (Telegram/Slack/Discord), and subprocess invocation is language-agnostic.
9. Evolution Roadmap
Phase 1 — CLI Assistant (Current)
golembot init+golembot runtwo commands- Cursor Engine (child_process.spawn + Skill injection)
- Skills directory scanning
- Session management (resume + auto-fallback)
- Concurrency lock
- 4 core source files + 1 CLI thin shell
Phase 2 — Multi-User + HTTP Service (Current)
- Session routing:
chat(msg, { sessionKey })supports multi-user isolation - Concurrency lock isolated by sessionKey: different users can run in parallel
server.ts: built-in HTTP service + SSE streaming response + Bearer token authenticationgolembot serveCLI command- Session storage upgraded to per-key indexed multi-user structure
Phase 3 — Multi-Engine ✅
Claude Code Engine✅ (ClaudeCodeEngine, stream-json parsing, native.claude/skills/injection)OpenCode Engine✅ (OpenCodeEngine, NDJSON parsing,.opencode/skills/injection, multi-Provider API Key,opencode.jsonpermission config)- Codex Engine (to be implemented)
Phase 4 — Gateway + IM Channels ✅
✅ChannelAdapterinterface +ChannelMessagetypeFeishu adapter (WebSocket long-connection,✅@larksuiteoapi/node-sdk)DingTalk adapter (Stream,✅dingtalk-stream)WeCom adapter (Webhook,✅@wecom/crypto)Gateway long-running service (✅golembot gateway)✅golem.yamlextended withchannels+gatewayfields✅${ENV_VAR}placeholder resolutionCLI✅.envauto-loading
Phase 5 — Out-of-the-Box ✅
Onboard setup wizard (✅golembot onboard, 7-step interactive)Built-in Skills:✅general(enhanced, with persistent memory conventions) +im-adapter(IM reply conventions)Template system (6 scenario templates: customer-support, data-analyst, code-reviewer, ops-assistant, meeting-notes, research)✅Docker deployment (Dockerfile + docker-compose.yml)✅README.md + LICENSE + CONTRIBUTING.md✅
Phase 6 — Ecosystem Expansion (To Be Implemented)
- Skill repository (
golembot skill search/install, community skill discovery and installation) - Codex Engine
- Permissions integration (
golem.yamlproject-level permission config) - WebSocket support (bidirectional communication)
- Multi-assistant routing (single Gateway serving multiple assistants)