Architecture
Overview
agent-wechat runs the official WeChat desktop client inside a Docker container and controls it via UI automation. A Rust server (Axum) sits alongside WeChat in the container and exposes a REST + WebSocket API.
┌─────────────────────────────────────────────────────┐
│  Docker Container                                   │
│                                                     │
│  WeChat Linux ←── Xvfb + AT-SPI (accessibility)     │
│       ↕                                             │
│  agent-server (Rust/Axum, port 6174)                │
│    - FSM engine for UI automation                   │
│    - REST + WebSocket API                           │
│    - VNC web viewer at /vnc/                        │
└──────────────────────┬──────────────────────────────┘
                       │ HTTP / WebSocket
                       ↓
         CLI, AI agent, or custom code
Key components
Container environment
- Xvfb — headless X11 display server (no physical monitor needed)
- Fluxbox — lightweight window manager
- AT-SPI2 — accessibility framework used to read the WeChat UI tree
- xdotool / xclip — input simulation (clicks, keystrokes, clipboard)
- noVNC + websockify — browser-based VNC viewer at /vnc/
- ffmpeg — media format conversion (WXGF to JPEG/GIF, SILK to MP3)
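To make the input-simulation layer concrete, here is a minimal Python sketch of the kind of xdotool and xclip command lines such a setup could issue. The helper names are hypothetical, and entering text by clipboard paste rather than simulated typing is an assumption, not a description of what agent-server actually does.

```python
import subprocess
from typing import List, Optional, Tuple

# Hypothetical helpers: they only build argv lists, so they can be
# inspected without an X11 display.

def click_argv(x: int, y: int, button: int = 1) -> List[str]:
    """xdotool command that moves the pointer to (x, y) and clicks."""
    return ["xdotool", "mousemove", str(x), str(y), "click", str(button)]

def clipboard_copy_argv() -> List[str]:
    """xclip command that reads stdin into the X11 clipboard."""
    return ["xclip", "-selection", "clipboard"]

def paste_argv() -> List[str]:
    """xdotool command that sends Ctrl+V to the focused window."""
    return ["xdotool", "key", "ctrl+v"]

def type_text_steps(text: str) -> List[Tuple[List[str], Optional[str]]]:
    """(argv, stdin) pairs that put text on the clipboard, then paste it."""
    return [(clipboard_copy_argv(), text), (paste_argv(), None)]

def run_steps(steps: List[Tuple[List[str], Optional[str]]]) -> None:
    """Execute the steps; needs DISPLAY to point at the Xvfb display."""
    for argv, stdin_text in steps:
        subprocess.run(argv, input=stdin_text, text=True, check=True)
```

Calling `run_steps(type_text_steps("hello"))` would only work inside an environment where xdotool, xclip, and a running X display are available, such as the container described above.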
Rust server (agent-server)
The server is the core of agent-wechat. It:
- Reads the accessibility tree of the WeChat window to understand the current UI state
- Executes FSM plans to perform actions (login, open chat, send message, etc.)
- Reads WeChat’s SQLite databases directly for fast message and contact retrieval
- Exposes REST endpoints for all operations
- Serves a WebSocket for login flow events and real-time updates
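As an illustration of how a client might talk to the REST API, the following Python sketch builds HTTP requests against port 6174. The host, the JSON request body, and any path passed in are assumptions; this page does not list the actual endpoints, so none are named here.

```python
import json
import urllib.request
from typing import Any, Optional

# Port 6174 comes from the diagram above; "localhost" is an assumption
# about how the container port is published.
BASE_URL = "http://localhost:6174"

def build_request(path: str, payload: Optional[dict] = None) -> urllib.request.Request:
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

def call(path: str, payload: Optional[dict] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode a JSON response (assumed response format)."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Splitting request construction from sending keeps the sketch checkable without a running server. The real paths, payload fields, and any authentication would have to come from the project's API reference.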
FSM engine
All UI automation is driven by a deterministic finite state machine — no LLM is involved. Each action (login, send message, open chat) is defined as a plan with states, transitions, and selectors:
- CSS-like selectors match elements in the accessibility tree
- States define what to look for and what action to take
- Transitions move between states based on what’s visible on screen
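Because each step is a fixed selector match rather than a model call, actions run quickly, incur no inference cost, and behave the same way on every run.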
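The idea of plans, states, transitions, and selectors can be sketched in a few lines of Python. Everything here is illustrative: the `role[name=value]` selector syntax, the role names, the plan layout, and the fake UI are assumptions, not the engine's real format (the actual engine is written in Rust).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Node = dict  # {"role": str, "name": str, "children": [Node, ...]}

def matches(node: Node, selector: str) -> bool:
    """Match a 'role' or 'role[key=value]' selector against one node."""
    role, _, rest = selector.partition("[")
    if node.get("role") != role:
        return False
    if not rest:
        return True
    key, _, value = rest.rstrip("]").partition("=")
    return str(node.get(key)) == value

def find(node: Node, selector: str) -> Optional[Node]:
    """Depth-first search for the first node matching the selector."""
    if matches(node, selector):
        return node
    for child in node.get("children", []):
        hit = find(child, selector)
        if hit is not None:
            return hit
    return None

@dataclass
class Transition:
    selector: str    # what must be visible on screen
    action: str      # what to do with the matched element
    next_state: str  # where to go afterwards

def run_plan(plan: Dict[str, List[Transition]], start: str,
             observe: Callable[[], Node],
             act: Callable[[str, Node], None],
             max_steps: int = 20) -> str:
    """Drive the plan until a state with no transitions is reached."""
    state = start
    for _ in range(max_steps):
        if state not in plan:          # terminal state
            return state
        tree = observe()
        for t in plan[state]:
            node = find(tree, t.selector)
            if node is not None:
                act(t.action, node)
                state = t.next_state
                break
        else:
            return "stuck:" + state    # nothing on screen matched
    return "timeout:" + state

class FakeUI:
    """Stand-in for the WeChat window: two screens and an action log."""
    def __init__(self) -> None:
        self.screen = "chat_list"
        self.log: List[str] = []
    def observe(self) -> Node:
        items = {"chat_list": {"role": "list-item", "name": "Alice"},
                 "chat_open": {"role": "text", "name": "input"}}
        return {"role": "frame", "name": "WeChat", "children": [items[self.screen]]}
    def act(self, action: str, node: Node) -> None:
        self.log.append(action + ":" + node["name"])
        if action == "click":
            self.screen = "chat_open"

plan = {
    "find_chat": [Transition("list-item[name=Alice]", "click", "type_message")],
    "type_message": [Transition("text[name=input]", "paste_and_enter", "done")],
}
ui = FakeUI()
result = run_plan(plan, "find_chat", ui.observe, ui.act)
```

In this toy run the plan clicks the chat entry, then pastes into the input box, and ends in the `done` state. A production engine would presumably also wait and retry when nothing matches instead of giving up immediately.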
Database access
Instead of scraping the UI for messages, agent-wechat reads WeChat’s internal SQLite databases directly using SQLCipher (the databases are encrypted). This provides:
- Fast message retrieval without scrolling the UI
- Access to the full contact list
- Media file lookup (images, voice, video)
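The benefit of querying a database instead of scrolling the UI can be shown with a small Python sketch. It uses an in-memory, unencrypted SQLite database with a made-up `messages` table. WeChat's real schema is not described here, and the real files are SQLCipher-encrypted, so they would need a SQLCipher-enabled driver and the decryption key rather than Python's standard `sqlite3` module.

```python
import sqlite3
from typing import List, Tuple

def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical messages table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        " id INTEGER PRIMARY KEY,"
        " chat_id TEXT NOT NULL,"
        " sender TEXT NOT NULL,"
        " body TEXT NOT NULL,"
        " created_at INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO messages (chat_id, sender, body, created_at) VALUES (?, ?, ?, ?)",
        [
            ("alice", "alice", "hi", 100),
            ("alice", "me", "hello", 101),
            ("bob", "bob", "ping", 102),
            ("alice", "alice", "lunch?", 103),
        ],
    )
    return conn

def recent_messages(conn: sqlite3.Connection, chat_id: str,
                    limit: int = 50) -> List[Tuple[str, str]]:
    """Return the latest `limit` (sender, body) pairs, oldest first."""
    rows = conn.execute(
        "SELECT sender, body FROM messages"
        " WHERE chat_id = ? ORDER BY created_at DESC LIMIT ?",
        (chat_id, limit),
    ).fetchall()
    return rows[::-1]  # newest-first from SQL, flipped to chronological order
```

A single indexed query like this returns any slice of a conversation at once, whereas UI scraping would have to scroll the chat window and re-read the accessibility tree page by page.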
Data flow
Client (CLI / OpenClaw / Wechaty / custom)
    │
    │ REST API (port 6174)
    ↓
agent-server (Rust)
    │
    ├── Read:  SQLite DBs → messages, contacts, chat list
    ├── Write: FSM plans → xdotool/xclip → WeChat UI
    └── Media: WeChat file system → decode → serve via API