The Evolution of My AI Setup: Why I Left OpenClaw for a Custom Vector-Backed Brain
A while ago, I wrote about my OpenClaw setup, where I deployed two separate AI agents. It worked really well. Having an AI triage my emails, manage my shopping lists, and pull context from an Obsidian vault felt a bit absurd in the best way.
But in AI, a few months is a lifetime.
As I pushed my agents into more complex tasks, the cracks started to show. OpenClaw was a great starting point, but it was rigid. Relying on simple Markdown files and local search (QMD) for memory meant the agents had no deep, semantic understanding of past conversations. Testing was awkward, and when an agent went off the rails or racked up API costs, I had very little visibility into why.
I realized I didn’t just need a bot framework. I needed a modular, observable, deeply integrated agent architecture. So I built my own.
Enter XelaBot.
Here’s what changed, what I threw out, and why the agents are better now.
1. From Markdown to a Semantic Vector Brain
In the old setup, my “Single Source of Truth” was an Obsidian Vault. The agents used Quick Markdown (QMD) to search through it. It worked, but keyword search is dumb. If I told the bot “I don’t like spicy food,” a keyword search for “pizza” would never connect that to avoiding jalapeños.
I overhauled the whole memory system. XelaBot now uses a semantic long-term memory architecture.
- Zero External Dependencies: Instead of spinning up heavy vector databases like Pinecone or Weaviate, everything runs inside a single SQLite file using
sqlite-vec(SIMD-accelerated vector search built directly into SQLite). - Voyage AI Embeddings: I use
voyage-3.5for embeddings, and the retrieval quality is excellent. - Smart Memory Lifecycle: The brain doesn’t just store facts. It manages them. It uses Reciprocal Rank Fusion (RRF) to combine text and vector searches. It has Contradiction Detection (if I change my mind, it updates the memory rather than keeping conflicting facts) and Memory Decay, which means the agent slowly loses confidence in old, stale memories over time unless they are reinforced.
The Obsidian vault is still connected via a tool, but the agent’s actual “working brain” is now a highly efficient semantic SQLite database.
2. True Multi-Agent Orchestration
Previously, B-Sisstant and Projektbot were completely isolated instances. Now they share a unified, protocol-based Python 3.12 codebase, managed by uv.
The biggest upgrade is how they collaborate. A single instance can now act as an orchestrator. My “Projektbot” can play CEO. If a complex question lands in the Telegram chat, the CEO can spawn an ephemeral “Researcher” or “Investor” sub-agent in the background, let them analyze the data, synthesize their opinions, and then reply to the group.
3. Model Agnosticism & Multimodality
I used to rely heavily on Anthropic’s Claude. I still like Claude, especially Sonnet for coding and Haiku for cheap background triage, but being locked into one provider is risky.
I integrated LiteLLM as an abstraction layer. Now switching from Claude to GPT-4o, or even to a local Ollama model, means changing exactly one line in a YAML config. I also added OpenAI’s gpt-4o-transcribe via API, so the bots can natively understand voice messages, plus vision capabilities to analyze images and PDFs dropped into the chat.
4. Observability: The Custom Admin Dashboard
When an autonomous agent has access to your calendar, emails, and credit card-backed APIs, “hoping it does the right thing” is not a strategy. I needed to see what was happening under the hood.
I built a custom Admin Panel (FastAPI backend + Svelte 5 frontend), locked behind my Tailscale VPN. From that dashboard, I can:
- Edit “Souls” live: Tweak the agent’s personality prompt on the fly.
- Browse and delete memories: See exactly what the agent has chosen to remember about me.
- Inspect Traces: A full pipeline inspector that shows me exactly how many tokens were used in a turn, which tools were called, how long they took, and what the raw API payload looked like.
5. Strict Guardrails and Safety
To prevent run-away loops and run-away server bills, XelaBot has strict guardrails.
- Hard Limits: I can set maximum daily costs per instance (e.g., $2.00/day). If the bot hits the limit, it goes to sleep.
- M1 Safety Runtime: I implemented an approval queue for risky tools. If the agent wants to execute a sensitive action, it pauses its execution state and sends an approval request. I can review the exact parameters in the admin dashboard before clicking “Resume.”
The Verdict
Moving away from a pre-built framework to a custom, protocol-driven architecture was a lot of work. I wrote over a thousand tests just to make sure the runner loop, memory flow, and tool execution were rock solid.
The result is a system that finally feels like a real “exocortex.” It’s fast, deeply personalized, highly observable, and resilient. My Telegram chats are no longer just interfaces to an LLM. They’re windows into a persistent, self-improving digital brain.