OpenClaw


I’ve explained OpenClaw to my girlfriend as an ‘agent’. But what is an agent? She knew ChatGPT, but wtf is an agent?


Here’s how I explained it to her:


[1] A system basically has input, process, and output. A ‘model’ is the ‘process’ in that chain. We give input to a model, and the model outputs something.


[2] The simplest shape of a model is something like: f(x) = x * 8 - 10. Given x as input, if we input 7, what is the result? … The result is f(7) = 7 * 8 - 10 = 46.
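In code, that simplest shape of a model is just a function: input in, process runs, output out.

```python
# A "model" in its simplest shape: a fixed formula.
# Input goes in, the process runs, output comes out.

def f(x):
    return x * 8 - 10

print(f(7))  # → 46
```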


[3] Before LLMs, there were ML (machine learning) models. They don’t contain simple math; they contain complex math. ML basically splits into 2 categories: supervised and unsupervised. A supervised model learns from a bunch of ‘labelled data’: simply put, it ‘remembers’ things like the average/mean/median per label, so it can ‘predict’ things. An unsupervised model learns from a bunch of ‘unlabelled data’, which means it finds patterns by similarity on its own, without human intervention.
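That ‘remember the average per label’ idea can be sketched in a few lines. This is a deliberately tiny toy, not a real ML algorithm, but it shows the supervised shape: labelled data in, a lookup-by-label predictor out.

```python
# Toy "supervised" model: remember the average value per label,
# then "predict" by looking the label up.
from collections import defaultdict

def train(labelled_data):
    sums = defaultdict(lambda: [0.0, 0])
    for label, value in labelled_data:
        sums[label][0] += value
        sums[label][1] += 1
    # the "model" is just the remembered average per label
    return {label: total / count for label, (total, count) in sums.items()}

data = [("cat", 4.0), ("cat", 5.0), ("dog", 20.0), ("dog", 30.0)]
model = train(data)
print(model["cat"])  # → 4.5
print(model["dog"])  # → 25.0
```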


Now you understand ‘model’. The question is: how can a ‘math formula’ ‘predict’ things?


Let me give you numbers: 1 2 4 8 16. What is the next number? … 32. How come? You predicted it. But how?


Humans can predict the numbers ‘intuitively’, but computers can only ‘calculate’, so we need to turn that intuitive thinking into math. Based on the numbers above, we first create a model with the formula f(n) = n. But this formula is wrong: if we input n=1, the result is 1 (true), and if we input n=2, the output is 2 (also true), but if we input n=3, it outputs 3 while the third number in our sequence is 4. THIS IS WRONG.


To fix this, we need to ‘reconfigure’ the formula multiple times to find the pattern. Try f(n) = n * 2: the result for n=2 is 4, but the second number is 2, so this formula is false too. A model ‘learning’ means adjusting the formula: it keeps trying formulas until it finds one that ‘fits’ and reproduces the exact pattern. It ends up at f(n) = 2^(n-1), and this formula is our earlier intuitive grasp of the pattern (1 2 4 8 16).
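The ‘keep trying formulas until one fits’ step above can be sketched literally (real training adjusts parameters by gradient, not by trying whole formulas, but the spirit is the same):

```python
# "Learning" as trying candidate formulas against the sequence 1 2 4 8 16.
sequence = [1, 2, 4, 8, 16]

candidates = {
    "f(n) = n":       lambda n: n,
    "f(n) = n * 2":   lambda n: n * 2,
    "f(n) = 2^(n-1)": lambda n: 2 ** (n - 1),
}

for name, f in candidates.items():
    # a formula "fits" if it reproduces every number in the sequence
    fits = all(f(n) == y for n, y in enumerate(sequence, start=1))
    print(name, "fits" if fits else "does not fit")
```

Only the last candidate fits every position, which is exactly the formula our intuition landed on.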


But the real world isn’t built on exact numbers like that, which is why ML scientists use probability. That’s another story. Let’s continue.


[4] Now, what’s ChatGPT? An LLM. But what’s an LLM? A Large Language Model. What does that mean? It means a model that has been fed a lot of books and writing. Remember the previous model that found the pattern in a sequence of numbers? An LLM is the same, but instead of giving it number sequences, we give it words. But how can ‘words’ be ‘calculated’? Isn’t calculation only possible with numbers? Yes.


That’s why we have ‘tokens’. Scientists built something called tokenization, which converts text into words, words into subwords, and those subwords into tokens/numbers. By converting piles of text into piles of numbers, the model can extract ‘patterns’ from them. Instead of a simple 1 2 4 8 16, it receives massively huge sequences of numbers. From those massive sequences, it learns the patterns of greetings, question-answers, and so on. It all lives in probability space: a massive multi-dimensional matrix.
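A toy sketch of that text-to-numbers step, with a tiny made-up vocabulary (real tokenizers use subword schemes like BPE and vocabularies of tens of thousands of entries, but the idea is the same: text in, sequence of numbers out):

```python
# Toy tokenizer: map each word to a number via a tiny made-up vocabulary.
vocab = {"hello": 0, "how": 1, "are": 2, "you": 3, "?": 4}

def tokenize(text):
    # split the "?" off as its own token, then look everything up
    words = text.lower().replace("?", " ?").split()
    return [vocab[word] for word in words]

print(tokenize("Hello how are you?"))  # → [0, 1, 2, 3, 4]
```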


Imagine a simple dimension as a graph with X and Y. For example, I give you: [(1, 2), (2, 4), (3, 8), (4, 16)]. After a quick look, you realize it shows the same pattern as before, right? But instead of a single dimension (1 2 4 8 16), here you get 2 dimensions: X and Y. A model can go far beyond that: it’s super multi-dimensional, the same kind of pattern, but at a different scale. Not only X, Y, Z, but hundreds or thousands of dimensions. That limit is what we call the ‘embedding’ size nowadays.
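Checking that the 2-dimensional version hides the same rule is a one-liner: each point (x, y) satisfies y = 2^x.

```python
# The same pattern, now in two dimensions: each point is (x, y),
# and the hidden rule is y = 2 ** x.
points = [(1, 2), (2, 4), (3, 8), (4, 16)]

print(all(y == 2 ** x for x, y in points))  # → True
```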


[5] A model alone is not enough. When it first launched, ChatGPT often hallucinated, right? Why? The simple answer: because it didn’t understand ‘fact’. An LLM is a huge word/text/question-answer pattern engine; if it doesn’t understand whether what it generates is true (fact) or false (not fact), it’ll just hallucinate. That’s why new techniques have been developed over time: memory, inference, chain of thought, tools, skills, etc.


[6] Previously we said that a model is a simple input-process-output engine; but with some newer techniques to reduce hallucination, the process now ‘loops’: it repeatedly executes input-process-output, again and again, until the job is done.


The first loop is inference as ‘internal’ reasoning steps. If the thinking level is low, reasoning is limited to, say, 2-3 steps. If it’s higher, it iterates more: 5-10 steps, or even 20-30.


The second loop is the external steps. This is where tools and skills live. The process decides whether to use a tool or not. If it decides to use one, it comes back to process the tool’s output, then decides again whether the information it has is enough. If not, it may decide to use tools again, and again, and again, until it completes its goal.
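That external loop can be sketched in a few lines. Everything here (the model, the tool, the message format) is a made-up stand-in, not any real API, but the control flow is the whole trick: call the model, run the tool it asked for, feed the result back, repeat until it answers.

```python
# Sketch of the "agent loop": keep calling the model; when it asks
# for a tool, run the tool and feed the result back in.

def fake_model(history):
    # Stand-in for an LLM call. If it hasn't seen a tool result yet,
    # it asks for one; otherwise it answers.
    if not any(msg.startswith("tool:") for msg in history):
        return {"action": "use_tool", "tool": "search", "query": "weather"}
    return {"action": "answer", "text": "It is sunny."}

def fake_tool(name, query):
    # Stand-in for a real tool (web search, shell command, etc.)
    return f"tool:{name} says '{query}' → sunny"

def agent(goal, max_steps=10):
    history = [goal]
    for _ in range(max_steps):            # the loop that makes it an "agent"
        step = fake_model(history)
        if step["action"] == "answer":    # goal reached: exit the loop
            return step["text"]
        history.append(fake_tool(step["tool"], step["query"]))
    return "gave up"

print(agent("What's the weather?"))  # → It is sunny.
```

The `max_steps` cap matters: without it, an agent that never decides it’s done would loop forever.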


[7] This loop is what we call an ‘agent’ nowadays. That’s what OpenClaw is: a loop engine with built-in opinionated tools and an easy system for customization too. It includes integrations with multiple communication channels such as Discord, Telegram, WhatsApp, Slack, and more.


At the time I explained this to my gf, the future challenge was agent environment harnessing: the known way to let agents gather more facts about the user’s work and surroundings. How should we build an environment that agents can fully utilize without lowering the security of the user’s personal system? This is why new techniques like MCP, APIs, and CLIs came up. But this demands more optimization in containerization, so providers can manage multiple agent instances per user with minimal resources.


At the time, I tried OpenClaw alternatives such as PicoClaw and ZeroClaw, built a toolbox for them, and I’m still experimenting.


The future is one person with multiple agents, each agent with its own environment. The challenge is how those agents communicate across their environments.


In my case, as a software engineer, I code in PHP, Flutter, and Rust. But do I personally need multiple environments/computers? No, actually. But YES for agents, because exposing my personal setup to AI agents is like exposing my underwear. The agents should have their own underwear (a.k.a. environments). Their environments should be minimal, so they’re secure. They must be fast to provision and snapshot. They must be stored efficiently, not wasting my personal space, e.g. using cloud/homelab storage. They must be able to communicate blazingly fast, e.g. via a shared Redis/in-memory engine. And they must be able to share memory, e.g. via a vector database (pgvector, Qdrant) or shared markdown/Obsidian notes on NFS, accessible from all environments.


That’s how you explain OpenClaw to a gf. But first things first: find a gf. Hehe.