Looking for claws

To get a sense of where energy and climate tech are heading, you have to come to grips with AI. Last weekend, I found some spare time to play with OpenClaw, a project that markets itself as an ‘AI assistant that can actually do things’. I tested it in a personal setting, barely knowing what I was doing, and put my thoughts on paper afterwards. I hope it’s worth a read, and as always I appreciate any feedback.

When I first came across the posts of OpenClaw - or Clawdbot/Moltbot as it was previously named - I struggled to tell whether it was just more hype, or actually said something about how AI could be used. Many of the stories I read sounded a bit too crazy to be true, such as agents paying humans to do work or groups of agents ‘autonomously’ deciding to start a religion.

I reckoned the only way to judge this new phenomenon was to get first-hand experience. With help from Claude I got my own OpenClaw agent running on a small cloud server. When you first talk to it, the agent asks what its purpose is and what it should be named. I told my agent it should help us plan our dishes and named him Sgt. Pepper. Without further ado I put Sgt. Pepper to the test: I gave it read-only access to where we store our recipes, as well as a (fresh) login for a Dutch supermarket website. I then asked it to order the groceries for one of our all-time favorites: spicy macaroni with bechamel from the book Falastin by Tara Wigley & Sami Tamimi (can recommend). I expected this to be a daunting task, given that the recipe has 19 ingredients and the supermarket website doesn’t offer an official API.

To my pleasant surprise, Pepper immediately asked follow-up questions: whether it should order items that are typically on our shelf, such as olive oil and pepper, and whether we prefer organic vegetables and meat. In our previous recipe AI workflow, created about a year ago, we had to instruct the LLM on this explicitly.

Pepper then started to reason about how it should communicate with the supermarket website. It found a code repository by Mark Ooms and asked me some questions to set up the authentication. After some back-and-forth it told me it was connected and started filling up my basket. It was fun to see items popping up in my basket. At the same time, the process was quite slow and the agent was constantly reasoning. It seemed like this thing was burning through tokens like there was no tomorrow. The Claude API dashboard confirmed my fears: we had only just started and Pepper had happily consumed over €3 of tokens - a number I saw rising by the minute.1
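A back-of-the-envelope calculation shows why agent sessions burn tokens so quickly: on every reasoning step the agent re-reads its entire (growing) conversation history, so input tokens pile up roughly quadratically with the number of turns. The prices and token counts below are made-up placeholders for illustration, not actual Claude API rates:

```python
# Hypothetical per-token prices (EUR per million tokens) - NOT real API rates.
INPUT_PRICE_PER_MTOK = 3.0
OUTPUT_PRICE_PER_MTOK = 15.0


def session_cost(turns):
    """Sum the cost of a list of (input_tokens, output_tokens) turns."""
    total = 0.0
    for input_tokens, output_tokens in turns:
        total += input_tokens / 1e6 * INPUT_PRICE_PER_MTOK
        total += output_tokens / 1e6 * OUTPUT_PRICE_PER_MTOK
    return total


# Assumed scenario: 30 reasoning turns, with the context the agent re-reads
# growing by ~8k tokens each turn and ~1k tokens of output per turn.
turns = [(8_000 * (i + 1), 1_000) for i in range(30)]
print(f"~EUR {session_cost(turns):.2f}")
```

Even with these modest made-up numbers, the growing context dominates the bill - which matches the feeling of watching the dashboard tick upwards by the minute.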

Finally, after Pepper had added 12 items to our basket, it came back to me to report communication issues. It turned out Pepper had put a wrong item in our basket and wanted to delete it, but it could not get the delete command working over the unofficial connection. With the token count still rising, and me running out of time to find the root cause, I decided to call it a day.

It was a fun little experiment with plenty of lessons. The fact that I do not have a computer science degree and barely knew what I was doing probably contributed to the joy. That said, it’s hard for me to judge the significance of systems like OpenClaw. It is definitely fun (and dangerous2) to use, but technically, it feels mostly like a combination of already existing solutions. Then again, I guess that’s true for many innovations that end up being impactful, and definitely not a reason to dismiss the concept from the start.

With LLMs continuing to improve, the capabilities of these agents will increase as well. At some point, it will probably be possible to run a good (local) agent on consumer hardware, which is obviously optimal for privacy. When that will be feasible is not clear to me, but at least for now, I got a glimpse of what that future could look like.


  1. As I would later find out, these relatively high costs can definitely be avoided by linking your agent to cheaper open-weights models such as Minimax M2.5 or Qwen3. ↩︎

  2. I understand the risk is twofold: first, there is the fundamental ‘lethal trifecta’ problem as described by Simon Willison; second, there are the regular cybersecurity risks associated with running a cloud server. I think the latter can be addressed when you know what you are doing, but the lethal trifecta sounds more fundamental to me. Curious to see how this will develop in 2026. ↩︎