Show HN: Rowboat – Open-source IDE for multi-agent systems

github.com

144 points by segmenta a day ago

Hi HN! We’re Arjun, Ramnique, and Akhilesh, and we are building Rowboat (https://www.rowboatlabs.com/), an AI-assisted IDE for building and managing multi-agent systems. You start with a single agent, then scale up to teams of agents that work together, use MCP tools, and improve over time - all through a chat-based copilot.

Our repo is https://github.com/rowboatlabs/rowboat, docs are at https://docs.rowboatlabs.com/, and there’s a demo video here: https://youtu.be/YRTCw9UHRbU

It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything. This isn’t too surprising - it’s a bit like how good code consists of multiple functions that each do one thing, rather than cramming everything into one function.

For example, a travel assistant works best when different agents handle specialized tasks: one agent finds the best flights, another optimizes hotel selections, and a third organizes the itinerary. This modular approach makes the system easier to manage, debug, and improve over time.

OpenAI’s Agents SDK provides a neat Python library to support this, but building reliable agentic systems requires constant iteration and tweaking - e.g., updating agent instructions (which can quickly get as complex as actual code), connecting tools, and testing the system while incorporating feedback. Rowboat is an AI IDE for doing all of this. Rowboat is to AI agents what Cursor is to code.
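
For concreteness, here's roughly what that handoff pattern looks like in the Agents SDK - a minimal sketch based on its documented quickstart, with made-up agent names and instructions:

    from agents import Agent, Runner

    # Specialist agent with a narrow, well-scoped instruction.
    flight_agent = Agent(
        name="Flight finder",
        instructions="Find the best flights for the user's dates and budget.",
    )

    # Triage agent that hands off to specialists instead of doing everything itself.
    triage_agent = Agent(
        name="Triage",
        instructions="Route travel requests to the right specialist agent.",
        handoffs=[flight_agent],
    )

    result = Runner.run_sync(triage_agent, "Find me a flight to Tokyo in May")
    print(result.final_output)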

We’ve taken a code-like approach to agent instructions (prompts). There are special keywords to directly reference other agents, tools or prompts - which are highlighted in the UI. The copilot is the best way to create and edit these instructions - each change comes with a code-style diff.
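
To illustrate, a hypothetical instruction for the travel example above might look like the one below. The agent and tool names here are invented, and the exact mention syntax is best taken from the docs:

    You are a travel triage assistant.
    - For flight requests, use the @search_flights tool and present the top three options.
    - For hotel requests, hand off to @hotel_agent.
    - Once flights and hotels are settled, pass everything to @itinerary_agent.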

You can give agents access to tools by integrating any MCP server or connecting your own functions through a webhook. You can instruct the agents on when to use specific tools via ‘@mentions’ in the agent instruction. To enable quick testing, we added a way to mock tool responses using LLM calls.
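
As an illustration of the webhook route, the tool server can be very small. The sketch below assumes a JSON payload carrying the tool name and arguments - that shape is an assumption for illustration, not Rowboat's actual contract (see the docs for that):

    # Hypothetical webhook serving tool calls to agents.
    # The request/response shapes here are assumptions, not Rowboat's real contract.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    @app.post("/tool")
    def handle_tool_call():
        payload = request.get_json()
        name = payload.get("name")           # which tool the agent invoked
        args = payload.get("arguments", {})  # arguments the agent supplied
        if name == "search_flights":
            # A real implementation would query a flights API using args.
            result = {"flights": [{"airline": "ANA", "price_usd": 850}]}
        else:
            result = {"error": f"unknown tool: {name}"}
        return jsonify(result)

    if __name__ == "__main__":
        app.run(port=8000)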

The Rowboat playground lets you test and debug the assistants as you build them. You can see agent transfers, tool invocations, and tool responses in real time. The copilot has the context of the chat and can improve the agent instructions based on feedback. For example, you could say ‘The agent shouldn’t have done x here. Fix this’, and the copilot will go and make the fix.

You can integrate agentic systems built in Rowboat into your application via the HTTP API or the Python SDK (‘pip install rowboat’). For example, you can build user-facing chatbots, enterprise workflows and employee assistants using Rowboat.
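
A rough sketch of what that looks like with the Python SDK - the client and method names here are indicative only, so check the SDK docs for the real interface:

    # Indicative sketch - class/method names are assumptions,
    # not necessarily the real rowboat SDK surface.
    from rowboat import Client

    client = Client(
        host="http://localhost:3000",  # self-hosted Rowboat instance
        project_id="<project-id>",
        api_key="<api-key>",
    )

    # Send a user turn to the multi-agent assistant and print the reply.
    messages = [{"role": "user", "content": "Find me a flight to Tokyo in May"}]
    response = client.chat(messages=messages)
    print(response.messages[-1]["content"])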

We’ve been working with LLMs since GPT-1 launched in 2018. Most recently, we built Coinbase’s support chatbot after our last AI startup was acquired by them.

Rowboat is Apache 2.0 licensed, giving you full freedom to self-host, modify, or extend it however you like.

We’re excited to share Rowboat with everyone here. We’d love to hear your thoughts!

simonw 19 hours ago

"It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything."

I'll be honest: I don't buy that premise (yet). It's clearly a popular idea and I see a lot of excitement about it (see Google's A2A thing) but it feels to me like a pattern that, in many cases, will make LLM code even harder to get reliable results from.

I worry it's the AI equivalent of microservices: useful in a small set of hyper-complex systems, while the vast majority of applications that adopt it would have been better off without it.

If there are strong arguments counter to what I've said here I'd love to hear them!

  • danenania 17 hours ago

    A few concrete examples of multi-agent collaboration being useful in my project Plandex[1]:

    - While it uses Sonnet 3.7 by default for creating the edit snippet when writing code, calls related to applying the snippet and validating the result (and falling back to a whole file write if needed) use o3-mini (soon to be o4-mini), which is 1/3 the cost, much faster, and actually more accurate and reliable than Sonnet for this particular narrow task.

    - If Sonnet 3.7's context limit is exceeded in the planning stages, it can switch to a Gemini model for planning, then go back to Sonnet again for the implementation steps (since these only need the files relevant to each step).

    - It eagerly summarizes the conversation after each response so that the summary can be used later if the conversation gets too long. This is only practical because much smaller models than the main planning/coding models are sufficient for a good summary. Otherwise it would be way too expensive.

    It's definitely more complex, but I think in these cases at least, there's a real payoff for the trouble.
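
    As a rough sketch of the shape of this routing (not Plandex's actual code - model choices here are illustrative):

        # Rough sketch of per-task model routing - illustrative only.
        TASK_MODELS = {
            "plan":      "claude-3-7-sonnet",  # strong reasoning, higher cost
            "apply":     "o3-mini",            # narrow task: cheaper, faster, more reliable
            "validate":  "o3-mini",
            "summarize": "gpt-4o-mini",        # a small model suffices for summaries
        }

        def model_for(task: str, context_tokens: int = 0) -> str:
            # Fall back to a long-context model if the planning context overflows.
            if task == "plan" and context_tokens > 200_000:
                return "gemini-2.5-pro"
            return TASK_MODELS[task]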

    1 - https://github.com/plandex-ai/plandex

    • rchaves 7 hours ago

      Is this multi-agent collaboration, though, or is it just a workflow? All the examples you listed seem to have pretty deterministic control flows (write then validate, context exceeded, after each response, etc.)

      When I think of multi-agent collaboration, I think of the control flow and handover being defined by the agents themselves. That is the thing I have yet to see examples of in production, and the premise that I also don't buy yet.

  • ActionHank 6 hours ago

    It has been my experience that having short, focused tasks overseen by some controller code that wires things together works more efficiently than multi-agent approaches.

    The agents “chat” a whole lot back and forth to figure out what could be a direct instruction.

    • segmenta 3 hours ago

      Curious - what was the use case you were trying out?

  • ethan_smith 17 hours ago

    The microservices analogy is spot-on - multi-agent systems introduce coordination overhead that's only justified when domain complexity naturally decomposes into specialized tasks with clear interfaces.

    • segmenta 4 hours ago

      Agree that the microservices analogy is great for the maintainability aspect of multi-agent systems. However, there is one more dimension specific to LLMs - performance. Smaller agents tend to have better instruction-following accuracy.

  • segmenta 18 hours ago

    Here are a few practical reasons for multi-agent systems:

    1. LLMs handle narrower, simpler instructions better - decomposing into multiple agents improves reliability (related to instruction following accuracy).

    2. Similarly, tool-calling accuracy improves when each agent has a smaller set of specific tools assigned to them.

    3. Smaller agents mean prompt changes (whose effects aren't very deterministic) can be isolated and tested more easily.

    4. Dividing agents by task enables stronger, more precise guardrails for real-world use cases.

    Happy to discuss further!

    • simonw 18 hours ago

      That's a really good answer. I suggest turning that into a set of working examples to help promote the idea - part of my hesitance around this is that it sounds good on paper but I've not seen convincing evidence that it works yet.

      (Claude Code is an example that I believe does make good use of this pattern, but it's frustratingly closed source.)

  • rchaves 6 hours ago

    Same here, but I would even avoid "strong arguments", because that's what we have all been doing so far.

    What I want is real use cases - show me real-world production examples from established companies where multi-agent collaboration served them better than a simple agent plus tools and deterministic workflows.

  • nurettin 18 hours ago

    The sentence should read:

    "It is becoming clear that agentic systems which run a prompt per work node is becoming a curiosity so we should hype it as the correct solution in order to make a buck despite all the efforts that have been spent trying to one-shot complex problems."

    • rchaves 6 hours ago

      Well, I think hype is not bad per se - I'd do it even if I weren't trying to make a buck. It's okay (up to a point) to hype something up so that it eventually finds a problem where it fits well. But yeah, I'm still waiting on this one.

victorbjorklund 12 hours ago

"It’s becoming clear that real-world agentic systems work best when multiple agents collaborate, rather than having one agent attempt to do everything."

In a recent episode of Practical AI with the people behind All Hands:

"...when the Open Hands project started out, we were kind of on this bandwagon of trying to create a big agentic framework that you could use with and define lots of different agents. You could have your debugging agent, you could have your software architect agent, you could have your browsing agent and all of these things like this. And we actually implemented a framework where you could have one agent delegate to another agent and then that agent would go off and do this task and things like this.

One somewhat surprising thing is how ineffective this paradigm ended up being, from two perspectives. So the first perspective is it didn't really [work] - and this is specifically for the case of software engineering; there might be other cases where this would be useful. But the first is, in terms of effectiveness: we found that having a single agent that just has all of the necessary context - it has the ability to write code, use a web browser to gather information, and execute code - ends up being able to do a pretty large swath of tasks without a lot of specific tooling and structuring around the problems."

https://practicalai.fm/310

Not saying it is wrong. But I don't think it is something that is "clear" and can be taken for granted, so some benchmarks or reasoning as to why would have been great.

  • segmenta 11 hours ago

    Thanks for the pointer. We do agree that not all agentic systems should be multi-agent.

    Having said that, from our experience we see that for complex workflows, e.g. customer support for enterprises, both quality and maintainability stand to gain when the system is decomposed into smaller scoped agents. We see a parallel of this in humans as well: when we call into customer support, we get routed to different human agents based on our query.

    OpenAI says something similar in their recent guide on building agents [0]: "For many complex workflows, splitting up prompts and tools across multiple agents allows for improved performance and scalability. When your agents fail to follow complicated instructions or consistently select incorrect tools, you may need to further divide your system and introduce more distinct agents."

    A relevant benchmark here might be the Instruction Following benchmark: https://scale.com/leaderboard/multichallenge. Even the best reasoning models top out at ~60% accuracy on this.

    The options to improve accuracy, then, are to (a) fine-tune a model on a task-specific dataset, or (b) decompose the problem into smaller sub-problems (divide and conquer) - the latter is more practical and maintainable.

    [0] https://cdn.openai.com/business-guides-and-resources/a-pract...

NitpickLawyer 14 hours ago

This is cool! Seems like this is what AutoGen Studio wanted to be. And what a lot of "agentic" libs fell short of - a way to chain together stuff by using natural language.

Quick questions (I only looked at the demo video and briefly skimmed the docs, sorry if the qs are explained somewhere):

- it looks to me like a lot of the heavyweight "logic" is handled via prompts (when a new agent is created, your copilot edits the "prompts"). Have you tested this w/ various models (and especially any open-weights ones) to make sure the flows still work? This reminds me of the very early agent libraries that worked w/ oAI GPTs but not much else.

- if the above assumption is correct, are there plans to use newer libs where a lot of the logic/lifting is done by code instead of simply chaining prompts and hoping the model can handle it? (A2A, pydantic, griptape, etc.)

  • akhisud 12 hours ago

    Thanks!

    1. That's right - Rowboat's agent instructions are currently written in structured prompt blocks, and a lot of logic does live there (with @mentions for tools, other agents, and reusable prompts). We support oAI GPTs at the moment (we chose to start with the oAI Agents SDK), but we're actively working on expanding to other LLMs as well. One of our community contributors just created a fork for Rowboat + OpenRouter. Re: performance, we expect other closed LLMs to perform comparably, and (with good prompt hygiene + role instructions) open LLMs as well, if individual agent scope is kept precise.

    2. We've been discussing both A2A and pydantic! Right now, Rowboat is designed to be prompt-first, but we're integrating more typed interfaces. Design-wise, it's likely that prompts will stay central - encoding part of the logic and also acting as the glue layer between more code-based components. Similar to how code has comments, config, and DSLs, agent systems could benefit from human-readable intent even when the core logic is more structured.

    Does that make sense?

asnyder a day ago

This sounds really really nice. Potentially exactly what I've been hoping would exist. Thank you for putting it out there!

Will try it out over the weekend. Exciting stuff.

  • segmenta a day ago

    Thanks, that's great to hear! We'd love to learn what worked and what didn't for you.

flynumber a day ago

Now this is what I'm talking about — this checks all the boxes for me.

I was looking at "Agent builders" for a while now and nothing really stood out. They all seemed to use a "node" type structure, while I was looking to tell something what I need using natural language.

The only thing that came close was Vessium, but I never heard back after adding myself to the waiting list weeks ago. I also wasn’t so hot about their questions either — "Are you cool with paying for a product just to try it," or something to that effect. I’ll admit though, I said yes. =)

Either way, congrats on the launch and wishing you much success.

  • segmenta a day ago

    Thanks! We wanted the copilot planning and building out agents to be a core part of Rowboat (like how Cursor works with code).

zcror 10 hours ago

I really liked the mascot haha. The product seems really cool - I'll try it out when I get free time. I was wondering a few days ago whether such a tool existed, but most of them are paid. Awesome tool!

  • segmenta 10 hours ago

    Glad you like the mascot! Thanks, curious to learn your feedback once you try it.

esafak a day ago

1. Are you going to support Google's A2A protocol?

2. Are you going to support stateless chat?

  • akhisud a day ago

    1. A2A is on our roadmap (still exploring), for agents built on Rowboat to communicate with external agents. I assume that's what you mean as well.

    2. Rowboat agents are technically stateless in that they do not store any messages or state between turns. Our HTTP API [0] requires previous messages and state to be explicitly passed from one turn to the next (rough sketch below). For now, the state is pretty simple - just the name of the last agent that responded to the user. This allows the system to pick up directly from where the previous agent left off. The Python SDK [1] manages the state for you. Does that make sense?

    [0] API docs - https://docs.rowboatlabs.com/using_the_api/

    [1] SDK docs - https://docs.rowboatlabs.com/using_the_sdk/
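
    A rough sketch of the resulting turn loop against the HTTP API - the endpoint path and payload fields are simplified here, see [0] for the real shapes:

        # Simplified sketch of the stateless turn loop - field names are
        # approximate; see the API docs for the actual request/response shapes.
        import requests

        API_URL = "http://localhost:3000/api/v1/<project-id>/chat"
        HEADERS = {"Authorization": "Bearer <api-key>"}

        messages, state = [], None
        for user_input in ["I want to fly to Tokyo", "Sometime in May"]:
            messages.append({"role": "user", "content": user_input})
            resp = requests.post(API_URL, headers=HEADERS, json={
                "messages": messages,  # full history, passed on every turn
                "state": state,        # e.g. which agent answered last turn
            }).json()
            messages.extend(resp["messages"])
            state = resp["state"]      # carry forward to the next turn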

justanotheratom a day ago

Congratulations on the launch. I suggest adding the demo video right on the landing page - that's the first thing I would look for, IMO.

  • segmenta a day ago

    Thanks! We'll put it up on our website.

sirjaz a day ago

It would be awesome if this could be wrapped around a native app rather than another webapp. Otherwise, great job!

  • gavinray a day ago

    Why do you want to shove a web app into a native window?

  • segmenta a day ago

    TBH we weren't sure whether people would prefer a native app or a web app for this kind of tool. This is useful feedback! We are trying to figure out how to bundle the microservices together, maybe into an Electron app.

    • ramnique a day ago

      Just to add more detail here - currently the dashboard is a Next.js app, but the agents runtime (and copilot) are Python apps, since they're using the OpenAI Agents SDK. We're trying to figure out the best way to bundle these into a single native app.

      • pylotlight 14 hours ago

        I would suggest Wails/Go, which IMO is way simpler for CLI/GUI tools than Rust-based apps.

      • sirjaz a day ago

        You could use Tauri with the native webview of the particular OS.

        • ramnique a day ago

          We'll check this out, thanks!

A4ET8a8uTh0_v2 20 hours ago

I will test it out. I am mildly skeptical about the use case, but that may just be today's experience of my current project's PM not knowing anything about the background of the project they are managing - which immediately makes me realize the limitations of all similar systems.

  • segmenta 19 hours ago

    Thanks for checking it out. Curious what you think after testing.

andsoitis 15 hours ago

Is Rowboat a metacircular rowboat?

  • akhisud 12 hours ago

    Haha not yet, but we're planning to make the copilot a Rowboat multi-agent system!

asasidh 19 hours ago

Your rowing mascot is cute.

ic_fly2 15 hours ago

I bet the next AI-powered IDE will be called Sailboat or some other preppy water-themed sport.

Good luck down the stream.

whall6 17 hours ago

This is just screaming Jevons paradox

  • segmenta 16 hours ago

    Fair point. The copilot creates multiple agents only when necessary, acting as a check on unnecessary complexity.

owebmaster a day ago

I'm happy it is not another VSCode fork. But isn't this missing a text editor to be called a proper IDE?

  • 85392_school a day ago

    IDE just stands for Integrated Development Environment, so something that doesn't edit text could still be an IDE

    • owebmaster 19 hours ago

      Yes, I know. And an IDE by any acceptable definition needs a text editor.

        An integrated development environment (IDE) is a software application that provides comprehensive facilities for software development. An IDE normally consists of at least a source-code editor, build automation tools, and a debugger.

  • segmenta 21 hours ago

    Fair point. Increasingly, the workflow for building multi-agent systems involves structured agent instructions (almost resembling code), testing, connecting tools, and improving agents. Given this, we decided to call it an IDE.