Jules, our asynchronous coding agent

326 points by meetpateltech a day ago

turblety 21 hours ago

Why has Google totally overcomplicated their subscription models?

Looking at "Google AI Ultra" it looks like I get this Jules thing, Gemini App, Notebook, etc. But if I want Gemini CLI, then I've got to go through the GCP hellscape of trying to create subscriptions, billing accounts then buying Google Code Assist or something, but then I can't get the Gemini app.

Then of course, this Google AI gives me YouTube Premium for some reason (no idea how that's related to anything).

ygouzerh 12 hours ago

It looks like there are two different entities inside Google that provide AI products.
From a professional context, for example: at my company we use both Google Workspace and GCP.
With Google Workspace, our subscription includes Gemini, Veo 3, Jules, etc. Everything comes in a subscription model: rate-limited but unlimited. The main entry point is gemini.google.com
However, every time we need to use the API, we have to go through GCP. It gives us access to some more advanced models, like Veo3 instead of Veo3-fast, and more features. Usage is unlimited, but pay-as-you-go. The main entry point is GCP Vertex AI
And the two teams, Google Workspace and GCP, are quite separate. They often don't know very well what the other team provides.
- veonik 9 hours ago
  
  To add to the confusion, you can also just use Gemini via API (without Vertex AI). It shows up as a separate item in billing.
  In the latest of three different Go SDKs, you can use either Vertex AI or Gemini. But not every feature is available in both. Gemini can use uploaded files as attachments, and Vertex AI can use RAG stores, for example. Gemini uses API-key-based authentication, while Vertex AI uses the traditional credentials. All in the same SDK.
  It's a mess.
- stingraycharles 7 hours ago
  
  This sounds like par for the course for Google indeed: multiple teams creating the same products and competing without centralized product management or oversight.
  Happens with their other products as well, eg their meet / chat / voice / hangouts products.
- dostick 5 hours ago
  
  If there are two competing, then we know what’s next: one of them gets killed.
  - jeffrallen 4 hours ago
    
    Or both and all the user data deleted. :(
- snthpy 11 hours ago
  
  Thanks. That explains some things.
  OT question about Google Workspaces: What's the difference between My Drive, Shared Drives, and "Workspaces"? When would I want to use each in a team setup?
  - zdragnar 10 hours ago
    
    You can share things with anyone, even if you're using Google drive with a normal Gmail (non-workspace) Google account.
    My Drive holds files you created / uploaded (which thus count against your space quota), and shared is where things go that have been shared with you by others, or public Drive links that you've visited.
    Workspace is a shared space private to the company/organization workspace group.
ryandvm 20 hours ago

And God forbid you were an early Google for Domains adopter and have your own Google Workspace account because nothing fucking works right for those poor saps.
- slabity 15 hours ago
  
  You think that's bad? I had my own Google Workspace account with Google Domains and then foolishly linked my Google Fi cellphone to it.
  Trying to get that stuff resolved was such a pain that I eventually had to ask a friend who knew someone that worked at Google for assistance. Their support team had absolutely no public contact info available. I eventually managed to get my data and migrate the services I actually use (Google Fi and Youtube) to a non-workspace account.
  The funny thing is that a few months later they tried to send a $60 bill to collections because they reopened the account for 2 days for me to migrate things off. I was originally going to pay it to just get them off my back, but Google's own collections agency wouldn't let me pay through card or check or anything. The only way I could pay was to "Log into your Google Workspace account" which NO LONGER EXISTED.
  Now it's just an amusing story about incompetence to look back on, but at the time it was stressful because I almost lost my domains, cell phone number, and email addresses all at once. Now I never trust anything to a single company.
  - resize2996 14 hours ago
    
    Somewhere around 2022, someone flipped a switch that changed Google Fi support from best-in-class to 'we're trying to get people to cancel.'
  - thecupisblue 5 hours ago
    
    Ironically, I stopped paying for a workspace a few years ago when I shut down a startup. The workspace got suspended and removed. I am still able to use it across any service requiring a Google account, which makes me think that if I buy a failed startup domain and sign up I could get access to their data.
- rkomorn 20 hours ago
  
  Add "moving to a different country while owning an account that started as Google Apps for Domains" for a little more flavor.
  "Can't share the subscription because the other person in your family is in another country."
  Okay guess I'll change countr- "No you can't change your Google Workspace account's country."
  - phs318u 14 hours ago
    
    This issue alone is driving me to switch to Microsoft for a particular use-case where I inherited a Google Apps for Domains account. Why anyone who knows the history of Google's behaviour with respect to supporting businesses that are not advertisers, would still choose Google over other options, continues to baffle me.
    
    rkomorn 10 hours ago
    
    My Google Workspace account is my personal, single-user domain.
    IMO, Google is still the best option. I like GMail (their spam filtering is nearly flawless), Google Drive and Docs is the right mix of working and complexity, Google Photos integrates well with what I use, etc.
    It's basically Google One with the tradeoff of my rough edges in exchange for hosting my own final.
    I've occasionally looked at switching away (to proton, MSFT, etc) and the most likely switch would be to a personal Google account.
    I'm arguably "in too deep" because I'll soon have been a Google customer for that domain for 20 years. At the time they were definitely the best option (I used to even self-host DNS for my domain on my home desktop).
    In the corporate world, having faced a mix of options over my career, I still prefer the google stack (with the exception of google chat which I last used 3 years ago and wow was that bad at the time).
    
    rkomorn 3 hours ago
    
    > It's basically Google One with the tradeoff of my rough edges in exchange for hosting my own final
    I must not have been awake when I wrote this...
    It should be "the tradeoff of the rough edges in exchange for hosting my own domain".
  - kyleee 19 hours ago
    
    Nobody is getting a promotion for fixing that shit
    
    samrus 12 hours ago
    
    Does Google know this strategy just makes people less likely to sign up for their new shiny services, as the "killed by google" meme spreads more and more?
    
    endtime 11 hours ago
    
    I worked at Google for ten years (as an IC). Here's my personal perspective.
    Yes, of course, the individual employees know. But the decision making for these kinds of things is usually a full-time middle manager, who isn't deciding on behalf of Google as a whole, but on behalf of their organization within Google (could be 50 people, could be 2000). It's not just _not_ that manager's job to make the globally optimal decision for Google, it's actually likely often in direct conflict with their job, which is basically "set the priorities of your org such that they launch things that make your boss look good to his boss". Spending headcount on maintaining niche stuff is usually not that (and takes resources away from whatever is).
    
    jjani 10 hours ago
    
    And this is exactly why they need to be broken up. If that offering was their core service, you can bet it would be the priority of the middle managers.
    
    rkomorn 9 hours ago
    
    I'm personally unconvinced that smaller companies put out better products, and that breaking up google would raise the bar either at the new entities, or at the competition.
    The integration between Google products is definitely one of the things that keeps me with them.
    I've seen more than a few companies that are no better at their core service than the giants.
    
    jjani 3 hours ago
    
    On average, they sure put out better customer service than the likes of Google and Meta.
    
    rkomorn 3 hours ago
    
    I'm not sure "better customer service for an equally bad product" is what really moves the needle for me.
    
    ponector 8 hours ago
    
    How would a smaller company give its middle managers different incentives?
    Small companies I've been working in are sometimes even worse.
    Web app which takes 10+ sec to fully load? That's ok, focus on the new features!
    
    jjani 3 hours ago
    
    > Web app which takes 10+ sec to fully load? That's ok, focus on the new features!
    This web app wouldn't survive unless it already had Microsoft-level backing or offered immense unique value to its users.
    
    ponector 29 minutes ago
    
    It's more a lack of choice. You are going to use your utility provider's app no matter how crappy it is.
- zoba 4 hours ago
  
  Yes! This is me, and it is mind-boggling how poor the experience is.
esher 21 hours ago

Watch YouTube while AI is coding for you.
- weakwire 21 hours ago
  
  That’s actually great!
rsanheim 15 hours ago

Because their main business is selling ads and maintaining their stranglehold on that market via analytics, chrome, Chromebook, android, SSO via google, etc.
The dev focused products are a sideshow amongst different fiefdoms at google. They will never get first billing or focus.
absurddoctor 20 hours ago

But unlike some other pieces of the Ultra subscription you can’t share YouTube premium with family. So now I have both and Google has suggested a few times that I shouldn’t be doing that.
discordance 3 hours ago

I think you're holding it wrong.
I installed Gemini CLI today, went to AI studio, got a free Gemini 2.5 Pro API key (without setting up billing or a credit card) and auth'd with Gemini CLI using the key. Took like 30 seconds. Unfortunately the results were much poorer than what I've been getting with Claude Code.
coredog64 21 hours ago

> Then of course, this Google AI gives me YouTube Premium for some reason (no idea how that's related to anything).
One of the common tests I've seen for the Google models specifically is understanding of YT videos: Summarization, transcription, diarization, etc. One of their APIs allows you to provide a YT video ID rather than making you responsible for downloading the content yourself.
- neuronexmachina 9 hours ago
  
  Tangentially, last week I asked Gemini's research mode to write up a strategy guide for a videogame based on a 20-episode "masterclass" youtube series. It did a surprisingly good job.
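As a rough illustration of the "pass a video ID instead of downloading it" idea above: before handing a video to such an API you would typically normalise whatever URL a user pastes down to the bare ID. The helper below is a hypothetical sketch using only the Python standard library. It covers the common `watch?v=`, `youtu.be/`, `/shorts/` and `/embed/` URL forms and makes no claim about which exact URL or ID form Google's API expects.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# YouTube video IDs are conventionally 11 characters from [A-Za-z0-9_-].
_VIDEO_ID = re.compile(r"^[A-Za-z0-9_-]{11}$")


def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID from a YouTube URL, or None if none is found.

    Hypothetical helper: handles watch?v=, youtu.be/, /shorts/ and /embed/ URLs.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    candidate = None
    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # The ID lives in the ?v= query parameter.
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        else:
            parts = parsed.path.strip("/").split("/")
            if len(parts) >= 2 and parts[0] in ("shorts", "embed"):
                candidate = parts[1]
    # Reject anything that doesn't look like a real video ID.
    if candidate and _VIDEO_ID.match(candidate):
        return candidate
    return None
```

Extra query parameters such as timestamps (`&t=42`) or share tokens (`?si=...`) are ignored, so the same video always maps to the same ID.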
Lucasoato 15 hours ago

We’ve been trying to understand Google Workspace subscriptions, but it’s a complete mess. It’s not even clear which plans include Gmail and which don’t. Google used to be the simple-but-great company; why do you feel so stranded when subscribing to their products now?
When you enter Google Cloud you end up in a shroud of darkness in which admins can’t see the projects created by users.
I’m the admin for our Google Workspace, and I can’t even tell if we have access to Google AI Studio or not. Their tutorials are complete bullshit, and the docs are just plain wrong because they reference things that are not reflected in the platform.
I had to switch back to English because their automated translations are so awful. Couldn’t they at least have one person review each document once before releasing it to the public?!
It’s a $450-billion company and they can’t see that they’ve added so many layers of confusion that 99% of their users won’t need. The biggest issue is that they won’t solve this anytime soon; they’ve dug themselves into a bottomless pit.
- codazoda 14 hours ago
  
  An old boss of mine used to say, “There’s money in confusion.”
  - samrus 12 hours ago
    
    Feels like a short-term, low-trust, reputation-lag exploit. You might make more money from the confusion until someone comes along offering a better product with a sensible billing plan and kills you because it's easier for people to use.
    That's my hope anyway
  - emmelaich 12 hours ago
    
    https://en.wikipedia.org/wiki/Confusopoly
    Typically applied to mobile phone plans but applicable to many other markets.
  - SchemaLoad 10 hours ago
    
    For the competitor
gman83 21 hours ago

I was wondering about this too, and apparently they're working on integrating it, so the Google AI Pro/Ultra subscriptions will also give API/CLI credits or something -- https://github.com/google-gemini/gemini-cli/issues/1427
jacksnipe 19 hours ago

I wonder if bundling it with ai is to deal with that pesky internal issue where engineers are always trying to turn off ads for their yt accounts
- thrtythreeforty 15 hours ago
  
  Is this really a thing? That's hilarious
zoba 4 hours ago

They block me from subscribing because I have a custom domain for my personal email. I’d gladly give them money but they say “Sign up with your personal email” when I try to subscribe. Such poor design
artdigital 8 hours ago

You're forgetting the 'Google Developer Premium' subscription ($300/y) which also bundles Gemini Code Assist / CLI, some Vertex credits, but none of the other Gemini things
One can make an argument that other Gemini stuff shouldn't be in there because it's not dev related, but Jules at least should
- turblety 6 hours ago
  
  Oh geez, another subscription. This Google Developer Premium does look closer to something I would pay for, but really I just want something that gives me everything in a single subscription, that I can use. Like Claude, OpenAI or most other services on the planet.
  - artdigital 5 hours ago
    
    Yeah agree. I currently don't subscribe to any of the Gemini things because I'm too much in the Claude ecosystem and can't justify another sub. But if I were to subscribe to something, it would be the Google Developer Premium plan. Mostly for Gemini CLI, but that alone is currently still way too rate-limited and not worth it.
    Maybe if they add Jules and some version of Gemini to it (without the Gdrive storage and YT premium stuff), I would get onto it.
    Having Jules in the normal consumer Gemini subscription makes no sense. It's very clearly a dev product, so it should be bundled with other dev products, i.e. through GCP with Code Assist or Google Developer Premium. It feels completely out of place in the list of features on https://one.google.com/ai
navinsylvester 10 hours ago

I subscribed to Ultra to give Deep Think a try, but I will not extend it even a day for all the other packaging, which looks like it was put together by someone who might be working for a competitor. Who does these things? And the fragmentation is crazy, as mentioned. Chinese deep agents may be entrenched (just kidding), but it's crazy that someone up the ladder is so off point and doesn't worry about losing his job.
BlackjackCF 17 hours ago

The cynic in me thinks that complicated subscription schemes are an effective way to make people overpay for stuff.
- SchemaLoad 10 hours ago
  
  I'd guess Google is just a million disconnected teams with their own products and product managers and pricing schemes, and no one thinking about the overall strategy or cohesion.
- JonathanMerklin 16 hours ago
  
  I think it's possible that that may be an additional benefit (for Google), but to me it seems overwhelmingly more likely that the main explanation here is Conway's Law.
  - pyman 15 hours ago
    
    Google is terrible at two things: building UIs and pricing their products.
    Their reputation precedes them.
slightwinder 4 hours ago

At this point, I'm not sure whether Google is just full of people trolling the world, or people so smart and unworldly that they are trolling themselves. The website for Jules has a section for plans, yet it mentions neither the price, nor which actual plan they are talking about, nor where I can find those f**ing plans; there's not even a link. This is just ridiculous. Has Google already replaced all their people with AI?

artdigital 8 hours ago

I tried Jules multiple times during the preview, almost once a week, and it’s pretty terrible. Out of all the cloud coding assistants, it’s the worst. I honestly thought it was just an experiment that got abandoned and never expected it to actually become a real product, similar to how GH Copilot Spaces was an experiment and turned into Copilot agent.

It does what it wants, often just “finishes” a task prematurely, and asking follow-ups does nothing besides making it ramble for a bit. The env sometimes doesn’t persist and stuff just stops working. For a while it failed instantly every time because the model was dead or something.

Out of the dozen times I tried it, I think I merged maybe one of its PRs. The rest I trashed and reassigned to a different agent.

My ranking

- Claude Code (through gh action), no surprise there

- ChatGPT Codex

- GitHub Copilot Agent

- Jules

I will try it again today to see if the full release changed anything (they give 3 months free trial for previous testers) but if it’s the same, I wouldn’t use or pay for Jules. Just use Codex or GitHub agent. Sorry for the harsh words

artdigital 2 hours ago

Alright, I wanted to give Jules another fair try to see if it improved, but it's still terrible.
- It proposed a plan. I gave it some feedback on the plan, then clicked approve. I came back a few minutes later to "jules is waiting for input from you", but there was nothing to approve or click, it just asked "let me know if you're happy with the plan and I'll get started". I told Jules "I already approved the plan, get started" and it finally started
- I have `bun` installed through the environment config. The "validate config" button successfully printed the bun version, so it's installed. When Jules tries to use bun, I get `-bash: bun: command not found` and it wastes a ton of time trying to install bun. Then bun was available until it asked me for feedback. When I replied, bun went missing again. Now for whatever reason it prefixes every command with "chmod +x install_bun.sh && ./install_bun.sh", so each step it does is installing bun again
- It did what I asked, then saw that the tests break (none were breaking beforehand, our main branch is stable), and instead of fixing them it told me "they're unrelated to our changes". I told it to fix everything, it was unable to. I'm using the exact same setup instructions as with Copilot Agent, Codex and Claude Code. Only Jules is failing
- I thought I'd take over and see what it did, but because it didn't "finish", it didn't publish a branch. I asked it to push a branch; it started doing something and then sat in "Thinking" for a while, seemingly running tests and lint again, which were failing. But eventually it published the branch.
At this point I gave up. I don't have time to debug why bun is missing when in the env configuration it is available, or why it vanished in between steps, or figure out why only jules isn't able to properly run our testsuite. It took forever for a relatively small change, and each feedback iteration is taking forever.
I'm sure it'll be great one day, and I'll continue to re-visit it, but for now I'll stick with the other 3 when I need an async agent
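The disappearing `bun` described above is a common failure mode when an agent's environment doesn't persist between steps. One defensive pattern, sketched here as a hypothetical helper and not as anything Jules documents, is to make setup idempotent: check whether the tool is already on `PATH` and only run the installer when it is actually missing, rather than reinstalling on every step. `ensure_tool` and its installer argument are illustrative names; the install command would be whatever your own setup script uses.

```python
import shutil
import subprocess
from typing import Sequence


def ensure_tool(name: str, install_cmd: Sequence[str]) -> str:
    """Return the path to `name`, running `install_cmd` only if it is missing.

    Hypothetical helper: raises RuntimeError if the tool is still not on PATH
    after the installer runs (e.g. it installed into a directory not on PATH).
    """
    path = shutil.which(name)
    if path is not None:
        return path  # Already installed: skip the (slow) installer entirely.
    subprocess.run(list(install_cmd), check=True)
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(f"{name} still not on PATH after running installer")
    return path
```

The final check matters: an installer that "succeeds" but drops the binary somewhere outside `PATH` fails loudly here instead of producing `command not found` several steps later.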
causal 2 hours ago

I still need to try them, but I'm having a hard time envisioning async agents being nearly as useful to me as something local like Claude Code because of how often I need to intervene and ensure it is working correctly.
Won't the loop be pretty long-tail if you're using async agents? Like don't you have to pull the code, then go through a whole build/run/test cycle? Seems really tedious vs live-coding locally where I have a hot environment running and can immediately see if the agent goes off the rails.
- artdigital 2 hours ago
  
  We use async agents heavily. The key is to have proper validation loops, tests and strong/well phrased system prompts, so the agent can quickly see if something is broken or it broke convention.
  We have proper issue descriptions that go into detail about what needs to be done, where the changes need to be made and why. Break epics/stories down into smaller issues that can be chopped off easily. Not really different from a normal clean project workflow.
  Now for most of the tickets we just assign them to agents, and 10 minutes later pull requests appear. The pull requests get screened with Gemini Code Assist or Copilot Agent to find obvious issues, and GitHub Actions check lint, formatting, tests, etc. This gets pushed to a separate test environment for each branch.
  We review the code, test the implementation, when done, click merge. Finished.
  I can focus on bigger, more complex things, while the agents fix bugs, handle small features, do refactorings and so on in the background. It's very liberating. I am convinced that most companies/environments will end up with a similar setup and this becomes the norm. There really isn't a reason why not to use async agents.
  Yeah sure if you give a giant epic to an agent it will probably act out of line, but you don't really have these issues when following a proper project management flow
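The "validation loop" described above boils down to: run every check, collect all failures rather than stopping at the first one, and only treat the agent's branch as ready for human review when everything passes. Below is a minimal Python sketch, assuming each check is a plain command list run without a shell. The names `run_checks` and `ready_for_review` are hypothetical, and this is not the commenter's actual CI configuration.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Mapping, Sequence


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str  # combined stdout/stderr, useful to feed back to the agent


def run_checks(checks: Mapping[str, Sequence[str]]) -> List[CheckResult]:
    """Run every check command and collect results (does not stop on failure)."""
    results = []
    for name, cmd in checks.items():
        proc = subprocess.run(list(cmd), capture_output=True, text=True)
        results.append(CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr))
    return results


def ready_for_review(results: Sequence[CheckResult]) -> bool:
    """A branch is only handed to a human once every check passed."""
    return all(r.passed for r in results)
```

For example, `run_checks({"lint": ["bun", "run", "lint"], "test": ["bun", "test"]})` uses placeholder commands; substitute whatever your project runs. Collecting every failure at once lets the agent fix lint and test problems in a single iteration instead of discovering them one round-trip at a time.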
nonhaver 7 hours ago

similar experience. i would put codex over claude personally due to the better rate limits (which i haven’t hit once yet, even on heavy days), but jules was not very good: too messy, and i prefer alternative outputs to creating a pull request. like in codex you can copy a git patch, which is so incredibly useful for adding personal tweaks before committing
jerf 2 hours ago

"I honestly thought it was just an experiment that got abandoned and never expected it to actually become a real product, similar to how GH Copilot Spaces was an experiment and turned into Copilot agent."
My guess is that this is a play for the future. They know that current-day AIs can't really handle this in general... but if you wait for the AI that can and then try to implement this functionality you might get scooped. Why not implement it and wait for the AIs to catch up, is probably what they are thinking.
I'm skeptical LLMs can ever achieve this no matter how much we pour into them, but I don't expect LLMs to be the last word in AI either.

p1nkpineapple 20 hours ago

I've been actually kind-of enjoying using Jules as a way of "coding" my side project (a react native app) using my phone.

I have very limited spare time these days, but sometimes on my walk to work I can think of an idea/feature, plan out what I want it to do (and sometimes use the github app to revise the existing code), then send out a few jobs. By the time I get home in the evening I've got a few PRs to review. Most of the code is useless to me, but it usually runs, and means I can jump straight into testing out the idea before going back and writing it properly myself.

Next step is to add automatic builds to each PR, so that on the way home I can just check out the different branches on my phone instead of waiting to be home to run the ios simulator :D

azath92 2 hours ago

I'm not sure how this translates to React Native (AFAICT build chains for apps are less optimized), but using Vercel for deployment and Neon for the DB if needed, I've really been digging the ability for any branch/commit/PR to be deployed to a live site I can preview.
Coming from the Python ecosystem, I've found the commit -> deployed code toolchain very easy, which for this kind of vibe coding really reduces friction when you are using it to explore functional features, many of which you will discard.
It moves the decision about what the right thing to build is to _after_ you have built it, which is quite interesting.
I will caveat this by saying this flow only works seamlessly if the feature is simple enough for the LLM to one-shot it, but for the right thing it's an interesting flow.
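Per-branch previews like this generally need a stable, URL-safe name for each branch. The following is an illustration only: Vercel and similar hosts use their own naming schemes, which this does not reproduce. It shows one way to turn an arbitrary git branch name into a valid DNS label (lowercase letters, digits and hyphens, at most 63 characters), with a short hash suffix so two branches that sanitise to the same text don't collide. `preview_label` is a hypothetical name.

```python
import hashlib
import re

MAX_LABEL = 63  # RFC 1035 limit on the length of a single DNS label


def preview_label(branch: str) -> str:
    """Map a git branch name to a DNS-safe label, e.g. for a preview subdomain."""
    # Collapse every run of disallowed characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-") or "branch"
    # A short hash of the *original* name keeps distinct branches distinct
    # even when they sanitise to the same slug (e.g. "feat/x" vs "feat-x").
    suffix = hashlib.sha1(branch.encode("utf-8")).hexdigest()[:8]
    room = MAX_LABEL - len(suffix) - 1  # leave space for "-" + suffix
    slug = slug[:room].rstrip("-")
    return f"{slug}-{suffix}"
```

Because the label is derived deterministically from the branch name, every push to the same branch lands on the same preview URL, which is what makes "check out the branch on my phone" practical.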
fullstackwife 15 hours ago

Async vibe coding is the new hot thing, I'm also recommending to check GH Copilot Coding Agent (NOT the VScode one)
Freedom2 16 hours ago

I hooked up a GitHub repo that's long been abandoned by me and I've just been tinkering with menial stuff - updating dependencies, refactoring code without changing any actual implementation details, minor feature or style updates. It mostly works well for those use cases. I don't know if I'd give it anything important to develop though.
sergeyk 16 hours ago

This is exactly why we built superconductor.dev, which has live app preview for each agent. We support Claude Code as well as Gemini, Codex, Amp. If you want to check it out just mention HN in your signup form and I’ll prioritize you :)

esafak 21 hours ago

The daily task limit went down from 60 to 15 (edit: on the free plan) with this release. Personally I wasn't close to exhausting the limit because I had to spend time going back and forth, and fixing its code.

To communicate with the Jules team join https://discord.gg/googlelabs

lacoolj 21 hours ago

That's odd cuz my daily task limit went up to 100.
Are you on Google Pro or using it free?
Also, I've found that even with 60, over an entire full day/night of using it for different things, I never went over 10 tasks and didn't feel like I was losing anything. To be clear, I've used this every weekend for months and I mean that I've never gone over 10 on any one day, not overall.
15 should be plenty, especially if you aren't paying for it. I will likely never use 100 even on my busiest of weekends

zaptheimpaler 14 hours ago

I tried it with this prompt a month or two ago, and again now:

"Write a basic raytracer in Rust that will help me learn the fundamentals of raytracing and 3D graphics programming."

Last time it apparently had working or at least compiling code, but it refused to push the changes to the branch so I could actually look at it. I asked it, cajoled it, guilted it, threatened it; it just would not push the damn code. So I have no idea if it worked.

This time it wrote some code but created 2 main.rs files in separate directories. It split the code randomly across 2 directories and then got very confused about why it didn't run right. I explained the problem and it got very lost running around the whole filesystem trying to run the program or cargo in random directories, then gave up.

jsnell 13 hours ago

Yeah, the interaction model around pushing commits is absolutely baffling.
nojito 11 hours ago

This isn’t a zero shot tool.
Not sure why folks continue to zero shot things.
- jstummbillig 4 hours ago
  
  I don't understand the category. Is getting things right not the goal?
- SchemaLoad 10 hours ago
  
  These tools are marketed as if you just drop the jira text in and it spits out a finished PR. So it's notable that they don't work the way they are advertised.
  - nojito 6 hours ago
    
    Which is not what the OP did.
- nonhaver 7 hours ago
  
  there exist tools that can zero-shot complex tasks (claude/codex). the bar has been raised and jules doesn’t stack up. knowing google, it will probably improve in due time

mvieira38 a day ago

Good to see competition for Codex. I think cloud-based async agents like Codex and Jules are superior to the Claude Code/Aider/Cursor style of local integration. It's much safer to have them completely isolated from your own machine, and the loop of sending them commands, doing your own thing on your PC and then checking back whenever is way better than having to set up git worktrees or any other type of sandbox yourself

agentastic 21 hours ago

Codex/Jules are taking a very different approach than CC/Cursor.
There used to be this thesis in software of [Cathedral vs Bazaar](https://en.wikipedia.org/wiki/The_Cathedral_and_the_Bazaar); the modern version of it is that you either 1) build your own cathedral and bring the user to your house. It is a more controlled environment and deployment is easier, but the upside is more limited, and it also shows the model can't perform out-of-distribution. OpenAI has taken this approach for all of its agentic offerings, whether ChatGPT Agent or Codex.
2) The alternative is the Bazaar, where you bring the agent to the user and let it interact with 1000 different apps/things/variables in their environment. It is 100x more difficult to pull this off, and you need better models that are more adaptable. But the payoff is higher. The issues that you raised (env setup/config/etc.) are temporary and fixable.
- robluxus 16 hours ago
  
  This is the actual essence of CATB, and it has very little to do with your analogy:
  -----
  > The software essay contrasts two different free software development models:
  > The cathedral model, in which source code is available with each software release, but code developed between releases is restricted to an exclusive group of software developers. GNU Emacs and GCC were presented as examples.
  > The bazaar model, in which the code is developed over the Internet in view of the public. Raymond credits Linus Torvalds, leader of the Linux kernel project, as the inventor of this process. Raymond also provides anecdotal accounts of his own implementation of this model for the Fetchmail project
  -----
  Source: Wikipedia
  - jval43 10 hours ago
    
    While the GP is completely off-base with their analogy, the Wikipedia summary is simplified to the point of missing all the arguments made in the original essay.
    If you're a software developer and especially if you're doing open source, CATB is still worth a read today. It's free on the author's website: http://www.catb.org/~esr/writings/cathedral-bazaar/cathedral...
    From the introduction:
    >No quiet, reverent cathedral-building here—rather, the Linux community seemed to resemble a great babbling bazaar of differing agendas and approaches (aptly symbolized by the Linux archive sites, who'd take submissions from anyone) out of which a coherent and stable system could seemingly emerge only by a succession of miracles.
    > The fact that this bazaar style seemed to work, and work well, came as a distinct shock. As I learned my way around, I worked hard not just at individual projects, but also at trying to understand why the Linux world not only didn't fly apart in confusion but seemed to go from strength to strength at a speed barely imaginable to cathedral-builders.
    It then goes on to analyze why this worked at all, and if the successful bazaar-style model can be replicated (it can).
- conartist6 an hour ago
  
  CATB was about how to organize people to tackle major community/collaborative efforts in a social system that is basically anarchy.
  Both situations you've described are Cathedrals in the CATB sense: all dev costs are centralized and communities are impoverished by repeating the same dev work over and over and over and over.
- throwup238 21 hours ago
  
  Cursor now has “Background Agents” which do the same thing as Codex/Jules.
- highfrequency 20 hours ago
  
  Can you elaborate on how Codex vs. CC maps onto this cathedral vs. bazaar dichotomy? They seem fairly similar to me.
  - agentastic 20 hours ago
    
    of course,
    cathedral = sandbox env in the provider's cloud, so [codex](https://chatgpt.com/codex) uses this model. Their codex-cli product is the Bazaar model, where you run in your computer, in your own environment.
    Claude Code, on the other hand, doesn't have the cloud-based sandboxing product; you have to run it on your computer, so it's the bazaar model. You can also run it in ways that Anthropic never envisioned (e.g. give it control of your house). Cursor also follows the same model, albeit they have been trying to get into the cathedral model with the background agent (as someone also pointed out below), presumably so as not to lose market share to codex/jules/etc.
    
    ramoz 13 hours ago
    
    Claude Code does have remote sandboxing, and it’s better & more enterprise ready than any of these alternatives.
    Can deploy as a github action right now.
    Tag it in any new issue, pr, etc.
    Future history will highlight Claude Code as the first true form agent. These other analogies are not intuitive enough for the evolution of an os-native agent into eventual ai robotics.
vb-8448 21 hours ago

It's safer to have them completely isolated, but it's slower and more expensive.
Sometimes I realize that CC is going nuts and stop it before it goes too far (and consumes too much). With this async setup, you may come back after a couple of hours and see utter madness (and millions of tokens burned).
- itsalotoffun 4 hours ago
  
  You need a supervisor agent to periodically check the progress and `if (madness) halt(1)`
- unshavedyak 21 hours ago
  
  Completely agree. I also want to tightly control the output, and the more it just burns and burns, the more I become overwhelmed by a giant pile of work to review.
  A tight feedback loop is best for me. The opposite of these async models. At least for now.
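The supervisor idea above can be sketched as a plain watchdog loop. This is a minimal sketch under stated assumptions, not how any of these products work. It uses a simple heuristic rather than a second LLM as the "supervisor". `AgentStatus`, `poll_status`, `halt`, and the token/stall thresholds are all hypothetical stand-ins for whatever your agent harness actually exposes.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AgentStatus:
    """Hypothetical snapshot of a running agent's progress."""
    tokens_used: int
    steps_without_progress: int  # e.g. consecutive steps with no new passing test


def looks_like_madness(status: AgentStatus, token_budget: int, max_stalled_steps: int) -> bool:
    # Heuristic "madness" check: over the token budget, or stuck for too many steps.
    return (status.tokens_used > token_budget
            or status.steps_without_progress > max_stalled_steps)


def supervise(
    poll_status: Callable[[], Optional[AgentStatus]],
    halt: Callable[[], None],
    token_budget: int = 1_000_000,
    max_stalled_steps: int = 5,
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll the worker agent periodically and halt it if it goes off the rails."""
    while True:
        status = poll_status()
        if status is None:  # the worker reported that it is done
            return "finished"
        if looks_like_madness(status, token_budget, max_stalled_steps):
            halt()  # the `if (madness) halt(1)` step
            return "halted"
        sleep(interval_s)
```

Injecting `sleep` keeps the loop testable. The thresholds are guesses you would tune per project.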
diggan 5 hours ago

> I think cloud-based async agents like Codex and Jules are superior to the Claude Code/Aider/Cursor style of local integration
Ideally, I feel like a combination of both would be a productive setup. I prefer the UI of Codex, where I can hand off boring stuff while I work on other things, but the machines they run Codex on are just too damn slow compared to my local machine: compiling Rust takes forever, and it needs to continuously refetch/recompile dependencies instead of leveraging caching.
If I could have a UI + tools + state locally while the LLM inference is the only remote point, the whole workflow would end up so much faster.
stillsut 18 hours ago

I think the Github-PR model for agent code suggestions is the path of least resistance for getting adoption from today's developers working in an existing codebase. It makes sense: these developers are already used to the idea and the ergonomics of doing code reviews this way.
But pushing this existing process - which was designed for limited participation of scarce people - onto a use-case of managing a potentially huge reservoir of agent suggestions is going to get brittle quickly. Basically more suggestions require a more streamlined and scriptable review workflow.
Which is why I think working in the command line with your agents - similar to Claude and Aider - is going to be where human maintainers can most leverage the deep scalability of async and parallel agents.
> is way better than having to set up git worktrees or any other type of sandbox yourself
I've built up a helper library that does this for you for either aider or claude here: https://github.com/sutt/agro. And for FOSS purposes, I want to prevent MS, OpenAI, etc from controlling the means of production for software where you need to use their infra for sandboxing your dev environment.
And I've been writing about how to use CLI tricks to review the outputs on some case studies as well: https://github.com/sutt/agro/blob/master/docs/case-studies/i...
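For anyone rolling their own, the core of a per-agent sandbox is one git worktree (and branch) per task. Here is a minimal sketch, assuming you launch each agent yourself with its working directory set to its worktree. The function names and the `agent/<task>` branch scheme are hypothetical, and this is not agro's API.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_commands(repo: str, task_names: List[str], base: str = "HEAD") -> List[List[str]]:
    """Build one `git worktree add` command per agent task (nothing is executed here)."""
    repo_path = Path(repo)
    cmds = []
    for name in task_names:
        branch = f"agent/{name}"  # hypothetical branch naming scheme
        # Sibling directory next to the main checkout, e.g. ../myrepo-fix-login
        path = repo_path.parent / f"{repo_path.name}-{name}"
        cmds.append(["git", "-C", str(repo_path), "worktree", "add", "-b", branch, str(path), base])
    return cmds


def create_worktrees(repo: str, task_names: List[str], base: str = "HEAD") -> None:
    # Each agent then works in its own checkout, so parallel agents never
    # edit the same files; review or merge each branch afterwards.
    for cmd in worktree_commands(repo, task_names, base):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to print the commands first as a dry run.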
mkw5053 12 hours ago

I also just got an email tonight for early access to try CC in the browser. "Submit coding tasks from the web." "Pick up where Claude left off by teleporting tasks to your terminal" I'm most interested to see how the mobile web UI/UX is. I frequently will kick something off, have to handle something with my toddler, and wish I could check up on or nudge it quickly from my phone.
pjm331 17 hours ago

FWIW you can run Claude Code async via GitHub Actions and have it work on issues that you @-mention it in; there’s even a slash command in Claude Code that will automatically set up your repository with the GitHub Actions config to do this
xiphias2 21 hours ago

I agree, but I just love the codex-1 model that is powering Codex and see Gemini 2.5 Pro as inferior.
It's interesting that most people seem to prefer local code, I love that it allows me to code from my mobile phone while on the road.
- jondwillis 20 hours ago
  
  What kind of things are you coding while “on the road”? Phone addiction aside, the UX of tapping prompts into my phone and either collaborating with an agent, or waiting for a background agent to do its thing, is not very appealing.
  - xiphias2 19 hours ago
    
    Mainly thinking about what the minimum testable changes are that I can give to Codex to work on in the background.
    Tapping the prompts in is the easy part, but async model is different to work with, I feel more like a manager, not a co-developer.
- muratsu 16 hours ago
  
  hey I am exactly the same, is there a way to reach out to you? I would love to chat more about mobile coding
mattnewton a day ago

Getting the environment set up in the cloud is a pain vs just running in your environment imo. I think we’ll probably see both for the foreseeable future but I am betting on the worse-is-better of cli tools and ide integrations winning over the next 2 years.
- mvieira38 21 hours ago
  
  It took me like half an afternoon to get set up for my workplace's monorepo, but our stack is pretty much just Python and MongoDB so I guess that's easier. I agree, it's a significant trade-off, it just enables a very convenient workflow once it's done, and stuff like having it make 4 different versions with no speed loss is mind-blowing.
  One nice perk on the ChatGPT Team and Enterprise plans is that Codex environments can be shared, so my work setting this up saved my coworkers a bunch of time. I pretty much just showed how it worked to my buddy and he got going instantly
- drdrey 21 hours ago
  
  With something like GitHub Copilot coding agent it's really not; the environment setup is just like GitHub Actions.
- MattGaiser 21 hours ago
  
  It’s surprisingly good. If you try Copilot in GitHub, it has had no issues setting up temporary environments every single time in my case.
  No special environment instructions required.
  - mattnewton 14 hours ago
    
    It has depended heavily on the project. New SPA for the web? No problem. Nontrivial application with three services, each with its own container, in a monorepo? ML inference code that requires CUDA hardware? No chance.

timdumol 20 hours ago

I've tried using Jules for a side project, and the code quality it emits is much worse than GH Copilot (using Claude Sonnet), Gemini CLI, and Claude Code (which is odd, since it should have the same model as Gemini CLI). It also had a tendency to get confused in a monorepo: it would keep trying to `cd backend && $DO_STUFF` even when it was already in backend, and iterate by trying to change `$DO_STUFF` rather than figure out that it's already in the backend directory.

xnx 20 hours ago

> I've tried using Jules for a side project, and the code quality it emits is much worse than GH Copilot
It might be worth trying again.
"Jules now uses the advanced thinking capabilities of Gemini 2.5 Pro to develop coding plans, resulting in higher-quality code outputs"
- timdumol 19 hours ago
  
  Ah, I missed that. I do vaguely remember that it used to use Flash, but I can't find where I saw it now. Thanks, I'll give it a shot!
qingcharles 20 hours ago

I just tried Jules for the first time and it did a fantastic job on reworking a whole data layer. Probably better than I would have expected from Copilot. So.. I'm initially impressed. We'll see how it holds up. I was really impressed with Copilot, but after a lot of use there are times when it gets really bogged down and confused and you waste all the time you would have saved. Which is the story of AI right now.
ttul 18 hours ago

I used it to make a small change (adding colorful terminal output) to a side project. The PR was great. I am seeing that LLM coding agents excel at various things and suck at others quite randomly. I do appreciate the ease of simply writing a prompt and then sitting back while it generates a PR. That takes very little effort and so the pain of a failure isn't significant. You can always re-prompt.

simonw 19 hours ago

I like the term "asynchronous coding agent" for this class of software. I found a couple of other examples of it in use, which makes me hope it's going to stick:

- https://blog.langchain.com/introducing-open-swe-an-open-sour...

- https://github.com/newsroom/press-releases/coding-agent-for-...

nylonstrung 2 hours ago

I'm pretty sure if you use it, it will stick, Simon.

ramoz 21 hours ago

There is only one true agent in 2025, Claude Code.

That said, Gemini is very powerful for its quality long-context capabilities: https://www.reddit.com/r/ClaudeAI/comments/1miweuv/comment/n...

patrickhogan1 20 hours ago

I agree with you at this point. Even though Google is performing well on benchmarks and releasing impressive models like the Genie 3 world model, Gemini CLI's suggestions/changes feel overly formulaic. It's almost like its priorities are those of an OCD coder who cares more about tabs vs. spaces than about building a useful feature. For example, in a recent project, Gemini CLI spent my entire token allotment for that day on trivial tasks like tweaking ESLint configs or modularizing code that didn't need modularization.
In contrast, Claude Code seems to interpret my prompts better and helps me ship real product features for users.
Maybe it’s a system prompt issue. It's likely my prompting causing the problem. But Claude Code seems to understand my intent better.
- patrickhogan1 13 hours ago
  
  I’m being downvoted. I don’t have an agenda. I’m simply sharing my experience. If you’re getting good results with Gemini CLI as an alternative to Claude Code, please let me know what you’re doing to get that performance.
  I’m impressed by Gemini Pro 2.5’s NLP capabilities. I use that model in production on several projects. My comments are directed only at Gemini CLI. Which FWIW is better than OpenAI Codex CLI, but much worse (for me) than Claude Code.
  Even with Pro, the strict token limits combined with the model's tendency to add unrequested modifications means I run out of tokens before completing my intended tasks. Others have the same issue https://github.com/google-gemini/gemini-cli/issues/4300
- ramoz 19 hours ago
  
  It's how these models/their-harnesses (e.g. the Claude Code js program) are being trained together in the RL stages.
  I think the software is now a very important part of the training process. Which is why I think frontier labs are only capable of shipping "actual" agents.
  Anthropic has figured something out here that others have not.
  https://news.ycombinator.com/item?id=44816424
- dash2 18 hours ago
  
  Perhaps this is the modern version of "every company ships its own org chart"? Maybe Gemini's priorities are those of a Google engineer, Claude's are those of an engineer at Anthropic....
the_sleaze_ 20 hours ago

Thinking the same. I don't want a GitHub approval process to sit in between me and the changes; the killer feature of Claude Code is being able to head it off as it starts to go down a bad path, and to code myself in between its steps.
Do you let juniors complete full features without asking questions or make them check in when they get flustered?
- jondwillis 20 hours ago
  
  I do want to try out some background agents, but from my experience with Cursor’s (frontier model agents) frequency of going off the rails despite having rules and context to help avoid producing slop, I can’t see background agents being that generally useful yet.
  - ramoz 20 hours ago
    
    for you or anyone else that wants this to be real - I would love to test a solution out with you.
pjm331 17 hours ago

Sourcegraph Amp is pretty good as well, albeit lacking a lot of the polish and features of Claude Code.
But I sometimes reach for it for code review in particular since it calls out to o3 via its “oracle” tool
- ramoz 17 hours ago
  
  o3 is a great oracle I use as well - in my dumb reddit/theater mode I mention that.
  I'm building integrations for both Claude Code and AMP! AMP also provides really important features of a harness that others haven't quite caught up on. OpenCode, sort of, but that is driven in a bit of a cultish open source way.

youssefarizk 5 hours ago

Something very strange happened when they first shipped it in preview. It was GOOD, back when the daily limit was 5 requests.

A couple of days/weeks later, they announced that the limit went up to 60 requests a day (which is way over what anyone actually needs) and the quality dropped instantly. For those short few days at the start, it was actually my favourite coding agent, but somehow they killed it.

natch a day ago

Why is the pricing so well hidden? I had to ask Grok. Google would not show even the overview page unless I click-to-agree to all their terms and conditions.

OK found a good page for the plans here… ymmv if you're not logged in:

https://gemini.google/subscriptions/

rvnx a day ago

It should be illegal to say "> Highest task limits" or change them retroactively like Claude or Cursor did
- unreal6 21 hours ago
  
  In the middle of a billing cycle (which could be a month or year, in some cases), I would agree
jondwillis 20 hours ago

>had to use grok
Had to implies that you pointed other models at the task and they failed, or that grok is your go-to model for this.
Can you explain?
- natch an hour ago
  
  Yes, the reason Grok is the go-to is that it has live data from X conversations, so it’s very good for finding answers backed by threads on brand-new topics.
  In the old days we would search twitter for “sirens Potrero hill” and find users talking about what they were seeing live out their window that was the cause of the siren. Now grok has that same info.
  I don’t live on (in?) Potrero hill, just an example.

purpleidea 21 hours ago

I've been playing with it, and I've been generally not impressed.

There are both obvious annoying UI bugs (which should be easy to fix unless they vibe coded the whole thing) and the output of the tool isn't very good for anything but the simplest problems.

If the model was really good, I'd love this, but it's not.

xnx 20 hours ago

> If the model was really good, I'd love this, but it's not.
Might be worth trying again now:
"Jules now uses the advanced thinking capabilities of Gemini 2.5 Pro to develop coding plans, resulting in higher-quality code outputs"

lacoolj 20 hours ago

I've used this tool for a few months now and have been pretty impressed by it. It handles large quantities of tasks very well and is good at making tests for very specific/isolated functions.

I have found it is not very good when trying to make new projects with different react libraries, inside of existing projects (for instance, my admin UI that I had it place inside of my existing server project).

If you start noticing it change directories and move around and delete/move directories a lot, you should stop the process, reconsider what you're telling it to do and how, then start from scratch with a new task.

rcakebread 18 hours ago

I don't have much experience with using LLMs to help write code, but I gave Jules a try on a new, very unorganized Python project I recently started, about 800 lines of code. It needed a major refactoring, so I simply asked Jules to make suggestions.

At a cursory glance, it did a great job. It failed the first time. I gave it the error message and it fixed it. I was shocked it ran after that. Not bad for the free plan.

klipklop 16 hours ago

Jules quickly after a few messages/prompts just gets stuck in an endless loop like Gemini-cli does. These two products are really behind Claude Code at producing something beyond the most limited and basic changes.

The worst part is that there is no "STOP" button to quickly get it out of the loop it's stuck in.

ankit219 14 hours ago

This was a version where they wanted to collect data. In the next version, the Gemini CLI/Jules harness is likely part of the RL training environment, and the model should work a lot better. That's the trajectory of improvement.

surrTurr 7 hours ago

Why are AI products named so similarly? JetBrains Junie[1], now Jules...

[1]: https://www.jetbrains.com/junie/

Retr0id 21 hours ago

> over 140,000 code improvements shared publicly.

Where can I check them out?

thomasingalls 8 hours ago

It baffles me that I can't give it feedback by commenting on the pr. If it's bothering to offer a create PR workflow, I should be able to course correct targeted bits of code using GitHub's UI, no?

franze a day ago

the agent is good, the UI horrible.

"Usability declines in inverse proportion to the number of vice-presidents who sign the release notes." Law of Interface Inversion

esafak 21 hours ago

No signatures: https://jules.google/docs/changelog/

byefruit a day ago

How is this different from https://github.com/google-gemini/gemini-cli ?

Edit: it seems this is a hosted version. Would be nice if they actually joined up some of their products.

mattnewton a day ago

Idk, I think this is easier to talk about than “Codex” by OpenAI, which means either the CLI or the web interface to an agent with its own computer.
(Or a deprecated code fine tuned model)
esafak 21 hours ago

Being hosted, it does not have access to your development environment. Its Ubuntu sandbox is quite restricted. https://jules.google/docs/environment/
0x457 19 hours ago

Jules is web only from my understanding, similar to OpenAI's Codex (web version...)
You give it a task and it produces a PR. While gemini-cli is more like pair programming with AI.

leosanchez 4 hours ago

I would like for it to have GitHub issues integration.

itsalotoffun 4 hours ago

Something like this? https://blog.google/technology/developers/introducing-gemini...

ravenical 5 hours ago

Did they get AI to generate this pixel art? It looks terrible.

loliver666 5 hours ago

Yes, of course! Who cares if it looks worse than what a human can do; it was free and instant. And that's all that matters in our AI future.

venantius 8 hours ago

I feel like this is closest to trying to compete with Charm’s Crush rather than anything else.

UncleOxidant 21 hours ago

What does it mean by "asynchronous coding agent" exactly? They don't go into any details there. Like how does this differ from Gemini CLI? Is this more of a pass a high level idea to it and then go on vacation sort of thing? If so, I don't see how that can't end badly.

nemomarx 21 hours ago

give high level user stories to it > it writes code and tests and etc for several hours > returns to you when it thinks it's done for you to review a pull request or etc
- UncleOxidant 21 hours ago
  
  I'm afraid that's a hard nope. Gemini CLI is already doing stuff I don't want it to unless I'm very careful to keep it on a short leash.
  - 0x457 19 hours ago
    
    Well, first Jules came before Gemini CLI. Second, that's okay, as long as it can verify its work (i.e. run tests) it will eventually figure out what to do.
    Its sandbox is very limited and prevents proper grounding IMO. However, if their sandbox works for your project, it will be alright.
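The workflow described in this subthread can be sketched as a loop: hand off a task, let the agent iterate against the test suite, and get back either something to review or a give-up. This is only a hedged illustration. `propose_patch` and `run_tests` are hypothetical callables standing in for the model call and the sandboxed test run, and Jules or Codex don't necessarily work this way internally.

```python
from typing import Callable, Optional, Tuple


def run_async_task(
    task: str,
    propose_patch: Callable[[str, Optional[str]], str],  # hypothetical model call
    run_tests: Callable[[str], Tuple[bool, str]],         # hypothetical sandboxed test run
    max_attempts: int = 5,
) -> dict:
    """Iterate until the tests pass or the attempt budget runs out."""
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        patch = propose_patch(task, feedback)  # previous failure log is fed back in
        ok, log = run_tests(patch)
        if ok:
            # Grounded in passing tests; a human still reviews the resulting PR.
            return {"status": "ready_for_review", "patch": patch, "attempts": attempt}
        feedback = log
    return {"status": "gave_up", "patch": None, "attempts": max_attempts}
```

The attempt cap is what keeps an unattended run from burning tokens indefinitely. Without a test suite, there is no signal for the loop to converge on.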

herval 20 hours ago

Am I being pedantic, or does the jules.google landing page scream “how do you do, fellow kids” (the Buscemi meme)?

It tries to be funny and authentic, but the cheap-looking mascot and low-contrast text make it feel like IBM pretending to be a vibecoded startup.

Google has/had a distinct branding with its austere and no-nonsense style in the past, then moved into a clunky-but-not-AWS design aesthetic with GCP (which is still recognizable), and now the AI products just look so completely inconsistent, you can’t even tell they’re from Google

jmtulloss 19 hours ago

Both from the design scheme and the process it uses to go about its business, Jules seems very inspired by replit

theusus a day ago

Used it, didn't like it. Claude Code is far better because of the active collaboration part.

mvieira38 a day ago

Different use cases, IMO. With a cloud solution like this it's much easier to ask it to solve whatever issues or backlog tasks you have and continue working on your own on your main project. I don't think this is a solution for vibecoding or for the AI copilot crowd
- throwup238 a day ago
  
  It is also great for on the go when you only have a phone. I frequently fire off agents when I get a new idea or some backlog I want to tackle while I’m at the gym; the 2-minute rest periods between sets are perfect for writing up a prompt or reviewing some changes.
r0fl a day ago

I thought I would like it based on the pitch, but gave up using it after just a handful of times.
Liking Kiro a lot these days.
- ghawkescs a day ago
  
  How long is the queue for invites to Kiro these days? I joined the wait-list right after it launched.
  - jjani a day ago
    
    Seemingly infinite, I don't think they've invited anyone from the list so far.
    
    0x457 19 hours ago
    
    Do you need an invitation? I'm just using my Amazon Q Dev account that I pay $20 a month for. Works fine with Kiro.
beefnugs 21 hours ago

So are there just 100 developers sitting on the edge of their seats constantly refreshing all the spy reports from other AI companies, waiting to copy the exact same idea and shit it out at top speed?
Or is it more of a vibe code thing where every new feature from everyone is recreated by every other company in a matter of days?
Do they even realize they are destroying their own industry economics? The only reason anyone uses big tech is because there are no alternatives

mcemilg 7 hours ago

Sorry, but it's garbage for now, at least in my case. From an asynchronous agent, I would expect it to get more done in 10 minutes than a regular agent like Claude Code. Instead, you give it a task, wait 10 minutes, and get garbage code. Then you provide feedback, wait another 10 minutes, and still get something that doesn't compile. Meanwhile, Claude Code does this in 10 seconds and usually produces runnable code.

siva7 7 hours ago

That's the problem with remote async agents. You can't steer the ship effectively anymore. Sure, I have 10 pull requests in 10 minutes by now, and I'm throwing all of them out because they are crap after I spent about an hour reviewing them.

SchizoDuckie a day ago

Who in their right mind hands off tasks to one of these for their day job? They can never be trusted.

esafak 21 hours ago

You have to review their work, the same as any human's. What's the matter, you don't like cheap assistants?
- munificent 21 hours ago
  
  > What's the matter, you don't like cheap assistants?
  I think the main reason I'm not personally excited about AI is that... no, I don't, actually.
  I'm in my late 40s. I have had many opportunities to move into management. I haven't because while I enjoy working with others, I derive the most satisfaction from feeling like I'm getting my hands dirty and doing work myself.
  Spending the entire day doing code reviews of my army of minions might be strictly more productive, but it's not a job I would enjoy having. I have never, for a second, felt some sort of ambitious impulse to move up the org chart and become some sort of executive giving marching orders.
  The world that AI boosters are driving towards seems to me to be one where the only human jobs left are effectively middle management, where the leaf nodes of the org chart are all machines. It may be the case that such a world has greater net productivity and the stock prices will go up.
  But it's not a world that feels meaningful, dignified, or desirable to me.
  - nerdix 17 hours ago
    
    There are 3 kinds of developers.
    1. Those that are motivated by "building things". The actual programming is just a means to an end.
    2. Those that are motivated by the salary alone and either hate the work or are indifferent to it.
    3. Those that are motivated by the art of programming itself. Hands on keyboard, thinking through a problem and solving it with code.
    Developers who fall into categories 1 and 2 love AI. It's basically a dream come true for them ("I knocked out 3 side projects in a month" for #1 and "You're telling me that all I have to do is supervise the AI and I still get paid?" for #2).
    It's basically a living nightmare for developers in category 3.
    I've noticed that founders seem to be way higher on AI than non-founders. I think a lot of founders fit into category 1.
    
    munificent 15 hours ago
    
    I wouldn't say "kinds" because they overlap. Every developer has their own mixture of how much each of those categories is rewarding for them.
    But, yes, I think that's a good breakdown of where most of the reward from coding comes from.
  - lbrito 21 hours ago
    
    I feel exactly this but I'm in my mid 30s. You're lucky in the sense that you probably have a longer career and may be able to retire.
    
    munificent 21 hours ago
    
    I'm definitely not at retirement age yet, but I do have to admit that I'm hopeful I can make it to retirement while still mostly working in a way that I enjoy.
    At the same time, I've realized that "let me just try to squeeze out the last of my career" is a really unhealthy mindset for me to hold. It sort of locks me into a feeling like my best days are behind me or something.
    So I am trying to dabble in using AI for coding and trying to make sure I stay open-minded and open to learning new things. I don't want to feel like a dinosaur.
    
    freshtake 19 hours ago
    
    I've used all of the popular coding agents, including Jules. The reality to me is that they can and should be used for certain kinds of low severity and low complexity tasks (documentation, writing tests, etc.). They should not be used for the opposite end of the spectrum.
    There are many perspectives on coding agents because there are many different types of engineers, with different levels of experience.
    In my interactions I've found that junior engineers overestimate or overuse the capabilities of these agents, while more senior engineers are better calibrated.
    The biggest challenge I see is what to do in 5 years once a generation of fresh engineers never learned how compilers, operating systems, hardware, memory, etc actually work. Innovation almost always requires deep understanding of the fundamentals, and AI may erode our interest in learning these critical bits of knowledge.
    What I see as a hiring manager is senior (perhaps older) engineers commanding higher comp, while junior engineers become increasingly less in demand.
    Agents are here to stay, but I'd estimate your best engineering days are still ahead.
  - keeda 17 hours ago
    
    You're missing a third option, which is actually closer to the role of managing coding agents: being a "senior engineer / architect / what-have-you". IME the more senior engineering roles (staff, principal, fellow, etc.) in most companies, especially Big Tech companies, involve coordinating large projects across multiple teams of engineers. It is essentially a necessity to achieve the scale of impact required at those levels.
    At that level, you almost never get to be hands-on with code; the closest you get is code reviews. Instead you "deliver value" through identifying large-scale opportunities, proposing projects for them, writing design and architecture docs, and conducting "alignment meetings" where you convince peers and other teams to build the parts needed to achieve your vision. The actual coding grunt work is done by a bunch of other, typically more junior engineers.
    That is also the role that often gets derided as "architecture astronauts." But it is still an extremely technical role! You need to understand all the high-level quirks of the underlying systems (and their owners!) to ensure they can deliver what you envision. But your primary skills become communication and people skills. When I was in that role, I liked to joke that my favorite IDEs are "IntelliJ, Google Docs, and other engineers."
    You'll note that is a very different role from management, where your primary responsibilities are more people-management and owning increasingly large divisions of the business. As a senior engineer you're still a leaf node in the org-chart, but as a manager you have a sub-tree that you are trying to grow. That is where org-chart climbing (and uncharitably, "empire-building") become the primary skillset.
    As such, the current Coding Agent paradigm seems very well-suited for senior engineers. A lot of the skillsets are the same, only instead of having to persuade other teams you just write a design doc and fire off a bunch of agents, review their work, and if you don't like their outputs, you can try again or drop down to manual coding.
    Currently, I'm still at the "pair-program with AI" stage, but I think I'll enjoy having agents. These days I find that coding is just a means to an end that is personally more satisfying: solving problems.
    
    majormajor 14 hours ago
    
    > As such, the current Coding Agent paradigm seems very well-suited for senior engineers. A lot of the skillsets are the same, only instead of having to persuade other teams you just write a design doc and fire off a bunch of agents, review their work, and if you don't like their outputs, you can try again or drop down to manual coding.
    I have tried this a few times; it's not there yet. The failures are consistently shaped enough to make me wonder about the whole LLM approach.
    Compared to handing off to other engineers there are a few problems:
    - other engineers learn the codebase much better over time, vs relying on either a third party figuring out the right magic sauce to make it understand/memoize/context-ize your codebase or a bunch of obnoxious prompt engineering
    - other engineers take feedback and don't make the same types of mistakes over and over. I've had limited luck with things like "rules" for more complex types of screwups - e.g. "don't hack a solution for one particular edge case three-levels deep in a six-level call tree, find a better abstraction to hoist out the issue and leave the codebase better than you found it"
    - while LLMs are great at writing exhaustive coverage tests of simple functionality, they aren't users of the project and generally struggle to really get into that mindset to anticipate cross-cutting interactions that need to be tested; instead you get a bunch of local maxima "this set of hacks passes all the current testing" candidate solutions
    - the "review" process starts to become silly and demoralizing when your coworker is debating with you about code neither of you wrote in a PR (I sure hope you're still requiring a second set of human eyes on things, anyway!)
    If you have a huge backlog of trivial simple small-context bugs, go nuts! It'll help you blow through that faster! But be prepared to do a lot of QA ;)
    Generally I'd call most of the issues "context rot" in that even after all the RL that's been done on these things to deal better with out-of-distribution scenarios, they still struggle with the huge amount of external context that is necessary for good engineering decision making in a large established codebase. And throwing more snippets, more tickets, more previous PRs, etc, at it seems to rapidly hit a point of diminishing returns as far as its "judgement" in picking and following the right bits from that pile at the right time.
    It's like being a senior engineer with a team of interns but who aren't progressing so you're stuck as a senior engineer cleaning up crappy PRs constantly without being able to grow into the role of an architect who's mentored and is guiding a bunch of other staff and senior engineers who themselves are doing more of the nitty gritty.
    Maybe the models get better, maybe they don't. But for now, I find it's best to go for the escape hatch quickly once things start going sideways. Because me getting better at using today's models won't cause any generational leap forward. That feels like it will only come from lower level model advances, and so I don't want to get better at herding interns-who-can't-learn-from-me. Better for now to stick to mentoring the other people instead.
  - esafak 21 hours ago
    
    You could consider yourself liberated to concentrate on higher level concerns like architecture and API/product design.
    
    munificent 15 hours ago
    
    I've never seen someone who can do good architecture, API, or product design that doesn't deeply relish getting their hands dirty all the way down in the guts of the thing. (To be clear, I have seen plenty of people who like getting their hands dirty who also suck at design. It's a necessary but not sufficient condition.)
    How can you do good design work if the only "people" who have experience with what you're designing are the AI agents you order around? I guess if you're designing an API that you only intend to be used by other AI agents, that's probably fine.
    At some point, though, it's gotta feel like working at a pet food company coming up with new cat food recipes. You can be very successful by focus testing on cats, but you'll never really know the thing you're making. (No judgement if you do want to eat cat food, I guess.)
    
    dingnuts 21 hours ago
    
    oh come on, I got into this field because I like to code.
    now I'm liberated to do all the crap I don't like and never code. fuck off
- vb-8448 21 hours ago
  
  They will produce PRs (and probably shitty code) at a rate you can't keep up with reviewing XD
  - lbrito 21 hours ago
    
    There will likely be another agent to review the PRs and make questionable choices :D
    
    9dev 21 hours ago
    
    And all that with an energy requirement a lot higher than a single human just doing it right in the first place, and learning something in the process. It all seems so incredibly weird and futile to me.
    
    vb-8448 21 hours ago
    
    And tokens will burn and providers will bill XD
  - esafak 21 hours ago
    
    And it often does! When I don't like its work I provide stricter instructions and repeat if I think it will succeed.
    I still end up ahead.
- swat535 14 hours ago
  
  An assistant is an intelligent human being who understands basic concepts; they are not a slot machine like AI is.
  My experience using these is that it takes more time to reverse engineer the bloat they spit out than to write the thing myself.
  God help you if you attempt to teach them anything, they will say "You're absolutely right!" and then continue churning out the same broken code.
  Then you have to restart with a "fresh" context and give them the requirements from scratch and hope that this time they come up with something better.
- percentcer 21 hours ago
  
  Assistants can be taught
  - esafak 21 hours ago
    
    And these models get upgraded -- at a much faster averaged rate than humans. Continual vs punctuated improvement :)
- SchizoDuckie 21 hours ago
  
  I can trust humans to do as I ask.
  - jondwillis 20 hours ago
    
    Have you met humans? I can’t trust myself with half of the things I do.
    Not saying that I trust LLMs more…
ActionHank a day ago

They can be great for focused tasks with very specific acceptance criteria. Especially in cases where you have broad test coverage that can verify nothing broke.
We already see bots that monitor repos to bump versions. I suspect we will see this expand to handle larger version bumps, minor issues, minor features. Basically junior dev learning tasks.
- SchizoDuckie 21 hours ago
  
  Great. So Junior devs will be useless now. Now how are we going to train more senior devs that know what they're doing?
  - alex_suzuki 21 hours ago
    
    No need. In a year, senior devs will be useless as well. </sarcasm>
  - ianandrich 21 hours ago
    
    That's the neat part. We won't.
  - seunosewa 21 hours ago
    
    They will train themselves by doing open source projects with AI.
  - midnitewarrior 21 hours ago
    
    I really appreciate your optimism about a future world where you expect senior devs will be needed. How do we get the tech bros to share your vision for the future?
    
    SchizoDuckie 21 hours ago
    
    As it stands right now, until there is some radically new way that doesn't hallucinate implementations, is grounded in security rules and actually understands what it's doing in the larger context of the system it's working in I am not really worried about my job.
    I stopped worrying about what techbros think a long time ago. Yesterday on Twitter I saw one slinging a blockchain AI NFT filesystem that will ingest and organize your documents for you.
  - brap 21 hours ago
    
    I’m sorry but how is that any of your business?
    If a company prefers small teams right now, at the cost of not having juniors to grow into seniors in the future, they are well within their rights to make that decision.
    Might be an awful decision, might be a smart one, in any case there is no “we” here.
    
    SchizoDuckie 21 hours ago
    
    How is that any of my business? Well, I'm a software dev by trade and hobby, and I hack the planet on the side and advise multibillion $$$ companies on the security mistakes they make.
    Even for the next 5 years I'd like to be able to have some capable humans in my teams.
    
    brap 19 hours ago
    
    Then hire juniors for your own team? How is this an issue?
    
    vineyardmike 21 hours ago
    
    > I’m sorry but how is that any of your business?
    Part of living in a society is considering the social impact of things. Such as the erosion of training opportunities for young talent.
    Each business can make their own decisions, but someone should be thinking about the greater good. “Within your rights” doesn’t mean it’s a good thing, nor should that be the sole standard we set for members of our society. Same reason people hire interns and write technical blogs, open source code and sponsor school hackathons. Sometimes the greater good should be a consideration.
    
    brap 19 hours ago
    
    >Same reason people hire interns and write technical blogs
    I’m sorry but almost nobody does this for the greater good
jmtulloss 20 hours ago

https://blog.singleton.io/posts/2025-06-14-coding-agents-cro...
The former CTO of stripe, for one.
They show you the code they produce. Why wouldn’t you trust it after reading it?
asadm 19 hours ago

i do. ALL. THE. TIME.

dvngnt_ 21 hours ago

How long do we think it will take for Google to rename this, like they did from Bard > Gemini?

arcticfox 21 hours ago

My problem with Codex is it can't really run Docker. Can Jules or any other competitor?

timrogers 18 hours ago

PM for GitHub Copilot coding agent here!
Our asynchronous coding agent can run Docker in its GitHub Actions-powered development environment - for example it could start a Dockerized web server.
You can learn more about the agent at https://docs.github.com/en/copilot/concepts/coding-agent/cod....
achierius 21 hours ago

Claude Code can.
- 0x457 19 hours ago
  
  Entirely different product. Claude Code runs on my machine; Jules runs on some sandboxed Ubuntu VM in GCP without any input (beyond a high-level user story prompt).

computer23 19 hours ago

Waiting for Google to buy the rights to Ask Jeeves.

oblio a day ago

What does this compete with?

felipemesquita a day ago

The codex thing inside ChatGPT, the copilot thing in the github web ui
throwup238 21 hours ago

OpenAI Codex, Github Copilot Agents, Cursor Background Agents, and Devin.
- rmonvfer 21 hours ago
  
  Is Devin still alive?
  - hiatus 20 hours ago
    
    Yes, and Cognition AI bought Windsurf.
joshdick a day ago

Claude Code
- mattnewton a day ago
  
  No, Gemini CLI competes with that. This competes in the web-interface-around-an-agent-with-its-own-machine space,
  not the pair-programming-on-your-machine space I would put the CLI tools in.
- loloquwowndueo a day ago
  
  They even had to choose a French-sounding name to make the comparison clear?
  - longtimelistnr 21 hours ago
    
    Don't shoot the messenger, but it's supposed to be like a "butler-sounding name".
    
    loloquwowndueo 20 hours ago
    
    Got it - and Jenkins and Hudson were already taken.
    
    nathan_douglas 20 hours ago
    
    Wadsworth is free. brb startup
esafak 21 hours ago

Hosted agents, followed by local CLI agents.
span_ a day ago

Codex by openai

kundi 21 hours ago

Tried both Jules and Gemini CLI: heavily advertised and disappointing. Running them on any slightly more complex codebase, they crash every few iterations and then complain that I have drained all my credits (although nothing has been done yet). They come nowhere close to living up to their advertised generosity. Disappointing experience.

connectsnk 17 hours ago

I am confused. Is this a Claude Code competitor?

simonpure 20 hours ago

There's now also Gemini CLI GitHub Actions for a similar async experience -

https://github.com/google-github-actions/run-gemini-cli

pmarreck 17 hours ago

Somehow I missed this news from May!

varispeed 19 hours ago

They could have called it Deidre. Missed opportunity.

jebronie a day ago

It's way better than the GitHub thing; in my experience it produces usable PRs.

0x457 19 hours ago

A blind monkey smashing a keyboard can produce better PRs and PR reviews than GitHub Copilot. I don't get how they managed to make Copilot so bad.

42lux a day ago

The naming is pathetic.

esafak 21 hours ago

Jules the octopus!