Cake day: February 5th, 2025

  • I expect “throwaway scripts” to still be written in a way that defends against all of the innumerable shitass foot guns present in the language. Claude was incapable of doing this in a reasonable time frame.

    There’s the problem with your expectations. You may be able to follow your little guide to bash problems and “best practices” but defending against the innumerable shitass footguns present in bash is not a task that can be accomplished by anybody in a reasonable timeframe…

    I wasn’t so thrilled with Claude in the October 2025 timeframe - Opus was slow and costly and wrote unnecessarily weird solutions for simple problems, and Sonnet would still get caught in loops where each bug fix created a new bug. It (and the other models like Gemini, GPT, etc.) has improved significantly since then. Back then it wasn’t hard to “make the tool look bad.” It’s still not too hard to make the tools look bad today if you try, but it is much easier for me to make them look good.

    I, too, would be in a more sour mood if I hated the tools and still had to demonstrate to management “how we’re going to leverage AI for software development” - which is one of our goals this year.


  • Gemini seems more willing to just tell me what it “thinks” the answer to a question is based off of its training data, which is not a particularly reliable thing for an LLM to do.

    Yeah. I pay for Claude, my company pays even more for Cursor, so comparing them to free Gemini probably isn’t fair.

    Gemini is very useful for offhand queries while Claude is chewing on a bigger problem, but if it’s something that needs complex analysis and/or extensive research… the tools that let you build up a folder full of files related to the task are vastly superior to chatbots. Gemini does have a Claude Code-style command line tool that does that kind of development in a folder; I didn’t install it until last week. I gave it a coding problem to work on (look up real-time weather radar data from NOAA and present recent data on a map on a webpage)… it sort of succeeded, but with a poor user experience. Again, I’m in “Free mode,” which can do quite a bit on a day’s allowance of tokens, but… I doubt their paid modes are much higher quality. If they are, they’re doing themselves a tremendous disservice by demoing such substandard performance in free mode.



  • having to remind the AI once or twice that the change it wanted to make would run afoul of that.

    I find that doing periodic “directed reviews” with the AI agent looking for violations is helpful. Not a 100% guarantee of compliance on the first pass, but pretty close after 3 or 4 passes from varying perspectives.

    Also, when something is really important, having the agent break up the changes into pieces as small as practical, then reviewing each change by hand as it goes through, is definitely more confidence-inspiring than attempting to review 3000 lines of change in a 20,000 line module of code all at once against a 20 page description of changes.




  • I find that I get the best results when I develop a suite of documents in parallel with the code: requirements, architecture, designs, lessons learned, development plans, indexes into those documents, and traceable ID tags on atomic, testable item descriptions. When a new agent is introduced to the project, it can “get up to speed quickly” by jumping to the current working point in the development plan and indexing into all the relevant details in the other documents before even starting to read the existing code.

    That working method itself is evolving, and each new LLM-driven project builds on the processes of the previous successful projects…
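    To make “traceable ID tags” a bit more concrete, here is a minimal Python sketch of the kind of check an agent (or a human) could run during a review pass to keep the requirements document and the code in sync. It assumes a hypothetical tag format `REQ-NNN`, requirements defined one per line in a plain-text document, and code that cites them in comments; `trace_report`, `TraceReport`, and the sample files are made up for illustration and are not part of any particular tool.

```python
import re
from dataclasses import dataclass

# Hypothetical tag format: "REQ-" followed by exactly three digits, e.g. REQ-014.
TAG_PATTERN = re.compile(r"\bREQ-\d{3}\b")


@dataclass
class TraceReport:
    untraced: list[str]          # defined in the requirements doc, never cited in code
    unknown: list[str]           # cited in code, but not defined in the requirements doc
    where: dict[str, list[str]]  # tag -> sorted names of the files that cite it


def find_tags(text: str) -> set[str]:
    """Return every distinct requirement tag that appears in text."""
    return set(TAG_PATTERN.findall(text))


def trace_report(requirements_doc: str, sources: dict[str, str]) -> TraceReport:
    """Cross-check tags defined in the requirements doc against tags cited in sources.

    sources maps a file name to that file's contents.
    """
    defined = find_tags(requirements_doc)
    cited: dict[str, list[str]] = {}
    for name, text in sources.items():
        for tag in find_tags(text):
            cited.setdefault(tag, []).append(name)
    return TraceReport(
        untraced=sorted(defined - set(cited)),
        unknown=sorted(set(cited) - defined),
        where={tag: sorted(files) for tag, files in sorted(cited.items())},
    )


# Illustrative inputs only; a real project would read these from files on disk.
requirements = """\
REQ-001 The cleanup script shall refuse to run without a config file.
REQ-002 The cleanup script shall log every file it deletes.
REQ-003 The cleanup script shall exit non-zero on any failure.
"""
sources = {
    "cleanup.py": "# REQ-001: abort early when the config is missing\n"
                  "# REQ-002: log each deletion\n",
    "net.py": "# REQ-004: retry failed downloads\n",
}

report = trace_report(requirements, sources)
print("untraced:", report.untraced)  # → untraced: ['REQ-003']
print("unknown:", report.unknown)    # → unknown: ['REQ-004']
```

    The `untraced` list shows requirements with no code behind them yet, and `unknown` flags code citing an ID nobody wrote down: exactly the kind of drift a freshly started agent would otherwise inherit without noticing.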



  • WTF are you expecting Claude to code in bash?

    I have found Sonnet and Opus to both be very capable in bash, but then, I don’t usually ask bash to do super-complex things - its syntax is just too screwy for building big applications.

    I will say, you might be misguiding the LLM by filling it full of bad examples before starting. It’s kind of like the advice about not staring at a tree downslope while skiing: if you’re fixated on it, you’re MORE likely to hit it.


  • I couldn’t possibly deploy with any confidence a large project or honestly a small project I expected someone to rely on without layers of test.

    In my world, that depends almost entirely on how “dynamic” the code base is expected to be after release. We send a lot of things into the field, thousands of copies used for important work, and for many of them we know that certain aspects of the system are unlikely to change once released. Others are very likely to change. “Back in the day” we’d make reasoned judgment calls about which ones would benefit from the effort of unit / integration testing and for which ones that effort would be better invested elsewhere. As time marches on, our procedures and cross-departmental “advisors” who aren’t so cozy with the code are relentlessly pushing for more and more automated testing. It is safer, no argument, but it also delays launch - sometimes without added value IMO.


  • The hassle is all on the agent, not on me.

    So much this. That hassle on the agent, a few minutes of me waiting for it to crunch out the unit tests, saves me tons of hassle later - not going in circles re-fixing problems that were fixed before.

    Same for keeping implementation code and documentation in sync - I’ve got hundreds of out-of-date wiki pages that simply aren’t worth my time to fix. But when the agent is keeping the docs in sync, I just tell it to do it and wait a few minutes - totally worth the effort.



  • I’ve been using it rather heavily since about October of last year. I definitely notice the models getting better, and the tools around the models are starting to do some things automatically that I had to manually prompt for last year (especially remembering key instructions). I also believe I am getting better at using them; how much that contributes to my overall results is extremely hard to quantify, but the feeling is definitely there. For example, last October I used to “just ask” for things without having a documented set of requirements. Today, I know that a requirements document is necessary whenever the complexity goes beyond a one-off, simple example of how to do something relatively trivial.


  • using the right tools and giving them the right instructions.

    The right tools are definitely key. Back an eternity ago, like October 2025, there was only Claude IMO if you wanted anything bigger than about a page of code. The others have come a long way - they are better now than Claude was then - and I still feel like Claude is out in front, though by a less dramatic margin now.

    As for “the right instructions,” I’d say it’s more “use the right process,” which basically means applying all the best practices developed over the past decades for human development - the ones we old farts from before their time “don’t need all that, it’s a waste of time,” because, basically, we internally practice most of the discipline without doing the documentation. With the AI tools: document your requirements, your architecture, your tool selection process, designs, and development plan; comment the code with traceability to why it is being written; write unit and integration tests; do reviews; record lessons learned; etc. Having all that documentation kept with the project, well organized, is key to “bringing the AI agent up to speed,” which you may be doing often. The agents really do demonstrate the eternal sunshine of the spotless mind, so if you have them take the time to write everything relevant down as they go (not just the code), then when a new one comes online it can jump into the middle of a development plan without repeating (as many) mistakes or making (as many) bad assumptions.

    To be brutally honest, working with AI coding agents reminds me a LOT of working with overseas programmer consultants - if you don’t get everything in writing you’re gonna have a bad time.



  • In the late 1980s there was a time where we seriously weighed the option of hand assembly vs using compilers and hand assembly didn’t always lose. In the early 1990s I wanted to use C++ but the available compiler for IBM compatible PCs was too buggy to be of value.

    By the mid 1990s that had changed: good C compilers were exceeding all but the highest-effort human assembly code, and if you didn’t like how it looked in assembly, you could much more easily “fix it” with a tweak to the C code than to the assembly. I feel like we’re sort of getting there with AI agent LLMs today: if you don’t like what it provided, tell it why and let it try again. Using the tool is usually faster and easier, and gets a better product for the time invested, than calling it a slop box and doing it yourself.



  • I’ve seen very mixed results depending on which model I’m using. The newer ones, since about November of 2025, have been getting significantly better - but some of the “free class” tools are still using older ones today.

    Free Gemini gave me ridiculously bad advice about how to get through a traffic jam today. Free Gemini also drew the crudest sketch imaginable for a prompt; the same prompt fed to ChatGPT yielded a really nice quality cartoon panel with basically everything in the prompt, plus some nice, appropriate embellishments.