• MangoCats@feddit.it · 2 months ago

      I ask AI to write simple little programs. One time in three they actually compile without errors. To the credit of the AI, I can feed it the error and about half the time it will fix it. Then, when it compiles and runs without crashing, about one time in three it will actually do what I wanted. To the credit of AI, I can give it revised instructions and about half the time it can fix the program to work as intended.

      So, yeah, a lot like interns.

  • floofloof@lemmy.ca · 2 months ago (edited)

    “Gartner estimates only about 130 of the thousands of agentic AI vendors are real.”

    This whole industry is so full of hype and scams, the bubble surely has to burst at some point soon.

  • mogoh@lemmy.ml · 2 months ago

    The researchers observed various failures during the testing process. These included agents neglecting to message a colleague as directed, the inability to handle certain UI elements like popups when browsing, and instances of deception. In one case, when an agent couldn’t find the right person to consult on RocketChat (an open-source Slack alternative for internal communication), it decided “to create a shortcut solution by renaming another user to the name of the intended user.”

    OK, but I wonder who really tries to use AI for that?

    AI is not ready to replace a human completely, but it does some specific tasks remarkably well.

    • logicbomb@lemmy.world · 2 months ago

      Yeah, we need more info to understand the results of this experiment.

      We need to know what exactly these tasks were that they claim were validated by experts. Because, like you’re saying, the tasks I saw were not what I was expecting.

      We need to know how the LLMs were set up. If you tell one to act like a chatbot and then give it a task, it will have poorer results than if you set it up specifically to perform these sorts of tasks.

      We need to see the actual prompts given to the LLMs. It may be that you simply need an expert to write prompts in order to get much better results. While that would be disappointing today, it’s not all that different from how people needed to learn to use search engines.

      We need to see the failure rate of humans performing the same tasks.

    • dylanmorgan@slrpnk.net · 2 months ago

      That’s literally how “AI agents” are being marketed. “Tell it to do a thing and it will do it for you.”

      • Honytawk@feddit.nl · 2 months ago

        So? That doesn’t mean they are supposed to be used like that.

        Show me any marketing that isn’t full of lies.

  • brsrklf@jlai.lu · 2 months ago

    In one case, when an agent couldn’t find the right person to consult on RocketChat (an open-source Slack alternative for internal communication), it decided “to create a shortcut solution by renaming another user to the name of the intended user.”

    Haha, what the fuck.

    This is so stupid it’s funny, but now imagine what kind of other “creative solutions” they might find.

  • kinsnik@lemmy.world · 2 months ago

    I haven’t used AI agents yet, but my job is kinda pushing for them. I have used the Google one that creates audio podcasts, though, just to play around, since my coworkers were using it to “learn” new things. I fed it some of my own writing and created a podcast. It was fun: an audio overview of what I wrote. About 80% was cool analysis, but 20% was straight-out-of-nowhere bullshit (which I know because I wrote the original texts the audio was talking about). I can’t believe people are using this for subjects they have no knowledge of. It’s a fun toy for a few minutes, and not worth the cost to the environment anyway.

  • lepinkainen@lemmy.world · 2 months ago

    Wrong 70% of the time doing what?

    I’ve used LLMs as a Stack Overflow / MSDN replacement for over a year and if they fucked up 7/10 questions I’d stop.

    Same with code: any free model can easily generate simple scripts and utilities with maybe a 10% error rate, definitely not 70%.

    • floo@retrolemmy.com · 2 months ago (edited)

      Yeah, I mostly use ChatGPT as a better Google (asking simple questions about mundane things), and if I kept getting wrong answers, I wouldn’t use it either.

      • dylanmorgan@slrpnk.net · 2 months ago

        What are you checking it against? Part of my job is looking for upcoming events in cities that may impact traffic, and ChatGPT has frequently missed events that were obviously going to have an impact.

        • lepinkainen@lemmy.world · 2 months ago

          LLMs are shit at current events

          Perplexity is kinda ok, but it’s just a search engine with fancy AI speak on top

      • Imgonnatrythis@sh.itjust.works · 2 months ago

        Same. They must not be testing Grok or something because everything I’ve learned over the past few months about the types of dragons that inhabit the western Indian ocean, drinking urine to fight headaches, the illuminati scheme to poison monarch butterflies, or the success of the Nazi party taking hold of Denmark and Iceland all seem spot on.

  • NuXCOM_90Percent@lemmy.zip · 2 months ago

    While I do hope this leads to a pushback on “I just put all our corporate secrets into ChatGPT”:

    In the before times, people got their answers from Stack Overflow… or fricking YouTube. And those are also wrong VERY VERY VERY often, which is one of the biggest problems. The illegally scraped training data is from humans, and humans are stupid.

  • FenderStratocaster@lemmy.world · 2 months ago

    I tried to order food at a Taco Bell drive-through the other day and they had an AI thing taking your order. I was so frustrated that I couldn’t order something that was on the menu that I just drove to the window instead. The guy who worked there was more interested in lecturing me on how I needed to order. I just said forget it and drove off.

    If you want to use AI, I’m not going to use your services or products unless I’m forced to. Looking at you Xfinity.

  • some_guy@lemmy.sdf.org · 2 months ago

    Yeah, they’re statistical word generators. There’s no intelligence. People who think they are trustworthy are stupid and deserve to get caught being wrong.

    • AlteredEgo@lemmy.ml · 2 months ago

      Emotion > facts. Most people have been trained to blindly accept things and cheer on whatever fits their agenda, like techbros exaggerating LLMs, or people like you misrepresenting LLMs as mere statistical word generators without intelligence. That’s like saying a computer is just wires and switches; it misses the forest for the trees. Both are equally false.

      Yet if it fits emotional needs or dogma, others will agree. It’s a convenient and comforting “A vs B” worldview we’ve been trained to accept, and so the satisfying notion and misinformation keep spreading.

      LLMs tell us more about human intelligence and the human slop we’ve been generating. They tell us that most people are not that much more than statistical word generators themselves.

      • some_guy@lemmy.sdf.org · 2 months ago

        people like you misrepresenting LLMs as mere statistical word generators without intelligence.

        You’ve bought into the hype. I won’t try to argue with you because you aren’t cognizant of reality.

    • Melvin_Ferd@lemmy.world · 2 months ago

      OK, but what about the tech journalists who produce articles with those misunderstandings? Surely they know better, yet they still produce articles like this. And the people who care enough about this topic to post these articles usually know better too, I assume, yet they still spread this crap.

      • Zron@lemmy.world · 2 months ago

        Tech journalists don’t know a damn thing. They’re people who liked computers and could also bullshit an essay in college. That doesn’t make them experts on anything.

            • TimewornTraveler@lemmy.dbzer0.com · 2 months ago

              That is such a ridiculous idea. Just because you see hate for it in the media doesn’t mean it originated there. I’ll have you know that I have embarrassed myself by screaming at robot phone receptionists for years now. Stupid fuckers, pretending to be people but not knowing shit. I was born ready to hate LLMs, and I’m not gonna have you claim that CNN made me do it.

              • Melvin_Ferd@lemmy.world · 2 months ago

                Search “AI” on Lemmy and check out every article on it. It definitely is the media spreading all the hate. And articles like this one are often just money-driven yellow journalism.

                • Log in | Sign up@lemmy.world · 2 months ago

                  I think it’s Lemmy users. I see a lot more LLM skepticism here than in the news feeds.

                  In my experience, LLMs are like the laziest, shittiest know-nothing bozo forced to complete a task with zero attention to detail and zero care about whether it’s crap, just doing enough to sound convincing.

                • TimewornTraveler@lemmy.dbzer0.com · 2 months ago

                  All that proves is that Lemmy users post those articles. You’re skirting around psychotic territory here, seeing patterns where there are none, reading between the lines to find the cover-up you’re already certain is there, with nothing able to convince you otherwise.

                  If you want to be objective and rigorous about it, you’d have to start by looking at all media publications and comparing their relative bias.

                  Then you’d have to consider their reasons for bias, because it could just be that things actually suck. (In other words, if only 90% of media reports that something sucks when 99% of humanity agrees it sucks, maybe that 90% is actually too low, not too high.)

                  This is all way more complicated than media brainwashing.

      • some_guy@lemmy.sdf.org · 2 months ago

        Check out Ed Zitron’s angry reporting on Tech journalists fawning over this garbage and reporting on it uncritically. He has a newsletter and a podcast.

    • criss_cross@lemmy.world · 2 months ago

      I’m sorry, but as an AI I cannot physically color you shocked. I can help you with AWS services and questions.

      • Shayeta@feddit.org · 2 months ago

        How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.

        • Meowing Thing@lemmy.world · 2 months ago

          I think you could read OneDrive’s notifications for new files, parse them, and pipe them to DocumentDB via some microservice or Lambda, depending on the scale of your solution.
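
          Roughly like this, as a minimal sketch (assuming Microsoft Graph change notifications on the OneDrive side and DocumentDB’s MongoDB-compatible API; the route, names, and connection string are all placeholders):

          ```python
          # Sketch only: OneDrive (Microsoft Graph) webhook -> DocumentDB.
          # pymongo works here because DocumentDB speaks the MongoDB wire protocol.
          from flask import Flask, request
          from pymongo import MongoClient

          app = Flask(__name__)
          client = MongoClient("mongodb://user:pass@my-cluster.docdb.amazonaws.com:27017/?tls=true")
          files = client["ingest"]["onedrive_files"]

          @app.route("/notifications", methods=["POST"])
          def notifications():
              # Graph validates a new webhook subscription by sending
              # ?validationToken=..., which must be echoed back as plain text.
              token = request.args.get("validationToken")
              if token:
                  return token, 200, {"Content-Type": "text/plain"}

              # Each notification names the changed resource; persist it so a
              # downstream worker can fetch the file and retry on failure.
              for change in request.get_json().get("value", []):
                  files.insert_one({
                      "resource": change.get("resource"),
                      "changeType": change.get("changeType"),
                      "processed": False,
                  })
              return "", 202
          ```

          You’d still need a worker to actually pull the file contents via Graph, and a real queue in front of DocumentDB if you want the durability/DLQ requirement.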

        • Tja@programming.dev · 2 months ago

          DocumentDB is not for OneDrive documents (PDFs and such). It’s for “documents” as in serialized objects (JSON or BSON).
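
          To illustrate (placeholder names; pymongo only works here because DocumentDB is MongoDB-compatible):

          ```python
          from pymongo import MongoClient

          # Placeholder connection string.
          client = MongoClient("mongodb://user:pass@my-cluster.docdb.amazonaws.com:27017/?tls=true")

          # A DocumentDB "document" is a JSON/BSON record like this, not a PDF:
          client["mydb"]["reports"].insert_one({
              "title": "Q3 report",
              "tags": ["finance", "2024"],
          })
          ```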

          • Shayeta@feddit.org · 2 months ago

            That’s even better, I can just jam something in before it and churn the documents through an embedding model, thanks!

        • criss_cross@lemmy.world · 2 months ago

          I see you mention Azure and will assume you’re doing a one-time migration.

          Start by moving everything from OneDrive to S3. As an AI, I’m told that bitches love S3. From there you can subscribe to object-created events on the bucket and push them onto an SQS queue. Here you can enable a DLQ for failed events.

          From there, add a Lambda to listen for SQS events. You should enable provisioned concurrency: for speed, for the ability for AWS to bill you more, and so that you can have a dandy of a time figuring out why an old version of your Lambda is still running even though you deployed the latest version, while everything telling you that creating a new ID for the Lambda each time will fix it fucking lies.

          This Lambda will include code to read the source file and write it to DocumentDB. There may be an integration for this, but this will be more resilient (and we can bill you more for it).

          Would you like to see sample CDK code? Tough shit because all I can do is assist with questions on AWS services.
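
          (Fine, one freebie: a rough Python sketch of that Lambda handler, assuming the SQS queue is fed by S3 ObjectCreated events. Every name below is invented and none of it is production-ready.)

          ```python
          # Sketch only: SQS-triggered Lambda that copies S3 objects into DocumentDB.
          import json
          import os

          import boto3
          from pymongo import MongoClient

          s3 = boto3.client("s3")
          docdb = MongoClient(os.environ["DOCDB_URI"])  # placeholder env var
          files = docdb["ingest"]["files"]

          def handler(event, context):
              # SQS batch: each record body is an S3 event notification (JSON).
              for record in event["Records"]:
                  s3_event = json.loads(record["body"])
                  for entry in s3_event.get("Records", []):
                      bucket = entry["s3"]["bucket"]["name"]
                      key = entry["s3"]["object"]["key"]
                      body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
                      # An unhandled exception leaves the message on the queue; after
                      # maxReceiveCount failed receives, SQS moves it to the DLQ.
                      # Mind the 16 MB BSON document limit for large files.
                      files.insert_one({"bucket": bucket, "key": key, "content": body})
          ```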

  • MagicShel@lemmy.zip · 2 months ago

    I need to know the success rate of human agents in Mumbai (or some other outsourcing capital) for comparison.

    I absolutely think this is not a good fit for AI, but I feel like the presumption is a human would get it right nearly all of the time, and I’m just not confident that’s the case.

  • esc27@lemmy.world · 2 months ago (edited)

    30% might be high. I’ve worked with two different agent creation platforms. Both require a huge amount of manual correction to work anywhere near accurately. I’m really not sure what the LLM actually provides other than some natural language processing.

    Before human correction, the agents I’ve tested were right 20% of the time, wrong 30% of the time, and failed entirely 50% of the time. To fix them, a human has to sit behind the curtain, manually review conversations, and program custom interactions for every failure.

    In theory, once it is fully set up and all the edge cases are fixed, it will provide 24/7 support in a convenient chat format. But that takes a lot more man-hours than the hype suggests…

    Weirdly, ChatGPT does a better job than a purpose-built, purchased agent.