A ‘Shocking’ Amount of the Web Is Already AI-Translated Trash, Scientists Determine::Researchers warn that most of the text we view online has been poorly translated into one or more languages—usually by a machine.

  • maegul (he/they)@lemmy.ml
    10 months ago

    The whole webring idea needs to come back. Human curated recommendations of good resources and pages. So long as these pages remain in the control of humans and dedicated to curation and are decentralised, unlike the search engines, then they’ll be reliable.

    Plug in some social and community organisation, perhaps like a wiki, and you could get even more out of it.

  • Meowoem@sh.itjust.works
    10 months ago

    This is basic math: articles are written in one language, but there are lots of languages they can be translated into. So if a site written in English has Spanish, French, and Portuguese versions, 75% of it counts as AI-translated garbage. Because apparently having stuff available to non-English speakers is a bad thing now?

    As for ‘poorly’: what’s their mechanism for determining it? How much is well translated, or are they just assuming it’s poor because it’s possible it could be? Likewise, what percentage is human translated, and how do they determine that? Or is it another assumption to fit their narrative?

    Clickbait doomer nonsense.

  • Linssiili@sopuli.xyz
    edit-2
    10 months ago

    Recently I was looking for info (in Finnish) on how to prevent car windows from fogging. I found a really weird website all about car windows, but it kept confusing car and house windows. It said to clean car windows by “opening the window and cleaning between the panels”.

    It was obviously AI-generated, but I couldn’t figure out why. They weren’t selling anything, and there were no ads and no links to other websites or services.

    • crazyCat@sh.itjust.works
      10 months ago

      People who care about SEO for their window-related businesses will pay the blog to link to them from there.

    • theluddite@lemmy.ml
      10 months ago

      It’s probably either waiting for approval to sell ads, or it was denied and they’re adding more stuff. Google has a virtual monopoly on ads, and their approval process can take 1-2 weeks. Google’s content policy basically demands that your site be full of generated trash to sell ads. I did a case study here, in which Google denied my popular and useful website for ads until I filled it with the lowest-quality generated trash imaginable. That might help clarify what’s up.

        • theluddite@lemmy.ml
          edit-2
          10 months ago

          Dates could be made up, too. The blog posts that I generated for my site included made-up dates in the past. The Internet Archive says it has a snapshot for March of 2023, but when I click it, it says it doesn’t, so I have no way of verifying. The theory about parking real estate in the hope of selling it also seems pretty plausible to me. Who knows what dumb shit they’re up to.

    • jdf038@mander.xyz
      10 months ago

      Perhaps parking a site for traffic and then using the enshittified data to sell it?

      It makes me sick how dumb it sounds.

  • Brkdncr@lemmy.world
    10 months ago

    I was recently searching for some tips on overlanding routes. So many sites are just long, strung-together SEO word salad.

  • Jayu@lemm.ee
    10 months ago

    The most annoying aspect of this is when you know the actual information has to be out there, but it is being drowned out by dozens of sites reposting less relevant, low-quality information… And then you go to search in another language and you see substandard machine translations of all the garbage you were just fleeing, lol.

    • wikibot@lemmy.worldB
      10 months ago

      Here’s the summary for the wikipedia article you mentioned in your comment:

      The dead Internet theory is an online conspiracy theory that asserts that the Internet now consists mainly of bot activity and automatically generated content that is manipulated by algorithmic curation, marginalizing organic human activity. Proponents of the theory believe these bots are created intentionally to help manipulate algorithms and boost search results in order to ultimately manipulate consumers. Furthermore, some proponents of the theory accuse government agencies of using bots to manipulate public perception, stating “The U.S. government is engaging in an artificial intelligence powered gaslighting of the entire world population”.

      to opt out, pm me ‘optout’. article | about

  • grue@lemmy.world
    10 months ago

    I’ve been saying for quite a while now that the Internet was best in the '90s and early 2000s back before it was commercialized, even despite all the “under construction” gifs and whatnot. The signal/noise ratio has only continued to drop since then.

    • rottingleaf@lemmy.zip
      edit-2
      10 months ago

      I hope you remember the amounts of spam and machine-translated text back then.

      Not being an English speaker, you’d basically expect most of what you found to be machine-translated, and badly at that.

      Pirate localizations of games were translated in such a way that you’d get the basic idea only here and there, and in general the result was probably worse than the English version, which would at least make some sense if you knew some English.

      It’s the people and the IT companies that were better.

      • grue@lemmy.world
        edit-2
        10 months ago

        Since I am an English speaker, my '90s Internet experience was very different from that. There were “link farms” (pages designed to exploit early search engine algorithms that scored pages higher when they got linked to a lot) and e-mail spam, of course, but because those were unsophisticated, it was generally a lot easier not to get suckered in by them than by the firehose of AI-written advertorials and shit we have today.

        • wikibot@lemmy.worldB
          10 months ago

          Here’s the summary for the wikipedia article you mentioned in your comment:

          An advertorial is an advertisement in the form of editorial content. The term “advertorial” is a blend (see portmanteau) of the words “advertisement” and “editorial.” Merriam-Webster dates the origin of the word to 1946. In printed publications, the advertisement is usually written to resemble an objective article and designed to ostensibly look like a legitimate and independent news story. In television, the advertisement is similar to a short infomercial presentation of products or services.

    • maness300@lemmy.world
      10 months ago

      Counterpoint: the Internet still exists as it did back then, but relatively smaller compared to what it’s become.

      You just need to find the right people and content to interact with, which is harder now because there’s so much more garbage. I’d say they have grown in absolute numbers.

  • SomeGuy69@lemmy.world
    10 months ago

    I need an AI Firefox extension that detects badly translated AI text and automatically blocks those domains.