• brucethemoose@lemmy.world
    21 hours ago

    Machine learning has been a field for decades, as others said, but Wikipedia would give you a better overview of the topic than I can. In a nutshell, it’s largely about predicting outputs for new inputs, based on the example inputs (and their known outputs) a model was trained on.
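    To make the “predict outputs from training examples” idea concrete, here is a minimal sketch using about the simplest model there is: a straight line fitted by ordinary least squares. The data points are made up for illustration, and `fit_line`/`predict` are hypothetical helper names, not any library’s API.

```python
def fit_line(xs, ys):
    """Fit y ≈ w*x + b to training examples by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept passes through the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) * (x - mean_x) for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def predict(model, x):
    """Apply the learned parameters to an input the model has never seen."""
    w, b = model
    return w * x + b


# Hypothetical training data that roughly follows y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
model = fit_line(xs, ys)
print(round(predict(model, 5.0), 2))  # 10.91
```

    Real models have millions or billions of parameters instead of two, but the loop is the same: fit parameters to examples, then apply them to new inputs.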

    It doesn’t have to be text. For example, astronomers use it to find certain kinds of objects in raw data feeds. Object recognition (identifying things in pictures with little bounding boxes) is an old art at this point. Time-series prediction models are a thing, and LanguageTool uses a tiny model to detect commonly confused words for grammar checking. And yes, image hashing is another example, though it’s not entirely machine learning based. I don’t know what TinEye does in its backend, but there are some more “old-school” approaches that use traditional programming techniques to generate signatures for images, which can then be compared quickly across a huge database.
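    As an illustration of the “old-school” signature approach, here is a minimal sketch of average hashing (aHash), one classic perceptual hash. It is not what TinEye does (that isn’t public, as far as I know); it just shows the general idea: shrink the image, compare each cell to the mean brightness, and pack the results into a 64-bit fingerprint that can be compared by Hamming distance. Images are assumed to be plain 2D lists of grayscale values, and the test images are hypothetical.

```python
def average_hash(pixels, hash_size=8):
    """Compute an average hash (aHash) of a grayscale image.

    pixels: 2D list of grayscale values, at least hash_size x hash_size.
    The image is box-downscaled to hash_size x hash_size cells, then each
    cell becomes a 1 bit if it is brighter than the mean of all cells.
    """
    h, w = len(pixels), len(pixels[0])
    cells = []
    for i in range(hash_size):
        r0, r1 = i * h // hash_size, (i + 1) * h // hash_size
        for j in range(hash_size):
            c0, c1 = j * w // hash_size, (j + 1) * w // hash_size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count differing bits; small distances suggest near-duplicate images."""
    return bin(a ^ b).count("1")


# Hypothetical 32x32 test images: a left-to-right gradient, a brightened
# copy (clipped at 255), and a horizontally mirrored copy.
image = [[c * 8 for c in range(32)] for _ in range(32)]
brighter = [[min(255, v + 20) for v in row] for row in image]
mirrored = [row[::-1] for row in image]
print(hamming_distance(average_hash(image), average_hash(brighter)))  # 0
print(hamming_distance(average_hash(image), average_hash(mirrored)))  # 64
```

    Because the signature is just an integer, a database can store millions of them and look for near matches cheaply, which is the appeal over comparing raw pixels.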

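    Separately, image similarity metrics (like LPIPS or SSIM), which boil the difference between two images down to a single number, are common components in machine learning pipelines. So are text embedding models, which do the same job for text by turning it into vectors that can be compared. With SSIM, 1 is a perfect match and values near 0 mean the images are essentially unrelated; LPIPS runs the other way, as a distance where 0 means identical.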
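    Here is a rough sketch of both ideas. The SSIM function uses the standard formula with the usual constants C1 = (0.01·L)² and C2 = (0.03·L)², but as a simplification it computes it once over the whole (flattened) image; real implementations slide a local window over the image and average the results. The cosine similarity function is the common way to compare embedding vectors; the 3-dimensional “embeddings” below are hypothetical stand-ins, not the output of any real model.

```python
import math


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equal-length lists of pixel values."""
    n = len(x)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Luminance and contrast/structure terms, combined as in the SSIM formula.
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical tiny "images" (flattened pixel rows).
img = [10, 50, 90, 130, 170, 210]
noisy = [12, 47, 93, 128, 172, 207]
print(global_ssim(img, img))    # 1.0 for identical inputs
print(global_ssim(img, noisy))  # a little below 1.0

# Hypothetical embedding vectors standing in for a real text embedding model.
cat = [0.9, 0.1, 0.3]
kitten = [0.85, 0.15, 0.35]
car = [0.1, 0.9, 0.2]
print(cosine_similarity(cat, kitten) > cosine_similarity(cat, car))  # True
```

    LPIPS is different in kind: it compares features from a trained neural network rather than raw pixel statistics, so it can’t be reduced to a few lines like this.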

    LLMs in particular have an interesting history, going back to (if I even remember the name correctly) BERT in Google’s labs. There were also tiny LLMs that people ran on personal GPUs before ChatGPT was ever a thing, like the infamous Pygmalion 6B roleplaying bot (heh), a fine-tune of GPT-J 6B. They were primitive and dumb, but it felt like witchcraft back then (before AI Bros poisoned the well).