• mosiacmango@lemm.ee · 7 months ago

    Sounds like it searched the subtitles, found the time stamps, and returned the relevant text. Useful, but ultimately a pretty simple bot.

    Much more impressive if it “watched” the video for the first time, formed its own subtitles, then pulled the data out. That would be a feat.
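
    For reference, the “simple bot” version I have in mind is roughly this sketch (it assumes a plain .srt transcript and naive keyword matching; the function name is just for illustration):

    ```python
    import re

    def search_srt(srt_text: str, query: str):
        """Return (start_timestamp, caption_text) pairs whose caption mentions the query."""
        hits = []
        # .srt cues are separated by blank lines: index, "start --> end", then caption lines
        for cue in re.split(r"\n\s*\n", srt_text.strip()):
            lines = cue.splitlines()
            if len(lines) < 3:
                continue
            start = lines[1].split(" --> ")[0]
            caption = " ".join(lines[2:])
            if query.lower() in caption.lower():
                hits.append((start, caption))
        return hits
    ```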

    • partial_accumen@lemmy.world · 7 months ago

      Much more detailed than that. In the video there was a 3-piece band playing on screen for a few seconds. The prompt was: “Tell me where I can buy the shirt the keyboardist is wearing at timestamp 32 seconds”. The bot found the website of the vendor selling the shirt.

      • mosiacmango@lemm.ee · 7 months ago

        Okay, that’s pretty neat, but at the same time it’s basically the same as loading a still image into a current AI image-matching suite and having it identify a keyboard, then a shirt near the keyboard, then reverse-image-search that t-shirt. It’s super cool to be able to do, but kinda standard at this point.
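
        The still-image pipeline I mean is something like this sketch (frame extraction via OpenCV; the detection and reverse-image-search steps depend on whatever service you use, so they are only noted in comments, and the file name is made up):

        ```python
        import cv2  # OpenCV, used here only to pull a still frame out of the video

        def grab_frame(video_path: str, timestamp_s: float, out_path: str = "frame.png") -> str:
            """Save the frame nearest to timestamp_s (seconds) as a still image and return its path."""
            cap = cv2.VideoCapture(video_path)
            cap.set(cv2.CAP_PROP_POS_MSEC, timestamp_s * 1000)  # seek to the requested timestamp
            ok, frame = cap.read()
            cap.release()
            if not ok:
                raise RuntimeError(f"Could not read a frame at {timestamp_s}s from {video_path}")
            cv2.imwrite(out_path, frame)
            return out_path

        # Grab the still at 0:32, then hand it to an off-the-shelf object detector to crop the
        # keyboardist's shirt, and feed that crop to a reverse image search. Both of those steps
        # are product/API specific, so they are left out of this sketch.
        still = grab_frame("band_video.mp4", 32.0)
        ```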

        I guess the interactivity, being able to feed in a URL on the fly, is the value add. I still would have liked my “generate subtitles, then search them” imaginary bot more, though.