• FundMECFSResearch@lemmy.blahaj.zone
    6 days ago

    I know people are gonna freak out about the AI part in this.

    But as a person with hearing difficulties this would be revolutionary. So much shit I usually just can’t watch because OpenSubtitles doesn’t have any subtitles for it.

    • kautau@lemmy.world
      6 days ago

      The most important part is that it’s a local LLM running on your machine. The problem with AI is less about LLMs themselves and more about their control and application by unethical companies and governments in a world driven by profit and power. This is none of those things; it’s just some open source code running on your device. So that’s cool and good.

            • jonjuan@programming.dev
              34 minutes ago

              They are using open source models that have already been trained. So no extra energy is going into the models.

            • jsomae@lemmy.ml
              6 days ago

              I don’t have a source for that, but the most any locally run program can cost in terms of power is basically the sum of a few things: maxed-out GPU usage, maxed-out CPU usage, and maxed-out disk access. The GPU is by far the most power-hungry of these, and modern video games use about as much of the GPU as they can get away with.

              Running an LLM locally can at most max out the GPU, which puts it in the same ballpark as a video game. Typical LLM usage is to run it for a few seconds and then submit another query, so it isn’t running 100% of the time, unlike a video game, which stays open and active the whole time (GPU usage dips only when you’re in a menu, for instance).

              Data centers drain lots of power by running a very large number of machines at the same time.
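              To make the duty-cycle argument concrete, here is a rough sketch. The 300 W GPU, the 2-hour session, and the 90% vs 20% busy fractions are assumptions picked for illustration, not measurements, and the sketch ignores CPU, disk, and idle draw.

```python
# Rough energy comparison: a gaming session vs. intermittent local LLM use.
# All wattages, durations and busy fractions are illustrative assumptions.
# CPU, disk and idle power draw are deliberately ignored.

def session_energy_wh(gpu_watts: float, hours: float, duty_cycle: float) -> float:
    """Energy in watt-hours for a GPU drawing `gpu_watts` at full load,
    busy for `duty_cycle` (0.0 to 1.0) of a session lasting `hours`."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return gpu_watts * hours * duty_cycle

# Assumed: a 300 W GPU and a 2-hour session.
GPU_WATTS = 300.0
gaming = session_energy_wh(GPU_WATTS, hours=2.0, duty_cycle=0.9)  # GPU busy almost the whole time
llm = session_energy_wh(GPU_WATTS, hours=2.0, duty_cycle=0.2)     # GPU busy only while generating

print(f"gaming: {gaming:.0f} Wh, local LLM: {llm:.0f} Wh")
```

              Same peak power, different busy fraction: that is the whole difference the comment is pointing at.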

              • Lifter@discuss.tchncs.de
                8 hours ago

                Training the model yourself would take years on a single machine. If you factor that into your cost per query, it blows up.

                The data centers are (currently) mainly used for training new models.
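                The training-cost point is an amortization question: a one-off training cost spread over every query the model ends up serving. A minimal sketch, where the 1 GWh training figure and the 0.5 Wh per-query figure are hypothetical placeholders rather than numbers for any real model:

```python
# Amortizing a one-off training cost over the queries a model serves.
# TRAINING_WH and INFERENCE_WH are hypothetical placeholders, not real figures.

def energy_per_query_wh(training_wh: float, inference_wh: float, num_queries: int) -> float:
    """Per-query energy once an equal share of the training cost is charged to each query."""
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")
    return inference_wh + training_wh / num_queries

TRAINING_WH = 1e9   # hypothetical: 1 GWh spent training the model once
INFERENCE_WH = 0.5  # hypothetical: energy for one local query

for n in (1_000, 1_000_000, 1_000_000_000):
    per_query = energy_per_query_wh(TRAINING_WH, INFERENCE_WH, n)
    print(f"{n:>13,} queries -> {per_query:,.1f} Wh/query")
```

                The training share dominates when it is spread over few queries and shrinks toward the inference cost as the query count grows, which is why it matters whether one person trains a model or many people reuse an already-trained one.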

              • msage@programming.dev
                5 days ago

                From what I know, local LLMs take minutes to process a single prompt, not seconds, but I guess that depends on the use case.

                But also games, dunno about maxing GPU in most games. I maxed mine for crypto mining, and that was power hungry. So I would put LLMs closer to crypto than games.

                Not to mention games will entertain you way more for the same time.

                • jsomae@lemmy.ml
                  5 days ago

                  Obviously it depends on your GPU. A crypto miner you’ll leave running 24/7. On a recent MacBook, an LLM will run at several tokens per second, so yeah, long responses could take more than a minute. But most people aren’t going to run such an LLM for hours on end. Even if they do, big deal: it’s a single GPU, which is negligible compared to running your dishwasher, using your oven, or heating your house.
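                  The tokens-per-second point turns into simple arithmetic. A sketch assuming a 500-token answer at 5 tokens/s on a machine drawing 60 W while generating, and roughly 1 kWh per dishwasher cycle; all four numbers are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: how long a local LLM response takes and what it costs.
# Token count, token rate, wattage and the dishwasher figure are all assumptions.

def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to generate `num_tokens` at a steady rate."""
    return num_tokens / tokens_per_second

def response_energy_wh(num_tokens: int, tokens_per_second: float, watts: float) -> float:
    """Energy for one response, assuming constant power draw while generating."""
    return watts * generation_seconds(num_tokens, tokens_per_second) / 3600.0

# Assumed: a 500-token answer at 5 tokens/s on a machine drawing 60 W under load.
secs = generation_seconds(500, 5.0)      # 100 seconds
wh = response_energy_wh(500, 5.0, 60.0)  # 60 W * 100 s / 3600 s/h, about 1.67 Wh

DISHWASHER_CYCLE_WH = 1000.0  # assumed ~1 kWh per cycle; real appliances vary
print(f"{secs:.0f} s, {wh:.2f} Wh; responses per dishwasher cycle: {DISHWASHER_CYCLE_WH / wh:.0f}")
```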

            • Potatar@lemmy.world
              5 days ago

              Any paper about any neural network.

              Getting one output from a model is just a series of multiplications (really matrix and vector multiplications, but yeah); it costs less than or equal to rendering ONE frame of a 4K game.
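              A toy illustration of the "series of multiplications" point: a forward pass through a small fully connected network is matrix-vector products with a cheap nonlinearity in between. The layer sizes and weights are arbitrary, and the FLOP helper counts each multiply-add as 2 operations, which is a common convention rather than anything from this thread.

```python
# A forward pass through a tiny fully connected network:
# matrix-vector multiplications plus an elementwise ReLU between layers.
# Layer sizes and weights are arbitrary toy values.

from typing import List

Matrix = List[List[float]]
Vector = List[float]

def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix `w` (rows x cols) by vector `x` (length cols)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def relu(v: Vector) -> Vector:
    """Elementwise max(0, v): cheap compared with the multiplications."""
    return [max(0.0, vi) for vi in v]

def forward(layers: List[Matrix], x: Vector) -> Vector:
    """Apply each weight matrix in turn, with ReLU between layers (not after the last)."""
    for i, w in enumerate(layers):
        x = matvec(w, x)
        if i < len(layers) - 1:
            x = relu(x)
    return x

def matvec_flops(rows: int, cols: int) -> int:
    """Operations in one matrix-vector product, counting a multiply-add as 2 FLOPs."""
    return 2 * rows * cols

layers = [
    [[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]],  # 3x2: 2 inputs -> 3 hidden units
    [[1.0, 1.0, 1.0]],                       # 1x3: 3 hidden units -> 1 output
]
print(forward(layers, [2.0, 1.0]))
```

              For this network, `sum(matvec_flops(len(w), len(w[0])) for w in layers)` gives 18; the same counting applied to a real model's much larger matrices gives its per-output cost.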

        • Sixty@sh.itjust.works
          6 days ago

          Curious how resource intensive AI subtitle generation will be. Probably fine on some setups.

          Trying to use madVR (tweaker’s video postprocessing) in the summer in my small office with an RTX 3090 was turning my office into a sauna. Next time I buy a video card it’ll be a lower tier deliberately to avoid the higher power draw lol.

          • kautau@lemmy.world
            5 days ago

            I think it really depends on how accurate you want it to be and what language you’re interpreting. https://github.com/openai/whisper offers several sizes of its model, but they all pretty much require VRAM/graphics capability (or likely NPUs as they become more commonplace).
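            For the subtitle side, the transcription output still has to become a subtitle file. In the `openai-whisper` Python package, `model.transcribe(...)` returns a dict with a `"segments"` list whose entries include `start` and `end` (in seconds) and `text`. The sketch below assumes that shape and formats it as SRT; the segment data in the example is made up.

```python
# Turn timestamped transcription segments into SRT subtitle text.
# Assumes segments shaped like openai-whisper's transcribe() output:
# dicts with "start" and "end" in seconds and a "text" string.

from typing import Dict, List, Union

Segment = Dict[str, Union[float, str]]

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments: List[Segment]) -> str:
    """Number each segment and emit 'index / start --> end / text' blocks,
    separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        start = srt_timestamp(float(seg["start"]))
        end = srt_timestamp(float(seg["end"]))
        text = str(seg["text"]).strip()
        blocks.append(f"{i}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)

# Made-up segments for illustration.
demo = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 3661.042, "text": " A very long line."},
]
print(segments_to_srt(demo))
```

            With the `openai-whisper` package installed, `whisper.load_model("base").transcribe("episode.mp3")["segments"]` should produce input of this shape (the file name is a placeholder); larger model names such as `"small"` or `"medium"` trade speed and memory for accuracy.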

    • M137@lemmy.world
      5 days ago

      I agree that this is a nice thing, just gotta point out that there are several other good websites for subtitles. Here are the ones I use frequently:

      https://subdl.com/
      https://www.podnapisi.net/
      https://www.subf2m.co/

      And if you didn’t know, there are two opensubtitles websites:
      https://www.opensubtitles.com/
      https://www.opensubtitles.org/

      Not sure if the .com one is supposed to be a more modern frontend for the .org or something but I’ve found different subtitles on them so it’s good to use both.

    • mormund@feddit.org
      6 days ago

      Yeah, transcription is one of the only good uses for LLMs imo. Of course they can still produce nonsense, but bad subtitles are better than none at all.

      • kautau@lemmy.world
        5 days ago

        Just an important note: speech-to-text models aren’t LLMs, which are literally “conversational” or “text generation from other text” models. Things like https://github.com/openai/whisper are their own, separate type of model, built specifically for transcription.

        That being said, I totally agree, accessibility is an objectively good use for “AI”.

        • mormund@feddit.org
          4 days ago

          That’s not what LLMs are, but it’s a marketing buzzword in the end, I guess. What you linked is a transformer-based sequence-to-sequence model, exactly the same principle as ChatGPT and all the others.

          I wouldn’t say it is a good use of AI, more like one of the few barely acceptable ones. Can we accept lies and hallucinations just because the alternative is nothing at all? And how much energy/CO2 emissions should we be willing to waste on this?

    • hushable@lemmy.world
      6 days ago

      Indeed, YouTube has had auto-generated subtitles for a while now, and they are far from perfect, yet I still find them useful.