cross-posted to: opensource@programming.dev
cross-posted from: https://lemmy.ca/post/37011397
The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages.
As long as the models are open source, I have no complaints.
And the data stays local.
When are we getting AMD’s FSR upscaling and frame-gen? Also, wouldn’t it make more sense for subtitles to use the Jellyfin approach?
I have an AMD card; add VLC as a game in the drivers and you can turn on AFMF (frame gen).
If it doesn’t work, you can just turn it on system-wide in the display settings of the Adrenalin software (gear icon upper right, display/gaming).
I think it requires at least a 6000-series GPU, however.
If you have a Samsung TV or other modern smart TV connected to a laptop, you can also turn on frame-gen using Auto Motion Plus, set to Custom.
Judder Reduction 10 is double frames, so 24 FPS -> 48.
I’m using Linux. There’s probably a way to get a similar outcome, but an integrated solution like the one VLC and AMD once mentioned would be better.
The nice thing is, now at least this can be used with live TV from other countries and languages.
Imagine wanting to watch Japanese TV or Korean channels without bothering with downloading, searching for, and syncing subtitles.
I prefer watching Mexican football announcers, and it would be nice to know what they’re saying. Though that might actually detract from the experience.
GOOOOOOAAAAAAAAALLLLLLLLLL
Just fill up the whole screen with this.
The opposing team has scored.
This might be one of the few times I’ve seen AI being useful and not just slapped on something for marketing purposes.
And not to do evil shit
But the topping contains potassium benzoate.
Amazing. I can finally find out exactly what that nurse is yelling about while she gets railed by the local basketball team.
Something about a full-court press?
Will it be possible to export these AI subs?
Imagine the possibilities!
He’s maaaa, he’s what? He’s maaaaaa!
The technology is nowhere near good enough, though. On synthetic tests, on the data it was trained and tweaked on, maybe; I don’t know.
I co-run an event where we invite speakers from all over the world, and we tried every way to generate subtitles; all of them perform at the level of YouTube’s autogenerated ones. It’s better than nothing, but you can’t really rely on it.

Really? This is the opposite of my experience with (distil-)whisper - I use it to generate subtitles for stuff like podcasts and was stunned at first by how high-quality the results are. I typically use distil-whisper/distil-large-v3, locally. Was it among the models you tried?
Is your goal to rely on it, or to have it as a backup?
For my purpose of having a backup, nearly anything will be better than nothing.

No, but I think it would be super helpful for synchronizing subtitles that are not aligned to the video.
This is already trivial. Bazarr has been doing it for all my subtitles for almost a decade.
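For anyone who wants to do a one-off fix by hand, a constant offset is trivial to script yourself. A minimal sketch using the pysrt package; the file names and the 2.5-second offset are made up for illustration, and tools like Bazarr estimate the offset automatically instead of hard-coding it:

```python
# Shift every cue in an SRT file by a fixed offset (pip install pysrt).
# File names and the 2.5 s offset are illustrative.
import pysrt

subs = pysrt.open("movie.en.srt")
subs.shift(seconds=2, milliseconds=500)  # delay all cues by 2.5 s
subs.save("movie.en.synced.srt", encoding="utf-8")
```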
You haven’t even been able to test it yet, and you’re calling it nowhere near good 🤦🏻
Like, how would you know?!
Relax, they didn’t invent a new kind of magic; they integrated an existing solution from the market.
I don’t know what the new BMW car they introduce this year is capable of, but I know for a fact it can’t fly.
No such comment yet? I’ll be the first then.
Oh no, AI bad, next thing they add is cryptocurrency mining!
AI for accessibility is nowhere near the same thing as crypto mining.
Lol, that is his opinion as well; this was sarcasm, I’m 99% sure. Yours does not seem so sarcastic…
But it’s burning the Amazon forests for capitalist greed!
/s
When does this get released? I really want to try it.
I will be impressed only when it can get through a single episode of Still Game without making a dozen mistakes.
This sounds like a great thing for deaf people and just in general, but I don’t think AI will ever replace anime fansub makers who have no problem throwing a wall of text on screen for a split second just to explain an obscure untranslatable pun.
That still happens? Maybe wanna share your groups? ;)
It’s unlikely to even replace good subtitles, fan or not. It’s just a nice thing to have for a lot of content though.
I have family members who can’t really understand spoken English because it’s a bit fast, and who can’t read English subtitles either because, again, they’re too fast for them.
Sometimes you download a movie and all the Estonian subtitles are for an older release and they desynchronize. Sometimes you can barely even find synchronized English subtitles, so even that doesn’t work.
This seems like a godsend, honestly.
Funnily enough, of all the streaming services, I’m again going to have to commend Apple TV+ here. Their shit has Estonian subtitles. Netflix, Prime, etc. do not. Meaning if I’m watching with a family member who doesn’t understand English well, I’ll watch Apple TV+ with a subscription, and everything else is going to be pirated for subtitles. So I don’t bother subscribing anymore.

We’re a tiny country, but for some reason Apple of all companies has chosen to acknowledge us. Meanwhile, I was setting up an Xbox for someone a few years ago, and Estonia just… straight up doesn’t exist. I’m not talking about language support - you literally couldn’t pick it as your LOCATION.
For all their faults, Apple knows accessibility. Good job Timmy.
They’re like the * in any Terry Pratchett (GNU) novel; sometimes a funny joke can have a little more spice added to make it even funnier.
Bless those subbers. I love those walls of text.
Translator’s note: keikaku means plan
And yet they turned down having thumbnails for seeking because it would be too resource intensive. 😐
Video decoding is resource intensive. We’re used to it, we have hardware acceleration for some of it, but spewing something around 52 million pixels every second from a highly compressed data source is not cheap. I’m not sure how both compare, but small LLM models are not that costly to run if you don’t factor their creation in.
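For a sense of where a figure like that comes from, here’s the back-of-the-envelope arithmetic, assuming 1080p at 25 fps (the comment doesn’t state the resolution or framerate):

```python
# Rough pixel throughput for 1080p at 25 fps
# (resolution and framerate are assumptions; the comment only gives the total).
width, height, fps = 1920, 1080, 25
print(f"{width * height * fps:,} pixels per second")  # 51,840,000
```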
All they’d need to do is generate thumbnails at a fixed interval on video load. Make that interval adjustable. It might take a few extra seconds to load a video; make it off by default if they’re worried about the performance hit.
There are other desktop video players that make this work.
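For what it’s worth, the basic idea is simple to prototype outside VLC. A minimal sketch that pre-generates seek thumbnails with ffmpeg called from Python; the interval, output size, and file names are illustrative, and this is not how VLC itself would implement it internally:

```python
# Extract one small thumbnail every `interval_s` seconds for a seek bar.
# Requires ffmpeg on PATH; paths and sizes are illustrative.
import pathlib
import subprocess

def generate_seek_thumbnails(video: str, out_dir: str, interval_s: int = 10) -> None:
    pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(
        [
            "ffmpeg", "-i", video,
            "-vf", f"fps=1/{interval_s},scale=320:-1",  # 1 frame per interval, 320 px wide
            f"{out_dir}/thumb_%04d.jpg",
        ],
        check=True,
    )

generate_seek_thumbnails("movie.mkv", "thumbs")
```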
I mean, it would. For example Jellyfin implements it, but it does so by extracting the pictures ahead of time and saving them. It takes days to do this for my library.
Yeah, I do this for Plex as well, and Stash. I think if the file already exists in the directory, VLC should use it; it’s up to you to generate them. That’s exactly how cover art for albums worked in VLC for a decade before they added the feature to pull cover art on the fly.
I get what you’re saying, but I don’t think there is any standardized format for these trickplay images. The same images from Plex would likely not be usable in Jellyfin without converting the metadata (e.g. which time in the video an image belongs to), so VLC probably has no good way to understand trickplay images not made by VLC.
It is useful for internet streams though, not really for local or lan video.
Wonderful! Now we need descriptive audio for the visually impaired!
As VLC is open source, can we expect this technology to also be available for, say, Jellyfin, so that I can once and for all have subtitles done right?
In the *arr suite, Bazarr has a plugin called Subgen which you can add; you can set it to generate subtitles for your entire library if you want, or only missing subtitles. The sync is spot-on compared to 90% of what OpenSubtitles delivers. I sometimes re-generate subtitles with this plugin just because OpenSubtitles is so constantly out of sync (e.g. in highly rated subtitles, 4 lines will be at breakneck pace, the next 10 will be super slow, and then everything is 3 seconds off).
It isn’t in-player, but it works. The downside is that it’s a larger model and takes ~20 minutes to generate subtitles for a movie-length file.
It’s already available for anyone to use. https://github.com/openai/whisper
They’re using OpenAI’s Whisper model for this: https://code.videolan.org/videolan/vlc/-/merge_requests/5155
Note that OpenAI’s original Whisper models are pretty slow; in my experience the distil-whisper project (via a tool like whisperx) is more than 10x faster.
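For anyone wanting to try this outside VLC today, here’s a minimal sketch using the openai-whisper package linked above (the model size, file name, and translate task are illustrative choices, not what VLC ships):

```python
# Transcribe and translate a local file to English subtitles with Whisper
# (pip install openai-whisper; also requires ffmpeg on PATH).
import whisper

model = whisper.load_model("small")                         # smaller = faster, less accurate
result = model.transcribe("episode.mkv", task="translate")  # "transcribe" keeps the source language

for seg in result["segments"]:
    print(f"[{seg['start']:7.2f} -> {seg['end']:7.2f}] {seg['text']}")
```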
Have any minimum system requirements been estimated for this yet, since it runs locally?
It’s actually using whisper.cpp
From the README:
Memory usage:

| Model  | Disk    | Mem      |
|--------|---------|----------|
| tiny   | 75 MiB  | ~273 MB  |
| base   | 142 MiB | ~388 MB  |
| small  | 466 MiB | ~852 MB  |
| medium | 1.5 GiB | ~2.1 GB  |
| large  | 2.9 GiB | ~3.9 GiB |
Those are the model sizes
Oh wow, those are pretty tiny memory requirements for a decent modern system! That’s actually very impressive! :D
Many people can probably even run this on older media servers, or even just a plain NAS! That’s awesome! :D
Crunchyroll is currently using AI subtitles. It’s obvious because when someone says “mothra. Funky…” it captions “mother fucker”.
Malevolent Kitchen Intensifies
That explains why their subtitles have seemed worse to me lately. Every now and then I see something obviously wrong and wonder how it got by anyone who looked at it. Now I know why. No one looked at it.
My wife and I love laughing at the dumbass mistakes it makes.
Some character’s name is Asura Halls?
Instead of “That’s Asura Halls!” you get “That asshole!”
But if I were actually hearing impaired, I’d be really pissed that I’m being treated as second class even though Sony still took my money like everyone else.
I hope it’s available for Stash App. I wanna know what these JAV girls are saying.
( ͡° ͜ʖ ͡°)
Ooooh I like this