Otter@lemmy.ca to

Technology@lemmy.world · English · 3 months ago

VLC player demos real-time AI subtitling for videos

www.theverge.com

673

The video player is testing a new AI tool to create subtitles in over 100 languages.

cross-posted from: https://lemmy.ca/post/37011397

!opensource@programming.dev

The popular open-source VLC video player was demonstrated on the floor of CES 2025 with automatic AI subtitling and translation, generated locally and offline in real time. Parent organization VideoLAN shared a video on Tuesday in which president Jean-Baptiste Kempf shows off the new feature, which uses open-source AI models to generate subtitles for videos in several languages.

asbestos@lemmy.world
link
fedilink
English
arrow-up
287·
3 months ago
Finally, some good fucking AI
- shyguyblue@lemmy.world
  link
  fedilink
  English
  arrow-up
  169·
  3 months ago
  I was just thinking, this is exactly what AI should be used for. Pattern recognition, full stop.
  - snooggums@lemmy.world
    link
    fedilink
    English
    arrow-up
    71·
    3 months ago
    Yup, and if it isn’t perfect that is ok as long as it is close enough.
    
    Like getting name spellings wrong or mixing homophones is fine because it isn’t trying to be factually accurate.
    - TJA!@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      37·
      3 months ago
      The problem is that now people will say they don’t have to create accurate subtitles because VLC is doing the job for them.
      
      Accessibility might suffer from that, because all subtitles are now just “good enough”
      - Railcar8095@lemm.ee
        link
        fedilink
        English
        arrow-up
        34·
        3 months ago
        Or they can get OK ones with this tool, and fix the errors. Might save a lot of time
      - snooggums@lemmy.world
        link
        fedilink
        English
        arrow-up
        24·
        3 months ago
        Regular old live broadcast closed captioning is pretty much ‘good enough’ and that is the standard I’m comparing to.
        
        Actual subtitles created ahead of time should be perfect because they have the time to double check.
      - LandedGentry@lemmy.zip
        link
        fedilink
        English
        arrow-up
        13·
        edit-2
        3 months ago
        deleted by creator
        
        TheMachineStops@discuss.tchncs.de
        link
        fedilink
        English
        arrow-up
        7·
        edit-2
        3 months ago
        From experience, AI translation is still garbage, especially for languages like Chinese, Japanese, and Korean, but if it only subtitles in the spoken language, such as creating English subtitles for English audio, then it is probably fine.
        
        catloaf@lemm.ee
        link
        fedilink
        English
        arrow-up
        2·
        3 months ago
        That’s probably more due to lack of training than anything else. Existing models are mostly made by American companies and trained on English-language material. Naturally, the further you get from the model, the worse the result.
        
        TheMachineStops@discuss.tchncs.de
        link
        fedilink
        English
        arrow-up
        3·
        3 months ago
        The lack of training material is not the issue; it doesn’t understand context and cultural references. Someone commented here that Crunchyroll’s AI subtitles translated “Asura Hall”, a name, as “asshole”.
        
        LandedGentry@lemmy.zip
        link
        fedilink
        English
        arrow-up
        1·
        edit-2
        3 months ago
        deleted by creator
      - TachyonTele@lemm.ee
        link
        fedilink
        English
        arrow-up
        10·
        3 months ago
        I have a feeling that if you care enough about subtitles you’re going to look for good ones, instead of using “ok” AI subs.
      - shyguyblue@lemmy.world
        link
        fedilink
        English
        arrow-up
        2·
        edit-2
        3 months ago
        I imagine it would be not exactly simple, but not complicated either, to add a “threshold” feature. If the AI is less than X% certain, it can request human clarification.
        
        Edit: Derp. I forgot about the “real time” part. Still, as others have said, even a single botched word would still work well enough with context.
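The “threshold” idea can be sketched without breaking real-time playback: instead of pausing to ask a human, uncertain segments are shown as-is and set aside for later review. This is only an illustration. The `Segment` type and its `confidence` field are hypothetical, and nothing in the thread says what confidence data VLC actually exposes.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float       # seconds
    end: float         # seconds
    text: str
    confidence: float  # hypothetical score in [0, 1]


def split_by_confidence(segments, threshold=0.8):
    """Return (accepted, needs_review) without blocking playback."""
    accepted, needs_review = [], []
    for seg in segments:
        if seg.confidence >= threshold:
            accepted.append(seg)
        else:
            needs_review.append(seg)
    return accepted, needs_review


segments = [
    Segment(0.0, 2.0, "Welcome to the show.", 0.97),
    Segment(2.0, 4.5, "Today we visit Asura Hall.", 0.41),
]
accepted, needs_review = split_by_confidence(segments)
for seg in needs_review:
    print(f"review {seg.start:.1f}-{seg.end:.1f}s: {seg.text}")
```

The review list could be checked after playback, which keeps the live subtitles free of delay.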
        
        snooggums@lemmy.world
        link
        fedilink
        English
        arrow-up
        1·
        edit-2
        3 months ago
        That defeats the purpose of doing it in real time as it would introduce a delay.
        
        shyguyblue@lemmy.world
        link
        fedilink
        English
        arrow-up
        1·
        3 months ago
        Derp. You’re right, I’ve added an edit to my comment.
    - vvv@programming.dev
      link
      fedilink
      English
      arrow-up
      15·
      3 months ago
      I’d like to see this fix the most annoying part about subtitles: timing. Find a transcript or any subs on the Internet and have the AI align them with the audio properly.
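Working out how far a subtitle file has drifted from the audio is the hard part and is not shown here. Once an offset is known, though, applying it is simple. The sketch below assumes a constant offset in milliseconds and standard SRT timestamps (`HH:MM:SS,mmm`); the function name `shift_srt` is made up for this example.

```python
import re

# Matches SRT timestamps such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_srt(srt_text: str, offset_ms: int) -> str:
    """Shift every timestamp in an SRT document by offset_ms (clamped at zero)."""

    def shift(match):
        h, m, s, ms = (int(g) for g in match.groups())
        total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
        total = max(0, total)
        h, rem = divmod(total, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    return TIMESTAMP.sub(shift, srt_text)


sample = "1\n00:00:01,000 --> 00:00:03,500\nHello there.\n"
shifted = shift_srt(sample, 3000)
print(shifted)
```

A constant shift will not fix subtitles whose drift changes over the course of the video; that case needs per-line alignment against the audio.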
      - Scrollone@feddit.it
        link
        fedilink
        English
        arrow-up
        2·
        3 months ago
        YES! I can’t stand when subtitles are misaligned to the video. If this AI tool could help with that, it would be super useful.
- LandedGentry@lemmy.zip
  link
  fedilink
  English
  arrow-up
  13·
  edit-2
  3 months ago
  deleted by creator
- Petter1@lemm.ee
  link
  fedilink
  English
  arrow-up
  1·
  3 months ago
  Finally some good AI fucking 🤭
m8052@lemmy.world
link
fedilink
English
arrow-up
188·
3 months ago

What’s important is that this is running on your machine locally, offline, without any cloud services. It runs directly inside the executable

YES, thank you JB
- Petter1@lemm.ee
  link
  fedilink
  English
  arrow-up
  3·
  3 months ago
  Justin Bieber?
  - T4V0@lemmy.pt
    link
    fedilink
    English
    arrow-up
    5·
    3 months ago
    Jack Black?
    - maccentric@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      3·
      3 months ago
      James Brown?
  - m8052@lemmy.world
    link
    fedilink
    English
    arrow-up
    5·
    3 months ago
    JB stands for Jean-Baptiste, who is the main maintainer of vlc
  - brbposting@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    2·
    3 months ago
    Ah JBK of course!
    - Petter1@lemm.ee
      link
      fedilink
      English
      arrow-up
      2·
      3 months ago
      ❤️
renzev@lemmy.world
link
fedilink
English
arrow-up
150·
3 months ago
This sounds like a great thing for deaf people and just in general, but I don’t think AI will ever replace anime fansub makers who have no problem throwing a wall of text on screen for a split second just to explain an obscure untranslatable pun.
- rustyricotta@lemmy.ml
  link
  fedilink
  English
  arrow-up
  58·
  3 months ago
  Bless those subbers. I love those walls of text.
- FMT99@lemmy.world
  link
  fedilink
  English
  arrow-up
  31·
  3 months ago
  Translator’s note: keikaku means plan
- JohnEdwa@sopuli.xyz
  link
  fedilink
  English
  arrow-up
  23·
  3 months ago
- FordBeeblebrox@lemmy.world
  link
  fedilink
  English
  arrow-up
  21·
  3 months ago
  They are like the * in any Terry Pratchett (GNU) novel, sometimes a funny joke can have a little more spice added to make it even funnier
- cley_faye@lemmy.world
  link
  fedilink
  English
  arrow-up
  10·
  3 months ago
  It’s unlikely to even replace good subtitles, fan or not. It’s just a nice thing to have for a lot of content though.
  - boonhet@lemm.ee
    link
    fedilink
    English
    arrow-up
    11·
    edit-2
    3 months ago
    I have family members who can’t really understand spoken English because it’s a bit fast, and can’t read English subtitles either because, again, they’re too fast for them.
    
    Sometimes you download a movie and all the Estonian subtitles are for an older release and they desynchronize. Sometimes you can barely even find synchronized English subtitles, so even that doesn’t work.
    
    This seems like a godsend, honestly.
    
    Funnily enough, of all the streaming services, I’m again going to have to commend Apple TV+ here. Their shit has Estonian subtitles. Netflix, Prime, etc, do not. Meaning if I’m watching with a family member who doesn’t understand English well, I’ll watch Apple TV+ with a subscription, and everything else is going to be pirated for subtitles. So I don’t bother subscribing anymore. We’re a tiny country, but for some reason Apple of all companies has chosen to acknowledge us. Meanwhile, I was setting up an Xbox for someone a few years ago, and Estonia just… straight up doesn’t exist. I’m not talking about language support - you literally couldn’t pick it as your LOCATION.
    - brbposting@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      5·
      3 months ago
      For all their faults, Apple knows accessibility. Good job Timmy.
- Appoxo@lemmy.dbzer0.com
  link
  fedilink
  English
  arrow-up
  4·
  3 months ago
  That still happens? Maybe wanna share your groups? ;)
m-p{3}@lemmy.ca
link
fedilink
English
arrow-up
71·
edit-2
3 months ago
Now I want some AR glasses that display subtitles above someone’s head when they talk à la Cyberpunk that also auto-translates. Of course, it has to be done entirely locally.
- Obi@sopuli.xyz
  link
  fedilink
  English
  arrow-up
  20·
  3 months ago
  I guess we have most of the ingredients to make this happen. Software-wise we’re there, hardware wise I’m still waiting for AR glasses I can replace my normal glasses with (that I wear 24/7 except for sleep). I’d accept having to carry a spare in a charging case so I swap them out once a day or something but other than that I want them to be close enough in terms of weight and comfort to my regular glasses and just give me AR like overlaid GPS, notifications, etc, and indeed instant translation with subtitles would be a function that I could see having a massive impact on civilization tbh.
  - m-p{3}@lemmy.ca
    link
    fedilink
    English
    arrow-up
    8·
    3 months ago
    I believe you can put prescription lenses in most AR glasses out there, but I suppose the battery is a concern…
    
    I’m in the same boat, I gotta wear my glasses 24/7.
  - vvv@programming.dev
    link
    fedilink
    English
    arrow-up
    4·
    3 months ago
    I think we’re closer with hardware than software. The xreal/rokid category of HMDs is comfortable enough to wear all day, and I don’t mind a cable running from behind my ear under a clothes layer to a phone or mini PC in my pocket. Unfortunately you still need to BYO cameras to get the overlays appearing at the correct points in space, but cameras are cheap, and I suspect these glasses will grow some cameras in the next couple of iterations.
  - AlligatorBlizzard@sh.itjust.works
    link
    fedilink
    English
    arrow-up
    3·
    3 months ago
    It’d be incredible for deaf people being able to read captions for spoken conversations and to have the other person’s glasses translate from ASL to English.
    
    Honestly I’d be a bit shocked if the AI ASL -> English doesn’t exist already, there’s so much training data available, the Deaf community loves video for obvious reasons.
  - Midnight Wolf@lemmy.world
    link
    fedilink
    English
    arrow-up
    2·
    3 months ago
    soon
    
    Breaking news: “WW3 starts over an insult due to a mistranslated phrase at the G7 summit. We will be nuked in 37 seconds. Fuck like rabbits, it’s all we can do. Now over to Robert with traffic.”
    - pressanykeynow@lemmy.world
      link
      fedilink
      English
      arrow-up
      1·
      3 months ago
      There are plenty of mistranslations in politics done by humans already.
Phoenixz@lemmy.ca
link
fedilink
English
arrow-up
50·
edit-2
3 months ago
As VLC is open source, can we expect this technology to also be available for, say, Jellyfin, so that I can once and for all have subtitles done right?

Edit: I think it’s great that vlc has this, but this sounds like something many other apps could benefit from
- QuadratureSurfer@lemmy.world
  link
  fedilink
  English
  arrow-up
  22·
  3 months ago
  It’s already available for anyone to use. https://github.com/openai/whisper
  
  They’re using OpenAI’s Whisper model for this: https://code.videolan.org/videolan/vlc/-/merge_requests/5155
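For anyone trying Whisper outside a player: the `openai/whisper` Python package returns a result whose `segments` list holds `start` and `end` times in seconds plus the `text`. A minimal sketch of turning segments of that shape into an SRT file follows. The segments are hand-written so the example runs without downloading a model, and the helper names are made up for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end' and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


# Hand-written stand-ins for transcriber output.
segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": " This was generated locally."},
]
srt = segments_to_srt(segments)
print(srt)
```

Writing `srt` to a `.srt` file next to the video is enough for most players to pick it up.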
  - Eagle0110@lemmy.world
    link
    fedilink
    English
    arrow-up
    5·
    3 months ago
    Have there been any estimates of the minimum system requirements for this yet, since it runs locally?
    - WalnutLum@lemmy.ml
      link
      fedilink
      English
      arrow-up
      11·
      edit-2
      3 months ago
      It’s actually using whisper.cpp
      
      From the README:
      
      Memory usage

      | Model  | Disk    | Mem      |
      |--------|---------|----------|
      | tiny   | 75 MiB  | ~273 MB  |
      | base   | 142 MiB | ~388 MB  |
      | small  | 466 MiB | ~852 MB  |
      | medium | 1.5 GiB | ~2.1 GB  |
      | large  | 2.9 GiB | ~3.9 GiB |
      
      Those are the model sizes
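To make the figures quoted above easier to apply, here is a rough sketch that picks the largest model whose memory figure fits a given budget. The numbers are copied from the comment and rounded to whole megabytes (treating “~2.1 GB” as 2100 and “~3.9 GiB” as roughly 3900), and the function name is made up; real usage will also depend on the rest of the player and the system.

```python
# Approximate memory figures in MB, taken from the README excerpt quoted above.
MODEL_MEMORY_MB = [
    ("tiny", 273),
    ("base", 388),
    ("small", 852),
    ("medium", 2100),
    ("large", 3900),
]


def largest_model_that_fits(budget_mb: float):
    """Return the name of the biggest model within budget_mb, or None."""
    best = None
    for name, memory_mb in MODEL_MEMORY_MB:  # ordered smallest to largest
        if memory_mb <= budget_mb:
            best = name
    return best


for budget in (200, 1024, 4096):
    print(budget, "MB ->", largest_model_that_fits(budget))
```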
      - Eagle0110@lemmy.world
        link
        fedilink
        English
        arrow-up
        3·
        3 months ago
        Oh wow, those are pretty tiny memory requirements for a decent modern system! That’s actually very impressive! :D
        
        Many people can probably even run this on older media servers or even just a plain NAS! That’s awesome! :D
  - lukewarm_ozone@lemmy.today
    link
    fedilink
    English
    arrow-up
    3·
    3 months ago
    Note that openai’s original whisper models are pretty slow; in my experience the distil-whisper project (via a tool like whisperx) is more than 10x faster.
- GreenKnight23@lemmy.world
  link
  fedilink
  English
  arrow-up
  20·
  edit-2
  3 months ago
  deleted by creator
  - Alexstarfire@lemmy.world
    link
    fedilink
    English
    arrow-up
    16·
    3 months ago
    That explains why their subtitles have seemed worse to me lately. Every now and then I see something obviously wrong and wonder how it got by anyone who looked at it. Now I know why. No one looked at it.
    - GreenKnight23@lemmy.world
      link
      fedilink
      English
      arrow-up
      17·
      edit-2
      3 months ago
      deleted by creator
  - dance_ninja@lemmy.world
    link
    fedilink
    English
    arrow-up
    4·
    3 months ago
    Malevolent Kitchen Intensifies
- Eezyville@sh.itjust.works
  link
  fedilink
  English
  arrow-up
  12·
  edit-2
  3 months ago
  I hope it’s available for Stash App. I wanna know what these JAV girls are saying.
  - NOT_RICK@lemmy.world
    link
    fedilink
    English
    arrow-up
    4·
    3 months ago
    ( ͡° ͜ʖ ͡°)
- JustEnoughDucks@feddit.nl
  link
  fedilink
  English
  arrow-up
  5·
  3 months ago
  In the *arr suite, bazarr has a plugin called Subgen which you can add and you can set it to generate subtitles on your entire library if you want, or only missing subtitles. The sync is spot on compared to 90% of what Opensubtitles delivers. I sometimes re-gen them with this plugin just because opensubtitles is so constantly out of sync (e.g. highly rated subtitles 4 lines will be at breakneck pace and the next 10 will be super slow and then everything is 3 seconds off)
  
  It isn’t in-player but it works. The downside is it is a larger model and takes ~20 minutes to generate a movie length of subtitles.
- asbestos@lemmy.world
  link
  fedilink
  English
  arrow-up
  2·
  3 months ago
  Ooooh I like this
billwashere@lemmy.world
link
fedilink
English
arrow-up
49·
3 months ago
This might be one of the few times I’ve seen AI being useful and not just slapped on something for marketing purposes.
- PalmTreeIsBestTree@lemmy.world
  link
  fedilink
  English
  arrow-up
  16·
  3 months ago
  And not to do evil shit
  - SatansMaggotyCumFart@lemmy.world
    link
    fedilink
    English
    arrow-up
    7·
    3 months ago
    But the toppings contain potassium benzoate.
ZeroOne@lemmy.world
link
fedilink
English
arrow-up
39·
3 months ago
As long as the models are open source, I have no complaints
- Knock_Knock_Lemmy_In@lemmy.world
  link
  fedilink
  English
  arrow-up
  32·
  3 months ago
  And the data stays local.
TheRealKuni@lemmy.world
link
fedilink
English
arrow-up
31·
3 months ago
And yet they turned down having thumbnails for seeking because it would be too resource intensive. 😐
- DreamlandLividity@lemmy.world
  link
  fedilink
  English
  arrow-up
  16·
  3 months ago
  I mean, it would. For example Jellyfin implements it, but it does so by extracting the pictures ahead of time and saving them. It takes days to do this for my library.
  - fishpen0@lemmy.world
    link
    fedilink
    English
    arrow-up
    1·
    3 months ago
    deleted by creator
    - DreamlandLividity@lemmy.world
      link
      fedilink
      English
      arrow-up
      2·
      3 months ago
      I get what you are saying, but I don’t think there is any standardized format for these trickplay images. The same images from Plex would likely not be usable in Jellyfin without converting the metadata (e.g. which time in the video an image belongs to). So VLC probably does not have a good way to understand trickplay images not made by VLC.
- cley_faye@lemmy.world
  link
  fedilink
  English
  arrow-up
  11·
  3 months ago
  Video decoding is resource intensive. We’re used to it, we have hardware acceleration for some of it, but spewing something around 52 million pixels every second from a highly compressed data source is not cheap. I’m not sure how both compare, but small LLM models are not that costly to run if you don’t factor their creation in.
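The “52 million pixels every second” figure is consistent with 1080p video at 25 frames per second, though the comment does not say which resolution or frame rate was meant, so that pairing is an assumption. A quick check of the arithmetic:

```python
def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Number of decoded pixels a player must produce each second."""
    return width * height * fps


# Assumed: 1920x1080 at 25 fps (not stated in the comment).
full_hd = pixels_per_second(1920, 1080, 25)
# For comparison: 3840x2160 at 60 fps.
uhd = pixels_per_second(3840, 2160, 60)

print(f"1080p25: {full_hd:,} pixels/s")
print(f"2160p60: {uhd:,} pixels/s")
```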
  - TheRealKuni@lemmy.world
    link
    fedilink
    English
    arrow-up
    2·
    3 months ago
    All they’d need to do is generate thumbnails for every period on video load. Make that period adjustable. Might take a few extra seconds to load a video. Make it off by default if they’re worried about the performance hit.
    
    There are other desktop video players that make this work.
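The bookkeeping side of “a thumbnail every period” is small; the expensive part is decoding a frame at each timestamp, which is not shown here. The sketch below only computes which timestamps to capture and which thumbnail to show for a given seek position. The function names are made up, and this is not a description of how VLC, Jellyfin, or any other player implements it.

```python
def thumbnail_times(duration_s: float, period_s: float) -> list:
    """Timestamps (in seconds) at which a thumbnail would be captured."""
    if period_s <= 0:
        raise ValueError("period_s must be positive")
    count = int(duration_s // period_s) + 1
    return [i * period_s for i in range(count)]


def thumbnail_index(position_s: float, period_s: float, count: int) -> int:
    """Index of the thumbnail to show while hovering at position_s."""
    index = int(max(0, position_s) // period_s)
    return min(index, count - 1)


times = thumbnail_times(duration_s=95, period_s=10)
print(times)
print(thumbnail_index(47, 10, len(times)))
```

A longer period means fewer frames to decode on load, which is the adjustable trade-off the comment describes.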
- serenissi@lemmy.world
  link
  fedilink
  English
  arrow-up
  3·
  3 months ago
  It is useful for internet streams though, not really for local or lan video.
VerPoilu@sopuli.xyz
link
fedilink
English
arrow-up
27·
3 months ago
I hope Mozilla can benefit from a good local translation engine that could come out of it as well.
- m-p{3}@lemmy.ca
  link
  fedilink
  English
  arrow-up
  16·
  edit-2
  3 months ago
  They technically already do with Project Bergamot.
  - VerPoilu@sopuli.xyz
    link
    fedilink
    English
    arrow-up
    5·
    3 months ago
    I know they do, but it’s lacking so many languages.
    - viking@infosec.pub
      link
      fedilink
      English
      arrow-up
      2·
      3 months ago
      And it takes forever. I’m using the TWP plugin for Firefox (which uses external resources, configurable to Google, Bing and Yandex translate respectively), and it’s near instantaneous. The local one from Mozilla often takes 30 seconds, and sometimes hangs until I refresh the page.
      - mrvictory1@lemmy.world
        link
        fedilink
        English
        arrow-up
        1·
        3 months ago
        Weird, I can translate with FF even on my phone and it’s not a strong one.
Doorbook@lemmy.world
link
fedilink
English
arrow-up
23·
3 months ago
The nice thing is, now at least this can be used with live tv from other countries and languages.

Say you want to watch Japanese TV or Korean channels without bothering with downloading, searching for, and syncing subtitles
- sugar_in_your_tea@sh.itjust.works
  link
  fedilink
  English
  arrow-up
  13·
  3 months ago
  I prefer watching Mexican football announcers, and it would be nice to know what they’re saying. Though that might actually detract from the experience.
  - InFerNo@lemmy.ml
    link
    fedilink
    English
    arrow-up
    6·
    3 months ago
    GOOOOOOAAAAAAAAALLLLLLLLLL
    - Qwaffle_waffle@sh.itjust.works
      link
      fedilink
      English
      arrow-up
      4·
      3 months ago
      Just fill up the whole screen with this.
    - werefreeatlast@lemmy.world (Banned)
      link
      fedilink
      English
      arrow-up
      3·
      3 months ago
      The opposing team has scored.
Thistlewick
link
fedilink
English
arrow-up
21·
3 months ago
Amazing. I can finally find out exactly what that nurse is yelling about while she gets railed by the local basketball team.
- SwingingTheLamp@midwest.social
  link
  fedilink
  English
  arrow-up
  4·
  3 months ago
  Something about a full-court press?
Clot@lemm.ee
link
fedilink
English
arrow-up
19·
3 months ago
Will it be possible to export these AI subs?
- Scrollone@feddit.it
  link
  fedilink
  English
  arrow-up
  8·
  3 months ago
  Imagine the possibilities!
Nalivai@lemmy.world
link
fedilink
English
arrow-up
16·
3 months ago
The technology is nowhere near being good though. On synthetic tests, on the data it was trained and tweaked on, maybe, I don’t know.
I co-run an event where we invite speakers from all over the world, and we tried every way to generate subtitles; all of them perform at the level of YouTube’s autogenerated ones. It’s better than nothing, but you can’t really rely on it.
- Scrollone@feddit.it
  link
  fedilink
  English
  arrow-up
  6·
  3 months ago
  No, but I think it would be super helpful to synchronize subtitles that are not aligned to the video.
  - Telodzrum@lemmy.world
    link
    fedilink
    English
    arrow-up
    5·
    3 months ago
    This is already trivial. Bazarr has been doing it for all my subtitles for almost a decade.
- TriflingToad@sh.itjust.works
  link
  fedilink
  English
  arrow-up
  4·
  edit-2
  3 months ago
  is your goal to rely on it, or to have it as a backup?
  For my purpose of having backup nearly anything will be better than nothing.
  - Nalivai@lemmy.world
    link
    fedilink
    English
    arrow-up
    2·
    3 months ago
    When you do live streaming there is no time for backup, it either works or not. Better than nothing, that’s for sure, but also maybe marginally better than whatever we had 10 years ago
- Petter1@lemm.ee
  link
  fedilink
  English
  arrow-up
  4·
  3 months ago
  You haven’t been able to test it yet and you’re already calling it nowhere near good 🤦🏻
  
  Like how should you know?!
  - Nalivai@lemmy.world
    link
    fedilink
    English
    arrow-up
    5·
    edit-2
    3 months ago
    Relax, they didn’t write a new way of doing magic, they integrated a solution from the market.
    I don’t know what the new BMW car they introduce this year is capable of, but I know for a fact it can’t fly.
- lukewarm_ozone@lemmy.today
  link
  fedilink
  English
  arrow-up
  2·
  edit-2
  3 months ago
  Really? This is the opposite of my experience with (distil-)whisper - I use it to generate subtitles for stuff like podcasts and was stunned at first by how high-quality the results are. I typically use distil-whisper/distil-large-v3, locally. Was it among the models you tried?
  - Nalivai@lemmy.world
    link
    fedilink
    English
    arrow-up
    1·
    3 months ago
    Unfortunately I don’t know the specific names of the models. I’ll follow up here if I remember to ask the people who spun up the models themselves.
    The difference might be live vs. recorded content, I don’t know.
SuperCub@sh.itjust.works
link
fedilink
English
arrow-up
10·
3 months ago
Haven’t watched the video yet, but it makes a lot of sense that you could train an AI using already subtitled movies and their audio. There are times when official subtitles paraphrase the speech to make it easier to read quickly, so I wonder how that would work. There’s also just a lot of voice recognition everywhere nowadays, so maybe that’s all they need?
/home/pineapplelover@lemm.ee
link
fedilink
English
arrow-up
4·
3 months ago
When does this get released? I really want to try it
