Microsoft’s VASA-1 can deepfake a person with one photo and one audio track

return2ozma@lemmy.world · 1 year ago

venusaur@lemmy.world · 1 year ago

One photo? That’s incredible.

Wild Bill@midwest.social · 1 year ago

Yeah. Incredibly horrific.

T00l_shed@lemmy.world · 1 year ago

Yes, I hate what AI is becoming capable of. Last year everyone was laughing at the shitty fingers, but we're quickly moving past that. I’m concerned that in the near future it will be hard to tell truth from fiction.

redcalcium@lemmy.institute · 1 year ago

Combine this with an LLM with speech-to-text input and we could create talking paintings like in the Harry Potter movies. Heck, hang one on a door and hook it up to a smart lock to recreate the dorm doors in Harry Potter, and see if people can trick it into opening the door.

NotMyOldRedditName@lemmy.world · 1 year ago

Any sufficiently advanced technology is indistinguishable from magic.

Harry Potter wasn’t a fantasy movie, it was a SciFi and we just didn’t know it.

MuchPineapples@lemmy.world · 1 year ago

It was midichlorians all along.

DUMBASS@leminal.space · 1 year ago

You’re a Jedi 'Arry!

Pope-King Joe@lemmy.world · 1 year ago

Imma wot?

chatokun@lemmy.dbzer0.com · 1 year ago

I was actually discussing this very idea with my brother, who recently went to the Wizarding World of Harry Potter at Universal Studios, Orrrlandooooo. While he enjoyed himself, he said it felt like not much is new in theme parks nowadays. Adding AI-driven pictures you could actually talk to might spice it up.

Flying Squid@lemmy.world · 1 year ago

I like your optimism where this doesn’t result in making everything worse.

Admiral Patrick@dubvee.org · 1 year ago

Also Microsoft…

Microsoft warns deepfake election subversion is disturbingly easy

I know the genie’s out of the bottle, but goddamn.

Etterra@lemmy.world · 1 year ago

Microsoft: I know this will only be used for evil, but I’ll be damned if I’m gonna pass up on the hype-boost to my market share.

Every other big corp: same!

slaacaa@lemmy.world · 1 year ago

“At long last, we have created the Torment Nexus from classic sci-fi novel Don’t Create The Torment Nexus”

antlion@lemmy.dbzer0.com · 1 year ago

Since it’s trained on celebrities, can it do ugly people or would it try to make them prettier in animation?

The teeth change sizes, which is kinda weird, but probably fixable.

It’s not too hard to notice in an up-close face shot, but if it were farther away it might be hard to spot; the intonation and facial expressions are spot on. They should use this to redo all the digital faces in Star Wars.

Ms. ArmoredThirteen@lemmy.ml · 1 year ago

These vids are just off enough that I think doing a bunch of mushrooms and watching them would be a deeply haunting experience.

return2ozma@lemmy.world · 1 year ago

The first video her bottom teeth shift around.

Dozzi92@lemmy.world · 1 year ago

So, essentially the music video for Drugs by Ratatat.

MeekerThanBeaker@lemmy.world · 1 year ago

This is why I don’t post my picture online and I never talk to anyone ever, while hiding my head inside a nylon stocking (unrelated).

AnAnonymous@lemm.ee · 1 year ago

Paranoia vibes starting in 3, 2, 1…

dhork@lemmy.world · 1 year ago

Vasa? Like, the Swedish ship that sank 10 minutes after it was launched? Who named that project?

hakunawazo@lemmy.world · 1 year ago

No, like the crispbread.

DUMBASS@leminal.space · 1 year ago

There are a lot of flying vehicles named after birds who famously plummet to the ground at breakneck speeds.

Jimmycakes@lemmy.world · 1 year ago

They developed an ai to name all future ai. Ironically it is unnamed.

BetaDoggo_@lemmy.world · 1 year ago

The “why would they make this” people don’t understand how important this type of research is. It’s important to show what’s possible so that we can be ready for it. There are many bad actors already pursuing similar tools if they don’t have them already. The worst case is being blindsided by something not seen before.

Jesus@lemmy.world · 1 year ago

Microsoft’s research teams always make some pretty crazy stuff. The problem with Microsoft is that they absolutely suck at translating their lab work into consumer products. Their labs' publications are an amazing archive of shit that MS couldn’t get out the door properly or on time. Example: multitouch gesture UIs.

As interesting as this is, I’ll bet MS just ends up using some tech that OpenAI launches before MS’s bureaucratic product team can get their shit together.

thefartographer@lemm.ee · 1 year ago

The pores don’t stretch, but the teeth and irises sure do!

T00l_shed@lemmy.world · 1 year ago

I’m sure they will fix that before you know it.

AutoTL;DR@lemmings.world · 1 year ago

This is the best summary I could come up with:

On Tuesday, Microsoft Research Asia unveiled VASA-1, an AI model that can create a synchronized animated video of a person talking or singing from a single photo and an existing audio track.

In the future, it could power virtual avatars that render locally and don’t require video feeds—or allow anyone with similar tools to take a photo of a person found online and make them appear to say whatever they want.

To show off the model, Microsoft created a VASA-1 research page featuring many sample videos of the tool in action, including people singing and speaking in sync with pre-recorded audio tracks.

The examples also include some more fanciful generations, such as Mona Lisa rapping to an audio track of Anne Hathaway performing a “Paparazzi” song on Conan O’Brien.

While the Microsoft researchers tout potential positive applications like enhancing educational equity, improving accessibility, and providing therapeutic companionship, the technology could also easily be misused.

“We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection,” write the researchers.

The original article contains 797 words, the summary contains 183 words. Saved 77%. I’m a bot and I’m open source!

Maeve@kbin.social · 1 year ago

A long time ago, someone from a country that wasn't free wrote a white paper on why we should care about privacy: written words can be edited to level false accusations (charges) backed by false evidence. This chills me to the bone.

tal@lemmy.today · 1 year ago

I’d be less concerned about the impact on countries that aren't free than on free ones. Dictator Bob doesn’t need evidence to have the justice system get rid of you, because he controls the justice system.

Kiosade@lemmy.ca · 1 year ago

This is turning into some Mistborn shit. “Don’t trust writing not written on metal”

Sanctus@lemmy.dbzer0.com · 1 year ago

“You shot that man, citizen. Here is video evidence. Put your hands against the wall.” - and more coming to you soon!

I_Miss_Daniel@lemmy.world · 1 year ago

Feed it Microsoft Merlin. What will happen?