Alternative titles: We're Live.
- The Idea
- Create a simple hardware box that lets users live-swap or deepfake their face onto any content they would normally watch on TV.
- The box would sit between the TV and something like an Apple TV, DVR, Roku, Chromecast or similar device.
- You could capture a photo of your face with a built-in camera, or open a web app and upload a selection of faces to use.
- Why do I like this idea? Is this a tech demo to explore a novel application, or something more significant?
- There is something exciting, hilarious and unsettling about watching TV where every character on screen has your face, or your spouse's face. Maybe you just feel comforted by the visage of Danny DeVito and want everyone on screen to look like him.
- As the walled garden of software and hardware becomes more impenetrable for weird use cases like this, I think using a hardware box to modify the signal is still a compelling act. With the advent of smart TVs, many of us don't even have the ability to alter the signal - everything is folded into one black box that we can't reasonably hack or control anymore.
- There are plenty of compelling precedents here. Even the face filters built into dozens of popular apps work on similar principles. The difference for me is that this is purely for watching at length on TV, and it applies the effect live to signals that weren't meant (or maybe aren't allowed) to be processed like that.
- What do I have so far?
- I experimented with this idea years ago with my project "We're Live", but the software stack and hardware were a pain to deal with and wouldn't have been easy for others to set up. I also didn't have a solution for audio pass-through.
- Next Steps/Where I'm stuck
- Awareness of the state of the art - I haven't touched face swapping technologies since about 2014, so there are probably a ton of great new real-time methods with much better quality.
- Video capture and passthrough would be relatively easy, especially if I'm OK with downsampling to 1080p, but audio sync and passthrough might not be trivial to deal with. There are capture devices that can do audio passthrough via a 1/8" (3.5 mm) jack, and it may be as simple as using TV settings to delay the audio feed so that it aligns with the processed video.
- The copy protection on HDMI signals could add a hurdle to this, but there are devices out there that strip HDCP and allow for capture and passthrough. More info on this.
- Finding the right hardware solution that isn't crazy expensive, can live in a reasonably sized enclosure, does HDMI capture and output, and can deal with HDCP could take some trial and error.
- I would want to make all of this modular, so that it can be packaged for a small run of ~5 devices to be passed out and also released as build instructions and software so that anyone can make their own box.
- Capturing and selecting the face to be pasted onto the TV signal could be done many ways. The simplest would probably be a small web app hosted on the box where you upload and select the face.
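To make the audio sync point above a bit more concrete, here is a minimal sketch of the arithmetic involved. It assumes the audio bypasses the face swap (via passthrough) and therefore arrives ahead of the processed video. The stage names, the example numbers and the 10 ms TV delay step are all placeholders that would need to be measured on real hardware.

```python
from dataclasses import dataclass


@dataclass
class PipelineLatency:
    """Per-stage video latency in milliseconds (hypothetical values, to be measured)."""
    capture_ms: float    # buffering inside the HDMI capture device
    face_swap_ms: float  # time to detect and swap faces in one frame
    output_ms: float     # rendering and HDMI output

    def total_ms(self) -> float:
        """Total time the processed video lags behind the untouched audio."""
        return self.capture_ms + self.face_swap_ms + self.output_ms


def audio_delay_ms(latency: PipelineLatency, tv_step_ms: int = 10) -> int:
    """Round the video lag to the nearest delay step the TV's audio menu offers."""
    steps = round(latency.total_ms() / tv_step_ms)
    return int(steps * tv_step_ms)


def frames_behind(latency: PipelineLatency, fps: float = 30.0) -> float:
    """Express the same lag as a number of video frames."""
    return latency.total_ms() * fps / 1000.0
```

For example, `audio_delay_ms(PipelineLatency(33, 80, 20))` suggests dialing in a 130 ms audio delay. If the face swap time varies a lot from frame to frame, a fixed delay like this would drift in and out of sync, which is one reason the audio side might need more than a TV setting.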
- Technologies Involved
- HDMI HDCP stripping device - likely an HDMI Splitter
- HDMI Capture device
- PC (Windows or Linux)
- Media device (Apple TV, Roku, Chromecast, Game Console etc.)
- TV
- Logistical thoughts
- Hardware research would be involved, but not too much work. I suspect the hardware wouldn't cost more than $2000, depending on the desired performance of the PC and the quality of the capture device.
- Software:
- The bulk of the work here would be in software development of the face swap element.
- A second crucial component would be to create the web app that would allow people to change settings and upload photos.
- I would wager maybe 1-2 weeks of development for the web app and a similar amount for the face swap element, depending on the desired quality of output.
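As a rough starting point for the web app piece, the sketch below uses only Python's standard library to upload faces and pick the active one. Everything here is an assumption for illustration: the `FaceLibrary` and `FaceHandler` names, the URL layout, the port, and keeping images in memory rather than on disk. A real version would also need validation, persistence and an actual HTML page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class FaceLibrary:
    """In-memory store of uploaded face images, keyed by a user-chosen name."""

    def __init__(self):
        self.faces = {}     # name -> raw image bytes
        self.active = None  # name of the face currently swapped in

    def upload(self, name, image_bytes):
        if not name or not image_bytes:
            raise ValueError("a face needs a name and image data")
        self.faces[name] = image_bytes
        if self.active is None:
            self.active = name  # first upload becomes the default face

    def select(self, name):
        if name not in self.faces:
            raise KeyError(name)
        self.active = name

    def summary(self):
        return {"faces": sorted(self.faces), "active": self.active}


LIBRARY = FaceLibrary()


class FaceHandler(BaseHTTPRequestHandler):
    """GET /faces lists faces, PUT /faces/<name> uploads, POST /select/<name> picks one."""

    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path == "/faces":
            self._reply(200, LIBRARY.summary())
        else:
            self._reply(404, {"error": "not found"})

    def do_PUT(self):
        if not self.path.startswith("/faces/"):
            return self._reply(404, {"error": "not found"})
        name = self.path[len("/faces/"):]
        length = int(self.headers.get("Content-Length", 0))
        try:
            LIBRARY.upload(name, self.rfile.read(length))
        except ValueError as err:
            return self._reply(400, {"error": str(err)})
        self._reply(201, LIBRARY.summary())

    def do_POST(self):
        if not self.path.startswith("/select/"):
            return self._reply(404, {"error": "not found"})
        name = self.path[len("/select/"):]
        try:
            LIBRARY.select(name)
        except KeyError:
            return self._reply(404, {"error": "unknown face"})
        self._reply(200, LIBRARY.summary())


def serve(port=8080):
    """Run the settings server on the box until interrupted."""
    HTTPServer(("0.0.0.0", port), FaceHandler).serve_forever()
```

Calling `serve()` on the box would expose the endpoints to anything on the local network, and the face swap process could read `LIBRARY.active` to know which face to paste in.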
- Are there other elements that could make this better?
- The web portal to control the settings would open up a lot of options. You could set a repository of faces to swap in and watch a show that features all of your friends as the actors. You could set modes that vary the amount that your swapped face shows up, anywhere from constantly to every 5-10 minutes. You could load other faces from the internet.
- Adding a camera and capture button to the device could be a good workaround for a public exhibition setup.
- High quality deepfake results that people think
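The "modes" idea from the web portal bullet above could be sketched as a small scheduler that decides, for any playback time, whether the swap should be visible. This is only one possible interpretation: the mode names, the 20 second burst length and the `SwapSchedule` class are invented for illustration, and the 5-10 minute gap is taken from the note above.

```python
import random


class SwapSchedule:
    """Decides when the swapped face is visible.

    mode "constant": the swap is always on.
    mode "occasional": the swap is on for `burst_s` seconds, then off for a
    random gap between `min_gap_s` and `max_gap_s` (5-10 minutes by default).
    """

    def __init__(self, mode="constant", min_gap_s=300, max_gap_s=600,
                 burst_s=20, seed=None):
        if mode not in ("constant", "occasional"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.min_gap_s = min_gap_s
        self.max_gap_s = max_gap_s
        self.burst_s = burst_s
        self._rng = random.Random(seed)
        # Start time (in seconds) of the next burst of swapped faces.
        self._next_start = self._rng.uniform(min_gap_s, max_gap_s)

    def is_active(self, t):
        """Return True if the swap should be applied at playback time t (seconds).

        Assumes t never decreases between calls.
        """
        if self.mode == "constant":
            return True
        # Once the current burst has finished, schedule the next one.
        while t >= self._next_start + self.burst_s:
            gap = self._rng.uniform(self.min_gap_s, self.max_gap_s)
            self._next_start += self.burst_s + gap
        return t >= self._next_start
```

The frame loop would call `is_active(t)` once per frame and pass the original frame through untouched whenever it returns `False`.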