In his latest project, artist and coder Mario Klingemann uses a neural network to match archival movie footage with the content of recent movie trailers. He regularly posts the resulting “neural reinterpretations” on his Twitter. The results are technically impressive. They’re also a fascinating view into how to explore the creative possibilities of a machine learning technique.
Looking through Klingemann’s tweets you can trace his explorations:
Mario Klingemann’s neural scene classifier grouping scenes it finds similar.
- Early on, he posts clusters of similar frames and clips in a 3x3 grid.
- He then experiments with compilations, like scenes of water flowing (and other scenes the model thinks look similar to water).
- Then he adds an element of interactivity, using his webcam as the source against which to match the archival footage.
- He tries using the matches to create new reaction GIFs.
A neural reinterpretation of the Fight Club trailer, with the original footage on the left and the matched footage on the right.
- Then he moves into the trailer reinterpretations, with both stand-alone reinterpreted versions (Fight Club) and side-by-side versions (Fight Club side-by-side).
- He makes another version of the Fight Club trailer using a tool that lets him choose from several algorithm-supplied suggestions.
The movie trailer reinterpretations are a great showcase for the technique for a couple of reasons:
Trailers are made up of short clips. This gives the algorithm lots of shots at finding interesting matches (every cut is a new example). If it were instead applied to a single two-minute continuous scene, you wouldn’t get to see nearly as many matches. Also, because the cuts are often timed to the music, the reinterpreted footage seems more connected to the trailer’s audio.
Films have a built-up vocabulary of what different shots mean, like a close-up of a face to signal intense emotion. Filmmakers employ these patterns consciously. As film watchers, we may not think about scene types explicitly, but we do build up associations and expectations around different framing, movements, and styles. The side-by-side reinterpretations make this referential language more visible by showing us two examples at a time, helping us notice the similarity the machine has identified. We can often extrapolate even further, to “ah, right, that’s another one of those ‘vehicles rushing by’ shots,” a category we normally don’t consciously register. This takes the trailers from technical demos into artistic territory.
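The post doesn’t describe Klingemann’s pipeline in detail, so the following is only a minimal sketch of the general idea of cut-by-cut matching, under stated assumptions. It assumes each trailer cut and each archival clip has already been reduced to a feature vector by some image network (the toy three-number vectors below are hypothetical stand-ins for those), and it uses cosine similarity as the matching score, which is an assumption rather than a documented detail of his method. Returning the top `k` candidates instead of only the best one mirrors the idea of choosing among several algorithm-supplied suggestions. The names `suggest_matches`, `reinterpret`, `ARCHIVE`, and `CUTS` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_matches(cut: Vector, archive: Dict[str, Vector], k: int = 3) -> List[Tuple[str, float]]:
    """Rank archival clips by similarity to one trailer cut and return the top k."""
    scored = [(clip_id, cosine_similarity(cut, emb)) for clip_id, emb in archive.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def reinterpret(trailer_cuts: List[Vector], archive: Dict[str, Vector]) -> List[str]:
    """Pick the single best archival clip for every cut, keeping the trailer's cut order."""
    if not archive:
        raise ValueError("archive must contain at least one clip")
    return [suggest_matches(cut, archive, k=1)[0][0] for cut in trailer_cuts]


# Hypothetical toy embeddings; a real system would compute these with an image model.
ARCHIVE: Dict[str, Vector] = {
    "waves": [1.0, 0.1, 0.0],
    "traffic": [0.0, 1.0, 0.2],
    "face": [0.0, 0.1, 1.0],
}
CUTS: List[Vector] = [[0.9, 0.2, 0.0], [0.1, 0.0, 0.95]]

if __name__ == "__main__":
    print(reinterpret(CUTS, ARCHIVE))             # one archival clip per trailer cut
    print(suggest_matches(CUTS[0], ARCHIVE, k=2))  # candidates an editor could choose from
```

Because every cut issues its own query, a trailer with many short shots gives the matcher many independent chances to land an interesting pairing, which is the point made above about trailers being a good showcase.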
A still from “Learning to see: Gloomy Sunday” by Memo Akten
“Learning to see: Gloomy Sunday” by Memo Akten explores similarity in a different, fascinating way. He uses a model trained on specific types of art to interpret his webcam images and generate new ones: for example, a sheet becomes waves. As with the trailer reinterpretations, what takes this beyond a technical demo is how suggestive the association can be. The machine’s ability to identify similarity between a sheet and a wave gives us an insight we can then apply outside the context of the video. It’s a suggestive analogy that opens out, so that viewers can build on it and make their own connections.