Project Background

The goal of the project was to implement a motion amplification program. The program takes a video as input and amplifies the motion that falls within a selected temporal frequency passband. To do this, we had to decompose the video into frequency components and amplify the small motions in the band of interest.

Motion Amplification Approaches

From our research, we decided to work on motion amplification with two approaches: Eulerian amplification and phase-based amplification. Eulerian amplification was developed in an MIT paper by [Wu et al.]. The video frames are decomposed into spatial frequency bands using a Laplacian pyramid, temporal filters are applied to each band to isolate a passband of interest, and the filtered signal is amplified to magnify the motion. The phase-based approach, developed by [Wadhwa et al.], is an improvement on the Eulerian approach. It uses complex steerable pyramids to decompose the video, then extracts the phase and amplifies that instead of the pixel intensities.

The phase-based approach has advantages over the Eulerian approach, the main one being reduced noise. We decided to implement the Eulerian method first and, if we had time, to implement the improved phase-based approach afterwards.

Temporal Filtering

A video is a sequence of images shown quickly over time. We usually filter an image over space while holding time fixed, but we can instead hold a pixel location fixed and look at how its value changes over time. Looking at how these pixel slices change over time lets us discern motion information from the video. In the Eulerian approach, we isolate a temporal passband of interest and magnify it by a constant factor. We can also attenuate still objects (the DC component) and higher-frequency motion that we don't need to amplify. In the phase-based approach, we amplify and attenuate the phases within the passband rather than the pixel values.
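
The temporal filtering described above can be sketched for a single pixel as follows. This is a minimal illustration, not the project's actual code: it assumes NumPy, uses an ideal (FFT mask) band-pass filter, and the names ideal_bandpass, amplify_band, and alpha, as well as the synthetic signal, are made up for the example.

```python
import numpy as np

def ideal_bandpass(signal, fps, low_hz, high_hz):
    """Keep only temporal frequencies in [low_hz, high_hz] along axis 0."""
    signal = np.asarray(signal, dtype=float)
    n = signal.shape[0]
    freqs = np.fft.rfftfreq(n, d=1.0 / fps)
    spectrum = np.fft.rfft(signal, axis=0)
    mask = (freqs >= low_hz) & (freqs <= high_hz)
    # Reshape so the mask broadcasts over any trailing (spatial) axes.
    mask = mask.reshape((-1,) + (1,) * (signal.ndim - 1))
    return np.fft.irfft(spectrum * mask, n=n, axis=0)

def amplify_band(signal, fps, low_hz, high_hz, alpha):
    """Add alpha times the band-passed signal back onto the original."""
    return signal + alpha * ideal_bandpass(signal, fps, low_hz, high_hz)

# Synthetic pixel: constant brightness (DC), a tiny 1 Hz motion of interest,
# and a larger 5 Hz motion that we do not want to amplify.
fps = 30.0
t = np.arange(300) / fps  # 10 seconds of video
pixel = 0.5 + 0.01 * np.sin(2 * np.pi * 1.0 * t) + 0.2 * np.sin(2 * np.pi * 5.0 * t)

band = ideal_bandpass(pixel, fps, 0.5, 2.0)               # isolates the 1 Hz term
boosted = amplify_band(pixel, fps, 0.5, 2.0, alpha=20.0)  # 1 Hz term grows 21x
```

Because the passband starts above 0 Hz, the DC component is left alone, and the 5 Hz term passes through without amplification. A real implementation might prefer a smoother filter (for example a Butterworth filter) over the ideal mask to reduce ringing.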

Spatial Frequency

Spatial frequency describes the periodic distribution of light and dark in an image. High spatial frequencies correspond to fine detail such as edges, and low spatial frequencies correspond to coarse structure that spans larger regions of the image.

Image Pyramids

Image pyramids are a form of multi-scale signal representation where an image is repeatedly filtered and down-sampled, which results in a set of images of shrinking size, similar to the structure of a physical pyramid. A common pyramid is the Gaussian pyramid, where pixels are averaged with Gaussian weights, blurring and shrinking the image. This effectively low-passes the image, as high-frequency fine detail is lost. A Laplacian pyramid is similar to a Gaussian pyramid, but each level is the difference between adjacent levels of the Gaussian pyramid, effectively band-passing the image.
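
As a rough sketch of the two pyramids just described, the following NumPy code builds a Gaussian pyramid, derives a Laplacian pyramid from it, and collapses the Laplacian pyramid back into the image. The 5-tap binomial kernel is an assumed stand-in for a Gaussian, and all function names are illustrative rather than taken from the project.

```python
import numpy as np

# 5-tap binomial kernel, a common approximation to a Gaussian (assumption).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable blur with reflected borders: filter along x, then along y."""
    padded = np.pad(img, 2, mode="reflect")
    rows = sum(KERNEL[k] * padded[:, k:k + img.shape[1]] for k in range(5))
    return sum(KERNEL[k] * rows[k:k + img.shape[0], :] for k in range(5))

def downsample(img):
    """Blur, then keep every second pixel in each direction."""
    return blur(img)[::2, ::2]

def upsample(img, shape):
    """Insert zeros between pixels, then blur; the factor 4 restores brightness."""
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)

def gaussian_pyramid(img, levels):
    pyr = [np.asarray(img, dtype=float)]
    for _ in range(levels - 1):
        pyr.append(downsample(pyr[-1]))
    return pyr

def laplacian_pyramid(img, levels):
    g = gaussian_pyramid(img, levels)
    # Each band is a Gaussian level minus the upsampled next-coarser level.
    lap = [g[i] - upsample(g[i + 1], g[i].shape) for i in range(levels - 1)]
    lap.append(g[-1])  # the coarsest level keeps the low-pass residual
    return lap

def collapse(lap):
    """Invert laplacian_pyramid by adding bands back from coarse to fine."""
    img = lap[-1]
    for band in reversed(lap[:-1]):
        img = band + upsample(img, band.shape)
    return img

image = np.random.default_rng(0).random((64, 48))
bands = laplacian_pyramid(image, levels=4)
restored = collapse(bands)
```

Because each band stores exactly what the next-coarser level loses, collapsing the pyramid recovers the original image up to floating-point error, which is what makes it possible to modify individual bands and then rebuild the frame.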

Steerable pyramids use an over-complete basis of filters at each level of the pyramid, so the image is filtered not only at different spatial frequencies but also at different orientations.

Figure: A Gaussian pyramid

Eulerian Motion Amplification

In this approach, each frame of the video is decomposed into spatial frequency bands using a Laplacian pyramid. The spatial bands are then filtered temporally to detect motion, and the signal in our target passband is amplified and added back to the bands. Finally, we collapse the pyramid to recover the modified video.
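
The steps above can be put together in a short end-to-end sketch. It is written under several assumptions that are not from the original project: grayscale frames stored as a NumPy array of shape (frames, height, width), a binomial blur in place of a true Gaussian, an ideal FFT band-pass filter, and a single gain alpha shared by every pyramid level. The helper names are hypothetical, and the pyramid helpers are restated so the block runs on its own.

```python
import numpy as np

K = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0  # binomial stand-in for a Gaussian

def blur(img):
    p = np.pad(img, 2, mode="reflect")
    rows = sum(K[k] * p[:, k:k + img.shape[1]] for k in range(5))
    return sum(K[k] * rows[k:k + img.shape[0], :] for k in range(5))

def upsample(img, shape):
    up = np.zeros(shape)
    up[::2, ::2] = img
    return 4.0 * blur(up)

def laplacian_pyramid(img, levels):
    bands, cur = [], np.asarray(img, dtype=float)
    for _ in range(levels - 1):
        nxt = blur(cur)[::2, ::2]
        bands.append(cur - upsample(nxt, cur.shape))
        cur = nxt
    return bands + [cur]

def collapse(bands):
    img = bands[-1]
    for band in reversed(bands[:-1]):
        img = band + upsample(img, band.shape)
    return img

def temporal_bandpass(stack, fps, low_hz, high_hz):
    """Ideal band-pass along the time axis of a (frames, h, w) stack."""
    freqs = np.fft.rfftfreq(stack.shape[0], d=1.0 / fps)
    keep = ((freqs >= low_hz) & (freqs <= high_hz)).reshape(-1, 1, 1)
    return np.fft.irfft(np.fft.rfft(stack, axis=0) * keep, n=stack.shape[0], axis=0)

def eulerian_magnify(video, fps, low_hz, high_hz, alpha, levels=3):
    # 1. Decompose every frame into spatial bands.
    pyramids = [laplacian_pyramid(frame, levels) for frame in video]
    amplified = []
    for lvl in range(levels):
        # 2. Filter each band over time; 3. amplify the passband and add it back.
        stack = np.stack([p[lvl] for p in pyramids])
        amplified.append(stack + alpha * temporal_bandpass(stack, fps, low_hz, high_hz))
    # 4. Collapse the modified pyramid of each frame.
    return np.stack([collapse([a[t] for a in amplified]) for t in range(len(video))])

# Synthetic clip: a static pattern plus a faint 2 Hz flicker.
fps, frames = 30.0, 60
t = np.arange(frames) / fps
base = np.random.default_rng(1).random((32, 32))
flicker = 0.01 * np.sin(2 * np.pi * 2.0 * t)[:, None, None] * np.ones((32, 32))
video = base + flicker
result = eulerian_magnify(video, fps, low_hz=1.0, high_hz=3.0, alpha=10.0)
```

With one gain for all levels, the pipeline is linear, so the synthetic flicker simply comes out 11 times larger while the static pattern is unchanged. Using a different gain per level, which is a natural extension, would let fine spatial bands be amplified less than coarse ones.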