Fast Video Inpainting on a Budget

FM²FVI: Flow Matching for Fast Multi-frame Video Inpainting

Mathis Wauquiez Yann Gousseau Alasdair Newson Andrés Almansa

LTCI, Télécom Paris, ISIR, Sorbonne Université, MAP5, Université Paris Descartes

Abstract

Removing an unwanted object from an image now takes just a few seconds on a smartphone. Doing the same on a video can require tens of minutes of computation on a modern GPU. This accessibility gap is no coincidence: video inpainting relies predominantly on diffusion models whose computational cost grows linearly with the temporal dimension.

We propose FM²FVI, a frugal approach to video inpainting based on Flow Matching, a recent generative modeling framework closely related to diffusion models. Flow Matching offers several advantages: mathematical simplicity, the ability to handle non-Gaussian source distributions, connections to optimal transport, and most importantly, fast sampling with fewer function evaluations.
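To make the "fewer function evaluations" point concrete, here is a minimal sketch of Flow Matching with a linear probability path, written in plain Python. It is not taken from the FM²FVI library: the function names (`interpolate`, `target_velocity`, `fm_loss`, `euler_sample`, `oracle_velocity`) and the toy single-point target are assumptions chosen for illustration. With a linear path, the regression target for the velocity is simply `x1 - x0`, and when the trajectories are straight an explicit Euler solver can reach the target in very few steps.

```python
import random

def interpolate(x0, x1, t):
    """Linear probability path: x_t = (1 - t) * x0 + t * x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Conditional velocity of the linear path, constant in t: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(velocity_fn, x0, x1, t):
    """Mean squared error between a predicted velocity and the conditional target."""
    xt = interpolate(x0, x1, t)
    pred = velocity_fn(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)

def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x, h = list(x0), 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * h)  # one function evaluation per step
        x = [xi + h * vi for xi, vi in zip(x, v)]
    return x

# Hypothetical toy setting: the target distribution is a single point, so the
# exact marginal velocity field is known in closed form: u(x, t) = (x1 - x) / (1 - t).
# In practice this field would be a trained neural network.
TARGET = [0.5, -1.0, 2.0]

def oracle_velocity(x, t):
    return [(b - xi) / (1.0 - t) for xi, b in zip(x, TARGET)]

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in TARGET]  # sample from the Gaussian source

few_steps = euler_sample(oracle_velocity, noise, num_steps=2)
loss = fm_loss(oracle_velocity, noise, TARGET, t=0.3)
```

In this toy case the trajectories are exactly straight, so two Euler steps already land on the target up to floating-point error. A learned velocity field over real image or video data will generally have curved trajectories and need more steps, but the same solver loop applies.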

Our first contribution is a complete and modular library for training Flow Matching models, supporting all state-of-the-art parameterizations including schedulers, sampling strategies, loss functions, ODE solvers, and guidance modes.

We then develop two image inpainting methodologies based solely on image self-similarity, avoiding priors from large datasets that may raise ethical or legal concerns.

Finally, we extend our most promising approach to video inpainting, demonstrating that it is possible to achieve visually satisfying results with a model using fewer than 500,000 parameters, trained on a single video. This represents a significant step toward democratizing video editing on consumer hardware while reducing energy consumption and environmental impact.
