SinFusion: Training Diffusion Models on a Single Image or Video

Supplementary Material

Overview

HP-VAE-GAN [1] comparison
  • We compare against the results published by HP-VAE-GAN [1] and by VGPNN [3].
  • The comparison uses the 10 videos, at 144x256 resolution, that HP-VAE-GAN supplied in their results.
  • Note: Due to runtime limitations, HP-VAE-GAN was trained on only 13 frames from each video.
    We train on the full videos (between 80 and 400 frames each), since our method can train on long videos without a large increase in runtime.

SinGAN-GIF [2] comparison
  • We compare against the results published by SinGAN-GIF [2] and by VGPNN [3].
  • The comparison uses the 5 videos (each consisting of 7-15 frames) that SinGAN-GIF supplied in their results. No SinGAN-GIF code is available, so no further comparison can be made.
Comparison to HP-VAE-GAN [1], VGPNN [3]
[10 video comparisons, each with five panels: Input Video (short), Input Video (long), SinFusion (Ours), HP-VAE-GAN, VGPNN]



Comparison to SinGAN-GIF [2], VGPNN [3]
[5 video comparisons, each with four panels: Input Video, SinFusion (Ours), SinGAN-GIF, VGPNN]


Relevant references:
[1] Gur, S., Benaim, S., & Wolf, L. (2020). Hierarchical Patch VAE-GAN: Generating Diverse Videos from a Single Sample. Advances in Neural Information Processing Systems, 33, 16761-16772.
[2] Arora, R., & Lee, Y. J. (2021). SinGAN-GIF: Learning a Generative Video Model from a Single GIF. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 1310-1319).
[3] Haim, N., Feinstein, B., Granot, N., Shocher, A., Bagon, S., Dekel, T., & Irani, M. (2022). Diverse Video Generation from a Single Video. arXiv preprint arXiv:2205.05725.