As explained in the paper, our video generation method doesn't simply copy spatio-temporal chunks, but generates never-before-seen frames that follow the main motions of the input video.
This can be seen in the following video depiction of Fig. 9.
Each row contains four videos. From left to right -
VGPNN [1] Generated video - A video generated by VGPNN. The generation copies large spatio-temporal chunks as-is from the input video.
VGPNN NNF color map - The NNF map corresponding to the VGPNN video. Each large uniformly-colored region marks a chunk that was copied as-is from the original video.
SinFusion Generated video - A video generated by SinFusion (ours). Our video is more diverse and doesn't simply copy chunks from the input video.
SinFusion NNF color map - The NNF map corresponding to our generated video. The varied colors represent diverse offsets to nearest-neighbour patches, indicating that our method doesn't copy large existing chunks from the single input video (see the sketch below for how such a map can be computed).
[Videos: VGPNN Generated Video, VGPNN NNF Map | SinFusion Generated Video, SinFusion NNF Map]
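For concreteness, here is a minimal sketch of how such an NNF color map can be computed. The function name `nnf_color_map`, the patch size and stride, the brute-force nearest-neighbour search, and the color encoding are all illustrative assumptions rather than the exact procedure used for the figures; the idea is simply to find, for each patch of the generated video, its nearest patch in the input video, and to color it by the offset to that patch.

```python
# Minimal sketch of an NNF color map, assuming both videos are numpy arrays
# of shape (T, H, W, 3) in [0, 1]. Patch size, stride, brute-force search and
# color encoding are illustrative choices, not the settings used in the paper.
import numpy as np

def nnf_color_map(generated, source, patch=(3, 5, 5), stride=5):
    pt, ph, pw = patch
    T, H, W, _ = generated.shape
    Ts, Hs, Ws, _ = source.shape

    # Gather all source patches together with their top-left-front coordinates.
    src_coords, src_patches = [], []
    for t in range(0, Ts - pt + 1, stride):
        for y in range(0, Hs - ph + 1, stride):
            for x in range(0, Ws - pw + 1, stride):
                src_coords.append((t, y, x))
                src_patches.append(source[t:t+pt, y:y+ph, x:x+pw].ravel())
    src_coords = np.array(src_coords)
    src_patches = np.stack(src_patches)

    # For every patch of the generated video, find its nearest source patch
    # and store the spatial offset (dy, dx) pointing to it.
    offsets = np.zeros((T, H, W, 2))
    for t in range(0, T - pt + 1, stride):
        for y in range(0, H - ph + 1, stride):
            for x in range(0, W - pw + 1, stride):
                q = generated[t:t+pt, y:y+ph, x:x+pw].ravel()
                nn = np.argmin(((src_patches - q) ** 2).sum(axis=1))
                _t, ys, xs = src_coords[nn]
                offsets[t:t+pt, y:y+ph, x:x+pw] = (ys - y, xs - x)

    # Encode offsets as colors: patches copied from one large chunk share the
    # same offset, hence the same color; diverse offsets give a varied map.
    dy, dx = offsets[..., 0], offsets[..., 1]
    def norm(a):
        return (a - a.min()) / (a.max() - a.min() + 1e-8)
    mag = np.sqrt(dy ** 2 + dx ** 2)
    return np.stack([norm(dx), norm(dy), norm(mag)], axis=-1)  # (T, H, W, 3)
```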
Here we compare several videos generated by our model with and without the Projector model.
This demonstrates the importance of the Projector model in removing the small artifacts produced by our auto-regressive Predictor model (see the sketch below the videos).
Notice how the videos that were generated without the Projector (right column) slowly accumulate visual artifacts and degrade to poor quality.
[Video comparison: Predictor & Projector (left) | No Projector, Only Predictor (right)]
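The following is a minimal sketch of the generation loop being compared, assuming hypothetical `predictor` and `projector` callables that each take and return a single frame tensor; it is not the official SinFusion code. It illustrates why dropping the Projector lets small errors from the auto-regressive Predictor accumulate over frames.

```python
# Minimal sketch with assumed interfaces (not the official implementation):
# autoregressive video generation with an optional Projector cleanup step.
import torch

@torch.no_grad()
def generate_video(predictor, projector, first_frame, num_frames, use_projector=True):
    frames = [first_frame]
    for _ in range(num_frames - 1):
        # Predictor: generate the next frame conditioned on the previous frame.
        next_frame = predictor(frames[-1])
        # Projector: clean up the predicted frame, removing the small
        # artifacts that would otherwise feed back into the next prediction.
        if use_projector:
            next_frame = projector(next_frame)
        frames.append(next_frame)
    return torch.stack(frames)  # (num_frames, C, H, W)
```

With `use_projector=False`, each predicted frame (including its artifacts) becomes the conditioning input for the next step, which is the error-accumulation effect visible in the right column.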
As described in the paper, we show a basic qualitative comparison between our single-video DDPM and a VDM [2] trained on a single video.
Top: generated videos using our method.
Bottom: generated videos using VDM [2].
For further explanation, please see the discussion in the supplementary material details file.