Source: Denoising Diffusion Probabilistic Models
Summary
This paper demonstrated that Diffusion Models — a class of generative models that learn to reverse a gradual noising process — can produce high-quality images competitive with GANs. By framing image generation as iterative denoising, the authors showed that a conceptually simple training objective (predict the noise added to an image) yields state-of-the-art sample quality. This work launched the diffusion model paradigm that now powers image generators like DALL-E, Stable Diffusion, and Midjourney.
Key Claims
- Diffusion models match or exceed GANs on image quality. On unconditional CIFAR-10, the model achieved an Inception Score of 9.46 and FID of 3.17 — state-of-the-art at the time. On 256x256 LSUN, sample quality was comparable to ProgressiveGAN.
- The training objective is denoising. The forward process gradually adds Gaussian noise to an image over T steps until it is indistinguishable from pure noise; crucially, the noisy image at any step can be sampled from the original in closed form, so training does not require simulating intermediate steps. The model learns to reverse the process: given a noisy image and its timestep, predict the noise that was added. Training reduces to a simple mean-squared error between the true and predicted noise.
- A connection to score matching exists. The authors showed that a specific parameterization of the model is equivalent to denoising score matching with Langevin dynamics, connecting diffusion models to an established theoretical framework.
- Sampling is progressive decompression. The generation process starts from pure Gaussian noise and iteratively denoises, producing progressively more detailed images. This can be interpreted as a generalization of autoregressive decoding along a bit ordering.
- Log-likelihoods are not competitive. Despite excellent sample quality, the model’s log-likelihoods were worse than other likelihood-based models. The authors found that most of the model’s capacity was spent on imperceptible image details (lossless coding of fine structure).
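The forward noising, the simplified noise-prediction loss, and ancestral sampling described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: `model` stands in for the learned noise-prediction network (epsilon_theta in the paper), and the linear beta schedule uses the values reported in the paper (1e-4 to 0.02 over T = 1000 steps).

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear variance schedule from the paper: beta_1 = 1e-4 to beta_T = 0.02.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product, shrinks toward 0

def q_sample(x0, t, eps):
    """Forward process: sample x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def training_loss(model, x0, t):
    """Simplified objective: MSE between the true and predicted noise."""
    eps = rng.standard_normal(x0.shape)   # noise actually added
    x_t = q_sample(x0, t, eps)            # noised image at step t
    eps_pred = model(x_t, t)              # network's guess of that noise
    return np.mean((eps - eps_pred) ** 2)

def p_sample_loop(model, shape):
    """Ancestral sampling: start from pure noise and denoise step by step."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_pred = model(x, t)
        # Posterior mean of x_{t-1} given the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alphas[t])
        noise = rng.standard_normal(shape) if t > 0 else 0.0  # no noise at the last step
        x = mean + np.sqrt(betas[t]) * noise
    return x
```

In practice `model` is a U-Net trained by minimizing `training_loss` over random images and random timesteps; here any callable of the same shape (e.g. `lambda x, t: np.zeros_like(x)`) can be plugged in to trace the computation.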
Relevance and Implications
DDPM established diffusion models as a viable alternative to GANs for high-quality image generation, with several advantages: stable training (no mode collapse), principled mathematical framework, and flexibility. The approach was rapidly extended to text-to-image generation (DALL-E 2, Stable Diffusion), video, audio, 3D content, and even molecular design. Diffusion models represent the second major generative paradigm alongside autoregressive language models — while LLMs generate text token by token, diffusion models generate images by iterative refinement from noise.