Learning Objectives
By the end of this section, you will be able to:
- Explain the SDEdit algorithm and how it enables image-to-image transformation without additional training
- Understand the noise injection process and its role in controlling the transformation
- Use the strength parameter to balance between input preservation and prompt adherence
- Implement img2img generation with Stable Diffusion
- Apply img2img to practical tasks like style transfer, sketch refinement, and image editing
SDEdit: Stochastic Differential Editing
SDEdit, introduced by Meng et al. (2021), provides a remarkably simple yet powerful approach to image editing with diffusion models. The key insight:
Core Idea: Add noise to an input image to a certain level, then denoise it with a new text prompt. The amount of noise controls how much of the original image is preserved.
Why This Works
Consider the diffusion process:
- At $t = 0$: Clean image with all details
- At intermediate $t$: Image structure visible, but details are noisy
- At $t = T$: Pure noise, no original image information
By adding noise up to a chosen timestep and then denoising from there:
- Structure preserved: Large-scale features survive the noising
- Details regenerated: Fine details are created fresh during denoising
- Prompt guides regeneration: Text conditioning shapes the new details
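The claim that structure survives noising better than detail can be checked on a toy example. The sketch below is illustrative only: the synthetic image, the noise level `alpha_bar = 0.3`, and the 8x8 pooling are assumptions chosen for the demonstration, not part of SDEdit itself. It compares how well the noised image agrees with the original at pixel level and after averaging over coarse blocks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 64x64 "image": smooth large-scale structure plus a fine checkerboard texture.
yy, xx = np.mgrid[0:64, 0:64]
structure = np.sin(2 * np.pi * xx / 64) + np.cos(2 * np.pi * yy / 64)
texture = 0.5 * np.where((xx + yy) % 2 == 0, 1.0, -1.0)
x0 = structure + texture

# Noise the image to a fairly high level (alpha_bar = 0.3 is an illustrative value).
alpha_bar = 0.3
noise = rng.standard_normal(x0.shape)
x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * noise


def pool8(img):
    """Average over non-overlapping 8x8 blocks, keeping only coarse structure."""
    return img.reshape(8, 8, 8, 8).mean(axis=(1, 3))


def corr(a, b):
    """Pearson correlation between two arrays of the same shape."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


fine_corr = corr(x_t, x0)                  # pixel-level agreement
coarse_corr = corr(pool8(x_t), pool8(x0))  # agreement of large-scale structure
print(f"pixel-level: {fine_corr:.2f}, coarse: {coarse_corr:.2f}")
```

Averaging over blocks suppresses the independent per-pixel noise while leaving the smooth structure largely intact, so the coarse correlation comes out higher than the pixel-level one.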
The Noise Injection Process
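The noise injection follows the standard forward diffusion formula:
$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$$
Where:
- $x_0$ is the encoded input image
- $\epsilon \sim \mathcal{N}(0, I)$ is random noise
- $\bar{\alpha}_t$ determines the signal-to-noise ratio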
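A minimal NumPy sketch of this noising step follows. The function name `add_noise` and the array shape are assumptions for illustration; in a real pipeline the scheduler performs this operation on the latent produced by the image encoder.

```python
import numpy as np


def add_noise(x0, alpha_bar_t, rng):
    """Forward diffusion in one shot:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps, with eps ~ N(0, I).
    """
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    return x_t, eps


rng = np.random.default_rng(0)
# Stand-in for an encoded image; the (4, 64, 64) shape is illustrative.
x0 = rng.standard_normal((4, 64, 64))
x_t, eps = add_noise(x0, alpha_bar_t=0.6, rng=rng)
```

With `alpha_bar_t` close to 1 the output is nearly the input; as it approaches 0 the output approaches pure Gaussian noise.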
Signal Preservation
The amount of original image signal preserved depends on $\bar{\alpha}_t$:
| Timestep | $\bar{\alpha}_t$ | Signal Ratio | What Survives |
|---|---|---|---|
| t = 100 | 0.98 | 98% | Almost everything |
| t = 300 | 0.85 | 85% | Overall structure, colors |
| t = 500 | 0.60 | 60% | Major shapes, composition |
| t = 700 | 0.30 | 30% | Rough layout only |
| t = 900 | 0.05 | 5% | Almost nothing |
Intuition: Noise injection is like progressively blurring an image. At low noise levels, you can still see details. At high noise levels, only the vague silhouette remains.
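To see how such values arise, the cumulative product can be computed from a noise schedule. The sketch below assumes a linear beta schedule from 1e-4 to 0.02 over 1000 steps, a common DDPM-style choice; the exact values depend on the schedule, so the output need not match the table above.

```python
import numpy as np

T = 1000
# Assumed linear beta schedule; other models use different schedules.
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # alpha_bar_t = prod_{s<=t} (1 - beta_s)

for t in (100, 300, 500, 700, 900):
    ab = alpha_bars[t - 1]  # timesteps counted from 1
    amplitude = np.sqrt(ab)  # coefficient multiplying the clean image
    snr = ab / (1.0 - ab)    # signal variance over noise variance
    print(f"t={t}: alpha_bar={ab:.3f}, amplitude={amplitude:.3f}, SNR={snr:.3f}")
```

Note that the clean image is scaled by the square root of the cumulative product, so the amplitude of the surviving signal decays more slowly than the cumulative product itself.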
The Strength Parameter
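The strength parameter (0.0 to 1.0) controls the transformation intensity by setting how far into the forward process the input is noised. A strength of $s$ starts denoising from roughly timestep $s \cdot T$, so higher values discard more of the original image and leave more for the prompt to regenerate.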
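As a rough sketch, strength can be mapped to the portion of the sampling schedule that is actually run. The helper below, `strength_to_steps`, is a hypothetical name; its arithmetic mirrors the convention commonly used by img2img implementations, but individual libraries may round differently.

```python
def strength_to_steps(num_inference_steps, strength):
    """Return (steps_skipped, steps_run) for a given strength in [0, 1]."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Only the last `steps_run` steps of the schedule are executed.
    steps_run = min(int(num_inference_steps * strength), num_inference_steps)
    steps_skipped = num_inference_steps - steps_run
    return steps_skipped, steps_run


for s in (0.25, 0.5, 0.75, 1.0):
    skipped, run = strength_to_steps(50, s)
    print(f"strength={s}: skip {skipped} steps, run {run}")
```

One consequence of this mapping is that a low strength also means fewer denoising steps are executed, so the requested step count is an upper bound rather than the number actually run.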
Choosing the Right Strength
| Strength | Use Case | Original Preserved |
|---|---|---|
| 0.2 - 0.3 | Subtle tweaks, color correction | Very high |
| 0.4 - 0.5 | Style transfer, texture changes | High |
| 0.5 - 0.7 | Significant style change, content preserved | Medium |
| 0.7 - 0.8 | Major transformation, composition kept | Low |
| 0.8 - 1.0 | Almost pure generation, slight hints | Minimal |
Common Mistake
Implementation
Here's a complete implementation using the Diffusers library:
Basic img2img
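The following is a minimal sketch of img2img with the Diffusers `StableDiffusionImg2ImgPipeline`. The model ID, file names, default parameter values, and the helper names `run_img2img` and `main` are assumptions for illustration; substitute whichever Stable Diffusion checkpoint and paths you actually use.

```python
def run_img2img(init_image, prompt, strength=0.6, guidance_scale=7.5,
                num_inference_steps=50, seed=0,
                model_id="stable-diffusion-v1-5/stable-diffusion-v1-5"):
    """Transform a PIL image according to `prompt` and return a PIL image.

    `model_id` is an assumed checkpoint name, not one taken from this section.
    """
    # Heavy dependencies are imported lazily so this file loads without them.
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe = pipe.to(device)

    # A fixed seed makes runs at different strengths comparable.
    generator = torch.Generator(device=device).manual_seed(seed)
    result = pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,              # fraction of the schedule to re-run
        guidance_scale=guidance_scale,  # classifier-free guidance weight
        num_inference_steps=num_inference_steps,
        generator=generator,
    )
    return result.images[0]


def main():
    """Example entry point; "input.jpg" and "output.png" are placeholder paths."""
    from PIL import Image

    init = Image.open("input.jpg").convert("RGB").resize((512, 512))
    out = run_img2img(init, "An oil painting in the style of Van Gogh", strength=0.6)
    out.save("output.png")
```

Calling `main()` from a script loads the model, transforms the input, and writes the result. Sweeping `strength` with the same seed is a quick way to see the trade-off between preserving the input and following the prompt.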
Advanced: Inpainting
A variant of img2img that only regenerates masked regions:
- Input: Image + mask (white = regenerate, black = keep)
- Process: Only the masked (white) region is regenerated; unmasked (black) areas are preserved
- Use case: Object removal, adding elements, fixing artifacts
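One common way to realise this is to blend at every denoising step: keep the model's output inside the mask and overwrite everything else with a freshly noised copy of the original. The sketch below shows a single such blending step in NumPy. The function name `blend_inpaint_step` is hypothetical, and dedicated inpainting checkpoints may instead take the mask as an extra model input.

```python
import numpy as np


def blend_inpaint_step(x_denoised, x0, mask, alpha_bar_t, rng):
    """Combine one denoising result with the original image.

    mask == 1 marks pixels to regenerate, mask == 0 marks pixels to keep.
    """
    # Noise the original to the current noise level so both parts match.
    eps = rng.standard_normal(x0.shape)
    x0_noised = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
    # Inside the mask: model output. Outside the mask: noised original.
    return mask * x_denoised + (1.0 - mask) * x0_noised


rng = np.random.default_rng(0)
x0 = rng.standard_normal((64, 64))          # stand-in for the original latent
x_denoised = rng.standard_normal((64, 64))  # stand-in for the model's output
mask = np.zeros((64, 64))
mask[16:48, 16:48] = 1.0                    # regenerate the central square
blended = blend_inpaint_step(x_denoised, x0, mask, alpha_bar_t=0.6, rng=rng)
```

Because the kept region is re-noised to the current level at each step, it stays consistent with the region being generated and converges to the original as the noise level reaches zero.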
Applications and Use Cases
1. Style Transfer
Transform images to different artistic styles:
- Input: Photograph
- Prompt: "An oil painting in the style of Van Gogh"
- Strength: 0.5-0.7 (preserve structure, change style)
2. Sketch to Image
Refine rough sketches into detailed images:
- Input: Hand-drawn sketch
- Prompt: "Detailed digital art of [subject]"
- Strength: 0.6-0.8 (add details, follow structure)
3. Photo Enhancement
Improve photo quality and add details:
- Input: Low-quality or blurry photo
- Prompt: "High quality, detailed photograph of [subject]"
- Strength: 0.3-0.5 (enhance without changing content)
4. Scene Modifications
Change aspects of a scene while keeping composition:
- Input: Daytime scene
- Prompt: "Same scene at night with stars"
- Strength: 0.5-0.7 (change lighting, keep structure)
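The four recipes above can be collected into a small preset table. This is a hedged sketch: the names `PRESETS` and `img2img_kwargs` are hypothetical, and picking the midpoint of each strength range is just one reasonable default to start tuning from.

```python
# Prompts and strength ranges are taken from the four use cases above.
PRESETS = {
    "style_transfer": {
        "prompt": "An oil painting in the style of Van Gogh",
        "strength": (0.5, 0.7),
    },
    "sketch_to_image": {
        "prompt": "Detailed digital art of [subject]",
        "strength": (0.6, 0.8),
    },
    "photo_enhancement": {
        "prompt": "High quality, detailed photograph of [subject]",
        "strength": (0.3, 0.5),
    },
    "scene_modification": {
        "prompt": "Same scene at night with stars",
        "strength": (0.5, 0.7),
    },
}


def img2img_kwargs(task, subject=None):
    """Return a prompt and a starting strength for the given task."""
    preset = PRESETS[task]
    low, high = preset["strength"]
    prompt = preset["prompt"]
    if subject is not None:
        prompt = prompt.replace("[subject]", subject)
    # Start from the middle of the recommended range.
    return {"prompt": prompt, "strength": round((low + high) / 2, 2)}


print(img2img_kwargs("sketch_to_image", subject="a castle"))
```

The returned dictionary can be passed as keyword arguments to an img2img call, then adjusted up or down within the range depending on how much of the input should survive.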
Pro Tip: For best results, describe both what you want AND elements from the original image in your prompt. This helps guide the model to preserve desired features.
Summary
Image-to-image generation with SDEdit provides powerful transformation capabilities without additional training:
- SDEdit algorithm: Add noise to input, then denoise with new prompt - simple yet effective
- Noise injection: Forward diffusion formula determines how much original signal is preserved
- Strength parameter: 0.0 = no change, 1.0 = pure generation; typical range 0.4-0.7 for editing
- Applications: Style transfer, sketch refinement, photo enhancement, scene modifications
- Key insight: Structure survives noising better than details, enabling structural preservation with detail regeneration
Looking Ahead: In the next section, we'll explore IP-Adapter, which enables using images as prompts - providing a different form of image conditioning that captures style and content semantically.