Introduction

Deepfakes have exploded in popularity — and controversy — thanks to rapid advances in generative AI. In this guide, we’ll dive deep into how photo deepfakes are made using the popular open-source AI image generation framework Stable Diffusion.

If you’re new to Stable Diffusion, it’s a powerful text-to-image AI model capable of creating high-quality, realistic images from simple prompts. Unlike closed models like Midjourney or DALL-E, Stable Diffusion is open-source, allowing anyone to fine-tune it on custom datasets, build new models, and share them with a massive global community.

While this openness fuels creativity, it also makes deepfake creation more accessible than ever — raising serious ethical and security concerns.

What Are Photo Deepfakes?

A deepfake is AI-generated digital media designed to replicate someone’s likeness in a way that can be almost indistinguishable from reality. While video deepfakes get most of the attention, photo deepfakes are just as convincing — and much easier to create.

There are two main methods used to create AI-generated fake images:

1. Face Swaps

This method takes an existing photo (or video) and swaps the face with a target face.

  • Works frame-by-frame for video deepfakes
  • Lower quality and less flexible
  • Requires fewer resources
  • Common in mobile apps found on iOS and Android stores

2. Custom Models

Instead of replacing just a face, custom models generate entirely new images in any pose, style, or setting.

  • Requires more technical skill and GPU resources
  • Allows unlimited creative control
  • Produces far more realistic results
  • The preferred method for professional deepfake creators

In this article, we’ll focus on custom models created with Stable Diffusion.

Why Stable Diffusion Is the Go-To Tool for Deepfakes

Unlike Midjourney or DALL-E, Stable Diffusion supports fine-tuning on external datasets, making it the natural choice for generating images of a specific person or subject.

Some key points:

  • Open-source model with strong community support
  • GUI tools like AUTOMATIC1111’s Web UI and ComfyUI make it beginner-friendly
  • Popular base models: SD 1.5 and SDXL
  • Specialty models like Realistic Vision or Photon are fine-tuned for lifelike human images

Custom Models & LoRA Training

While base models can produce realistic humans, they don’t know your specific target. To generate convincing deepfakes, creators train the model with reference images using one of two methods:

Textual Inversions (TI)

Small embedding files, referenced by a keyword in the prompt, that teach the model specific visual features.

LoRA (Low-Rank Adaptation)

A more powerful and flexible fine-tuning technique that trains a small set of low-rank weight updates applied on top of the base model, typically activated by a trigger word in the prompt.

Training LoRAs is often done with kohya_ss, a popular open-source tool. Training can be run locally on a strong GPU or on cloud services like RunPod.

Once trained:

  1. Load your LoRA into Stable Diffusion
  2. Reference it in the prompt using its trigger word (e.g., andy-tcp)
  3. Combine with a realistic model for photorealistic results

Step-By-Step Example: Creating a Photo Deepfake in Stable Diffusion

  1. Choose a Realistic Model – Example: Photon from CivitAI
  2. Load Your LoRA – Import the .safetensors file into AUTOMATIC1111’s Web UI
  3. Write Your Prompt – Include the LoRA trigger word
  4. Add Negative Prompts – Remove unwanted artifacts
  5. Select Sampler & Steps – Example: DPM++ SDE Karras, 20 steps
  6. Generate the Image – Review and refine as needed

Spotting AI-Generated Photos

While AI models are improving rapidly, they still make tell-tale mistakes. To identify a fake:

  1. Check the Fingers – Too many, too few, or oddly shaped
  2. Look at Limbs – Disjointed or unnatural bending
  3. Watch Side Profiles – Depth and angles are often off
  4. Inspect Backgrounds – Blurry faces or warped objects
  5. Read Text Carefully – Misspellings or nonsense letters are common

Tools for Detecting Deepfakes

While spotting mistakes visually works today, AI is improving quickly. Tools include:

  • Illuminarty – Free AI image detection tool (but prone to false positives)
  • Microsoft Video Authenticator & DeepDetector – Claim high accuracy, but are closed to the public
  • AUTOMATIC1111 PNG Info tab – Reads generation metadata embedded in Stable Diffusion images (if it hasn’t been stripped)
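The PNG Info check above can also be scripted. AUTOMATIC1111’s Web UI stores the prompt and generation settings in a PNG text chunk named "parameters" (assuming the file’s metadata hasn’t been stripped by re-saving or social-media processing). A minimal sketch using the Pillow library, with a synthetic demo file standing in for a real generated image:

```python
from PIL import Image
from PIL.PngImagePlugin import PngInfo

def sd_metadata(path):
    """Return the Stable Diffusion 'parameters' text chunk from a PNG, or None."""
    # Image.info exposes the PNG's text chunks as a dict of strings
    return Image.open(path).info.get("parameters")

# Demo only: create a tiny PNG carrying a hypothetical "parameters" chunk,
# the same key AUTOMATIC1111 writes, then read it back.
meta = PngInfo()
meta.add_text("parameters", "example prompt, Steps: 20, Sampler: DPM++ SDE Karras")
Image.new("RGB", (8, 8)).save("demo.png", pnginfo=meta)

print(sd_metadata("demo.png"))
```

If the function returns None, that doesn’t prove the image is genuine — metadata is trivially removed — but a populated "parameters" chunk is strong evidence the file came straight out of a Stable Diffusion pipeline.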

Beyond Photos: Other Forms of Deepfakes

Photo deepfakes are just one category. Others include:

  • Video face swaps – Real-time replacement of faces in video
  • Audio & lip-sync deepfakes – Fake voices matched to mouth movements
  • Voice cloning – Audio-only impersonations
  • Style impersonation – Mimicking someone’s writing style for phishing or social engineering

Conclusion

Stable Diffusion is a powerful creative tool — but in the wrong hands, it can be used for highly convincing deepfakes. Understanding how these images are made and how to detect them is critical for staying ahead of AI-driven misinformation.

If you found this guide helpful, share it with your network — and let me know in the comments what related topics you’d like me to cover next.

