Stable Diffusion with ComfyUI

ComfyUI is a powerful node-graph interface for Stable Diffusion, allowing advanced control over image generation workflows. Instead of a one-box prompt with fixed settings, ComfyUI lets you build a visual graph of nodes (model loaders, image processors, samplers, etc.) to customize every step of generation. This tutorial will cover text-to-image, image-to-image, inpainting, and ControlNet usage in ComfyUI, complete with example workflows, images, and tips. We’ll also highlight how ComfyUI handles parameters like resolution, seed, and CFG scale via nodes (not in the text prompt) and provide links to further resources on mastering ComfyUI.

ComfyUI Basics: Node-Based Workflows

ComfyUI’s interface is a blank canvas where you add and connect nodes to form a workflow graph. Each node performs a specific function (loading a model, encoding text, sampling an image, etc.), and you connect them with “wires” that pass data (like latent images or conditioning signals). For example, the default text-to-image workflow in ComfyUI consists of nodes for loading the Stable Diffusion model, encoding the text prompt (positive & negative), creating an empty latent noise image, the KSampler (diffusion sampler), VAE decode, and Save Image, all wired together in sequence. Below is a screenshot of a basic text-to-image workflow in ComfyUI, with each node performing a role in the generation pipeline:

Example of a ComfyUI workflow graph (default text-to-image). Nodes (left to right): Load Checkpoint (loads the Stable Diffusion model), Empty Latent Image (defines the canvas size; the KSampler adds the seeded noise), two CLIP Text Encode nodes for positive and negative prompts (outputs text embeddings), KSampler (runs the diffusion process with parameters like seed, steps, CFG), VAE Decode (converts latent to final image), and Save Image (previews/saves output). Arrows show how data flows between nodes.

Notice that generation parameters are handled by nodes – for instance, the Empty Latent Image node sets the resolution (width/height) of the output, and the KSampler node has fields for seed, steps, and CFG scale, etc. This means you do not include things like resolution, seed, or CFG scale in your text prompt – those are not part of the prompt text, but rather configured in the appropriate nodes (e.g. setting the KSampler’s seed or the latent image size). The text prompt itself should only describe the image content you want (and we’ll cover prompt tips shortly). ComfyUI’s node approach might feel technical at first, but it offers tremendous flexibility: you can insert additional processing (like upscalers or mask operations), branch into multiple samplers, or integrate control modules, all by adding/modifying nodes in the graph.
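To make the “wires” idea concrete, the sketch below writes the default graph as a Python dict in the JSON “API format” that ComfyUI can export, as I understand it: each node has a class_type (e.g. CheckpointLoaderSimple is the internal name of Load Checkpoint) and inputs, where a two-element list like ["4", 1] is a wire to output 1 of node 4 and plain values are widget settings. Note that resolution, seed, steps and CFG appear as node inputs, while the prompt text only describes content. The execution_order helper is hypothetical: it runs a topological sort (Kahn's algorithm) to show that every node needs its inputs computed first. ComfyUI's own executor is more elaborate (it starts from output nodes and caches results), so treat this as an illustration of the data flow only.

```python
from collections import deque

# Example graph in ComfyUI's API format: node id -> {"class_type", "inputs"}.
# An input of the form [source_node_id, output_index] is a wire;
# any other value (number, string) is a widget setting on that node.
WORKFLOW = {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a majestic castle on a hill", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}},
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal",
                     "denoise": 1.0, "model": ["4", 0], "positive": ["6", 0],
                     "negative": ["7", 0], "latent_image": ["5", 0]}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]}},
}


def is_link(value):
    """True if an input value is a wire reference like ["4", 1]."""
    return isinstance(value, list) and len(value) == 2 and isinstance(value[0], str)


def execution_order(workflow):
    """Kahn's topological sort: every node comes after all nodes wired into it."""
    upstream = {nid: {v[0] for v in node["inputs"].values() if is_link(v)}
                for nid, node in workflow.items()}
    downstream = {nid: [] for nid in workflow}
    for nid, sources in upstream.items():
        for src in sources:
            downstream[src].append(nid)
    pending = {nid: len(sources) for nid, sources in upstream.items()}
    ready = deque(sorted(nid for nid, n in pending.items() if n == 0))
    order = []
    while ready:
        nid = ready.popleft()
        order.append(nid)
        for child in sorted(downstream[nid]):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    if len(order) != len(workflow):
        raise ValueError("workflow graph contains a cycle")
    return [workflow[nid]["class_type"] for nid in order]


print(execution_order(WORKFLOW))
# -> ['CheckpointLoaderSimple', 'EmptyLatentImage', 'CLIPTextEncode',
#     'CLIPTextEncode', 'KSampler', 'VAEDecode', 'SaveImage']
```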

Tip: You can load ready-made workflows. ComfyUI supports importing workflows saved as JSON or even PNG images that embed the workflow metadata. The official ComfyUI documentation provides example workflow images you can drag into the UI to recreate setups for text-to-image, img2img, etc. (this is a great way to start learning the node setups). Also, the UI has a “Load Default” option which spawns the default SD1.5 text2img workflow.
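Workflow PNGs work because ComfyUI stores the graph as text metadata inside the image. Assuming it is written as PNG tEXt chunks with the keys prompt (API format) and workflow (the UI graph), which is how I understand the default Save Image node behaves, a standard-library reader can be sketched as follows. png_text_chunks and embedded_workflow are hypothetical helper names, and compressed zTXt/iTXt chunks are not handled.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_text_chunks(data: bytes) -> dict:
    """Return the uncompressed tEXt key/value pairs stored in PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    texts = {}
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif chunk_type == b"IEND":
            break
        pos += 12 + length  # skip length + type + data + CRC (CRC not verified)
    return texts


def embedded_workflow(path):
    """Parse the ComfyUI graph(s) embedded in a saved PNG, if any."""
    with open(path, "rb") as fh:
        texts = png_text_chunks(fh.read())
    return {key: json.loads(value) for key, value in texts.items()
            if key in ("prompt", "workflow")}
```

Dragging the PNG into the UI does the equivalent of embedded_workflow for you; the script form is mainly useful for batch-inspecting which settings produced a folder of images.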

Text-to-Image Generation in ComfyUI

Text-to-image is the fundamental Stable Diffusion use-case: you provide a text prompt and the AI generates an image. In ComfyUI, ensure you have a Stable Diffusion model checkpoint (e.g. SD1.5) placed in the ComfyUI/models/checkpoints/ folder and loaded via a Load Checkpoint node. The basic steps for a text-to-image workflow in ComfyUI are:

1. Load the Model: Use a Load Checkpoint node and select your Stable Diffusion model file (e.g. v1-5-pruned-emaonly.safetensors). This node supplies the UNet (diffusion model), the CLIP text encoder, and typically a VAE into the graph (the model components are exposed to other nodes).

2. Prepare the Canvas (Latent): Add an Empty Latent Image node, setting the desired output width & height (e.g. 512×512) and batch size. This node produces a blank (all-zero) latent at the chosen size; the KSampler later adds random noise to it according to the seed. The latent is 1/8 of the pixel resolution (512×512 pixels becomes a 64×64 latent), and its size defines the final image dimensions, so think of it as an empty “painting canvas” at the chosen resolution.

3. Text Prompt Encoding: For your positive prompt (what you want in the image), add a CLIP Text Encode node and enter your prompt text there. Add a second CLIP Text Encode node for the negative prompt (things to avoid). Connect both nodes' clip inputs to the CLIP output of Load Checkpoint, then connect their outputs to the KSampler's positive and negative inputs; ComfyUI draws these CONDITIONING wires in orange. For example, the positive prompt might be “a majestic castle on a hill, sunset lighting, detailed, masterpiece” and the negative prompt “low quality, blurry, bad anatomy”. The CLIP Text Encode nodes convert your text into embeddings that the diffusion model can understand.

4. Sampling (Image Generation): The core of generation is the KSampler node. Connect the model (from Load Checkpoint), the latent image (from Empty Latent Image), and the positive and negative conditioning (from your CLIP Text Encode nodes) into the KSampler. In the KSampler's widgets, choose a sampler_name (e.g. euler, euler_ancestral, or dpmpp_2m) and a scheduler (e.g. normal or karras), the number of steps (e.g. 20 to 50), and the CFG scale (Classifier-Free Guidance, e.g. 7 or 8). You can also set a fixed seed for reproducibility. ComfyUI has no seed = -1 convention; instead, the control_after_generate option next to the seed (fixed, increment, decrement, or randomize) decides whether the seed changes between runs. When you queue the workflow, the KSampler performs the diffusion process: starting from the noisy latent, it iteratively denoises it, guided by the model and your prompt encodings, to produce the latent representation of an image.

5. Decode and Save: The output of KSampler is a denoised latent image. To turn this into a viewable image, pass it through a VAE Decode node, which converts the latent back into a regular image (pixel space). Finally, connect that to a Save Image (or Preview Image) node, which will display the image in the UI and allow you to save it to disk. After running the sampler, you should see the resulting picture appear. If you’re not satisfied, you can tweak the prompt or parameters and run again – with a random seed you’ll get a different variation each time.
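The five steps above can be sketched as a graph in ComfyUI's API format, built in Python. The class_type strings are ComfyUI's internal node names as I understand them, the checkpoint file name is the example from step 1, and queue_prompt assumes a local ComfyUI server on its default port 8188 that accepts a POST to /prompt with a {"prompt": ...} JSON body. Both helper functions are hypothetical names for this illustration.

```python
import json
import urllib.request


def text2img_workflow(positive, negative,
                      ckpt_name="v1-5-pruned-emaonly.safetensors",
                      width=512, height=512, batch_size=1,
                      seed=0, steps=20, cfg=7.0,
                      sampler_name="euler", scheduler="normal"):
    """Build the default text-to-image graph in ComfyUI's API (JSON) format."""
    return {
        # Step 1: Load Checkpoint -> outputs MODEL (0), CLIP (1), VAE (2)
        "4": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        # Step 2: the canvas; resolution lives here, not in the prompt
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height,
                         "batch_size": batch_size}},
        # Step 3: positive and negative prompts, encoded with the checkpoint's CLIP
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["4", 1]}},
        "7": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["4", 1]}},
        # Step 4: seed, steps, CFG and sampler are KSampler widgets
        "3": {"class_type": "KSampler",
              "inputs": {"seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": sampler_name, "scheduler": scheduler,
                         "denoise": 1.0, "model": ["4", 0],
                         "positive": ["6", 0], "negative": ["7", 0],
                         "latent_image": ["5", 0]}},
        # Step 5: decode the latent with the checkpoint's VAE, then save
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "ComfyUI", "images": ["8", 0]}},
    }


def queue_prompt(workflow, server="127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"http://{server}/prompt", data=body,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With a server running, queue_prompt(text2img_workflow("a majestic castle on a hill, sunset lighting", "low quality, blurry", seed=1234)) would queue one image. Resolution, seed, steps and CFG are function arguments mapped onto node inputs; they never go into the prompt text.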

Prompt Tips: Stable Diffusion prompts in ComfyUI use largely the same syntax as in other UIs like Automatic1111. Use English descriptions, and weight terms with parentheses for emphasis: (magic castle:1.2) increases a phrase's weight, (text:0.5) lowers it, and bare parentheses like (text) apply ComfyUI's default emphasis of 1.1. The A1111-style [text] de-emphasis is not part of ComfyUI's documented syntax, so use an explicit weight below 1 instead. Separate concepts with commas and be specific but concise. Include quality boosters if desired (e.g. “masterpiece, best quality” in the positive prompt, and common unwanted artifacts in the negative, like “low quality, extra limbs”). For example, a prompt for an anime-style image might be “1girl, flowing silver hair, intricate armor, (sunset:1.3), cinematic lighting, masterpiece, best quality” with the negative prompt “lowres, bad hands, text, watermark”. Experiment with phrasing: the model was trained on lots of image captions, so simpler descriptive language sometimes works best. Keep the CFG scale in mind too: too low and the image may ignore your prompt; too high and it may overfit to the prompt words, causing artifacts. A CFG of around 7 to 8 is often a good starting point.
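To make the weighting rules concrete, here is a deliberately simplified parser that splits a prompt on commas and reads the (text:weight) and (text) forms per chunk, using 1.1 for bare parentheses. It is a hypothetical helper for illustration, not ComfyUI's tokenizer: it ignores nesting, escaped \( \) characters, and commas inside parentheses.

```python
import re

# (text:1.3) -> explicit weight; (text) -> default emphasis
_WEIGHTED = re.compile(r"^\((.+):\s*([0-9]*\.?[0-9]+)\)$")
_BARE = re.compile(r"^\((.+)\)$")


def comma_weights(prompt, bare_weight=1.1):
    """Return (phrase, weight) pairs for each comma-separated chunk."""
    result = []
    for chunk in prompt.split(","):
        chunk = chunk.strip()
        if not chunk:
            continue
        match = _WEIGHTED.match(chunk)
        if match:
            result.append((match.group(1).strip(), float(match.group(2))))
            continue
        match = _BARE.match(chunk)
        if match:
            result.append((match.group(1).strip(), bare_weight))
        else:
            result.append((chunk, 1.0))  # unweighted phrase
    return result


print(comma_weights("1girl, (sunset:1.3), (glow), masterpiece"))
# -> [('1girl', 1.0), ('sunset', 1.3), ('glow', 1.1), ('masterpiece', 1.0)]
```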

Example: To illustrate text-to-image, here is a Stable Diffusion generated image for the prompt “Cleopatra as a spaceship commander, comic book style, detailed illustration, starry nebula background” (negative: “blurry, low quality”). This showcases how creative and detailed outputs can be when you craft a strong prompt:

Stable Diffusion output example – “Cleopatra as a spaceship commander.” The model has imagined the historical figure in a sci-fi comic style (note the golden Egyptian headpiece combined with a cosmic backdrop). This image was generated from a text prompt alone. ComfyUI’s node workflow makes it easy to adjust parameters (like using a different sampler or increasing steps) to refine such results.

Each run can produce variations. If you want to keep a specific result, note the seed value shown on the KSampler node (and set control_after_generate to fixed) so you can reuse it to reproduce that image. Conversely, to get variety, use a different or randomized seed for each generation. You can also create batch outputs by increasing batch_size on the Empty Latent Image node, or by raising the batch count in the queue controls or pressing Queue several times. ComfyUI also has an auto-queue mode (shown as “Queue (Instant)” in some versions) that keeps generating until you stop it; this is useful when exploring prompts, but be mindful of VRAM and compute limits.
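If you drive ComfyUI from a script, the same reproducibility rule applies: copy the workflow and change only the KSampler seed. The sketch below is a hypothetical helper that mirrors the increment behavior of control_after_generate; it assumes the workflow is an API-format dict and that you know the KSampler's node id.

```python
import copy
import random


def seed_variants(workflow, sampler_id, count, base_seed=None):
    """Return `count` copies of a workflow, each with its own KSampler seed."""
    if base_seed is None:
        base_seed = random.randint(0, 2**32 - 1)  # pick a random starting seed
    variants = []
    for i in range(count):
        wf = copy.deepcopy(workflow)  # leave the original graph untouched
        wf[sampler_id]["inputs"]["seed"] = base_seed + i  # like "increment"
        variants.append(wf)
    return variants
```

Passing an explicit base_seed makes the whole batch reproducible later; leaving it as None gives a fresh random set each time.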

Image-to-Image Generation in ComfyUI

Image-to-Image generation lets you feed an initial picture and have Stable Diffusion transform it into a new image based on your prompt. In ComfyUI, the img2img workflow is very similar to text2img, but with an added step to input the source image and an important parameter to control how much the output sticks to the original. Common use cases include style transfer (e.g. turn a photo into a painting), variations of an AI-generated image, upscaling with refinements, or colorizing/sketching.

To set up img2img in ComfyUI, you will include a Load Image node (or an image input node) and feed its output into the model’s latent space. Here’s how to build it:

1. Original Image Input: Add a Load Image node and upload your source image (any image file, e.g. a photograph or a drawing). This node outputs the image in pixel form. Connect it to a VAE Encode node, using the VAE from your Load Checkpoint, to convert it into latent space; ComfyUI's official img2img example encodes the input image with the model's VAE in this way. (ComfyUI also has a Load Latent node for latents previously written with Save Latent, but for an ordinary image file, encoding via VAE is what produces the latent tensor the diffusion model needs.)

2. Replace Empty Latent with Encoded Latent: In the text2img workflow, we started from an Empty Latent Image, which the KSampler filled with random noise. For img2img, we want to start from the latent of our input image instead. So, connect the VAE Encode output into the KSampler's latent_image input (in place of the empty latent). The diffusion process then starts from your image rather than from pure noise.

3. Set Denoise Strength: The denoise parameter in KSampler becomes critical for img2img. It controls how much noise to add to the input image’s latent before regenerating. In ComfyUI, ensure the KSampler’s denoise value is less than 1.0 for img2img. A lower denoise (e.g. 0.3–0.5) means the output will stay very close to the original (only minor changes), while a higher value (e.g. 0.8) means the output will deviate more and incorporate more of the new prompt. If you set denoise = 1.0, it essentially ignores the input image entirely (full diffusion from random noise). Finding the right balance is key: for subtle edits or style conversion, use low denoise; for major alterations while keeping basic structure, use mid-high denoise.
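Why does a lower denoise keep more of the image? A common approach, and as far as I can tell how ComfyUI's KSampler treats denoise below 1, is to build a noise schedule as if for int(steps / denoise) steps and run only the last steps of it, so sampling starts from a lower noise level and the encoded image is only partly buried in noise. The sketch assumes a Karras schedule with sigma bounds typical of SD 1.x (an assumption, not read from any checkpoint) and works on plain lists instead of tensors; img2img_sigmas and noised_start are hypothetical names.

```python
def karras_sigmas(n, sigma_min=0.0292, sigma_max=14.6146, rho=7.0):
    """Karras et al. (2022) schedule: n noise levels from sigma_max to sigma_min, then 0."""
    if n == 1:
        return [sigma_max, 0.0]
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    sigmas = [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]
    return sigmas + [0.0]


def img2img_sigmas(steps, denoise):
    """Noise levels visited for a given denoise (assumed ComfyUI-style truncation)."""
    if denoise >= 1.0:
        return karras_sigmas(steps)  # full schedule: starts from pure noise
    if denoise <= 0.0:
        return []  # nothing to do: the input passes through unchanged
    full = int(steps / denoise)  # schedule as if for more steps...
    return karras_sigmas(full)[-(steps + 1):]  # ...but run only the last `steps`


def noised_start(latent, noise, sigmas):
    """SDEdit-style starting point: encoded image plus noise at the first sigma."""
    start = sigmas[0]
    return [x + start * e for x, e in zip(latent, noise)]
```

With 20 steps, denoise 0.5 starts roughly halfway down a 40-step schedule, while denoise 0.8 starts much closer to the top, which is why higher values let the prompt override more of the original.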

4. Prompt and Other Settings: You still use CLIP Text Encode nodes for prompts in img2img. The prompt will guide how the image is transformed. For example, if you have a rough sketch and want a realistic painting, your prompt might describe the desired style and content. The seed, steps, CFG work similarly as before. Often, you might reduce steps a bit for img2img since the model has an initial image to work from (but it’s not a strict rule – more steps can still yield finer detail).

5. Run and Adjust: Execute the workflow. The KSampler will inject noise into the input image latent and then refine it according to the prompt over the given steps. If the result is too close to the original image, try a higher denoise; if it drifted too far or got chaotic, lower the denoise. The ComfyUI tutorial notes that “the smaller the denoise value, the smaller the difference between the generated image and the reference image; the larger the denoise, the larger the difference”. This is a core principle of img2img.
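Putting steps 1 to 5 together, the img2img graph differs from text-to-image only in where the latent comes from. This sketch assumes ComfyUI's API format and internal node names (LoadImage, VAEEncode) as I understand them; the image file name refers to a file you have uploaded to ComfyUI's input folder, and img2img_workflow is a hypothetical helper.

```python
def img2img_workflow(image_name, positive, negative, denoise=0.5,
                     ckpt_name="v1-5-pruned-emaonly.safetensors",
                     seed=0, steps=20, cfg=7.0):
    """Load Image -> VAE Encode -> KSampler (denoise < 1) -> VAE Decode -> Save."""
    if not 0.0 < denoise <= 1.0:
        raise ValueError("denoise must be in (0, 1]")
    return {
        "4": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        # Step 1: source image in pixel space, then encoded with the model's VAE
        "10": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "11": {"class_type": "VAEEncode",
               "inputs": {"pixels": ["10", 0], "vae": ["4", 2]}},
        # Step 4: prompts still steer the transformation
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["4", 1]}},
        "7": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["4", 1]}},
        # Steps 2-3: start from the encoded image, partially re-noised
        "3": {"class_type": "KSampler",
              "inputs": {"seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": denoise, "model": ["4", 0],
                         "positive": ["6", 0], "negative": ["7", 0],
                         "latent_image": ["11", 0]}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "img2img", "images": ["8", 0]}},
    }
```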

For example, suppose you have a plain 3D render of a scene and you want to add artistic detail. You can feed it in and prompt for a “dramatic painting” style with denoise ~0.5, and the output will paint over the image, preserving composition but adding painterly details. Or you could take a Stable Diffusion result you liked but wish it were in a different color scheme – you can prompt for the new style with a moderate denoise and get a variant.

Use Case Example: Let’s say we take a simple child-like drawing and use img2img to turn it into a polished artwork. We start with a crude doodle (stick figures or basic shapes) and provide a prompt like “a detailed fantasy landscape oil painting, rich colors, sunset”. With the doodle as input and denoise around 0.7, Stable Diffusion will reinterpret the basic forms of the doodle into the detailed landscape, giving a result that follows the prompt yet loosely matches the composition of the input. By adjusting denoise or the prompt, you could make the output closer or more different. The ComfyUI community often uses this to go from a hand-drawn sketch to a beautiful image, or to iterate on AI outputs by feeding them back in (sometimes repeatedly for successive improvements).

Inpainting in ComfyUI

Inpainting is a special case of img2img where only part of the image is changed while the rest remains intact. In ComfyUI, inpainting is driven by a mask. The workflow involves loading the original image, creating a mask, encoding the image and mask to latent, and then diffusing only the masked region, preferably with an inpainting-specific model checkpoint.

ComfyUI has a Mask Editor built into the Load Image node (right-click the node to open it), which lets you paint over the areas you want to regenerate. After masking and clicking “Save to node,” the Load Image node outputs both the image and the mask, which go into a VAE Encode (for Inpainting) node; that node produces a latent that records which region to fill. The KSampler then takes that latent and your prompt and generates new content only in the masked area, effectively treating the masked part as noise to replace.

One important tip: when using the default KSampler with VAE Encode (for Inpainting), ComfyUI's guide suggests keeping denoise = 1.0 so the masked region is fully regenerated. Unlike general img2img, where a lower denoise preserves the image, in inpainting a lower denoise tends to leave the masked area as a blank, grey patch. Using a fixed seed also helps you tweak the inpainted result reproducibly until it's just right. After sampling, the VAE Decode and Save Image nodes give you the final image with the hole filled in.
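The inpainting graph can be sketched the same way. It assumes that LoadImage's second output is the MASK painted in the Mask Editor and that VAEEncodeForInpaint (the internal name of VAE Encode (for Inpainting)) takes pixels, vae, mask and grow_mask_by inputs, as I understand the built-in node. The checkpoint name is only an example of an inpainting model, and inpaint_workflow is a hypothetical helper.

```python
def inpaint_workflow(image_name, positive, negative,
                     ckpt_name="sd-v1-5-inpainting.ckpt", seed=0,
                     steps=25, cfg=7.0, grow_mask_by=6):
    """Masked Load Image -> VAE Encode (for Inpainting) -> KSampler at denoise 1.0."""
    return {
        "4": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        # Output 0 = IMAGE, output 1 = MASK painted in the Mask Editor
        "10": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        "12": {"class_type": "VAEEncodeForInpaint",
               "inputs": {"pixels": ["10", 0], "vae": ["4", 2],
                          "mask": ["10", 1], "grow_mask_by": grow_mask_by}},
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["4", 1]}},
        "7": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["4", 1]}},
        # Keep denoise at 1.0 so the masked region is fully regenerated
        "3": {"class_type": "KSampler",
              "inputs": {"seed": seed, "steps": steps, "cfg": cfg,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0, "model": ["4", 0],
                         "positive": ["6", 0], "negative": ["7", 0],
                         "latent_image": ["12", 0]}},
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "inpaint", "images": ["8", 0]}},
    }
```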

Outpainting (extending an image beyond its original borders) uses a similar approach: you place the original image on a larger canvas with empty regions, use a mask to mark the new areas, and prompt the model to generate content there. Matching the original style often requires tiling or careful prompt phrasing. In ComfyUI, the built-in Pad Image for Outpainting node adds the blank border and outputs the matching mask, which you then inpaint as described above; there are also community workflows that simplify outpainting further.
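For the padding route, a small graph fragment is enough: load the image, pad it with ImagePadForOutpaint (the internal name I understand Pad Image for Outpainting to have), and feed its image and mask outputs into the inpainting encoder. padded_size and outpaint_nodes are hypothetical helpers; the multiple-of-8 check reflects the fact that SD latents are 1/8 of the pixel resolution, and the feathering value is only illustrative. The fragment expects a Load Checkpoint node with id "4" elsewhere in the graph.

```python
def padded_size(width, height, left=0, top=0, right=0, bottom=0):
    """Canvas size after padding; keep it divisible by 8 to match the latent grid."""
    new_w = width + left + right
    new_h = height + top + bottom
    if new_w % 8 or new_h % 8:
        raise ValueError("keep the padded canvas a multiple of 8 pixels")
    return new_w, new_h


def outpaint_nodes(image_name, left=0, top=0, right=0, bottom=0,
                   feathering=40, vae=("4", 2)):
    """Load Image -> Pad Image for Outpainting -> VAE Encode (for Inpainting)."""
    return {
        "10": {"class_type": "LoadImage", "inputs": {"image": image_name}},
        # Output 0 = padded IMAGE, output 1 = MASK covering the new border
        "13": {"class_type": "ImagePadForOutpaint",
               "inputs": {"image": ["10", 0], "left": left, "top": top,
                          "right": right, "bottom": bottom,
                          "feathering": feathering}},
        "12": {"class_type": "VAEEncodeForInpaint",
               "inputs": {"pixels": ["13", 0], "mask": ["13", 1],
                          "vae": list(vae), "grow_mask_by": 8}},
    }
```

Widening a 512×512 image by 256 pixels on each side gives padded_size(512, 512, left=256, right=256) == (1024, 512); the KSampler then fills the masked border at denoise 1.0, exactly as for inpainting.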

Overall, inpainting in ComfyUI is a bit more involved than one-click solutions in other UIs, but it is very powerful. You have control over the mask, can use multiple conditioned inputs, or even combine ControlNet with inpainting (e.g. to ensure the regenerated part follows a certain pose or sketch). An advanced workflow might encode the original image to latent for context, and run a ControlNet on the original to extract edges or pose, feeding that to KSampler along with the mask – so the model fills the gap in a way consistent with the original. This kind of flexibility is where ComfyUI shines for power users.

ControlNet in ComfyUI

One of ComfyUI’s strongest advantages is how easily you can integrate ControlNet into your workflows. ControlNet is an extension technique for diffusion models that lets you condition generation on auxiliary inputs such as sketches, poses, depth maps, and outlines. In simpler terms, ControlNet lets you say: “Generate an image following this rough shape/pose/structure and matching my text prompt.” This greatly improves controllability: rather than making random attempts, you can guide Stable Diffusion with a reference.

In ComfyUI, ControlNet is handled by a pair of built-in nodes: Load ControlNet Model (to load the ControlNet weights for a certain type, e.g. openpose, canny, or depth) and Apply ControlNet, which feeds into the main sampler. The general workflow is:

· Prepare a Reference Image or Map: This could be a photo with a pose you like, a scribble sketch of a layout, a depth map, etc. Each ControlNet model expects a specific kind of input (e.g. the Canny model expects an edge map, the OpenPose model expects a human pose skeleton). If your input isn’t already in the needed form, use a preprocessor node to convert it. ComfyUI ships a basic Canny node, and custom node packs such as ComfyUI's ControlNet Auxiliary Preprocessors (comfyui_controlnet_aux) provide many more, like the OpenPose estimator and depth estimators.

· Load ControlNet Model: Place a Load ControlNet Model node and select the appropriate ControlNet model file. Make sure you’ve downloaded ControlNet weights for your SD version (for example, control_v11p_sd15_openpose.safetensors for pose control on SD1.5) and put them in models/controlnet/. You might organize models in subfolders per base model (e.g. controlnet/sd1.5/ vs controlnet/sdxl/), since ControlNets aren’t interchangeable across model versions.

· Apply ControlNet Node: ComfyUI’s standard ControlNet workflow uses an Apply ControlNet node (or a similar variant) that takes (1) the ControlNet model from the loader, (2) the control image from your preprocessor, or directly from a Load Image node if it’s already in the right form, and (3) the conditioning from your text prompt. In practice, the Apply ControlNet node sits between your text encodings and the KSampler: you connect your prompt conditioning into the ControlNet node (depending on the version, only the positive or both positive and negative), and its output conditioning goes into the KSampler. This merges the text prompt guidance with the control guidance. The node also has a strength parameter (how strongly to enforce the control), and you can attach a Preview Image node to the preprocessor’s output to check the control map while debugging.

· Sampler Incorporates Control: When you run the generation, the KSampler will now take into account the additional ControlNet condition. For instance, if you used an OpenPose ControlNet with a stick-figure pose input, the diffusion process will be constrained to generate a figure in that exact pose. If you used a depth map, the composition will adhere to that depth layering. Essentially, ControlNet acts as a “translator” of your reference image into a form the diffusion model can use as extra conditioning signals – like telling the model “follow these lines or shapes when drawing.” The result is often uncannily good: you get the content/style from your text prompt while also preserving the structure from the reference. For example, you can take a rough stick-figure drawing of a person and generate a fully detailed character matching that pose.
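The wiring described in these steps can be sketched as a function that splices a ControlNet into an existing API-format workflow: it adds the loader, a Load Image for a ready-made control map (e.g. a pose skeleton), and an apply node between the prompt encodings and the KSampler. It assumes the advanced apply node, ControlNetApplyAdvanced, with positive/negative inputs and outputs and start_percent/end_percent widgets, as I understand the built-in node; add_controlnet and the numeric node ids are hypothetical.

```python
import copy


def add_controlnet(workflow, sampler_id, control_image,
                   controlnet_name="control_v11p_sd15_openpose.safetensors",
                   strength=1.0, first_id=20):
    """Insert Load ControlNet Model + Apply ControlNet in front of a KSampler."""
    wf = copy.deepcopy(workflow)
    sampler = wf[sampler_id]["inputs"]
    loader, image, apply = str(first_id), str(first_id + 1), str(first_id + 2)
    wf[loader] = {"class_type": "ControlNetLoader",
                  "inputs": {"control_net_name": controlnet_name}}
    # Control map already in the right form (e.g. a pose skeleton image)
    wf[image] = {"class_type": "LoadImage", "inputs": {"image": control_image}}
    # Take the text conditioning that used to go straight into the sampler
    wf[apply] = {"class_type": "ControlNetApplyAdvanced",
                 "inputs": {"positive": sampler["positive"],
                            "negative": sampler["negative"],
                            "control_net": [loader, 0], "image": [image, 0],
                            "strength": strength,
                            "start_percent": 0.0, "end_percent": 1.0}}
    # The sampler now receives the control-augmented conditioning
    sampler["positive"] = [apply, 0]
    sampler["negative"] = [apply, 1]
    return wf
```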

Example of ControlNet in action (OpenPose model). Left: an input image of a woman. Center: the pose skeleton extracted by ControlNet’s OpenPose preprocessor. Right: the output image generated by Stable Diffusion via ComfyUI, using the pose as a condition. The output woman’s pose matches the input exactly, but the appearance (clothing, style) was transformed based on the text prompt (here she appears as an “ice queen” character). This showcases how ControlNet can maintain composition (e.g. pose) while allowing creative changes.

To use this yourself, for instance, you would: load the OpenPose ControlNet model, feed in your own photo (or stick figure drawing) to an OpenPose detector node, connect that to ControlNet and then to your sampler along with a prompt like “regal ice queen, flowing gown, intricate details…” The generated image will have the queen in the exact pose from your input image. This level of control is what makes ControlNet so powerful.

Popular Control Types: ControlNet models come in many varieties, each targeting a specific type of guidance. Some common ones include Canny edges (for general outlines), Depth maps (for preserving 3D structure), OpenPose (human body/hand poses), Lineart or Scribble (for sketches), Segmentation maps (for controlling layouts by labeled regions), Shuffle/Tile (for rearranging or enhancing compositions), and more. In ComfyUI, you can also chain multiple ControlNets at once, e.g. to enforce both a pose and a background outline simultaneously: you use one Load ControlNet Model + Apply ControlNet pair per control, feeding the conditioning output of one Apply ControlNet node into the next before it reaches the sampler. Users report that combining two or more ControlNets can further refine results, though it increases processing load. ComfyUI’s flexibility readily allows multi-ControlNet setups.
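Chaining works because each apply node takes the conditioning produced by the previous one, and only the last one feeds the sampler. The sketch below is hypothetical (chain_controlnets, the node ids, and the example file names are for illustration) and assumes the built-in ControlNetApplyAdvanced node with positive/negative inputs and outputs, as I understand it.

```python
import copy


def chain_controlnets(workflow, sampler_id, controls, first_id=30):
    """controls: list of (controlnet_file, control_image_file, strength) tuples."""
    wf = copy.deepcopy(workflow)
    sampler = wf[sampler_id]["inputs"]
    positive, negative = sampler["positive"], sampler["negative"]
    node_id = first_id
    for controlnet_file, image_file, strength in controls:
        loader, image, apply = str(node_id), str(node_id + 1), str(node_id + 2)
        node_id += 3
        wf[loader] = {"class_type": "ControlNetLoader",
                      "inputs": {"control_net_name": controlnet_file}}
        wf[image] = {"class_type": "LoadImage", "inputs": {"image": image_file}}
        # Each apply node refines the conditioning from the previous step
        wf[apply] = {"class_type": "ControlNetApplyAdvanced",
                     "inputs": {"positive": positive, "negative": negative,
                                "control_net": [loader, 0],
                                "image": [image, 0], "strength": strength,
                                "start_percent": 0.0, "end_percent": 1.0}}
        positive, negative = [apply, 0], [apply, 1]
    # Only the last apply node is wired into the sampler
    sampler["positive"], sampler["negative"] = positive, negative
    return wf
```

A pose plus an edge map would be chain_controlnets(wf, "3", [("control_v11p_sd15_openpose.safetensors", "pose.png", 1.0), ("control_v11p_sd15_canny.pth", "edges.png", 0.6)]); lowering the second strength keeps the outline as a softer hint than the pose.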

Workflow Note: Make sure the ControlNet model you load matches your base model (SD1.5 ControlNets for 1.5 models, SDXL ControlNets for SDXL, etc.). Also, some ControlNet-related custom nodes (like the preprocessor packs) need to be installed separately as ComfyUI extensions; the official documentation (docs.comfy.org) and the community wiki provide installation guides. Once set up, using ControlNet in ComfyUI is typically a matter of a few extra nodes, as described.

Advanced Tips & Customizations

By now, you can see that ComfyUI essentially gives you a Lego toolbox for AI image generation. You can create very advanced pipelines by inserting additional nodes:

· LoRA and Embeddings: ComfyUI supports LoRA (Low-Rank Adaptation) models and textual inversion embeddings for customizing style or characters. A Load LoRA node takes the MODEL and CLIP outputs of Load Checkpoint and returns modified versions, with separate strength_model and strength_clip settings; you then wire its MODEL output to the KSampler and its CLIP output to your CLIP Text Encode nodes. For example, to apply a LoRA for “anime style,” you’d place a Load LoRA node between the checkpoint loader and the rest of the graph. Textual inversion embeddings (learned .pt, .bin, or .safetensors files for specific concepts) go in ComfyUI’s models/embeddings folder and are referenced in the prompt as embedding:filename. This way, you can use the same custom concepts you use in other UIs.

· Hi-Res Fix and Upscaling: You might have heard of “highres fix” in Automatic1111: generating a smaller image, then upscaling it and refining details with a second pass. In ComfyUI, you can build a similar two-stage workflow: the first KSampler generates a low-res latent, an Upscale Latent node enlarges it, and a second KSampler with denoise < 1 adds detail at the larger size. The ComfyUI community often chains KSampler nodes this way. Alternatively, after VAE Decode you can upscale in pixel space with an ESRGAN-style model via the Load Upscale Model and Upscale Image (using Model) nodes. The modularity allows using different upscaling models or even tile-based super-resolution.

· Custom Nodes & Automation: ComfyUI is extensible. Many user-created custom nodes exist for tasks like applying filters, doing face restoration, integrating with other AI models, etc. You can install these to expand your node library. For example, there are nodes for Green Screen (isolating subjects), for integrating BLIP (image captioning) if you want an automated prompt from an image, for making GIFs from a series of images, etc. The ComfyUI Wiki and community forums showcase a lot of these additions, turning ComfyUI into a general multimodal AI playground (not just Stable Diffusion).

· Batch Workflows and Condition Mixing: The node system even allows things like multiple prompts or conditioning mixing. You could encode two different prompts and use a Conditioning (Average) node to interpolate between them, feeding the result into the KSampler to achieve prompt morphing. Or run parallel diffusion branches and combine images. These are advanced uses, but they demonstrate that once you grasp the basics, you can experiment far beyond the standard capabilities of a simple UI.

· Troubleshooting: If a workflow isn’t working, ComfyUI highlights the failing node and shows an error message. Common issues include a missing model file, mismatched components (e.g. an SDXL ControlNet used with an SD1.5 checkpoint, which typically fails with a tensor shape error; a LoRA for the wrong base model may instead load with little or no effect), and out-of-memory errors when resolution or batch size is too high. Resolution itself rarely causes an error, but latents far from the model’s training size (about 512×512 for SD1.x, 1024×1024 for SDXL) tend to produce duplicated subjects or odd compositions. Use Preview Image nodes to inspect intermediate images (for example, the ControlNet preprocessed image), which helps with debugging. The batch count in the queue controls makes it easy to queue multiple runs, but if you leave auto-queue (“Queue (Instant)”) running, remember to stop it to avoid endless generation.
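The LoRA and hi-res fix tips combine naturally into one graph. This sketch assumes ComfyUI’s API format and the internal names LoraLoader (Load LoRA) and LatentUpscale (Upscale Latent) with the inputs shown, as I understand the built-in nodes; hires_lora_workflow and the LoRA file name you pass in are hypothetical. The second KSampler reuses the LoRA-patched model and runs with denoise below 1 so it refines rather than replaces the first pass.

```python
def hires_lora_workflow(positive, negative, lora_name,
                        ckpt_name="v1-5-pruned-emaonly.safetensors",
                        lora_strength=0.8, width=512, height=512,
                        scale=1.5, second_denoise=0.5,
                        seed=0, steps=20, cfg=7.0):
    """Checkpoint -> LoRA -> low-res pass -> Upscale Latent -> refine pass."""
    up_w = int(width * scale) // 8 * 8   # keep sizes on the 8-pixel latent grid
    up_h = int(height * scale) // 8 * 8

    def sampler(latent, denoise):
        return {"class_type": "KSampler",
                "inputs": {"seed": seed, "steps": steps, "cfg": cfg,
                           "sampler_name": "euler", "scheduler": "normal",
                           "denoise": denoise, "model": ["10", 0],
                           "positive": ["6", 0], "negative": ["7", 0],
                           "latent_image": latent}}

    return {
        "4": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        # LoRA patches both the diffusion model and the text encoder
        "10": {"class_type": "LoraLoader",
               "inputs": {"model": ["4", 0], "clip": ["4", 1],
                          "lora_name": lora_name,
                          "strength_model": lora_strength,
                          "strength_clip": lora_strength}},
        "6": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["10", 1]}},
        "7": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["10", 1]}},
        "5": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "3": sampler(["5", 0], 1.0),               # first pass at base size
        "11": {"class_type": "LatentUpscale",
               "inputs": {"samples": ["3", 0],
                          "upscale_method": "nearest-exact",
                          "width": up_w, "height": up_h, "crop": "disabled"}},
        "12": sampler(["11", 0], second_denoise),  # refine at the larger size
        "8": {"class_type": "VAEDecode",
              "inputs": {"samples": ["12", 0], "vae": ["4", 2]}},
        "9": {"class_type": "SaveImage",
              "inputs": {"filename_prefix": "hires", "images": ["8", 0]}},
    }
```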
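For the custom-node route, a node is a Python class that declares its inputs, outputs, and the method to call, exported through NODE_CLASS_MAPPINGS from a package inside ComfyUI’s custom_nodes folder. The class below is a minimal hypothetical example (InvertImageExample is not an existing node); it relies on ComfyUI passing IMAGE values as tensors with values in 0..1, and the arithmetic also works on plain floats, which makes it easy to test outside ComfyUI.

```python
class InvertImageExample:
    """Hypothetical custom node: blends an IMAGE toward its inverse."""

    @classmethod
    def INPUT_TYPES(cls):
        # name -> (type,) or (type, {widget options}) for each input socket/widget
        return {"required": {
            "image": ("IMAGE",),
            "amount": ("FLOAT", {"default": 1.0, "min": 0.0,
                                 "max": 1.0, "step": 0.05}),
        }}

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "invert"          # name of the method ComfyUI calls
    CATEGORY = "image/examples"  # where the node appears in the add-node menu

    def invert(self, image, amount=1.0):
        # amount = 1.0 gives the full inverse; 0.0 returns the input unchanged
        inverted = 1.0 - image
        return (image * (1.0 - amount) + inverted * amount,)


# ComfyUI discovers nodes through these module-level dictionaries
NODE_CLASS_MAPPINGS = {"InvertImageExample": InvertImageExample}
NODE_DISPLAY_NAME_MAPPINGS = {"InvertImageExample": "Invert Image (example)"}
```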

Finally, don’t be afraid to consult resources and examples. ComfyUI has an official documentation site and a vibrant community. Many users share workflow files for specific effects (for example, complex ControlNet combos or style workflows) which you can load and study. Below are some useful links:

· Official ComfyUI Documentation: The official docs (docs.comfy.org) cover installation, basic and advanced tutorials, and a node reference. Start with the Getting Started guide and the Tutorials section, which include step-by-step examples for Text-to-Image, Image-to-Image, Inpainting, ControlNet, and more, as referenced throughout this tutorial.

· ComfyUI GitHub & Releases: Check out the ComfyUI GitHub repository for the latest updates, issues, and community discussions. The README also has installation instructions and basic usage info.

· Community Wiki (ComfyUI Wiki): A community-driven knowledge base is available, providing in-depth articles – for example, a detailed ControlNet usage guide and advanced tutorials on new features. This Wiki often has more up-to-date tips and covers custom node usage. 

· RunComfy & Blogs: The RunComfy site has a number of user-friendly guides (e.g. “Mastering ComfyUI ControlNet”) and even a cloud-based ComfyUI you can try in your browser. There are also blog posts on Medium (like PromptingPixels on ComfyUI Inpainting) and others that walk through specific workflows.

· Community Forums and Discord: The r/ComfyUI subreddit is a great place to see shared workflows, ask questions, and learn tricks from other users. There’s also an official ComfyUI Discord where you can get real-time help, share results, and stay updated on new node developments.

With this tutorial and the above resources, you should be well-equipped to explore ComfyUI and unlock the full potential of Stable Diffusion. Happy image generating!