
I've been playing with Stable Diffusion a lot the past few days on a Dell R620 (CPU only: 24 cores, 96 GB of RAM). With a little fiddling (not knowing any Python or anything about machine learning) I was able to get img2img.py working by simply comparing that script to the txt2img.py CPU patch. It was only a few lines of tweaking. img2img takes ~2 minutes to generate an image with 1 sample and 50 steps; txt2img takes about 10 minutes for 1 sample and 50 steps.
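
For reference, the CPU patches I've seen basically stop hard-coding the CUDA device and the CUDA autocast context. Something like this pattern, as a rough sketch rather than the exact patch (the helper name is made up):

    import torch
    from contextlib import nullcontext

    def pick_device_and_precision():
        """Pick a device and a matching precision context so the same
        sampling code runs on either a GPU or a CPU-only machine."""
        if torch.cuda.is_available():
            return torch.device("cuda"), torch.autocast("cuda")
        # CPU path: full precision, no autocast (the autocast("cuda")
        # context is usually the first thing to crash without a GPU)
        return torch.device("cpu"), nullcontext()

    device, precision_scope = pick_device_and_precision()
    # then move the model with model.to(device) and wrap the sampling
    # loop in `with precision_scope:` instead of autocast("cuda")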

The real bummer is that I can only get ddim and plms to run on a CPU. All of the other samplers crash and burn. ddim and plms don't seem to do a great job of converging for hyper-realistic scenes involving humans. In explorations people have posted online, I've seen other samplers "shape up" after 10 or so steps, where increasing the step count just gives you a higher-fidelity and/or more realistic image. With ddim/plms on a CPU, every step seems to give me a wildly different image. You wouldn't know that step 10 and step 15 came from the same seed/sample, they change so much.

I'm not sure if this is just because I'm running it on a CPU or if ddim and plms are just inferior to the other samplers, but I've mostly given up on generating anything worthwhile until I can get my hands on an Nvidia GPU and experiment with faster turnarounds.



> You wouldn't know that step 10 and step 15 came from the same seed/sample, they change so much.

I don't think this is CPU-specific; it happens at these very low step counts even on a GPU. Most guides recommend starting with 45 steps as a useful minimum for quickly trialing prompt and setting changes, and then increasing that number once you've found values you like for your prompt and other parameters.

I've also noticed that another big change sometimes happens between 70 and 90 steps. It doesn't happen every time and it doesn't drastically change your image, but orientations may get rotated, colors may shift, and the background may change completely.

> img2img takes ~2 minutes to generate an image with 1 sample and 50 steps

If you check the console logs, you'll notice img2img doesn't actually run the full number of steps you request; it runs the number of steps multiplied by the denoising strength factor. So with a denoising strength of 0.5 and 50 steps, you're actually running 25 steps.
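
In other words, something like this (a rough sketch of the relationship rather than the script's actual code; the function name is made up):

    def effective_img2img_steps(ddim_steps: int, strength: float) -> int:
        """img2img noises the base image part-way up the schedule, by an
        amount set by the denoising strength, then denoises it back down,
        so only that fraction of the requested steps actually runs."""
        return int(strength * ddim_steps)

    print(effective_img2img_steps(50, 0.5))   # 25
    print(effective_img2img_steps(50, 0.75))  # 37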

Later edit: Oh, and if you do end up liking an image at step 10 or so, but iterating further completely changes it, one thing you can do is save your output at 10 steps and use that as the base image for the img2img script to do further work.


https://github.com/Birch-san/stable-diffusion has altered txt2img to support img2img and added other samplers, see:

https://github.com/Birch-san/stable-diffusion/blob/birch-mps...

That branch (birch-mps-waifu) runs on M1 Macs no problem.


With the 1.4 checkpoint, basically everything under 40 steps is unusable, and you only get good fidelity above 75 steps. I usually use 100; that's a good middle ground.


How do you change these steps in the given script? Is it the --ddim_steps parameter? Or --n_iter? Or ... ?


With --ddim_steps


I found I got quite decent results with 15-30 steps when generating children’s book illustrations (of course, no expectation for hyperrealism there)




