Artificial intelligence-based image generation software has improved dramatically, with tools such as Midjourney capable of creating photorealistic images from simple prompts.
There is, however, another key player that could blow everything else out of the water—and the water would be perfectly rendered with accurate reflections and dynamics.
Stable Diffusion XL 1.0 (SDXL 1.0) can generate images of richly textured, three-dimensional-looking objects from prompts, and the visual effect is quite stunning. A 1024x1024 image takes around 10–20 seconds to generate, depending on your system specification, and the software already has over 10 million users.
To understand why Stable Diffusion XL looks so good, it helps to understand where other products and visual effects have gone wrong, especially if you want to avoid the same problems in your own work. As experts in IT management for London's marketing companies, we decided to take a look...
What Is the Uncanny Valley?
The uncanny valley hypothesis proposes that a figure that closely, but not perfectly, resembles a human can elicit a feeling of revulsion or fear.
If it looks nothing like a human then everything is fine and there is no effect or negative feeling. For example, your dishwasher should not induce feelings of panic or fear.
On the opposite end of the spectrum are things that are so lifelike and similar to the human form that they are also free from the uncanny valley effect, for example, a photograph of another human being.
The uncanny valley is that middle ground where the robot or entity looks similar to a human but with something not quite right that gives an overall disturbing effect. The attention is immediately drawn to all the things that are wrong about the subject that make it inhuman, despite the amount of work that has gone into the other features that may be more or less anatomically correct.
What Has This Got To Do With Stable Diffusion?
This is relevant because Stable Diffusion does not produce the uncanny valley feeling in the way that other tools and visual effects can.
The application has such realism that it goes beyond the uncanny valley effect and out the other side into photorealistic textured human features.
The uncanny valley effect is only really experienced when there is a near miss. If something looks almost human but not quite, due to some subtle imperfection, the brain will quickly pick up on the small inaccuracy that makes it inhuman and evoke a feeling of repulsion or creepiness.
This means that artificially generated characters and robots can have the unintended effect of being eerie-looking to real humans when they were supposed to be cute, friendly-looking or relatable.
With Stable Diffusion, the effect is so life-like it goes beyond the parameters of the uncanny valley and this subconscious effect is not experienced.
What Normally Gives it Away?
When working in post-production or filmmaking, it is important to understand what gives the game away: the cues that tell your subconscious brain that a human likeness is not actually human, triggering a deep-seated, instinctive fear or sense of revulsion.
For example, if you are working on the latest Pixar film, you don’t want your cute and cuddly character to be the stuff of nightmares for the children who will watch it.
Certain aspects of the human form are intrinsically hard to replicate faithfully and these are often the features that make the character or human likeness stand out as being “wrong” in some way. You focus on the small imperfections and how these are not right.
The uncanny valley feeling is normally triggered by features such as faces and hands, and by the way characters move. If there is something off about the way they walk, it will stand out a mile and ruin the rest of the effect and all the work up to that point.
This is where SDXL 1.0 is particularly proficient, in illustrating human hands, faces and features with true photorealistic quality.
The word photorealistic is bandied around quite frequently, but with SDXL 1.0 it can be genuinely difficult to tell whether or not you are looking at a photograph.
With a range of options and features available, it is possible to create stunning artistic effects with lighting and styles applied to the image.
One of the example images shows a dog in sharp focus running on a beach, with a woman laughing and running behind it, slightly out of focus, and the sun setting in the background. The dog is frozen in mid-flight, as if the shot had been captured just as it hurtled towards the camera operator.
If it were not part of a video detailing SDXL effects, you would assume it had been lifted from the social media feed of a real person with a real dog.
SDXL 1.0 Effects and Features
Stable Diffusion XL boasts a range of art styles that can be applied to the generated images. These include Anime, Photographic, Digital Art, Comic Book, Fantasy Art, Analog Film, Neon Punk, Isometric, Low Poly, Origami, Line Art, Craft Clay, Cinematic, 3D Model and Pixel Art.
This list is by no means exhaustive, however, with some 106 different styles to choose from.
The latest iteration of the product works with simpler text prompts than previous versions, meaning that you can start using it straight away without any real training.
Stable Diffusion XL 1.0 can incorporate readable text into images, arrange objects spatially with a sense of depth, and render difficult concepts, fictional settings, hands and realistic body parts. It clearly caters to those in artistic and creative industries.
The possibilities are endless and you can literally type whatever you want into the system and it will generate a photorealistic version of anything you can imagine, whether it exists in real life or not.
Even if the prompt provided does not work for some reason and there is some kind of error, the results are like some amazing abstract digital artwork.
Stable Diffusion can be accessed remotely through sites such as HuggingFace and through their own main StableDiffusionXL website.
From here you can test the product out and enter prompts to generate the images without needing to download anything. However, you may need to wait in queues and the processing delay would not be suitable for commercial applications.
There is, however, another way to get full, instant access to the product, as detailed later, but the demonstration page on the main site and providers such as HuggingFace are a great way to try the software and see how it works.
To give an example of how simple it is to operate, without needing to log in or create an account, you can open the Stable Diffusion XL home page, scroll to the bottom and enter a prompt into the field.
In this example the prompt was simply “Future city” and the software generated this fantastic result.
Prompt: Future city
With only a slightly more complicated prompt, quite astounding effects can be achieved. For the next image, the prompt had a small alteration to affect the lighting by using the words “Future city at sunset”.
Anyone can produce these results: these prompts were entered without any prior training or knowledge whatsoever. As you can see, the effects are visually stunning, and with a further understanding of the product and the correct prompts to use, some amazing work could be created.
Prompt: Future city at sunset
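The two prompts above differ only by a short lighting phrase added to the same subject. As a minimal sketch of that idea, the Python helper below joins a subject with optional modifier phrases into a single prompt string. The function name build_prompt is hypothetical and is not part of Stable Diffusion XL or any of its tools; it only illustrates how small prompt variations can be produced consistently.

```python
def build_prompt(subject: str, *modifiers: str) -> str:
    """Join a subject with optional modifier phrases into one prompt string.

    Hypothetical helper for illustration only; it is not part of any SDXL tool.
    Empty or whitespace-only modifiers are ignored.
    """
    parts = [subject.strip()]
    parts.extend(m.strip() for m in modifiers if m.strip())
    return " ".join(parts)


if __name__ == "__main__":
    # The two prompts used in the article.
    print(build_prompt("Future city"))
    print(build_prompt("Future city", "at sunset"))
```

The resulting strings can then be pasted into the prompt field described earlier.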
SDXL 1.0 vs Midjourney
One of the key questions when it comes to AI image generation for creative industries is whether SDXL is better than Midjourney.
Like SDXL, Midjourney is capable of producing stunning visual effects and artistic images that look as though they have come from a professional photographer.
When comparing images side by side using the same prompts, it is clear that the interpretation of the prompts differs from one program to the next but there is no clear winner in terms of the image quality and realism.
For example, if you look at 10 or 20 sets of images where the same prompt has been used on Midjourney and then on SDXL, you would most likely find that you preferred some of the Midjourney images, whereas others looked better with SDXL. There is no obvious choice here.
However, SDXL 1.0 seems to offer more options in terms of lighting effects, filters and other visual components, and the system understands more about what you want to achieve without you needing to be trained in writing prompts in a certain way.
It should be noted that prompts work slightly differently on Midjourney and Stable Diffusion, so it is not an entirely fair test. Midjourney 5.2 is widely regarded by industry experts as one of the best and offers truly stunning visuals and image quality.
Stable Diffusion offers slightly more depth to the images, thanks to accurate and convincing shadows and multi-layer compositions, with objects in and out of focus as if captured on a real camera.
One of the key advantages of Stable Diffusion XL is that it is available for commercial use and open source, meaning that you can start using SDXL in your AI-enhanced creative work today and the code can be modified and adapted as needed.
System Requirements of SDXL 1.0
As AI image generation software becomes more advanced, the system requirements needed to process all that data increase too.
To run Stable Diffusion XL adequately, you need at least 32 GB of RAM and a decent graphics card with 12 GB of VRAM, such as the recommended MSI Gaming GeForce RTX 3060 12GB. With 12 GB of VRAM, it will take approximately 20 seconds to produce a 1024x1024 image.
Ideally, you want 24 GB of VRAM and a graphics card such as the ASUS TUF GeForce RTX 4090 24GB. This will give seamless operation, with images produced within seconds, which is important for commercial applications that make extensive use of the software.
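The thresholds above can be turned into a quick self-check. The Python sketch below classifies a machine using only the figures quoted in this article (32 GB of RAM and 12 GB of VRAM as the minimum, 24 GB of VRAM as the ideal). The function name classify_setup and the tier labels are hypothetical, chosen for illustration, and the result is a rough guide rather than a performance guarantee.

```python
# Thresholds taken from the figures quoted in this article.
MIN_RAM_GB = 32
MIN_VRAM_GB = 12
IDEAL_VRAM_GB = 24


def classify_setup(ram_gb: float, vram_gb: float) -> str:
    """Classify a machine against the article's SDXL 1.0 requirements.

    Hypothetical helper: the tier labels are illustrative only.
    """
    if ram_gb < MIN_RAM_GB or vram_gb < MIN_VRAM_GB:
        return "below minimum"
    if vram_gb >= IDEAL_VRAM_GB:
        return "ideal"
    return "minimum"


if __name__ == "__main__":
    # Example: 32 GB of RAM with a 12 GB card, as in the minimum setup above.
    print(classify_setup(ram_gb=32, vram_gb=12))
```

A machine that falls into the "below minimum" tier is the case covered in the next section.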
I Don’t Have £2k Lying Around for the Graphics Card Alone: Can I Still Run Stable Diffusion XL With My Dinky Computer?
Running Stable Diffusion XL without a GPU is not recommended. However, it is possible to use Stable Diffusion for your own AI artwork and image generation without owning any of the hardware detailed in the system requirements.
At Lyon, we provide reliable cloud solutions featuring hardware on demand, such as virtual graphics processing platforms or vGPUs. These services are tailored specifically to creative industries (digital artists, post-production houses and filmmakers) and allow them to enhance their creativity through the latest AI-enhanced image generation software without requiring any of the expensive infrastructure or equipment needed to run the programs.
Through a simple interface with all your work in one place and stored securely with unlimited storage, you can access high-end machines remotely and leverage their processing power through the cloud.
This means you can use Stable Diffusion XL, Midjourney, or any other processor-hungry AI application from your own devices without needing to upgrade, and performance can be scaled up or down as needed.
To find out how to acquire AI image generation tools such as SDXL 1.0 for your business, without needing the hardware to run it, contact our advisors at Lyon today and we can take you through the process and explain which setup would be most suitable for your creative industry.