Creating Isometric RPG Game Backgrounds

Written by

Hackmans

Date published

February 24, 2023

Using Stable Diffusion techniques to create 2D game environments

Tools and Techniques Used

Stable Diffusion v1.5
Automatic1111 WebUI
Alpaca Photoshop Plugin
Boosting Monocular Depth
Substance Designer
Unity URP
Amplify Shader Editor

Whilst exploring ways of generating isometric backgrounds, I came across a post by Ivan Garcia Filho showing game assets created with a Stable Diffusion-based platform, using a detailed prompt and a high step count.

An intricate modular top-down isometric concept art with PBR materials of a victorian gothic ornated steampunk lamp, in ominous hellish industrial mood and a neat and clean composition with sharp precisely stabilized straight lines, colorful tone mapped cinematic volumetric lighting and global illumination producing shinning edge reflections and detailed ambient occlusion with smooth cold shadows and hot highlights increasing depth and perspective

AI Generated - Isometric game Assets, Ivan Garcia Filho

Another experiment trying to generate isometric game assets for indie games! Using PixelmindAI powered by Stable Diffusion! The Prompt: "An intricate modular top-down isometric concept art with PBR materials of a victorian gothic ornated steampunk lamp, in ominous hellish industrial mood and a neat and clean composition with sharp precisely stabilized straight lines, colorful tone mapped cinematic volumetric lighting and global illumination producing shinning edge reflections and detailed ambient occlusion with smooth cold shadows and hot highlights increasing depth and perspective" Parameters: scale: 16, steps: 250, Sampler : kdmp2ancestral

www.artstation.com

runwayml/stable-diffusion-v1-5 · Hugging Face

huggingface.co

I started testing some prompts using the same structure but changing the content and the style modifiers to see what sort of futuristic / cyberpunk elements it could generate and get a feel for how the prompt was working.

Early results weren’t great, but that was down to using a low step count. I didn’t think a high step count would be necessary, but it makes a huge difference here. Bumping the steps up into the 100s and raising CFG to 15-30 snapped the output into more interesting results.

I really liked the style of the open-style building section, so I carried on iterating through ranges of steps and CFG with the X/Y Plot script in Automatic1111 WebUI, using the same prompt.

GitHub - AUTOMATIC1111/stable-diffusion-webui: Stable Diffusion web UI

github.com

An intricate modular top-down isometric concept art with PBR materials of a cyberpunk building, in ominous hellish industrial mood and a neat and clean composition with sharp precisely stabilized straight lines, colorful tone mapped cinematic volumetric lighting and global illumination producing shinning edge reflections and detailed ambient occlusion with smooth cold shadows and hot highlights increasing depth and perspective

The almost infinite nature of Stable Diffusion generations can make it difficult to settle on a particular output. Early on I used to experience a lot of FOMO, feeling like I had missed the perfect seed or setting, but doing X/Y plots and being brutal about curation has helped me reach a desired result more quickly over time. I chose CFG 16, iterated through step counts, and hit a great result at 100 steps.
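A sweep like this can also be scripted instead of clicked through in the X/Y Plot script. The sketch below only builds the parameter grid and does not call the WebUI. The field names (`prompt`, `steps`, `cfg_scale`, `seed`) are assumed to match the WebUI's optional `/sdapi/v1/txt2img` JSON API, and the seed, value ranges and shortened prompt are illustrative assumptions, not the settings used for the images here.

```python
from itertools import product

# Shortened stand-in for the full prompt above (assumption for brevity).
PROMPT = "An intricate modular top-down isometric concept art with PBR materials of a cyberpunk building"


def build_sweep(prompt, steps_values, cfg_values, seed):
    """Return one payload per (steps, CFG) pair, all sharing the same seed."""
    return [
        {"prompt": prompt, "steps": steps, "cfg_scale": cfg, "seed": seed}
        for steps, cfg in product(steps_values, cfg_values)
    ]


# Hypothetical ranges: steps around the 100s, CFG between 15 and 30.
sweep = build_sweep(
    PROMPT,
    steps_values=[50, 100, 150, 200],
    cfg_values=[15, 16, 20, 30],
    seed=1234,
)
for payload in sweep[:4]:
    print(payload["steps"], payload["cfg_scale"])
```

Keeping the seed fixed across the grid means differences between cells come from steps and CFG only, which makes curation easier.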

I mainly work in Photoshop for editing and cleaning up generated images, and I have been beta testing the Alpaca Stable Diffusion plugin, which allows me to continue working in a familiar environment whilst accessing features like inpainting and img2img.

Alpaca - Humans 🤝 AI Models for Image generation

www.getalpaca.io

I placed the building image onto a larger canvas and used outpainting to extend the rest of the building and some more of the walkway using the same prompt.
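Outpainting tools generally need two things: where the existing image sits on the larger canvas, and a mask marking the empty area to generate. The sketch below computes both in plain Python. The function name, the tiny sizes and the mask convention (255 = generate, 0 = keep) are assumptions for illustration; the Alpaca plugin handles this step inside Photoshop.

```python
def outpaint_layout(src_size, canvas_size, anchor=(0.5, 0.5)):
    """Return the paste box for the source image and a mask of the area to fill.

    anchor is the fractional position of the source on the canvas;
    (0.5, 0.5) centres it.
    """
    src_w, src_h = src_size
    canvas_w, canvas_h = canvas_size
    if src_w > canvas_w or src_h > canvas_h:
        raise ValueError("canvas must be at least as large as the source image")
    left = int((canvas_w - src_w) * anchor[0])
    top = int((canvas_h - src_h) * anchor[1])
    box = (left, top, left + src_w, top + src_h)
    # 0 where the original pixels are kept, 255 where new content is wanted.
    mask = [
        [0 if left <= x < left + src_w and top <= y < top + src_h else 255
         for x in range(canvas_w)]
        for y in range(canvas_h)
    ]
    return box, mask


# Tiny example so the mask can be printed: a 4x4 image on an 8x6 canvas.
box, mask = outpaint_layout((4, 4), (8, 6))
print(box)
for row in mask:
    print("".join("#" if value else "." for value in row))
```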

I cleaned up the background and ran the final image through img2img at double resolution to get more details.

One of my ideas for using the backgrounds in the Unity game engine was to remove the lighting from the image and then add it back in using custom shaders. I achieved this by painting out the strong colors in Photoshop on a new layer set to the Color blending mode, using neutral grey colors sampled from the original image.
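To see why painting grey in Color mode strips the colored lighting, it helps to look at the blend math. The sketch below follows the "color" blend mode as defined in the W3C Compositing and Blending specification (the result takes hue and saturation from the painted layer and luminosity from the base). That Photoshop behaves the same way is an assumption, and the pixel values are made up.

```python
def lum(c):
    """Luminosity weights from the W3C non-separable blend modes."""
    r, g, b = c
    return 0.3 * r + 0.59 * g + 0.11 * b


def clip_color(c):
    """Pull out-of-range channels back into [0, 1] while keeping luminosity."""
    l = lum(c)
    lo, hi = min(c), max(c)
    if lo < 0.0:
        c = tuple(l + (ch - l) * l / (l - lo) for ch in c)
    if hi > 1.0:
        c = tuple(l + (ch - l) * (1.0 - l) / (hi - l) for ch in c)
    return c


def set_lum(c, l):
    """Shift color c so that its luminosity becomes l."""
    d = l - lum(c)
    return clip_color(tuple(ch + d for ch in c))


def blend_color(base, paint):
    """'Color' blend: hue and saturation of paint, luminosity of base."""
    return set_lum(paint, lum(base))


neon_pink = (1.0, 0.2, 0.6)      # hypothetical strongly lit pixel
neutral_grey = (0.5, 0.5, 0.5)   # grey painted on the Color layer
print(blend_color(neon_pink, neutral_grey))
```

With a neutral grey as the painted color, every channel ends up equal to the base luminosity, so the shading survives while the colored light is removed.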

To get my own lighting I needed to create a normal map for the 2D scene. Rather than hand-painting it (which is an option), I tried automating the process by using MiDaS and LeReS in Boosting Monocular Depth to generate depth maps of the image.

GitHub - compphoto/BoostingMonocularDepth

github.com

I brought the MiDaS depth map and the (inverted) LeReS depth map into Substance Designer, used the Height to Normal World Units node to generate a normal map from each, and then combined the two with the Normal Blend node, since individually they roughly represent large-scale and small-scale detail.
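For readers without Substance Designer, the same idea can be sketched in plain Python: derive a normal from the slope of the height map at each pixel, then merge two normal maps. This is not the Substance nodes' implementation. The finite-difference conversion, the `strength` parameter, the sign convention for the Y axis and the use of "whiteout" blending in place of the Normal Blend node are all assumptions chosen for illustration.

```python
import math


def height_to_normals(height, strength=1.0):
    """Convert a 2D list of heights into unit normals via finite differences."""
    rows, cols = len(height), len(height[0])
    normals = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # Clamp neighbours at the borders and divide by the actual span.
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            dx = (height[y][x1] - height[y][x0]) / max(x1 - x0, 1)
            dy = (height[y1][x] - height[y0][x]) / max(y1 - y0, 1)
            nx, ny, nz = -dx * strength, -dy * strength, 1.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / length, ny / length, nz / length))
        normals.append(row)
    return normals


def blend_whiteout(a, b):
    """Combine two unit normals: add the XY slopes, multiply Z, renormalize."""
    nx, ny, nz = a[0] + b[0], a[1] + b[1], a[2] * b[2]
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / length, ny / length, nz / length)


ramp = [[0.0, 1.0, 2.0]] * 3          # height rising to the right
coarse = height_to_normals(ramp, strength=0.5)
fine = height_to_normals(ramp, strength=0.1)
print(blend_whiteout(coarse[1][1], fine[1][1]))
```

A real depth map would first be normalized to a sensible height range, and an inverted map is simply `1.0 - value` per pixel before conversion.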

3D design software - Adobe Substance 3D Designer

Create parametric 3D assets, models, materials, patterns, & lighting with complete control using industry-standard Adobe Substance 3D Designer software.

www.adobe.com

The resulting normal map is far from perfect but sufficient for my testing. I masked out the background in Photoshop and filled it with the base normal vector value, RGB(128, 128, 255) or #8080FF.
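That fill color is just the straight-up normal (0, 0, 1) packed into 8-bit channels. A minimal sketch of the usual encoding, where each component is remapped from [-1, 1] to [0, 255]:

```python
def encode_normal(normal):
    """Map each component from [-1, 1] to an 8-bit channel, rounding half up."""
    return tuple(int((c * 0.5 + 0.5) * 255 + 0.5) for c in normal)


flat = encode_normal((0.0, 0.0, 1.0))
print(flat, "#%02X%02X%02X" % flat)  # (128, 128, 255) #8080FF
```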

Here is how the scene looks in Unity using the delit image as the base color and the normal map for the background material on a 3D plane, and two colored point lights in 3D space. The lighting wraps around surfaces in an almost convincing way in some places and makes a crude illusion of scene lighting.
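The reason a flat image can appear lit is that each pixel's normal decides how much of each light it receives. The sketch below shows that per-pixel calculation with simple Lambert diffuse shading and inverse-square falloff. It is not Unity URP's lighting model, and the light dictionary layout and all values are hypothetical.

```python
import math


def shade_pixel(albedo, normal, position, lights):
    """Sum Lambert diffuse contributions from point lights for one pixel."""
    result = [0.0, 0.0, 0.0]
    for light in lights:
        to_light = [l - p for l, p in zip(light["position"], position)]
        dist = math.sqrt(sum(c * c for c in to_light))
        if dist == 0.0:
            continue  # light sits exactly on the pixel; skip to avoid dividing by zero
        # Cosine of the angle between the normal and the light direction.
        n_dot_l = max(0.0, sum(n * c / dist for n, c in zip(normal, to_light)))
        falloff = light["intensity"] / (dist * dist)
        for i in range(3):
            result[i] += albedo[i] * light["color"][i] * n_dot_l * falloff
    return tuple(result)


lights = [
    {"position": (0.0, 0.0, 1.0), "color": (1.0, 0.0, 0.0), "intensity": 2.0},
    {"position": (0.0, 0.0, -1.0), "color": (0.0, 0.0, 1.0), "intensity": 2.0},
]
# Delit base color, flat normal facing the camera, pixel at the origin.
print(shade_pixel((0.5, 0.5, 0.5), (0.0, 0.0, 1.0), (0.0, 0.0, 0.0), lights))
```

The second light sits behind the surface, so it contributes nothing. Errors in the normal map show up directly as light wrapping onto surfaces it should not reach.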

Download

Download Unity now and get started with the world’s most popular development platform for creating 2D and 3D multiplatform experiences and games.

unity.com

I tried a few different techniques using the depth and normal maps in a custom Unity URP shader I created in Amplify Shader Editor. I used the depth map to try Parallax Occlusion Mapping, hoping to add some subtle fake 3D perspective to the camera movement, but it looked pretty bad since the depth is incorrect for the isometric view.
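For context, Parallax Occlusion Mapping shifts each texture lookup along the view direction until the ray dips below the depth map, so any error in the depth translates directly into a wrong shift. The sketch below is a generic textbook-style version in Python, not the Amplify shader used here. It assumes a tangent-space view direction with positive Z, depth values in [0, 1] (0 = surface, 1 = deepest), and a hypothetical `sample_depth` callback.

```python
def parallax_occlusion_uv(uv, view_dir, sample_depth, scale=0.1, layers=16):
    """March along the view ray through depth layers and return the shifted UV."""
    vx, vy, vz = view_dir

    def depth_at(p):
        return min(max(sample_depth(p), 0.0), 1.0)

    layer_depth = 1.0 / layers
    delta = (vx / vz * scale / layers, vy / vz * scale / layers)
    cur_uv, cur_layer = uv, 0.0
    cur_depth = depth_at(cur_uv)
    # Step until the ray is at or below the sampled surface.
    while cur_layer < cur_depth:
        cur_uv = (cur_uv[0] - delta[0], cur_uv[1] - delta[1])
        cur_depth = depth_at(cur_uv)
        cur_layer += layer_depth
    # Interpolate between the last sample above and the first sample below.
    prev_uv = (cur_uv[0] + delta[0], cur_uv[1] + delta[1])
    after = cur_depth - cur_layer
    before = depth_at(prev_uv) - cur_layer + layer_depth
    weight = after / (after - before)
    return (
        prev_uv[0] * weight + cur_uv[0] * (1.0 - weight),
        prev_uv[1] * weight + cur_uv[1] * (1.0 - weight),
    )


def flat_depth(_uv):
    return 0.5  # constant depth everywhere


print(parallax_occlusion_uv((0.5, 0.5), (0.0, 0.0, 1.0), flat_depth))
print(parallax_occlusion_uv((0.5, 0.5), (0.6, 0.0, 0.8), flat_depth))
```

Looking straight down leaves the UV untouched, while an angled view shifts it in proportion to depth. A monocular depth estimate of an isometric image measures distance from the painted camera rather than height above the ground plane, which is one plausible reason the effect breaks down.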

Amplify Shader Editor

Get the Amplify Shader Editor package from Amplify Creations and speed up your game development process. Find this & other Visual Scripting options on the Unity Asset Store.

assetstore.unity.com

I created an implementation of Normal Mapping Shadows, which enabled the 2D scene to cast shadows from the sun directional light. Though the effect adds some interesting visual qualities to the scene, it is not great for representing actual lighting. It could potentially be used as a custom lighting pass to shade 3D characters.

For the main alleyway environment featured in the video at the top of the page, I followed the same method of iterating through X/Y plots of an alleyway prompt until I had a starting point I was happy with.

I mirrored the image and placed it into a 2048x2048 canvas in Photoshop, then used the Alpaca plugin to outpaint the rest of the environment using the same prompt, slightly altering the wording (from alleyway to building or street, for example) to guide the content being generated. For each new section I picked my favourite option from 5 possible generations.

Upscaling is often a tricky process, which I like to handle in stages. I divided this 2K image into quarters and ran them through img2img at double resolution with the same initial prompt. Then, because of inconsistencies between the quarters, I repeated the same process for the overlapping seam areas and the center, and composited everything together in Photoshop, using masks to blend each area and produce a seamless final 4K image.
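The tile layout for this kind of staged upscale can be worked out up front. The sketch below returns crop boxes for the four quarters, four tiles straddling the seams between them, and one center tile, all the same size. The exact seam placement is an assumption (the tiles used here were chosen by hand in Photoshop), and the function names are hypothetical.

```python
def staged_upscale_tiles(size):
    """Return named (left, top, right, bottom) crop boxes for a square image."""
    half, quarter = size // 2, size // 4
    offsets = {
        "top_left": (0, 0),
        "top_right": (half, 0),
        "bottom_left": (0, half),
        "bottom_right": (half, half),
        # Seam tiles overlap two neighbouring quarters each.
        "seam_top": (quarter, 0),
        "seam_bottom": (quarter, half),
        "seam_left": (0, quarter),
        "seam_right": (half, quarter),
        "center": (quarter, quarter),
    }
    return {name: (x, y, x + half, y + half) for name, (x, y) in offsets.items()}


def scale_box(box, factor=2):
    """Where a tile lands on the upscaled canvas."""
    return tuple(value * factor for value in box)


tiles = staged_upscale_tiles(2048)
for name, box in tiles.items():
    print(name, box, "->", scale_box(box))
```

Each 1024x1024 tile becomes 2048x2048 after img2img at double resolution, and `scale_box` gives its position on the 4096x4096 canvas, where the seam and center tiles are masked in over the quarters.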

Process Review

I am quite happy with the look of the backgrounds, though I would definitely like more control over the scene content which could be enabled with future techniques. Some of the testing didn’t result in usable content and there are areas for improvement:

Explore new methods of guiding the scene content
Find ways to increase style consistency of different types of locations
Develop better shader techniques for creating pseudo 3D effects from 2D backgrounds
Train a custom model for generating Normal Maps from backgrounds