Using Dreambooth fine-tuning to create isometric views
- Stable Diffusion v1.5
- Dreambooth (JoePenna Repo)
- Automatic1111 WebUI
- Google Earth
- Google Earth Focal Lengths
Early on I was researching methods for generating custom isometric views of cityscapes using Stable Diffusion. One idea was to fine-tune a custom model on aerial photography, though it was tricky to find decent source images. I stumbled on a tutorial for capturing perfect isometric views from Google Earth using some free fixed focal length camera presets.
I used the technique and lens presets mentioned in the video to capture shots of dense Asian cities with a variety of urban architectural features, representing the aspects of modern cityscapes that I wanted the model to reproduce. The images were cropped to 512x512 and curated into a dataset of 8 images.
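For reference, the crop-and-resize step is simple to script. Here is a minimal sketch using Pillow; the folder names are hypothetical placeholders for wherever the Google Earth captures live:

```python
from pathlib import Path
from PIL import Image

SRC = Path("captures")   # raw Google Earth screenshots (hypothetical folder)
DST = Path("dataset")    # 512x512 training images
DST.mkdir(exist_ok=True)

for path in sorted(SRC.glob("*.png")):
    img = Image.open(path).convert("RGB")
    # Centre-crop to a square, then downscale to the 512x512
    # resolution expected by Stable Diffusion v1.x.
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((512, 512), Image.LANCZOS)
    img.save(DST / path.name)
```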
The dataset was used with JoePenna's Dreambooth repo running locally on an RTX 3090 Ti, since it requires 24GB of VRAM to train. I chose Stable Diffusion v1.5 as the base model and trained the dataset as a style, following my usual approach of using the artstyle class and 1500 artstyle regularisation images, since I wanted the aesthetic composition to be learned rather than the subject content. Training ran for 3000 steps with the default learning rate of 1e-06.
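The training invocation looked roughly like the sketch below, here wrapped in a Python subprocess call. The flag names follow one 2022 snapshot of JoePenna's repo and have changed between versions, and the paths and token are placeholders, so treat this as illustrative rather than exact:

```python
import subprocess

# Sketch of a JoePenna Dreambooth-Stable-Diffusion training run.
# Flag names differ across versions of the repo; check the README
# of your checkout. Paths and the token are placeholders.
subprocess.run([
    "python", "main.py",
    "--base", "configs/stable-diffusion/v1-finetune_unfrozen.yaml",
    "-t",
    "--actual_resume", "sd-v1-5.ckpt",     # Stable Diffusion v1.5 base model
    "--data_root", "dataset",              # the 8 cropped training images
    "--reg_data_root", "regularization",   # the 1500 artstyle reg images
    "--class_word", "artstyle",
    "--token", "mytoken",                  # hypothetical rare-token name
    "--max_training_steps", "3000",
    "--no-test",
], check=True)
```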
The first outputs generated from the model using the prompt:
A photo of <token artstyle>
already proved the concept, showing a variety of random cityscape locations with the isometric view featuring some of the architectural features but without repeating anything from the dataset.
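I generated these through the Automatic1111 WebUI, but for a scriptable equivalent the trained checkpoint can be loaded with the diffusers library. A minimal sketch, assuming a recent diffusers version; the checkpoint filename and token are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the Dreambooth-trained checkpoint (filename is a placeholder).
pipe = StableDiffusionPipeline.from_single_file(
    "dreambooth-isometric.ckpt", torch_dtype=torch.float16
).to("cuda")

# "mytoken" stands in for the rare token chosen at training time.
image = pipe("A photo of mytoken artstyle", num_inference_steps=30).images[0]
image.save("isometric_city.png")
```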
The quality of the images isn't great and the model has trouble generating clean lines, particularly for roads and road markings (most likely due to the source images), but the concept is well captured. It was also surprisingly good at producing novel locations not featured in the dataset, like prompting for medieval castles:
Medieval castle in <token artstyle>
The style was also flexible enough (with some prompt weighting) to generate entirely novel scene content whilst still retaining the appearance of aerial isometric images. Here are some examples generating images of Mars base locations and Steampunk cityscapes.
((Space colony buildings on mars)) in <token artstyle>
((Steampunk city concept art)) in <token artstyle>
One unwanted element it had learned from the dataset was a slightly muted color palette present in all generated outputs. Though this can be fixed in post, for another training attempt I would edit the source images to correct it beforehand.
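That pre-processing could be as simple as a contrast and saturation boost over the source images before training. A minimal sketch with Pillow; the enhancement factors are guesses to be tuned by eye, and the folder names are placeholders:

```python
from pathlib import Path
from PIL import Image, ImageEnhance

OUT = Path("dataset_adjusted")
OUT.mkdir(exist_ok=True)

for path in Path("dataset").glob("*.png"):
    img = Image.open(path).convert("RGB")
    img = ImageEnhance.Color(img).enhance(1.3)      # saturation boost, tune by eye
    img = ImageEnhance.Contrast(img).enhance(1.15)  # mild contrast lift
    img.save(OUT / path.name)
```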
I was struggling to get painterly outputs from the model until I started playing with the Prompt Editing feature of the Automatic1111 WebUI. By starting the generation in the trained style for the first few steps and then switching to oil painting or similar art modifiers, I got a much wider variety of output styles which still retained the aerial isometric look.
futuristic city street, metropolis, dystopian, in [<token artstyle>:vibrant oil painting:0.25]
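The `[A:B:0.25]` syntax tells the WebUI to sample with prompt A for the first 25% of steps and then switch to prompt B. Outside the WebUI, a rough equivalent can be sketched with a diffusers step-end callback that swaps the prompt embeddings partway through sampling. This assumes a recent diffusers version, and the checkpoint, token, and prompts are placeholders:

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "dreambooth-isometric.ckpt", torch_dtype=torch.float16
).to("cuda")

def embed(text):
    # CLIP text embedding, padded to the model's max length.
    tokens = pipe.tokenizer(
        text, padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    with torch.no_grad():
        return pipe.text_encoder(tokens.input_ids.to(pipe.device))[0]

steps = 30
switch_at = int(steps * 0.25)  # swap prompts after 25% of steps
base = "futuristic city street, metropolis, dystopian, in mytoken artstyle"
target = "futuristic city street, metropolis, dystopian, vibrant oil painting"

# With classifier-free guidance the denoising loop runs on
# [negative, positive] embeddings concatenated along the batch dim.
swapped = torch.cat([embed(""), embed(target)])

def swap(pipe, step, timestep, kwargs):
    if step >= switch_at:
        kwargs["prompt_embeds"] = swapped.to(kwargs["prompt_embeds"].dtype)
    return kwargs

image = pipe(
    base, num_inference_steps=steps,
    callback_on_step_end=swap,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]
image.save("prompt_edited.png")
```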
I am quite happy with the results, which matched the training goal, and they showcase the power of fine-tuning custom models. I definitely feel there are areas for improvement in future iterations:
- Higher quality dataset images
- Pre-processing images to increase contrast and saturation
- More variety of dataset examples (other than cities)