AiTiles by Dodam
Introduction
Hello Falcon BMS community,
I’m very proud to present AiTiles, an AI upscaling of the default terrain tiles in Falcon BMS. These tiles have a resolution of 1024x1024, up from the original 512x512. Tiles cover 1x1 km in BMS, so the new tiles work out to ~1m / px, up from the original ~2m / px, or 4 times as many pixels.
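For the curious, the resolution figures are easy to sanity-check with a few lines of arithmetic (a 1 km tile spread over 1024 pixels, versus 512):

```python
# Sanity-check the tile resolution figures from the post.
tile_size_m = 1000               # BMS terrain tiles cover 1 x 1 km

old_px, new_px = 512, 1024
old_res = tile_size_m / old_px   # metres per pixel, original tiles
new_res = tile_size_m / new_px   # metres per pixel, AiTiles

print(f"original: {old_res:.2f} m/px, AiTiles: {new_res:.2f} m/px")
print(f"pixel count ratio: {new_px**2 / old_px**2:.0f}x")
```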
This is a WIP, and I will be fine-tuning the network in the coming weeks to see what I can tweak to improve the quality. This post will be updated as newer versions come out. The current revision is Rev. 3, released on 2021-05-25. Please provide feedback if you have any!
Description
The upscaled tiles were made using a GAN (generative adversarial network) trained on 3.6 terabytes of 1ft resolution natural colour images publicly available from USGS EROS (the United States Geological Survey’s Earth Resources Observation and Science Center), from the High Resolution Orthoimagery dataset.
In simple terms, I taught one neural network (the upscaler) to take 2m resolution aerial images and upscale them to 0.5m resolution while staying as faithful as possible to the original image, and another neural network (the detector) to figure out which images were originals and which were upscaled. Over time they learned from each other: the upscaler learned to turn 2m images into 0.5m images that are indistinguishable from photos taken at 0.5m resolution to begin with, filling in details as necessary. I am only releasing the 1m version (downsampled from the 0.5m version) because the difference in quality currently seems marginal. The architecture (SPUN-GAN) is my own creation, very loosely based on ESR-GAN, that allows for more context sensitivity, feature localisation, and edge preservation.
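For readers unfamiliar with the setup, the adversarial feedback loop can be sketched in miniature. The toy below is *not* SPUN-GAN (that code isn’t public) and all the names in it are made up; it’s a generic, hand-rolled GAN on 1-D numbers, stdlib only, just to show the mechanism: the generator tries to produce samples the discriminator accepts as real, while the discriminator tries to tell the two apart.

```python
import math
import random

random.seed(0)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def sample_real():
    # Stand-in for "real high-resolution data": numbers near 4.
    return random.gauss(4.0, 0.5)

# Generator: maps noise z to a sample, g(z) = w*z + b.
w, b = 0.1, 0.0
# Discriminator: probability a sample is real, d(x) = sigmoid(u*x + c).
u, c = 0.1, 0.0

lr = 0.01
for _ in range(3000):
    z = random.gauss(0.0, 1.0)
    fake = w * z + b
    x = sample_real()

    # --- discriminator update: push d(real) -> 1 and d(fake) -> 0 ---
    dr = sigmoid(u * x + c)
    df = sigmoid(u * fake + c)
    grad_real = dr - 1.0          # d(BCE loss)/d(logit) for a real sample
    grad_fake = df                # d(BCE loss)/d(logit) for a fake sample
    u -= lr * (grad_real * x + grad_fake * fake)
    c -= lr * (grad_real + grad_fake)

    # --- generator update: push d(fake) -> 1 (fool the detector) ---
    df = sigmoid(u * fake + c)
    g = (df - 1.0) * u            # chain rule through the discriminator
    w -= lr * g * z
    b -= lr * g

print(f"generator offset b after training: {b:.2f}")
```

Over many rounds the generator’s output drifts toward the real data, which is the same dynamic that pushes the upscaler toward producing plausible high-resolution detail.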
Screenshots
Here are a few close-ups of how the new tiles look. The left image is the original BMS terrain tile, the centre is the original tile with bilinear upsampling, and the right image is the new AiTiles tile.
(Note: These were taken at 0.5m resolution rather than 1m resolution.)
Download Link
Click (2.33 GB compressed, 9.09 GB uncompressed)
Installation
- Rename the folder ‘Falcon BMS\Data\TerrData\Korea\texture\texture_polak’ to whatever you want – this will be your backup
- Extract the downloaded archive
- Copy the folder ‘aitiles_rev3_texture_polak’ into ‘Falcon BMS\Data\TerrData\Korea\texture’
- Rename ‘aitiles_rev3_texture_polak’ to ‘texture_polak’
Q & A
Why AiTiles and not AITiles with a capital I?
The name is an homage to HiTiles / HiTilesAF - I think HiTiles was one of the first online purchases I made as a teenager.
How is this different from just bilinear upsampling?
In bilinear upsampling, all you’re doing is averaging 4 pixels with slightly different weights to get a higher resolution image – the image doesn’t contain any more information. Neural nets, on the other hand, can generate more information: effectively, the network “sees” something that looks like a house or a road at low resolution, and fills in the details based on the houses and roads it has seen at high resolution during training.
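To illustrate the "no new information" point, here’s a minimal, dependency-free sketch of 2x bilinear upsampling (the function name is mine, and this is a simplified version of what image libraries actually do): every new pixel is just a weighted average of existing neighbours, so no value outside the range of the original data can ever appear.

```python
def bilinear_upsample_2x(img):
    """Double the resolution of a 2-D grayscale image by inserting the
    average of each pair of neighbours. Every new pixel is a blend of
    existing pixels -- no new information is created."""
    h = len(img)
    # Upsample rows: insert the average of each vertical pair.
    rows = []
    for y in range(h):
        rows.append(img[y][:])
        if y + 1 < h:
            rows.append([(a + b) / 2 for a, b in zip(img[y], img[y + 1])])
    # Upsample columns the same way.
    out = []
    for row in rows:
        new = []
        for x in range(len(row)):
            new.append(row[x])
            if x + 1 < len(row):
                new.append((row[x] + row[x + 1]) / 2)
        out.append(new)
    return out

tile = [[0.0, 4.0],
        [8.0, 12.0]]
print(bilinear_upsample_2x(tile))
# -> [[0.0, 2.0, 4.0], [4.0, 6.0, 8.0], [8.0, 10.0, 12.0]]
```

Note that every output value sits between the original extremes (0 and 12) – the result is smoother, but contains nothing the input didn’t already have.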
Why do some images look “painted” or pixelated when close up?
Upscaling from limited data like the BMS terrain textures is a really hard problem on several fronts. Firstly, there’s very little data – at 2m resolution, a typical American home occupies about 6x6 pixels, and a car is 2x1 pixels. Secondly, there is huge variation in buildings, roads, and terrain, the details of which are lost at such a low resolution.

In addition, there are things in the BMS terrain textures (low buildings with blue slate roofs, massive high-rise condominium complexes) that simply do not exist in the training data – the network has never seen them and doesn’t know what to do with them (as far as it knows, a blue patch next to development is an outdoor swimming pool). Finally, not all BMS terrain textures look the same even amongst themselves: from what I can tell, some are actual photos, some are painted, some are painted on photos, and some are alpha blends of the other types – some even have block artifacts at 4m resolution, which I’m guessing were introduced when an artist made 512x512 textures from 256x256 textures in a hurry.

Together, these factors mean the network has to make its best guess about what the high resolution data looked like, out of any number of things it could have been, and it does so by locally guessing something that looks reasonable. If you look at the tiles that look particularly bad, you’ll often find they are tiles where the artist took half of a photo of a city and blended it in Photoshop into a painted image of a forest – there’s no way the network can learn to deal with that sort of artificial blending without additional handling (I might add that in a future version). It might look a little weird very close up, but I think overall it still looks better than the original.
Will this affect my frame rates?
Any impact will probably be marginal, but I cannot say for sure. I’m running three 4K monitors (11520x2160) on an RTX 3090, and I haven’t noticed any difference in performance.
I don’t notice any difference.
At really low altitudes, the effects from anisotropy (views are slanted, textures are flat) will still make the textures look questionable no matter the resolution. At high altitudes, you won’t notice too much of a difference – on a standard 4K monitor (3840x2160) with the default 60 deg FoV, looking straight down from 18,000 ft AGL, you’re looking at ~5.7 km of terrain over 3840 pixels, which is about 1.5m / px resolution. It’ll be hard to tell the difference between 1m / px and 2m / px at that sort of scale; you’ll notice a difference below 6,000 ft AGL on a 4K display. If you’re playing on a 1080p display, you’ll probably need to be below 4,000 ft AGL.
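The back-of-the-envelope above can be written out. This is a sketch assuming flat ground, a straight-down view, and a pinhole camera with a 60° horizontal FoV (the post’s ~5.7 km figure matches the simpler arc-length approximation altitude x FoV; the exact flat-ground formula below gives a slightly wider strip, but the conclusion is the same):

```python
import math

def ground_res_m_per_px(alt_ft, fov_deg, h_pixels):
    """Metres of terrain per screen pixel when looking straight down
    from alt_ft over flat ground, through a fov_deg horizontal FoV."""
    alt_m = alt_ft * 0.3048
    # Width of the visible ground strip for a pinhole camera.
    strip_m = 2 * alt_m * math.tan(math.radians(fov_deg) / 2)
    return strip_m / h_pixels

# Looking straight down from 18,000 ft on a 3840-px-wide 4K display:
res = ground_res_m_per_px(18_000, 60, 3840)
print(f"{res:.2f} m of terrain per screen pixel")  # ~1.6 m/px
```

At 6,000 ft the same formula gives ~0.55 m/px, finer than even the new 1 m/px tiles – consistent with the altitudes where the difference becomes visible.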
Can you do this for the terrain textures in other theatres?
Gladly, but with a caveat: the upscaler was trained on 2 terabytes of images of the U.S. east of the Appalachians; 1.5 terabytes of images of the Rockies and the Sierra Nevada, as well as the deserts of Nevada, Utah, Arizona, and California; 173 gigabytes of images of metropolitan LA, Indianapolis, Chicago, Philadelphia, and St. Louis; and 68 gigabytes of images of airports around the U.S. If your theatre looks different from all of these areas, the upscaler may not know what to do with your textures. Furthermore, the network hasn’t seen many really high mountains, because the images are aerial rather than satellite (which is why I was able to get them for free – had these been satellite images, they’d have cost on the order of 100 million dollars depending on the vendor). If you are a theatre developer, or someone who can give me permission to do so, let me know in the comments or PM me and I’ll upscale your tiles for you.
Why not use something off the shelf?
I figured it’d be more fun to build this myself. Also, this is something very specific (aerial images) with its own goals (the “visual quality metric” for photographs that other networks use is likely to be nearly useless for satellite imagery), and generic image enhancement nets didn’t seem to generalise super well. For example, DeepAI’s Super Resolution API definitely helps the textures look cleaner, but it doesn’t know what a house or a road looks like from the sky, resulting in something that looks a little strange. I think other textures would benefit greatly from off-the-shelf solutions, though!
Is the code available somewhere?
It will be at some point, but I might want to write a paper on the network architecture so I’m keeping it private for now.
Licence
Whatever licence the BMS team has on their terrain textures applies, but other than that, use the textures however you want – the USGS data has no restrictions, and this work is in the public domain. Credit would be nice if you end up including my work in a theatre, but is not necessary. I’d be incredibly happy if the BMS team ended up including these textures in the default installation.
Acknowledgements
Jonathan Zung, Princeton University Math Department, for being a great friend and helping me come up with a reasonable network architecture.
Zetta AI, whose open-source code policy let me adapt code I originally wrote for electron microscopy of brains to this project.
Malc, for answering my questions about the terrain textures in BMS.
Data available from the U.S. Geological Survey.