Lock granularity, bubbles, shaders, texture filtering

sthalik

TBH, I don’t know (I know bubble code mainly from reading some)… but assuming the main bubble function(s) took some changes (probably some rewrites) through the years, I don’t think there are serious bottlenecks, as if there were, they would have been detected by profiling and a fix would have been at least tried or searched for, and that isn’t the case, AFAIK.

Now I remember more about the post I’ve read - it said that more than 3 cores cause bottlenecks because of inefficient multithreading. That hyperthreading causes inefficiency, too, because 2-3 cores is optimal, else performance is actually lost.

As for multi-core renderer, as far as I understand it, the actual primitive-drawing is done only on one core. The matter is putting other expensive tasks on a pinned thread and balancing, such that they’re busy with actual useful work, not spinning on a synchronization primitive, performing calculations that are unnecessary, etc… At least I wouldn’t create multiple concurrent command streams for graphics. I have no experience in the area, but learning about it would be a breath of fresh air from the usual stuff

What I meant is that if parallelization and other optimizations are done, and the code is made more efficient because of this, the bubble can be made less restrictive. And yes, I’m staring at a ‘black box’ here, only knowing what you told me

-sh

fingon

Autogen can be non-deterministic, it depends on whether or not a random seed is stored along with each campaign/region/tile. If the original seed is stored, and a “deterministic random number generator” used, the same random sequence can be replayed.
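A minimal sketch of the seed-replay idea, in Python for brevity. The function name, the per-tile seed derivation, and the tree-placement output are all hypothetical; nothing here reflects how Falcon or X-Plane actually store seeds. It only shows that a stored seed plus a deterministic generator reproduces the same sequence on every client.

```python
import random

def autogen_tree_positions(campaign_seed, tile_x, tile_y, count=5):
    """Return `count` pseudo-random (u, v) positions inside one tile.

    The generator is seeded from the stored campaign seed plus the tile
    coordinates, so every client that knows the seed gets the same list.
    (Hypothetical scheme, for illustration only.)
    """
    # A string seed is hashed deterministically by random.Random,
    # independent of the process or machine.
    rng = random.Random(f"{campaign_seed}:{tile_x}:{tile_y}")
    return [(rng.random(), rng.random()) for _ in range(count)]

# Two "clients" replaying the same stored seed for the same tile.
client_a = autogen_tree_positions(42, 3, 7)
client_b = autogen_tree_positions(42, 3, 7)
print(client_a == client_b)
```

Only the seed has to be saved or sent over the network, not the generated objects themselves.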

sthalik

Autogen typically allows for changing density, this removes (or reduces) possibility for MP-shared clipping of objects. Same issue as with the community 3D cities project, clipping through objects, either in MP or always.

But yeah, autogen seems to be the way to go, given how much work it saves from theater creators. I’ve only used X-Plane for a couple hours total, but densely-populated cities look real awesome at sunset.

Fenrir

I know nothing, but couldn’t you do an autogen for the theater makers to place trees in “tree areas” that gets saved when the theater maker saves his/her work?

Cheers

sthalik

Yeah. Let’s bikeshed some whether to use Perlin or Voronoi or Gauss or what for tree spacing

I-Hawk

@sthalik:

Now I remember more about the post I’ve read - it said that more than 3 cores cause bottlenecks because of inefficient multithreading. That hyperthreading causes inefficiency, too, because 2-3 cores is optimal, else performance is actually lost.

As for multi-core renderer, as far as I understand it, the actual primitive-drawing is done only on one core. The matter is putting other expensive tasks on a pinned thread and balancing, such that they’re busy with actual useful work, not spinning on a synchronization primitive, performing calculations that are unnecessary, etc… At least I wouldn’t create multiple concurrent command streams for graphics. I have no experience in the area, but learning about it would be a breath of fresh air from the usual stuff

What I meant is that if parallelization and other optimizations are done, and the code is made more efficient because of this, the bubble can be made less restrictive. And yes, I’m staring at a ‘black box’ here, only knowing what you told me

-sh

Actually, regardless of rendering, from what I know, the main load in Falcon is in campaign missions with high number of units within the bubble. I can only guess that while every unit’s processing code isn’t such a heavy load (as 10 ACs and a column of tanks in TE will not cause serious load on average system), adding more and more units, eventually will cause a heavy load, and FPS drop will be noticed very well, even on very strong systems. If we could get those processes to run on multiple cores, that will be the optimal solution for all Falcon performance problems. Because in relatively light environment (Empty TEs and camp flying with not many units in the bubble), Falcon FPS is already pretty good, usually bound by GPU.

While parallel processing for GPU will sure help, the real bottleneck is CPU load in camp missions because of high number of units in the bubble.

MrIch

Theaters could mix autogen and non-autogen zones. X-Plane uses exclusion zones. A theater designer could create military targets as he wishes, and these areas would not be mixed with autogen. Furthermore, the landclass and street system of X-Plane, based on OSM data, is a big deal. Imagine having a converter tool which reads all military objects from existing theaters and combines them with the OSM and mesh data in order to create a second-gen theater. …

sthalik

@I-Hawk:

While parallel processing for GPU will sure help, the real bottleneck is CPU load in camp missions because of high number of units in the bubble.

AI can be optimized, including preprocessing path data for A* or whatever is used… So only ‘forks’ in the road data are treated as ‘open set’ data…
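A rough sketch of that preprocessing step, assuming the road data is an undirected weighted graph. The adjacency-dict layout and the function name are made up for illustration; the actual road data format and pathfinder in Falcon are not known here. Nodes with exactly two neighbours are just points along a road, so they are folded into a single edge between the surrounding forks and dead ends.

```python
def contract_road_graph(adj):
    """Collapse chains of degree-2 nodes into single weighted edges.

    adj: {node: {neighbour: length}} for an undirected road graph.
    Returns a graph of the same shape containing only "junction" nodes
    (forks and dead ends, i.e. nodes whose degree is not 2).
    """
    junctions = {n for n, nbrs in adj.items() if len(nbrs) != 2}
    contracted = {j: {} for j in junctions}
    for j in junctions:
        for first, length in adj[j].items():
            prev, cur, total = j, first, length
            # Walk along the road until the next fork or dead end.
            while cur not in junctions:
                nxt = next(n for n in adj[cur] if n != prev)
                total += adj[cur][nxt]
                prev, cur = cur, nxt
            # Ignore loops back to the start; keep the shortest parallel road.
            if cur != j and total < contracted[j].get(cur, float("inf")):
                contracted[j][cur] = total
    return contracted

# A - B - C - D with a branch C - E; B is a plain waypoint.
roads = {
    "A": {"B": 1},
    "B": {"A": 1, "C": 2},
    "C": {"B": 2, "D": 3, "E": 4},
    "D": {"C": 3},
    "E": {"C": 4},
}
print(contract_road_graph(roads))
```

A search over the contracted graph only ever expands forks and dead ends. A start or goal that lies in the middle of a road would have to be spliced in as a temporary junction, which this sketch leaves out.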

Don’t know what’s causing the load though, doubt could use the public PDB for profiling. But it’s worth at least a try optimizing it, definitely.

As for parallel GPU, renderer thread needs no actual resources, except for synchronous flushes which sleep anyway (unless configured otherwise). Pushing into the CS is extremely cheap. So unless it computes/allocates memory (it shouldn’t!), no biggie.

I’m also curious about heat exhaust and HDR. Don’t know (again) how well they’re optimized for SIMD GPU processing, but getting rid of ‘if’ statements in shaders (if any) would speed up GPU processing immensely

fingon

sthalik, what do you mean by “clipping”? Just so we don’t talk past each other.

Either way, I personally would have no problem if future procedurally generated geometry is a visual effect only. It should be MP-safe by being deterministic, but only to make sure people see the same things (e.g. earlier on clouds were not shared, which was crappy), not for targeting/objective purposes. It would be far too inefficient to treat autogen objects as full Falcon campaign objects. Existing campaign objectives should receive an X-Plane-like exclusion zone so that they “pop out” of the autogen stuff.
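To make the exclusion-zone idea concrete, here is a small sketch under simple assumptions: objectives and autogen objects are 2D points, and every objective gets the same circular zone. The function name and the data layout are hypothetical and are not taken from X-Plane or Falcon.

```python
def apply_exclusion_zones(autogen_points, objectives, radius):
    """Drop autogen objects that fall inside any objective's exclusion circle.

    autogen_points, objectives: lists of (x, y) tuples in the same units.
    radius: exclusion radius around each objective (assumed uniform here).
    """
    r2 = radius * radius
    return [
        (px, py)
        for px, py in autogen_points
        # Keep the point only if it is outside every exclusion circle.
        if all((px - ox) ** 2 + (py - oy) ** 2 > r2 for ox, oy in objectives)
    ]

trees = [(0, 0), (5, 5), (10, 0), (50, 50)]
airbases = [(4, 4)]
print(apply_exclusion_zones(trees, airbases, radius=6))
```

Because the filter is a pure function of the generated points and the objective list, it stays deterministic as long as the autogen input is.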

Lotsa wishful thinking here…

sthalik

By clipping, I meant you can fly through skyscrapers with no damage.

Autogen is tons of work, ask Ben Supnik from the X-Plane team. Nice fellah btw, helped me a lot with the wined3d fork.

fingon

In terms of CPU-bound campaigns I’d be interested to know what kind of schedule is applied to things like simple distance checks between units (agg 2D or de-agg 3D) as they move. These kinds of things can kill performance unnecessarily if done in aggressive timings. Who knows, maybe FPS could be gained by simple things like switching to distance calculations without the square root operation in places where it can work, and using tables to adjust (if they don’t already do things like that).
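The square-root point can be illustrated with a range check that compares squared values on both sides. This is a generic sketch with a made-up function name, not code from Falcon.

```python
def within_range_sq(ax, ay, bx, by, radius):
    """True if point B is within `radius` of point A, without calling sqrt.

    dx*dx + dy*dy <= radius*radius is equivalent to
    sqrt(dx*dx + dy*dy) <= radius for non-negative radius.
    """
    dx = bx - ax
    dy = by - ay
    return dx * dx + dy * dy <= radius * radius

print(within_range_sq(0, 0, 3, 4, 5))    # distance is exactly 5
print(within_range_sq(0, 0, 3, 4, 4.9))  # just out of range
```

The comparison only needs the ordering of distances, so the square root can be skipped whenever the actual distance value is not used afterwards.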

sthalik

FWIW original falcon 4.0 uses squared distance, no sqrt applied

fingon

Hehe, well, re skyscrapers: those should be true Falcon objectives (at least those over a certain height maybe), while things like road infrastructure (except bridges), residential housing and trees - I really don’t mind if they are not fully physical - Falcon is first and foremost a simulation of the in-cockpit experience.

fingon

sqrt: Yeah? good to hear, low-hanging fruit…

sthalik

@fingon:

Hehe, well, re skyscrapers: those should be true Falcon objectives (at least those over a certain height maybe), while things like road infrastructure (except bridges), residential housing and trees - I really don’t mind if they are not fully physical - Falcon is first and foremost a simulation of the in-cockpit experience.

Why not make objectives fast too?

fingon

That would be the platonic ideal

Jasajas

I hope you gentlemen continued your discussions elsewhere, since there is nothing new here. IMO, it’s the most progressive read on the forum and I salute you for pushing the frontier in debating what is doable to this amazing sim.

Bms Forever!

/Jas

Ps. I would love to see a more user-friendly Lodeditor, but that’s probably just because I’m stupid. Ds.

jhook

@sthalik:

Now I remember more about the post I’ve read - it said that more than 3 cores cause bottlenecks because of inefficient multithreading. That hyperthreading causes inefficiency, too, because 2-3 cores is optimal, else performance is actually lost.

As for multi-core renderer, as far as I understand it, the actual primitive-drawing is done only on one core. The matter is putting other expensive tasks on a pinned thread and balancing, such that they’re busy with actual useful work, not spinning on a synchronization primitive, performing calculations that are unnecessary, etc… At least I wouldn’t create multiple concurrent command streams for graphics. I have no experience in the area, but learning about it would be a breath of fresh air from the usual stuff

What I meant is that if parallelization and other optimizations are done, and the code is made more efficient because of this, the bubble can be made less restrictive. And yes, I’m staring at a ‘black box’ here, only knowing what you told me

-sh

First of all, this discussion flew under my radar last year, so I am just now reading it, and I am grateful for this thread as it has illuminated a little bit of what goes on behind the BMS scene. Just some observations. Hyper-threading or multi-threading the code would improve performance in FBMS tenfold AFAIK. I think what sthalik is discussing is correct. It comes down to how the code would utilize multiple cores. This would open up a lot of headroom in FBMS to allow greater additions (such as better GFX and added features) without reducing (if not improving) performance. I think that is where you guys are heading, considering where you left off. Also, since Dunc has asked the question (the OS poll), it would require a more powerful operating system (such as 7 or 8) to manage the multi-threading. Also, 7 or 8 would allow for more RAM to be recognized and used. Another big plus!

What I have been reading about lately is the fact that multiple GPUs are a problem. I experience problems running games with my 2 HD 7970’s: micro stuttering, latency issues and overall performance issues. I will get better FPS, but I will get micro stuttering with some games. sthalik hit it on the head with this post. Everything is done through 1 GPU. The other GPU renders off-board processes like FSAA. Then it transmits that process back to the main GPU board for post-processing. This is where you actually get issues like micro stuttering. SLI or Xfire does not matter. It is the same issue. Best bet is to get a fast single-GPU board. As for FBMS, I would focus on the multi-threading (multiple CPUs) and the OS utilizing more RAM to improve FBMS overhead. Everything else will depend on this kind of improvement. I think with multi-core support, you could have normal maps for the terrain and still get around 50 FPS with a decent chip and RAM.

sthalik

jhook,

Hyper-threading or multi-threading the code would[…]

The code’s heavily parallel starting with 4.0. It’s -just- lock granularity, threading overhead etc. that’s causing it to perform less well at higher parallelization levels.

That’s a major thing to do, you just don’t take some-orders-of-magnitude-of-SLOC-codebase and refactor it like that…
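For readers unfamiliar with the term, lock granularity is about how much data a single lock protects. The sketch below is a generic illustration in Python, not Falcon code: the class name, the striping scheme and the "unit table" are all invented. It splits one table across several locks, so threads touching different units rarely wait on each other, whereas a single global lock would serialize every update. (In CPython the interpreter lock limits real parallelism anyway, so this only shows the structure.)

```python
import threading

class StripedUnitTable:
    """Per-unit counters guarded by several locks instead of one global lock."""

    def __init__(self, stripes=8):
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._buckets = [{} for _ in range(stripes)]

    def _index(self, unit_id):
        # Units are spread over the stripes by id; each stripe has its own lock.
        return unit_id % len(self._locks)

    def update(self, unit_id, delta):
        i = self._index(unit_id)
        with self._locks[i]:  # only blocks threads using the same stripe
            bucket = self._buckets[i]
            bucket[unit_id] = bucket.get(unit_id, 0) + delta

    def get(self, unit_id):
        i = self._index(unit_id)
        with self._locks[i]:
            return self._buckets[i].get(unit_id, 0)

def run_demo(threads=4, updates=1000, units=16):
    table = StripedUnitTable(stripes=4)

    def worker():
        for _ in range(updates):
            for unit_id in range(units):
                table.update(unit_id, 1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return [table.get(u) for u in range(units)]

print(run_demo())
```

Finer locks reduce contention but add bookkeeping and make it easier to introduce ordering bugs, which is part of why retrofitting them into a large existing codebase is a major job.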

Also, 7 or 8 would allow for more RAM to be recognized and used. Another big plus!

As far as 32-bit executables are concerned, more address space (what is that “RAM” thing?) doesn’t magically become available with newer OS versions.

I cry in terror at the mere thought of someone making it 64-bit clean without making “database” incompatible with the 32-bit version.

running games with my 2 HD 7970’s. Micro stuttering, latency […]

Nonsense to use crossfire when falcon’s CPU-bound.

-sh

jhook

@sthalik:

jhook,

Hyper-threading or multi-threading the code would[…]

The code’s heavily parallel starting with 4.0. It’s -just- lock granularity, threading overhead etc. that’s causing it to perform less well at higher parallelization levels.

That’s a major thing to do, you just don’t take some-orders-of-magnitude-of-SLOC-codebase and refactor it like that…

Also, 7 or 8 would allow for more RAM to be recognized and used. Another big plus!

As far as 32-bit executables are concerned, more address space (what is that “RAM” thing?) doesn’t magically become available with newer OS versions.

I cry in terror at the mere thought of someone making it 64-bit clean without making “database” incompatible with the 32-bit version.

running games with my 2 HD 7970’s. Micro stuttering, latency […]

Nonsense to use crossfire when falcon’s CPU-bound.

-sh

Thanks for the reply. I agree with the dual GPU approach. No need for that. But do you think multi-CPU threading would be possible for FBMS? More address space for more RAM usage? Seems like that would go a LONG way really. As for the OS thing, I was referencing that a better OS would help with the processing better. Don’t know if that is even a factor really. Anyway, would be great to see FBMS using multi-CPU’s and greater RAM.
