Lock granularity, bubbles, shaders, texture filtering
-
FWIW original falcon 4.0 uses squared distance, no sqrt applied
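For anyone wondering what that looks like in practice, here is a minimal sketch of a range check done on squared distances. The names `Vec3`, `distanceSquared` and `insideBubble` are made up for illustration and are not taken from the Falcon 4.0 source; the bubble-radius use case is an assumption too.

```cpp
// Hypothetical names; not taken from the Falcon 4.0 source.
struct Vec3 {
    float x, y, z;
};

// Squared Euclidean distance: three subtractions, three multiplies, no sqrt.
inline float distanceSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// d < r is equivalent to d*d < r*r when both are non-negative,
// so the range check can compare squares and skip the square root.
inline bool insideBubble(const Vec3& entity, const Vec3& center, float radius) {
    return distanceSquared(entity, center) < radius * radius;
}
```

The square root is only needed when the actual distance value is required, not for a plain "closer than X" comparison.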
-
Hehe, well, re skyscrapers: those should be true Falcon objectives (at least those over a certain height maybe), while things like road infrastructure (except bridges), residential housing and trees - I really don't mind if they are not fully physical - Falcon is first and foremost a simulation of the in-cockpit experience.
-
sqrt: Yeah? Good to hear, low-hanging fruit...
-
Why not make objectives fast too?
-
That would be the platonic ideal
-
I hope you gentlemen continued your discussions elsewhere, since there is nothing new here. IMO, it's the most progressive read on the forum and I salute you for pushing the frontier in debating what is doable with this amazing sim.
Bms Forever!
/Jas
Ps. I would love to see a more user-friendly Lodeditor, but that's probably just because I'm stupid. Ds.
-
Now I remember more about the post I've read - it said that more than 3 cores cause bottlenecks because of inefficient multithreading. That hyperthreading causes inefficiency, too, because 2-3 cores is optimal, else performance is actually lost.
As for multi-core renderer, as far as I understand it, the actual primitive-drawing is done only on one core. The matter is putting other expensive tasks on a pinned thread and balancing, such that they're busy with actual useful work, not spinning on a synchronization primitive, performing calculations that are unnecessary, etc... At least I wouldn't create multiple concurrent command streams for graphics. Have no experience in the area, but learning about it would be a breath of fresh air from the usual stuff.
What I've meant is that if parallelization and other optimizations are done, and the code is made more efficient because of this, the bubble can be made less restrictive. And yes, I'm staring at a "black box" here, only knowing what you told me.
-sh
First of all, this discussion flew under my radar last year, so I am just now reading it and I am grateful for this thread as it has illuminated a little of what goes on behind the BMS scenes. Just some observations. Hyper-threading or multi-threading the code would improve performance in FBMS 10-fold AFAIK. I think what Sthalik is discussing is correct. It comes down to how the code would utilize multiple cores. This would open up a lot of headroom with FBMS to allow greater additions (such as better GFX and added features) without reducing (if not improving) performance. I think that is where you guys are heading considering where you left off. Also, since Dunc has asked the question (the OS poll), it would require a more powerful operating system (such as 7 or 8) to manage the multi-threading. Also, 7 or 8 would allow for more RAM to be recognized and used. Another big plus!
What I have been reading about lately is the fact that multi GPUs are a problem. I experience problems running games with my 2 HD 7970s. Micro stuttering, latency issues and overall performance issues. I will get better FPS, but I will get micro stuttering with some games. Sthalik hit it on the head with this post. Everything is done through 1 GPU. The other GPU renders off-board processes like FSAA. Then it transmits that process back to the main GPU board for post processing. This is where you actually get issues like micro stuttering. SLI or Xfire does not matter. It is the same issue. Best bet is to get a fast single GPU board. As for FBMS, I would focus on the multi-threading (multi CPUs) and the OS utilizing more RAM to improve FBMS overhead. Everything else will depend on this kind of improvement. I think with multi-core support, you could have normal maps for the terrain and still get around 50 FPS with a decent chip and RAM.
-
jhook,
Hyper-threading or multi-threading the code would[...]
The code's heavily parallel starting with 4.0. It's -just- lock granularity, threading overhead etc. that's causing it to perform less well at higher parallelization levels.
That's a major thing to do, you just don't take some-orders-of-magnitude-of-SLOC-codebase and refactor it like that...
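To make "lock granularity" concrete, here is a minimal sketch contrasting one global lock with per-object locks. `Entity`, `burnCoarse` and `burnFine` are hypothetical names invented for this example; nothing here reflects how the BMS code is actually organized.

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical entity type; nothing here is taken from the BMS code.
struct Entity {
    std::mutex lock;  // fine-grained: one lock per entity
    long fuel = 0;
};

// Coarse-grained: every update takes the same global lock, so threads
// touching unrelated entities still queue up behind each other.
inline void burnCoarse(std::vector<Entity>& world, std::mutex& worldLock,
                       std::size_t index, long amount) {
    std::lock_guard<std::mutex> guard(worldLock);
    world[index].fuel -= amount;
}

// Fine-grained: only threads touching the same entity contend.
inline void burnFine(std::vector<Entity>& world, std::size_t index, long amount) {
    std::lock_guard<std::mutex> guard(world[index].lock);
    world[index].fuel -= amount;
}
```

Finer locks cut contention but add per-object overhead and make lock ordering harder to reason about, which is part of why retrofitting them into a large codebase is not a quick job.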
Also, 7 or 8 would allow for more RAM to be recognized and used. Another big plus!
As far as 32-bit executables are concerned, more address space (what is that "RAM" thing?) doesn't magically become available with newer OS versions.
I cry in terror at the mere thought of someone making it 64-bit clean without making "database" incompatible with the 32-bit version.
running games with my 2 HD 7970s. Micro stuttering, latency [...]
Nonsense to use crossfire when falcon's CPU-bound.
-sh
-
Thanks for the reply. I agree with the dual GPU approach. No need for that. But do you think multi-CPU threading would be possible for FBMS? More address space for more RAM usage? Seems like that would go a LONG way really. As for the OS thing, I was referencing that a better OS would help with the processing better. Don't know if that is even a factor really. Anyway, would be great to see FBMS using multi-CPUs and greater RAM.
-
But do you think multi-CPU threading would be possible for FBMS?
Already is since 4.0.
More address space for more RAM usage?
BMS doesn't use more than 3 GB anyway with sane data assets.
better OS would help with the processing better.
Elaborate, that's vague to the point of irrelevance.
would be great to see FBMS[...]
Wrong tense.
-sh
-
I didn't think FBMS was multi-CPU capable. Wow. Ok,
Since Dunc made a small and simple statement about Windows version and a 64 bit version on the horizon, that was actually my thinking on running FBMS through a better OS. Simply because moving FBMS to a 64-bit OS might allow for better processing and utilizing CPU/RAM more efficiently (i.e. more headroom). It was just some ideas for improving the performance in FBMS. Since it does use multi-CPUs (I thought that it did not use multi-CPUs) how many cores does FBMS utilize? Is there a maximum core limit?
-
The thread actually went on, very interesting at that. Some more fuel to the fire.
jhook: there are no hard limits on CPU amount as we understand it. It just spawns threads - like any other process that spawns threads - and the OS schedules them.
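A minimal sketch of what "just spawns threads" means in standard C++, assuming nothing about how BMS does it. `spawnAndCount` and `suggestedWorkers` are names invented for this example.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Spawns `count` threads that each bump a shared counter once, then joins them.
// The program never picks a core; placement is left to the OS scheduler.
inline int spawnAndCount(unsigned count) {
    std::atomic<int> done{0};
    std::vector<std::thread> workers;
    workers.reserve(count);
    for (unsigned i = 0; i < count; ++i) {
        workers.emplace_back([&done] { done.fetch_add(1); });
    }
    for (auto& t : workers) t.join();
    return done.load();
}

// hardware_concurrency() is only a hint and may return 0 when unknown.
inline unsigned suggestedWorkers() {
    const unsigned hint = std::thread::hardware_concurrency();
    return hint == 0 ? 1u : hint;
}
```

Nothing stops a process from spawning more threads than there are cores; whether the extra threads help is a separate question.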
"Utilizing X more efficiently" is so vague that it equates to "make the damn thing run faster!!!". So let's!
[...]I've done some MPI and OpenMP coding in a parallel processing university course (although not for a CS or SW degree) and that's my main knowledge of parallel processing. Never tried to check if something like that can fit Falcon code. I guess that if someone with expertise in multi-threading and multi-core coding methods tries to get something implemented, it could work, even locally in some high processing load areas of the code (if that makes any sense...)
A breakthrough in that direction will be revolutionary for a sim like Falcon which can starve sometimes for CPU cycles.
The issue with parallelization: if 2/3 of the code runs serialized AND waiting for the threaded part, the most you can save with infinite cores is 1/3 of the run time, i.e. a 1.5x speedup at best. There was a name for that law, forgot.
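For reference, the law being described is Amdahl's law. A worked version with the numbers from the post, where p is the fraction of run time that can be parallelized and N is the number of cores (the symbols are mine, not from the thread):

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}
\qquad\Longrightarrow\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p} = \frac{1}{2/3} = 1.5
\quad \text{for } p = \tfrac{1}{3}
```

So with 2/3 of the work serialized, infinite cores give at most a 1.5x speedup, which is the same thing as cutting at most 1/3 off the run time.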
OpenMP is an abstraction with its own tradeoffs. From what I've seen of it, it's for dudes doing scientific processing of datasets. It won't solve the issue of interlocking either.
What would, IMO, solve the interlocking? Look at the control flow graph and see why things happen the way they do. In particular, render path and simulation aren't running in unison.
As for lock granularity, that's a separate thing in itself. Overreliance on locking is -however- caused by the need for mutable state. There are ways to change it, for instance:
a) don't mutate, copy the relevant parts into a new instance as needed, say, every frame/dt. The memory's managed manually so it's affordable.
b) isolate relevant state into values of its own and torture the graphviz'd control flow graph until spotting something that can be done less bottleneckably.
OpenMP has some autoparallelization, but Turing-machines leave little to the imagination of the compiler. Think: sequence points, side effects, aliasing. It can work well with parallelizing+vectorizing self-contained loops, but what is it for Falcon?
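Option (a) above could be sketched roughly as follows: the simulation copies the previous state, mutates only the copy, and publishes it, while readers such as a render thread hold an immutable snapshot. `WorldState`, `SnapshotBoard` and `stepSimulation` are hypothetical names, and `std::shared_ptr` is used here only to keep the example short; the post assumes manually managed memory, which a real implementation would likely use instead.

```cpp
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical state; a real sim would hold far more than this.
struct WorldState {
    int frame = 0;
    std::vector<float> altitudes;
};

class SnapshotBoard {
public:
    SnapshotBoard() : current_(std::make_shared<const WorldState>()) {}

    // Reader side (e.g. render thread): grab the latest immutable snapshot.
    // The lock covers only a pointer copy, never the state itself.
    std::shared_ptr<const WorldState> read() const {
        std::lock_guard<std::mutex> guard(swapLock_);
        return current_;
    }

    // Writer side (e.g. sim thread): publish a finished copy.
    void publish(std::shared_ptr<const WorldState> next) {
        std::lock_guard<std::mutex> guard(swapLock_);
        current_ = std::move(next);
    }

private:
    mutable std::mutex swapLock_;
    std::shared_ptr<const WorldState> current_;
};

// One simulation step: copy the previous state, mutate only the copy, publish.
inline void stepSimulation(SnapshotBoard& board, float climb) {
    auto next = std::make_shared<WorldState>(*board.read());
    next->frame += 1;
    for (float& alt : next->altitudes) alt += climb;
    board.publish(std::move(next));
}
```

A reader that grabbed a snapshot before the step keeps seeing the old frame untouched, so it never needs to hold a lock while it works. The price is one copy per step, which is the tradeoff the post is pointing at.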
The execution's structure is more-or-less derived from F4, which puts a frame around what can be done. And the orders-of-magnitude of F4 BMS SLOC - holy jumping jethros...
-sh