FAST - Future Audio Sounds Thread

Blu3wolf

I was going to ask whether I should start a new thread for this tangent discussion, and then realised - If I was asking that question, clearly the answer was “yes”. So here we have FAST, the thread to discuss future audio and sounds.

@Dee-Jay:

Anyway … we have to deal with existing pilots and ATC voice frags and since we don’t have money to pay a professional studio to redo all the sound frags, and since we are already running low on Callsigns in the campaign, it is not about to change.

@Blu3wolf:

What would we need to have them redone? This seems like the sort of thing that could be done by students of media? People doing university studies, something like that? I could have a chat with some people in that field locally and ask if their students would be interested in producing that content, if I knew exactly what the team wanted that media to sound like. What sound bites, what tone/accent those sound bites should have, etc.

@Dee-Jay:

For each of the 14 voices: recording (with the right intonation), cleaning, and cutting all the samples present in …:\Falcon BMS 4.35\Data\Sounds\F4Talk95v1-0-0.csv … plus some other that are not in 4.35 … plus some other that we would need.

It’s that “plus some other” part that I’d like to get some detail on, if it’s possible. I realise from my (limited) experience in production and post-production that a large part of that process is iterative - we get some audio, then want to go back and change it, assess the changes, decide we want more or perhaps just something different, etc etc. Step one of that process is to know what we want, or at least have a rough outline of what we want.

If this was a done deal and we had people keen to provide voice samples (and equipped, skilled, etc), would we stop at 14 volunteers? Or is it just that 14 voices is all Falcon 4.0 had, so it’s what we have been using forever? Would we want specifically American-sounding voices? How about Korean?

Is there a file that lists the text versions of the spoken audio? Are there additional voice lines we would want to add, given the choice? I guess I’d like answers, if possible, from the devs, but if any modders have run into issues with wanting voice lines in BMS and those lines just not being available, now would be a good time to speak up (figuratively speaking, of course).

I see Khronik spent some time working with the voice audio. Was there much use made of the ability to generate new audio from TTS for arbitrary strings? Was that capability good enough to perhaps avoid needing to have real human voice actors?

Dee-Jay

I haven’t had the time to read (and understand) everything yet, sorry. I will a bit later …

Just a question: Are you speaking about real voices by actors or TTS?

Blu3wolf

Real voices by actors, if possible. However I am also wondering whether TTS might be well developed enough technology to do the job cheaper and just as well perhaps?

I’m proposing that if we can’t use TTS, we figure out what the audio needs to sound like, and what strings we want, and we contact media schools and universities to ask if their students are up for some ‘extra credit’ classes. They get credited for their voices - IF their work is good enough to use - and we get updated audio without paying an established studio. The downside is that it might take more time and effort, and the quality might be more variable, using volunteer students rather than a professional studio.

I guess I’m also asking if anyone can see a better alternative that I’ve perhaps missed.

Dragon1-1

Why not call for volunteers on the forum? They won’t quite be professional quality, but do we really need that? There’s not much acting skill involved in saying things like “Golf”, “November”, “Two”, “Lead, you’re on fire” and “Leave the fat one for me.” All those people would need to do would be to say (and record) the list of lines the same way they would say them when flying in MP.

TTS is also pretty good these days, especially if you’re not expecting it to emote. I’d say, the LSO dialogue actually sounds better than the old sound clips. Real aviation dialogue is usually pretty deadpan, so it should work pretty well for that, and it would essentially allow anything to be said by the AI, just by adding it to the list of phrases. This is a big advantage, because with real VAs, the next time you want to add another thing for the pilots to say, you’d have to track all the VAs down and hope they’re all up for an additional line or several.

Depending on how long the list actually is, I might be up for that sort of thing. I’ve got a bit of an Eastern European accent, though.

Dee-Jay

IMO, we need a professional studio and the guarantee of the same recording quality throughout.

TTS engines are good, but we need one capable of intonation … => not the Windows one, and probably not free software. Otherwise it would already be done.

Blu3wolf

@Dragon1-1:

Why not call for volunteers on the forum? They won’t quite be professional quality, but do we really need that? There’s not much acting skill involved in saying things like “Golf”, “November”, “Two”, “Lead, you’re on fire” and “Leave the fat one for me.” All those people would need to do would be to say (and record) the list of lines the same way they would say them when flying in MP.

I think I’ve suggested this before, actually. I recall someone (possibly Dee-Jay?) pointing out that most people do not have access to professional recording equipment, nor somewhere to record. You can get acceptable quality from a cheap microphone in some cases… but much more of an issue is where you record. The ambient noise, as well as the sound reflecting off the walls, can drastically impact the audio quality you record.

To some extent you can correct this in software, but it is far better to start with high quality sound clips - and that starts with decent equipment and somewhere to record. If you look online, you can find tutorials on how to set up a recording room for amateur voice acting, and it can be done relatively cheaply.

I guess what I’m getting at is I don’t think it would be impossible, but there is certainly no guarantee that we would get even 14 volunteers from the forum who all had a decent microphone and a room they could modify to use as a recording room. That said - I’d not be against making that suggestion. What is the worst that happens, we get 2 or 3 volunteers and once it’s recorded, the audio isn’t quite high enough quality to actually use? At least it would be a learning experience.

Dragon1-1

It’s certainly better to work with high quality samples, but think about it: we are primarily talking about radio voices. You’d have to normalize the voices and then apply a filter to make it sound more radio-like, which would mask slight imperfections. Echoes might be a problem, but they can be mitigated, and better microphones can do that on their own to some degree. IMO, we should be aiming for authenticity. The F-16 doesn’t have a professional recording mike onboard, and it’s not going to sound like anything from a recording studio, even without things like KY-58 making it even worse.

I think the best way to find out what we can get is to try. Compile a list of all lines to be recorded, agree on quality requirements, and then post it somewhere prominent, asking for a few trial submissions. When I looked into the files, it seemed they had a lot of lines, but not overwhelmingly many. I think it’s less than what goes into a fully voiced, story-based DCS campaign (or one for Freespace 2 Open, another game with a modding community that has record of amateur VA work).

Dee-Jay

The radio filter should be applied in post-processing once everything is done, so that we keep the original samples and can rework them as needed, then re-apply the filter.

All samples need to be stored on our SVN without filters.
Also … the XML listing all frags needs to be correct first (some lines are inconsistent), and of course we would need to add some others.

Big task. For sure, if we set aside the quality of the results, TTS would be way more practical.
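
To illustrate the workflow described here (clean masters kept untouched, radio effect derived from them as a separate step), below is a minimal Python sketch. Everything in it is an assumption for illustration and not the BMS team's actual tooling: the function names are hypothetical, the 300 to 3000 Hz band and the `tanh` distortion are just a common way to approximate a narrow radio voice channel, and samples are assumed to be floats between -1.0 and 1.0 already decoded from the audio files.

```python
import math

def peak_normalize(samples, target_peak=0.9):
    """Scale a clip so its loudest sample sits at target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent clip, nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """Simple first-order high-pass: removes rumble below the cutoff."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for s in samples:
        prev_out = alpha * (prev_out + s - prev_in)
        prev_in = s
        out.append(prev_out)
    return out

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Simple first-order low-pass: dulls content above the cutoff."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for s in samples:
        prev = prev + alpha * (s - prev)
        out.append(prev)
    return out

def band_limit(samples, sample_rate, low_hz=300.0, high_hz=3000.0):
    """Keep roughly the voice band (assumed 300 to 3000 Hz)."""
    highpassed = one_pole_highpass(samples, low_hz, sample_rate)
    return one_pole_lowpass(highpassed, high_hz, sample_rate)

def radio_filter(samples, sample_rate, drive=4.0):
    """Return a NEW radio-style clip; the clean input list is not modified."""
    band = band_limit(samples, sample_rate)
    # Soft clipping as a stand-in for radio compression/distortion.
    clipped = [math.tanh(drive * s) for s in band]
    return peak_normalize(clipped)
```

Because `radio_filter` returns a new list and never touches its input, the unfiltered master can stay in version control and the effect can be re-run with different settings whenever the samples are reworked.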

Micro_440th

Hi Blu3wolf!

I’ve spent a lot of time working on this topic and trying stuff out.

As audio recording (music production, speech, etc.) as well as pre- and post-production is my daily business, I can say that to get better results in BMS you first need PROFESSIONAL native speakers.
Second, every speaker needs a proper recording setup (no, not a 15€ Amazon headset) as well as time and passion.
Passion you will find in this community for sure. But I think finding professional speakers in the number needed is not an easy thing (especially when they have to do it for free).

Right now we have 14 voices available in BMS.
If you wanna replace even just ATC (Voice 12F + 13, ATIS not included) you need more than 2000 samples with the right intonation, speed, level, etc. You also need someone who edits everything and handles the sound shaping (EQ to Comp/Limiting). A LOT of work.

And then think about the next step of improvement. Is it enough only to replace sounds, or should we enhance the whole environment?
I have in mind having different ATC voices for let’s say every country where you takeoff/land. How cool is that?

To me, voice sounds are not that bad right now even when they’re 22 years old. It WORKS.

As I said: If you can provide me 14+ speakers with 2 weeks time each and proper setups, lets talk.
If not, let’s search for a GOOD TTS solution.
I researched most of the possibilities which are free. So far, the end results would be mediocre. Not something we wish for BMS.

Cheers
Micro

Stevie

…this is another one where I’ve been through a bit of this sort of thing in RL. And I also have my own hobbyist recording studio - at least I did have one, before our magnitude 5.4, 6.4, and 7.1 earthquakes in 2019…

It’s not really very difficult to do the recordings - particularly if you have a Mac - and just about all (if not all) of the recording work itself can be done with freeware nowadays. Either version of Audacity (Mac or PC) is very suitable for this purpose; all you need do is use the appropriate file format for output, and a quiet room, studio condenser mic, and sealed back headphones are about all the more one needs. For strictly voice recording, anyway. You might want two mics if you are trying to capture true stereo, but I might think dual-mono could actually sound more realistic in this case.

The hard thing to do is to get all of the voice alerts to sound the same…i.e.; match the vocalizations in level and timbre, or even harder - between two individuals. We tried some of this for voice alerts for a RL jet once (actually, more than once now that I think about it) and when you listened to the baseline alerts vs the ones we recorded the difference was obvious (and it should be noted that the real “Bitchin’ Betties” have not been professional voice artists - just women chosen for the timbre of their voices)…so you really have to do ALL of them using the same individual to make them consistent. This holds true for the voice alerts, not so much for other comms like inter-flight, ATC, etc.

The other thing is that RL comms are anything but studio quality - this is actually also easy to reproduce given the number of free-ware filters and effects that are currently available for digital audio production. Compression, static, etc. are all pretty easily generated. It may also be possible to just enhance and/or modify the existing audio to give a more “modern and realistic” sound. If anything, I’d suggest trying that first.
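
The level half of that matching problem can at least be measured and corrected automatically. The following is a minimal sketch, assuming clips are already decoded to floats between -1.0 and 1.0; the function names, the -20 dBFS target and the 1 dB tolerance are hypothetical choices for illustration. It does nothing for timbre, which is the harder part described above.

```python
import math
import statistics

def rms(samples):
    """Root-mean-square level of a clip (0.0 for an empty clip)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def to_dbfs(value):
    """Convert a linear level to dB relative to full scale (1.0)."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")

def match_rms(samples, target_dbfs=-20.0, ceiling=1.0):
    """Return a copy of the clip scaled to the target RMS level.

    The gain is capped so that no sample exceeds the ceiling (no clipping).
    """
    current = rms(samples)
    if current == 0.0:
        return list(samples)
    gain = (10.0 ** (target_dbfs / 20.0)) / current
    peak = max(abs(s) for s in samples)
    gain = min(gain, ceiling / peak)
    return [s * gain for s in samples]

def clips_off_level(clips, tolerance_db=1.0):
    """List the names of clips whose RMS level differs from the median
    level of the whole set by more than tolerance_db."""
    levels = {name: to_dbfs(rms(s)) for name, s in clips.items()}
    reference = statistics.median(levels.values())
    return sorted(
        name for name, level in levels.items()
        if abs(level - reference) > tolerance_db
    )
```

Running `clips_off_level` over one voice's samples before any editing gives a quick list of recordings that stand out in loudness and may need a retake or a gain adjustment.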

Dee-Jay

Hi!

@Micro_440th:

To me, voice sounds are not that bad right now even when they’re 22 years old. It WORKS.

It works for sure.
However, it is a hell of a nightmare to add new features/comms … etc …

Ex: Accurate implementation of AWACS Check-In and other comms we would need (Alpha Check, Rolex, Retrograde …), new callsigns … accurate Brevity for CAS, JTAC, Data Link, … etc … (many are not correct or are simply missing) …
Airbase ATC callsigns … etc … etc … etc … and it is time consuming to try to create them from what we have, as shown in a previous post.

Blu3wolf

@Micro_440th:

As I said: If you can provide me 14+ speakers with 2 weeks time each and proper setups, lets talk.
If not, let’s search for a GOOD TTS solution.
I researched most of the possibilities which are free. So far, the end results would be mediocre. Not something we wish for BMS.

Cheers
Micro

How professional? Like, audio technica headsets and blue yeti mikes, or stuff with a few more zeroes on the end? Modern budget recording gear is way more capable than the expensive stuff from the 90s when this was originally recorded…

How professional for the speakers? My thought is to target inexperienced voice actors and/or university media students: people who want to go into this as a career.

I’ve currently got some weeks to kill, but no recording room set up and I can provide just one voice at present. I do have some decent RØDE microphones at least.

SOBO-87

I’ve been playing around with Amazon Polly TTS for another project. Considering how much TTS BMS would need, it would probably be a free option.

It’s not great, but it does OK in some situations. The voice selection is somewhat limited, unfortunately, and quite deadpan. I don’t know of any better TTS service, although to be honest I didn’t look that hard because Polly worked well enough for my other project.

Here is a little test video I threw together. Wait till the very end for some that I put “radio” effects on, to see how they sound.

digle

+1 for Amazon Polly TTS, it sounds nice

Micro_440th

@Stevie:

…this is another one where I’ve been through a bit of this sort of thing in RL. And I also have my own hobbyist recording studio - at least I did have one, before our magnitude 5.4, 6.4, and 7.1 earthquakes in 2019…

It’s not really very difficult to do the recordings - particularly if you have a Mac - and just about all (if not all) of the recording work itself can be done with freeware nowadays. Either version of Audacity (Mac or PC) is very suitable for this purpose; all you need do is use the appropriate file format for output, and a quiet room, studio condenser mic, and sealed back headphones are about all the more one needs. For strictly voice recording, anyway. You might want two mics if you are trying to capture true stereo, but I might think dual-mono could actually sound more realistic in this case.

The hard thing to do is to get all of the voice alerts to sound the same…i.e.; match the vocalizations in level and timbre, or even harder - between two individuals. We tried some of this for voice alerts for a RL jet once (actually, more than once now that I think about it) and when you listened to the baseline alerts vs the ones we recorded the difference was obvious (and it should be noted that the real “Bitchin’ Betties” have not been professional voice artists - just women chosen for the timbre of their voices)…so you really have to do ALL of them using the same individual to make them consistent. This holds true for the voice alerts, not so much for other comms like inter-flight, ATC, etc.

The other thing is that RL comms are anything but studio quality - this is actually also easy to reproduce given the number of free-ware filters and effects that are currently available for digital audio production. Compression, static, etc. are all pretty easily generated. It may also be possible to just enhance and/or modify the existing audio to give a more “modern and realistic” sound. If anything, I’d suggest trying that first.

Hi!

1. Bitchin’ Betty had how many voice lines to record? 50 or something? That would be easy.

2. I didn’t get the dual-mono thing. This way you create unnecessary phase issues. For a proper voice recording a simple mono track is enough. Keep it simple.

3. RL comms sound indeed the way you described and for sure you can edit those sounds that way.
But again that’s not the trick. The trick is the source and to get someone who is able:

to record himself
to hear voice nuances in pitch and tone
to record multiple hours a day for 1/2 weeks with the same intonation (professional speakers are able to do that because they are used to it)

As Dee-Jay mentioned there is also a lot missing and/or needed for upcoming releases.
I would love to bring this forward but we have to find at least a good solution which works for upcoming releases too.

That means if you record new voices for 4.36 and then need more words from that same voice in 4.37, it’s maybe smarter to use TTS, which lets you reproduce the same quality as before.

Cheers

Micro_440th

@Blu3wolf:

How professional? Like, audio technica headsets and blue yeti mikes, or stuff with a few more zeroes on the end? Modern budget recording gear is way more capable than the expensive stuff from the 90s when this was originally recorded…

How professional for the speakers? My thought is to target inexperienced voice actors and/or university media students: people who want to go into this as a career.

I’ve currently got some weeks to kill, but no recording room set up and I can provide just one voice at present. I do have some decent RØDE microphones at least.

As an add on to my previous post here.

I’m not talking about state of the art recording equipment. I’ve made good recordings even with a ZOOM recorder.

Think about the next steps.
If you record now with 14 students for 4.36. Fine, maybe that works.

But after some years you need more words from the same voice for 4.37. If one of those people is now a successful and well paid actor, we get a budget issue.
That being said, we need a solution which works for years and which can be extended when needed. And that to me is TTS.

Blu3wolf

While I agree TTS is more convenient long term, my understanding that it has some significant shortfalls seems to have been reinforced by the recent posts.

As far as potential budget issues: Intrinsic motivators are a pretty neat thing. If we could in fact get community voices, I suspect you wouldn’t have to worry about budget concerns.

Micro_440th

@BenDean87:

I’ve been playing around with Amazon Polly TTS for another project. Considering how much TTS BMS would need, it would probably be a free option.

It’s not great, but it does OK in some situations. The voice selection is somewhat limited, unfortunately, and quite deadpan. I don’t know of any better TTS service, although to be honest I didn’t look that hard because Polly worked well enough for my other project.

Here is a little test video I threw together. Wait till the very end for some that I put “radio” effects on, to see how they sound.

Not bad, but there are definitely better TTS applications available. If I remember correctly you can increase voice speed in Polly too?

In your example the word flow is much too slow. When you increase speed you will hear a lot of conversion artifacts and it creates unnatural sentences.

I can post a link later to the application I have in mind.
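
On the speed question: Polly accepts SSML input, and the SSML `<prosody>` tag has a `rate` attribute, so speaking rate can be set per phrase. Below is a minimal sketch of how that could look from Python with `boto3`. The helper names, the default voice and the 110% rate are hypothetical picks for illustration, and the second function needs `boto3` installed plus working AWS credentials, so treat it as untested against any real BMS phrase list.

```python
from xml.sax.saxutils import escape

def build_ssml(text, rate="110%"):
    """Wrap a phrase in SSML so the speaking rate can be adjusted.

    The text is XML-escaped so characters like '&' do not break the markup.
    """
    return f'<speak><prosody rate="{rate}">{escape(text)}</prosody></speak>'

def synthesize_phrase(text, voice_id="Matthew", rate="110%"):
    """Ask Amazon Polly for one phrase and return the MP3 bytes.

    Assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported here so build_ssml works without boto3

    client = boto3.client("polly")
    response = client.synthesize_speech(
        Text=build_ssml(text, rate),
        TextType="ssml",
        OutputFormat="mp3",
        VoiceId=voice_id,
    )
    return response["AudioStream"].read()
```

For example, `build_ssml("Cleared for takeoff", "120%")` produces the markup for a phrase spoken about 20% faster than the default. Whether the artifacts at higher rates are acceptable is something only a listening test can answer.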

Micro_440th

@Blu3wolf:

While I agree TTS is more convenient long term, my understanding that it has some significant shortfalls seems to have been reinforced by the recent posts.

As far as potential budget issues: Intrinsic motivators are a pretty neat thing. If we could in fact get community voices, I suspect you wouldn’t have to worry about budget concerns.

For sure. I love that idea too!!! I’ll PM you later today. Maybe we can team up.

Dee-Jay

My advice: if you guys go into that project, take care to run some tests first on a limited amount of work, and modify the process as needed, taking into account the lessons learned from the initial tests …
Do something really satisfying before starting the global work, and keep in mind that it will have to be closely coordinated, because in future versions the frags are already different (new/other words/sentences).
