FAST - Future Audio Sounds Thread

Micro_440th

Hi Blu3wolf!

I’ve spent a lot of time working on this topic and trying stuff out.

Since audio recording (music production, speech, etc.) as well as pre- and post-production is my daily business, I can say that to get better results in BMS you first need PROFESSIONAL native speakers.
Second, every speaker needs a proper recording setup (no, not a 15€ Amazon headset) as well as time and passion.
Passion you will find in this community for sure. But I think finding professional speakers in the numbers needed is not an easy thing (especially when they have to do it for free).

Right now we have 14 voices available in BMS.
If you wanna replace even just ATC (Voice 12F + 13, ATIS not included) you need more than 2000 samples with the right intonation, speed, level, etc. You also need someone who edits everything and handles the sound shaping (EQ through compression/limiting). A LOT of work.

And then think about the next step of improvement. Is it enough just to replace the sounds, or should we enhance the whole environment?
I have in mind having different ATC voices for let’s say every country where you takeoff/land. How cool is that?

To me, voice sounds are not that bad right now even though they’re 22 years old. It WORKS.

As I said: If you can provide me 14+ speakers with 2 weeks time each and proper setups, let’s talk.
If not, let’s search for a GOOD TTS solution.
I researched most of the possibilities which are free. So far, the end results would be mediocre. Not something we wish for BMS.

Cheers
Micro

Stevie

…this is another one where I’ve been through a bit of this sort of thing in RL. And I also have my own hobbyist recording studio - at least I did have one, before our magnitude 5.4, 6.4, and 7.1 earthquakes in 2019…

It’s not really very difficult to do the recordings - particularly if you have a Mac - and just about all (if not all) of the recording work itself can be done with freeware nowadays. Either version of Audacity (Mac or PC) is very suitable for this purpose; all you need to do is use the appropriate file format for output, and a quiet room, a studio condenser mic, and sealed-back headphones are about all one needs. For strictly voice recording, anyway. You might want two mics if you are trying to capture true stereo, but I think dual-mono could actually sound more realistic in this case.

The hard thing to do is to get all of the voice alerts to sound the same, i.e., to match the vocalizations in level and timbre, or, even harder, to match them between two individuals. We tried some of this for voice alerts for a RL jet once (actually, more than once now that I think about it), and when you listened to the baseline alerts vs the ones we recorded the difference was obvious (it should be noted that the real “Bitchin’ Betties” have not been professional voice artists - just women chosen for the timbre of their voices) … so you really have to do ALL of them using the same individual to make them consistent. This holds true for the voice alerts, not so much for other comms like inter-flight, ATC, etc.
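To make the level part of that concrete, here is a minimal Python sketch of bringing clips to a common RMS level. It is only an illustration under assumptions: the clips are plain lists of float samples in [-1.0, 1.0], the two sine tones merely stand in for recorded alerts, and the function names are made up for this example. It does nothing about timbre, which is the harder half of the problem.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of float samples in [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_level(samples, target_rms, peak_limit=1.0):
    """Scale a clip so its RMS equals target_rms, backing off if it would clip."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_rms / current
    peak = max(abs(s) for s in samples)
    if peak * gain > peak_limit:
        gain = peak_limit / peak  # stay below full scale instead of distorting
    return [s * gain for s in samples]

# Two synthetic tones at different levels stand in for two recorded alerts.
RATE = 8000
quiet = [0.1 * math.sin(2 * math.pi * 220 * n / RATE) for n in range(RATE)]
loud = [0.6 * math.sin(2 * math.pi * 220 * n / RATE) for n in range(RATE)]

TARGET = 0.2  # hypothetical common RMS level for every clip
matched = [match_level(clip, TARGET) for clip in (quiet, loud)]
```

After this step both clips sit at the same measured level, but two different speakers will still sound like two different speakers.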

The other thing is that RL comms are anything but studio quality - this is actually also easy to reproduce given the number of freeware filters and effects that are currently available for digital audio production. Compression, static, etc. are all pretty easily generated. It may also be possible to just enhance and/or modify the existing audio to give a more “modern and realistic” sound. If anything, I’d suggest trying that first.
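As a rough illustration of how little is needed for that kind of “radio” sound, here is a small Python sketch using only the standard library. It is a sketch under assumptions, not what any particular plug-in does: the 300-3000 Hz band, the drive and noise amounts, and all function names are illustrative choices, the hard clip is only a crude stand-in for real compression, and the input is assumed to be a list of float samples in [-1.0, 1.0].

```python
import math
import random

def high_pass(samples, cutoff_hz, rate):
    """One-pole high-pass filter: removes low rumble below the cutoff."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out

def low_pass(samples, cutoff_hz, rate):
    """One-pole low-pass filter: dulls content above the cutoff."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    dt = 1.0 / rate
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out

def radio_effect(samples, rate, low_hz=300.0, high_hz=3000.0,
                 drive=4.0, noise=0.02, seed=0):
    """Band-limit, overdrive with a hard clip, then add light static."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    band = low_pass(high_pass(samples, low_hz, rate), high_hz, rate)
    out = []
    for s in band:
        s = max(-1.0, min(1.0, s * drive))   # crude limiter/distortion
        s += rng.uniform(-noise, noise)      # static
        out.append(max(-1.0, min(1.0, s)))   # keep the result in range
    return out

# A 1 kHz test tone stands in for a voice clip.
RATE = 8000
voice = [0.5 * math.sin(2 * math.pi * 1000 * n / RATE) for n in range(RATE)]
processed = radio_effect(voice, RATE)
```

The same chain could be run over the existing samples first, which is the cheaper experiment.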

Dee-Jay

Hi!

@Micro_440th:

To me, voice sounds are not that bad right now even though they’re 22 years old. It WORKS.

It works for sure.
However, it is a hell of a nightmare to add new features, comms, etc.

Ex: accurate implementation of AWACS check-in and the other comms we would need (Alpha Check, Rolex, Retrograde …), new callsigns, accurate brevity for CAS, JTAC, Data Link, etc. (many are not correct or are simply missing), airbase ATC callsigns, and so on. It is time-consuming to try to create them from what we have, as shown in a previous post.

Blu3wolf

@Micro_440th:

As I said: If you can provide me 14+ speakers with 2 weeks time each and proper setups, let’s talk.
If not, let’s search for a GOOD TTS solution.
I researched most of the possibilities which are free. So far, the end results would be mediocre. Not something we wish for BMS.

Cheers
Micro

How professional? Like, audio technica headsets and blue yeti mikes, or stuff with a few more zeroes on the end? Modern budget recording gear is way more capable than the expensive stuff from the 90s when this was originally recorded…

How professional for the speakers? My thought is to target inexperienced voice actors and/or university media students: people who want to go into this as a career.

I’ve currently got some weeks to kill, but no recording room set up and I can provide just one voice at present. I do have some decent RØDE microphones at least.

SOBO-87

I’ve been playing around with Amazon Polly TTS for another project. Considering how much TTS BMS would need, it would probably be a free option.

It’s not great, but it does OK in some situations. The voice selection is unfortunately somewhat limited, though, and quite deadpan. I don’t know of any better TTS service, although to be honest I didn’t look that hard because Polly worked well enough for my other project.

Here is a little test video I threw together. Wait until the very end for a few samples that I put some “radio” effects on, to see how they sound.

digle

+1 for Amazon Polly TTS, it sounds nice

Micro_440th

@Stevie:

…this is another one where I’ve been through a bit of this sort of thing in RL. And I also have my own hobbyist recording studio - at least I did have one, before our magnitude 5.4, 6.4, and 7.1 earthquakes in 2019…

It’s not really very difficult to do the recordings - particularly if you have a Mac - and just about all (if not all) of the recording work itself can be done with freeware nowadays. Either version of Audacity (Mac or PC) is very suitable for this purpose; all you need to do is use the appropriate file format for output, and a quiet room, a studio condenser mic, and sealed-back headphones are about all one needs. For strictly voice recording, anyway. You might want two mics if you are trying to capture true stereo, but I think dual-mono could actually sound more realistic in this case.

The hard thing to do is to get all of the voice alerts to sound the same, i.e., to match the vocalizations in level and timbre, or, even harder, to match them between two individuals. We tried some of this for voice alerts for a RL jet once (actually, more than once now that I think about it), and when you listened to the baseline alerts vs the ones we recorded the difference was obvious (it should be noted that the real “Bitchin’ Betties” have not been professional voice artists - just women chosen for the timbre of their voices) … so you really have to do ALL of them using the same individual to make them consistent. This holds true for the voice alerts, not so much for other comms like inter-flight, ATC, etc.

The other thing is that RL comms are anything but studio quality - this is actually also easy to reproduce given the number of freeware filters and effects that are currently available for digital audio production. Compression, static, etc. are all pretty easily generated. It may also be possible to just enhance and/or modify the existing audio to give a more “modern and realistic” sound. If anything, I’d suggest trying that first.

Hi!

1. How many voice lines did Bitchin’ Betty have to record? 50 or something? That would be easy.

2. I didn’t get the dual-mono thing. That way you create unnecessary phase issues. For a proper voice recording a simple mono track is enough. Keep it simple.

3. RL comms sound indeed the way you described and for sure you can edit those sounds that way.
But again that’s not the trick. The trick is the source and to get someone who is able:

to record himself
to hear voice nuances in pitch and tone
to record multiple hours a day for 1-2 weeks with the same intonation (professional speakers are able to do that because they are used to it)

As Dee-Jay mentioned there is also a lot missing and/or needed for upcoming releases.
I would love to bring this forward, but we have to at least find a good solution that works for upcoming releases too.

That means if you record new voices for 4.36 and then need more words from that same voice in 4.37, it may be smarter to use TTS, so you have the possibility to reproduce the same quality as recorded before.
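To picture the 4.36 to 4.37 problem, here is a tiny Python sketch that works out which phrases a voice would still have to deliver for a newer release. Everything in it is an assumption for illustration: the phrase lists are invented, the function name is hypothetical, and it says nothing about how BMS actually stores its voice data.

```python
def missing_phrases(recorded, required):
    """Return the phrases a release requires that a voice has not recorded yet."""
    def normalize(phrase):
        # Ignore case and extra whitespace so trivial differences do not count.
        return " ".join(phrase.lower().split())

    have = {normalize(p) for p in recorded}
    need = {normalize(p) for p in required}
    return sorted(need - have)

# Hypothetical phrase lists for one voice across two releases.
recorded_for_436 = ["Cleared to land", "Taxi to runway"]
required_for_437 = ["Cleared to land", "Taxi to runway", "Alpha check", "Rolex"]

todo = missing_phrases(recorded_for_436, required_for_437)
```

With recorded speakers, every entry in `todo` means getting the same person back in front of the same mic; with TTS it means generating a few more files with the same voice settings.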

Cheers

Micro_440th

@Blu3wolf:

How professional? Like, audio technica headsets and blue yeti mikes, or stuff with a few more zeroes on the end? Modern budget recording gear is way more capable than the expensive stuff from the 90s when this was originally recorded…

How professional for the speakers? My thought is to target inexperienced voice actors and/or university media students: people who want to go into this as a career.

I’ve currently got some weeks to kill, but no recording room set up and I can provide just one voice at present. I do have some decent RØDE microphones at least.

As an add-on to my previous post here.

I’m not talking about state of the art recording equipment. I’ve made good recordings even with a ZOOM recorder.

Think about the next steps.
If you record now with 14 students for 4.36, fine, maybe that works.

But after some years you need more words from the same voice for 4.37. If one of those people is by then a successful and well-paid actor, we get a budget issue.
That being said, we need a solution which works for years and which can be extended when needed. And that, to me, is TTS.

Blu3wolf

While I agree TTS is more convenient long term, my understanding that it has some significant shortfalls seems to have been reinforced by the recent posts.

As far as potential budget issues: Intrinsic motivators are a pretty neat thing. If we could in fact get community voices, I suspect you wouldn’t have to worry about budget concerns.

Micro_440th

@BenDean87:

I’ve been playing around with Amazon Polly TTS for another project. Considering how much TTS BMS would need, it would probably be a free option.

It’s not great, but it does OK in some situations. The voice selection is unfortunately somewhat limited, though, and quite deadpan. I don’t know of any better TTS service, although to be honest I didn’t look that hard because Polly worked well enough for my other project.

Here is a little test video I threw together. Wait until the very end for a few samples that I put some “radio” effects on, to see how they sound.

Not bad, but there are definitely better TTS applications available. If I remember correctly you can increase voice speed in Polly too?

In your example the word flow is much too slow. When you increase the speed you will hear a lot of conversion artifacts, and it creates unnatural sentences.
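For anyone who wants to try the speed question: speaking rate is normally requested through SSML, whose `<prosody rate="...">` element accepts keywords such as `slow`, `medium`, and `fast`, and as far as I know Polly accepts SSML input with this tag. Below is a minimal Python sketch that only builds the SSML string. The function name and the sample phrases are made up for illustration, and actually sending the text to a TTS service is deliberately left out.

```python
from xml.sax.saxutils import escape, quoteattr

def ssml_with_rate(text, rate="medium"):
    """Wrap text in SSML that asks the TTS engine for a given speaking rate."""
    # escape() protects characters like & and < inside the spoken text;
    # quoteattr() produces a correctly quoted attribute value.
    return "<speak><prosody rate={}>{}</prosody></speak>".format(
        quoteattr(rate), escape(text)
    )

# Hypothetical ATC-style sample phrases at two different rates.
normal = ssml_with_rate("Cleared to land")
faster = ssml_with_rate("Cleared to land", rate="fast")
```

Rendering the same phrase at several rates and listening to them side by side is a quick way to find the point where the artifacts start.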

I can post a link later about the application I have in mind

Micro_440th

@Blu3wolf:

While I agree TTS is more convenient long term, my understanding that it has some significant shortfalls seems to have been reinforced by the recent posts.

As far as potential budget issues: Intrinsic motivators are a pretty neat thing. If we could in fact get community voices, I suspect you wouldn’t have to worry about budget concerns.

For sure. I love that idea too!!! I’ll PM you later today. Maybe we can team up.

Dee-Jay

My advice: if you guys go ahead with this project, make sure to run some tests first on a limited amount of work, and adjust the process as needed, taking into account the lessons learned from the initial tests …
Get something really satisfying before starting the full job, and keep in mind that it will have to be closely coordinated, because the frags in the future version are already different (new/other words/sentences).

Wheelchock

@BenDean87:

I’ve been playing around with Amazon Polly TTS for another project. Considering how much TTS BMS would need, it would probably be a free option.

It’s not great, but it does OK in some situations. The voice selection is unfortunately somewhat limited, though, and quite deadpan. I don’t know of any better TTS service, although to be honest I didn’t look that hard because Polly worked well enough for my other project.

Here is a little test video I threw together. Wait until the very end for a few samples that I put some “radio” effects on, to see how they sound.

WOW, Polly is really impressive. I say +1 for Polly as well. While it might be less human, it allows for lots of variation, and I’m sure it could only get better from there.

Micro_440th

That is the TTS software I have in mind.

https://speechelo-offer.com/

Here is a good review video about it:

I will send examples later. Licence is bought.

Blu3wolf

@Dee-Jay:

My advice: if you guys go ahead with this project, make sure to run some tests first on a limited amount of work, and adjust the process as needed, taking into account the lessons learned from the initial tests …
Get something really satisfying before starting the full job, and keep in mind that it will have to be closely coordinated, because the frags in the future version are already different (new/other words/sentences).

I am inclined to look into starting a “pilot scale” project on this topic. Just start recording audio for one voice for one particular subject, see what I get right, what I get wrong, and what that suggests I change. Even if none of it is useful, it might prove insightful.

Dee-Jay

@Micro_440th:

That is the TTS software I have in mind.

https://speechelo-offer.com/

Here is a good review video about it:

I will send examples later. Licence is bought.

That one is interesting!!! …
But is it possible to have several different voices (maybe we don’t need 12 … but 6 - 8 would already be good, and maybe we could add more later) and … male/female?

TTS would be optimal … and the good point is that, even if you do most of the job, if it proves efficient, anybody from the team can buy the same software (I would) and be able to add/modify … etc … (ensuring continuity in development).
And accents … this may be a GREAT thing for pilots from other countries (Korean …!! … Japanese!!)

Wheelchock

@Micro_440th:

That is the TTS software I have in mind.

https://speechelo-offer.com/

Here is a good review video about it:

I will send examples later. Licence is bought.

While the sound & speech quality is good, wouldn’t relying on an online or cloud-based generation service be problematic for future development? What happens when you want to expand the library of statements and the cloud service is no longer available? What happens if someone uses BMS mostly offline, so a service like Polly wouldn’t work for them? … Just thoughts … playing devil’s advocate.

Dee-Jay

(Not sure I understand what you mean?)

Nothing will be online. It will be pre-recorded as it is now.

Edit: OKAY, I got it. The software uses an online server to generate the sentences. Mmm … indeed. But … if we think LARGE at the beginning, it will not be worse than it is now.

sasah320

@Wheelchock:

WOW, Polly is really impressive. I say +1 for Polly as well. While it might be less human, it allows for lots of variation, and I’m sure it could only get better from there.

+1

I like this one. I see no problem at all

Micro_440th

Just a quick test.

You have up to 30 voices for English alone. Being server-based is no problem. In fact it’s good, because this way you have a proper backup.
It is also nice because multiple people can work in it, since it’s HTML-based.
