Cloned Falcon 4 Voices - Add Voice Frags In the ORIGINAL Voices (Long)
-
All,
After quite a bit of work, I have found a free public tool to clone the Falcon 4 voices (12 pilot voices, 2 ATC voices, 3 LSO voices). It is called “Real Time Voice Cloning Tool” (RTVCT), also known as SV2TTS (“Speaker Verification to Text-To-Speech”), and is available here: CorentinJ/Real Time Voice Cloning Tool by Corentin Jemine.
This tool runs in Python and uses PyTorch for its AI training. While it doesn’t produce a perfect clone, it is certainly close enough for what we might want to do with it in Falcon 4 BMS.
I have found that, after training the synthesizer and vocoder on the voice .wavs extracted from Falcon 4 for each voice, RTVCT can reproduce those voices: you type what you want them to say into the tool (it’s a GUI-based tool), and it saves the output as a .wav file (16-bit PCM mono, 16 kHz sampling rate). It may take a couple of tries and a bit of “phonetic spelling” to get a good output .wav, but once you get it, the sound is easily close enough to be used in game with very little loss of immersion.
You still need to resample the output .wav from 16 kHz down to 8 kHz, which is what Falcon uses, but any number of tools (Audacity, Goldwave, etc.) will do that for you quickly and easily. You can then use Falcon 4 TlkTool to insert the new .wavs into Falcon.tlk, and update the F4Talk95v1-0-0.csv and TTSEval.dat files to get proper subtitle text, airbase names, etc.
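For anyone who would rather script the 16 kHz to 8 kHz step than click through an audio editor, here is a minimal sketch using only the Python standard library. It assumes the input is 16-bit mono PCM at 16 kHz, and it halves the rate by averaging adjacent sample pairs, which is a crude stand-in for the proper anti-alias filtering that Audacity or Goldwave apply. The function names are made up for illustration.

```python
import array
import sys
import wave


def halve_sample_rate(samples):
    """Average each adjacent pair of 16-bit samples (crude low-pass, then decimate by 2)."""
    out = array.array("h")
    for i in range(0, len(samples) - 1, 2):
        out.append((samples[i] + samples[i + 1]) // 2)
    return out


def resample_16k_to_8k(src_path, dst_path):
    """Read a 16-bit mono 16 kHz .wav and write an 8 kHz version of it."""
    with wave.open(src_path, "rb") as src:
        fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
        if fmt != (1, 2, 16000):
            raise ValueError(f"expected 16-bit mono PCM at 16 kHz, got {fmt}")
        samples = array.array("h")
        samples.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # .wav sample data is little-endian
    halved = halve_sample_rate(samples)
    if sys.byteorder == "big":
        halved.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(8000)
        dst.writeframes(halved.tobytes())
```

Calling `resample_16k_to_8k("clone_16k.wav", "clone_8k.wav")` (hypothetical file names) would write the 8 kHz file. If the result sounds harsher than what your audio editor produces, use the editor's resampler instead.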
Here are some examples:
Falcon 4 BMS Female Air Traffic Controller (Voice 12F):
Glenda1.wav
Glenda Ground Truth Falcon 4 .wav file
Falcon 4 BMS Male Air Traffic Controller (Voice 13M):
Don1.wav
Don Ground Truth Falcon 4 .wav file
These are not “spliced” words - I typed them into the SV2TTS (Real Time Voice Cloning) Toolbox (sometimes with a little phonetic spelling to help the vocoder), and this is the output file that the tool produced. It sounds pretty close to the real Falcon voice set to me, and works well in game.
Samples of all 14 primary Falcon Voice Clones and Ground Truth .wavs are available here: Falcon 4 Voice Files
I have already used this method to clone all Falcon 4 voices, and then renamed and “re-voiced” the airbase names for all Coalition, Kuwaiti, Saudi, Bahraini, and Emirates airbases and airstrips (generic for the airstrips) in SpbGoro’s outstanding Mideast 128 Theater. All necessary sound files are included in SpbGoro’s latest updates (v7 and up). Independently, I have also changed over 30 callsigns in my installation of Mideast128 Theater, but I have not published those yet (they will become available with a very large TE project I am working on for the Theater). I’ll publish some “how to’s” for airbase renaming and callsign renaming in new threads in the Sounds forum at a later time.
See this VERY useful thread by MeuMestre on the methodology used to change a callsign, for example. Another very useful tool for this work is Khronik’s outstanding TlkTool GUI.
The SV2TTS RTVCT isn’t necessarily easy to get installed and running, and runs best when you have an Nvidia GPU where it can use the CUDA cores to do the necessary computations (CPU can be used but it is slower). Instructions on how I got RTVCT installed and running on my Win10 PC with an NVIDIA RTX2070 Super GPU are available here: SV2TTC Installation and Operation for F4 Voice Cloning .pdf. You can pull me up in chat (@Tomcattwo) if you have issues with the installation.
Warning: this is not for the faint of heart, but it IS do-able. Also, you use this installation procedure AT YOUR OWN RISK!
Note: I do not have or know LINUX, Ubuntu or Mac operating systems, so I can be of no help on those if that is where you intend to try to do your installs. Sufficient information is available in the Github to get you going on those systems.
Also, it takes quite a bit of disk space (roughly 20 GB or more, without any additional speech libraries beyond what you need to clone Falcon 4 voices) for the files and speech libraries necessary to do this work, so make sure you have enough space!
It takes a fair amount of work to “train” the synthesizer to recognize a specific voice pattern, but I have already done this work, and the necessary synthesizer and vocoder files for all Falcon 4 BMS voices for use in RTVCT are available here. There are three files you will need: F4 synthesizer pretrained files.zip, F4 vocoder pretrained files.zip, and F4 Speaker Files.zip.
Instructions on where to put these are included in the Installation and Operating Procedure .pdf
Caution – these are BIG files (6.1 GB for the synthesizer pretrained files, zipped; another 0.93 GB for the vocoder pretrained files, zipped; and another 0.7 GB for the F4 Falcon Voice files, zipped).
I really hope some of you may want to try this to update other theaters with appropriate voice updates etc. Feel free to contact me if you have questions on this.
Regards,
Tomcattwo -
Amazing! I hope I will be able to use your tool properly.
It would open a new era for comms, and it means I would have to work a lot on ATC frags again!!
-
@mav-jp
Mav, let me know if you need any help. I’d really like to learn more about how the eval portion of the code works. I can see some immense opportunities. Right now, all I can do is “replace” things. But the whole world opens up if we can change how the code evaluates in-game events.
Regards,
TC2 -
@Micro_440th To put in your watch list
-
Fantastic result mate! I played a lot with it but couldn’t get any results close to yours.
I will check out your settings and voices tomorrow.
Good work! -
@tomcattwo said in Cloned Falcon 4 Voices - Add Voice Frags In the ORIGINAL Voices (Long):
@mav-jp
Mav, let me know if you need any help. I’d really like to learn more about how the eval portion of the code works. I can see some immense opportunities. Right now, all I can do is “replace” things. But the whole world opens up if we can change how the code evaluates in-game events.
Regards,
TC2
I change the code every day to create new comms, but I have always been limited by existing frags.
I had to tweak them, assemble them, and sometimes I couldn’t create what I wanted.
If the tool works well, the first job is to replace the existing bad phraseology with more realistic ones.
Then we could create better comms, as some parts of the game really need it. -
This is awesome stuff!
-
This is f’ing awesome. Time to employ an entire server farm to train all F4 comm permutations…unlimited flexibility for the future of BMS.
Better than MSFS and Microsoft Azure
But for real, this is a great idea and your samples sound amazing.
-
The male voice is not good enough, I think. It will not match if we mix old frags with this.
-
@mav-jp what if we use it to replace everything? Instead of patching missing pieces?
-
@jstnj @seifer @MaxWaldorf
Appreciate the kind comments. Been working on this since August.
When I first installed SV2TTS/RTVCT in August of this year, based on some early testing, I didn’t think it would sound good enough to use, but I was very pleasantly surprised when I started training the synthesizer for “Glenda” V12F ATC - she sounded really convincing right away. Training the vocoder helped even more.
There are some drawbacks… the voices do sound mechanical to a certain extent. Let’s face it, they aren’t voice actors, they’re clones.
That said, they sound pretty good in Mideast128 in the sim.
Some have asked if SV2TTS/RTVCT can do accents. The answer is yes, it will pick up accents. In some other applications discussed in the GitHub repository, folks have done Mandarin Chinese output (a HUGE corpus of speech utterances had to be used for this effort), and I know some are working on Spanish, Russian, Polish and many other languages. There is an issue discussion pinned in the GitHub repository which discusses this work, ongoing since 2019. Those take a massive amount of work.
Can SV2TTS/RTVCT do emotions? To a very limited extent, and not controllably. The toolbox doesn’t recognize punctuation. It partly depends on what kind of voice emotions got trained into the synthesizer and vocoder, but which emotion, if any, the vocoder output will carry is random: mostly there is not much emotion, and it is unpredictable. You cannot “force” the tool to form a question by just adding punctuation (i.e., it doesn’t recognize question marks). There is some work ongoing on GitHub in this direction (recognizing punctuation).
While there are some things that can help improve the output, the quality of the output remains heavily dependent on the quality of the input utterances used to train the synthesizer and the vocoder, and on the underlying pretrained models that you build your single-voice training upon. We are fortunate to have enough input from the 32,699 Falcon .wav files from original Falcon 4.0, but we are limited by their quality (though the quality is still good enough to clone reasonably well). I am investigating using some other underlying voice models to single-train our voices on, to see if I can improve the quality of the output, control punctuation, etc., but that work will take a while.
The tool’s author made this tool for his master’s thesis, and got hired immediately by Resemble.ai, where he has produced a much more sophisticated version that is, of course, not free. They require anyone bringing a voice cloning task to them to have legal release from the owner of the voice, which we could not do (these Falcon voice actors did this over 20 years ago with original Falcon 4.0, and I am sure no one knows who or where they are now). But as smart people continue to work with this and apply better AI to the process, things could improve in terms of how recognizable and reproducible the cloned voice output is.
Can we train new Falcon 4 (pilot, ATC, or LSO) voices? Answer: yes. The toolbox will train ANY voice for which there are enough inputs of decent quality, but its final output is heavily dependent on the quality of the input used to train the synthesizer and vocoder. To clone a new voice with really good recognition requires hundreds to thousands of utterances (an utterance is any stretch of speech between about 2 and 12 seconds in duration). This is done using a technique called “Single Voice Training”, which starts with an original pretrained synthesizer and vocoder trained using a hundred different voices and many thousands of utterances (available from Corentin’s GitHub), and builds directly on that by repeating the single voice you want to train multiple times. There are files in the toolbox repository to do that.
Once that is done for the new voice, you then have to generate a cloned .wav for every TalkID needed for a Falcon pilot (~2500+ individual .wav files per voice) or ATC (over 3000 per voice). That could take a while, but it could be done. So, conceivably, with time, we could have pilots that speak different languages altogether, or ATCs who speak English but with a distinct foreign accent. You just need a lot of utterances from the target voice and a lot of time to generate the cloned .wav files (each one takes about 5-10 seconds of real time to vocode, depending on the length of the utterance). Utterances can even be taken from audiobooks, which provide proper-length utterances as a sort of “pre-prepared” reading by voice actors. Again, the quality of the output depends heavily on the quality of the input. And we would also have to “dirty up” any cloned output of new voices to make them sound like radio transmissions in game. That was one of the really nice things about cloning our current Falcon voices - they already sound like radio transmissions, so the cloned output does as well.
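To put "that could take a while" into numbers, here is a quick back-of-the-envelope calculation using the rough figures above (about 2500 files per pilot voice, about 3000 per ATC voice, 5 to 10 seconds of vocoding each). It counts pure vocoding time only and assumes every file comes out right on the first try, which it usually won't.

```python
def vocode_hours(n_files, secs_per_file):
    """Hours of pure vocoding time for n_files clips at secs_per_file each."""
    return n_files * secs_per_file / 3600


# File counts and per-file times are the rough figures quoted above.
for role, n_files in (("pilot", 2500), ("ATC", 3000)):
    low, high = vocode_hours(n_files, 5), vocode_hours(n_files, 10)
    print(f"{role} voice: about {low:.1f} to {high:.1f} hours")
```

So a single new pilot voice is on the order of 3.5 to 7 hours of machine time before any retries, typing, or phonetic respelling.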
The 14 Falcon voices took me about 8 hours each to prepare the utterances from the 32,699 available .wavs in falcon.tlk. I built a huge Excel spreadsheet with the TalkID of every utterance for each of the 14 voices, separated by voice. I also built tools in Python and Excel VB and used some DOS command line stuff to automate that effort. Many of the Falcon 4 voice frags are only a second long or less, so to train each voice I had to combine every 5 short utterances into a single longer utterance using the automation tools I built, and fed that to the AI training process. So for each of the 14 basic Falcon voices, I had between 300 and 450 utterances of the right length to feed the training.
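The "combine every 5 short utterances" step could look something like the sketch below. This is not the actual tooling described above (which is not shown in this thread), just one way to do it with the Python standard library, assuming all input .wavs share the same channel count, sample width and sample rate. The function names are invented for illustration.

```python
import wave


def group_paths(paths, size=5):
    """Split a list of .wav paths into consecutive groups of `size` (last group may be shorter)."""
    return [paths[i:i + size] for i in range(0, len(paths), size)]


def concat_wavs(paths, dst_path):
    """Join several same-format PCM .wav files end to end into dst_path."""
    if not paths:
        raise ValueError("no input files")
    fmt = None
    chunks = []
    for path in paths:
        with wave.open(path, "rb") as w:
            this_fmt = (w.getnchannels(), w.getsampwidth(), w.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path} has format {this_fmt}, expected {fmt}")
            chunks.append(w.readframes(w.getnframes()))
    with wave.open(dst_path, "wb") as out:
        out.setnchannels(fmt[0])
        out.setsampwidth(fmt[1])
        out.setframerate(fmt[2])
        out.writeframes(b"".join(chunks))
```

You would feed it one voice at a time, e.g. loop over `group_paths(sorted_voice12_paths)` and write each group to its own training utterance. The matching transcript for each combined file still has to be assembled separately.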
Then the actual single voice training for the synthesizer and vocoder for each voice took another 4-6 hours on average. But all 14 are done and this only needs to be done once.
Finally, I had to generate the necessary voice frags. To change a callsign requires 30 separate cloned .wav files to be generated. To change a single airbase name requires 6 cloned .wav files. For 45 airbases, that’s 270 separate files to generate. I did that for Mideast128 theater and the results sound really good to me.
Conversely, for the LSO voices (which are part of the V12F and V13M voice sets), there were only about 40 unique utterances. That is why the LSO clones don’t come close to the quality of the original 14 Falcon voices. Poor input results in poor cloned output.
More to follow.
Regards,
TC2 -
@mav-jp
Check out how Male ATC sounds in situ (i.e., in Mideast128 theater). I did some upgrades to the vocoder after doing some of the early .wav samples.
R/
TC2 -
@seifer said in Cloned Falcon 4 Voices - Add Voice Frags In the ORIGINAL Voices (Long):
@mav-jp what if we use it to replace everything? Instead of patching missing pieces?
The whole point of the proposal here is to mimic the original voices to be able to patch them.
If you want to generate 33,000 new frags in 14 voices, then we don’t need this tool, but we need a good generator and, of course, somebody ready to do it with proper balance, rhythm, cuts and so on. That would be a huge task. This was done by a professional studio in the past.
On top of that, real actor voices are really good. I think the concept of mimicking and patching is awesome.
-
@mav-jp said in Cloned Falcon 4 Voices - Add Voice Frags In the ORIGINAL Voices (Long):
@tomcattwo said in Cloned Falcon 4 Voices - Add Voice Frags In the ORIGINAL Voices (Long):
@mav-jp
Mav, let me know if you need any help. I’d really like to learn more about how the eval portion of the code works. I can see some immense opportunities. Right now, all I can do is “replace” things. But the whole world opens up if we can change how the code evaluates in-game events.
Regards,
TC2
I change the code every day to create new comms, but I have always been limited by existing frags.
I had to tweak them, assemble them, and sometimes I couldn’t create what I wanted.
If the tool works well, the first job is to replace the existing bad phraseology with more realistic ones.
Then we could create better comms, as some parts of the game really need it.
@Mav-jp, yes, I already cloned a couple of frags for Mideast128 where the splices did not sound very good (examples: “platform” and “furball”) and fixed frags “2 6” through “6 9” for all voices (using splices). If there are other frags you’d like me to clone in all voices, send me a list and I’ll do them and send you the .wav files.
R/
TC2 -
Really interested to see how this progresses…I kinda want to install this locally and mess around with it
-
@jstnj ,
Jump on in! First, I recommend you download Mideast128 Theater and fly around some at different airbases/airstrips to hear what it sounds like in the game - I think you’ll be pleasantly surprised. From my perspective, the clones are certainly good enough that there is no loss of immersion.
I’ve provided pretty much everything you need in terms of instructions, download links, and tools needed, so give it a go!
I’d be happy to help out @Mav-jp whenever he is ready to work on improving the sound for the next iterations of Falcon BMS. Having more folks who know how to use the tool can only help. PM me if you have questions.
I am contemplating whether or not I want to try to train the synthesizers using a different corpus of readings (LibriTTS) to see if I can get the synthesizer to produce better recognition with some degree of the ability to recognize punctuation. That is about a month-long effort (because I’d have to train the new “baseline” synthesizer to ~300,000 steps from scratch - that alone would take a week, then single-voice train all 14 Falcon voices again), with no guarantee it will be any better at all. But I can’t tackle that until I finish at least the first half of my TE project - it will take me another 2-4 weeks for that.
Regards,
TC2 -
-
@Tomcattwo
Hi Tomcat.
Need some help.
I can extract .wav files from “Falcon.tlk” using “TlkTool”.
But I need to know where I can find the text that corresponds to each .wav. For example, “100.wav” says “May day may day…”, and I need to find the text for a specific .wav so that I can modify and test only certain .wavs, not all of them.
Example: I want to change a .wav that says “Turning right…”, and I don’t know which .wav number it is. It’s crazy to listen to the .wavs one by one until I reach the one I need; there are thousands of them. I need to know where the files related to “Falcon.tlk” are, the ones that link an ID such as “100.wav” to “100: Turn right…” in another file, and what program is needed to read them if they aren’t plain text files.
Thanks in advance for any information. Nobody has been able to give me this information over the years, so that I can mod some things in Falcon 4 BMS, mainly to change voices for others that I like more and can understand better.
My objective is to replace some .wavs in “Falcon.tlk” with others that I can understand better; it’s for my own purposes.
Note: When I recompress any .wav file into “falcon.tlk” with TlkTool, it succeeds, but the .wav file comes out bad inside the “.tlk”, and I can’t keep recompressing all the thousands of files in “falcon.tlk” for every test I run. Does anyone know the author of the TlkTool program, or know what to do to get recompression of a single .wav file to work? Thanks in advance.
Another note: I have since found the files related to the sound, with the list of linked IDs. The only question, finally, is how to recompress a single .wav file to “falcon.tlk” without these errors in TlkTool. Thanks. -
Hello @cchaparro , congratulations (I think) on starting down the road to understanding Falcon BMS’s voice communications! First let’s start with a lesson on what goes into making BMS’s voice reply when it is supposed to do so.
BMS voice responses consist of one or more voice fragments (called “frags” for short) that BMS has to chain together to provide a single response. There are over 3000 frags used in BMS. Frags are listed by their FragID number in a file called fragFile.xml, which is located in the …Falcon BMS 4.37\data\Sounds folder (the same folder that contains Falcon.tlk, and all the other sound files we will consider in this lesson).
For example, when the pilot provides an input that prompts BMS to provide a voice response (for example, if our pilot, callsign “Mustang Four One”, inputs “ttttt1”, which requests the current atmospheric pressure, QNH), BMS first has to determine whether the request was sent on the correct frequency (in this case the approach frequency). Then it has to parse the input to identify that “the pilot has just requested that QNH information be sent on this frequency”. It then has to evaluate how to put the necessary frags together, using the correct VoiceIDs (those are the numerical filenames of the correct .wav files in Falcon.tlk; each .wav has its own unique VoiceID, which is the numerical part of its filename, and there are 40,000 or more unique VoiceIDs in the Falcon.tlk file).
BMS uses the CommFile.xml and evalFile.xml files and internal hardcoding to determine which frags it needs, and their order, to formulate the response. In our example, it needs to reply: “Mustang Four, Kunsan Approach, QNH is 2984.”
So, it needs:
- a frag for the callsign (“Mustang”)
- one for the flight number (“Four”)
- a frag for the airbase (“Kunsan”)
- a frag for the Agency (“Approach”)
- a frag for the response (“QNH is”)
- and then four frags for the numbers (“two”, “nine”, “eight”, “four”).
Nine frags just for this simple reply!
BMS also has to identify which voice is to be used for the reply. There are fourteen voices in Falcon BMS: voices 0 through 11 are pilot and AWACS/FAC voices. Voice 12 is the female ATC, and Voice 13 is the male ATC.
So every frag must have at least 2, but no more than 14, VoiceIDs associated with it - those are the .wav files. In our example, BMS will need to provide either a female ATC (Voice 12) or a male ATC (Voice 13) VoiceID. It picks one of the two at random to use for the reply. Let’s say it picks the female ATC (Voice 12). So for each of the 9 frags needed for our reply, it needs to select the Voice 12 VoiceID for each frag. How does it know which VoiceID to pick for each frag? The answer is in the magic of the fragFile.xml file, which identifies which VoiceIDs belong to each and every frag.
So now it knows what frags it needs for the response, and which VoiceIDs (.wav files) it needs to play. How does it know what to print on the screen for the reply? The answer is that it can find what to print in the F4Talk.csv file. F4Talk.csv is a “comma-separated values” file that can be opened and viewed as text using any text viewer such as Notepad, Notepad++, or TextPad (my personal favorite), or as a spreadsheet (for example in Excel). F4Talk.csv lists every frag by its FragID and what is to be printed for each of the 14 voices, separated by commas. This file can be edited using any text editor, so you can change what you want to see printed. Note: F4Talk.csv only determines what gets printed on the screen. It does not directly affect the audio response information.
Since Falcon BMS knows the nine FragIDs and the associated VoiceIDs (and voice number - Voice 12 in our example), it knows what to print, because F4Talk.csv identifies exactly what to print for each frag and voice. BMS then generates the response by playing each of the 9 identified VoiceIDs, in order, and prints the line identified by the FragIDs/voice positions from F4Talk.csv on the screen!
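To make the frag chaining concrete, here is a toy model in Python of the lookup described above. Everything in it is made up for illustration: the FragID and VoiceID numbers are invented, the two dictionaries merely stand in for what fragFile.xml and F4Talk.csv provide, and the real game uses per-voice text and its own internal logic.

```python
import random

# Hypothetical data: the nine frags of "Mustang Four, Kunsan Approach, QNH is 2984".
WORDS = ["Mustang", "Four", "Kunsan", "Approach", "QNH is", "two", "nine", "eight", "four"]

# Stand-in for F4Talk.csv: FragID -> text to print (one text for all voices here).
FRAG_TEXT = {1000 + i: word for i, word in enumerate(WORDS)}

# Stand-in for fragFile.xml: FragID -> {voice number -> VoiceID (.wav number)}.
FRAG_VOICE_IDS = {fid: {12: fid * 10 + 2, 13: fid * 10 + 3} for fid in FRAG_TEXT}


def build_response(frag_ids, voice):
    """Return the VoiceIDs to play, in order, and the subtitle text to print."""
    voice_ids = [FRAG_VOICE_IDS[fid][voice] for fid in frag_ids]
    subtitle = " ".join(FRAG_TEXT[fid] for fid in frag_ids)
    return voice_ids, subtitle


voice = random.choice([12, 13])  # female or male ATC, picked at random
wavs, text = build_response(list(FRAG_TEXT), voice)
print(f"Voice {voice} plays {wavs} and prints: {text}")
```

The point is only that one reply is a list of FragIDs, and each FragID resolves to a different .wav depending on which voice was chosen.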
This is part one of my response. I will try to answer more of your questions in my next response below.
Regards,
Tomcattwo
(VoiceClone) -
@cchaparro
You asked: "The only question, finally, is how to recompress a single .wav file to “falcon.tlk” without these errors in TlkTool."
TLDR (too long, didn’t read): No, I have not yet found a way to recompress a single (or multiple) .wav file(s) into Falcon.tlk without errors. The only way I’ve found that works is compressing every .wav file into a new Falcon.tlk file.
The long version:
I have not yet found a way to correctly add or delete a single (or multiple) .wav file(s) in Falcon.tlk. TlkTool only seems to work correctly if you create an entirely new Falcon.tlk file using every VoiceID .wav file needed. This is unfortunate. What I have had to do is create every new or updated/corrected .wav file first, and then, when I am ready to make a new Falcon.tlk file, I start with the entire set of 40,000 decompressed .wav files, replace the ones I want to fix/add with the files I made, then compress a new Falcon.tlk using TlkTool.
Important note: .wav files used in Falcon.tlk MUST be 16-bit mono PCM files with an 8 kHz sampling rate. Any other type will not work correctly in Falcon BMS!!
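Since a single wrongly formatted file can spoil a rebuild, it may be worth checking new .wavs before compressing. Below is a minimal sketch of such a check using Python's standard `wave` module. The function name is invented, and it only checks the three properties named above (16-bit, mono, 8 kHz).

```python
import wave


def wav_problems(path):
    """Return a list of reasons why `path` is not 16-bit mono PCM at 8 kHz (empty if it is fine)."""
    try:
        with wave.open(path, "rb") as w:
            channels = w.getnchannels()
            width = w.getsampwidth()
            rate = w.getframerate()
    except (wave.Error, EOFError) as exc:
        # The wave module only reads uncompressed PCM .wav files.
        return [f"not a readable PCM .wav: {exc}"]
    problems = []
    if channels != 1:
        problems.append(f"{channels} channels, expected mono")
    if width != 2:
        problems.append(f"{8 * width}-bit samples, expected 16-bit")
    if rate != 8000:
        problems.append(f"{rate} Hz, expected 8000 Hz")
    return problems
```

Running this over every new VoiceID .wav and fixing anything that returns a non-empty list is a lot cheaper than finding out in game after a full recompress.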
Another really useful tool for the type of work you are doing is TLKTOOLGUI by Khronik. TlkToolGui uses four files to create a TlkToolGui “project”:
- falcon.tlk
- fragFile.bin
- F4Talk.csv - NOTE: You must rename “F4Talk.csv” to “F4Talk_1_0_0.csv” or TlkToolGui will not process it correctly
- EvalFile.bin
You can make EvalFile.bin and fragFile.bin from the .xml files using TlkTool.
Once the project is created, you can view and hear every single VoiceID (Falcon.tlk .wav file), arranged by Voice number (Voice 0 through Voice 13) and listed by FragID, in this easy-to-read graphical user interface. This makes it very easy to find and identify individual .wav files and FragIDs. So, if you want to find the .wav files for “Mustang”, for example, open the project in TlkToolGui, go to any voice, search down the list for “Mustang” and you’ll find the FragID. Look at that FragID in fragFile.xml, and it will list all the VoiceIDs for that frag.
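The same kind of text search can also be done directly against F4Talk.csv with a few lines of Python. This is only a sketch under an assumption about the file layout: that the first column of each row is the FragID and the remaining columns are the printed text for each voice. Check your own copy of the file before relying on it. The function name is invented.

```python
import csv


def find_frags(csv_path, phrase):
    """Return (FragID, column index after the FragID, text) for every cell containing `phrase`."""
    needle = phrase.lower()
    hits = []
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as f:
        for row in csv.reader(f):
            # Assumed layout: FragID first, then one text column per voice.
            if not row or not row[0].strip().isdigit():
                continue
            for col, text in enumerate(row[1:]):
                if needle in text.lower():
                    hits.append((int(row[0]), col, text))
    return hits
```

For example, `find_frags("F4Talk.csv", "turning right")` would list every FragID whose printed text contains that phrase. From there, fragFile.xml gives the VoiceIDs (the .wav numbers) for that frag.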
I have not tried to use TlkToolGui to update/correct/fix/compress or decompress Falcon.tlk files. It uses TlkTool to do that work, so it also cannot insert a single (or multiple) .wav files into a correctly functioning Falcon.tlk file. I don’t think @Khronik or @lightning have been on the forum for a long time, so I’ve no idea if TlkTool can be fixed to allow proper insertion of a single (or multiple) .wav files into Falcon.tlk.
My process:
- I start with the latest BMS version’s KTO sound files. I use TlkTool to decompress the base KTO Falcon.tlk file’s .wav files.
- I determine exactly which VoiceIDs I need to change (I use huge spreadsheets to keep track of which FragIDs and VoiceIDs I need to change), and create the new .wav files. I use Corentin’s SV2TTS Voice Cloning Toolbox to create new cloned .wav files, as I described in detail above, so that the voices match the original Falcon voices. The .wavs are converted to 16-bit mono PCM with an 8 kHz sampling rate (I use Goldwave) and named using their corresponding VoiceID (VoiceID.wav).
- I update the fragFile.xml and F4Talk.csv files as needed (you only need to correct fragFile.xml if you are adding new frags, for example brand new airbase names; airbase name FragIDs are called directly in the stations+ils.dat file in the …Falcon BMS 4.37\data\campaign folder, but the frags and VoiceID/print info need to be put into fragFile.xml and F4Talk.csv).
- I paste my new .wav files into the decompressed “base” .wav files, replacing when prompted.
- I compress the entire .wav set into a new Falcon.tlk file.
- I create a new project in TlkToolGui using the new Falcon.tlk, F4Talk.csv, fragFile.bin and evalFile.bin files, and pray that it works correctly. If it does (HUZZAH!!) then I go test my new .wav files to see how they sound.
- Then I make sure I have archived the old base soundfiles safely, copy the new soundfiles (Falcon.tlk, F4Talk.csv, fragFile.xml) into …Falcon BMS 4.37\data\Sounds, crank up Falcon, and test the new sounds in game.
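The "paste my new .wav files into the decompressed base set" step is easy to script, which also gives you a record of what was replaced versus newly added. Here is a minimal sketch in Python, assuming both folders hold files named VoiceID.wav. The folder names and the function name are placeholders, and the decompress/compress steps themselves still have to be done with TlkTool.

```python
import shutil
from pathlib import Path


def overlay_wavs(new_dir, base_dir):
    """Copy every <VoiceID>.wav from new_dir into base_dir; return (replaced, added) file names."""
    replaced, added = [], []
    for src in sorted(Path(new_dir).glob("*.wav")):
        if not src.stem.isdigit():
            continue  # skip anything not named with a numeric VoiceID
        dst = Path(base_dir) / src.name
        (replaced if dst.exists() else added).append(src.name)
        shutil.copy2(src, dst)
    return replaced, added
```

A call like `overlay_wavs("my_new_wavs", "kto_base_wavs")` (hypothetical folder names) returns the two lists, which you can compare against your spreadsheet of intended changes before compressing the new Falcon.tlk. Work on a copy of the decompressed base set so the originals stay archived.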
It’s a long, complicated process, but it does work if you do it the right way. It takes a LOT of work. Adding new airbase names or changing callsigns requires additional changes to TTSEval.xml and to stations+ils.dat files and (for new callsigns) changes to the .strngs files using Mission Commander. Each new callsign requires 30 new cloned .wav files. Each new airbase name requires 6 new .wav files.
Please feel free to PM me or ask additional questions here.
Regards,
Tomcattwo
(VoiceClone) -
@Tomcattwo
WOW. Don’t know what more to say than… WOW.
This was a unique and very impressive tutorial, guide, and master class on how to use the right tools with the right knowledge to do the right job.
What I’m trying to do, compared with the amazing job that you are doing, is … a grain of sand in the desert.
For now I’ll stop these attempts to recompress individual .wav files until I investigate further, or hope that TlkTool or something similar will allow this recompression in the near future.
Replacing some .wav files and recreating the entire “falcon.tlk” file from the thousands of previously uncompressed .wav files, just to verify that they work, involves hundreds and hundreds of disk read and write operations and a lot of waiting time between tests. At least there is a way to do it, which is very positive.
And users like you are to be thanked enormously for sharing this information, which, as I say, I have been trying to obtain for years without positive results.
Thanks for all of this… WOW… really impressive. No words to say. Thanks, thanks and… thanks.