I’m not a BMS developer, but I’ve recently been in touch with @Tomcattwo, @Boxer, @Micro_440th and @Mav-jp about implementing voice synthesis. Additionally, to further improve the AI voices, it might be an option to route them through the same sound processing that voice chat currently gets. But that is lower priority.
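To make the idea concrete, here is a purely illustrative sketch of the kind of "radio voice" processing I mean: band-limit the audio to a narrow voice band and add mild distortion. I don't know what IVC actually does internally, so the filter choice (300–3400 Hz) and the soft clipping here are my assumptions, not BMS code.

```python
import numpy as np

def radio_effect(samples: np.ndarray, rate: int = 44100) -> np.ndarray:
    """Crude 'radio voice' effect: band-limit to ~300-3400 Hz, then soft-clip.
    Purely illustrative -- NOT what IVC actually does."""
    spectrum = np.fft.rfft(samples)
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    spectrum[(freqs < 300) | (freqs > 3400)] = 0.0  # hard band-pass
    band_limited = np.fft.irfft(spectrum, n=len(samples))
    return np.tanh(3.0 * band_limited)  # soft clipping adds mild distortion

# Example: a 100 Hz hum falls outside the pass band and is almost removed.
rate = 44100
t = np.arange(rate) / rate
hum = np.sin(2 * np.pi * 100 * t)
processed = radio_effect(hum, rate)
```

The point is just that this post-processing is cheap and independent of how the voice itself was produced, so recorded frags and synthesized ones could share it.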
After discussing it with @Boxer, we split the project into phases:
1. Generate convincing new frags offline using text-to-speech (TTS). This way we could expand the possibilities of AI communication without any new recordings or code changes.
2. If phase 1 results in a usable voice model, use it to evaluate the computation needed for realtime TTS. Hopefully this will give us enough information to determine how TTS might affect the sim and rendering processing. If performance is affected too much, realtime TTS might not be a good fit for BMS.

In parallel to all this, we might still consider routing the AI voices through the signal processing in IVC.

I’m lucky enough to have access to a GPU cluster at work. Unfortunately, the GPUs aren’t always available, because researchers tend to use them for actual science. 😉 But in between their jobs I’ve been training a multispeaker voice model based on the existing BMS frags. I think the result can still be improved, but I’ll try to share some progress soon. The good news is that if it works, it will immediately work for every existing voice in BMS.
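For the realtime evaluation in phase 2, the usual metric is the realtime factor (RTF): synthesis time divided by the duration of the audio produced, where RTF < 1.0 means the engine keeps up with playback. A minimal measurement harness could look like the sketch below; the `fake_tts` stand-in is hypothetical and would be replaced by whatever TTS engine we end up testing.

```python
import time

def realtime_factor(synthesize, text: str, sample_rate: int = 22050) -> float:
    """Measure synthesis time relative to the duration of the produced audio.
    RTF < 1.0 means the engine can keep up with realtime playback."""
    start = time.perf_counter()
    samples = synthesize(text)  # expected to return a sequence of audio samples
    elapsed = time.perf_counter() - start
    audio_seconds = len(samples) / sample_rate
    return elapsed / audio_seconds

# Hypothetical stand-in "engine": instantly emits one second of silence per word.
def fake_tts(text: str):
    return [0.0] * (22050 * len(text.split()))

rtf = realtime_factor(fake_tts, "two ship check in")
print(f"RTF: {rtf:.4f}")
```

Measuring RTF on the target hardware, alongside the sim's frame times, should tell us whether realtime synthesis can run in the background without hurting rendering.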