Voice Acting and Text to Speech in Games

When I talk about making video game stories more accessible on this channel, one of the most common aspects I will discuss is subtitles and closed captioning. For deaf and hard of hearing players, subtitles and closed captions can be an invaluable tool for being able to follow along with story information primarily delivered as audio, such as spoken dialogue, or in-game sound effects critical to understanding the story.

However, one aspect I don’t talk about nearly as often is the inverse: making sure that primarily visual storytelling methods, such as on screen dialogue text, are presented to players as audio. I’m not going to touch too much today on the idea of audio descriptions, as I’ve got a full video on this channel on that topic, but I will be focusing today on the importance of reading on screen text to players verbally.

So today, on Access-Ability, we’re going to be talking about the importance of voice acting and text to speech as accessibility support. We’re going to discuss the groups of users that are helped when on screen text is read out loud, how automated systems can help make games quickly more accessible on a budget, and the ways in which dedicated voice acting provides better results for the end user.

While support for opaque backgrounds, custom fonts, and custom text sizes can make text easier to read for blind or partially sighted players, there is a limit to the degree to which they can help. There is only so much room on screen, and for some users it’s simply not going to be enough to make text reliably and easily legible.

There are blind and partially sighted players who, for example, are able to play a game like Ratchet and Clank: Rift Apart thanks to the game’s support for high contrast mode visuals and lock on aim assistance, but might struggle to read menu or UI text. Seeing big blocks of colour that are colour coded can make gameplay accessible, while on screen text remains difficult to read.

Ratchet and Clank: Rift Apart does feature full voice acting, so blind and partially sighted players can follow along with the plot without needing to read subtitles, but if that voice acting support were not there the game’s plot would be more difficult for many to follow.

Much the same can be said for UI and menu access in video games. The Last of Us 2 is another title where high contrast mode and other accessibility features make the game more playable for blind and partially sighted players, and it additionally features audio assistance for players while navigating text based menus. An automated text to speech voice is active by default when you first boot up the game, reading out any menu or in-game text that is either activated in world, or moved over in a menu. The game’s text to speech isn’t perfect, but it decently approximates the right intonation, and makes text based information available to players who may struggle to read it quickly or easily.
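To make the idea of menu narration a little more concrete, here is a minimal sketch in Python of a menu that reads out whichever item the player moves onto. This is not how The Last of Us 2 implements it. The Menu class, the speak callback, and the "1 of 3" position announcement are all assumptions made purely for illustration, and speak stands in for whatever text to speech engine a game actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Menu:
    """Hypothetical menu that narrates the focused item."""
    items: List[str]
    speak: Callable[[str], None]    # assumed hook into a text to speech engine
    narration_enabled: bool = True  # on by default, so it works on first boot
    index: int = 0

    def open(self) -> None:
        # Announce the starting item as soon as the menu appears.
        self._announce()

    def move(self, step: int) -> None:
        # Wrap around at either end of the list, then read the new item.
        self.index = (self.index + step) % len(self.items)
        self._announce()

    def _announce(self) -> None:
        if self.narration_enabled:
            position = f"{self.index + 1} of {len(self.items)}"
            self.speak(f"{self.items[self.index]}, {position}")


if __name__ == "__main__":
    menu = Menu(["New Game", "Options", "Quit"], speak=print)
    menu.open()
    menu.move(1)
```

Having narration switched on by default matters here, because a player who cannot read the menu cannot navigate to the option that turns narration on.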

However, blind and partially sighted players are not the only people who see a benefit when on screen text is delivered additionally as audio.

For many players with ADHD, a combination of audio and text delivery can be easier to follow, and harder to get distracted from, than plain text alone. By engaging more than one sense at a time, the brain is less likely to get distracted looking for external sensory input.

For players with dyslexia, less confident readers, or those for whom a game is presented in a non native language, reading text reliably and quickly can be difficult. While options such as dyslexia friendly alternative fonts can help a little, for all the above groups audio delivery alongside text can help make sure the words are read correctly, and improve the chances of following along with the plot.

Additionally, for users who struggle with non verbal subtext, such as autistic gamers, dedicated voice acting can provide more clues at intended tone and meaning than text alone. Being able to hear if a character sounds sad or angry can help to make the narrative more easily understood.

Now, I am aware that dedicated voice acting performances have a financial cost associated. I know a lot of indie developers in particular will cite cost as a barrier to voice acting inclusion, alongside the fact that in many cases no voice acting is perceived as better than poor quality or cheap voice acting by a wider audience. So, let’s talk a little bit first about the pros and cons of automated text to speech solutions.

Text to Speech automation is a very useful tool for conveying non narrative text to a player as audio, and is certainly better than offering no narrative text to speech support, with some caveats to keep in mind.

Text to Speech automation lacks personality of delivery, which will mean it is a lot less emotive in narrative delivery than a human voice actor might be. Think of this similarly to the difference between subtitles and sign language interpreter support: one conveys the words as written, but the other delivers more meaning along with the words themselves.

Automation also requires oversight. If you simply set up an automated system and leave it unchecked, I guarantee you’ll find places where it wildly mispronounces uncommon or custom words from your narrative. It’s not something to set and forget; you need to properly test its implementation and take the time to ensure everything is correctly delivered as audio to the player.
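One way that kind of oversight might look in practice is a small pronunciation lexicon plus a check that flags any word nobody has reviewed yet. The sketch below is only an illustration under assumptions: the function names, the made-up place names, and the phonetic respellings are all hypothetical, and a real text to speech engine may offer its own way of supplying custom pronunciations.

```python
import re
from typing import Dict, List, Set

# Matches simple words; deliberately basic for the purposes of a sketch.
WORD = re.compile(r"[A-Za-z']+")


def apply_lexicon(text: str, lexicon: Dict[str, str]) -> str:
    """Swap custom words for a respelling before sending text to speech."""
    def swap(match: "re.Match[str]") -> str:
        word = match.group(0)
        return lexicon.get(word.lower(), word)
    return WORD.sub(swap, text)


def unreviewed_words(text: str, dictionary: Set[str],
                     lexicon: Dict[str, str]) -> List[str]:
    """List words that are neither common nor covered by the lexicon."""
    flagged: List[str] = []
    for word in WORD.findall(text):
        key = word.lower()
        if key not in dictionary and key not in lexicon and key not in flagged:
            flagged.append(key)
    return flagged


if __name__ == "__main__":
    # Hypothetical custom word and respelling, invented for this example.
    lexicon = {"zyrkath": "ZEER-kath"}
    common = {"welcome", "to", "home", "of", "the"}
    line = "Welcome to Zyrkath, home of the Oolvani"
    print(apply_lexicon(line, lexicon))
    print(unreviewed_words(line, common, lexicon))
```

Running a check like this over a game's full script gives a list of words for a human to listen to, rather than relying on someone stumbling across a mispronunciation after release.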

That said, this is likely a cheaper option for developers, and as long as it’s properly labelled, it will be available to those who need it while not being mistaken for dedicated voice acting by the average player.

Proper voice acting obviously gives a lot more active information to the player than automated text to speech systems, but it does require you to know all your text in advance. Things like custom player names, or custom game mode names in user creation lobbies will by necessity never be able to be recorded by a dedicated human, and in those cases text to speech support is more likely to be applicable.
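As a rough sketch of how those two approaches can sit side by side, the Python below picks a recorded clip when a line is fixed text with a recording available, and falls back to text to speech when the line contains something like a custom player name. The function name, line IDs, and file path are hypothetical and only there for illustration.

```python
from typing import Dict, Tuple


def resolve_audio(line_id: str, template: str, values: Dict[str, str],
                  recorded: Dict[str, str]) -> Tuple[str, str]:
    """Return ("clip", path) for recorded lines, else ("tts", text)."""
    text = template.format(**values)
    # Lines with player-supplied values cannot have been recorded in advance,
    # so only fixed lines are eligible for a voice acted clip.
    if not values and line_id in recorded:
        return ("clip", recorded[line_id])
    return ("tts", text)


if __name__ == "__main__":
    # Hypothetical table of voice acted lines, keyed by line ID.
    recorded = {"intro_01": "audio/intro_01.ogg"}
    print(resolve_audio("intro_01", "Welcome back.", {}, recorded))
    print(resolve_audio("lobby_join", "{player} joined the lobby.",
                        {"player": "Sam"}, recorded))
```

The design choice here is that text to speech is the fallback rather than the default, so players get the human performance wherever one exists and still hear everything else.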

While most AAA game developers are moving towards having most of their video games released with full voice acting, and increasing numbers of developers are supporting text to speech menu narration on a per game basis, one publisher and console manufacturer falling behind in this regard is Nintendo. The vast majority of their first party franchises still contain little to no voice acting, and they’re the only console manufacturer of the main three to have not yet implemented any form of system level text to speech menu support. While I see a lot of players calling for voice acting to come to major series for personal and stylistic reasons, accessibility is a major area where Nintendo is falling behind with regards to their insistence on only delivering stories as on screen text.

I know that not every game stylistically wants to support voice acting, and not every developer has the budget to include it, but voice acting and text to speech support appreciably make video game text more accessible to disabled players, and we should really be pushing for its industry standardisation the same way we do for subtitles and closed captions.

Text isn’t always easy to read and focus on. At the very least, in a future where text to speech automation solutions are common, we should be making sure that as much text as possible in games is able to be presented as audio, even if that audio is not delivered by a human.