Today Bee Audio’s new CEO Roy Forbes sent a disturbing email to all members on their production roster. He gleefully wrote about working with two different international companies “who share our vision and enthusiasm” to help develop AI Text-to-Speech (TTS) for audiobooks.
I say he wrote “gleefully” because, as a professional audiobook narrator, I am skilled at analyzing an author’s printed words and interpreting the subtext, the underlying meaning.
One doesn’t have to be a professional narrator, though, to understand that such an email is completely tone-deaf and insulting to the workforce. We do not hold any “vision and enthusiasm” for technology meant to replace us!
Artificial intelligence can’t detect the subtext in a sentence, much less over the trajectory of an entire book. I can’t believe it ever would be that good.
One’s voice conveys the essence of being human. Nothing expresses our thoughts, feelings, and emotions better than the human voice. Words on a page can fall flat and be interpreted in different ways, whereas a speaker makes their views known in multiple ways, like volume, pitch, tone, and pauses.
If someone only wanted to have a story read to them, they could use the text-to-speech capabilities on their computer or e-reader. Sure, the inflections and pronunciations are often wrong, and the entire reading lacks any kind of overall understanding of the material needed to guide the listener.
Of course, visually impaired people benefit from TTS and understand its failings. Paying consumers require much more.
People buy audiobooks because they want to be entertained, informed, and inspired. An audiobook is a performance art based on the narrator’s interpretation of the author’s words. We do far more than simply read the words on the page!
Before I ever walk into the booth to record an audiobook, I’ve carefully prepared for the moment:
- I read the entire book.
- In a fiction book, I note all of the characters’ quirks and descriptions so that I can develop a convincing voice for each character and present them as real people in real circumstances, not some cartoon.
- In non-fiction books, I research the author and the content of the book so that I understand the message to be conveyed.
- I do copious research on correct pronunciations. Anyone who has ever heard a GPS mispronounce the name of their town will be annoyed to have a computer voice mispronounce things in an audiobook. Mispronunciations take the listener out of the story.
Once I’m recording the book, I’m careful to distinguish voices among the many characters, especially when they converse in the same scene. The listener always needs to know who is speaking. Whether fiction or non-fiction, I must make organic acting decisions that help realize the author’s intent.
My experiences and knowledge shape every word that I utter and breathe LIFE into printed words.
I know that the narrator is the greatest cost in audiobook production, and companies perpetually look for ways to cut expenses. However, taking steps to remove the human voice and replace it with a synthesized one destroys the art form.
Technology is ideal for robots to replace humans in soul-sucking jobs like installing computer chips on a circuit board. It will never replace a human’s ability to convey emotion.
Since Bee Audio is actively participating in the development of TTS to replace narrators in audiobooks, I cannot in good conscience stay on their roster. I also encourage other narrators to avoid this company unless its “vision and enthusiasm” change in favor of professional narrators.
Perhaps Bee’s TTS applications will be voicing their audiobooks even sooner than they realized.