Every sample they provide is suspiciously similar to the human version, which suggests overtraining, either on the samples or on a single voice. I would have expected a different, if still human-quality, voice from a fully functional system. Still, this tech is coming, and soon. And when it does, voice acting will no longer prevent videogames from having complex stories, and we will find out if the industry is still capable of making them. Looking forward to it :)
One of the samples even has a breath intake at the same point as the recording. Not sure how they do that (didn’t read the paper), but at first I thought it was the recording, so I compared the two and found the breathing wasn’t quite as natural as that of an actual person with lungs.