The phonetic features and pronunciation habits in a person’s speech are almost unique. Even skilled imitation cannot change a speaker’s most essential pronunciation and vocal tract characteristics. Voiceprint recognition is the process of identifying who is speaking from the voiceprint characteristics of the speech to be recognized.
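To make the idea concrete, here is a minimal sketch of the matching step only. It assumes each speaker has already been reduced to a fixed-length voiceprint feature vector; the three-dimensional vectors, the names in `ENROLLED`, and the 0.8 threshold are hypothetical values chosen for illustration, and real systems extract far richer features from the audio.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(probe, enrolled, threshold=0.8):
    """Return (name, score) of the closest enrolled voiceprint.

    If no enrolled voiceprint scores at least `threshold` (an assumed
    value), the name is None, meaning the speaker is treated as unknown.
    """
    best_name, best_score = None, -1.0
    for name, voiceprint in enrolled.items():
        score = cosine_similarity(probe, voiceprint)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score


# Hypothetical enrolled voiceprints (illustrative numbers, not real data).
ENROLLED = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.1, 0.8, 0.5],
}

name, score = identify_speaker([0.85, 0.15, 0.35], ENROLLED)
print(name, round(score, 3))
```

The sketch also shows where the weakness discussed later comes from: the matcher only compares features, so any audio that yields sufficiently similar features is accepted, whoever or whatever produced it.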
At the end of 2018, the WeChat Security Center reported a fraud carried out with “voice cloning”: Ms. Zhao received a WeChat message that appeared to come from her father, asking her to transfer 200 yuan via WeChat. Ms. Zhao asked by voice message, “Dad, is that you?” and soon received a reply. As soon as she heard her father’s voice, she transferred the money. But it was a scam. Verification later showed that the voice the fraudster sent was not that of Ms. Zhao’s father but a “cloned” voice.
According to media reports, voice cloning technology is nothing new. Some systems can imitate a speaker’s voice from only an hour and a half of data.
At the GeekPwn 2017 international security geek competition, contestants used voice samples provided by the voice actor of a hero in the game Honor of Kings to imitate the actor’s voiceprint characteristics and synthesize an “attack” voice. With it they attacked four devices with voiceprint recognition provided on site and fooled the “voiceprint lock” verification.
In January this year, an amusing but worrying piece of news spread across the Internet: “Google’s reCAPTCHA system has been cracked, and machines pass its voice verification with an accuracy as high as 85%”.
The Google reCAPTCHA system works by using artificial intelligence to generate synthetic speech for different letters and digits, read aloud in a variety of voices that differ in age, gender and speaking speed. Users listen to the synthetic speech, work out which letters and digits were read, type them into the web page and submit them for verification. The mechanism is meant to ensure that web page operations are performed by real people rather than automatically by programs.
The mechanism looks impeccable, but the cracking method is just as clever: it also uses artificial intelligence, this time for speech recognition (including Google’s own speech recognition service), to convert the spoken audio into text, which is then entered into the web page and submitted for verification. It is truly a case of “lifting a stone only to drop it on your own foot”.
This case shows that AI can recognize behaviors and information from real life and convert them into digital feature data. It can also turn existing digital feature data back into real-life behavior and information.
As algorithms advance and computing performance improves, reconstructed virtual information comes ever closer to real information and becomes ever harder to tell apart from it. Voiceprint recognition therefore has serious loopholes.
Of course, this does not mean that voiceprint recognition has no future. Some institutions are reportedly developing countermeasures to detect “voice cloning”. Nuance Communications, a developer of voice control software, is studying algorithms that detect small frequency jumps at the joins between speech segments. Adobe has said that VoCo, the voice cloning software it is developing, will also allow digital watermarks to be added to synthetic speech. Such sophisticated techniques may help computers identify suspicious speech.
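The idea of looking for frequency jumps at the joins between segments can be illustrated with a rough sketch. This is not Nuance’s algorithm, which the reports do not describe. It assumes a crude stand-in: the signal is cut into fixed-size frames, the frequency of each frame is estimated from its zero-crossing count, and any frame whose estimate differs from the previous one by more than a threshold is flagged. The function names, the frame size and the 30 Hz threshold are all hypothetical.

```python
import math


def zero_crossing_frequency(frame, sample_rate):
    """Rough frequency estimate in Hz: sign changes / (2 * frame duration)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    duration = len(frame) / sample_rate
    return crossings / (2 * duration)


def find_frequency_jumps(samples, sample_rate, frame_size=400, max_jump_hz=30.0):
    """Return indices of frames whose estimated frequency jumps abruptly.

    A frame is flagged when its estimate differs from the previous frame's
    by more than `max_jump_hz` (an assumed threshold).
    """
    freqs = [
        zero_crossing_frequency(samples[i:i + frame_size], sample_rate)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    return [
        i for i in range(1, len(freqs))
        if abs(freqs[i] - freqs[i - 1]) > max_jump_hz
    ]


def tone(freq_hz, n_samples, sample_rate):
    """Synthetic sine tone used as toy test data."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


SAMPLE_RATE = 8000
# Toy "spliced" signal: a 200 Hz segment joined directly to a 260 Hz segment.
spliced = tone(200, 1600, SAMPLE_RATE) + tone(260, 1600, SAMPLE_RATE)
print(find_frequency_jumps(spliced, SAMPLE_RATE))
```

Real speech changes pitch naturally, so a practical detector would need a far more careful frequency estimate and a model of which jumps are plausible for a human voice. The sketch only shows the shape of the check.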