Voice cloning and social engineering

WRITTEN BY
Adaptive Security
July 8, 2024

Text-to-speech (TTS) technology has made unprecedented leaps in recent years. Microsoft's VALL-E 2 model recently achieved human parity in zero-shot speech synthesis using just three seconds of audio from an unseen speaker, a significant milestone. While Microsoft has no plans to release the model publicly, it offers a glimpse into an imminent future where AI voice cloning is both remarkably accurate and accessible from extremely limited samples.

Despite the exciting potential applications of this technology in fields like accessibility and entertainment, there are serious security implications that most organizations are woefully unprepared to address. The sobering reality is that any public-facing executive with even a few seconds of audio available online – from a podcast, interview, or conference – is now a potential target for voice cloning attacks.

Recent advancements have dramatically reduced the cost and time required to train TTS models, making high-quality audio spoofing tools increasingly accessible to malicious actors. This democratization of technology, while beneficial in many ways, has opened up new avenues for sophisticated cyberattacks.
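
To make that accessibility concrete, here is a minimal sketch of zero-shot voice cloning with an open-source library (Coqui TTS, installed via pip install TTS). The model name, file paths, and text are illustrative assumptions, and exact APIs vary by library version, but the underlying point stands: cloning a voice from a short public clip now takes only a few lines of code.

```python
# Minimal sketch: zero-shot voice cloning with the open-source Coqui TTS
# library (pip install TTS). Model name and file paths are illustrative.
from TTS.api import TTS

# Load a pretrained multilingual voice-cloning model (downloads on first use).
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")

# A few seconds of reference audio is enough to approximate a speaker's voice.
tts.tts_to_file(
    text="This is a demonstration of speech synthesized from a short sample.",
    speaker_wav="reference_clip.wav",  # e.g., seconds of public podcast audio
    language="en",
    file_path="cloned_output.wav",
)
```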

Recent Voice Cloning Attacks

Several high-profile attacks have already leveraged this technology, demonstrating its potential for significant harm:

  1. One of the world's largest advertising firms, WPP, narrowly avoided a major financial loss when attackers used a deepfake of the CEO's voice in a Microsoft Teams meeting. The scammer attempted to set up a new business venture to solicit money and personal details.
  2. Popular password management company LastPass faced a similar threat when a deepfake of CEO Karim Toubba's voice was sent to an employee's WhatsApp account.
  3. The Wall Street Journal reported an alarming 700% increase in deepfake attacks on financial technology firms over the past year.
  4. According to McAfee, more than one-quarter of adults surveyed have experienced some form of AI voice scam, and 77% of those victims suffered financial losses.

Why Executive Voice Scams Are Dangerous

Several factors contribute to the heightened risk of executive voice scams:

  • Available contact information: Employee mobile phone numbers and other personal details are often readily available online, sometimes at minimal cost, and there is a plethora of free or low-cost tools for enriching that data. If attackers can work out your company's email convention (e.g., first_initial_last_name@company.com), they can very likely obtain the mobile numbers of 30% or more of your employees as well (see the sketch after this list).
  • Advanced open source intelligence (OSINT) tools: Cybercriminals are leveraging new technologies and AI-powered web scrapers to rapidly collect information about their targets. This open source intelligence, which might consist of social media engagement, LinkedIn posts, or recent interviews given by an executive at your organization, can then be leveraged to create highly personalized spear phishing campaigns.
  • Mobile device vulnerabilities: Many businesses struggle to implement proper security controls on mobile devices due to BYOD policies, privacy concerns, and employee resistance, leaving the mobile attack surface relatively unprotected compared to traditional business email. With generative AI tools like FraudGPT fueling the recent growth in smishing and phishing, that surface can be readily exploited.
  • High-value targets: Executives and their close contacts are prime targets due to their access to sensitive information and ability to authorize significant actions.
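
As a hedged illustration of the contact-information point above, the sketch below shows how a known email convention expands over publicly listed names into likely corporate addresses, which low-cost enrichment services can then match to mobile numbers. The names, domain, and convention here are hypothetical.

```python
# Minimal sketch: expanding a known email convention over public names.
# Names, domain, and convention are hypothetical; attackers typically feed
# the resulting addresses into commercial data-enrichment/lookup services.

def guess_email(first: str, last: str, domain: str) -> str:
    """Apply a first_initial_last_name@domain convention."""
    return f"{first[0].lower()}_{last.lower()}@{domain}"

public_names = [("Jane", "Doe"), ("John", "Smith")]  # e.g., scraped from LinkedIn
candidates = [guess_email(first, last, "company.com") for first, last in public_names]
print(candidates)  # ['j_doe@company.com', 'j_smith@company.com']
```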

The $25M Hong Kong Deepfake Attack

While AI is still an emerging technology, there are already public examples of deepfake audio and video scams resulting in enormous financial harm.

A stark example of the potential damage is the Hong Kong case where an employee transferred $25 million after receiving a fake video call from someone impersonating his boss. 

The attack unfolded when an employee received what appeared to be a legitimate video call from the company's chief financial officer (CFO). Unbeknownst to the employee, the "CFO" on the call was actually a highly convincing AI-generated deepfake. The cybercriminals had used advanced AI technology to create a realistic video and audio representation of the executive, complete with familiar mannerisms and voice patterns.

During the call, the fake CFO instructed the employee to transfer $25 million, citing a confidential acquisition as the reason for the urgent transaction. Convinced by the lifelike impersonation, the employee complied with the request, unknowingly transferring millions into the hands of cybercriminals.

This incident serves as a chilling reminder of how AI is reshaping the cyber threat landscape. Traditional phishing relied on email or text-based deception; this case illustrates a new frontier where visual and auditory cues can be manipulated to devastating effect. It underscores the urgent need for organizations to adapt their security protocols, verification procedures, and training programs to counter these sophisticated AI-enhanced attacks.
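
One protocol-level countermeasure, sketched below under assumed names and a hypothetical policy threshold, is an out-of-band verification gate: any high-value transfer must be confirmed with a one-time code delivered over a channel the company controls, never over the channel the request arrived on. The key design choice is that a convincing deepfake on the original call never sees the code.

```python
# Minimal sketch of an out-of-band verification gate for large transfers.
# The threshold, channel, and function names are hypothetical assumptions.
import secrets

APPROVAL_THRESHOLD_USD = 10_000  # hypothetical policy threshold

def request_transfer(amount_usd: float, confirm_via_trusted_channel) -> bool:
    """Approve a transfer only after out-of-band confirmation.

    `confirm_via_trusted_channel` delivers a one-time code over a channel
    the company controls (e.g., a registered phone number on file) and
    returns whatever the approver reads back.
    """
    if amount_usd < APPROVAL_THRESHOLD_USD:
        return True  # small transfers follow the normal workflow

    code = f"{secrets.randbelow(1_000_000):06d}"  # six-digit one-time code
    return confirm_via_trusted_channel(code) == code

# Example: a callback that simulates a correct read-back over the trusted channel.
approved = request_transfer(25_000_000, confirm_via_trusted_channel=lambda code: code)
```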

The Training Gap: Why Current Solutions Fall Short

Despite the escalating frequency and severity of these threats, most cybersecurity training solutions remain outdated and ineffective:

  • Lack of Engagement: Traditional training methods are easily ignored or quickly forgotten by employees.
  • Not AI-Focused: Many training programs have not been updated to address the specific challenges posed by AI-powered attacks.
  • Generic Content: Boilerplate email templates and generic scenarios fail to prepare employees for the nuanced, highly convincing nature of modern AI scams.
  • Absence of Personalization: Training often lacks customization to a company's specific security protocols and executive personas, leaving employees ill-prepared for targeted attacks.

The Adaptive Solution

To address these critical shortcomings, Adaptive offers a revolutionary approach to cybersecurity training:

  • Customized Simulations: We create training sessions using your executives' actual voices and likenesses to simulate potential threats, providing a hyper-realistic experience.
  • AI-First Approach: Our training is built from the ground up to address the unique challenges posed by AI-powered attacks.
  • Engaging Content: We use interactive scenarios and gamification to ensure high engagement and retention rates.
  • Continuously Updated: Our platform evolves with the threat landscape, ensuring your team is always prepared for the latest attack vectors.

Don't let your organization become the next victim of an AI voice cloning attack. Contact Adaptive today for a personalized demo and take the first step towards truly effective, modern cybersecurity training.
