Introduction
Large language models were designed to generate helpful, coherent text. In the wrong hands, that same capability becomes a powerful weapon. Attackers have adapted LLMs to produce phishing emails, malware code, and social engineering scripts at a scale and quality that were previously impractical.
This section examines the malicious LLM ecosystem—from purpose-built tools like WormGPT and FraudGPT to the fine-tuning of open-source models on phishing corpora—and explores how prompt injection itself has become a novel social engineering technique.
WormGPT and FraudGPT
WormGPT emerged in mid-2023 as one of the first publicly marketed malicious LLMs. Reportedly built on the open-source GPT-J model and fine-tuned on malware-related data, it was advertised as a tool for cybercriminal activity without the safety guardrails present in commercial models.
FraudGPT followed shortly after, advertised on dark web forums as an "all-in-one" tool for crafting spear phishing emails, creating cracking tools, and generating scam landing pages. Both tools were sold as subscription services, mirroring legitimate SaaS business models.
- WormGPT: Based on GPT-J, trained on malware datasets, no content restrictions
- FraudGPT: Marketed for phishing, carding, and social engineering at $200/month
- DarkBART: A later variant claiming integration with dark web data sources
- PoisonGPT: Research demonstration of a model with poisoned factual knowledge
Important Context: While some of these tools were later revealed to be overhyped or even scams themselves, they represent a genuine trend. The real danger lies in the accessibility of open-source models that anyone can fine-tune without restrictions.
The commercialization of malicious AI has created a new class of threat actor: individuals with limited technical skills who can now produce sophisticated attacks by simply subscribing to a service and providing target information.
Fine-Tuning Open-Source LLMs
Perhaps more concerning than purpose-built malicious models is the ease with which any open-source LLM can be fine-tuned on phishing corpora. Models like LLaMA, Mistral, and Falcon can be adapted in a matter of hours using consumer-grade hardware.
The fine-tuning process typically involves collecting successful phishing emails, stripping safety training through techniques like DPO (Direct Preference Optimization) reversal, and training the model to generate content that mimics the style and persuasion patterns of effective social engineering campaigns.
The Fine-Tuning Pipeline
- Data Collection: Gather successful phishing templates, BEC samples, and social engineering scripts from underground forums
- Safety Removal: Use uncensored base models or apply fine-tuning that overrides safety alignment
- Domain Specialization: Train on industry-specific jargon, company naming conventions, and regional language patterns
- Evaluation: Test generated outputs against spam filters and human reviewers to optimize bypass rates
The democratization of this capability means that defenders can no longer rely on the assumption that sophisticated phishing requires sophisticated attackers. Any individual with basic ML knowledge and access to a GPU can produce enterprise-grade social engineering content.
Prompt Injection as Social Engineering
A fascinating convergence has emerged between prompt injection attacks on AI systems and traditional social engineering. Attackers are now embedding prompt injection payloads in emails, documents, and web pages that are likely to be processed by AI assistants.
When a victim's AI email assistant processes a message containing hidden prompt injection instructions, the assistant may be manipulated into summarizing the email favorably, forwarding sensitive information, or even drafting replies that further the attacker's goals.
- Invisible instructions: White text on white backgrounds or zero-width characters containing prompts
- Document-embedded payloads: PDF or Word documents with hidden instructions targeting AI summarization tools
- Chain attacks: Using prompt injection to make one AI system generate content that social-engineers a human
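The hiding techniques listed above can be screened for before a message reaches an AI assistant. The following is a minimal defensive sketch, not a complete detector: the function name `scan_message`, the phrase list, and the CSS heuristics are all illustrative assumptions, and a real deployment would need an HTML parser and far broader coverage.

```python
import re
import unicodedata

# Zero-width / invisible characters that can hide text from a human reader.
INVISIBLE_CHARS = {
    "\u200b",  # zero width space
    "\u200c",  # zero width non-joiner
    "\u200d",  # zero width joiner
    "\u2060",  # word joiner
    "\ufeff",  # zero width no-break space (BOM)
}

# Illustrative phrases only (assumption): real payloads vary widely.
SUSPICIOUS_PHRASES = [
    r"ignore (all )?(previous|prior) instructions",
    r"disregard (the )?above",
]

# Crude inline-style heuristics (assumption): hidden, zero-size, or white text.
# The lookbehind avoids matching "background-color".
HIDDEN_STYLE = re.compile(
    r"display\s*:\s*none"
    r"|font-size\s*:\s*0"
    r"|(?<![-\w])color\s*:\s*(?:#fff(?:fff)?\b|white\b)",
    re.IGNORECASE,
)


def scan_message(text: str) -> list[str]:
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []

    invisible_count = sum(1 for ch in text if ch in INVISIBLE_CHARS)
    if invisible_count:
        findings.append(f"invisible_chars:{invisible_count}")

    if HIDDEN_STYLE.search(text):
        findings.append("hidden_style")

    # Remove invisible characters first so split-up words still match.
    visible = "".join(ch for ch in text if ch not in INVISIBLE_CHARS)
    normalized = unicodedata.normalize("NFKC", visible).lower()
    for pattern in SUSPICIOUS_PHRASES:
        if re.search(pattern, normalized):
            findings.append(f"phrase:{pattern}")

    return findings
```

A flagged message could be quarantined or shown to the user with a warning instead of being passed to the assistant. Pattern matching of this kind is easy to evade, so it is best treated as one signal among several rather than a gate on its own.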
Emerging Threat: As organizations increasingly deploy AI assistants to triage emails and summarize documents, the attack surface for prompt-injection-based social engineering grows with every new deployment. Defenders must consider AI assistants as potential attack amplifiers, not just passive tools.
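One way to act on this is to require human confirmation for any sensitive action an assistant proposes while it is processing untrusted content. The sketch below is hypothetical: the action names, the `ProposedAction` type, and the `decide` function are assumptions for illustration, not part of any particular assistant framework.

```python
from dataclasses import dataclass

# Hypothetical action names (assumption): actions that can leak or send data.
SENSITIVE_ACTIONS = {"send_email", "forward_email", "share_file"}


@dataclass(frozen=True)
class ProposedAction:
    name: str
    # True when the action was proposed while the assistant was reading
    # content from outside the organization (email body, attachment, web page).
    triggered_by_untrusted_content: bool


def decide(action: ProposedAction) -> str:
    """Return 'require_confirmation' or 'allow' for a proposed action."""
    if action.name in SENSITIVE_ACTIONS and action.triggered_by_untrusted_content:
        return "require_confirmation"
    return "allow"
```

For example, `decide(ProposedAction("forward_email", True))` returns `"require_confirmation"`, while a summarization request passes through. The design choice is to gate on the combination of action type and content origin, so routine assistant use is not interrupted.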