Boo-AI — Master Artificial Intelligence by Building from Scratch

Introduction

Threat hunting is the proactive search for adversaries that have evaded automated detection systems. Unlike reactive alert investigation, threat hunting assumes that the organization is already compromised and actively seeks evidence of attacker presence. AI assistance can turn threat hunting from an art practiced by a handful of elite analysts into a scalable, systematic discipline.

This section explores how AI enhances each phase of the threat hunting workflow: from hypothesis generation to query construction, data analysis, and attack path discovery.

Proactive vs Reactive Hunting

Reactive security waits for alerts to trigger investigation. Proactive threat hunting initiates investigation without an alert, driven by intelligence about emerging threats, anomalies discovered during routine analysis, or systematic testing of detection coverage gaps.

The distinction matters because sophisticated adversaries specifically design their operations to avoid triggering existing detection rules. Advanced persistent threat (APT) groups study their targets' security tools and tailor their techniques to stay below detection thresholds. Intrusions crafted this way are unlikely to be found by anything other than proactive hunting that looks beyond existing detections.

AI enables proactive hunting at scale by analyzing vast datasets for subtle anomalies that human hunters would not have time to examine. ML models can continuously scan network telemetry, endpoint logs, and identity data for patterns consistent with known attack techniques, surfacing leads for human analysts to investigate.
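The continuous scan described above does not have to start with a complex model. The following is a minimal sketch, assuming hypothetical per-host daily counts of outbound connections, that scores each day against the host's own baseline using a robust z-score built on the median absolute deviation (MAD). The host names, the data, and the 3.5 threshold are illustrative assumptions, not values from any particular product.

```python
from statistics import median

def robust_z_scores(values):
    """Return modified z-scores based on the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the baseline: flag nothing rather than divide by zero.
        return [0.0 for _ in values]
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [0.6745 * (v - med) / mad for v in values]

def surface_leads(counts_by_host, threshold=3.5):
    """Return (host, day_index, score) for days far above the host's baseline."""
    leads = []
    for host, counts in counts_by_host.items():
        for day, score in enumerate(robust_z_scores(counts)):
            if score > threshold:
                leads.append((host, day, round(score, 2)))
    # Highest-scoring leads first, so analysts see the strongest outliers first.
    return sorted(leads, key=lambda lead: lead[2], reverse=True)

# Hypothetical telemetry: outbound connection counts per day for two hosts.
telemetry = {
    "ws-014": [40, 42, 38, 41, 39, 43, 40],
    "ws-027": [12, 15, 11, 14, 13, 12, 96],  # sudden spike on the last day
}
leads = surface_leads(telemetry)
for host, day, score in leads:
    print(f"{host} day {day}: score {score}")
```

Only the spike on `ws-027` exceeds the threshold here. A lead like this is a starting point for a human analyst, not a verdict.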

Hunting Maturity Model: Organizations progress through five levels of hunting maturity, from Level 0 (entirely reactive, relying solely on automated alerts) to Level 4 (automated hunting with AI-generated hypotheses and continuous coverage). Many organizations operate at Level 1 or 2, and AI assistance can help them advance by automating hypothesis generation and routine data analysis.

Hypothesis-Driven Hunting with ATT&CK

The MITRE ATT&CK framework provides a structured taxonomy of adversary tactics, techniques, and procedures (TTPs) that serves as the foundation for hypothesis-driven hunting. Each ATT&CK technique describes a specific method attackers use, along with detection strategies and data sources needed for identification.

A hypothesis-driven hunt begins with selecting an ATT&CK technique (for example, T1053 "Scheduled Task/Job") and formulating a testable hypothesis: "An attacker may have established persistence via scheduled tasks created outside of change management windows." The hunter then queries available data sources to validate or refute this hypothesis.
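To make the T1053 hypothesis concrete, here is a minimal sketch that tests it against task-creation records. It assumes the events have already been exported (for example from Windows Security event ID 4698, "A scheduled task was created") into dictionaries with a `created` timestamp. The field names, hosts, and change window are hypothetical.

```python
from datetime import datetime

# Hypothetical approved change windows as (start, end) pairs.
CHANGE_WINDOWS = [
    (datetime(2024, 5, 4, 22, 0), datetime(2024, 5, 5, 2, 0)),
]

def in_change_window(ts, windows=CHANGE_WINDOWS):
    """Return True if the timestamp falls inside any approved window."""
    return any(start <= ts <= end for start, end in windows)

def tasks_outside_windows(events, windows=CHANGE_WINDOWS):
    """Return task-creation events that fall outside every approved window."""
    return [e for e in events if not in_change_window(e["created"], windows)]

# Hypothetical task-creation events.
events = [
    {"host": "srv-db01", "task": "NightlyBackup", "user": "svc_backup",
     "created": datetime(2024, 5, 4, 23, 15)},
    {"host": "ws-027", "task": "Updater", "user": "jdoe",
     "created": datetime(2024, 5, 7, 3, 42)},
]

suspicious = tasks_outside_windows(events)
for e in suspicious:
    print(f"{e['host']}: task '{e['task']}' created by {e['user']} at {e['created']}")
```

An empty result refutes the hypothesis for the period examined. A non-empty result gives the hunter specific hosts and accounts to investigate.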

AI assists hypothesis generation by analyzing threat intelligence feeds and mapping reported adversary TTPs to the organization's technology stack and industry vertical. If a threat actor known to target financial institutions has been observed using a specific ATT&CK technique, the AI system can generate hunting hypotheses for that technique and rank them against the rest of the hunting backlog.

Hypothesis generation: AI maps current threat intel to relevant ATT&CK techniques
Data identification: Automated mapping of required data sources for each technique
Coverage analysis: AI identifies gaps in detection coverage across the ATT&CK matrix
Hunt prioritization: Risk-based ranking of hunting hypotheses by likelihood and impact
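The coverage analysis and prioritization steps above can be sketched with a small data model. This is an illustration under stated assumptions: the `Hypothesis` class, the likelihood and impact values, the data source names, and the likelihood-times-impact risk score are all hypothetical choices rather than part of ATT&CK itself.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    technique_id: str    # ATT&CK technique ID
    statement: str       # testable hypothesis in plain language
    likelihood: float    # 0..1, estimated from threat intelligence
    impact: float        # 0..1, estimated business impact
    data_sources: tuple  # data sources needed to test the hypothesis

    @property
    def risk(self):
        # Simple illustrative score; real programs may weight factors differently.
        return self.likelihood * self.impact

def coverage_gaps(hypotheses, detected_techniques):
    """Keep hypotheses whose technique has no existing detection rule."""
    return [h for h in hypotheses if h.technique_id not in detected_techniques]

def prioritize(hypotheses, available_sources):
    """Drop hunts that lack required data, then rank the rest by risk."""
    huntable = [h for h in hypotheses if set(h.data_sources) <= available_sources]
    return sorted(huntable, key=lambda h: h.risk, reverse=True)

hypotheses = [
    Hypothesis("T1053", "Scheduled tasks created outside change windows",
               0.4, 0.6, ("process_events",)),
    Hypothesis("T1059.001", "PowerShell used to download payloads",
               0.7, 0.5, ("process_events",)),
    Hypothesis("T1003", "Credential dumping from LSASS memory",
               0.3, 0.9, ("process_events", "lsass_access")),
    Hypothesis("T1071", "Command and control over web protocols",
               0.5, 0.7, ("network_events",)),
]
detected = {"T1059.001"}                         # techniques with detection rules
available = {"process_events", "network_events"}  # data sources being collected

ranked = prioritize(coverage_gaps(hypotheses, detected), available)
for h in ranked:
    print(f"{h.technique_id}: risk {h.risk:.2f} - {h.statement}")
```

In this example T1003 drops out because the required `lsass_access` data is not collected. That is itself a useful finding, since it points to a telemetry gap rather than a hunting gap.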

LLMs for Query Generation

One of the most practical applications of large language models (LLMs) in threat hunting is translating natural language hypotheses into SIEM query languages. An analyst can describe what they are looking for in plain English, and the LLM generates the corresponding KQL (Kusto Query Language) for Microsoft Sentinel or SPL (Search Processing Language) for Splunk.

This capability lowers the barrier to entry for threat hunting. Instead of requiring deep expertise in query languages and schema knowledge, analysts can express their investigative intent naturally and iterate on the generated queries. The LLM can also suggest additional data sources and enrichment steps that the analyst may not have considered. Generated queries still need to be reviewed against the actual schema before their results are trusted, because a model can reference columns or tables that do not exist.

```text
# Example: Natural language to KQL translation

Analyst: "Show me all PowerShell processes that downloaded
files from external IPs in the last 7 days"

Generated KQL:
DeviceProcessEvents
| where Timestamp > ago(7d)
| where FileName in~ ("powershell.exe", "pwsh.exe")
| where ProcessCommandLine has_any (
    "Invoke-WebRequest", "wget", "curl",
    "Net.WebClient", "DownloadFile", "DownloadString"
)
// Match each PowerShell process to the connections it initiated
| join kind=inner (
    DeviceNetworkEvents
    | where Timestamp > ago(7d)
    | where RemoteIPType == "Public"
) on $left.DeviceId == $right.DeviceId,
     $left.ProcessId == $right.InitiatingProcessId
| project Timestamp, DeviceName, ProcessCommandLine,
    RemoteIP, RemotePort, RemoteUrl
```
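Generated queries should be checked before they run. The following is a minimal sketch that assumes a hypothetical schema excerpt and no particular LLM API. `build_prompt` grounds the request in known tables and columns, and `unknown_tables` applies a rough heuristic (a table name sits alone on its own line) to flag tables the model may have invented. Both function names and the `SCHEMA` dictionary are illustrative, and the heuristic is not a KQL parser.

```python
import re

# Hypothetical schema excerpt; a real deployment would load this from the SIEM.
SCHEMA = {
    "DeviceProcessEvents": ["Timestamp", "DeviceId", "DeviceName", "FileName",
                            "ProcessId", "ProcessCommandLine"],
    "DeviceNetworkEvents": ["Timestamp", "DeviceId", "InitiatingProcessId",
                            "RemoteIP", "RemoteIPType", "RemotePort", "RemoteUrl"],
}

def build_prompt(hypothesis, schema=SCHEMA):
    """Build a prompt that restricts the model to known tables and columns."""
    lines = [
        "Translate the hunting hypothesis into a single KQL query.",
        "Use only these tables and columns:",
    ]
    for table, columns in schema.items():
        lines.append(f"- {table}: {', '.join(columns)}")
    lines.append(f"Hypothesis: {hypothesis}")
    return "\n".join(lines)

def unknown_tables(kql, schema=SCHEMA):
    """Return identifiers that look like table names but are not in the schema.

    Heuristic only: treats any identifier alone on its own line as a table.
    """
    candidates = re.findall(r"^\s*([A-Za-z_]\w*)\s*$", kql, flags=re.MULTILINE)
    return sorted(set(candidates) - set(schema))

prompt = build_prompt("PowerShell processes that downloaded files from external IPs")

# Stand-ins for model output; no LLM is called in this sketch.
good_query = """DeviceProcessEvents
| where Timestamp > ago(7d)
| join kind=inner (
    DeviceNetworkEvents
) on DeviceId"""
bad_query = """DeviceProcesses
| take 10"""

print(unknown_tables(good_query))
print(unknown_tables(bad_query))
```

A query that references an unknown table can be sent back to the model with the error, or routed to an analyst, instead of being run against production data.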

Graph Analysis for Attack Path Discovery

Graph databases like Neo4j and security-specific tools like BloodHound represent relationships between entities (users, computers, groups, permissions) as graph structures. Graph analysis reveals attack paths that are invisible in tabular data—chains of relationships that an attacker could exploit to traverse from an initial foothold to high-value targets.

BloodHound, originally designed for Active Directory attack path analysis, maps trust relationships, group memberships, session data, and administrative privileges into a graph. Shortest-path algorithms then identify the minimum number of hops an attacker needs to reach Domain Admin privileges from a given compromised account.
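The shortest-path idea can be illustrated without a graph database. The sketch below runs a breadth-first search over a small, hypothetical set of directed edges. The relationship labels follow BloodHound's edge naming (`MemberOf`, `AdminTo`, `HasSession`), but the accounts, hosts, and the `shortest_attack_path` function are made up for illustration and are not BloodHound's implementation.

```python
from collections import deque

# Hypothetical directed edges: (source, relationship, target).
EDGES = [
    ("jdoe", "MemberOf", "HelpDesk"),
    ("HelpDesk", "AdminTo", "WS-027"),
    ("WS-027", "HasSession", "svc_sql"),
    ("svc_sql", "AdminTo", "SRV-DB01"),
    ("SRV-DB01", "HasSession", "da_admin"),
    ("da_admin", "MemberOf", "Domain Admins"),
    ("jdoe", "MemberOf", "Staff"),
]

def build_adjacency(edges):
    """Map each node to its outgoing (relationship, target) pairs."""
    adjacency = {}
    for src, rel, dst in edges:
        adjacency.setdefault(src, []).append((rel, dst))
    return adjacency

def shortest_attack_path(edges, start, target):
    """Return the fewest-hop path as (source, relationship, target) steps, or None."""
    adjacency = build_adjacency(edges)
    parent = {start: None}  # node -> (previous node, relationship used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for rel, nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = (node, rel)
                queue.append(nxt)
    if target not in parent:
        return None
    # Walk back from the target to the start, then reverse.
    path, node = [], target
    while parent[node] is not None:
        prev, rel = parent[node]
        path.append((prev, rel, node))
        node = prev
    return list(reversed(path))

path = shortest_attack_path(EDGES, "jdoe", "Domain Admins")
for src, rel, dst in path:
    print(f"{src} -[{rel}]-> {dst}")
```

Breadth-first search is sufficient here because every hop counts equally. Weighting edges by how hard they are to exploit would call for a weighted algorithm such as Dijkstra's.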

AI-enhanced graph analysis extends beyond simple path-finding. Graph neural network (GNN) models can identify anomalous relationships (users with unexpected privileges), predict likely attack paths based on historical intrusion data, and recommend the highest-impact remediation actions to eliminate critical attack paths.

Ingest Active Directory, cloud IAM, and network topology data into a graph database
Map trust relationships, permissions, and session data as graph edges
Run shortest-path algorithms to identify critical attack paths to high-value targets
Apply GNN models to score nodes by compromise risk and remediation priority
Continuously monitor for new attack paths introduced by configuration changes
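The last step, monitoring for new attack paths, can be approximated by comparing two snapshots of the graph. The sketch below is a plain reachability diff rather than a GNN, and it assumes hypothetical edge lists taken before and after a configuration change. It reports which principals can reach a high-value target only in the newer snapshot.

```python
def reachable_from(edges, start):
    """Return every node reachable from start by following directed edges."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def new_exposures(before, after, high_value):
    """Return nodes that can reach high_value after the change but not before."""
    nodes = {n for edge in after for n in edge}
    exposed = set()
    for node in nodes - {high_value}:
        could = high_value in reachable_from(before, node)
        can = high_value in reachable_from(after, node)
        if can and not could:
            exposed.add(node)
    return sorted(exposed)

# Hypothetical snapshots as (source, target) edges.
before = [
    ("jdoe", "HelpDesk"),
    ("HelpDesk", "WS-027"),
    ("da_admin", "Domain Admins"),
]
# A domain admin logs on to a workstation, creating a session edge.
after = before + [("WS-027", "da_admin")]

exposed = new_exposures(before, after, "Domain Admins")
print(exposed)
```

Running this comparison on a schedule, or whenever directory changes are ingested, turns a one-off path analysis into the continuous monitoring the final step describes.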