Boo-AI — Master Artificial Intelligence by Building from Scratch

Introduction

Tool schemas define the contract between your agent and your tools. Strong validation catches errors early, provides helpful feedback to the LLM, and prevents security issues from malformed inputs.

Why Validation Matters: LLMs are good at generating valid tool calls, but not perfect. Validation catches mistakes before they cause problems and gives the LLM a chance to self-correct.

JSON Schema Basics

JSON Schema is the standard way to describe the structure of tool parameters. The sections below cover the building blocks you will use most often.

Basic Types

🐍basic_types.py

# Fundamental JSON Schema types
schema = {
    "type": "object",
    "properties": {
        # String
        "name": {
            "type": "string",
            "description": "User's full name"
        },

        # Integer (whole numbers only)
        "age": {
            "type": "integer",
            "description": "User's age in years"
        },

        # Number (integer or float)
        "price": {
            "type": "number",
            "description": "Price with decimals"
        },

        # Boolean
        "is_active": {
            "type": "boolean",
            "description": "Whether the account is active"
        },

        # Array
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "description": "List of tags"
        },

        # Object
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "zip": {"type": "string"}
            }
        },

        # Nullable value (a list of types allows either one)
        "deleted_at": {
            "type": ["string", "null"],
            "description": "Deletion timestamp or null if not deleted"
        }
    },
    "required": ["name", "age"]
}
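
It helps to see how these type names line up with Python values. The sketch below uses only the standard library and is not how the jsonschema package is implemented; `matches_type` is a hypothetical helper written for this illustration. It assumes Draft 6+ semantics, where a float with no fractional part counts as an integer, and it shows why booleans need special care: in Python, `bool` is a subclass of `int`.

```python
def matches_type(value, type_spec):
    """Return True if value matches a JSON Schema "type" (a name or a list of names)."""
    if isinstance(type_spec, list):
        return any(matches_type(value, name) for name in type_spec)

    is_bool = isinstance(value, bool)
    if type_spec == "string":
        return isinstance(value, str)
    if type_spec == "boolean":
        return is_bool
    if type_spec == "null":
        return value is None
    if type_spec == "number":
        # bool is a subclass of int in Python, so exclude it explicitly
        return isinstance(value, (int, float)) and not is_bool
    if type_spec == "integer":
        if is_bool:
            return False
        return isinstance(value, int) or (
            isinstance(value, float) and value.is_integer()
        )
    if type_spec == "array":
        return isinstance(value, list)
    if type_spec == "object":
        return isinstance(value, dict)
    raise ValueError(f"Unknown JSON Schema type: {type_spec!r}")


print(matches_type(30, "integer"))             # True
print(matches_type(True, "integer"))           # False
print(matches_type("30", "integer"))           # False
print(matches_type(None, ["string", "null"]))  # True
```

The third call is the mistake LLMs make most often with types: passing "30" as a string where an integer is expected.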

String Constraints

🐍string_constraints.py

# String constraints
schema = {
    "type": "object",
    "properties": {
        # Length constraints
        "username": {
            "type": "string",
            "minLength": 3,
            "maxLength": 20,
            "description": "Username (3-20 characters)"
        },

        # Pattern (regex); a raw string keeps the backslash intact in Python
        "email": {
            "type": "string",
            "pattern": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$",
            "description": "Valid email address"
        },

        # Enum (fixed values)
        "status": {
            "type": "string",
            "enum": ["pending", "active", "suspended", "deleted"],
            "description": "Account status"
        },

        # Format (semantic validation); only enforced when the validator
        # is created with a format checker
        "created_at": {
            "type": "string",
            "format": "date-time",
            "description": "ISO 8601 timestamp"
        },

        "website": {
            "type": "string",
            "format": "uri",
            "description": "Valid URL"
        }
    }
}
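
One subtlety with `pattern`: JSON Schema regular expressions are not implicitly anchored, so a pattern without `^` and `$` matches anywhere in the string. The following standard-library sketch makes that visible. `check_string` is a hypothetical helper that covers only the four keywords shown, not a replacement for a real validator.

```python
import re


def check_string(value, spec):
    """Return a list of problems for a string against a few JSON Schema keywords."""
    problems = []
    if "minLength" in spec and len(value) < spec["minLength"]:
        problems.append(f"shorter than {spec['minLength']} characters")
    if "maxLength" in spec and len(value) > spec["maxLength"]:
        problems.append(f"longer than {spec['maxLength']} characters")
    # Patterns are not implicitly anchored, hence re.search rather than re.fullmatch
    if "pattern" in spec and re.search(spec["pattern"], value) is None:
        problems.append(f"does not match pattern {spec['pattern']}")
    if "enum" in spec and value not in spec["enum"]:
        problems.append(f"not one of {spec['enum']}")
    return problems


username = {"minLength": 3, "maxLength": 20}
anchored = {"pattern": r"^\d{5}$"}
unanchored = {"pattern": r"\d{5}"}

print(check_string("al", username))             # ['shorter than 3 characters']
print(check_string("abc12345xyz", anchored))    # one problem: no match
print(check_string("abc12345xyz", unanchored))  # [] because the digits match anywhere
```

If you mean "the whole value must look like this", anchor the pattern explicitly, as the email example above does.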

Number Constraints

🐍number_constraints.py

# Number constraints
schema = {
    "type": "object",
    "properties": {
        # Range
        "quantity": {
            "type": "integer",
            "minimum": 1,
            "maximum": 1000,
            "description": "Quantity (1-1000)"
        },

        # Exclusive range
        "discount": {
            "type": "number",
            "exclusiveMinimum": 0,
            "exclusiveMaximum": 1,
            "description": "Discount rate (0 < x < 1)"
        },

        # Multiple of
        "cents": {
            "type": "integer",
            "multipleOf": 5,
            "description": "Amount in cents, rounded to 5"
        }
    }
}
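
The difference between inclusive and exclusive bounds, and the float pitfall with `multipleOf`, are easier to see in code. This is a minimal standard-library sketch; `check_number` is a hypothetical helper, and going through `Decimal(str(...))` is just one way to avoid binary floating-point noise, not necessarily what your validation library does.

```python
from decimal import Decimal


def check_number(value, spec):
    """Return a list of problems for a number against a few JSON Schema keywords."""
    problems = []
    if "minimum" in spec and value < spec["minimum"]:
        problems.append(f"less than minimum {spec['minimum']}")
    if "maximum" in spec and value > spec["maximum"]:
        problems.append(f"greater than maximum {spec['maximum']}")
    if "exclusiveMinimum" in spec and value <= spec["exclusiveMinimum"]:
        problems.append(f"must be greater than {spec['exclusiveMinimum']}")
    if "exclusiveMaximum" in spec and value >= spec["exclusiveMaximum"]:
        problems.append(f"must be less than {spec['exclusiveMaximum']}")
    if "multipleOf" in spec:
        # With binary floats, 0.3 % 0.1 is not exactly 0; Decimal avoids that
        remainder = Decimal(str(value)) % Decimal(str(spec["multipleOf"]))
        if remainder != 0:
            problems.append(f"not a multiple of {spec['multipleOf']}")
    return problems


quantity = {"minimum": 1, "maximum": 1000}
discount = {"exclusiveMinimum": 0, "exclusiveMaximum": 1}
cents = {"multipleOf": 5}

print(check_number(1000, quantity))             # [] (the bound itself is allowed)
print(check_number(0, discount))                # ['must be greater than 0']
print(check_number(12, cents))                  # ['not a multiple of 5']
print(check_number(0.3, {"multipleOf": 0.1}))   # []
```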

Array Constraints

🐍array_constraints.py

# Array constraints
schema = {
    "type": "object",
    "properties": {
        # Size limits
        "recipients": {
            "type": "array",
            "items": {"type": "string", "format": "email"},
            "minItems": 1,
            "maxItems": 50,
            "description": "Email recipients (1-50)"
        },

        # Unique items
        "tags": {
            "type": "array",
            "items": {"type": "string"},
            "uniqueItems": True,
            "description": "Unique tags, no duplicates"
        },

        # Tuple validation: a list under "items" checks each position.
        # This is the Draft 7 form; Draft 2020-12 uses "prefixItems" instead.
        "coordinates": {
            "type": "array",
            "items": [
                {"type": "number", "description": "Latitude"},
                {"type": "number", "description": "Longitude"}
            ],
            "minItems": 2,
            "maxItems": 2,
            "description": "[latitude, longitude]"
        }
    }
}
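
To make the size and uniqueness rules concrete, here is a small standard-library sketch. `check_array` is a hypothetical helper that covers only `minItems`, `maxItems`, and `uniqueItems`; per-item and per-position checks are left out to keep it short.

```python
def check_array(value, spec):
    """Return a list of problems for a list against a few JSON Schema array keywords."""
    problems = []
    if "minItems" in spec and len(value) < spec["minItems"]:
        problems.append(f"fewer than {spec['minItems']} items")
    if "maxItems" in spec and len(value) > spec["maxItems"]:
        problems.append(f"more than {spec['maxItems']} items")
    if spec.get("uniqueItems"):
        seen = []  # a list rather than a set, so dicts and lists can be compared too
        for item in value:
            if item in seen:
                problems.append(f"duplicate item {item!r}")
            else:
                seen.append(item)
    return problems


tags = {"uniqueItems": True}
coordinates = {"minItems": 2, "maxItems": 2}

print(check_array(["ai", "ml", "ai"], tags))    # ["duplicate item 'ai'"]
print(check_array([48.85], coordinates))        # ['fewer than 2 items']
print(check_array([48.85, 2.35], coordinates))  # []
```

One caveat of this simplification: Python considers `1 == True`, while JSON treats the number 1 and the boolean true as different values, so a real validator handles that case separately.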

Advanced Schema Patterns

Conditional Requirements

🐍conditional_schemas.py

# Using oneOf for mutually exclusive options
schema = {
    "type": "object",
    "properties": {
        "notification_type": {
            "type": "string",
            "enum": ["email", "sms", "push"]
        }
    },
    "oneOf": [
        {
            "properties": {
                "notification_type": {"const": "email"},
                "email_address": {"type": "string", "format": "email"}
            },
            "required": ["notification_type", "email_address"]
        },
        {
            "properties": {
                "notification_type": {"const": "sms"},
                # E.164-style phone number; raw string keeps the backslashes intact
                "phone_number": {"type": "string", "pattern": r"^\+[1-9]\d{1,14}$"}
            },
            "required": ["notification_type", "phone_number"]
        },
        {
            "properties": {
                "notification_type": {"const": "push"},
                "device_token": {"type": "string"}
            },
            "required": ["notification_type", "device_token"]
        }
    ]
}
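
The key property of `oneOf` is that an instance passes only when exactly one branch matches. In the schema above, the `const` on `notification_type` is what keeps the branches mutually exclusive. The sketch below illustrates that rule with the standard library only. `matches_branch` and `count_matching_branches` are hypothetical helpers that understand just `required` and `const`, which is a deliberate simplification.

```python
def matches_branch(instance, branch):
    """Check one oneOf branch, supporting only `required` and `const`."""
    if any(key not in instance for key in branch.get("required", [])):
        return False
    for key, spec in branch.get("properties", {}).items():
        if key in instance and "const" in spec and instance[key] != spec["const"]:
            return False
    return True


def count_matching_branches(instance, branches):
    """oneOf is satisfied only when this returns exactly 1."""
    return sum(matches_branch(instance, branch) for branch in branches)


branches = [
    {
        "properties": {"notification_type": {"const": "email"}},
        "required": ["notification_type", "email_address"],
    },
    {
        "properties": {"notification_type": {"const": "sms"}},
        "required": ["notification_type", "phone_number"],
    },
]

good_call = {"notification_type": "sms", "phone_number": "+15551234567"}
bad_call = {"notification_type": "sms", "email_address": "a@example.com"}

print(count_matching_branches(good_call, branches))  # 1, so the call is valid
print(count_matching_branches(bad_call, branches))   # 0, so the call is rejected
```

The second call is a typical LLM slip: the right notification type paired with the wrong contact field. No branch matches, so validation fails before the tool runs.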

Nested Objects

🐍nested_objects.py

# Nested objects: each level declares its own properties and required keys
schema = {
    "type": "object",
    "properties": {
        "order": {
            "type": "object",
            "properties": {
                "id": {"type": "string"},
                "customer": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "email": {"type": "string", "format": "email"}
                    },
                    "required": ["name", "email"]
                },
                "items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "product_id": {"type": "string"},
                            "quantity": {"type": "integer", "minimum": 1},
                            "price": {"type": "number", "minimum": 0}
                        },
                        "required": ["product_id", "quantity", "price"]
                    },
                    "minItems": 1
                }
            },
            "required": ["id", "customer", "items"]
        }
    }
}
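
With nested schemas, the useful part of an error is the path to the problem. The sketch below walks a schema and an instance together and reports where required properties are missing. It uses only the standard library; `find_missing` is a hypothetical helper, and it keys off `"type"` for simplicity, which a full validator does not need to do. The schema is a trimmed version of the one above.

```python
def find_missing(schema, instance, path=()):
    """Yield paths of required properties that are absent, recursing into objects and arrays."""
    if schema.get("type") == "object" and isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                yield path + (key,)
        for key, sub_schema in schema.get("properties", {}).items():
            if key in instance:
                yield from find_missing(sub_schema, instance[key], path + (key,))
    elif schema.get("type") == "array" and isinstance(instance, list):
        item_schema = schema.get("items", {})
        for index, item in enumerate(instance):
            yield from find_missing(item_schema, item, path + (index,))


order_schema = {
    "type": "object",
    "properties": {
        "order": {
            "type": "object",
            "properties": {
                "customer": {
                    "type": "object",
                    "properties": {"name": {"type": "string"}, "email": {"type": "string"}},
                    "required": ["name", "email"],
                },
                "items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {"product_id": {"type": "string"}, "price": {"type": "number"}},
                        "required": ["product_id", "price"],
                    },
                },
            },
            "required": ["customer", "items"],
        }
    },
}

call = {"order": {"customer": {"name": "Ada"}, "items": [{"product_id": "p1"}]}}
paths = [" -> ".join(str(part) for part in p) for p in find_missing(order_schema, call)]
print(paths)  # ['order -> customer -> email', 'order -> items -> 0 -> price']
```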

References and Definitions

🐍schema_definitions.py

# Define reusable schemas under "$defs" and point to them with "$ref"
# (Draft 7 and earlier named this keyword "definitions")
schema = {
    "$defs": {
        "Address": {
            "type": "object",
            "properties": {
                "street": {"type": "string"},
                "city": {"type": "string"},
                "state": {"type": "string"},
                # US ZIP or ZIP+4; raw string keeps the backslashes intact
                "zip": {"type": "string", "pattern": r"^\d{5}(-\d{4})?$"},
                "country": {"type": "string"}
            },
            "required": ["street", "city", "zip", "country"]
        },
        "ContactInfo": {
            "type": "object",
            "properties": {
                "email": {"type": "string", "format": "email"},
                "phone": {"type": "string"}
            },
            "required": ["email"]
        }
    },

    "type": "object",
    "properties": {
        "shipping_address": {"$ref": "#/$defs/Address"},
        "billing_address": {"$ref": "#/$defs/Address"},
        "contact": {"$ref": "#/$defs/ContactInfo"}
    }
}
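
A `$ref` such as `#/$defs/Address` is a JSON Pointer into the same document: each segment after `#/` is a key to follow from the root. Here is a minimal standard-library sketch of that lookup. `resolve_local_ref` is a hypothetical helper that handles only same-document references into nested dicts; validation libraries do this for you and also support remote references.

```python
def resolve_local_ref(root, ref):
    """Resolve a same-document reference such as '#/$defs/Address'."""
    if not ref.startswith("#/"):
        raise ValueError(f"Only local references are handled here: {ref!r}")
    node = root
    for token in ref[2:].split("/"):
        # JSON Pointer escapes: ~1 means '/', ~0 means '~' (decode ~1 first)
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]
    return node


root = {
    "$defs": {
        "Address": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        }
    },
    "type": "object",
    "properties": {
        "shipping_address": {"$ref": "#/$defs/Address"},
        "billing_address": {"$ref": "#/$defs/Address"},
    },
}

target = resolve_local_ref(root, root["properties"]["shipping_address"]["$ref"])
print(target["required"])  # ['city']
```

Both address properties resolve to the same definition, so a change to `Address` applies everywhere it is referenced.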

Implementing Validation

Here's how to validate tool calls in your agent:

🐍validation_implementation.py

import jsonschema
from jsonschema import Draft7Validator, ValidationError
from dataclasses import dataclass


@dataclass
class ValidationResult:
    """Result of parameter validation."""
    is_valid: bool
    errors: list[str]
    sanitized_params: dict | None


class ToolValidator:
    """Validate tool parameters against schemas."""

    def __init__(self):
        self.validators: dict[str, Draft7Validator] = {}

    def register_tool(self, name: str, schema: dict):
        """Register a tool schema for validation."""
        # Without a format checker, "format" keywords are not enforced
        self.validators[name] = Draft7Validator(
            schema,
            format_checker=jsonschema.FormatChecker()
        )

    def validate(self, tool_name: str, params: dict) -> ValidationResult:
        """Validate parameters for a tool."""
        if tool_name not in self.validators:
            return ValidationResult(
                is_valid=False,
                errors=[f"Unknown tool: {tool_name}"],
                sanitized_params=None
            )

        validator = self.validators[tool_name]

        # Collect all validation errors, not just the first one
        errors = [
            self._format_error(error)
            for error in validator.iter_errors(params)
        ]

        if errors:
            return ValidationResult(
                is_valid=False,
                errors=errors,
                sanitized_params=None
            )

        # Sanitize and return
        sanitized = self._sanitize_params(params, validator.schema)

        return ValidationResult(
            is_valid=True,
            errors=[],
            sanitized_params=sanitized
        )

    def _format_error(self, error: ValidationError) -> str:
        """Format validation error for LLM understanding."""
        path = " -> ".join(str(p) for p in error.absolute_path)
        if path:
            return f"Parameter '{path}': {error.message}"
        return error.message

    def _sanitize_params(self, params: dict, schema: dict) -> dict:
        """Keep declared parameters and fill in top-level defaults.

        No type coercion happens here, and keys that the schema does not
        declare are dropped.
        """
        sanitized = {}
        properties = schema.get("properties", {})

        for key, prop_schema in properties.items():
            if key in params:
                sanitized[key] = params[key]
            elif "default" in prop_schema:
                sanitized[key] = prop_schema["default"]

        return sanitized


# Usage
validator = ToolValidator()

validator.register_tool("send_email", {
    "type": "object",
    "properties": {
        "to": {"type": "string", "format": "email"},
        "subject": {"type": "string", "minLength": 1, "maxLength": 200},
        "body": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "normal", "high"], "default": "normal"}
    },
    "required": ["to", "subject", "body"]
})

# Validate
result = validator.validate("send_email", {
    "to": "invalid-email",  # Bad
    "subject": "",           # Too short
    "body": "Hello"
})

print(result.is_valid)  # False
print(result.errors)
# Exact wording depends on the installed jsonschema version, for example:
# ["Parameter 'to': 'invalid-email' is not a 'email'",
#  "Parameter 'subject': '' is too short"]
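
One consequence of copying only declared keys during sanitization is that a misspelled parameter name disappears without any feedback. A possible complement, sketched here with the standard library only, is to report unknown keys back to the LLM. `describe_unknown_params` is a hypothetical helper, and using `difflib` for suggestions is an assumption of this sketch rather than part of the validator above.

```python
import difflib


def describe_unknown_params(params, schema):
    """Return one message per parameter that the schema does not declare."""
    declared = list(schema.get("properties", {}))
    messages = []
    for key in params:
        if key in declared:
            continue
        # Suggest the closest declared name, if any is similar enough
        suggestion = difflib.get_close_matches(key, declared, n=1)
        if suggestion:
            messages.append(f"Unknown parameter '{key}'. Did you mean '{suggestion[0]}'?")
        else:
            messages.append(f"Unknown parameter '{key}'.")
    return messages


send_email_schema = {
    "type": "object",
    "properties": {
        "to": {"type": "string"},
        "subject": {"type": "string"},
        "body": {"type": "string"},
        "priority": {"type": "string"},
    },
}

call = {"to": "user@example.com", "subjct": "Hi", "body": "Hello", "zzz": 1}
for message in describe_unknown_params(call, send_email_schema):
    print(message)
# Unknown parameter 'subjct'. Did you mean 'subject'?
# Unknown parameter 'zzz'.
```

Alternatively, setting `"additionalProperties": False` in the schema makes the validator itself reject undeclared keys, at the cost of a less specific message.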

Helpful Error Messages

Error messages should help the LLM fix the problem:

🐍helpful_errors.py

class ToolErrorFormatter:
    """Format validation errors for LLM understanding."""

    def format_for_llm(self, tool_name: str, errors: list[str]) -> str:
        """Create an error message the LLM can act on."""

        message = f"""Tool call to '{tool_name}' failed validation.

ERRORS:
{chr(10).join(f"- {e}" for e in errors)}

Please fix these issues and try again. Make sure:
1. All required parameters are provided
2. Values match the expected types and formats
3. Strings match any required patterns or enums
4. Numbers are within the specified range
"""
        return message.strip()

    def format_with_schema_hint(
        self,
        tool_name: str,
        errors: list[str],
        schema: dict
    ) -> str:
        """Include schema hints to help LLM correct errors."""

        properties = schema.get("properties", {})
        required = schema.get("required", [])

        # One line per parameter: name, required marker, type, enum, format
        hints = []
        for prop, spec in properties.items():
            hint = f"  - {prop}"
            if prop in required:
                hint += " (required)"
            if "type" in spec:
                hint += f": {spec['type']}"
            if "enum" in spec:
                hint += f", one of {spec['enum']}"
            if "format" in spec:
                hint += f", format: {spec['format']}"
            hints.append(hint)

        return f"""Tool call to '{tool_name}' failed:

ERRORS:
{chr(10).join(f"- {e}" for e in errors)}

EXPECTED PARAMETERS:
{chr(10).join(hints)}

Please correct the parameters and try again."""


# Example usage in agent loop
async def execute_tool_call(tool_call, tools, validator, formatter):
    """Execute tool call with validation.

    `tools` maps tool names to async callables; it is passed in explicitly
    so the function does not depend on a global registry.
    """

    result = validator.validate(tool_call.name, tool_call.arguments)

    if not result.is_valid:
        # Return error as observation for LLM to learn from
        return {
            "success": False,
            "error": formatter.format_for_llm(tool_call.name, result.errors)
        }

    # Execute with validated params
    return await tools[tool_call.name](**result.sanitized_params)

Errors as Learning

When validation fails, return the error as an observation. LLMs can often self-correct when given clear error messages.
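
In practice this becomes a bounded retry loop: validate, feed the errors back, and let the model try again a limited number of times. The sketch below shows the shape of that loop using the standard library only. `run_with_retries`, `fake_model`, and `validate_email_params` are all hypothetical stand-ins; in a real agent the first would call your LLM and the last would be your schema validator.

```python
def run_with_retries(propose_call, validate, max_attempts=3):
    """Request a tool call, feeding validation errors back until it passes or attempts run out."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        params = propose_call(feedback)
        errors = validate(params)
        if not errors:
            return {"success": True, "params": params, "attempts": attempt}
        # The error text becomes the observation for the next attempt
        feedback = "Tool call failed validation:\n" + "\n".join(f"- {e}" for e in errors)
    return {"success": False, "error": feedback, "attempts": max_attempts}


def fake_model(feedback):
    """Stand-in for an LLM: corrects its call once it has seen feedback."""
    if feedback is None:
        return {"to": "invalid-email", "subject": "Hi"}
    return {"to": "user@example.com", "subject": "Hi"}


def validate_email_params(params):
    """Stand-in validator: only checks that 'to' contains an '@'."""
    if "@" in params.get("to", ""):
        return []
    return ["Parameter 'to': not a valid email address"]


outcome = run_with_retries(fake_model, validate_email_params)
print(outcome["success"], outcome["attempts"])  # True 2
```

Capping the attempts matters: a model that keeps producing the same invalid call would otherwise loop forever and burn tokens.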

Pydantic Integration

Pydantic provides a cleaner way to define and validate tool schemas:

🐍pydantic_tools.py

from enum import Enum

# This example targets Pydantic v2. EmailStr needs the optional
# email-validator package.
from pydantic import (
    BaseModel,
    ConfigDict,
    EmailStr,
    Field,
    ValidationError,
    field_validator,
)


class Priority(str, Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"


class SendEmailParams(BaseModel):
    """Parameters for send_email tool."""

    # Store enum members as their plain string values
    model_config = ConfigDict(use_enum_values=True)

    to: EmailStr = Field(
        description="Recipient email address"
    )
    subject: str = Field(
        min_length=1,
        max_length=200,
        description="Email subject line"
    )
    body: str = Field(
        description="Email body content"
    )
    priority: Priority = Field(
        default=Priority.NORMAL,
        validate_default=True,  # so use_enum_values also applies to the default
        description="Email priority level"
    )
    cc: list[EmailStr] = Field(
        default_factory=list,
        max_length=10,
        description="CC recipients (max 10)"
    )

    @field_validator("subject")
    @classmethod
    def subject_not_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("Subject cannot be empty or whitespace only")
        return v.strip()


class FileSearchParams(BaseModel):
    """Parameters for file_search tool."""

    query: str = Field(
        min_length=1,
        description="Search query"
    )
    path: str = Field(
        default=".",
        description="Directory to search in"
    )
    file_pattern: str = Field(
        default="*",
        description="Glob pattern for files"
    )
    max_results: int = Field(
        default=20,
        ge=1,
        le=100,
        description="Maximum results (1-100)"
    )
    case_sensitive: bool = Field(
        default=False,
        description="Case-sensitive search"
    )


def create_tool_from_model(name: str, model: type[BaseModel], fn):
    """Create a tool definition from a Pydantic model."""

    # Pass the whole schema through: nested models and enums are emitted
    # under "$defs" and referenced with "$ref", so copying only
    # "properties" and "required" would leave dangling references.
    schema = model.model_json_schema()

    return {
        "name": name,
        "description": model.__doc__ or f"Execute {name}",
        "input_schema": schema,
        "function": fn,
        "validator": model
    }


# Validation with Pydantic
async def execute_with_pydantic(tool_call, tools_registry):
    """Execute tool call with Pydantic validation."""

    tool = tools_registry[tool_call.name]

    try:
        # Pydantic validates and coerces
        validated = tool["validator"](**tool_call.arguments)

        # Execute with validated params
        result = await tool["function"](**validated.model_dump())
        return {"success": True, "result": result}

    except ValidationError as e:
        return {
            "success": False,
            "error": format_pydantic_errors(e)
        }


def format_pydantic_errors(error: ValidationError) -> str:
    """Format Pydantic validation errors for LLM."""
    messages = []
    for e in error.errors():
        # "loc" is a tuple of keys and indexes leading to the bad value
        loc = " -> ".join(str(part) for part in e["loc"])
        messages.append(f"- {loc}: {e['msg']}")

    return f"""Validation failed:
{chr(10).join(messages)}

Please correct these parameters and try again."""

Auto-generating JSON Schema

🐍auto_schema.py

import json

# Pydantic automatically generates JSON Schema
# (SendEmailParams is the model defined in pydantic_tools.py)
schema = SendEmailParams.model_json_schema()
print(json.dumps(schema, indent=2))

# Output (abridged; key order may differ between Pydantic versions):
# {
#   "properties": {
#     "to": {
#       "format": "email",
#       "title": "To",
#       "type": "string",
#       "description": "Recipient email address"
#     },
#     "subject": {
#       "maxLength": 200,
#       "minLength": 1,
#       "title": "Subject",
#       "type": "string",
#       "description": "Email subject line"
#     },
#     ...
#   },
#   "required": ["to", "subject", "body"],
#   "title": "SendEmailParams",
#   "type": "object"
# }
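
To demystify what "auto-generating" means, here is a deliberately small standard-library analogue built on dataclasses. It is not how Pydantic works internally, and it supports only four scalar types. `schema_from_dataclass` and the `FileSearch` class are hypothetical names for this illustration. The idea is the same, though: type hints become `"type"`, and fields without a default become `"required"`.

```python
import dataclasses
import typing

# Only these annotations are supported in this sketch
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_from_dataclass(cls):
    """Build a simple JSON Schema dict from a dataclass's fields and type hints."""
    hints = typing.get_type_hints(cls)
    properties = {}
    required = []
    for field in dataclasses.fields(cls):
        prop = {"type": _JSON_TYPES[hints[field.name]]}
        if field.default is not dataclasses.MISSING:
            prop["default"] = field.default
        elif field.default_factory is dataclasses.MISSING:
            # No default and no factory: the caller must supply this field
            required.append(field.name)
        properties[field.name] = prop
    return {
        "type": "object",
        "title": cls.__name__,
        "properties": properties,
        "required": required,
    }


@dataclasses.dataclass
class FileSearch:
    query: str
    path: str = "."
    max_results: int = 20
    case_sensitive: bool = False


schema = schema_from_dataclass(FileSearch)
print(schema["required"])                   # ['query']
print(schema["properties"]["max_results"])  # {'type': 'integer', 'default': 20}
```

Pydantic goes much further (constraints, nested models, enums, descriptions), which is why it is the better choice once your tools have more than a few simple parameters.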

Summary

Tool schemas and validation:

JSON Schema: The standard for defining tool parameter structures
Type constraints: String patterns, number ranges, array limits
Advanced patterns: Conditional requirements, nested objects, references
Validation: Catch errors early, before tool execution
Helpful errors: Format errors so LLMs can self-correct
Pydantic: Cleaner Python-native approach to schemas

Next: Let's implement a complete tool execution system with proper error handling.