Introduction
Gemini offers "thinking" model variants that can reason for extended periods before responding. Unlike standard models that begin generating tokens immediately, thinking models first invest compute in internal reasoning, similar to OpenAI's o3.
The Thinking Trade-off: More thinking time means better answers on complex problems, but slower responses and higher costs. The key is matching reasoning depth to task complexity.
Gemini Thinking Models
Gemini offers thinking variants with controllable reasoning:
| Model | Max Output | Thinking Style | Best For |
|---|---|---|---|
| gemini-2.0-flash-thinking | 32K tokens | Visible thinking | Complex reasoning, debugging |
| gemini-2.5-pro | 8K tokens | Standard | General tasks |
| gemini-2.5-flash | 8K tokens | Fast | High-volume, simple tasks |
Thinking Model Output
thinking_model.py

```python
import google.generativeai as genai

# Use thinking model
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")

response = model.generate_content(
    "Solve this step by step: A train leaves Station A at 9:00 AM "
    "traveling at 60 mph. Another train leaves Station B at 10:00 AM "
    "traveling at 80 mph toward Station A. If the stations are 280 miles "
    "apart, when and where do the trains meet?"
)

# Thinking models show their reasoning as separate parts
for part in response.parts:
    if hasattr(part, "thought") and part.thought:
        print("THINKING:")
        print(part.text)
    else:
        print("\nANSWER:")
        print(part.text)
```

Visible vs Hidden Thinking
thinking_comparison.txt

```text
Standard Model Response:
"The trains meet at 11:30 AM, 150 miles from Station A."

Thinking Model Response:
[THINKING]
Let me work through this step by step.

First, let me set up the problem:
- Train A: leaves 9:00 AM, speed 60 mph
- Train B: leaves 10:00 AM, speed 80 mph
- Distance between stations: 280 miles

When Train B starts at 10:00 AM, Train A has already traveled:
- 1 hour × 60 mph = 60 miles
- Remaining distance: 280 - 60 = 220 miles

Now both trains are moving toward each other:
- Combined speed: 60 + 80 = 140 mph
- Time to meet: 220 miles ÷ 140 mph ≈ 1.57 hours ≈ 1 hour 34 minutes

From Train B's departure (10:00 AM):
- Meeting time: 10:00 AM + 1:34 = 11:34 AM

Position from Station A:
- Train A traveled: 2.57 hours × 60 mph ≈ 154 miles

Let me verify: Train B traveled 1.57 hours × 80 mph ≈ 126 miles
Station A to meeting point: 154 miles
Station B to meeting point: 126 miles
Total: 154 + 126 = 280 miles ✓

[ANSWER]
The trains meet at approximately 11:34 AM, about 154 miles from Station A.
```

Thinking Budget Control
You can control how much the model thinks:
budget_control.py

```python
import google.generativeai as genai

# Configure thinking model
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")


# Control thinking depth via a system-style instruction
def generate_with_budget(prompt: str, thinking_budget: str) -> str:
    """Generate with specified thinking budget.

    thinking_budget options:
    - 'none': No extended thinking
    - 'low': Quick thinking (~1-2s)
    - 'medium': Moderate thinking (~5-10s)
    - 'high': Deep thinking (~30s+)
    """

    # Thinking budget is controlled via the leading instruction
    system_prompts = {
        "none": "Respond directly without extended reasoning.",
        "low": "Think briefly before responding.",
        "medium": "Think through the problem carefully before responding.",
        "high": "Think very deeply about all aspects before responding. "
                "Consider multiple approaches and verify your reasoning.",
    }

    response = model.generate_content([
        system_prompts[thinking_budget],
        prompt,
    ])

    return response.text


# Usage examples
# Quick response for simple task
quick = generate_with_budget("What is 2 + 2?", "none")

# Deep thinking for complex task
deep = generate_with_budget(
    "Design a distributed caching system for a global e-commerce platform.",
    "high",
)
```

Dynamic Budget Allocation
dynamic_budget.py

```python
import google.generativeai as genai


class DynamicThinkingAgent:
    """Agent that dynamically allocates thinking budget."""

    def __init__(self):
        self.thinking_model = genai.GenerativeModel(
            "gemini-2.0-flash-thinking-exp-01-21"
        )
        self.fast_model = genai.GenerativeModel("gemini-2.5-flash")

    def process(self, task: str) -> str:
        """Process task with appropriate thinking level."""
        complexity = self.assess_complexity(task)

        if complexity == "simple":
            # Use fast model, no thinking
            return self.fast_model.generate_content(task).text

        elif complexity == "moderate":
            # Use thinking model with moderate budget
            return self.thinking_model.generate_content([
                "Think through this carefully.",
                task,
            ]).text

        else:  # complex
            # Use thinking model with maximum budget
            return self.thinking_model.generate_content([
                "This is a complex problem. Think very deeply. "
                "Consider multiple approaches. Verify your reasoning. "
                "Take your time.",
                task,
            ]).text

    def assess_complexity(self, task: str) -> str:
        """Assess task complexity quickly."""
        # Use fast model to assess
        assessment = self.fast_model.generate_content(
            f"Rate this task's complexity as 'simple', 'moderate', or 'complex'. "
            f"Only respond with one word.\n\nTask: {task}"
        )

        result = assessment.text.strip().lower()
        if result not in ["simple", "moderate", "complex"]:
            return "moderate"  # Default
        return result
```

When to Use Extended Thinking
| Task Type | Thinking Level | Reasoning |
|---|---|---|
| Simple queries | None | Fast response more important |
| Code formatting | None | Mechanical task |
| Bug fixing | Medium | Need to trace logic |
| Architecture design | High | Many tradeoffs to consider |
| Algorithm optimization | High | Complex analysis needed |
| Security review | High | Must be thorough |
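The table above can be encoded as a simple lookup. The sketch below is illustrative only; the category labels are hypothetical strings, not part of any Gemini API.

```python
# Thinking-level lookup mirroring the table above.
# Category names are hypothetical labels, not API values.
THINKING_LEVELS = {
    "simple_query": "none",
    "code_formatting": "none",
    "bug_fixing": "medium",
    "architecture_design": "high",
    "algorithm_optimization": "high",
    "security_review": "high",
}


def thinking_level_for(category: str) -> str:
    """Return the suggested thinking level, defaulting to 'medium'."""
    return THINKING_LEVELS.get(category, "medium")


print(thinking_level_for("security_review"))  # high
print(thinking_level_for("unknown_task"))     # medium
```

Defaulting unknown categories to a medium level keeps the router safe: an unclassified task gets some reasoning rather than none.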
Indicators for Extended Thinking
thinking_indicators.py

```python
class ThinkingIndicators:
    """Determine when extended thinking is beneficial."""

    HIGH_THINKING_KEYWORDS = [
        "design", "architect", "optimize", "security",
        "refactor", "debug", "analyze", "compare",
        "trade-off", "best approach", "pros and cons",
    ]

    LOW_THINKING_KEYWORDS = [
        "format", "rename", "simple", "quick",
        "typo", "comment", "log", "print",
    ]

    def should_think_deeply(self, task: str) -> bool:
        """Determine if task warrants deep thinking."""
        task_lower = task.lower()

        # Check for high-thinking indicators
        for keyword in self.HIGH_THINKING_KEYWORDS:
            if keyword in task_lower:
                return True

        # Check for low-thinking indicators
        for keyword in self.LOW_THINKING_KEYWORDS:
            if keyword in task_lower:
                return False

        # Check task length/complexity
        if len(task) > 500:
            return True

        # Default to moderate
        return False

    def estimate_thinking_time(self, task: str) -> int:
        """Estimate thinking time in seconds."""
        if not self.should_think_deeply(task):
            return 0

        # Base time
        seconds = 5

        # Add time for complexity indicators
        if "security" in task.lower():
            seconds += 20
        if "architecture" in task.lower():
            seconds += 15
        if any(word in task.lower() for word in ["all", "every", "comprehensive"]):
            seconds += 10

        return min(seconds, 60)  # Cap at 60 seconds
```

Start Fast, Think Deep When Needed
Default to fast models for most tasks. Escalate to thinking models when the task is genuinely complex or when fast attempts fail.
Implementation Patterns
Pattern 1: Tiered Processing
tiered_processing.py

```python
import google.generativeai as genai


class TieredAgent:
    """Agent with tiered processing levels."""

    def __init__(self):
        self.tiers = {
            "fast": genai.GenerativeModel("gemini-2.5-flash"),
            "balanced": genai.GenerativeModel("gemini-2.5-pro"),
            "thinking": genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21"),
        }

    def process(self, task: str) -> str:
        """Process through tiers until success."""

        # Try fast first
        result = self.try_tier("fast", task)
        if self.is_satisfactory(result, task):
            return result.text

        # Escalate to balanced
        result = self.try_tier("balanced", task)
        if self.is_satisfactory(result, task):
            return result.text

        # Full thinking for complex tasks
        result = self.try_tier("thinking", task, think_deeply=True)
        return result.text

    def try_tier(
        self,
        tier: str,
        task: str,
        think_deeply: bool = False,
    ):
        """Try processing at a specific tier."""
        model = self.tiers[tier]

        if think_deeply:
            return model.generate_content([
                "Think through this problem very carefully. "
                "Consider multiple approaches and verify your reasoning.",
                task,
            ])
        else:
            return model.generate_content(task)

    def is_satisfactory(self, result, task: str) -> bool:
        """Check if result is satisfactory."""
        # Quick validation
        if not result.text or len(result.text) < 50:
            return False

        # Check for uncertainty markers
        uncertainty_markers = [
            "I'm not sure",
            "I don't know",
            "This is difficult",
            "I need more information",
        ]

        for marker in uncertainty_markers:
            if marker.lower() in result.text.lower():
                return False

        return True
```

Pattern 2: Thinking with Verification
thinking_verification.py

```python
import google.generativeai as genai


class VerifiedThinkingAgent:
    """Agent that verifies its own thinking."""

    def __init__(self):
        self.model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")
        self.verifier = genai.GenerativeModel("gemini-2.5-flash")

    def solve_with_verification(self, problem: str) -> dict:
        """Solve problem with thinking and verification."""

        # Step 1: Solve with thinking
        solution = self.model.generate_content([
            "Think through this problem carefully and show your reasoning.",
            problem,
        ])

        # Extract thinking and answer
        thinking, answer = self.extract_parts(solution)

        # Step 2: Verify the answer
        verification = self.verifier.generate_content(
            f"Problem: {problem}\n\n"
            f"Proposed answer: {answer}\n\n"
            "Is this answer correct? If not, what is the correct answer?"
        )

        # Step 3: Check verification result
        verdict = verification.text.lower()
        if "correct" in verdict and "not correct" not in verdict:
            return {
                "answer": answer,
                "thinking": thinking,
                "verified": True,
            }
        else:
            # Re-solve with the verification feedback
            return self.resolve_with_feedback(problem, answer, verification.text)

    def extract_parts(self, response) -> tuple[str, str]:
        """Extract thinking and answer from response."""
        thinking = ""
        answer = ""

        for part in response.parts:
            if hasattr(part, "thought") and part.thought:
                thinking = part.text
            else:
                answer = part.text

        return thinking, answer

    def resolve_with_feedback(
        self,
        problem: str,
        original_answer: str,
        feedback: str,
    ) -> dict:
        """Re-solve with verification feedback."""
        resolution = self.model.generate_content([
            "Think even more carefully about this problem.",
            f"Problem: {problem}",
            f"Your previous answer was: {original_answer}",
            f"Verification feedback: {feedback}",
            "Please reconsider and provide the correct answer.",
        ])

        thinking, answer = self.extract_parts(resolution)

        return {
            "answer": answer,
            "thinking": thinking,
            "verified": False,  # Needs human review
            "feedback": feedback,
        }
```

Pattern 3: Adaptive Reasoning
adaptive_reasoning.py

```python
import json

import google.generativeai as genai


class AdaptiveReasoningAgent:
    """Agent that adapts reasoning depth based on task."""

    def __init__(self):
        self.fast = genai.GenerativeModel("gemini-2.5-flash")
        self.thinking = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")

    def process(self, task: str) -> str:
        """Process with adaptive reasoning."""

        # Quick assessment
        assessment = self.assess_task(task)

        if assessment["needs_thinking"]:
            # Use thinking model
            prompt = self.build_thinking_prompt(
                task,
                depth=assessment["depth"],
            )
            return self.thinking.generate_content(prompt).text
        else:
            # Use fast model
            return self.fast.generate_content(task).text

    def assess_task(self, task: str) -> dict:
        """Quickly assess task requirements."""
        # Use fast model to assess
        assessment = self.fast.generate_content(
            f"Analyze this task. Does it require deep reasoning? "
            f"If so, what depth (1-5)?\n\nTask: {task}\n\n"
            'Respond with JSON: {"needs_thinking": bool, "depth": int}'
        )

        try:
            return json.loads(assessment.text)
        except json.JSONDecodeError:
            return {"needs_thinking": True, "depth": 3}

    def build_thinking_prompt(self, task: str, depth: int) -> str:
        """Build prompt with appropriate thinking instructions."""
        depth_instructions = {
            1: "Think briefly before responding.",
            2: "Consider the main factors before responding.",
            3: "Think carefully about this problem. Consider key factors.",
            4: "Think deeply. Consider multiple approaches and tradeoffs.",
            5: "This requires maximum reasoning. Think exhaustively. "
               "Consider all approaches, verify reasoning, check edge cases.",
        }

        return f"{depth_instructions.get(depth, depth_instructions[3])}\n\n{task}"
```

Monitor Thinking Cost
Extended thinking increases both latency and cost. Monitor usage and set budgets to avoid runaway costs on complex tasks.
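One way to do this is a small accounting helper. The sketch below is hypothetical: it only aggregates numbers you pass in (for example, token counts read from a response's usage metadata, if your SDK version exposes them) and makes no API calls itself.

```python
class ThinkingCostTracker:
    """Aggregate latency and token usage across model calls (hypothetical helper)."""

    def __init__(self, max_total_tokens: int = 500_000):
        self.max_total_tokens = max_total_tokens  # budget before we stop escalating
        self.total_tokens = 0
        self.calls = []

    def record(self, model_name: str, latency_s: float,
               prompt_tokens: int, output_tokens: int) -> None:
        """Record one model call's cost figures."""
        self.total_tokens += prompt_tokens + output_tokens
        self.calls.append({
            "model": model_name,
            "latency_s": latency_s,
            "tokens": prompt_tokens + output_tokens,
        })

    def within_budget(self) -> bool:
        """True while cumulative token usage is under the budget."""
        return self.total_tokens < self.max_total_tokens


tracker = ThinkingCostTracker(max_total_tokens=1_000)
tracker.record("gemini-2.0-flash-thinking", latency_s=12.4,
               prompt_tokens=300, output_tokens=450)
print(tracker.within_budget())  # True: 750 < 1000
```

Checking `within_budget()` before escalating to a thinking tier gives agents a hard stop against runaway spend.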
Summary
Controllable reasoning depth in Gemini:
- Thinking models: Extended reasoning before responding
- Budget control: Match thinking to task complexity
- Tiered processing: Fast first, escalate when needed
- Verification: Validate thinking results
- Adaptive: Dynamically adjust reasoning depth
Next: Let's explore Gemini Code Assist's Agent Mode and how Google integrates AI agents into the development workflow.