Introduction
Gemini offers "thinking" model variants that can reason for extended periods before responding. Unlike standard models that begin generating tokens immediately, thinking models first invest compute in internal reasoning, similar to OpenAI's o3.
The Thinking Trade-off: More thinking time means better answers on complex problems, but slower responses and higher costs. The key is matching reasoning depth to task complexity.
Gemini Thinking Models
Gemini offers thinking variants with controllable reasoning:
| Model | Max Output | Thinking Style | Best For |
|---|---|---|---|
| gemini-2.0-flash-thinking | 32K tokens | Visible thinking | Complex reasoning, debugging |
| gemini-2.5-pro | 8K tokens | Standard | General tasks |
| gemini-2.5-flash | 8K tokens | Fast | High-volume, simple tasks |
Thinking Model Output
thinking_model.py

```python
import google.generativeai as genai

# Use thinking model
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")

response = model.generate_content(
    "Solve this step by step: A train leaves Station A at 9:00 AM "
    "traveling at 60 mph. Another train leaves Station B at 10:00 AM "
    "traveling at 80 mph toward Station A. If the stations are 280 miles "
    "apart, when and where do the trains meet?"
)

# Thinking models show their reasoning as separate parts
for part in response.parts:
    if hasattr(part, "thought") and part.thought:
        print("THINKING:")
        print(part.text)
    else:
        print("\nANSWER:")
        print(part.text)
```

Visible vs Hidden Thinking
thinking_comparison.txt

```text
Standard Model Response:
"The trains meet at 11:30 AM, 150 miles from Station A."

Thinking Model Response:
[THINKING]
Let me work through this step by step.

First, let me set up the problem:
- Train A: leaves 9:00 AM, speed 60 mph
- Train B: leaves 10:00 AM, speed 80 mph
- Distance between stations: 280 miles

When Train B starts at 10:00 AM, Train A has already traveled:
- 1 hour × 60 mph = 60 miles
- Remaining distance: 280 - 60 = 220 miles

Now both trains are moving toward each other:
- Combined speed: 60 + 80 = 140 mph
- Time to meet: 220 miles ÷ 140 mph ≈ 1.57 hours ≈ 1 hour 34 minutes

From Train B's departure (10:00 AM):
- Meeting time: 10:00 AM + 1:34 = 11:34 AM

Position from Station A:
- Train A traveled: 2.57 hours × 60 mph ≈ 154 miles

Let me verify: Train B traveled 1.57 hours × 80 mph ≈ 126 miles
Station A to meeting point: 154 miles
Station B to meeting point: 126 miles
Total: 154 + 126 = 280 miles ✓

[ANSWER]
The trains meet at approximately 11:34 AM, about 154 miles from Station A.
```

Thinking Budget Control
You can control how much the model thinks:
budget_control.py

```python
import google.generativeai as genai

# Configure thinking model
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")


# Control thinking depth via a system-style instruction
def generate_with_budget(prompt: str, thinking_budget: str) -> str:
    """Generate with specified thinking budget.

    thinking_budget options:
    - 'none': No extended thinking
    - 'low': Quick thinking (~1-2s)
    - 'medium': Moderate thinking (~5-10s)
    - 'high': Deep thinking (~30s+)
    """

    # Thinking budget is controlled via the leading instruction
    system_prompts = {
        "none": "Respond directly without extended reasoning.",
        "low": "Think briefly before responding.",
        "medium": "Think through the problem carefully before responding.",
        "high": "Think very deeply about all aspects before responding. "
                "Consider multiple approaches and verify your reasoning.",
    }

    response = model.generate_content([
        system_prompts[thinking_budget],
        prompt,
    ])

    return response.text


# Usage examples
# Quick response for simple task
quick = generate_with_budget("What is 2 + 2?", "none")

# Deep thinking for complex task
deep = generate_with_budget(
    "Design a distributed caching system for a global e-commerce platform.",
    "high",
)
```

Dynamic Budget Allocation
dynamic_budget.py

```python
import google.generativeai as genai


class DynamicThinkingAgent:
    """Agent that dynamically allocates thinking budget."""

    def __init__(self):
        self.thinking_model = genai.GenerativeModel(
            "gemini-2.0-flash-thinking-exp-01-21"
        )
        self.fast_model = genai.GenerativeModel("gemini-2.5-flash")

    def process(self, task: str) -> str:
        """Process task with appropriate thinking level."""
        complexity = self.assess_complexity(task)

        if complexity == "simple":
            # Use fast model, no thinking
            return self.fast_model.generate_content(task).text

        elif complexity == "moderate":
            # Use thinking model with moderate budget
            return self.thinking_model.generate_content([
                "Think through this carefully.",
                task,
            ]).text

        else:  # complex
            # Use thinking model with maximum budget
            return self.thinking_model.generate_content([
                "This is a complex problem. Think very deeply. "
                "Consider multiple approaches. Verify your reasoning. "
                "Take your time.",
                task,
            ]).text

    def assess_complexity(self, task: str) -> str:
        """Assess task complexity quickly."""
        # Use fast model to assess
        assessment = self.fast_model.generate_content(
            f"Rate this task's complexity as 'simple', 'moderate', or 'complex'. "
            f"Only respond with one word.\n\nTask: {task}"
        )

        result = assessment.text.strip().lower()
        if result not in ["simple", "moderate", "complex"]:
            return "moderate"  # Default
        return result
```

When to Use Extended Thinking
| Task Type | Thinking Level | Reasoning |
|---|---|---|
| Simple queries | None | Fast response more important |
| Code formatting | None | Mechanical task |
| Bug fixing | Medium | Need to trace logic |
| Architecture design | High | Many tradeoffs to consider |
| Algorithm optimization | High | Complex analysis needed |
| Security review | High | Must be thorough |
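The table above can be encoded as a simple lookup. The sketch below is illustrative only; the category labels are hypothetical strings, not part of any Gemini API.

```python
# Thinking-level lookup mirroring the table above.
# Category names are hypothetical labels, not API values.
THINKING_LEVELS = {
    "simple_query": "none",
    "code_formatting": "none",
    "bug_fixing": "medium",
    "architecture_design": "high",
    "algorithm_optimization": "high",
    "security_review": "high",
}


def thinking_level_for(category: str) -> str:
    """Return the suggested thinking level, defaulting to 'medium'."""
    return THINKING_LEVELS.get(category, "medium")


print(thinking_level_for("security_review"))  # high
print(thinking_level_for("unknown_task"))     # medium
```

Defaulting unknown categories to a medium level keeps the router safe: an unclassified task gets some reasoning rather than none.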
Indicators for Extended Thinking
thinking_indicators.py

```python
class ThinkingIndicators:
    """Determine when extended thinking is beneficial."""

    HIGH_THINKING_KEYWORDS = [
        "design", "architect", "optimize", "security",
        "refactor", "debug", "analyze", "compare",
        "trade-off", "best approach", "pros and cons",
    ]

    LOW_THINKING_KEYWORDS = [
        "format", "rename", "simple", "quick",
        "typo", "comment", "log", "print",
    ]

    def should_think_deeply(self, task: str) -> bool:
        """Determine if task warrants deep thinking."""
        task_lower = task.lower()

        # Check for high-thinking indicators
        for keyword in self.HIGH_THINKING_KEYWORDS:
            if keyword in task_lower:
                return True

        # Check for low-thinking indicators
        for keyword in self.LOW_THINKING_KEYWORDS:
            if keyword in task_lower:
                return False

        # Check task length/complexity
        if len(task) > 500:
            return True

        # Default to moderate
        return False

    def estimate_thinking_time(self, task: str) -> int:
        """Estimate thinking time in seconds."""
        if not self.should_think_deeply(task):
            return 0

        # Base time
        seconds = 5

        # Add time for complexity indicators
        if "security" in task.lower():
            seconds += 20
        if "architecture" in task.lower():
            seconds += 15
        if any(word in task.lower() for word in ["all", "every", "comprehensive"]):
            seconds += 10

        return min(seconds, 60)  # Cap at 60 seconds
```

Start Fast, Think Deep When Needed
Default to fast models for most tasks. Escalate to thinking models when the task is genuinely complex or when fast attempts fail.
Implementation Patterns
Pattern 1: Tiered Processing
tiered_processing.py

```python
import google.generativeai as genai


class TieredAgent:
    """Agent with tiered processing levels."""

    def __init__(self):
        self.tiers = {
            "fast": genai.GenerativeModel("gemini-2.5-flash"),
            "balanced": genai.GenerativeModel("gemini-2.5-pro"),
            "thinking": genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21"),
        }

    def process(self, task: str) -> str:
        """Process through tiers until success."""

        # Try fast first
        result = self.try_tier("fast", task)
        if self.is_satisfactory(result, task):
            return result.text

        # Escalate to balanced
        result = self.try_tier("balanced", task)
        if self.is_satisfactory(result, task):
            return result.text

        # Full thinking for complex tasks
        result = self.try_tier("thinking", task, think_deeply=True)
        return result.text

    def try_tier(
        self,
        tier: str,
        task: str,
        think_deeply: bool = False,
    ):
        """Try processing at a specific tier."""
        model = self.tiers[tier]

        if think_deeply:
            return model.generate_content([
                "Think through this problem very carefully. "
                "Consider multiple approaches and verify your reasoning.",
                task,
            ])
        else:
            return model.generate_content(task)

    def is_satisfactory(self, result, task: str) -> bool:
        """Check if result is satisfactory."""
        # Quick validation
        if not result.text or len(result.text) < 50:
            return False

        # Check for uncertainty markers
        uncertainty_markers = [
            "I'm not sure",
            "I don't know",
            "This is difficult",
            "I need more information",
        ]

        for marker in uncertainty_markers:
            if marker.lower() in result.text.lower():
                return False

        return True
```

Pattern 2: Thinking with Verification
thinking_verification.py

```python
import google.generativeai as genai


class VerifiedThinkingAgent:
    """Agent that verifies its own thinking."""

    def __init__(self):
        self.model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")
        self.verifier = genai.GenerativeModel("gemini-2.5-flash")

    def solve_with_verification(self, problem: str) -> dict:
        """Solve problem with thinking and verification."""

        # Step 1: Solve with thinking
        solution = self.model.generate_content([
            "Think through this problem carefully and show your reasoning.",
            problem,
        ])

        # Extract thinking and answer
        thinking, answer = self.extract_parts(solution)

        # Step 2: Verify the answer
        verification = self.verifier.generate_content(
            f"Problem: {problem}\n\n"
            f"Proposed answer: {answer}\n\n"
            "Is this answer correct? If not, what is the correct answer?"
        )

        # Step 3: Check verification result
        verdict = verification.text.lower()
        if "correct" in verdict and "not correct" not in verdict:
            return {
                "answer": answer,
                "thinking": thinking,
                "verified": True,
            }
        else:
            # Re-solve with the verification feedback
            return self.resolve_with_feedback(problem, answer, verification.text)

    def extract_parts(self, response) -> tuple[str, str]:
        """Extract thinking and answer from response."""
        thinking = ""
        answer = ""

        for part in response.parts:
            if hasattr(part, "thought") and part.thought:
                thinking = part.text
            else:
                answer = part.text

        return thinking, answer

    def resolve_with_feedback(
        self,
        problem: str,
        original_answer: str,
        feedback: str,
    ) -> dict:
        """Re-solve with verification feedback."""
        resolution = self.model.generate_content([
            "Think even more carefully about this problem.",
            f"Problem: {problem}",
            f"Your previous answer was: {original_answer}",
            f"Verification feedback: {feedback}",
            "Please reconsider and provide the correct answer.",
        ])

        thinking, answer = self.extract_parts(resolution)

        return {
            "answer": answer,
            "thinking": thinking,
            "verified": False,  # Needs human review
            "feedback": feedback,
        }
```

Pattern 3: Adaptive Reasoning
adaptive_reasoning.py

```python
import json

import google.generativeai as genai


class AdaptiveReasoningAgent:
    """Agent that adapts reasoning depth based on task."""

    def __init__(self):
        self.fast = genai.GenerativeModel("gemini-2.5-flash")
        self.thinking = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-01-21")

    def process(self, task: str) -> str:
        """Process with adaptive reasoning."""

        # Quick assessment
        assessment = self.assess_task(task)

        if assessment["needs_thinking"]:
            # Use thinking model
            prompt = self.build_thinking_prompt(
                task,
                depth=assessment["depth"],
            )
            return self.thinking.generate_content(prompt).text
        else:
            # Use fast model
            return self.fast.generate_content(task).text

    def assess_task(self, task: str) -> dict:
        """Quickly assess task requirements."""
        # Use fast model to assess
        assessment = self.fast.generate_content(
            f"Analyze this task. Does it require deep reasoning? "
            f"If so, what depth (1-5)?\n\nTask: {task}\n\n"
            'Respond with JSON: {"needs_thinking": bool, "depth": int}'
        )

        try:
            return json.loads(assessment.text)
        except json.JSONDecodeError:
            return {"needs_thinking": True, "depth": 3}

    def build_thinking_prompt(self, task: str, depth: int) -> str:
        """Build prompt with appropriate thinking instructions."""
        depth_instructions = {
            1: "Think briefly before responding.",
            2: "Consider the main factors before responding.",
            3: "Think carefully about this problem. Consider key factors.",
            4: "Think deeply. Consider multiple approaches and tradeoffs.",
            5: "This requires maximum reasoning. Think exhaustively. "
               "Consider all approaches, verify reasoning, check edge cases.",
        }

        return f"{depth_instructions.get(depth, depth_instructions[3])}\n\n{task}"
```

Monitor Thinking Cost
Extended thinking increases both latency and cost. Monitor usage and set budgets to avoid runaway costs on complex tasks.
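One way to do this is a small accounting helper. The sketch below is hypothetical: it only aggregates numbers you pass in (for example, token counts read from a response's usage metadata, if your SDK version exposes them) and makes no API calls itself.

```python
class ThinkingCostTracker:
    """Aggregate latency and token usage across model calls (hypothetical helper)."""

    def __init__(self, max_total_tokens: int = 500_000):
        self.max_total_tokens = max_total_tokens  # budget before we stop escalating
        self.total_tokens = 0
        self.calls = []

    def record(self, model_name: str, latency_s: float,
               prompt_tokens: int, output_tokens: int) -> None:
        """Record one model call's cost figures."""
        self.total_tokens += prompt_tokens + output_tokens
        self.calls.append({
            "model": model_name,
            "latency_s": latency_s,
            "tokens": prompt_tokens + output_tokens,
        })

    def within_budget(self) -> bool:
        """True while cumulative token usage is under the budget."""
        return self.total_tokens < self.max_total_tokens


tracker = ThinkingCostTracker(max_total_tokens=1_000)
tracker.record("gemini-2.0-flash-thinking", latency_s=12.4,
               prompt_tokens=300, output_tokens=450)
print(tracker.within_budget())  # True: 750 < 1000
```

Checking `within_budget()` before escalating to a thinking tier gives agents a hard stop against runaway spend.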
Summary
Controllable reasoning depth in Gemini:
- Thinking models: Extended reasoning before responding
- Budget control: Match thinking to task complexity
- Tiered processing: Fast first, escalate when needed
- Verification: Validate thinking results
- Adaptive: Dynamically adjust reasoning depth
Next: Let's explore Gemini Code Assist's Agent Mode and how Google integrates AI agents into the development workflow.