Gemini 2.5 Family: Pro & Ultra Cheatsheet
August 2025 Edition
Best Features to Master
Vast Context Window (Up to 2M Tokens)
Processes extraordinary amounts of information in a single conversation.
- Gemini 2.5 Ultra: Standard 2-million-token context window (~1.5 million words).
- Gemini 2.5 Pro: 1-million-token context window.
- Processes entire research papers, codebases, or multiple documents at once.
- Eliminates the need for complex chunking or RAG pipelines for many tasks.
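To check whether a set of documents is likely to fit in one of these windows, a rough estimate can be sketched as follows. It assumes the words-per-token ratio implied by the figures above (2M tokens for roughly 1.5M words) and counts whitespace-separated words, which is only an approximation of a real tokenizer. The function names and the 8,000-token output reserve are illustrative choices, not part of any Gemini API.

```python
import math

# Ratio implied by the cheatsheet: 2M tokens ~ 1.5M words (an approximation).
WORDS_PER_TOKEN = 0.75

# Context limits as listed in this cheatsheet, in tokens.
CONTEXT_LIMITS = {
    "Gemini 2.5 Pro": 1_000_000,
    "Gemini 2.5 Ultra": 2_000_000,
}


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count (not a real tokenizer)."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def models_that_fit(documents: list[str], reserve_for_output: int = 8_000) -> list[str]:
    """Return the models whose window holds all documents plus an output reserve.

    reserve_for_output is a hypothetical safety margin for the model's reply.
    """
    total = sum(estimate_tokens(doc) for doc in documents) + reserve_for_output
    return [name for name, limit in CONTEXT_LIMITS.items() if total <= limit]
```

For example, a single document of about 900,000 words is estimated at 1.2M tokens, so only the Ultra window would hold it under these assumptions.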
Autonomous Agents
Executes complex, multi-step tasks by creating and following its own plans.
- Give Gemini a high-level goal, and it will plan and execute the necessary sub-tasks.
- Can browse the web, execute code, and use other tools to achieve objectives.
- Integrates with Google Workspace for tasks like scheduling and email.
- Example: "Plan a 3-day trip to Tokyo, find flights under $1200, and create a daily itinerary."
Real-Time Multimodality (Project Astra)
Processes and comprehends live video and audio streams simultaneously.
- Analyzes the user's environment in real-time via a device's camera and microphone.
- Enables natural, conversational interaction about what's happening "right now."
- Can identify objects, explain processes, and answer questions about your live surroundings.
- Moves beyond file uploads to true real-world understanding.
Gemini Code
A specialized, fine-tuned model for superior coding performance.
- Builds complete, working applications from single prompts.
- Analyzes entire codebases to implement features across multiple files.
- Outperforms general models on benchmarks like Aider Polyglot and Codeforces.
- The recommended model for all development and technical problem-solving tasks.
Agents & Advanced Reasoning
Gemini has moved beyond simple "show-your-work" reasoning. The new **autonomous agent** framework is now the primary way it tackles complex problems: it drafts a plan, then uses tools such as web browsing, code execution, and Workspace integrations to carry out each step.
Benchmark Performance (Gemini 2.5 Ultra)
Benchmark | Description | Top Score |
---|---|---|
AIME 2025 | Advanced mathematics | 90.1% |
GPQA | Graduate-level scientific knowledge | 86.5% |
VideoVQA | Video question answering | 92.3% |
Aider Polyglot | Multi-file code editing | 72.4% |
Multimodal Processing
Unified Understanding: Text, Image, Audio & Live Video
Gemini excels at connecting information across multiple input formats. It can analyze a technical diagram in an image, relate it to written context from a document, and incorporate details from a real-time video explanation.
- File-based analysis: Upload documents, images, audio files, and video clips for deep analysis.
- Live-stream analysis: Use the Project Astra functionality to have Gemini "watch and listen" alongside you, providing real-time feedback and understanding of your environment.
Coding with Gemini Code
For development tasks, use the specialized Gemini Code model; it is the recommended choice for performance and accuracy.
Application Development
Creates comprehensive applications from high-level requirements, handling everything from frontend UI to backend logic and database schemas.
Codebase Analysis
Leverages the vast context window to analyze entire code repositories, identify bugs, suggest architectural improvements, or add new features across multiple files consistently.
Visual Programming
Creates visually engaging and interactive web interfaces, data visualizations, and even simple games directly from a text or image prompt.
Access Options & Pricing
Google One AI Premium: $20/month
- Full access to the flagship Gemini 2.5 Ultra model.
- Full access to autonomous agent capabilities.
- Integration into Google Workspace (Docs, Sheets, Gmail).
- Includes other Google One benefits like expanded storage.
Developer Access (API): Pay-as-you-go
- Access stable models via Google AI Studio and Vertex AI.
- Pro Model ID: gemini-2.5-pro-0725
- Ultra Model ID: gemini-2.5-ultra-0725
- Tiered pricing based on context size and modality.
API Pricing (Text)
Model | Context Size | Input Cost (per 1M tokens) | Output Cost (per 1M tokens) |
---|---|---|---|
Gemini 2.5 Pro | ≤1M tokens | $1.00 | $3.00 |
Gemini 2.5 Ultra | ≤1M tokens | $2.00 | $6.00 |
Gemini 2.5 Ultra | >1M to 2M tokens | $4.00 | $12.00 |
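The text pricing table can be turned into a small cost estimator. This is a sketch under one stated assumption: the tier is selected by the size of the input (prompt) and its rates apply to the whole request, including output tokens. The table itself does not spell out how tiers are applied, so check the official pricing page before relying on these numbers. The names `Tier`, `PRICING`, and `text_cost` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    max_input_tokens: int      # inclusive upper bound of the context tier
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Rates copied from the table above; tiers are listed smallest first.
PRICING = {
    "Gemini 2.5 Pro": [Tier(1_000_000, 1.00, 3.00)],
    "Gemini 2.5 Ultra": [
        Tier(1_000_000, 2.00, 6.00),
        Tier(2_000_000, 4.00, 12.00),
    ],
}


def text_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one text request.

    Assumption: the tier is chosen by input size and applies to the whole request.
    """
    for tier in PRICING[model]:
        if input_tokens <= tier.max_input_tokens:
            return (
                input_tokens * tier.input_per_million
                + output_tokens * tier.output_per_million
            ) / 1_000_000
    raise ValueError(f"{input_tokens} input tokens exceeds the context limit of {model}")
```

Under that assumption, a 1.5M-token prompt to Ultra with a 2,000-token reply falls in the upper tier: 1.5 × $4.00 for input plus 0.002 × $12.00 for output.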
API Pricing (Other Modalities)
Modality | Cost | Unit |
---|---|---|
Audio Processing | $0.005 | per minute |
Video Processing | $0.02 | per minute |
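The per-minute rates above translate into a short calculation. The sketch below assumes usage is prorated by the second; the table only states a per-minute unit, so actual billing may round differently. `RATES_PER_MINUTE` and `media_cost` are hypothetical names.

```python
# USD per minute, copied from the table above.
RATES_PER_MINUTE = {
    "audio": 0.005,
    "video": 0.02,
}


def media_cost(modality: str, duration_seconds: float) -> float:
    """Estimate the USD cost of processing audio or video.

    Assumption: billing is prorated by the second rather than rounded up per minute.
    """
    if modality not in RATES_PER_MINUTE:
        raise ValueError(f"unknown modality: {modality!r}")
    return RATES_PER_MINUTE[modality] * duration_seconds / 60
```

For instance, a 10-minute video comes to 10 × $0.02, and a 90-minute audio recording to 90 × $0.005.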
Tips for Best Results
Take full advantage of the 2M-token window (with Ultra) by providing all relevant documents at once. Instead of asking questions one at a time, supply the entire context (e.g., multiple research papers or a full codebase) and then ask for a comprehensive analysis or comparison. The model can then cross-reference the sources directly, which generally works better than sequential prompting.
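One way to apply this tip is to assemble every document and the final question into a single prompt string before sending it. The sketch below is only an illustration: the `=== DOCUMENT ===` delimiters and the function name are arbitrary choices, not a format Gemini requires.

```python
def build_combined_prompt(documents: dict[str, str], question: str) -> str:
    """Join named documents and one final task into a single prompt.

    The delimiter style is a hypothetical convention to keep sources distinguishable.
    """
    parts = [
        f"=== DOCUMENT: {name} ===\n{text.strip()}"
        for name, text in documents.items()
    ]
    # Put the task last so it follows all of the supplied context.
    parts.append(f"=== TASK ===\n{question.strip()}")
    return "\n\n".join(parts)


prompt = build_combined_prompt(
    {
        "paper_a.txt": "Findings of the first paper...",
        "paper_b.txt": "Findings of the second paper...",
    },
    "Compare the methods of both papers and list any conflicting results.",
)
```

Labeling each source by name lets the question refer to documents explicitly, and lets the answer cite them.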
Frame your requests as high-level goals, not specific instructions. Instead of "Search for flights," say "Plan my vacation." Provide constraints like budget, dates, and preferences. The agent will perform better when it understands the overall objective and has the freedom to plan the steps.
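A goal-plus-constraints request can be templated so the constraints are never forgotten. This is a minimal sketch; the layout and the `goal_prompt` helper are illustrative, and the example values are placeholders apart from the Tokyo trip and the $1200 flight budget mentioned earlier in this cheatsheet.

```python
def goal_prompt(goal: str, constraints: dict[str, str]) -> str:
    """Format a high-level goal followed by a bullet list of constraints."""
    lines = [f"Goal: {goal}"]
    if constraints:
        lines.append("Constraints:")
        lines.extend(f"- {name}: {value}" for name, value in constraints.items())
    return "\n".join(lines)


request = goal_prompt(
    "Plan a 3-day trip to Tokyo and create a daily itinerary.",
    {
        "Flight budget": "under $1200",
        "Preferences": "placeholder, e.g. food and museums",
    },
)
```

The goal stays high level, while the constraints give the agent the boundaries it needs to plan its own steps.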
When using the live (Project Astra) mode, speak naturally and use visual cues. You can point your camera at an object and ask, "What is this, and how does it work?" Gemini will combine the camera view of the object with the audio of your question to work out what you are asking about.