Analyze images, extract text, and understand visual content with AI vision models
import requests
import base64
import mimetypes

def analyze_image(image_path, prompt="Describe this image in detail"):
    """Analyze an image with AI vision"""
    # Encode image to base64
    with open(image_path, "rb") as image_file:
        base64_image = base64.b64encode(image_file.read()).decode('utf-8')

    # Guess the MIME type from the file extension, falling back to JPEG
    mime_type = mimetypes.guess_type(image_path)[0] or "image/jpeg"

    response = requests.post(
        "https://api.anyapi.ai/v1/chat/completions",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": prompt
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:{mime_type};base64,{base64_image}"
                            }
                        }
                    ]
                }
            ],
            "max_tokens": 500
        }
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
# Usage examples
description = analyze_image("photo.jpg", "What do you see in this image?")
print(description)
# OCR example
text_content = analyze_image("document.png", "Extract all text from this image")
print(text_content)
# Analysis example
analysis = analyze_image("chart.png", "Analyze this chart and explain the trends")
print(analysis)
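If an image is already hosted publicly, OpenAI-compatible endpoints generally also accept a plain https URL in the image_url field, which skips the base64 step. A minimal sketch reusing the same endpoint and model shown above; the example URL is a placeholder:

def analyze_image_url(image_url, prompt="Describe this image in detail"):
    """Analyze a remotely hosted image by passing its URL directly."""
    response = requests.post(
        "https://api.anyapi.ai/v1/chat/completions",
        headers={
            "Authorization": "Bearer YOUR_API_KEY",
            "Content-Type": "application/json"
        },
        json={
            "model": "gpt-4o",
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {"type": "text", "text": prompt},
                        {"type": "image_url", "image_url": {"url": image_url}}
                    ]
                }
            ],
            "max_tokens": 500
        }
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# Hypothetical hosted image
print(analyze_image_url("https://example.com/photo.jpg", "What do you see?"))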
class DocumentProcessor:
    def __init__(self, api_key):
        self.api_key = api_key

    def extract_text_with_structure(self, image_path):
        """Extract text while preserving document structure"""
        with open(image_path, "rb") as image_file:
            base64_image = base64.b64encode(image_file.read()).decode('utf-8')

        response = requests.post(
            "https://api.anyapi.ai/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "gpt-4o",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": """
                                Extract all text from this document and structure it properly.
                                Format the output as JSON with these fields:
                                - title: Document title if present
                                - sections: Array of sections with headers and content
                                - tables: Any tables found with structured data
                                - metadata: Any dates, reference numbers, etc.
                                - raw_text: All text in reading order
                                """
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                            }
                        ]
                    }
                ]
            }
        )
        return response.json()["choices"][0]["message"]["content"]
    def analyze_invoice(self, image_path):
        """Extract invoice information"""
        with open(image_path, "rb") as image_file:
            base64_image = base64.b64encode(image_file.read()).decode('utf-8')

        response = requests.post(
            "https://api.anyapi.ai/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "gpt-4o",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": """
                                Analyze this invoice and extract key information as JSON:
                                {
                                    "invoice_number": "",
                                    "date": "",
                                    "due_date": "",
                                    "vendor": {
                                        "name": "",
                                        "address": "",
                                        "phone": "",
                                        "email": ""
                                    },
                                    "bill_to": {
                                        "name": "",
                                        "address": ""
                                    },
                                    "items": [
                                        {
                                            "description": "",
                                            "quantity": 0,
                                            "unit_price": 0,
                                            "total": 0
                                        }
                                    ],
                                    "subtotal": 0,
                                    "tax": 0,
                                    "total": 0
                                }
                                """
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                            }
                        ]
                    }
                ]
            }
        )
        return response.json()["choices"][0]["message"]["content"]
    def compare_documents(self, image1_path, image2_path):
        """Compare two documents and find differences"""
        # Encode both images
        with open(image1_path, "rb") as f1, open(image2_path, "rb") as f2:
            base64_image1 = base64.b64encode(f1.read()).decode('utf-8')
            base64_image2 = base64.b64encode(f2.read()).decode('utf-8')

        response = requests.post(
            "https://api.anyapi.ai/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "gpt-4o",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": """
                                Compare these two documents and identify:
                                1. Key differences in content
                                2. Changes in formatting or structure
                                3. Added or removed sections
                                4. Any data discrepancies
                                Provide a detailed comparison report.
                                """
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image1}"
                                }
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image2}"
                                }
                            }
                        ]
                    }
                ]
            }
        )
        return response.json()["choices"][0]["message"]["content"]
# Usage
processor = DocumentProcessor("YOUR_API_KEY")

# Extract structured text (pass an image of the page, e.g. a scan or screenshot, not a PDF file)
structured_data = processor.extract_text_with_structure("contract.jpg")
print(structured_data)

# Process invoice
invoice_data = processor.analyze_invoice("invoice.jpg")
print(invoice_data)

# Compare documents (again as page images)
comparison = processor.compare_documents("version1.jpg", "version2.jpg")
print(comparison)
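These methods return the model's reply as a plain string, so even when the prompt asks for JSON you still need to parse it yourself, and models sometimes wrap the JSON in a markdown code fence. A small helper sketch; the fence-stripping behavior is an assumption about model output, not something the API guarantees:

import json
import re

def parse_json_reply(reply):
    """Parse a JSON object out of a model reply, tolerating ```json fences."""
    # Strip a leading/trailing markdown fence if the model added one
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", reply.strip())
    return json.loads(cleaned)

invoice = parse_json_reply(invoice_data)
print(invoice.get("invoice_number"), invoice.get("total"))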
class EcommerceImageAnalyzer:
    def __init__(self, api_key):
        self.api_key = api_key

    def analyze_product_image(self, image_path):
        """Analyze product image for e-commerce listing"""
        with open(image_path, "rb") as image_file:
            base64_image = base64.b64encode(image_file.read()).decode('utf-8')

        response = requests.post(
            "https://api.anyapi.ai/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "gpt-4o",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": """
                                Analyze this product image for e-commerce use. Provide:
                                1. Product identification and category
                                2. Key features and attributes visible
                                3. Color, material, size estimates
                                4. Condition assessment
                                5. Suggested product title and description
                                6. Keywords for SEO
                                7. Quality assessment of the photo
                                Format as JSON with these fields.
                                """
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                            }
                        ]
                    }
                ]
            }
        )
        return response.json()["choices"][0]["message"]["content"]
    def check_image_quality(self, image_path):
        """Assess image quality for e-commerce standards"""
        with open(image_path, "rb") as image_file:
            base64_image = base64.b64encode(image_file.read()).decode('utf-8')

        response = requests.post(
            "https://api.anyapi.ai/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "gpt-4o",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": """
                                Assess this product image quality for e-commerce use:
                                1. Technical quality (resolution, focus, lighting)
                                2. Composition and framing
                                3. Background and staging
                                4. Product visibility and clarity
                                5. Overall professional appearance
                                6. Recommendations for improvement
                                7. Quality score (1-10)
                                Provide detailed feedback and suggestions.
                                """
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                            }
                        ]
                    }
                ]
            }
        )
        return response.json()["choices"][0]["message"]["content"]
    def generate_alt_text(self, image_path):
        """Generate accessibility alt text for images"""
        with open(image_path, "rb") as image_file:
            base64_image = base64.b64encode(image_file.read()).decode('utf-8')

        response = requests.post(
            "https://api.anyapi.ai/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "gpt-4o",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": """
                                Generate concise, descriptive alt text for this product image.
                                Focus on essential details that help users understand what the product is.
                                Keep it under 125 characters and be specific about key features.
                                """
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                            }
                        ]
                    }
                ]
            }
        )
        return response.json()["choices"][0]["message"]["content"]
# Usage
analyzer = EcommerceImageAnalyzer("YOUR_API_KEY")
# Analyze product
product_analysis = analyzer.analyze_product_image("product.jpg")
print(product_analysis)
# Check quality
quality_report = analyzer.check_image_quality("product.jpg")
print(quality_report)
# Generate alt text
alt_text = analyzer.generate_alt_text("product.jpg")
print(alt_text)
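For a catalog with many photos, the same calls can be looped over a directory. A sketch that iterates files sequentially; the folder name and accepted extensions are placeholders you would adapt:

import os

def analyze_product_folder(analyzer, folder="product_photos"):
    """Run listing analysis and alt-text generation for every image in a folder."""
    results = {}
    for filename in sorted(os.listdir(folder)):
        if not filename.lower().endswith((".jpg", ".jpeg", ".png")):
            continue
        path = os.path.join(folder, filename)
        results[filename] = {
            "analysis": analyzer.analyze_product_image(path),
            "alt_text": analyzer.generate_alt_text(path),
        }
    return results

catalog = analyze_product_folder(analyzer)
for name, data in catalog.items():
    print(name, "->", data["alt_text"])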
class ImageModerator:
    def __init__(self, api_key):
        self.api_key = api_key

    def moderate_image_content(self, image_path):
        """Check image for inappropriate content"""
        with open(image_path, "rb") as image_file:
            base64_image = base64.b64encode(image_file.read()).decode('utf-8')

        response = requests.post(
            "https://api.anyapi.ai/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "gpt-4o",
                "messages": [
                    {
                        "role": "user",
                        "content": [
                            {
                                "type": "text",
                                "text": """
                                Analyze this image for content moderation. Check for:
                                1. Inappropriate or explicit content
                                2. Violence or harmful imagery
                                3. Hate symbols or offensive material
                                4. Privacy concerns (faces, license plates, etc.)
                                5. Copyright issues (branded content, logos)
                                Provide a safety rating:
                                - SAFE: Appropriate for all audiences
                                - CAUTION: May need review
                                - UNSAFE: Violates content policy
                                Include reasoning for your assessment.
                                """
                            },
                            {
                                "type": "image_url",
                                "image_url": {
                                    "url": f"data:image/jpeg;base64,{base64_image}"
                                }
                            }
                        ]
                    }
                ]
            }
        )
        return response.json()["choices"][0]["message"]["content"]
# Usage
moderator = ImageModerator("YOUR_API_KEY")
moderation_result = moderator.moderate_image_content("user_upload.jpg")
print(moderation_result)
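Because the prompt asks for one of three ratings (SAFE, CAUTION, UNSAFE), the free-text reply can be mapped to an automated decision with a simple keyword check. A rough sketch only, since the model's exact wording isn't guaranteed; anything unrecognized falls back to human review:

def moderation_decision(report):
    """Map the moderator's text report to an action, defaulting to manual review."""
    upper = report.upper()
    if "UNSAFE" in upper:      # Check UNSAFE first: it contains the substring "SAFE"
        return "reject"
    if "CAUTION" in upper:
        return "manual_review"
    if "SAFE" in upper:
        return "approve"
    return "manual_review"     # Unrecognized reply: fall back to a human

print(moderation_decision(moderation_result))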
class VisionChatbot:
    def __init__(self, api_key):
        self.api_key = api_key
        self.conversation_history = []

    def add_image_to_conversation(self, image_path, user_message):
        """Add an image and message to ongoing conversation"""
        with open(image_path, "rb") as image_file:
            base64_image = base64.b64encode(image_file.read()).decode('utf-8')

        # Add user message with image to conversation
        self.conversation_history.append({
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": user_message
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        })
        return self.get_response()

    def add_text_message(self, message):
        """Add a text-only message to conversation"""
        self.conversation_history.append({
            "role": "user",
            "content": message
        })
        return self.get_response()

    def get_response(self):
        """Get AI response for current conversation"""
        response = requests.post(
            "https://api.anyapi.ai/v1/chat/completions",
            headers={
                "Authorization": f"Bearer {self.api_key}",
                "Content-Type": "application/json"
            },
            json={
                "model": "gpt-4o",
                "messages": self.conversation_history,
                "max_tokens": 500
            }
        )
        assistant_response = response.json()["choices"][0]["message"]["content"]

        # Add assistant response to conversation
        self.conversation_history.append({
            "role": "assistant",
            "content": assistant_response
        })
        return assistant_response

    def clear_conversation(self):
        """Clear conversation history"""
        self.conversation_history = []
# Usage
chatbot = VisionChatbot("YOUR_API_KEY")
# Start conversation with image
response1 = chatbot.add_image_to_conversation(
    "vacation_photo.jpg",
    "What do you see in this photo?"
)
print("AI:", response1)
# Continue with follow-up questions
response2 = chatbot.add_text_message("What time of day do you think this was taken?")
print("AI:", response2)
response3 = chatbot.add_text_message("What activities would you recommend in this location?")
print("AI:", response3)
# Add another image to same conversation
response4 = chatbot.add_image_to_conversation(
    "another_photo.jpg",
    "How does this location compare to the first image?"
)
print("AI:", response4)