Moderation Models Overview
Ensure content safety and compliance with AI moderation models that detect harmful, inappropriate, or policy-violating content across text, images, and other media.
Available Models
OpenAI Moderation
- text-moderation-latest: Latest text moderation model
- text-moderation-stable: Stable version for production use
- omni-moderation-latest: Multimodal moderation (text + images)
Perspective API (Google)
- perspective-toxicity: Detect toxic comments and harassment
- perspective-severe-toxicity: Identify severe toxic content
- perspective-threat: Detect threats and violent language
Custom Moderation Models
- content-classifier: Custom content classification
- nsfw-detector: NSFW content detection for images
- hate-speech-detector: Specialized hate speech detection
Specialized Models
- spam-detector: Identify spam and unwanted content
- violence-detector: Detect violent content and imagery
- self-harm-detector: Identify self-harm related content
Model Capabilities
Text Moderation
Analyze text for harmful or inappropriate content
Image Moderation
Detect inappropriate visual content and NSFW material
Multimodal Analysis
Analyze content across multiple media types
Custom Policies
Enforce custom content policies and guidelines
Text Moderation API
Analyze text content for policy violations:
Basic Text Moderation
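A minimal sketch of how a text moderation request body could be assembled. The `model` and `input` field names are an assumption modeled on the OpenAI moderation request shape, and `build_text_moderation_request` is a hypothetical helper, not part of any SDK. The sketch only serializes the payload and sends nothing over the network.

```python
import json

def build_text_moderation_request(text: str, model: str = "text-moderation-latest") -> str:
    """Serialize a moderation request body. Field names are assumptions."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # "model" and "input" mirror the OpenAI moderation request shape (assumed here).
    return json.dumps({"model": model, "input": text})

request_body = build_text_moderation_request("Thanks for the helpful review!")
print(request_body)
```

Swapping the `model` argument for `text-moderation-stable` would pin the request to the stable model listed above.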
Response Format
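A hedged sketch of reading a moderation response. It assumes the response carries a `results` list whose entries have `flagged`, `categories`, and `category_scores` fields, as the OpenAI moderation response does. The sample values below are invented for illustration, and `ModerationResult` and `parse_moderation_response` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class ModerationResult:
    flagged: bool
    categories: Dict[str, bool]
    category_scores: Dict[str, float]

    def top_category(self) -> Tuple[str, float]:
        """Return the category with the highest score."""
        return max(self.category_scores.items(), key=lambda item: item[1])

def parse_moderation_response(response: dict) -> ModerationResult:
    """Extract the first result from an assumed response shape."""
    first = response["results"][0]
    return ModerationResult(
        flagged=bool(first["flagged"]),
        categories=dict(first["categories"]),
        category_scores=dict(first["category_scores"]),
    )

# Illustrative payload only; real scores and category names may differ.
sample_response = {
    "results": [
        {
            "flagged": True,
            "categories": {"harassment": True, "violence": False},
            "category_scores": {"harassment": 0.91, "violence": 0.04},
        }
    ]
}

result = parse_moderation_response(sample_response)
print(result.flagged, result.top_category())
```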
Image Moderation API
Analyze images for inappropriate content:
Image Content Analysis
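A minimal sketch of preparing an image for moderation, assuming the service accepts a base64-encoded image in a JSON body. The `image` field name and the `build_image_moderation_request` helper are hypothetical; only the base64 and JSON handling is standard.

```python
import base64
import json

def build_image_moderation_request(image_bytes: bytes, model: str = "nsfw-detector") -> str:
    """Encode raw image bytes into an assumed JSON request body."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    encoded = base64.b64encode(image_bytes).decode("ascii")
    # The "image" key is an assumption; check the actual request schema.
    return json.dumps({"model": model, "image": encoded})

# Stand-in bytes (a PNG file signature) instead of a real image file.
fake_image = b"\x89PNG\r\n\x1a\n"
image_request = build_image_moderation_request(fake_image)
print(image_request)
```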
Multimodal Moderation API
Analyze content across multiple modalities:
Advanced Moderation Features
Custom Policy Enforcement
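One way to picture custom policy enforcement is as per-category thresholds applied to the scores a model returns. The sketch below assumes scores in the range 0 to 1 keyed by category name. The category keys, threshold values, and the `violations` helper are illustrative assumptions.

```python
from typing import Dict, List

def violations(scores: Dict[str, float], policy: Dict[str, float]) -> List[str]:
    """Return categories whose score meets or exceeds the policy threshold."""
    return sorted(
        category
        for category, threshold in policy.items()
        if scores.get(category, 0.0) >= threshold
    )

# Hypothetical strict policy: lower thresholds flag more content.
strict_policy = {"hate": 0.2, "harassment": 0.3, "violence": 0.4}
scores = {"hate": 0.05, "harassment": 0.35, "violence": 0.10}

print(violations(scores, strict_policy))  # ['harassment']
```

Keeping the policy as plain data makes it straightforward to maintain different thresholds for, say, a children's product and a general forum.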
Batch Moderation
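Batch moderation can be sketched as splitting a large list of inputs into fixed-size chunks and moderating each chunk in one call. The `moderate_many` callable below stands in for whatever batch request the service supports. Both it and the batch size are assumptions.

```python
from typing import Callable, Iterator, List, Sequence

def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])

def moderate_in_batches(
    texts: Sequence[str],
    moderate_many: Callable[[List[str]], List[bool]],
    batch_size: int = 32,
) -> List[bool]:
    """Run moderate_many over each chunk and keep results in input order."""
    flags: List[bool] = []
    for chunk in batched(texts, batch_size):
        flags.extend(moderate_many(chunk))
    return flags

# Stand-in classifier so the sketch runs offline.
def fake_moderate_many(chunk: List[str]) -> List[bool]:
    return ["spam" in text.lower() for text in chunk]

texts = ["hello", "SPAM offer", "nice photo", "more spam", "bye"]
batch_flags = moderate_in_batches(texts, fake_moderate_many, batch_size=2)
print(batch_flags)  # [False, True, False, True, False]
```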
Real-time Stream Moderation
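For streams such as live chat, one possible approach is to moderate a sliding window of recent messages, so that a phrase split across messages is still caught. The sketch below assumes a `classify` callable that returns `True` for violating text. The class name and window size are hypothetical.

```python
from collections import deque
from typing import Callable, Deque

class StreamModerator:
    """Moderate each new message together with a few preceding ones."""

    def __init__(self, classify: Callable[[str], bool], window: int = 3) -> None:
        self.classify = classify
        self.recent: Deque[str] = deque(maxlen=window)

    def push(self, message: str) -> bool:
        """Add a message and return True if the current window is flagged."""
        self.recent.append(message)
        return self.classify(" ".join(self.recent))

# Stand-in classifier so the sketch runs offline.
moderator = StreamModerator(lambda text: "buy now" in text.lower(), window=3)
stream_flags = [moderator.push(message) for message in ["hello", "buy", "now"]]
print(stream_flags)  # [False, False, True]
```

A larger window catches more split phrases but sends more text per call, which matters when pricing is per token or per minute.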
Model Comparison
Model | Type | Languages | Accuracy | Speed | Price/1K tokens |
---|---|---|---|---|---|
text-moderation-latest | Text | English+ | 95% | Fast | $0.002 |
omni-moderation-latest | Multimodal | English+ | 93% | Medium | $0.005 |
perspective-toxicity | Text | 100+ | 90% | Fast | $0.001 |
nsfw-detector | Image | N/A | 94% | Fast | $0.003 |
Moderation Categories
Text Content Categories
- Hate Speech: Discriminatory or hateful language
- Harassment: Bullying, intimidation, threats
- Violence: Violent threats or graphic descriptions
- Sexual Content: Explicit sexual material
- Self-Harm: Content promoting self-injury
- Spam: Unwanted promotional content
- Misinformation: False or misleading information
Image Content Categories
- NSFW: Not safe for work content
- Violence: Graphic violence or weapons
- Drugs: Illegal substances or drug use
- Hate Symbols: Hateful imagery or symbols
- Gore: Extremely graphic content
- Nudity: Nude or partially nude content
Custom Categories
- Brand Safety: Content safe for brand association
- Age Appropriateness: Content suitable for specific age groups
- Professional Context: Workplace-appropriate content
- Regional Compliance: Content compliant with local laws
Integration Patterns
Social Media Platform
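A social platform typically needs more than a yes/no answer: clear violations are rejected, borderline posts go to human review, and the rest are published. The sketch below routes a post on its highest category score. The thresholds and the `route_post` name are illustrative assumptions, not recommended values.

```python
from typing import Dict

def route_post(scores: Dict[str, float], review_at: float = 0.5, reject_at: float = 0.9) -> str:
    """Map the worst category score to 'publish', 'review', or 'reject'."""
    worst = max(scores.values(), default=0.0)
    if worst >= reject_at:
        return "reject"
    if worst >= review_at:
        return "review"
    return "publish"

print(route_post({"hate": 0.95, "spam": 0.10}))  # reject
print(route_post({"spam": 0.60}))                # review
print(route_post({"spam": 0.05}))                # publish
```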
Content Management System
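In a content management system, moderation often runs as a pre-publish check over each text field of a draft. The sketch below assumes a draft is a mapping of field names to strings and that `is_flagged` wraps a moderation call. Both helper names are hypothetical.

```python
from typing import Callable, Dict, List

def blocked_fields(draft: Dict[str, str], is_flagged: Callable[[str], bool]) -> List[str]:
    """Return the names of non-empty fields that the classifier flags."""
    return [name for name, text in draft.items() if text and is_flagged(text)]

def can_publish(draft: Dict[str, str], is_flagged: Callable[[str], bool]) -> bool:
    """A draft is publishable only when no field is flagged."""
    return not blocked_fields(draft, is_flagged)

# Stand-in classifier so the sketch runs offline.
def fake_is_flagged(text: str) -> bool:
    return "free money" in text.lower()

draft = {"title": "Quarterly update", "body": "Claim your FREE MONEY today", "summary": ""}
print(blocked_fields(draft, fake_is_flagged))  # ['body']
print(can_publish(draft, fake_is_flagged))     # False
```

Reporting the flagged field names, rather than a single pass/fail, lets an editor see which part of the draft needs attention.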
Pricing
Moderation models are priced per 1K tokens, per image, per minute, or per request, depending on the content type:
Model Type | Price Model | Cost |
---|---|---|
Text Moderation | Per 1K tokens | $0.002 |
Image Moderation | Per image | $0.005 |
Video Moderation | Per minute | $0.050 |
Custom Policies | Per request | $0.003 |
Real-time Stream | Per minute | $0.100 |
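The table above translates into a simple cost estimate. The sketch below uses the listed prices with `Decimal` to avoid floating-point rounding. The `estimate_cost` helper and the example volumes are illustrative, and actual billing may round or bundle differently.

```python
from decimal import Decimal

# Prices copied from the pricing table (USD).
PRICES = {
    "text_per_1k_tokens": Decimal("0.002"),
    "image_each": Decimal("0.005"),
    "video_per_minute": Decimal("0.050"),
    "custom_policy_per_request": Decimal("0.003"),
    "stream_per_minute": Decimal("0.100"),
}

def estimate_cost(text_tokens=0, images=0, video_minutes=0,
                  custom_requests=0, stream_minutes=0) -> Decimal:
    """Estimate total cost in USD, rounded to cents."""
    total = (
        PRICES["text_per_1k_tokens"] * Decimal(str(text_tokens)) / Decimal(1000)
        + PRICES["image_each"] * Decimal(str(images))
        + PRICES["video_per_minute"] * Decimal(str(video_minutes))
        + PRICES["custom_policy_per_request"] * Decimal(str(custom_requests))
        + PRICES["stream_per_minute"] * Decimal(str(stream_minutes))
    )
    return total.quantize(Decimal("0.01"))

# 2,000,000 text tokens cost $4.00 and 10,000 images cost $50.00.
monthly = estimate_cost(text_tokens=2_000_000, images=10_000)
print(monthly)  # 54.00
```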
Rate Limits
Moderation API limits by plan:
Plan | Requests/Min | Tokens/Min | Custom Policies |
---|---|---|---|
Free | 100 | 50,000 | 1 |
Pro | 1,000 | 500,000 | 10 |
Enterprise | Custom | Custom | Unlimited |
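A client can stay under the requests-per-minute limit by throttling itself before sending. The sketch below is a sliding-window limiter with an explicit clock argument, so it can be tested without waiting. It assumes the limit applies over a rolling 60-second window, which the table does not specify, and `MinuteRateLimiter` is a hypothetical name.

```python
from collections import deque
from typing import Deque

class MinuteRateLimiter:
    """Allow at most max_requests within any rolling window."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._sent: Deque[float] = deque()

    def try_acquire(self, now: float) -> bool:
        """Record a request at time `now` if the limit allows it."""
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False

# Free plan: 100 requests per minute.
limiter = MinuteRateLimiter(max_requests=100)
first_burst = [limiter.try_acquire(now=0.0) for _ in range(100)]
over_limit = limiter.try_acquire(now=0.5)
after_window = limiter.try_acquire(now=60.0)
print(all(first_burst), over_limit, after_window)  # True False True
```

In production, `now` would come from `time.monotonic()`. A separate counter would be needed to respect the tokens-per-minute limit.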
Compliance and Privacy
Data Handling
- No storage: Content is analyzed but not stored
- Privacy-first: Minimal data collection
- GDPR compliant: EU privacy regulation compliance
- SOC2 certified: Enterprise security standards
Audit Trail
- Complete logging: All moderation decisions logged
- Transparency: Clear reason codes for actions
- Appeals process: Content review and appeals
- Analytics: Moderation metrics and insights
Common Use Cases
Social Media
User-generated content moderation, community guidelines enforcement
E-commerce
Product review moderation, marketplace content safety
Educational Platforms
Student content moderation, age-appropriate filtering
Gaming
Chat moderation, user behavior monitoring
News & Media
Article fact-checking, comment moderation
Corporate Communications
Internal content compliance, brand safety
Healthcare
Medical content verification, patient communication
Legal Services
Document compliance, regulatory adherence