# Moderation Models Overview
Ensure content safety and compliance with AI moderation models that detect harmful, inappropriate, or policy-violating content in text.

## Available Models

### OpenAI Models
- GPT OSS Safeguard 120B: Large-scale text moderation model for detecting harmful and policy-violating content
## Model Capabilities
- Text Moderation: Analyze text for harmful or inappropriate content
- Content Safety: Detect hate speech, harassment, violence, and other violations
- Policy Enforcement: Enforce content policies and community guidelines
- Automated Screening: Automatically screen user-generated content at scale
## Moderation API
Analyze text content for policy violations:

### Basic Example
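A minimal sketch of a moderation request in Python, assuming an OpenAI-compatible chat completions endpoint. The base URL, `API_KEY` environment variable, model identifier, and policy prompt below are placeholders; adapt them to your deployment.

```python
import os

from openai import OpenAI

# Placeholder endpoint and credentials; point these at your provider.
client = OpenAI(
    base_url="https://api.example.com/v1",  # placeholder base URL
    api_key=os.environ["API_KEY"],          # placeholder env var
)

# Safeguard-style models take the moderation policy as instructions;
# this prompt and its JSON output contract are illustrative.
POLICY = (
    "You are a content moderator. Classify the user's text against the "
    'policy. Return JSON: {"flagged": bool, "categories": [violated '
    "category names]}. Categories: hate_speech, harassment, violence, "
    "sexual_content, self_harm."
)

response = client.chat.completions.create(
    model="openai/gpt-oss-safeguard-120b",  # placeholder model identifier
    messages=[
        {"role": "system", "content": POLICY},
        {"role": "user", "content": "Text to moderate goes here."},
    ],
)

print(response.choices[0].message.content)
```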
### Response Format
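Under the JSON-returning policy prompt sketched above, a verdict might look like the following. The field names are illustrative assumptions tied to that prompt, not a fixed schema; consult the API reference for the authoritative format.

```json
{
  "flagged": true,
  "categories": ["harassment", "violence"]
}
```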
## Moderation Categories
- Hate Speech: Discriminatory or hateful language
- Harassment: Bullying, intimidation, threats
- Violence: Violent threats or graphic descriptions
- Sexual Content: Explicit sexual material
- Self-Harm: Content promoting self-injury
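To illustrate how these categories can drive policy enforcement, the sketch below maps each category to a hypothetical action and picks the strictest one for a flagged verdict. The category keys mirror the policy prompt in the Basic Example; the action names and severity ordering are assumptions, not part of the API.

```python
import json

# Hypothetical mapping from violated category to enforcement action.
ACTIONS = {
    "hate_speech": "remove",
    "harassment": "remove",
    "violence": "escalate",
    "sexual_content": "age_gate",
    "self_harm": "escalate",
}

# Higher number = stricter action.
SEVERITY = {"escalate": 3, "remove": 2, "age_gate": 1}

def enforce(raw_verdict: str) -> str:
    """Parse the model's JSON verdict and return the strictest action."""
    verdict = json.loads(raw_verdict)
    if not verdict.get("flagged"):
        return "allow"
    actions = [ACTIONS.get(c, "remove") for c in verdict.get("categories", [])]
    if not actions:
        return "allow"
    return max(actions, key=lambda a: SEVERITY.get(a, 0))

print(enforce('{"flagged": true, "categories": ["violence"]}'))  # escalate
```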
## Common Use Cases
- Social Media: User-generated content moderation and community guidelines enforcement (see the screening sketch after this list)
- E-commerce: Product review moderation and marketplace content safety
- Educational Platforms: Student content moderation and age-appropriate filtering
- Gaming: Chat moderation and user behavior monitoring
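As referenced in the Social Media item above, the sketch below screens a batch of user posts and decides whether each is published or held for review. It repeats the placeholder endpoint, model identifier, and policy prompt from the Basic Example; the post data and publishing rule are hypothetical.

```python
import json
import os

from openai import OpenAI

# Placeholder endpoint, key, and model id, as in the Basic Example.
client = OpenAI(base_url="https://api.example.com/v1", api_key=os.environ["API_KEY"])
MODEL = "openai/gpt-oss-safeguard-120b"
POLICY = (
    "You are a content moderator. Classify the user's text. Return JSON: "
    '{"flagged": bool, "categories": [violated category names]}.'
)

posts = [
    {"id": 1, "text": "Great product, highly recommend!"},
    {"id": 2, "text": "Example text that might violate a policy."},
]

for post in posts:
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": POLICY},
            {"role": "user", "content": post["text"]},
        ],
    )
    # The verdict is free-form model output; production code should
    # validate it before trusting the parse.
    verdict = json.loads(resp.choices[0].message.content)
    status = "held for review" if verdict.get("flagged") else "published"
    print(f"post {post['id']}: {status}")
```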
## Getting Started
- Quick Start: Set up content moderation
- SDKs: Use our libraries