Moderation Models Overview

Ensure content safety and compliance with AI moderation models that detect harmful, inappropriate, or policy-violating content in text.

Available Models

OpenAI Models

  • GPT OSS Safeguard 120B: Large-scale text moderation model for detecting harmful and policy-violating content

Model Capabilities

Text Moderation

Analyze text for harmful or inappropriate content

Content Safety

Detect hate speech, harassment, violence, and other violations

Policy Enforcement

Enforce content policies and community guidelines

Automated Screening

Automatically screen user-generated content at scale

Moderation API

Analyze text content for policy violations:
POST /v1/moderations

Basic Example

curl -X POST "https://api.anyapi.ai/v1/moderations" \
  -H "Authorization: Bearer your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-safeguard-120b",
    "input": "This is a sample text to check for content violations."
  }'
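The same request can be made from Python. A minimal sketch using only the standard library; the endpoint URL and model ID come from the example above, while the helper names (`build_moderation_request`, `moderate`) are illustrative, not part of any SDK:

```python
import json
import urllib.request

API_URL = "https://api.anyapi.ai/v1/moderations"

def build_moderation_request(text, model="openai/gpt-oss-safeguard-120b"):
    """Build the JSON body for a moderation request."""
    return {"model": model, "input": text}

def moderate(text, api_key):
    """POST text to the moderation endpoint and return the parsed JSON response."""
    body = json.dumps(build_moderation_request(text)).encode("utf-8")
    request = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

In production you would typically swap `urllib` for an HTTP client with retry support and read the API key from the environment rather than passing it inline.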

Response Format

{
  "id": "modr-abc123",
  "model": "openai/gpt-oss-safeguard-120b",
  "results": [
    {
      "flagged": false,
      "categories": {
        "sexual": false,
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual/minors": false,
        "hate/threatening": false,
        "violence/graphic": false,
        "self-harm/intent": false,
        "self-harm/instructions": false,
        "harassment/threatening": false,
        "violence": false
      },
      "category_scores": {
        "sexual": 0.0001,
        "hate": 0.0002,
        "harassment": 0.0001,
        "self-harm": 0.0000,
        "sexual/minors": 0.0000,
        "hate/threatening": 0.0000,
        "violence/graphic": 0.0001,
        "self-harm/intent": 0.0000,
        "self-harm/instructions": 0.0000,
        "harassment/threatening": 0.0000,
        "violence": 0.0001
      }
    }
  ]
}
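A response like the one above is usually consumed in two steps: a quick pass/fail check on `flagged`, then a closer look at `category_scores` when finer-grained handling is needed. A sketch of both, assuming the response shape shown above (the 0.5 threshold is an arbitrary example, not an API default):

```python
def is_safe(moderation_response):
    """True when no result in the response was flagged."""
    return not any(result["flagged"] for result in moderation_response["results"])

def flagged_categories(result, threshold=0.5):
    """Return (category, score) pairs at or above the threshold, highest first."""
    hits = [
        (category, score)
        for category, score in result["category_scores"].items()
        if score >= threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

For the sample response above, `is_safe` returns `True` and `flagged_categories` returns an empty list, since every score is near zero.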

Moderation Categories

  • Hate Speech (hate, hate/threatening): Discriminatory or hateful language, including threats
  • Harassment (harassment, harassment/threatening): Bullying, intimidation, and threats
  • Violence (violence, violence/graphic): Violent threats or graphic depictions of violence
  • Sexual Content (sexual, sexual/minors): Explicit sexual material, with a separate category for content involving minors
  • Self-Harm (self-harm, self-harm/intent, self-harm/instructions): Content promoting, expressing intent toward, or giving instructions for self-injury

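Applications often map these categories to different enforcement actions, since not every violation warrants the same response. A sketch of one possible policy; the tiers and action names here are hypothetical examples, not API-defined behavior:

```python
# Hypothetical policy: which action each flagged category triggers.
# Categories not listed default to manual review.
CATEGORY_ACTIONS = {
    "sexual/minors": "block_and_report",
    "self-harm/instructions": "block",
    "hate/threatening": "block",
    "harassment/threatening": "block",
    "violence/graphic": "block",
}

# Strictness ranking: lower number = stricter action wins.
ACTION_ORDER = {"block_and_report": 0, "block": 1, "review": 2, "allow": 3}

def action_for(result):
    """Pick the strictest action among the categories flagged in one result."""
    actions = [
        CATEGORY_ACTIONS.get(category, "review")
        for category, hit in result["categories"].items()
        if hit
    ]
    if not actions:
        return "allow"
    return min(actions, key=ACTION_ORDER.__getitem__)
```

Taking the strictest applicable action means a single severe category (e.g. sexual/minors) dominates any milder flags in the same result.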
Common Use Cases

Social Media

User-generated content moderation, community guidelines enforcement

E-commerce

Product review moderation, marketplace content safety

Educational Platforms

Student content moderation, age-appropriate filtering

Gaming

Chat moderation, user behavior monitoring

Getting Started

Quick Start

Set up content moderation

SDKs

Use our libraries