How does Genuin Adaptive Intelligence automate and enhance content moderation on brands' content?
Introduction
The content moderation system is designed to analyze and flag video content based on a structured set of rules. This system ensures that video content aligns with platform policies, brand guidelines, and community standards. The moderation process is hierarchical, meaning that if content is flagged at any level, it will be reported, potentially halting further checks depending on the severity of the issue.
The moderation process consists of three key stages:
- Platform-Based Moderation
- Brand-Based Moderation
- Community-Based Moderation
1. Platform-Based Moderation
1.1 Overview
Platform-based moderation is the first line of defense in ensuring content appropriateness across the platform. This level of moderation is primarily concerned with ensuring that content does not violate general platform rules.
1.2 Transcript-Based Moderation
- Tool Used: OpenAI's Moderation API (free version).
- Process:
- The video's transcript is extracted and analyzed using OpenAI's moderation API.
- The API checks for content that may contain hate speech, violence, adult content, or other inappropriate material.
- If the transcript is flagged by the API, the content is reported and may be removed or reviewed by a moderator.
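As a sketch, the flag-and-report decision over a Moderation API response reduces to collecting the categories the API marked as violated (or whose score crosses a reporting threshold). The response shape below mirrors the API's `categories`/`category_scores` fields; the 0.5 threshold is an assumption for illustration, not a documented value.

```python
def flagged_labels(moderation_result: dict, threshold: float = 0.5) -> list:
    """Collect labels the API marked as violated, or whose score
    crosses the (assumed) reporting threshold."""
    labels = []
    for label, violated in moderation_result.get("categories", {}).items():
        score = moderation_result.get("category_scores", {}).get(label, 0.0)
        if violated or score >= threshold:
            labels.append({"label": label, "score": round(score, 2)})
    return labels

# Mock response shaped like a Moderation API result for an abusive segment.
mock = {
    "flagged": True,
    "categories": {"harassment": True, "sexual": True, "violence": False},
    "category_scores": {"harassment": 0.83, "sexual": 0.92, "violence": 0.05},
}
print(flagged_labels(mock))
# → [{'label': 'harassment', 'score': 0.83}, {'label': 'sexual', 'score': 0.92}]
```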
1.3 Vision-Based Moderation
- Tool Used: GPT-4o (OpenAI's model) for image analysis.
- Process:
- Vision-based moderation runs only when transcript-based moderation does not flag the content.
- The video is divided into scenes, and a collage of 9 frames is created for each scene.
- These collages are analyzed by GPT-4o to detect visual content that may include nudity, violence, or other forms of explicit or inappropriate imagery.
- If any frames are flagged for containing such content, the video is reported.
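A minimal sketch of the scene-sampling step: pick 9 evenly spaced frame indices per scene, which would then be tiled into a 3x3 collage and sent to GPT-4o. The even-spacing policy and the frame numbers are illustrative assumptions, not the confirmed sampling strategy.

```python
def collage_frame_indices(scene_start: int, scene_end: int, n: int = 9) -> list:
    """Pick n evenly spaced frame indices across a scene's frame range,
    to be tiled into a collage for vision analysis."""
    if scene_end <= scene_start:
        return [scene_start] * n
    step = (scene_end - scene_start) / (n - 1)
    return [round(scene_start + i * step) for i in range(n)]

# A hypothetical 10-second scene at 24 fps (frames 0..240).
print(collage_frame_indices(0, 240))
# → [0, 30, 60, 90, 120, 150, 180, 210, 240]
```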
2. Brand-Based Moderation
2.1 Overview
Brand-based moderation is the second layer of content analysis, focusing on the brand's specific requirements and values. This stage ensures that content adheres to the particular standards and guidelines set by each brand.
2.2 Transcript-Based Sentiment Analysis
- Process:
- The video's transcript is analyzed to determine the sentiment towards the brand.
- Sentiment analysis checks whether the content is positive, negative, or neutral.
- Content that shows negative sentiment towards the brand is flagged and reported.
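The flag decision for this stage can be sketched as a mapping from a classifier's label to the numeric `sentiment` code that appears in the response format later in this document, flagging only negative segments. The classifier itself (an LLM call or a dedicated sentiment model) is abstracted away here.

```python
# Numeric sentiment codes matching the `sentiment` field in the
# response format (-1 negative, 0 neutral, 1 positive).
SENTIMENT_CODES = {"negative": -1, "neutral": 0, "positive": 1}

def brand_sentiment_check(sentiment_label: str) -> dict:
    """Map a sentiment label to its numeric code; only negative
    sentiment toward the brand is flagged for reporting."""
    code = SENTIMENT_CODES[sentiment_label]
    return {"sentiment": code, "violated": code < 0}

print(brand_sentiment_check("negative"))
# → {'sentiment': -1, 'violated': True}
```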
2.3 Transcript-Based Brand Guidelines Compliance
- Process:
- The transcript is further analyzed to check if it adheres to specific brand guidelines.
- This may include checking for the inclusion of specific language, tone, and messaging that aligns with the brand’s values.
- If the content violates these guidelines, it is flagged and reported.
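Because brand guidelines are free text (see the BCC portal examples later in this document), one plausible implementation is to assemble them into an LLM prompt alongside the transcript. The prompt wording below is illustrative, not the production prompt.

```python
def build_guideline_prompt(guidelines: list, transcript: str) -> str:
    """Assemble an LLM prompt that checks a transcript against
    free-text brand guidelines (wording here is illustrative)."""
    rules = "\n".join(f"{i}. {g}" for i, g in enumerate(guidelines, 1))
    return (
        "You are a brand-compliance reviewer. Flag any part of the "
        "transcript that violates these guidelines:\n"
        f"{rules}\n\nTranscript:\n{transcript}\n"
        "List the violated guideline labels, or reply 'none'."
    )

prompt = build_guideline_prompt(
    ["No self promotion", "Be respectful"],
    "Follow my page for more tips!",
)
print(prompt)
```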
2.4 Vision-Based Brand Compliance
- Process:
- The generated video description is used for brand compliance checks, since it captures the visual elements of the video content that vision-based analysis can identify.
3. Community-Based Moderation
3.1 Overview
Community-based moderation is the final layer of content analysis, focusing on ensuring that the content aligns with the specific community's interests and standards. This stage is designed to foster a positive and relevant experience within each community.
3.2 Transcript-Based Community Similarity
- Process:
- The transcript is analyzed for its relevance and similarity to the specific community's content.
- This involves checking the content's alignment with the themes, topics, and interests of the community.
- Content that deviates significantly from the community’s standards or interests is flagged and reported.
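One common way to implement this similarity check is to embed the transcript and the community's representative content, then flag transcripts whose cosine similarity to the community falls below a cutoff. The embedding source and the 0.35 threshold below are assumptions; the toy 3-dimensional vectors stand in for real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def deviates_from_community(transcript_vec, community_vec, threshold=0.35):
    """Flag content whose embedding is too dissimilar from the
    community's centroid; the threshold is an assumption."""
    return cosine_similarity(transcript_vec, community_vec) < threshold

# Toy 3-d embeddings standing in for real transcript/community vectors.
print(deviates_from_community([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))  # on-topic
print(deviates_from_community([0.0, 1.0, 0.0], [0.9, 0.1, 0.0]))  # off-topic
```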
3.3 Vision-Based Community Compliance
- Process:
- The generated video description is used to catch visual guideline violations in community content.
Platform Moderation Labels (vision):
1. Explicit Nudity:
- Exposed Male Genitalia
- Exposed Female Genitalia
- Exposed Buttocks or Anus
- Exposed Female Nipple
2. Sex toys:
- Sex toys: Objects used for sexual pleasure (e.g., dildos, vibrators).
3. Non-Explicit Nudity:
- Bare Back: Visible human back.
- Exposed Male Nipple: Visible male nipples.
- Partially Exposed Buttocks: Partially visible buttocks.
- Partially Exposed Female Breast: Partially visible female breast.
- Bikini: Woman wearing bikini or swimwear.
4. Implied Nudity:
- Implied Nudity: Nudity implied, with intimate parts covered.
5. Kissing on the lips:
- Kissing on the lips: Lips making contact.
6. Violence:
- Weapon Violence: Harm caused by weapons.
- Physical Violence: Harm to individuals or property.
- Self-Harm: Self-inflicted harm.
- Blood & Gore: Visible wounds and blood.
- Explosions and Blasts: Destructive bursts.
- Corpses: Dead human bodies.
7. Drugs:
- Drugs & Tobacco Paraphernalia & Use: Depictions of drug or tobacco paraphernalia and their use.
8. Alcohol and Smoking:
- Alcohol: Drinking or other use of alcohol.
- Smoking: Smoking cigarettes.
Platform Moderation Labels (Audio):
1. Hate : Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
2. Hate/threatening : Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
3. Harassment: Content that expresses, incites, or promotes harassing language towards any target.
4. Harassment/threatening: Harassment content that also includes violence or serious harm towards any target.
5. Self-harm: Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
6. Self-harm/intent: Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
7. Self-harm/instructions: Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.
8. Sexual: Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
9. Sexual/minors: Sexual content that includes an individual who is under 18 years old.
10. Violence: Content that depicts death, violence, or physical injury.
Brand Moderation classes:
Brand moderation will be done in accordance with the brand guidelines configured in the BCC portal.
Example:
1. No self promotion:
Don't spam our community with personal referral links or promotional offers.
2. No personal or professional advice requests:
All posts asking for advice must be generic and not specific to your situation.
3. Be respectful:
Treat your fellow members with respect, even when you disagree. Choose curiosity over conflict.
Brand guideline violations are detected using the video transcript as well as the generated video description.
Requirements Analysis:
External requirements:
- GPT-4o API calls
Internal requirements:
- Transcription model (Faster-Whisper)
- Feature extraction model (InceptionV3/EfficientNet)

Comment Moderation:
The comment moderation flow works hand in hand with post moderation: the three levels of guidelines are checked one after the other, that is, platform guidelines, brand guidelines, and community guidelines.
Comment Types:
- Video Comment: Comments that incorporate video content.
- Audio Comment: Comments that utilize audio recordings.
- Text Comment: Comments that consist solely of written text.
Current Moderation Capabilities
The existing moderation engine focuses on audio and text comments, employing the following methods:
- Audio Transcript Analysis: Audio comments are transcribed into text, enabling the application of text analysis techniques to identify inappropriate content.
- Textual Analysis: Text comments are directly analyzed using natural language processing and machine learning algorithms to detect policy violations.
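The routing described above can be sketched as a normalization step that turns every comment type into text before the hierarchical guideline checks run. The transcription call is stubbed out below; in practice Faster-Whisper (or the video pipeline) would provide it.

```python
def comment_to_text(comment: dict, transcribe=lambda media: "<transcript>") -> str:
    """Normalize a comment into text for moderation: text passes
    through unchanged; audio and video are transcribed first."""
    kind = comment["type"]
    if kind == "text":
        return comment["body"]
    if kind in ("audio", "video"):
        return transcribe(comment["media"])
    raise ValueError(f"unknown comment type: {kind}")

print(comment_to_text({"type": "text", "body": "great video!"}))
# → great video!
```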
Image Moderation:
Image moderation is currently applied to the following components:
- Community DP: The display picture representing a community.
- Community Banner: The banner image used by a community.
- User DP: The display picture for an individual user's profile.
Moderation Process
- Community DP and Banner: Platform-level, brand-specific, and community moderation are all applied to these elements. This ensures that the images adhere to the general platform rules and align with the brand's specific guidelines.
- User DP: Only platform-level moderation is applied to user display pictures. This is because a single user can be associated with multiple brands, and applying brand-specific moderation to their DP could be complex and potentially inconsistent. Therefore, the user DP is checked to ensure it meets the general platform standards for appropriateness.
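The asset-to-moderation-level mapping above is small enough to encode directly; a sketch under the stated rules (the asset-type keys are hypothetical names, not confirmed identifiers):

```python
# Which moderation levels apply to each image asset, per the rules above:
# community images get all three levels, user DPs only platform-level.
MODERATION_LEVELS = {
    "community_dp": ("platform", "brand", "community"),
    "community_banner": ("platform", "brand", "community"),
    "user_dp": ("platform",),
}

def levels_for(asset: str):
    """Return the moderation levels to run for an image asset type."""
    return MODERATION_LEVELS[asset]

print(levels_for("user_dp"))
# → ('platform',)
```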
Reasoning
The distinction in moderation between community elements and user DPs stems from the fact that communities are generally more closely tied to specific brands and their guidelines. Therefore, it is essential to ensure that community images reflect the brand's identity and values. User DPs, on the other hand, are more personal and may not always directly represent a particular brand.
Limitations of the Content Moderation Approach
While the content moderation system is designed to provide comprehensive and hierarchical analysis, it does have certain limitations that should be acknowledged. Understanding these limitations can help in setting realistic expectations and guiding future enhancements.
1. Dependency on Free-Text Guidelines
- Scope of Moderation: The effectiveness of the brand-based moderation is inherently limited by the specificity and clarity of the brand guidelines provided. Since these guidelines are collected in the form of free text, the moderation system can only flag content based on the information it has been explicitly trained to recognize. This means that if a brand guideline is vague or not comprehensive, the system might miss instances of content that technically violate the spirit of the guideline but do not explicitly match the provided criteria.
- Example: For a guideline such as "Respect the privacy and personal information of your fellow members by not sharing it outside of the community," it is challenging to monitor and flag the sharing of personal information effectively. The system might not be able to recognize every instance where personal data is subtly shared, especially if the sharing does not fit a predefined pattern or if the information is presented in a non-obvious way.
2. Limitations in Contextual Understanding
- Nuance in Content: The current system relies heavily on AI models for both transcript and visual analysis. While these models are powerful, they can struggle with nuanced contexts or subtle implications in both language and imagery. For instance, sarcasm, irony, or culturally specific references might not be accurately interpreted by the AI, leading to either false positives or missed violations.
- Visual Content: The system's ability to flag visual content is based on analyzing collages of frames from the video. This approach may miss inappropriate content if it occurs between the selected frames or if it is contextually relevant only when seen in motion rather than as a static image. Moreover, the AI might not always correctly identify subtle visual cues that violate guidelines.
3. Incomplete Brand and Community Data
- Data Gaps: The system's ability to enforce brand and community-specific rules is limited by the availability and accuracy of the data provided. If the brand guidelines or community standards are incomplete, outdated, or not comprehensive, the system may fail to identify violations. Additionally, new trends or shifts in brand values that are not immediately updated in the system could lead to lapses in moderation.
- Emerging Content Types: The rapid evolution of digital content means that new types of media or modes of expression may emerge that the current system is not equipped to handle. This could lead to gaps in moderation as the system struggles to adapt to new formats or styles of content that were not foreseen during the development of the moderation rules.
4. Reliance on Existing AI Models
- Model Limitations: The system’s transcript and visual analysis are reliant on existing AI models (e.g., OpenAI's Moderation API and GPT-4o). These models, while advanced, are not infallible. They are trained on large datasets but may still exhibit biases or inaccuracies, particularly in edge cases. Additionally, these models may not always align perfectly with the specific needs or values of every brand or community.
- API Limitations: Using third-party APIs introduces dependencies on external service availability, accuracy, and limitations. For instance, the free tier of OpenAI's Moderation API may have rate limits, which could slow down the moderation process or lead to incomplete analysis during peak usage times.
5. Challenges in Detecting Subtle or Indirect Violations
- Subtle Violations: Some content violations might be indirect or subtle, such as promoting a competitor in a non-obvious way or making slight negative insinuations about a brand. The current system may struggle to detect these more nuanced violations unless they are explicitly defined and recognized by the AI models.
- Indirect Content References: Similarly, content that indirectly references inappropriate topics without overtly mentioning them may slip through the moderation process. For example, a video might use metaphors, symbols, or coded language to bypass detection, which the AI may not always catch.
Existing Solutions:
Third-party APIs:
GPT-4o: https://openai.com
Sight Engine: https://sightengine.com/pricing
WebPurify: https://www.webpurify.com/video-moderation/pricing/
AWS Rekognition: https://docs.aws.amazon.com/rekognition/latest/dg/moderation.html
Hive Moderation: https://hivemoderation.com/visual-moderation
Response format:
1). Platform moderation (Audio):
{
"result": [
{
"type": "platform",
"violated": true,
"source": [
{
"type": "audio",
"details": [
{
"start": 4.0,
"end": 8.0,
"text": " Oh fuck, shit bitch, damn cocksucker, pussy asshole cunt",
"violations": [
{
"label": "harassment",
"reason": "Use of derogatory terms and profanity targeting individuals.",
"score": 0.83
},
{
"label": "sexual",
"reason": "Use of sexually explicit language.",
"score": 0.92
}
]
},
{
"start": 8.0,
"end": 12.0,
"text": " Mother fucking dirty whore, shout on to my lunch",
"violations": [
{
"label": "harassment",
"reason": "Use of derogatory terms and profanity targeting individuals.",
"score": 0.83
},
{
"label": "sexual",
"reason": "Use of sexually explicit language.",
"score": 0.92
}
]
},
{
"start": 12.0,
"end": 16.0,
"text": " Piss cunt, damn bitch, suck my dick, Jesus Harold Christ",
"violations": [
{
"label": "harassment",
"reason": "Use of derogatory terms and profanity targeting individuals.",
"score": 0.83
},
{
"label": "sexual",
"reason": "Use of sexually explicit language.",
"score": 0.92
}
]
},
{
"start": 16.0,
"end": 20.0,
"text": " Shit bitch, cocksucker, goddamn motherfucker, pussy asshole cunt",
"violations": [
{
"label": "harassment",
"reason": "Use of derogatory terms and profanity targeting individuals.",
"score": 0.83
},
{
"label": "sexual",
"reason": "Use of sexually explicit language.",
"score": 0.92
}
]
}
]
}
]
}
]
}
Example 2:
{
"result": [
{
"type": "platform",
"violated": true,
"source": [
{
"type": "audio",
"details": [
{
"start": 52.160000000000004,
"end": 59.68000000000001,
"text": " on me no what the fuck what the fuck is going on yo you know such a black ass",
"violations": [
{
"label": "harassment",
"reason": "Use of derogatory language",
"score": 0.49
}
]
}
]
}
]
}
]
}
2). Platform moderation (visual):
{
"result": {
"type": [
"platform"
],
"violated": true,
"source": [
{
"type": "visual",
"details": [
{
"start": 2,
"end": 5,
"nsfw_L1": "Violence",
"nsfw_L2": "Physical Violence",
"score": [
0.95
],
"reason": [
"The frame depicts a scene of physical violence."
],
"text": null
}
]
}
]
}
}
3). Brand moderation (brand guidelines, based on text):
{
"result": [
{
"type": "brand",
"violated": true,
"source": [
{
"type": "audio",
"details": [
{
"start": 41.72,
"end": 45.879999999999995,
"text": " way to help treat chronic health conditions like I've mentioned, follow my page.",
"sentiment": 1,
"violations": [
{
"label": "No self promotion",
"reason": "Encourages following the page for personal promotion.",
"score": 0.95
}
]
}
]
}
]
}
]
}
4). All moderation results (platform + brand + community moderation; not applicable now):
{
"result": [
{
"type": "platform",
"violated": true,
"source": [
{
"type": "audio",
"details": [
{
"start": 4.0,
"end": 8.0,
"text": " Oh fuck, shit bitch, damn cocksucker, pussy asshole cunt",
"violations": [
{
"label": "harassment",
"reason": "use of derogatory language",
"score": 0.82
},
{
"label": "sexual",
"reason": "explicit sexual language",
"score": 0.92
}
]
},
{
"start": 8.0,
"end": 12.0,
"text": " Mother fucking dirty whore, shout on to my lunch",
"violations": [
{
"label": "harassment",
"reason": "use of derogatory language",
"score": 0.82
},
{
"label": "sexual",
"reason": "explicit sexual language",
"score": 0.92
}
]
},
{
"start": 12.0,
"end": 16.0,
"text": " Piss cunt, damn bitch, suck my dick, Jesus Harold Christ",
"violations": [
{
"label": "harassment",
"reason": "use of derogatory language",
"score": 0.82
},
{
"label": "sexual",
"reason": "explicit sexual language",
"score": 0.92
}
]
},
{
"start": 16.0,
"end": 20.0,
"text": " Shit bitch, cocksucker, goddamn motherfucker, pussy asshole cunt",
"violations": [
{
"label": "harassment",
"reason": "use of derogatory language",
"score": 0.82
},
{
"label": "sexual",
"reason": "explicit sexual language",
"score": 0.92
}
]
}
]
}
]
},
{
"type": "brand",
"violated": true,
"source": [
{
"type": "audio",
"details": [
{
"start": 4.0,
"end": 8.0,
"text": " Oh fuck, shit bitch, damn cocksucker, pussy asshole cunt",
"sentiment": -1,
"violations": [
{
"label": "Do not abuse",
"reason": "Oh fuck, shit bitch, damn cocksucker, pussy asshole cunt",
"score": 1.0
}
]
},
{
"start": 8.0,
"end": 12.0,
"text": " Mother fucking dirty whore, shout on to my lunch",
"sentiment": -1,
"violations": [
{
"label": "Do not abuse",
"reason": "Mother fucking dirty whore, shout on to my lunch",
"score": 1.0
}
]
},
{
"start": 12.0,
"end": 16.0,
"text": " Piss cunt, damn bitch, suck my dick, Jesus Harold Christ",
"sentiment": -1,
"violations": [
{
"label": "Do not abuse",
"reason": "Piss cunt, damn bitch, suck my dick, Jesus Harold Christ",
"score": 1.0
}
]
},
{
"start": 16.0,
"end": 20.0,
"text": " Shit bitch, cocksucker, goddamn motherfucker, pussy asshole cunt",
"sentiment": -1,
"violations": [
{
"label": "Do not abuse",
"reason": "Shit bitch, cocksucker, goddamn motherfucker, pussy asshole cunt",
"score": 1.0
}
]
}
]
}
]
}
]
}
5). No violation (passes platform, brand, and community moderation):
{
"result": null
}
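A small sketch of how a consumer might read this response format: collect the moderation levels that reported a violation, handling the list-shaped `result` (examples 1, 3, 4), the dict-shaped `result` with a list-valued `type` (example 2), and the null `result` (example 5).

```python
def violated_levels(response: dict) -> list:
    """List moderation levels ('platform', 'brand', 'community') that
    flagged the content; an empty list means the content passed."""
    result = response.get("result")
    if not result:
        return []                       # example 5: "result": null
    if isinstance(result, dict):        # example 2 wraps a single entry
        result = [result]
    levels = []
    for entry in result:
        if entry.get("violated"):
            t = entry["type"]           # may be a string or a list
            levels.extend(t if isinstance(t, list) else [t])
    return levels

print(violated_levels({"result": None}))
# → []
print(violated_levels({"result": [{"type": "platform", "violated": True}]}))
# → ['platform']
```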