Current research in content moderation increasingly leverages advanced machine learning to address the complexities of online safety. Recent work applies large language models to detect illicit content in online marketplaces, where they classify nuanced communications more accurately than traditional methods. Frameworks such as Knowledge-Injected Dual-Head Learning improve the detection of harmful memes by injecting contextual knowledge, addressing the subtleties of digital culture. FlexGuard marks a shift toward adaptive moderation, producing continuous risk scores that accommodate varying strictness across platforms. New benchmarks evaluate how well AI systems handle co-occurring violations and dynamic moderation rules, emphasizing the need for robust generalization in real-world settings. Collectively, these advances aim to deliver more effective, scalable moderation that meets pressing commercial demands for safe online environments.
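The adaptive-moderation idea above can be made concrete with a minimal sketch: instead of a fixed binary verdict, the guardrail emits a continuous risk score and each platform applies its own strictness threshold. The function and threshold values below are hypothetical illustrations, not FlexGuard's actual interface.

```python
from dataclasses import dataclass

@dataclass
class ModerationDecision:
    score: float     # continuous risk score in [0, 1]
    allowed: bool    # verdict after applying the platform's threshold

def moderate(risk_score: float, platform_threshold: float) -> ModerationDecision:
    """Apply a platform-specific strictness threshold to a continuous risk score."""
    return ModerationDecision(score=risk_score, allowed=risk_score < platform_threshold)

# The same content, judged by a strict children's platform vs. a permissive forum:
same_content_score = 0.35
print(moderate(same_content_score, platform_threshold=0.2).allowed)  # False
print(moderate(same_content_score, platform_threshold=0.7).allowed)  # True
```

Decoupling the score from the verdict is what lets one model serve platforms with different policies: only the threshold changes, not the classifier.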
Social Virtual Reality (VR) platforms provide immersive social experiences but also expose users to serious risks of online harassment. Existing safety measures are largely reactive, while proactive s...
Illicit online promotion is a persistent threat that evolves to evade detection. Existing moderation systems remain tethered to platform-specific supervision and static taxonomies, a reactive paradigm...
Internet memes have become pervasive carriers of digital culture on social platforms. However, their heavy reliance on metaphors and sociocultural context also makes them subtle vehicles for harmful c...
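A dual-head design of the kind named in the summary can be sketched as a shared encoder feeding two classification heads, e.g. one predicting harmfulness and one predicting an auxiliary knowledge-grounded category. This is an illustrative toy with random weights and made-up dimensions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 64, 32                              # input feature dim, shared hidden dim

W_enc = rng.normal(size=(D, H)) * 0.1      # shared encoder weights
W_harm = rng.normal(size=(H, 2)) * 0.1     # head 1: harmful vs. benign
W_know = rng.normal(size=(H, 5)) * 0.1     # head 2: auxiliary knowledge category

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x: np.ndarray):
    """One shared representation, two task-specific probability distributions."""
    h = np.tanh(x @ W_enc)
    return softmax(h @ W_harm), softmax(h @ W_know)

x = rng.normal(size=(1, D))                # one meme's fused (image+text) features
p_harm, p_know = forward(x)
print(p_harm.shape, p_know.shape)          # (1, 2) (1, 5)
```

The point of the shared encoder is that gradients from the knowledge head shape the representation the harmfulness head sees, which is how injected context can help with metaphor-heavy content.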
Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes. Trad...
Ensuring the safety of LLM-generated content is essential for real-world deployment. Most existing guardrail models formulate moderation as a fixed binary classification task, implicitly assuming a fi...
Online content moderation is essential for maintaining a healthy digital environment, and reliance on AI for this task continues to grow. Consider a user comment using national stereotypes to insult a...
Framing theory posits that how information is presented shapes audience responses, but computational work has largely ignored audience reactions. While recent work showed that article framing systemat...
We present KidsNanny, a two-stage multimodal content moderation architecture for child safety. Stage 1 combines a vision transformer (ViT) with an object detector for visual screening (11.7 ms); outpu...
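A two-stage pipeline of this shape can be sketched as a cheap visual screen that decides clear cases alone and escalates only ambiguous content to a slower second stage. The stage implementations, the routing rule, and the threshold below are placeholders, not KidsNanny's actual models.

```python
from typing import Callable

def two_stage_moderate(
    frame: object,
    fast_visual_screen: Callable[[object], float],  # stage 1: cheap risk score (e.g. ViT + detector)
    deep_review: Callable[[object], bool],          # stage 2: slower, more thorough check
    escalation_threshold: float = 0.5,
) -> bool:
    """Return True if the content is judged safe; escalate only borderline frames."""
    risk = fast_visual_screen(frame)
    if risk < escalation_threshold:
        return True                  # clearly safe: stage 1 decides alone
    return deep_review(frame)        # flagged or ambiguous: stage 2 decides

# Toy usage with stub stages standing in for the real models:
safe = two_stage_moderate("frame-001",
                          fast_visual_screen=lambda f: 0.1,
                          deep_review=lambda f: False)
print(safe)  # True
```

The latency win comes from the routing: the expensive stage runs only on the fraction of traffic the fast screen cannot clear.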