Current research in content moderation increasingly leverages advanced machine learning to address the complexities of online safety. Recent work applies large language models to detect illicit content in online marketplaces, where they classify nuanced communications more accurately than traditional methods. Frameworks such as Knowledge-Injected Dual-Head Learning improve the detection of harmful memes by injecting contextual knowledge, addressing the subtleties of digital culture. FlexGuard marks a shift toward adaptive moderation, producing continuous risk scores that accommodate varying strictness across platforms. New benchmarks evaluate how well AI systems handle co-occurring violations and dynamic moderation rules, emphasizing the need for robust generalization in real-world settings. Collectively, these advances aim to deliver more effective, scalable moderation that meets pressing commercial demands for safe online environments.
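The adaptive-moderation idea above can be made concrete with a minimal sketch: instead of a fixed binary verdict, the guardrail emits a continuous risk score and each platform applies its own strictness threshold. The function and threshold values below are hypothetical illustrations, not FlexGuard's actual interface.

```python
from dataclasses import dataclass

@dataclass
class ModerationDecision:
    score: float     # continuous risk score in [0, 1]
    allowed: bool    # verdict after applying the platform's threshold

def moderate(risk_score: float, platform_threshold: float) -> ModerationDecision:
    """Apply a platform-specific strictness threshold to a continuous risk score."""
    return ModerationDecision(score=risk_score, allowed=risk_score < platform_threshold)

# The same content, judged by a strict children's platform vs. a permissive forum:
same_content_score = 0.35
print(moderate(same_content_score, platform_threshold=0.2).allowed)  # False
print(moderate(same_content_score, platform_threshold=0.7).allowed)  # True
```

Decoupling the score from the verdict is what lets one model serve platforms with different policies: only the threshold changes, not the classifier.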
Social Virtual Reality (VR) platforms provide immersive social experiences but also expose users to serious risks of online harassment. Existing safety measures are largely reactive, while proactive s...
Illicit online promotion is a persistent threat that evolves to evade detection. Existing moderation systems remain tethered to platform-specific supervision and static taxonomies, a reactive paradigm...
Internet memes have become pervasive carriers of digital culture on social platforms. However, their heavy reliance on metaphors and sociocultural context also makes them subtle vehicles for harmful c...
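A dual-head design of the kind named in the summary can be sketched as a shared encoder feeding two classification heads, e.g. one predicting harmfulness and one predicting an auxiliary knowledge-grounded category. This is an illustrative toy with random weights and made-up dimensions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 64, 32                              # input feature dim, shared hidden dim

W_enc = rng.normal(size=(D, H)) * 0.1      # shared encoder weights
W_harm = rng.normal(size=(H, 2)) * 0.1     # head 1: harmful vs. benign
W_know = rng.normal(size=(H, 5)) * 0.1     # head 2: auxiliary knowledge category

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x: np.ndarray):
    """One shared representation, two task-specific probability distributions."""
    h = np.tanh(x @ W_enc)
    return softmax(h @ W_harm), softmax(h @ W_know)

x = rng.normal(size=(1, D))                # one meme's fused (image+text) features
p_harm, p_know = forward(x)
print(p_harm.shape, p_know.shape)          # (1, 2) (1, 5)
```

The point of the shared encoder is that gradients from the knowledge head shape the representation the harmfulness head sees, which is how injected context can help with metaphor-heavy content.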
Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes. Trad...
Ensuring the safety of LLM-generated content is essential for real-world deployment. Most existing guardrail models formulate moderation as a fixed binary classification task, implicitly assuming a fi...
Online content moderation is essential for maintaining a healthy digital environment, and reliance on AI for this task continues to grow. Consider a user comment using national stereotypes to insult a...
Framing theory posits that how information is presented shapes audience responses, but computational work has largely ignored audience reactions. While recent work showed that article framing systemat...
We present KidsNanny, a two-stage multimodal content moderation architecture for child safety. Stage 1 combines a vision transformer (ViT) with an object detector for visual screening (11.7 ms); outpu...
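A two-stage pipeline of this shape can be sketched as a cheap visual screen that decides clear cases alone and escalates only ambiguous content to a slower second stage. The stage implementations, the routing rule, and the threshold below are placeholders, not KidsNanny's actual models.

```python
from typing import Callable

def two_stage_moderate(
    frame: object,
    fast_visual_screen: Callable[[object], float],  # stage 1: cheap risk score (e.g. ViT + detector)
    deep_review: Callable[[object], bool],          # stage 2: slower, more thorough check
    escalation_threshold: float = 0.5,
) -> bool:
    """Return True if the content is judged safe; escalate only borderline frames."""
    risk = fast_visual_screen(frame)
    if risk < escalation_threshold:
        return True                  # clearly safe: stage 1 decides alone
    return deep_review(frame)        # flagged or ambiguous: stage 2 decides

# Toy usage with stub stages standing in for the real models:
safe = two_stage_moderate("frame-001",
                          fast_visual_screen=lambda f: 0.1,
                          deep_review=lambda f: False)
print(safe)  # True
```

The latency win comes from the routing: the expensive stage runs only on the fraction of traffic the fast screen cannot clear.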