Bridging Pixels and Words: Mask-Aware Local Semantic Fusion for Multimodal Media Verification | Signal Canvas | ScienceToStartup