Safety in AI

Proof pending

3 papers

5.3 viability

Proof pending. This topic has not reached the minimum paper threshold yet.

Topic-linked question coverage is still building for this proof surface.

Papers

Research Paper·Mar 16, 2026

Two Birds, One Projection: Harmonizing Safety and Utility in LVLMs via Inference-time Feature Projection

Existing jailbreak defence frameworks for Large Vision-Language Models often suffer from a safety-utility tradeoff, where strengthening safety inadvertently degrades performance on general visual-grou...

7.0 viability

Research Paper·Mar 12, 2026

OrthoEraser: Coupled-Neuron Orthogonal Projection for Concept Erasure

Text-to-image (T2I) models face significant safety risks from adversarial induction, yet current concept erasure methods often cause collateral damage to benign attributes when suppressing selected ne...

7.0 viability

Research Paper·Mar 16, 2026

Beyond Creed: A Non-Identity Safety Condition A Strong Empirical Alternative to Identity Framing in Low-Data LoRA Fine-Tuning

How safety supervision is written may matter more than the explicit identity content it contains. We study low-data LoRA safety fine-tuning with four supervision formats built from the same core safet...

2.0 viability

Use this topic page as a durable research-area proof surface.