LLM in the Loop: Creating the PARADEHATE Dataset for Hate Speech Detoxification Paper • 2506.01484 • Published Jun 2 • 6
Keep Security! Benchmarking Security Policy Preservation in Large Language Model Contexts Against Indirect Attacks in Question Answering Paper • 2505.15805 • Published May 21 • 3
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models Paper • 2505.02847 • Published May 1 • 28
Llama-3.1-FoundationAI-SecurityLLM-Base-8B Technical Report Paper • 2504.21039 • Published Apr 28 • 15
Open Deep Search: Democratizing Search with Open-source Reasoning Agents Paper • 2503.20201 • Published Mar 26 • 48
AI Policy @🤗: Response to the White House AI Action Plan RFI Article • Published Mar 19 • 30
SafeArena: Evaluating the Safety of Autonomous Web Agents Paper • 2503.04957 • Published Mar 6 • 21
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval Paper • 2503.08644 • Published Mar 11 • 16
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 Paper • 2502.12659 • Published Feb 18 • 7
Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region Paper • 2502.13946 • Published Feb 19 • 10
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL Paper • 2502.11438 • Published Feb 17 • 8
GuardReasoner: Towards Reasoning-based LLM Safeguards Paper • 2501.18492 • Published Jan 30 • 88