By Xingjun Ma, Fudan University, China, xingjunma@fudan.edu.cn | Yifeng Gao, Fudan University, China | Yixu Wang, Fudan University, China | Ruofan Wang, Fudan University, China | Xin Wang, Fudan University, China | Ye Sun, Fudan University, China | Yifan Ding, Fudan University, China | Hengyuan Xu, Fudan University, China | Yunhao Chen, Fudan University, China | Yunhao Zhao, Fudan University, China | Hanxun Huang, The University of Melbourne, Australia | Yige Li, Singapore Management University, Singapore | Yutao Wu, Deakin University, Australia | Jiaming Zhang, Hong Kong University of Science and Technology, Hong Kong | Xiang Zheng, City University of Hong Kong, Hong Kong | Yang Bai, ByteDance, China | Yiming Li, Nanyang Technological University, Singapore | Zuxuan Wu, Fudan University, China | Xipeng Qiu, Fudan University, China | Jingfeng Zhang, University of Auckland, New Zealand and RIKEN, Japan | Xudong Han, MBZUAI, UAE | Haonan Li, MBZUAI, UAE | Jun Sun, Singapore Management University, Singapore | Cong Wang, City University of Hong Kong, Hong Kong | Jindong Gu, University of Oxford, UK | Baoyuan Wu, Chinese University of Hong Kong, Shenzhen, China | Siheng Chen, Shanghai Jiao Tong University, China | Tianwei Zhang, Nanyang Technological University, Singapore | Yang Liu, Nanyang Technological University, Singapore | Mingming Gong, The University of Melbourne, Australia | Tongliang Liu, The University of Sydney, Australia | Shirui Pan, Griffith University, Australia | Cihang Xie, University of California, Santa Cruz, USA | Tianyu Pang, Sea AI Lab, Singapore | Yinpeng Dong, Tsinghua University, China | Ruoxi Jia, Virginia Tech, USA | Yang Zhang, CISPA Helmholtz Center for Information Security, Germany | Shiqing Ma, University of Massachusetts Amherst, USA | Xiangyu Zhang, Purdue University, USA | Neil Gong, Duke University, USA | Chaowei Xiao, University of Wisconsin - Madison, USA | Sarah Erfani, The University of Melbourne, Australia | Tim Baldwin, The University of Melbourne, Australia and MBZUAI, UAE | Bo Li, University of Illinois Urbana-Champaign, USA | Masashi Sugiyama, RIKEN, Japan and The University of Tokyo, Japan | Dacheng Tao, Nanyang Technological University, Singapore | James Bailey, The University of Melbourne, Australia | Yu-Gang Jiang, Fudan University, China, ygj@fudan.edu.cn
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-powered Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review the defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and of international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models. GitHub: https://github.com/xingjunm/Awesome-Large-Model-Safety.