Adversarial AI: Threats, Defenses, and the Role of Explainability in Building Trustworthy Systems
DOI: https://doi.org/10.56127/ijst.v2i2.1955

Keywords: Adversarial AI, Security of Machine Learning, Adversarial Attacks and Defense Mechanisms, Explainable AI (XAI), Trustworthy AI, Robustness, Interpretability, Transparency in AI

Abstract
Artificial Intelligence has enabled the latest revolutions across industry. Nevertheless, adversarial AI poses a serious challenge: it exploits vulnerabilities in machine learning models, breaches their security, and can ultimately cause them to fail. Adversarial attacks take several forms, including evasion, poisoning, and model inversion; they expose how fragile AI systems can be and underscore the urgent need for robust defensive structures. Several adversarial defense mechanisms have been proposed, from adversarial training to defensive distillation and certified defenses, yet they remain vulnerable to sophisticated attacks. This has driven the emergence of explainable artificial intelligence (XAI) as a significant component of AI security, in which interpretability and transparency enable better threat detection and greater user trust. This work encompasses a literature review of adversarial AI, current developments in adversarial defenses, and the role played by XAI in mitigating adversarial threats. The paper presents an integrated framework that combines explainability techniques to build resilient, transparent, and trustworthy AI systems.
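To make the attack and defense concepts named above concrete, the sketch below shows the fast gradient sign method (FGSM), a representative evasion attack, together with one adversarial-training step. This is a minimal illustration in PyTorch, not the paper's own method; the model, data, and the perturbation budget epsilon are hypothetical placeholders.

```python
# Illustrative sketch only: a minimal FGSM evasion attack and one
# adversarial-training step. Model, inputs, and epsilon are hypothetical.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """Craft an evasion example: x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # One signed-gradient step, then clamp to the valid input range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One defense step: fit the model on the perturbed batch."""
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()  # clear gradients accumulated while crafting x_adv
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Adversarial training of this kind hardens a model against the specific perturbations seen during training, which is precisely why, as the abstract notes, such defenses can remain vulnerable to stronger or unseen attacks.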