Deliberate attempts to manipulate AI systems by providing specially crafted inputs designed to cause incorrect or harmful outputs. These attacks exploit vulnerabilities in how models process information to induce errors or bypass security measures.
Adversarial attacks pose significant security and reliability risks for deployed AI systems. They work by identifying and exploiting patterns in how models process inputs, often through subtle changes that are imperceptible to humans yet dramatically alter model behaviour. Common types include evasion attacks (manipulating inputs to avoid detection), poisoning attacks (corrupting training data), and model stealing (extracting proprietary model information through carefully crafted queries).
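A minimal sketch of an evasion attack, assuming PyTorch and a pretrained torchvision classifier: the Fast Gradient Sign Method (FGSM) nudges every pixel by a small amount epsilon in whichever direction increases the model's loss, which is often enough to change the prediction while leaving the image visually unchanged. The model, input, and epsilon value here are illustrative, not taken from any particular deployed system.

```python
# Sketch of an evasion attack via the Fast Gradient Sign Method (FGSM).
# Assumes PyTorch and torchvision; downloads pretrained ResNet-18 weights.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

def fgsm_attack(image: torch.Tensor, label: torch.Tensor, epsilon: float) -> torch.Tensor:
    """Perturb `image` by epsilon per pixel in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step each pixel by +/- epsilon along the sign of the input gradient;
    # a small epsilon keeps the change imperceptible to humans.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Usage with a random stand-in image; a real attack would target an actual input.
x = torch.rand(1, 3, 224, 224)           # one RGB image with pixels in [0, 1]
y = model(x).argmax(dim=1)               # model's current (clean) prediction
x_adv = fgsm_attack(x, y, epsilon=0.01)  # tiny per-pixel change
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))  # predictions may now differ
```

The key design point is that the perturbation is bounded (here, at most 0.01 per pixel), so the adversarial image is visually indistinguishable from the original even though the model's output changes.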
Manipulating a product image with imperceptible modifications so that an e-commerce content moderation system misclassifies a prohibited item as an allowed product, circumventing its safety controls.
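The scenario above is an evasion attack; the poisoning variant mentioned earlier can be sketched just as briefly. A minimal, hypothetical example assuming scikit-learn: an attacker who can flip the labels of a fraction of the training data degrades the accuracy of the model trained on it. The dataset, model, and flip rate are all illustrative.

```python
# Sketch of a data poisoning attack via label flipping, assuming scikit-learn.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Attacker flips the labels of 20% of the training set (an illustrative rate).
rng = np.random.default_rng(0)
poisoned = y_train.copy()
idx = rng.choice(len(poisoned), size=int(0.2 * len(poisoned)), replace=False)
poisoned[idx] = 1 - poisoned[idx]

clean_acc = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)
poisoned_acc = LogisticRegression(max_iter=1000).fit(X_train, poisoned).score(X_test, y_test)
print(f"clean: {clean_acc:.3f}, poisoned: {poisoned_acc:.3f}")  # poisoned is typically lower
```

Even a modest flip rate typically produces a measurable accuracy drop; targeted poisoning, which corrupts only carefully chosen examples, can be more damaging at much lower rates.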