Tag

Red Teaming

All articles tagged with #red teaming

artificial-intelligence1 year ago

"Enhancing AI Chatbot Safety: A Faster, Better Approach to Prevent Toxic Responses"

Researchers from MIT and the MIT-IBM Watson AI Lab have developed a machine learning technique to improve the red-teaming process for large language models, such as AI chatbots, to prevent them from generating toxic or unsafe responses. By training a red-team model to automatically generate diverse prompts that trigger a wider range of undesirable responses from the chatbot being tested, the researchers were able to outperform human testers and other machine-learning approaches. This approach provides a faster and more effective way to ensure the safety and trustworthiness of AI models, reducing the need for lengthy and costly manual verification processes.

technology1 year ago

"Microsoft's PyRIT: A Free GenAI Red Teaming Tool for Cybersecurity"

Microsoft has released PyRIT, an open access automation framework designed to proactively identify risks in generative artificial intelligence systems. The tool assesses the robustness of large language model endpoints against various harm categories and can identify security and privacy harms. It comes with five interfaces and offers options for scoring outputs from the target AI system. While not a replacement for manual red teaming, PyRIT complements a red team's domain expertise by highlighting risk "hot spots" and generating prompts for evaluation. Microsoft emphasizes the need for both manual probing and automation in red teaming generative AI systems, as Protect AI discloses critical vulnerabilities in popular AI supply chain platforms.

technology2 years ago

Tech Giants Employ Hackers to Uncover Alarming Vulnerabilities in AI Models

Tech giants like Google, Microsoft, Nvidia, and Meta have established in-house AI red teams to identify vulnerabilities in their AI models and ensure their safety. These red teams, consisting of external experts and internal employees, simulate adversarial attacks to uncover blind spots and risks in the technology. By injecting prompts that generate harmful and biased responses, the red teamers test the AI models for potential flaws. The practice of red teaming AI models is crucial in safeguarding against exploitation and ensuring the models are safe and usable. However, there is a delicate balance between safety and usability, as overly cautious models may become useless. Red teamers also share findings and collaborate to improve AI security across the industry.

technology2 years ago

Uncovering Bias and Vulnerabilities in Chatbots: The Race to Protect AI

Hackers and AI enthusiasts are participating in public "red teaming" events to find vulnerabilities and flaws in AI language models, such as chatbots, in order to address them before they can cause harm. These events, including the upcoming Generative Red Team Challenge at Def Con, aim to test AI models for issues like political misinformation, algorithmic discrimination, and defamatory claims. Leading AI firms have volunteered their chatbots and image generators for the competition, with the results being sealed for several months to allow companies time to address the flaws before they are made public. Red-teaming exercises are seen as a crucial step in identifying and mitigating potential harms, including biased assumptions and deceptive behavior, in AI systems.

technology2 years ago

DEF CON 31: AI Models Face Off Against Hackers.

DEF CON AI Village is hosting the largest red teaming exercise ever for any group of AI models, inviting hackers to find bugs and biases in large language models (LLMs) built by OpenAI, Google, Anthropic, and others. The event will host thousands of people, including hundreds of students, to find flaws in LLMs that power today's chatbots and generative AI. The event is also supported by the White House Office of Science, Technology, and Policy, America's National Science Foundation's Computer and Information Science and Engineering (CISE) Directorate, and the Congressional AI Caucus.