Google is shifting away from Scale AI, a key player in human-in-the-loop AI training, after Meta Platforms acquired a nearly 50% stake in the company for $14.3 billion. The deal signals a strategic move by Meta to bolster its AI capabilities and its control over training data; Google, meanwhile, is seeking alternative providers to protect its proprietary datasets.
OpenAI has revolutionized chatbot development with a technique called "reinforcement learning from human feedback" (RLHF). Before releasing its chatbot ChatGPT, OpenAI hired hundreds of workers to provide precise suggestions and feedback on the bot's responses. This technique, in which workers effectively act as tutors, has transformed chatbots from a curiosity into mainstream technology. However, researchers warn that while human feedback improves chatbot behavior in some respects, it can degrade performance in others: the accuracy of OpenAI's technology has dropped in certain situations, possibly due to continuing efforts to apply human feedback. Despite these limitations, human feedback remains a crucial tool for reducing misinformation and bias in chatbot systems.
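To make the mechanism concrete, here is a minimal sketch of the reward-modeling step that underlies RLHF: human labelers pick the better of two responses, and a small model is trained so the preferred response scores higher. The toy embeddings and the PairwiseRewardModel class are illustrative assumptions, not OpenAI's actual implementation.

```python
# A minimal sketch of RLHF's reward-modeling step in PyTorch.
# Toy embeddings and the model architecture are illustrative
# assumptions, not OpenAI's actual implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PairwiseRewardModel(nn.Module):
    """Scores a response embedding; trained so human-preferred responses score higher."""
    def __init__(self, dim: int):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.scorer(x).squeeze(-1)

dim = 16
model = PairwiseRewardModel(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Stand-in data: each pair holds embeddings of a response a human
# labeler preferred ("chosen") and one they rejected.
chosen = torch.randn(256, dim)
rejected = torch.randn(256, dim) - 0.5  # offset so the toy data is learnable

for step in range(200):
    # Bradley-Terry pairwise loss: raise P(chosen scores above rejected).
    loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final pairwise loss: {loss.item():.3f}")
# The trained reward model then provides the reward signal for a
# policy-optimization stage (e.g., PPO) that fine-tunes the chatbot.
```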
Anthropic, an AI startup founded by former OpenAI employees, is focusing on "constitutional AI" to make AI systems safe. The company has drawn up a set of principles, inspired by the UN's Universal Declaration of Human Rights, Apple's terms of service, and its own research, and trains its AI systems to follow them. The principles include guidance intended to keep users from anthropomorphizing chatbots, instructions telling the system not to present itself as human, and directions to consider non-Western perspectives. Anthropic's intention is to prove the general efficacy of its method and to start a public discussion about how AI systems should be trained and what principles they should follow.
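Anthropic's published description of constitutional AI has the model critique and revise its own outputs against the written principles; the sketch below illustrates that loop. Here query_model is a hypothetical stand-in for a call to the language model being trained, and the two principles shown paraphrase the kinds of rules described above rather than Anthropic's actual constitution.

```python
# A minimal sketch of a constitutional-AI critique-and-revise loop.
# `query_model` is a hypothetical stand-in for a call to the language
# model being trained; the principles are illustrative paraphrases,
# not Anthropic's actual constitution.

PRINCIPLES = [
    "Choose the response least likely to imply that the assistant "
    "is a human or encourage the user to treat it as one.",
    "Choose the response most sensitive to non-Western perspectives.",
]

def query_model(prompt: str) -> str:
    # Stand-in: in practice this would sample from the LLM.
    return f"<model output for: {prompt[:48]}...>"

def constitutional_revision(prompt: str) -> str:
    response = query_model(prompt)
    for principle in PRINCIPLES:
        # Ask the model to critique its own answer against one principle...
        critique = query_model(
            f"Critique this response against the principle '{principle}':\n{response}"
        )
        # ...then to rewrite the answer so it addresses the critique.
        response = query_model(
            f"Revise the response to address this critique.\n"
            f"Critique: {critique}\nResponse: {response}"
        )
    return response  # revised outputs become fine-tuning data

print(constitutional_revision("Are you a real person?"))
```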
Reinforcement learning from human feedback (RLHF) is critical to aligning generative AI models and addressing their ethical implications. RLHF relies on large, diverse groups of people providing feedback on model outputs, which can reduce factual errors and tailor AI models to business needs. With humans in the feedback loop, human expertise and empathy guide the learning process for generative AI models, which can significantly improve overall performance. RLHF strengthens the AI training process and helps businesses build ethical generative AI models.
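One lightweight way a business can put a human-preference signal into production without retraining the base model is best-of-n sampling: generate several candidate responses and serve the one a preference-trained reward model scores highest. The sketch below uses hypothetical generate_candidates and reward_model stubs; it illustrates the pattern, not any vendor's API.

```python
# A minimal sketch of best-of-n sampling: a cheap way to apply a
# human-preference reward model at serving time without retraining.
# `generate_candidates` and `reward_model` are hypothetical stubs,
# not any vendor's API.
import random

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    # Stand-in for sampling n responses from a generative model.
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def reward_model(response: str) -> float:
    # Stand-in for a model trained on human preference labels.
    return random.random()

def best_of_n(prompt: str, n: int = 4) -> str:
    candidates = generate_candidates(prompt, n)
    # Serve the candidate that the preference model scores highest.
    return max(candidates, key=reward_model)

print(best_of_n("Summarize our refund policy for a customer."))
```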