Synthetic Data

All articles tagged with #synthetic data

Universities Warn of Ethical Risks in AI-Generated Medical Data

Originally Published 4 months ago — by Nature

Source: Nature

Some research institutions in Canada, the US, and Italy are using AI-generated synthetic medical data that mimics real patient information without including data from actual people. Because such data is treated as non-human and may offer privacy benefits, these institutions are bypassing traditional ethics review processes.

Study Warns AI Models May Secretly Share Harmful Behaviors

Originally Published 5 months ago — by Yahoo Home

Source: Yahoo Home

Research indicates that AI models can transmit hidden signals to one another through training data, potentially amplifying negative behaviors such as violence even when the data appears benign to humans. This phenomenon, called subliminal learning, poses significant risks for AI safety and for the use of synthetic data in training, because it may be impossible to fully prevent harmful patterns from transferring between models.

Microsoft Unveils Phi-4 AI Model in Research Preview

Originally Published 1 year ago — by TechCrunch

Source: TechCrunch

Microsoft has introduced Phi-4, the latest in its Phi series of generative AI models, available for limited research use on the Azure AI Foundry platform. This 14-billion-parameter model excels at math problem-solving thanks to improved training data quality, including high-quality synthetic datasets. Phi-4 competes with other small models like GPT-4o mini and Claude 3.5 Haiku, offering faster and cheaper performance. The launch follows the departure of key developer Sébastien Bubeck to OpenAI.

OpenAI Explores Solutions for AI Progress Challenges

Originally Published 1 year ago — by TechCrunch

Source: TechCrunch

OpenAI is reportedly facing a slowdown in the improvement of its AI models, with its upcoming model, codenamed Orion, showing a smaller improvement than previous iterations such as GPT-4 did. To address this, OpenAI has formed a foundations team to explore new strategies, including training on synthetic data and improving models after training. Despite these efforts, Orion may not outperform existing models in certain areas, such as coding. OpenAI has not confirmed plans to release Orion this year.

"Inflation Woes and Disappointing Earnings Lead to Major Stock Market Losses"

Originally Published 1 year ago — by Yahoo Finance

Source: Yahoo Finance

As earnings season approaches, skepticism about the returns on AI technologies is growing, with concerns about the immense costs and the limitations of relying on synthetic data to train AI models. Tech companies are investing heavily in hardware and infrastructure to reduce their dependence on outside suppliers of AI chips, but this spending, together with warnings about data and resource limits, is increasing the pressure on them to prove that their investments in an AI-led future are profitable.

"The Underground Race for AI Training Data: Tech Giants' Desperate Quest"

Originally Published 1 year ago — by The New York Times

Source: The New York Times

Tech companies like OpenAI and Google are exploring the use of synthetic data, generated by artificial intelligence, to train their AI models as they face copyright issues and potential data scarcity. However, the use of synthetic data is still experimental, as AI models can introduce biases and inaccuracies, potentially amplifying flaws in the training process.

"AI Giants Struggle with Data Depletion: The Quest for More Training Data"

Originally Published 1 year ago — by Futurism

Source: Futurism

AI companies are facing a shortage of training data as they continue to build larger models, leading them to explore alternative sources such as publicly available video transcripts and synthetic data. Some companies are considering controversial methods like training on transcriptions of public YouTube videos, while others are working on creating higher-quality synthetic data. Concerns about AI running out of data have been raised, but researchers believe that breakthroughs could address the issue. However, the solution may also involve reevaluating the pursuit of larger models due to environmental and resource concerns.

"DeepMind's AI Masters Olympiad Geometry Challenges"

Originally Published 2 years ago — by Nature.com

Source: Nature.com

Researchers have developed AlphaGeometry, a neuro-symbolic theorem prover that uses synthetic data to solve olympiad-level geometry problems. Trained on 100 million synthetic theorems and their proofs, AlphaGeometry outperforms previous state-of-the-art geometry theorem provers and approaches the performance of an average International Mathematical Olympiad (IMO) gold medallist. The method combines language modeling with specialized symbolic engines to produce human-readable proofs, solving 25 of 30 problems on a test set of classical geometry problems. The synthetic data generation process rediscovers known theorems and lemmas, demonstrating the potential of this approach in theorem proving.

Fake Data Crucial for Neural Network Learning

Originally Published 2 years ago — by Quanta Magazine

Source: Quanta Magazine

Researchers are increasingly turning to synthetic data to supplement or even replace natural data for training neural networks. Synthetic data is proving useful in facial recognition, where many systems are trained on huge libraries of images of real faces, raising concerns about privacy and bias. Microsoft has released a collection of 100,000 synthetic faces for training AI systems, generated from a set of 500 people who gave permission for their faces to be scanned. Because the faces are computer-generated, every part of every face can be labeled automatically, which helps the neural network learn faster.