Tag

AI Inference

All articles tagged with #ai inference

Maia 200 Pushes Cloud AI In-House, But Nvidia Keeps the Data Center Edge
technology · 1 month ago

Microsoft’s Maia 200 is an in-house AI inference accelerator for Azure that claims strong performance per dollar and will power OpenAI models, signaling rising cloud-provider pressure on Nvidia. Still, Nvidia leads the data-center AI market with its broad GPU ecosystem and software stack; custom cloud silicon may erode its pricing power over time, but a rapid disruption of its position appears unlikely, even as valuations remain rich amid AI-driven growth.

Nvidia's Strategic Partnership with Groq Boosts AI Chip Competition and Stock
business · 2 months ago

Nvidia's strategic licensing agreement with AI startup Groq, including key personnel hires, aims to strengthen its position in AI inference technology, signaling a shift from training to inference workloads and potentially expanding Nvidia's market dominance. The deal, which keeps Groq independent, is viewed positively by analysts as a move to address market share concerns and diversify Nvidia's AI offerings.

oLLM: Lightweight Python Library Enables 100K-Context LLMs on 8GB GPUs with SSD Offload
technology · 5 months ago

oLLM is a lightweight Python library that enables large-context LLM inference on consumer GPUs by offloading model weights and the KV cache to SSDs, preserving full precision without quantization. It supports models such as Qwen3-Next-80B, GPT-OSS-20B, and Llama-3, making it feasible to run 100K-token contexts on 8 GB GPUs for offline tasks, albeit at lower throughput and with substantial SSD storage requirements.
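oLLM's own API is not reproduced here, but the core idea behind this kind of offload — spilling each layer's KV cache to disk and memory-mapping it back on demand, so GPU memory only holds what the current attention step needs — can be sketched in plain NumPy. The `DiskKVCache` class and all its parameter names below are illustrative inventions for this sketch, not oLLM's interface:

```python
import os
import tempfile
import numpy as np

class DiskKVCache:
    """Illustrative sketch: per-layer KV cache backed by files on an SSD.

    Not oLLM's actual implementation -- just the general offload pattern:
    keys/values are written straight to disk as they are produced, and
    memory-mapped back in so the OS pages them from SSD lazily on read.
    """

    def __init__(self, cache_dir, n_layers, n_heads, head_dim, max_ctx,
                 dtype=np.float16):
        # Layout per layer: [K-or-V, seq position, head, head dim].
        self.shape = (2, max_ctx, n_heads, head_dim)
        self.dtype = dtype
        self.paths = [os.path.join(cache_dir, f"layer_{i}.kv")
                      for i in range(n_layers)]
        for p in self.paths:  # pre-allocate one file per layer on the SSD
            np.memmap(p, dtype=dtype, mode="w+", shape=self.shape).flush()

    def append(self, layer, pos, k, v):
        # Write the K/V vectors for one token position directly to disk.
        mm = np.memmap(self.paths[layer], dtype=self.dtype, mode="r+",
                       shape=self.shape)
        mm[0, pos], mm[1, pos] = k, v
        mm.flush()

    def read(self, layer, upto):
        # Map the prefix [0, upto) back in; pages load from SSD on access.
        mm = np.memmap(self.paths[layer], dtype=self.dtype, mode="r",
                       shape=self.shape)
        return mm[0, :upto], mm[1, :upto]

# Usage: cache one token's K/V for layer 0, then read it back.
with tempfile.TemporaryDirectory() as d:
    cache = DiskKVCache(d, n_layers=2, n_heads=4, head_dim=8, max_ctx=16)
    k = np.ones((4, 8), dtype=np.float16)
    v = np.zeros((4, 8), dtype=np.float16)
    cache.append(layer=0, pos=0, k=k, v=v)
    keys, vals = cache.read(layer=0, upto=1)
    print(keys.shape, vals.shape)  # (1, 4, 8) (1, 4, 8)
```

The trade-off the article mentions falls directly out of this pattern: every attention step turns into SSD reads, which is why throughput drops, and the on-disk cache files are why storage demands grow with context length.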

NVIDIA Launches Rubin CPX: A Next-Gen AI GPU for Video and Software Innovation
technology · 5 months ago

NVIDIA has announced the Rubin CPX, a GPU designed for massive-context AI applications such as long-format video and large-scale software coding. Integrated with the Vera Rubin platform, rack-scale systems are slated to deliver 8 exaflops of compute and 100 TB of fast memory, which NVIDIA pitches as a significant advance in AI productivity and monetization.

Groq's LPU: The Future Standard for AI Startups' Speedy Computation
technology · 2 years ago

Groq, a Silicon Valley-based company, is making waves in the AI chip race with its language processing units (LPUs), designed for AI language applications. CEO Jonathan Ross claims that most startups will be using Groq's LPUs by the end of 2024, citing their speed and cost-effectiveness for large language model (LLM) inference. Ross also argued that LPUs outperform Nvidia GPUs at LLM inference, delivering faster output while keeping chat queries private. Following a viral moment, the company has seen a surge in interest and plans to expand capacity and partner with national governments to help meet demand for AI chips.

Meta Unveils Next-Gen AI Chip and Datacenter Technologies
technology · 2 years ago

Meta Platforms, formerly known as Facebook, unveiled its homegrown AI inference and video encoding chips at its AI Infra @ Scale event. Because Meta controls its software stack from top to bottom, it can design hardware tailored precisely to those workloads. The Meta Training and Inference Accelerator (MTIA) inference engine is built around a dual-core RISC-V processing element, with supporting logic kept lean enough to fit a 25-watt chip delivered on a 35-watt dual M.2 peripheral card.