
AI Inference

All articles tagged with #ai inference

business · 18 days ago

Nvidia's Strategic Partnership with Groq Boosts AI Chip Competition and Stock

Nvidia's licensing agreement with AI chip startup Groq, which includes hiring key Groq personnel, aims to strengthen its position in AI inference, reflecting the industry's shift from training to inference workloads and potentially extending Nvidia's market dominance. The deal leaves Groq independent and is viewed positively by analysts as a move to address market-share concerns and diversify Nvidia's AI offerings.

technology · 3 months ago

oLLM: Lightweight Python Library Enables 100K-Context LLMs on 8GB GPUs with SSD Offload

oLLM is a lightweight Python library that enables large-context LLM inference on consumer GPUs by offloading model weights and the KV cache to SSD, preserving full precision rather than quantizing. It supports models such as Qwen3-Next-80B, GPT-OSS-20B, and Llama-3, making 100K-token contexts feasible on 8 GB GPUs for offline tasks, at the cost of lower throughput and substantial SSD storage requirements.
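The core mechanic is easy to sketch. Below is a minimal, illustrative Python example of SSD-backed KV-cache offload; the DiskKVCache class and every name in it are hypothetical and do not reflect oLLM's actual API:

```python
import os
import tempfile

import numpy as np

class DiskKVCache:
    """Toy KV cache that lives on SSD rather than in GPU/host memory.

    Each layer's keys and values sit in a memory-mapped .npy file
    (hypothetical design, not oLLM's), so only the token range needed
    for the current attention step is ever pulled back into RAM.
    """

    def __init__(self, n_layers, max_tokens, n_heads, head_dim, cache_dir=None):
        self.dir = cache_dir or tempfile.mkdtemp(prefix="kv_cache_")
        shape = (max_tokens, n_heads, head_dim)
        def open_mm(name):
            return np.lib.format.open_memmap(
                os.path.join(self.dir, name), mode="w+",
                dtype=np.float16, shape=shape)
        self.k = [open_mm(f"k{i}.npy") for i in range(n_layers)]
        self.v = [open_mm(f"v{i}.npy") for i in range(n_layers)]
        self.length = 0  # number of tokens cached so far

    def append(self, layer, k_new, v_new):
        # Write this step's keys/values straight through to the memmap (SSD).
        n = k_new.shape[0]
        self.k[layer][self.length:self.length + n] = k_new
        self.v[layer][self.length:self.length + n] = v_new

    def advance(self, n):
        # Call once per decode step, after all layers have appended.
        self.length += n

    def read(self, layer, start, stop):
        # Copy only the requested token range from SSD back into RAM.
        return (np.array(self.k[layer][start:stop]),
                np.array(self.v[layer][start:stop]))

# One decode step for a toy 2-layer, 8-head model.
cache = DiskKVCache(n_layers=2, max_tokens=100_000, n_heads=8, head_dim=64)
for layer in range(2):
    k = np.random.rand(1, 8, 64).astype(np.float16)
    v = np.random.rand(1, 8, 64).astype(np.float16)
    cache.append(layer, k, v)
cache.advance(1)
k_past, v_past = cache.read(0, 0, cache.length)  # real code would read in chunks
print(k_past.shape)  # (1, 8, 64)
```

Because only the slice needed for the current attention step is copied back into RAM, peak memory stays flat as the context grows; the trade-off is that decoding speed becomes bound by SSD read bandwidth, which matches the lower throughput the article describes.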

technology · 4 months ago

NVIDIA Launches Rubin CPX: A Next-Gen AI GPU for Video and Software Innovation

NVIDIA has announced the Rubin CPX, a GPU purpose-built for massive-context AI workloads such as long-form video and large-scale software coding. Integrated with the Vera Rubin platform, a full rack delivers 8 exaflops of compute and 100 TB of fast memory, which NVIDIA positions as a major step for AI productivity and monetization.

technology · 1 year ago

"Groq's LPU: The Future Standard for AI Startups' Speedy Computation"

Groq, a Silicon Valley-based company, is making waves in the AI chip race with its language processing units (LPUs), chips designed for AI language applications. CEO Jonathan Ross claims that most startups will be using Groq's LPUs by the end of 2024, citing their speed and cost-effectiveness for large language model (LLM) inference. Ross also highlighted advantages over Nvidia GPUs, emphasizing faster LLM output and the ability to keep chat queries private. Following a viral moment, the company has seen a surge in interest and plans to increase capacity and collaborate with countries to help meet demand for AI chips.

technology · 2 years ago

Meta Unveils Next-Gen AI Chip and Datacenter Technologies

Meta Platforms, formerly known as Facebook, has unveiled its homegrown AI inference and video encoding chips at its AI Infra @ Scale event. Because Meta controls its software stack from top to bottom, it can design hardware tailored precisely to the workloads that stack runs. The Meta Training and Inference Accelerator (MTIA) inference engine is based on a dual-core RISC-V processing element surrounded by supporting logic, kept lean enough to fit in a 25-watt chip mounted on a 35-watt dual M.2 peripheral card.