Nvidia's partnership with Groq, focused on inference technology, highlights how central efficient AI inference has become to scaling AI applications, and could give Nvidia an edge in the AI race by making large language models faster and cheaper to deploy.
Nvidia's strategic licensing agreement with AI startup Groq, including key personnel hires, aims to strengthen its position in AI inference technology, signaling a shift from training to inference workloads and potentially expanding Nvidia's market dominance. The deal, which keeps Groq independent, is viewed positively by analysts as a move to address market share concerns and diversify Nvidia's AI offerings.
Intel has announced the Crescent Island GPU for data centers, featuring the new Xe3P architecture and 160 GB of LPDDR5X memory and optimized for AI inference workloads, with customer sampling expected in the second half of 2026.
Intel announced Crescent Island, a next-generation inference-optimized enterprise GPU built on the Xe3P architecture with 160 GB of LPDDR5X memory, targeting AI inference workloads with a focus on power efficiency and cost; however, it won't be available for sampling until H2 2026 at the earliest, with broad shipping likely in 2027.
oLLM is a lightweight Python library that enables large-context LLM inference on consumer GPUs by offloading weights and KV-cache to SSDs, maintaining high precision without quantization, and supporting models such as Qwen3-Next-80B, GPT-OSS-20B, and Llama-3; this makes it feasible to run large models on 8 GB GPUs for offline tasks, though at the cost of lower throughput and substantial SSD storage requirements.
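The core idea, keeping weights on fast storage and streaming them onto the GPU one layer at a time, can be sketched in plain PyTorch. This is an illustrative toy, not oLLM's actual API; the function names and on-disk layout are assumptions made for the example.

```python
# Illustrative sketch only: NOT the oLLM API, just the general idea of streaming
# layer weights from SSD so only one layer occupies GPU memory at a time.
import torch

def save_layers(layers, prefix):
    """Persist each layer's weights to its own file on the SSD (hypothetical layout)."""
    for i, layer in enumerate(layers):
        torch.save(layer.state_dict(), f"{prefix}_layer{i}.pt")

def run_streamed(num_layers, prefix, x, device="cuda"):
    """Forward pass that loads one layer at a time from disk.

    Peak VRAM stays near a single layer's size, traded against SSD read
    latency on every layer.
    """
    for i in range(num_layers):
        layer = torch.nn.Linear(x.shape[-1], x.shape[-1])
        layer.load_state_dict(torch.load(f"{prefix}_layer{i}.pt"))
        layer.to(device)
        x = layer(x.to(device))
        del layer  # drop the GPU copy before loading the next layer
        if device == "cuda":
            torch.cuda.empty_cache()
    return x

if __name__ == "__main__":
    layers = [torch.nn.Linear(64, 64) for _ in range(4)]
    save_layers(layers, "/tmp/demo")
    dev = "cuda" if torch.cuda.is_available() else "cpu"
    out = run_streamed(4, "/tmp/demo", torch.randn(1, 64), device=dev)
    print(out.shape)
```

Real implementations add memory-mapped loading, prefetching of the next layer, and chunked KV-cache files, but the memory-for-latency trade-off is the same.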
NVIDIA has announced the Rubin CPX, a new GPU designed for massive-context AI applications such as long-format video and large-scale software coding; integrated into the Vera Rubin platform, a full rack delivers 8 exaflops of compute and 100 TB of fast memory, which NVIDIA positions as enabling significant advances in AI productivity and monetization.
Baseten, an AI inference infrastructure startup, raised $150 million at a $2.15 billion valuation, reflecting the rapid growth of AI deployment tooling and investors' view of inference as a key market in AI development.
The article introduces the analog optical computer (AOC), a novel hardware platform that combines optical and electronic components to efficiently perform AI inference and combinatorial optimization tasks through fixed-point iterative processes, promising significant improvements in speed and energy efficiency over digital systems.
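As a rough digital analogue of the fixed-point process the AOC carries out in analog hardware, the sketch below iterates a simple update x ← tanh(Wx + b) until the state stops changing; the update rule, tolerance, and random weights are illustrative assumptions, not the paper's actual model or physics.

```python
# Minimal digital sketch of a fixed-point iteration; the AOC realizes a loop of
# this general shape optically/electronically rather than in software.
import numpy as np

def fixed_point(W, b, x0, tol=1e-6, max_iters=1000):
    """Iterate x <- tanh(W @ x + b) until successive states differ by less than tol."""
    x = x0
    for i in range(max_iters):
        x_next = np.tanh(W @ x + b)
        if np.linalg.norm(x_next - x) < tol:
            return x_next, i + 1  # converged fixed point and iteration count
        x = x_next
    return x, max_iters

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 8
    # Small weight scale keeps the map contractive, so the iteration converges.
    W = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)
    b = rng.standard_normal(n)
    x_star, iters = fixed_point(W, b, np.zeros(n))
    print(f"converged in {iters} iterations")
```

The appeal of analog hardware here is that each such iteration is performed at the speed of light propagation and analog electronics rather than as a sequence of digital matrix multiplies.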
Groq, a Silicon Valley-based company, is making waves in the AI chip race with its language processing units (LPUs), designed for AI language applications. CEO Jonathan Ross claims that most startups will be using Groq's LPUs by the end of 2024 because of their fast, cost-effective performance for large language model (LLM) inference. Ross also highlighted the advantages of Groq's LPUs over Nvidia GPUs, emphasizing faster LLM output and the ability to keep chat queries private. The company has seen a surge in interest following a viral moment and is positioned to add to the supply of AI chips, with plans to increase capacity and work with national governments.
Meta Platforms, formerly known as Facebook, has unveiled its homegrown AI inference and video encoding chips at its AI Infra @ Scale event. Because Meta controls its software stack top to bottom, it can design whatever hardware best drives that stack. The Meta Training and Inference Accelerator (MTIA) AI inference engine is based on a dual-core RISC-V processing element, surrounded by enough supporting logic to be useful but little enough to fit within a 25-watt chip and a 35-watt dual M.2 peripheral card.