Tag

AI Inference

All articles tagged with #ai inference

Maia 200 Pushes Cloud AI In-House, But Nvidia Keeps the Data Center Edge
technology · 1 month ago

Microsoft’s Maia 200 is an in-house AI inference accelerator for Azure that claims strong performance per dollar and will power OpenAI models, signaling rising cloud-provider pressure on Nvidia. Still, Nvidia leads the data-center AI market with its broad GPU ecosystem and software stack; custom cloud silicon may erode its pricing power over time, but a rapid disruption of its position appears unlikely, even as valuations remain rich amid AI-driven growth.

Nvidia's Strategic Partnership with Groq Boosts AI Chip Competition and Stock
business · 2 months ago

Nvidia's strategic licensing agreement with AI startup Groq, including key personnel hires, aims to strengthen its position in AI inference technology, signaling a shift from training to inference workloads and potentially expanding Nvidia's market dominance. The deal, which keeps Groq independent, is viewed positively by analysts as a move to address market share concerns and diversify Nvidia's AI offerings.

oLLM: Lightweight Python Library Enables 100K-Context LLMs on 8GB GPUs with SSD Offload
technology · 5 months ago

oLLM is a lightweight Python library that enables large-context LLM inference on consumer GPUs by offloading model weights and the KV cache to SSDs, preserving full precision without quantization. It supports models such as Qwen3-Next-80B, GPT-OSS-20B, and Llama-3, making it feasible to run 100K-token contexts on 8 GB GPUs for offline tasks, albeit at lower throughput and with substantial SSD storage requirements.
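oLLM's own API is not reproduced here, but the core idea behind this kind of offload — spilling each layer's KV cache to disk and memory-mapping it back on demand, so GPU memory only holds what the current attention step needs — can be sketched in plain NumPy. The `DiskKVCache` class and all its parameter names below are illustrative inventions for this sketch, not oLLM's interface:

```python
import os
import tempfile
import numpy as np

class DiskKVCache:
    """Illustrative sketch: per-layer KV cache backed by files on an SSD.

    Not oLLM's actual implementation -- just the general offload pattern:
    keys/values are written straight to disk as they are produced, and
    memory-mapped back in so the OS pages them from SSD lazily on read.
    """

    def __init__(self, cache_dir, n_layers, n_heads, head_dim, max_ctx,
                 dtype=np.float16):
        # Layout per layer: [K-or-V, seq position, head, head dim].
        self.shape = (2, max_ctx, n_heads, head_dim)
        self.dtype = dtype
        self.paths = [os.path.join(cache_dir, f"layer_{i}.kv")
                      for i in range(n_layers)]
        for p in self.paths:  # pre-allocate one file per layer on the SSD
            np.memmap(p, dtype=dtype, mode="w+", shape=self.shape).flush()

    def append(self, layer, pos, k, v):
        # Write the K/V vectors for one token position directly to disk.
        mm = np.memmap(self.paths[layer], dtype=self.dtype, mode="r+",
                       shape=self.shape)
        mm[0, pos], mm[1, pos] = k, v
        mm.flush()

    def read(self, layer, upto):
        # Map the prefix [0, upto) back in; pages load from SSD on access.
        mm = np.memmap(self.paths[layer], dtype=self.dtype, mode="r",
                       shape=self.shape)
        return mm[0, :upto], mm[1, :upto]

# Usage: cache one token's K/V for layer 0, then read it back.
with tempfile.TemporaryDirectory() as d:
    cache = DiskKVCache(d, n_layers=2, n_heads=4, head_dim=8, max_ctx=16)
    k = np.ones((4, 8), dtype=np.float16)
    v = np.zeros((4, 8), dtype=np.float16)
    cache.append(layer=0, pos=0, k=k, v=v)
    keys, vals = cache.read(layer=0, upto=1)
    print(keys.shape, vals.shape)  # (1, 4, 8) (1, 4, 8)
```

The trade-off the article mentions falls directly out of this pattern: every attention step turns into SSD reads, which is why throughput drops, and the on-disk cache files are why storage demands grow with context length.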

NVIDIA Launches Rubin CPX: A Next-Gen AI GPU for Video and Software Innovation
technology · 5 months ago

NVIDIA has announced the Rubin CPX, a GPU designed for massive-context AI applications such as long-format video and large-scale software coding. Integrated with the Vera Rubin platform, rack-scale systems are slated to deliver 8 exaflops of compute and 100 TB of fast memory, which NVIDIA pitches as a significant advance in AI productivity and monetization.

Groq's LPU: The Future Standard for AI Startups' Speedy Computation
technology · 2 years ago

Groq, a Silicon Valley-based company, is making waves in the AI chip race with its language processing units (LPUs), designed for AI language applications. CEO Jonathan Ross claims that most startups will be using Groq's LPUs by the end of 2024, citing their speed and cost-effectiveness for large language model (LLM) inference. Ross also argued that LPUs outperform Nvidia GPUs at LLM inference, delivering faster output while keeping chat queries private. Following a viral moment, the company has seen a surge in interest and plans to expand capacity and partner with national governments to help meet demand for AI chips.

Meta Unveils Next-Gen AI Chip and Datacenter Technologies
technology · 2 years ago

Meta Platforms, formerly known as Facebook, unveiled its homegrown AI inference and video encoding chips at its AI Infra @ Scale event. Because Meta controls its software stack from top to bottom, it can design hardware tailored precisely to those workloads. The Meta Training and Inference Accelerator (MTIA) inference engine is built around a dual-core RISC-V processing element, with supporting logic kept lean enough to fit a 25-watt chip delivered on a 35-watt dual M.2 peripheral card.