Tag: oLLM

All articles tagged with #ollm

technology · 3 months ago

oLLM: Lightweight Python Library Enables 100K-Context LLMs on 8GB GPUs with SSD Offload

oLLM is a lightweight Python library that enables large-context LLM inference on consumer GPUs by offloading model weights and the KV cache to SSD. It maintains high precision without quantization and supports models such as Qwen3-Next-80B, GPT-OSS-20B, and Llama-3, making it feasible to run 100K-token contexts on an 8 GB GPU for offline workloads. The trade-offs are lower throughput and substantial SSD storage requirements.
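The core idea is simple enough to sketch. The snippet below is a conceptual illustration, not oLLM's actual API: the file names, directory path, and sizes are made up for the example. It streams one layer's weights from SSD at a time and writes intermediate cache tensors back to disk, which is the basic pattern that keeps GPU memory usage flat regardless of model depth.

```python
# A minimal conceptual sketch of SSD offloading (NOT oLLM's actual API):
# stream one transformer layer's weights from disk at a time and spill
# intermediate KV-cache tensors back to SSD, so the GPU only ever holds
# a single layer plus the current activations.
import os
import torch

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
N_LAYERS = 4            # toy depth; an 80B model has on the order of 80 layers
D_MODEL = 256           # toy hidden size
SSD_DIR = "./offload"   # hypothetical SSD-backed scratch directory
os.makedirs(SSD_DIR, exist_ok=True)

# Stand-in for pre-sharded checkpoint files already sitting on the SSD.
for i in range(N_LAYERS):
    torch.save(torch.randn(D_MODEL, D_MODEL), f"{SSD_DIR}/layer_{i}.pt")

def run_layer(i: int, x: torch.Tensor) -> torch.Tensor:
    """Load one layer from SSD, apply it, spill its cache, free GPU memory."""
    w = torch.load(f"{SSD_DIR}/layer_{i}.pt", map_location=DEVICE)
    out = x @ w                                    # stand-in for attention + MLP
    torch.save(out.cpu(), f"{SSD_DIR}/kv_{i}.pt")  # KV cache goes to SSD, not VRAM
    del w                                          # weights leave GPU memory at once
    if DEVICE == "cuda":
        torch.cuda.empty_cache()
    return out

x = torch.randn(1, D_MODEL, device=DEVICE)
for i in range(N_LAYERS):
    x = run_layer(i, x)
print("final activation norm:", x.norm().item())
```

The cost is visible in the loop: every layer incurs an SSD read for weights and a write for the cache, which is why lower throughput and heavy storage use are the main trade-offs noted above.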