
OpenScholar: An open, citation-aware AI for synthesizing scientific literature
OpenScholar is a fully open, retrieval-augmented language-model pipeline for synthesizing scientific literature, paired with an up-to-date data store (OSDS) of 45 million papers. It combines a bi-encoder retriever, a cross-encoder reranker, and a self-feedback loop with citation verification to generate citation-backed long-form answers. On ScholarQABench, which spans computer science, physics, biomedicine and neuroscience, OpenScholar-8B and OpenScholar-GPT-4o consistently outperform baselines (including GPT-4o) on correctness, coverage and citation accuracy, often matching or surpassing expert responses, at lower cost and with full open-source access, including a public demo.
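To make the flow of the pipeline concrete, here is a minimal Python sketch of the four stages named above: retrieve, rerank, refine through self-feedback, and verify citations. It is not OpenScholar's code. Every function name, the three-passage corpus, and the scoring rules are assumptions made for illustration: bag-of-words cosine similarity stands in for the bi-encoder, query-token overlap stands in for the cross-encoder, and the "generator" simply restates each passage with a citation marker.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the paper data store.
CORPUS = [
    {"id": "P1", "text": "Dense retrieval encodes queries and passages separately."},
    {"id": "P2", "text": "Cross-encoder rerankers score a query and passage jointly."},
    {"id": "P3", "text": "Protein folding depends on amino acid sequence."},
]

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def embed(text):
    # Stand-in for a bi-encoder embedding: a bag-of-words count vector.
    return Counter(tokenize(text))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, corpus, k):
    # Bi-encoder style: query and passages are embedded independently.
    q = embed(query)
    scored = [(cosine(q, embed(p["text"])), p) for p in corpus]
    ranked = sorted((sp for sp in scored if sp[0] > 0),
                    key=lambda sp: sp[0], reverse=True)
    return [p for _, p in ranked[:k]]

def rerank(query, candidates, k):
    # Cross-encoder style: each (query, passage) pair is scored jointly.
    # Toy score: fraction of query tokens that appear in the passage.
    q_tokens = set(tokenize(query))
    def joint_score(p):
        return len(q_tokens & set(tokenize(p["text"]))) / max(len(q_tokens), 1)
    return sorted(candidates, key=joint_score, reverse=True)[:k]

def draft_answer(passages):
    # Generator stand-in: one cited statement per passage.
    return [{"claim": p["text"], "cite": p["id"]} for p in passages]

def self_feedback(query, statements):
    # Toy feedback: query terms the current draft does not cover.
    covered = set()
    for s in statements:
        covered |= set(tokenize(s["claim"]))
    return [t for t in tokenize(query) if t not in covered]

def verify_citations(statements, passages):
    # Keep a statement only if its cited passage exists and supports it.
    by_id = {p["id"]: p for p in passages}
    kept = []
    for s in statements:
        source = by_id.get(s["cite"])
        if source and cosine(embed(s["claim"]), embed(source["text"])) > 0.5:
            kept.append(s)
    return kept

def answer(query, corpus, k_retrieve=3, k_rerank=2, max_rounds=2):
    passages = rerank(query, retrieve(query, corpus, k_retrieve), k_rerank)
    statements = draft_answer(passages)
    for _ in range(max_rounds):
        missing = self_feedback(query, statements)
        if not missing:
            break
        # Feedback triggers an extra retrieval for the uncovered terms.
        extra = [p for p in retrieve(" ".join(missing), corpus, 1)
                 if p not in passages]
        if not extra:
            break
        passages = passages + extra
        statements = draft_answer(passages)
    return verify_citations(statements, passages)

result = answer("How do rerankers score a passage?", CORPUS)
print([s["cite"] for s in result])  # ['P2']
```

In a real system each stand-in would be replaced by a model call, but the control flow is the point here: a cheap first-stage retrieval narrows the corpus, a costlier joint scorer reorders the short list, feedback on the draft can trigger further retrieval, and a final pass drops any statement its cited passage does not support.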
