Learn about using LLMs locally on PCs and workstations with Ollama, AnythingLLM, and LM Studio.
Many users want to run large language models (LLMs) locally for more privacy, control, and without subscriptions, but until recently, this meant a trade-off in output quality. Newly released open-weight models, like OpenAI’s gpt-oss and Alibaba’s Qwen 3, can run directly on PCs, delivering useful high-quality outputs, especially for local Agentic AI.
This opens up new opportunities for students, hobbyists and developers to explore generative AI applications locally. NVIDIA RTX PCs and NVIDIA RTX PRO workstations accelerate these experiences, delivering fast and snappy AI to users.
Getting Started With Local LLMs Optimized for RTX PCs and Workstations
NVIDIA has worked to optimize top LLM applications for RTX PCs, extracting maximum performance of Tensor Cores in RTX GPUs.
One of the easiest ways to get started with AI on a PC is with Ollama, an open-source tool that provides a simple interface for running and interacting with LLMs. It supports the ability to drag-and-drop PDFs into prompts, conversational chat and multimodal understanding workflows that include text and images.
NVIDIA has collaborated with Ollama to improve its performance and user experience on GeForce RTX GPUs and RTX PRO GPUs. The most recent developments include:
Ollama is a developer framework that can be used with other applications. For example, AnythingLLM — an open-source app that lets users build their own AI assistants powered by any LLM — can run on top of Ollama and benefit from all of its accelerations.
Enthusiasts can also get started with local LLMs using LM Studio, an app powered by the popular llama.cpp framework. The app provides a user-friendly interface for running models locally, letting users load different LLMs, chat with them in real time and even serve them as local application programming interface (API) endpoints for integration into custom projects.
NVIDIA has worked with llama.cpp to optimize performance on NVIDIA RTX GPUs. The latest updates include:
Learn more about gpt-oss on RTX and how NVIDIA has worked with LM Studio to accelerate LLM performance on RTX PCs.
Creating an AI-Powered Study Buddy With AnythingLLM
In addition to greater privacy and performance, running LLMs locally removes restrictions on how many files can be loaded or how long they stay available, enabling context-aware AI conversations for a longer period of time. This creates more flexibility for building conversational and generative AI-powered assistants.
For students, managing a flood of slides, notes, labs and past exams can be overwhelming. Local LLMs make it possible to create a personal tutor that can adapt to individual learning needs.
A simple way to do this is with AnythingLLM, an application that helps users to build custom AI chatbots and agents by connecting them to their documents and data. It supports document uploads, custom knowledge bases and conversational interfaces. This makes it a flexible tool for anyone who wants to create a customizable AI to help with research, projects or day-to-day tasks. And with RTX acceleration, users can experience even faster responses.
By loading syllabi, assignments and textbooks into AnythingLLM on RTX PCs and RTX PRO workstations, students can gain an adaptive, interactive study companion. They can ask the agent, using plain text or speech, to help with tasks like:
Beyond the classroom, hobbyists and professionals can use AnythingLLM to prepare for certifications in new fields of study or for other similar purposes. And running locally on RTX GPUs ensures fast, private responses with no subscription costs or usage limits.