Skip to main content
NVIDIA
Explore
Models
Skills
Blueprints
GPUs
Docs
⌘KCtrl+K
View All Playbooks
View All Playbooks

llms chat

  • How to Get Started With Large Language Models on NVIDIA RTX PCs

visual gen

  • How to Get Started With Visual Generative AI on NVIDIA RTX PCs

How to Get Started With Large Language Models on NVIDIA RTX PCs

11 MIN

Learn about using LLMs locally on PCs and workstations with Ollama, AnythingLLM, and LM Studio.

AnythingLLMLLMsLM StudioOllamaRTX
Check out our blog here!
OverviewOverview

Many users want to run large language models (LLMs) locally for more privacy, control, and without subscriptions, but until recently, this meant a trade-off in output quality. Newly released open-weight models, like OpenAI’s gpt-oss and Alibaba’s Qwen 3, can run directly on PCs, delivering useful high-quality outputs, especially for local Agentic AI.

This opens up new opportunities for students, hobbyists and developers to explore generative AI applications locally. NVIDIA RTX PCs and NVIDIA RTX PRO workstations accelerate these experiences, delivering fast and snappy AI to users.

Getting Started With Local LLMs Optimized for RTX PCs and Workstations

NVIDIA has worked to optimize top LLM applications for RTX PCs, extracting maximum performance of Tensor Cores in RTX GPUs.

One of the easiest ways to get started with AI on a PC is with Ollama, an open-source tool that provides a simple interface for running and interacting with LLMs. It supports the ability to drag-and-drop PDFs into prompts, conversational chat and multimodal understanding workflows that include text and images.

NVIDIA has collaborated with Ollama to improve its performance and user experience on GeForce RTX GPUs and RTX PRO GPUs. The most recent developments include:

  • 50% performance improvements on OpenAI’s gpt-oss-20B model
  • 60% performance improvements on the new Gemma 3 270M and EmbeddingGemma models for hyper-efficient RAG
  • Improved model scheduling system to maximize and accurately report memory utilization
  • Stability enhancements to reduce the number of crashes

Ollama is a developer framework that can be used with other applications. For example, AnythingLLM — an open-source app that lets users build their own AI assistants powered by any LLM — can run on top of Ollama and benefit from all of its accelerations.

Enthusiasts can also get started with local LLMs using LM Studio, an app powered by the popular llama.cpp framework. The app provides a user-friendly interface for running models locally, letting users load different LLMs, chat with them in real time and even serve them as local application programming interface (API) endpoints for integration into custom projects.

NVIDIA has worked with llama.cpp to optimize performance on NVIDIA RTX GPUs. The latest updates include:

  • Support for the latest NVIDIA Nemotron Nano v2 9B model, which is based on the novel hybrid-mamba architecture
  • Flash Attention now turned on by default, offering up to 20% performance improvement compared to Flash Attention being turned off
  • CUDA kernel optimizations for RMS Norm and fast-div-based modulo, resulting in up to 9% performance improvements for popular models
  • Semantic versioning, making it easy for developers to adopt future releases

Learn more about gpt-oss on RTX and how NVIDIA has worked with LM Studio to accelerate LLM performance on RTX PCs.

Creating an AI-Powered Study Buddy With AnythingLLM

In addition to greater privacy and performance, running LLMs locally removes restrictions on how many files can be loaded or how long they stay available, enabling context-aware AI conversations for a longer period of time. This creates more flexibility for building conversational and generative AI-powered assistants.

For students, managing a flood of slides, notes, labs and past exams can be overwhelming. Local LLMs make it possible to create a personal tutor that can adapt to individual learning needs.

A simple way to do this is with AnythingLLM, an application that helps users to build custom AI chatbots and agents by connecting them to their documents and data. It supports document uploads, custom knowledge bases and conversational interfaces. This makes it a flexible tool for anyone who wants to create a customizable AI to help with research, projects or day-to-day tasks. And with RTX acceleration, users can experience even faster responses.

By loading syllabi, assignments and textbooks into AnythingLLM on RTX PCs and RTX PRO workstations, students can gain an adaptive, interactive study companion. They can ask the agent, using plain text or speech, to help with tasks like:

  • Generating flashcards from lecture slides: “Create flashcards from the Sound chapter lecture slides. Put key terms on one side and definitions on the other.”
  • Asking contextual questions tied to their materials: “Explain conservation of momentum using my Physics 8 notes.”
  • Creating and grading quizzes for exam prep: “Create a 10-question multiple-choice quiz based on chapters 5-6 of my chemistry textbook and grade my answers.”
  • Walking through tough problems step by step: “Show me how to solve problem 4 from my coding homework, step by step.”

Beyond the classroom, hobbyists and professionals can use AnythingLLM to prepare for certifications in new fields of study or for other similar purposes. And running locally on RTX GPUs ensures fast, private responses with no subscription costs or usage limits.

Resources

  • Ollama
  • AnythingLLM
  • LM Studio
Terms of Use
Privacy Policy
Your Privacy Choices
Contact

Copyright © 2026 NVIDIA Corporation