Text to Knowledge Graph on DGX Station
30 MIN
Transform unstructured text into interactive knowledge graphs with LLM inference and graph visualization
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Ollama performance issues | Suboptimal settings for GB300 | Set environment variables:<br>OLLAMA_FLASH_ATTENTION=1 (enables flash attention for better performance)<br>OLLAMA_KEEP_ALIVE=30m (keeps model loaded for 30 minutes)<br>OLLAMA_MAX_LOADED_MODELS=1 (avoids VRAM contention)<br>OLLAMA_KV_CACHE_TYPE=q8_0 (reduces KV cache VRAM with minimal performance impact) |
| VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | GPU memory fragmentation | Clear GPU memory: nvidia-smi --gpu-reset or restart Docker containers |
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait 30s after start.sh, verify with docker ps |
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Run nvidia-ctk runtime configure --runtime=docker and restart Docker |
| Port already in use | Previous instance still running | Run ./stop.sh first or use docker compose down |
| Stack starts with vLLM but Ollama is wanted | vLLM is the default inference backend | Start with ./start.sh --ollama |
| vLLM takes long to become ready | Model load can take 30+ minutes | The start script waits and shows elapsed time. The UI shows a banner and "vLLM (Local) – Initializing…" until ready. Check progress: docker logs vllm-service -f. |
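The Ollama tuning variables from the first row can be exported before the server starts; a minimal sketch (it assumes Ollama is launched directly on the host rather than through the project's compose files, where the same variables would go in the service's environment section):

```shell
# Tuning flags for Ollama on GB300, taken from the troubleshooting table above
export OLLAMA_FLASH_ATTENTION=1      # enable flash attention
export OLLAMA_KEEP_ALIVE=30m         # keep the model resident for 30 minutes
export OLLAMA_MAX_LOADED_MODELS=1    # avoid VRAM contention between models
export OLLAMA_KV_CACHE_TYPE=q8_0    # quantize the KV cache to q8_0

# If Ollama runs in a container instead, pass the same settings with -e, e.g.:
# docker run -e OLLAMA_FLASH_ATTENTION=1 -e OLLAMA_KV_CACHE_TYPE=q8_0 ... ollama/ollama
```

The exports must be set in the environment of the `ollama serve` process itself; setting them in a client shell has no effect on an already-running server.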
NOTE
DGX Station with GB300 Ultra provides massive GPU memory capacity, enabling you to run larger models (70B+) for higher-quality knowledge extraction. If you encounter memory issues with very large models, try reducing the context window size or using quantized model variants.
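Reducing the context window, as the note suggests, can be done with an Ollama Modelfile; a minimal sketch (the model name and the 8192-token window are illustrative, not recommendations for this workload):

```shell
# Define a variant of a large model with a smaller context window (num_ctx).
# Base model and window size are example values.
cat > Modelfile <<'EOF'
FROM llama3.3:70b
PARAMETER num_ctx 8192
EOF

# Build the variant (requires a running Ollama server):
# ollama create llama3.3-8k -f Modelfile
```

Quantized variants are typically pulled directly by tag instead (for example a q4 tag of the same model), so a Modelfile is only needed when adjusting parameters like the context window.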