Text to Knowledge Graph on DGX Station
30 MIN
Transform unstructured text into interactive knowledge graphs with LLM inference and graph visualization
Common issues
| Symptom | Cause | Fix |
|---|---|---|
| Ollama performance issues | Suboptimal settings for GB300 | Set environment variables:<br>OLLAMA_FLASH_ATTENTION=1 (enables flash attention for better performance)<br>OLLAMA_KEEP_ALIVE=30m (keeps model loaded for 30 minutes)<br>OLLAMA_MAX_LOADED_MODELS=1 (avoids VRAM contention)<br>OLLAMA_KV_CACHE_TYPE=q8_0 (reduces KV cache VRAM with minimal performance impact) |
| VRAM exhausted or memory pressure (e.g. when switching between Ollama models) | GPU memory fragmentation | Clear GPU memory: nvidia-smi --gpu-reset or restart Docker containers |
| Slow triple extraction | Large model or large context window | Reduce document chunk size or use faster models |
| ArangoDB connection refused | Service not fully started | Wait 30s after start.sh, verify with docker ps |
| Container fails to start with GPU error | NVIDIA Container Toolkit not configured | Run nvidia-ctk runtime configure --runtime=docker and restart Docker |
| Port already in use | Previous instance still running | Run ./stop.sh first or use docker compose down |
| Stack starts with vLLM but Ollama is wanted | vLLM is the default inference backend | Start with ./start.sh --ollama |
| vLLM takes long to become ready | Model load can take 30+ minutes | The start script waits and shows elapsed time. The UI shows a banner and "vLLM (Local) – Initializing…" until ready. Check progress: docker logs vllm-service -f. |
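The Ollama tuning variables from the first row can be exported before the server starts; a minimal sketch (it assumes Ollama is launched directly on the host rather than through the project's compose files, where the same variables would go in the service's environment section):

```shell
# Tuning flags for Ollama on GB300, taken from the troubleshooting table above
export OLLAMA_FLASH_ATTENTION=1      # enable flash attention
export OLLAMA_KEEP_ALIVE=30m         # keep the model resident for 30 minutes
export OLLAMA_MAX_LOADED_MODELS=1    # avoid VRAM contention between models
export OLLAMA_KV_CACHE_TYPE=q8_0    # quantize the KV cache to q8_0

# If Ollama runs in a container instead, pass the same settings with -e, e.g.:
# docker run -e OLLAMA_FLASH_ATTENTION=1 -e OLLAMA_KV_CACHE_TYPE=q8_0 ... ollama/ollama
```

The exports must be set in the environment of the `ollama serve` process itself; setting them in a client shell has no effect on an already-running server.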
NOTE
DGX Station with GB300 Ultra provides massive GPU memory capacity, enabling you to run larger models (70B+) for higher-quality knowledge extraction. If you encounter memory issues with very large models, try reducing the context window size or using quantized model variants.
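Reducing the context window, as the note suggests, can be done with an Ollama Modelfile; a minimal sketch (the model name and the 8192-token window are illustrative, not recommendations for this workload):

```shell
# Define a variant of a large model with a smaller context window (num_ctx).
# Base model and window size are example values.
cat > Modelfile <<'EOF'
FROM llama3.3:70b
PARAMETER num_ctx 8192
EOF

# Build the variant (requires a running Ollama server):
# ollama create llama3.3-8k -f Modelfile
```

Quantized variants are typically pulled directly by tag instead (for example a q4 tag of the same model), so a Modelfile is only needed when adjusting parameters like the context window.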