The NVIDIA RTX 4090 cannot run Llama 3 70B (70.00B parameters) in this configuration. The model requires 35.0GB of VRAM but only 24.0GB is available, leaving you 11.0GB short.
Consider using a more aggressive quantization (e.g. Q4_K_M or Q3_K_M) to reduce the VRAM requirement, or upgrading to a GPU with more VRAM. Cloud GPU services such as RunPod or Vast.ai offer affordable options.
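
As a rough illustration of where these figures come from, the sketch below estimates weight-only VRAM from parameter count and bits per weight and compares it to the 24.0GB on an RTX 4090. It is not the calculator's actual formula: the function name `estimate_weight_vram_gb` and the bits-per-weight values are illustrative assumptions, and the estimate ignores KV cache, activations, and runtime overhead, which add several more GB in practice.

```python
def estimate_weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight-only VRAM estimate: parameter count times bits per weight, in GB.

    Ignores KV cache, activations, and runtime overhead, which add more on top.
    """
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 70e9       # Llama 3 70B
AVAILABLE_GB = 24.0   # RTX 4090

# Illustrative bits-per-weight values only; real quantized file sizes
# (e.g. GGUF Q4_K_M or Q3_K_M) vary with the exact tensor mix.
for label, bpw in [("~4.0 bits/weight", 4.0),
                   ("~3.5 bits/weight", 3.5),
                   ("~3.0 bits/weight", 3.0)]:
    required = estimate_weight_vram_gb(N_PARAMS, bpw)
    gap = required - AVAILABLE_GB
    verdict = f"{gap:.1f}GB short" if gap > 0 else "fits"
    print(f"{label}: {required:.1f}GB required vs {AVAILABLE_GB:.1f}GB available ({verdict})")
```

At ~4.0 bits/weight this reproduces the 35.0GB requirement and 11.0GB shortfall quoted above, and lowering the bits per weight shows how more aggressive quantization shrinks the footprint.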