Can I run Llama 3.1 405B (Q4_K_M, GGUF 4-bit) on an NVIDIA A100 80GB?

Fail/OOM
This GPU doesn't have enough VRAM
GPU VRAM: 80.0GB
Required: 202.5GB
Headroom: -122.5GB

VRAM Usage: 80.0GB of 80.0GB used (100%)

Technical Analysis

An NVIDIA A100 80GB cannot run Llama 3.1 405B at Q4_K_M quantization. The model weights alone require roughly 202.5GB of VRAM, but only 80.0GB is available, a shortfall of 122.5GB.
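For context, the estimate above follows the standard weights-only formula: parameter count times bits per weight, divided by 8. Here is a minimal sketch, assuming a flat 4.0 bits per weight to reproduce the 202.5GB figure; real Q4_K_M GGUF files average closer to ~4.8 bits per weight, and inference adds KV-cache and activation overhead on top of the weights.

```python
# Rough weights-only VRAM estimate: params x bits-per-weight / 8.
# Assumption: 4.0 bits/weight reproduces the 202.5GB figure above;
# actual Q4_K_M files average ~4.8 bits/weight, and serving adds
# KV-cache and activation memory on top.

def estimate_weight_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Return estimated weight memory in decimal gigabytes."""
    # 1e9 params * (bits / 8) bytes = that many decimal GB.
    return params_billion * bits_per_weight / 8

required = estimate_weight_vram_gb(405, 4.0)  # 202.5
headroom = 80.0 - required                    # -122.5
print(f"Required: {required:.1f}GB, Headroom: {headroom:.1f}GB")
```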

Recommendation

This configuration is already at Q4_K_M, so quantizing further (Q3_K_M, Q2_K) shrinks the footprint but still will not fit in 80GB. Realistic options are a multi-GPU setup (e.g. 3-4x A100 80GB with tensor parallelism), partial CPU offloading at a significant speed cost, or renting a larger instance from a cloud GPU service such as RunPod or Vast.ai.
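To see why quantization alone cannot close the gap, here is a rough comparison using approximate community bits-per-weight averages for common GGUF levels; the averages below are assumptions, not measured file sizes.

```python
# Approximate average bits/weight for common GGUF quant levels.
# These are rough community figures, not exact file sizes.
QUANT_BPW = {"Q8_0": 8.5, "Q5_K_M": 5.7, "Q4_K_M": 4.8, "Q3_K_M": 3.9, "Q2_K": 2.6}

PARAMS_B = 405    # Llama 3.1 405B
VRAM_GB = 80.0    # single A100 80GB

for quant, bpw in QUANT_BPW.items():
    gb = PARAMS_B * bpw / 8
    verdict = "fits" if gb <= VRAM_GB else "does not fit"
    print(f"{quant}: ~{gb:.0f}GB -> {verdict} on {VRAM_GB:.0f}GB")

# Even Q2_K (~132GB) exceeds a single 80GB card, so multi-GPU
# or CPU offloading is required for this model.
```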

Recommended Settings

Batch Size: N/A (model does not fit in VRAM)
Context Length: N/A (model does not fit in VRAM)
Inference Framework: llama.cpp or vLLM
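For reference, a hypothetical loading sketch using llama-cpp-python (the Python bindings for llama.cpp) on a machine that does have enough combined GPU memory; the model filename and settings below are placeholders, not tested values.

```python
# Hypothetical sketch via llama-cpp-python; the model path is a
# placeholder and the settings assume sufficient GPU memory.
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-405B-Q4_K_M.gguf",  # placeholder filename
    n_gpu_layers=-1,  # offload all layers to GPU(s); lower this to
                      # spill layers to system RAM on smaller setups
    n_ctx=8192,       # context window; larger values grow the KV cache
)

out = llm("Briefly explain tensor parallelism.", max_tokens=128)
print(out["choices"][0]["text"])
```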

Frequently Asked Questions

Can I run Llama 3.1 405B on an NVIDIA A100 80GB?
No. An NVIDIA A100 80GB (80.0GB VRAM) cannot run Llama 3.1 405B at Q4_K_M, which requires roughly 202.5GB, a 122.5GB shortfall. Since the model is already at Q4_K_M, even lower quantizations (Q3_K_M, Q2_K) still exceed 80GB; a multi-GPU setup or a cloud instance with more VRAM is needed.
How much VRAM does Llama 3.1 405B need?
At Q4_K_M quantization, Llama 3.1 405B requires approximately 202.5GB of VRAM for the weights alone; the KV cache and activations add more on top of that.
What performance can I expect?
No throughput estimate is available: the model does not fit in this GPU's VRAM, so it cannot run in this configuration.