Can I run LLaVA 1.6 34B on NVIDIA Jetson Orin Nano 8GB?

Result: Fail (out of memory)
This GPU does not have enough VRAM.

GPU VRAM: 8.0 GB
Required: 68.0 GB
Headroom: -60.0 GB

VRAM Usage: 100% (8.0 GB of 8.0 GB)

Technical Analysis

The primary limiting factor in running large multimodal models like LLaVA 1.6 34B is memory. In FP16 precision, the model requires approximately 68 GB just to hold its weights, before accounting for the KV cache and activations. The NVIDIA Jetson Orin Nano 8GB provides 8 GB of LPDDR5 that is shared between the CPU and GPU, leaving a deficit of roughly 60 GB: the model cannot be loaded, and attempting to run it directly will fail with an out-of-memory error. The module's memory bandwidth of about 0.07 TB/s (68 GB/s), while adequate for smaller models, would also become a severe bottleneck if any form of swapping were attempted. Finally, the Ampere architecture's CUDA and Tensor cores, while capable, cannot overcome the fundamental shortage of memory.
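The 68 GB figure follows directly from the parameter count. A minimal sketch of the arithmetic, assuming ~34 billion parameters for the 34B model and 2 bytes per parameter in FP16 (weights only; KV cache and activations add more):

```python
# Back-of-the-envelope memory estimate for loading model weights.
# Assumptions (not from the page): ~34e9 parameters, 2 bytes/param in FP16.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory needed just for the weights, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

fp16_gb = weight_memory_gb(34e9, 2.0)   # FP16: 2 bytes per weight
headroom_gb = 8.0 - fp16_gb             # Jetson Orin Nano: 8 GB total

print(f"FP16 weights: {fp16_gb:.1f} GB, headroom: {headroom_gb:.1f} GB")
# FP16 weights: 68.0 GB, headroom: -60.0 GB
```

This matches the summary above: 68.0 GB required against 8.0 GB available, for -60.0 GB of headroom.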

Recommendation

Due to the substantial memory deficit, running LLaVA 1.6 34B directly on the Jetson Orin Nano 8GB is not feasible. Consider smaller models that fit within the 8 GB limit, or explore aggressive quantization such as Q4_K_M or lower precisions if the inference framework supports them, though even heavily quantized 34B weights exceed 8 GB. Note that because the Jetson's memory is shared between CPU and GPU, offloading layers to system RAM does not add capacity the way it does on a discrete-GPU system; it drastically reduces performance and remains unsuitable for real-time or interactive applications. If possible, use a more powerful GPU with sufficient VRAM, or leverage cloud-based inference services.
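A quick sizing check makes the quantization point concrete. The bits-per-weight figures below are approximate averages for llama.cpp GGUF quantization formats (an assumption; the exact ratio varies by model architecture):

```python
# Rough quantized-size check: even aggressive quantization leaves a 34B
# model too large for 8 GB of shared memory.
# Bits-per-weight values are approximate GGUF averages (assumption).

QUANT_BITS = {"FP16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85, "Q2_K": 2.63}

def quantized_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB at a given average bits per weight."""
    return num_params * bits_per_weight / 8 / 1e9

for name, bits in QUANT_BITS.items():
    size = quantized_size_gb(34e9, bits)
    verdict = "fits" if size < 8.0 else "does not fit"
    print(f"{name:7s} ~{size:5.1f} GB -> {verdict} in 8 GB")
```

Under these assumptions, Q4_K_M lands around 20 GB and even Q2_K around 11 GB, which is why the recommendation above points toward smaller models rather than quantizing the 34B model.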

Recommended Settings

Batch size: 1
Context length: 512
Inference framework: llama.cpp
Suggested quantization: Q4_K_M or lower (e.g., Q2_K)
Other settings:
- Reduce the number of layers loaded on the GPU
- Enable memory offloading to system RAM (expect significant performance degradation)
- Use a smaller model variant if available

Frequently Asked Questions

Is LLaVA 1.6 34B compatible with NVIDIA Jetson Orin Nano 8GB?
No, it is not directly compatible due to insufficient VRAM.
What VRAM is needed for LLaVA 1.6 34B?
LLaVA 1.6 34B requires approximately 68GB of VRAM in FP16 precision.
How fast will LLaVA 1.6 34B run on NVIDIA Jetson Orin Nano 8GB?
It is unlikely to run at all without significant modifications, and even with memory offloading, performance would be far too slow for practical use.