Can I run LLaVA 1.6 7B on NVIDIA Jetson Orin Nano 8GB?

Fail/OOM
This GPU doesn't have enough VRAM.
GPU VRAM: 8.0 GB
Required: 14.0 GB
Headroom: -6.0 GB

VRAM Usage: 100% used (14.0 GB required vs. 8.0 GB available)

Technical Analysis

The NVIDIA Jetson Orin Nano 8GB, with its Ampere architecture, 1024 CUDA cores, and 32 Tensor cores, is a capable platform for edge AI inference, especially given its low 15W power envelope. Its main limitation for vision-language models like LLaVA 1.6 7B is memory: the 8GB of LPDDR5 is unified, shared between the CPU and GPU, and a 7B-parameter model in FP16 precision needs roughly 14GB for the weights alone. That leaves a shortfall of about 6GB, so the model cannot be loaded in FP16 without out-of-memory errors. Because the memory is unified, offloading layers to the CPU also does not add capacity the way it does with a discrete GPU and separate system RAM. The roughly 70 GB/s memory bandwidth, while reasonable for the Orin Nano's class, further caps token throughput even once a quantized model fits.
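
As a quick back-of-the-envelope check, weight memory scales linearly with bytes per parameter. The sketch below is plain Python; it counts weights only (the vision tower, KV cache, and runtime overhead add a further 1-2 GB in practice), and the bits-per-weight figures for the llama.cpp quantization formats are approximate:

# Approximate weight-only memory for a 7B-parameter model.
# Bits-per-weight values are rough averages for each format.
PARAMS = 7e9

for label, bits_per_weight in [
    ("FP16", 16.0),   # half-precision baseline
    ("Q8_0", 8.5),    # ~8.5 bits/weight in llama.cpp's 8-bit format
    ("Q4_K_S", 4.5),  # ~4.5 bits/weight for Q4_K_S
]:
    gb = PARAMS * bits_per_weight / 8 / 1e9
    print(f"{label:7s} ~{gb:4.1f} GB")

This reproduces the 14GB FP16 figure and puts a Q4_K_S build at roughly 3.9 GB of weights, leaving about 4 GB of the unified 8GB for the vision encoder, KV cache, and the operating system.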

Recommendation

Due to the memory limitation, running LLaVA 1.6 7B in FP16 on the Jetson Orin Nano 8GB is not feasible. The most viable approach is aggressive quantization: a 4-bit build (Q4_K_S or similar) via llama.cpp or a comparable inference framework shrinks the weights to roughly 4 GB, which fits within the 8GB limit with room left for the vision encoder and KV cache. Quantization costs some accuracy, so test the quantized model on your own workload before relying on it. Alternatively, consider smaller vision-language models with lower memory requirements.

Recommended Settings

Batch size: 1
Context length: 2048 (adjust based on VRAM usage after quantization)
Inference framework: llama.cpp
Suggested quantization: Q4_K_S (or similar 4-bit quantization)
Other settings (applied in the sketch below):
- Experiment with different quantization methods to find the best balance between memory usage and accuracy.
- Reduce the image resolution passed to the vision encoder to lower memory usage.
- llama.cpp can run some layers on the CPU, but because the Orin Nano's memory is unified this mainly shifts compute rather than freeing memory, and it carries a performance cost.
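
As a minimal sketch of these settings in practice, here is how they might look with the llama-cpp-python bindings. The file names below are placeholders (LLaVA GGUF builds ship as a language-model file plus a separate mmproj vision-projector file), and Llava16ChatHandler assumes a recent llama-cpp-python release; older releases offer only Llava15ChatHandler:

# Minimal sketch: LLaVA 1.6 7B as a 4-bit GGUF via llama-cpp-python.
# Model and projector file names are placeholders for your download.
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava16ChatHandler

chat_handler = Llava16ChatHandler(clip_model_path="mmproj-model-f16.gguf")

llm = Llama(
    model_path="llava-v1.6-7b.Q4_K_S.gguf",  # ~3.9 GB of weights
    chat_handler=chat_handler,
    n_ctx=2048,       # recommended context length above
    n_gpu_layers=-1,  # keep all layers on the GPU
)

# Batch size 1 here simply means one request at a time.
response = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "file:///path/to/image.jpg"}},
            {"type": "text", "text": "Describe this image."},
        ]},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])

Because the Orin Nano's memory is unified, lowering n_gpu_layers moves compute to the CPU but frees little memory, so keeping all layers on the GPU is usually the right call once the quantized model fits.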

Frequently Asked Questions

Is LLaVA 1.6 7B compatible with NVIDIA Jetson Orin Nano 8GB?
Not directly. The 8GB VRAM of the Jetson Orin Nano is insufficient to load LLaVA 1.6 7B in FP16. Quantization is necessary.
What VRAM is needed for LLaVA 1.6 7B?
LLaVA 1.6 7B requires approximately 14GB of VRAM in FP16 precision.
How fast will LLaVA 1.6 7B run on NVIDIA Jetson Orin Nano 8GB?
Performance is bound mainly by memory bandwidth (roughly 70 GB/s), so expect far fewer tokens per second than on higher-end GPUs. With a 4-bit 7B model, single-digit tokens/second is typical, and heavier settings (longer context, larger images, less aggressive quantization) can drop below 1 token/second. Actual throughput depends on the chosen quantization and settings.
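
To check throughput under your own settings, a quick sketch (reusing the llm instance from the earlier example; token counts come from the usage field of the OpenAI-style response, and the timing includes prompt processing, so keep the prompt short for a generation-speed figure):

import time

# Rough tokens/second for text-only generation; assumes `llm`
# is the Llama instance created in the earlier sketch.
start = time.perf_counter()
out = llm.create_completion(
    "Describe the Jetson Orin Nano in one paragraph.",
    max_tokens=64,
)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f} s "
      f"-> {generated / elapsed:.2f} tokens/s")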