Can I run BGE-M3 on NVIDIA Jetson Orin Nano 8GB?

Perfect
Yes, you can run this model!
GPU VRAM: 8.0GB
Required: 1.0GB
Headroom: +7.0GB

VRAM Usage

1.0GB of 8.0GB used (13%)

Performance Estimate

Tokens/sec: ~90.0
Batch size: 32

Technical Analysis

The NVIDIA Jetson Orin Nano 8GB is well suited to running the BGE-M3 embedding model. Note that the Orin Nano has no dedicated VRAM: its 8GB of LPDDR5 is unified memory shared between the CPU and GPU, so the OS and other processes will eat into the nominal 7GB of headroom above the model's 1.0GB FP16 footprint. Even so, there is ample room for larger batch sizes. The memory bandwidth of roughly 68 GB/s (~0.07 TB/s) can be a limiting factor for larger models, but it is more than sufficient for the ~0.5B-parameter BGE-M3. The Ampere architecture, with its 1024 CUDA cores and 32 Tensor Cores, provides a strong foundation for the matrix operations at the heart of embedding generation.
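The 1.0GB figure can be sanity-checked from the parameter count. A back-of-the-envelope sketch (weights only; activation and framework overhead are ignored, so treat the result as a lower bound):

```python
# Back-of-the-envelope VRAM estimate for an FP16 embedding model.
# The ~0.5B parameter count comes from the analysis above; everything
# else here is simple arithmetic, not a measured figure.

def fp16_weights_gb(params: float) -> float:
    """Weight memory in GB at FP16: 2 bytes per parameter."""
    return params * 2 / 1e9

params = 0.5e9                           # ~0.5B parameters for BGE-M3
weights = fp16_weights_gb(params)        # -> 1.0 GB
headroom = 8.0 - weights                 # -> 7.0 GB nominal on an 8GB board

print(f"weights: {weights:.1f} GB, headroom: {headroom:.1f} GB")
```

On a unified-memory Jetson the real headroom is smaller, since the OS shares the same 8GB pool.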

Recommendation

Given the generous VRAM headroom, experiment with larger batch sizes (up to 32) to maximize throughput. Consider a framework such as ONNX Runtime or TensorRT to further optimize inference speed on the Jetson Orin Nano. FP16 precision should work well, and INT8 quantization can provide an additional performance boost with minimal accuracy loss, especially if memory bandwidth becomes a bottleneck at higher batch sizes. Monitor GPU utilization and memory usage (e.g., with tegrastats) to fine-tune batch size and context length for your specific application.
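The bandwidth argument can be made concrete with a roofline-style ceiling: in a memory-bound regime each forward pass must stream the full weight set, so passes per second are capped at roughly bandwidth divided by weight bytes. The sketch below uses illustrative assumptions (the ~68 GB/s figure and the one-stream-per-pass model), not measured numbers:

```python
# Roofline-style throughput ceiling under a memory-bandwidth bound.
# Assumption: each forward pass reads the full weight set once, so
# passes/sec <= bandwidth / weight size. Figures are illustrative.

BANDWIDTH_GBPS = 68.0        # Jetson Orin Nano 8GB LPDDR5, ~68 GB/s

def passes_per_sec(weight_gb: float) -> float:
    return BANDWIDTH_GBPS / weight_gb

fp16_gb = 1.0                # ~0.5B params * 2 bytes
int8_gb = 0.5                # same params * 1 byte after INT8 quantization

print(passes_per_sec(fp16_gb))   # FP16 ceiling: 68 passes/sec
print(passes_per_sec(int8_gb))   # INT8 ceiling: 136 passes/sec
```

Halving the weight bytes roughly doubles the bandwidth ceiling, which is why INT8 helps precisely when bandwidth is the bottleneck; batching then amortizes each pass across more sequences.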

Recommended Settings

Batch size: 32
Context length: 8192
Inference framework: ONNX Runtime or TensorRT
Quantization (suggested): INT8
Other settings: enable CUDA graph capture, use asynchronous execution, optimize tensor layouts

Frequently Asked Questions

Is BGE-M3 compatible with NVIDIA Jetson Orin Nano 8GB?
Yes, BGE-M3 is fully compatible with the NVIDIA Jetson Orin Nano 8GB.
What VRAM is needed for BGE-M3?
BGE-M3 requires approximately 1.0GB of VRAM when using FP16 precision.
How fast will BGE-M3 run on NVIDIA Jetson Orin Nano 8GB?
Expect approximately 90 tokens per second with optimized settings; actual throughput varies with batch size and implementation.
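To put ~90 tokens/sec in context, a quick sketch of end-to-end embedding time for a corpus (the corpus size and average document length below are hypothetical inputs, not benchmarks):

```python
# Rough wall-clock estimate at a fixed token throughput.
# n_docs and avg_tokens are hypothetical; 90 tok/s is the estimate above.

def embed_time_seconds(n_docs: int, avg_tokens: int,
                       tokens_per_sec: float = 90.0) -> float:
    return n_docs * avg_tokens / tokens_per_sec

# e.g., 1,000 documents averaging 256 tokens each:
t = embed_time_seconds(1_000, 256)     # ~2,844 s, i.e., about 47 minutes
print(f"{t / 60:.0f} minutes")
```

For bulk indexing jobs, this is where larger batch sizes and INT8 quantization pay off most.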