Can I run BGE-Small-EN on NVIDIA Jetson Orin Nano 8GB?

Verdict: Perfect fit. Yes, you can run this model!

GPU VRAM: 8.0GB
Required: 0.1GB
Headroom: +7.9GB

VRAM Usage: ~1% of 8.0GB

Performance Estimate

Tokens/sec: ~90
Batch size: 32

Technical Analysis

The NVIDIA Jetson Orin Nano 8GB is exceptionally well suited to running the BGE-Small-EN embedding model. The device's 8GB of LPDDR5 memory is unified, shared between the CPU and GPU rather than being dedicated VRAM, but since the model requires only about 0.1GB in FP16 precision, roughly 7.9GB of headroom remains. This allows comfortable operation even with larger batch sizes and other concurrent AI workloads. The Orin Nano's Ampere GPU, with 1024 CUDA cores and 32 Tensor Cores, provides ample compute for the model's small size of roughly 33M (0.03B) parameters.
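As a quick sanity check on the ~0.1GB figure, the FP16 footprint can be estimated from the parameter count alone. This is a rough sketch: the ~33M parameter count matches the model's published size, while the 20% overhead factor for activations and buffers is an assumption.

```python
# Rough FP16 memory estimate for BGE-Small-EN from parameter count.
# The 1.2x overhead factor for activations/buffers is an assumption.
def fp16_footprint_gb(num_params: float, overhead: float = 1.2) -> float:
    bytes_per_param = 2  # FP16 = 2 bytes per parameter
    return num_params * bytes_per_param * overhead / 1e9

weights_gb = fp16_footprint_gb(33e6)  # ~33M parameters
print(f"Estimated FP16 footprint: {weights_gb:.2f} GB")  # prints 0.08 GB
```

The result lands just under the 0.1GB figure quoted above, consistent with the headroom analysis.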

Memory bandwidth, though modest at roughly 0.07 TB/s (68 GB/s), is not a significant bottleneck for BGE-Small-EN: embedding models run a single forward pass per input rather than the token-by-token decoding of large language models, so they are far less bandwidth-bound. The estimated 90 tokens/sec is a reasonable expectation for this hardware and model combination, and a batch size of 32 uses the available memory efficiently, maximizing throughput without exceeding capacity. This makes the setup well suited to edge applications and real-time processing.
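The tokens/sec figure translates into document throughput once an average input length is fixed. A minimal sketch, where the 90 tokens/sec value is the estimate above and the input lengths are illustrative assumptions:

```python
# Convert a tokens/sec estimate into documents/sec for a given average
# input length. Both input lengths below are illustrative assumptions.
def docs_per_sec(tokens_per_sec: float, avg_tokens_per_doc: int) -> float:
    return tokens_per_sec / avg_tokens_per_doc

print(f"{docs_per_sec(90.0, 128):.2f} docs/sec at 128 tokens/doc")
print(f"{docs_per_sec(90.0, 512):.2f} docs/sec at 512 tokens/doc")
```

Shorter passages therefore embed proportionally faster, which is worth considering when chunking documents for an edge deployment.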

Recommendation

For optimal performance on the Jetson Orin Nano, use a framework optimized for NVIDIA GPUs, such as TensorRT or ONNX Runtime with the CUDA execution provider. FP16 precision works well, but given the ample memory headroom it is worth experimenting with INT8 quantization to increase throughput further. Monitor memory usage and adjust batch sizes as needed to stay under the 8GB limit, especially if other processes run concurrently.
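Note that on a model this small, INT8 mainly buys throughput (via the Tensor Cores' INT8 paths) rather than memory: the absolute weight savings are tiny, as a quick estimate shows. The ~33M parameter count matches the model's published size; the figures ignore quantization scale/zero-point overhead.

```python
# Weight-only memory estimate for FP16 vs INT8 quantization of a
# ~33M-parameter model (quantization scale/zero-point overhead ignored).
NUM_PARAMS = 33e6

def weight_megabytes(num_params: float, bits_per_param: int) -> float:
    return num_params * bits_per_param / 8 / 1e6

fp16_mb = weight_megabytes(NUM_PARAMS, 16)  # 66 MB
int8_mb = weight_megabytes(NUM_PARAMS, 8)   # 33 MB
print(f"FP16: {fp16_mb:.0f} MB, INT8: {int8_mb:.0f} MB, "
      f"saved: {fp16_mb - int8_mb:.0f} MB")
```

A 33MB saving is negligible against 7.9GB of headroom, so the case for INT8 here rests on speed, not on fitting the model.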

If you encounter performance limitations, verify that the Jetson Orin Nano is running in its maximum performance mode (set via `nvpmodel` and `jetson_clocks`) and that thermal throttling is not occurring. Keep the JetPack software stack up to date for optimal compatibility and performance. Optimizing the input pipeline to minimize CPU-to-GPU data-transfer overhead can also help, and trying different inference frameworks will show which works best for your specific application.

Recommended Settings

Batch size: 32
Context length: 512
Inference framework: TensorRT
Suggested quantization: INT8
Other settings: enable CUDA graph capture, optimize the input pipeline, ensure proper cooling for sustained performance

Frequently Asked Questions

Is BGE-Small-EN compatible with NVIDIA Jetson Orin Nano 8GB?
Yes, BGE-Small-EN is perfectly compatible with the NVIDIA Jetson Orin Nano 8GB.
What VRAM is needed for BGE-Small-EN?
BGE-Small-EN requires approximately 0.1GB of VRAM in FP16 precision.
How fast will BGE-Small-EN run on NVIDIA Jetson Orin Nano 8GB?
You can expect BGE-Small-EN to run at approximately 90 tokens/sec on the NVIDIA Jetson Orin Nano 8GB.