The NVIDIA Jetson Orin Nano 8GB cannot run the FLUX.1 Dev model out of the box due to insufficient memory. FLUX.1 Dev's 12-billion-parameter transformer alone requires roughly 24GB at FP16/BF16 precision (12B parameters × 2 bytes each), before counting the text encoders, VAE, and intermediate activations. The Orin Nano provides only 8GB of LPDDR5, and that pool is unified, shared between the CPU and the GPU, so even less than 8GB is actually available for inference. The full model cannot be loaded at once, and attempting to do so produces out-of-memory errors.
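To make the shortfall concrete, here is a back-of-the-envelope sketch of the transformer's weight footprint at different precisions. This is plain arithmetic from the published parameter count, not a measurement, and it ignores activations and the other pipeline components:

```python
# Weight-only memory estimate for FLUX.1 Dev's ~12B-parameter transformer.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "int8": 1, "nf4": 0.5}
PARAMS = 12e9          # FLUX.1 Dev transformer parameter count
AVAILABLE_GB = 8       # Orin Nano unified memory, shared with the OS and CPU

for dtype, nbytes in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * nbytes / 1e9
    fits = "fits" if weights_gb < AVAILABLE_GB else "does not fit"
    print(f"{dtype:>9}: {weights_gb:5.1f} GB of weights -> {fits} in {AVAILABLE_GB} GB")
```

Only the 4-bit row (~6GB) even nominally fits, which is why the discussion below keeps returning to extreme quantization.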
Furthermore, even if aggressive quantization were applied to shrink the model's footprint, the Orin Nano's limited memory bandwidth, roughly 68 GB/s of LPDDR5 (about 0.07 TB/s), would still cap performance. FLUX.1 Dev is a diffusion model, so each denoising step must stream the weights and intermediate activations through memory, and time per image would be dominated by memory traffic rather than compute. While the Ampere-architecture GPU and its Tensor Cores can accelerate the matrix math, they cannot overcome the memory capacity and bandwidth constraints.
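A rough lower bound on per-step latency follows directly from those numbers. This sketch assumes a fully bandwidth-bound regime with NF4-quantized weights streamed once per step, and a 28-step schedule (the diffusers default for this model); real runs would be slower once dequantization and activation traffic are included:

```python
# Bandwidth-bound latency floor: time to read the weights once per step.
WEIGHTS_GB = 6.0        # ~12B params at 4 bits
BANDWIDTH_GBPS = 68.0   # Orin Nano 8GB rated LPDDR5 bandwidth
STEPS = 28              # assumed denoising step count

seconds_per_step = WEIGHTS_GB / BANDWIDTH_GBPS
print(f"~{seconds_per_step:.2f} s/step just to read the weights")
print(f"~{seconds_per_step * STEPS:.1f} s minimum per image over {STEPS} steps")
```

Even this optimistic floor of a few seconds per image assumes the weights fit in memory at all; in practice, any offloading or paging multiplies it.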
Given this memory shortfall, running FLUX.1 Dev on the Jetson Orin Nano 8GB is not feasible without substantial modifications. Consider smaller diffusion models that fit comfortably within 8GB (for example, Stable Diffusion 1.5-class models, whose UNet is under one billion parameters). Alternatively, you could offload model components to CPU memory, as sketched below, but note that on the Orin Nano the CPU and GPU share the same 8GB pool, so offloading limits how much is resident at once rather than adding capacity, and it drastically reduces performance either way. For comfortable use of FLUX.1 Dev, a GPU with at least 24GB of VRAM is strongly recommended. Distributed inference across multiple devices is another possibility, though it is a complex setup best suited for advanced users.
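For reference, offloading in the `diffusers` library is a one-line switch. This is a minimal sketch of the standard mechanism, assuming a recent `diffusers` with `FluxPipeline` support; on a discrete-GPU machine with ample system RAM it trades speed for GPU footprint, while on Jetson's unified memory it only reduces peak residency:

```python
# Sketch: sequential CPU offload with diffusers' FluxPipeline.
# enable_sequential_cpu_offload() keeps weights in CPU memory and streams
# each submodule to the GPU on demand -- slow, but minimal GPU residency.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.enable_sequential_cpu_offload()  # trade throughput for memory headroom

image = pipe(
    "a photo of an astronaut riding a horse",
    num_inference_steps=28,  # diffusers default for FLUX.1 Dev
    guidance_scale=3.5,      # value recommended on the model card
).images[0]
image.save("astronaut.png")
```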
If you are set on using the Orin Nano, focus on extreme quantization, such as 4-bit or even 2-bit weights, as in the sketch below. This might squeeze the weights into memory, but it will likely degrade output quality. Also keep memory pressure down by generating one image at a time (batch size 1) and reducing the output resolution, which shrinks activation memory.
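As one concrete route, diffusers (0.31+) supports loading the transformer in 4-bit NF4 via bitsandbytes. Treat this as the general recipe rather than a verified Jetson setup: whether bitsandbytes has working aarch64/Jetson builds is a separate question you would need to confirm.

```python
# Sketch: FLUX.1 Dev transformer in 4-bit NF4 via diffusers + bitsandbytes.
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# ~12B params at 4 bits is roughly 6 GB of weights -- tight but conceivable in 8 GB.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keep text encoders/VAE off the GPU until needed

image = pipe("a tiny robot watering a bonsai", num_inference_steps=28).images[0]
image.save("robot.png")
```

Even if this loads, expect long per-image times given the bandwidth floor estimated earlier, and validate output quality carefully at 4-bit precision.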