The primary limiting factor in running FLUX.1 Schnell (12B parameters) on an AMD RX 7800 XT is VRAM. In FP16, the 12B-parameter diffusion transformer alone occupies roughly 24GB (12B parameters × 2 bytes each), before accounting for the text encoders, VAE, and activation buffers. The RX 7800 XT ships with 16GB of GDDR6, leaving a shortfall of at least 8GB: the model in its native FP16 format simply cannot fit on the GPU. While the card's memory bandwidth of roughly 0.62 TB/s is respectable, bandwidth is irrelevant when the weights cannot reside in VRAM in the first place. The RX 7800 XT also lacks dedicated Tensor Cores; RDNA 3's WMMA-based AI accelerators make INT8 and INT4 inference less efficient than on NVIDIA GPUs with Tensor Core support.
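To make the arithmetic explicit, here is the back-of-the-envelope calculation behind the 24GB figure. This is a rough weights-only estimate; the real footprint is larger once the text encoders, VAE, and activations are counted.

```python
# Back-of-the-envelope VRAM math for FLUX.1 Schnell's transformer in FP16.
params = 12e9          # ~12B parameters in the diffusion transformer
bytes_per_param = 2    # FP16 stores each weight in 2 bytes

weights_gb = params * bytes_per_param / 1e9
vram_gb = 16           # RX 7800 XT

print(f"FP16 weights:   {weights_gb:.0f} GB")   # ~24 GB
print(f"Available VRAM: {vram_gb} GB")
print(f"Shortfall:      {weights_gb - vram_gb:.0f} GB (before activations)")
```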
Due to this VRAM limitation, running FLUX.1 Schnell on the RX 7800 XT directly in FP16 is not feasible. The most practical workaround is quantization: 8-bit weights cut the transformer to roughly 12GB and 4-bit to roughly 6GB, both of which fit comfortably in 16GB. Tooling options include bitsandbytes (via diffusers) and GGUF quantizations of FLUX (e.g., through stable-diffusion.cpp or ComfyUI-GGUF); note that llama.cpp itself targets language models, not diffusion models. Alternatively, offload model components to system RAM (as shown in the sketch below), at the cost of substantially slower inference. If neither option is acceptable, use a smaller model or a GPU with 24GB or more of VRAM. Distributed inference across multiple GPUs is also possible, but it demands significant technical expertise and infrastructure.
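A minimal sketch combining both techniques with the diffusers library: the transformer is quantized to 4-bit NF4 via bitsandbytes, and CPU offload keeps only the active component on the GPU. This assumes diffusers, accelerate, and bitsandbytes are installed, and it assumes a bitsandbytes backend that supports your GPU; bitsandbytes' ROCm support is still maturing, so on an AMD card this may require a ROCm-enabled build, whereas on NVIDIA hardware it works out of the box.

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

model_id = "black-forest-labs/FLUX.1-schnell"

# Quantize only the 12B transformer to 4-bit NF4 (~6GB instead of ~24GB).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
# Keep only the component currently in use on the GPU;
# everything else waits in system RAM.
pipe.enable_model_cpu_offload()

# Schnell is distilled for few-step sampling:
# 4 steps, no classifier-free guidance.
image = pipe(
    "a photograph of a red fox in the snow",
    num_inference_steps=4,
    guidance_scale=0.0,
).images[0]
image.save("fox.png")
```

Quantizing only the transformer is deliberate: it holds the overwhelming majority of the parameters, so it delivers nearly all of the VRAM savings while leaving the text encoders and VAE at full precision.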