FlashAttention-3 (FA3) has not yet been integrated into many frameworks, such as HuggingFace Transformers; FA3 currently only works by calling the flash-attention functions directly from flash_attn_interface.
See https://github.com/TheRainstorm/attention_tests/blob/main/benchmark_flashinfer.py for an example of how to use FA3.
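As a minimal sketch, a direct FA3 call looks roughly like the snippet below (assuming the build from the job script further down; the exact signature and return value of flash_attn_func can differ between commits, so check hopper/flash_attn_interface.py in your checkout):

```python
import torch
from flash_attn_interface import flash_attn_func  # provided by the hopper build of flash-attention

# Illustrative shapes only: (batch, seqlen, nheads, headdim); FA3 requires an H100-class GPU.
q = torch.randn(2, 4096, 16, 128, device="cuda", dtype=torch.bfloat16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)
# Some versions return an (out, softmax_lse) tuple; unpack if needed.
if isinstance(out, tuple):
    out = out[0]
print(out.shape)  # torch.Size([2, 4096, 16, 128])
```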
Last tested: , version 2.7.4.post1, commit fd2fc9d
Save the following as a job file, and submit it using sbatch:
```bash
#!/bin/bash
#SBATCH -p gpu_h100
#SBATCH -t 03:00:00
#SBATCH --gpus 1

module load 2024 Python/3.12.3-GCCcore-13.3.0
# Needed for nvcc: matching the current PyTorch CUDA version (for PyTorch 2.7),
# or `module load 2023 CUDA/12.4.0` for PyTorch <2.7
module load CUDA/12.6.0

# Execute these in a preferred location
python -m venv venv
source venv/bin/activate
python -m pip install torch numpy pytest packaging setuptools einops ninja

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention/hopper
python setup.py install

# Actually test
PYTHONPATH=$PWD:$PYTHONPATH pytest -q -s test_flash_attn.py
# TADA!
```
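To run it, save the script under any name (install_fa3.job below is just an example) and submit it from a login node; the build and test output ends up in the default slurm-<jobid>.out file in the submission directory:

```bash
sbatch install_fa3.job   # submit the build-and-test job
squeue --me              # check whether the job is queued or running
```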
Troubleshooting
- Tests are failing: see the note below for a way to narrow down which test breaks and why.
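If the tests fail, a generic way to investigate is to re-run pytest so it stops at the first failing test and prints full output; test names differ between flash-attention commits, so list them first (plain pytest options, nothing FA3-specific):

```bash
cd flash-attention/hopper
PYTHONPATH=$PWD:$PYTHONPATH pytest --collect-only -q test_flash_attn.py  # list available tests
PYTHONPATH=$PWD:$PYTHONPATH pytest -x -s test_flash_attn.py              # stop at first failure, show output
```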