FlashAttention-3 (FA3) has not yet been integrated into many frameworks such as HuggingFace; FA3 can only be used by calling the flash-attention functions directly from flash_attn_interface.
See https://github.com/TheRainstorm/attention_tests/blob/main/benchmark_flashinfer.py for an example of how to use FA3.
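As a minimal sketch of what such a direct call looks like (assuming the flash_attn_func entry point from flash_attn_interface in the hopper build; run it after completing the installation below, with the flash-attention/hopper directory on PYTHONPATH as in the test step of the job script; the return convention has varied between FA3 versions, hence the tuple check):

import torch
from flash_attn_interface import flash_attn_func  # FA3 interface from flash-attention/hopper

# Random query/key/value tensors: (batch, seqlen, nheads, headdim), half precision, on the GPU
q = torch.randn(2, 1024, 16, 128, dtype=torch.bfloat16, device="cuda")
k = torch.randn(2, 1024, 16, 128, dtype=torch.bfloat16, device="cuda")
v = torch.randn(2, 1024, 16, 128, dtype=torch.bfloat16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)
# Some FA3 versions return (out, softmax_lse) instead of just the output tensor
if isinstance(out, tuple):
    out = out[0]
print(out.shape)  # (2, 1024, 16, 128)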


Last tested: , version 2.7.4.post1, commit fd2fc9d


Save the following as a job file, and submit using sbatch:

#!/bin/bash
#SBATCH -p gpu_h100
#SBATCH -t 03:00:00
#SBATCH --gpus 1
 
module load 2024 Python/3.12.3-GCCcore-13.3.0
module load CUDA/12.6.0  # Needed for nvcc; matches the CUDA version of the current PyTorch release (PyTorch 2.7). For PyTorch <2.7, use `module load 2023 CUDA/12.4.0` instead.

# Run the following from the directory where the virtual environment and sources should live
python -m venv venv
source venv/bin/activate
 
python -m pip install torch numpy pytest packaging setuptools einops ninja
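# pytest is needed for the test step below; ninja speeds up compilation of the CUDA extensions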

git clone https://github.com/Dao-AILab/flash-attention
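# Optionally pin to the commit this page was last tested with: (cd flash-attention && git checkout fd2fc9d)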
cd flash-attention/hopper
python setup.py install

# Actually test
PYTHONPATH=$PWD:$PYTHONPATH pytest -q -s test_flash_attn.py 
 
#TADA!
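
To submit the job and check its progress (install_fa3.job is a placeholder filename; by default Slurm writes the job output to slurm-<jobid>.out):

sbatch install_fa3.job    # submit the installation/test job
squeue -u $USER           # monitor the job until it finishes
less slurm-<jobid>.out    # inspect the build log and the pytest results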

Troubleshooting
