Testing GEMM:
Assertion failed: deep_gemm/jit/../include/deep_gemm/fp8_gemm.cuh:369, condition: cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size) == cudaSuccess
terminate called after throwing an instance of 'AssertionException'
  what(): Assertion failed: cudaFuncSetAttribute(kernel, cudaFuncAttributeMaxDynamicSharedMemorySize, smem_size) == cudaSuccess
Try lowering the sm90_capacity value in gemm.py: I think 128 KB is the correct value for the RTX 5080, compared to 256 KB for the H100/H800.
You will probably also need to add ", 3, 2, 1" after "6, 5, 4" so that smaller values are also tried.
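To illustrate why both changes might be needed together, here is a minimal sketch. It assumes the "6, 5, 4" tuple is a list of candidate pipeline stage counts checked against a shared-memory budget. `pick_num_stages`, the 33280-byte per-stage size, and the capacities are hypothetical illustration values, not DeepGEMM's actual code or verified hardware limits. The per-block limit reported by the driver may also be lower than the per-SM figure.

```python
def pick_num_stages(smem_capacity, smem_per_stage, candidates):
    """Return the first candidate stage count whose buffers fit, else None.

    Hypothetical helper: models a search that tries the largest stage
    count first and falls back to smaller ones.
    """
    for num_stages in candidates:
        if num_stages * smem_per_stage <= smem_capacity:
            return num_stages
    return None


# Assumed per-stage footprint: two 128x128 FP8 tiles plus 512 bytes of scales.
SMEM_PER_STAGE = 128 * 128 + 128 * 128 + 512  # 33280 bytes

# With a 128 KB budget, none of (6, 5, 4) fits, so the search finds nothing.
print(pick_num_stages(128 * 1024, SMEM_PER_STAGE, (6, 5, 4)))

# Extending the candidates with 3, 2, 1 lets a smaller configuration fit.
print(pick_num_stages(128 * 1024, SMEM_PER_STAGE, (6, 5, 4, 3, 2, 1)))
```

If the search works this way, lowering the capacity alone would leave no valid configuration, which is why the smaller candidates would have to be added as well.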
> DeepGEMM exclusively supports NVIDIA Hopper tensor cores