Dataset Viewer
Column types: Batch Size, Seq Length, and New Tokens are int64; Torch Compile (3 values) and Implementation (4 values) are string classes; the latency and peak memory columns are float64.

| Batch Size | Seq Length | New Tokens | Torch Compile | Implementation | Mean Generation Latency (ms) | Mean Prefill Latency (ms) | Mean Decode Latency (ms) | Peak Mem (MB) |
|---|---|---|---|---|---|---|---|---|
| 1 | 128 | 16 | False | eager | 1,452 | 480.2 | 971.81 | 27,425.16 |
| 1 | 128 | 16 | max-autotune-no-cudagraphs | eager | 1,620.98 | 498 | 1,122.99 | 27,410.34 |
| 1 | 128 | 16 | False | grouped_mm | 850.87 | 76.51 | 774.35 | 27,425.4 |
| 1 | 128 | 16 | max-autotune-no-cudagraphs | grouped_mm | 492.87 | 76.79 | 416.08 | 27,425.4 |
| 1 | 128 | 16 | False | batched_mm | 815.11 | 316.56 | 498.56 | 35,866.47 |
| 1 | 128 | 16 | max-autotune | batched_mm | 412.98 | 316.33 | 96.65 | 35,866.48 |
| 1 | 128 | 16 | False | grouped_prefill+batched_decode | 588.87 | 77.15 | 511.72 | 27,470.51 |
| 1 | 128 | 16 | max-autotune | grouped_prefill+batched_decode | 181.79 | 76.78 | 105.01 | 27,424.49 |
| 1 | 128 | 64 | False | eager | 4,524.16 | 486.84 | 4,037.31 | 27,418.44 |
| 1 | 128 | 64 | max-autotune-no-cudagraphs | eager | 5,116.71 | 477.42 | 4,639.29 | 27,419.62 |
| 1 | 128 | 64 | False | grouped_mm | 3,327.45 | 76.46 | 3,250.98 | 27,434.67 |
| 1 | 128 | 64 | max-autotune-no-cudagraphs | grouped_mm | 1,824.7 | 76.55 | 1,748.15 | 27,433.49 |
| 1 | 128 | 64 | False | batched_mm | 2,411.23 | 316.18 | 2,095.05 | 35,875.48 |
| 1 | 128 | 64 | max-autotune | batched_mm | 707.73 | 316.24 | 391.49 | 35,875.48 |
| 1 | 128 | 64 | False | grouped_prefill+batched_decode | 2,219.34 | 76.89 | 2,142.45 | 27,479.51 |
| 1 | 128 | 64 | max-autotune | grouped_prefill+batched_decode | 489.06 | 76.88 | 412.18 | 27,433.5 |
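The rows above can be compared more easily as speedups over the uncompiled eager baseline. The following is a minimal sketch, assuming the mean generation latencies for New Tokens = 16 are copied by hand from the rows above; the names `GENERATION_MS_16`, `speedups`, and `fastest` are hypothetical helpers introduced for illustration and are not part of the dataset.

```python
# Mean generation latency (ms) for New Tokens = 16, copied from the rows above.
# Keys are (Torch Compile, Implementation). The dictionary name is illustrative.
GENERATION_MS_16 = {
    ("False", "eager"): 1452.0,
    ("max-autotune-no-cudagraphs", "eager"): 1620.98,
    ("False", "grouped_mm"): 850.87,
    ("max-autotune-no-cudagraphs", "grouped_mm"): 492.87,
    ("False", "batched_mm"): 815.11,
    ("max-autotune", "batched_mm"): 412.98,
    ("False", "grouped_prefill+batched_decode"): 588.87,
    ("max-autotune", "grouped_prefill+batched_decode"): 181.79,
}

BASELINE = ("False", "eager")


def speedups(latencies, baseline_key):
    """Return baseline latency divided by each configuration's latency."""
    base = latencies[baseline_key]
    return {key: base / value for key, value in latencies.items()}


def fastest(latencies):
    """Return the configuration key with the lowest latency."""
    return min(latencies, key=latencies.get)


if __name__ == "__main__":
    ratios = speedups(GENERATION_MS_16, BASELINE)
    # Print configurations from largest to smallest speedup.
    for (compile_mode, impl), ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(f"{impl:32s} compile={compile_mode:28s} {ratio:5.2f}x")
    print("fastest:", fastest(GENERATION_MS_16))
```

A ratio below 1 means the configuration is slower than the baseline. The same helpers can be applied to the New Tokens = 64 rows or to the prefill and decode columns by swapping in the corresponding values.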