
Commit 718a92f

Update demoBERT benchmark data from PBR# 156504
Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>
1 parent ddaf0b9 commit 718a92f

1 file changed (+114, -1 lines)

demo/BERT/README.md

Lines changed: 114 additions & 1 deletion
@@ -425,4 +425,117 @@ Also note that BERT Large engines, especially using mixed precision with large b

### Results

~~To be published soon.~~

The following sections present our inference performance results and describe how they were obtained.

#### Inference performance: NVIDIA A100 (40GB)

Our results were obtained by running `scripts/inference_benchmark.sh --gpu Ampere` in the container built from the TensorRT OSS Dockerfile, on a single NVIDIA A100 (40 GB) GPU.
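
Each table below reports three summaries of per-inference latency: the 95th percentile, the 99th percentile, and the average. As a rough illustration of what those columns mean, the following Python sketch computes them from a list of per-iteration timings. It assumes linear interpolation between closest ranks (the default method of NumPy's `np.percentile`); the benchmark script may compute percentiles differently, and the sample timings are hypothetical, not measured data.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], q: float) -> float:
    """q-th percentile (0-100), linearly interpolating between closest ranks."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Fractional 0-based rank: q=0 maps to the minimum, q=100 to the maximum.
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def summarize(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Return the three statistics reported in the tables: p95, p99, and mean."""
    return {
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
        "avg": sum(latencies_ms) / len(latencies_ms),
    }


if __name__ == "__main__":
    # Hypothetical per-iteration latencies in ms (100 iterations); not measured data.
    timings = [1.0] * 90 + [1.2] * 8 + [2.0, 5.0]
    stats = summarize(timings)
    print({name: round(value, 3) for name, value in stats.items()})
    # prints {'p95': 1.2, 'p99': 2.03, 'avg': 1.066}
```

In this hypothetical run the 99th percentile (about 2.03 ms) sits well above the average (about 1.07 ms) because of two slow iterations, which is why tail percentiles are reported alongside the mean.
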

##### BERT Base

| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
| 128 | 1 | 0.33 | 0.97 | 0.58 | 0.75 | 0.75 | 0.72 |
| 128 | 2 | 0.78 | 0.79 | 0.63 | 0.84 | 1.07 | 0.84 |
| 128 | 4 | 0.76 | 0.98 | 0.76 | 1.13 | 1.46 | 1.14 |
| 128 | 8 | 1.08 | 1.08 | 0.98 | 1.66 | 1.81 | 1.66 |
| 128 | 12 | 1.26 | 1.63 | 1.27 | 2.07 | 2.07 | 2.07 |
| 128 | 16 | 1.47 | 1.48 | 1.47 | 2.48 | 2.49 | 2.48 |
| 128 | 24 | 2.13 | 2.13 | 2.13 | 3.47 | 3.49 | 3.46 |
| 128 | 32 | 2.54 | 2.83 | 2.54 | 4.37 | 4.40 | 4.34 |
| 128 | 64 | 4.58 | 4.59 | 4.54 | 8.70 | 8.79 | 8.65 |
| 128 | 128 | 9.04 | 9.06 | 8.97 | 17.05 | 17.07 | 16.90 |
| 384 | 1 | 1.15 | 1.15 | 1.15 | 1.43 | 1.44 | 1.43 |
| 384 | 2 | 1.37 | 1.37 | 1.37 | 1.84 | 2.21 | 1.84 |
| 384 | 4 | 1.73 | 1.74 | 1.73 | 2.47 | 2.48 | 2.47 |
| 384 | 8 | 2.51 | 2.51 | 2.51 | 3.77 | 3.80 | 3.76 |
| 384 | 12 | 3.61 | 3.62 | 3.61 | 5.36 | 5.37 | 5.30 |
| 384 | 16 | 4.39 | 4.40 | 4.38 | 7.32 | 7.32 | 7.24 |
| 384 | 24 | 6.24 | 6.24 | 6.23 | 10.50 | 10.51 | 10.41 |
| 384 | 32 | 8.42 | 8.50 | 8.42 | 14.32 | 14.44 | 14.27 |
| 384 | 64 | 16.48 | 16.52 | 16.36 | 27.51 | 27.54 | 27.33 |
| 384 | 128 | 31.71 | 31.78 | 31.58 | | | |

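
The tables report latency only. When a rough throughput figure is useful, batch size divided by average batch latency gives an estimate in sequences per second. The sketch below applies that to a few INT8 rows of the A100 BERT Base table at sequence length 128; the function name is illustrative, and the result is a derived estimate under the assumption that batches execute back to back, not a number produced by the benchmark script.

```python
from typing import List, Tuple


def throughput_seq_per_s(batch_size: int, avg_latency_ms: float) -> float:
    """Approximate sequences per second, assuming batches run back to back with no overlap."""
    if avg_latency_ms <= 0:
        raise ValueError("average latency must be positive")
    return batch_size / (avg_latency_ms / 1000.0)


if __name__ == "__main__":
    # (batch size, INT8 average latency in ms) for BERT Base, sequence length 128, on A100,
    # copied from the table above.
    rows: List[Tuple[int, float]] = [(8, 0.98), (32, 2.54), (128, 8.97)]
    for batch_size, avg_ms in rows:
        rate = throughput_seq_per_s(batch_size, avg_ms)
        print(f"batch {batch_size:>3}: ~{rate:,.0f} sequences/s")
```

For batch size 128 (8.97 ms average) this works out to roughly 14,270 sequences/s. Larger batches raise throughput at the cost of per-request latency, which is the trade-off the batch-size sweep in these tables exposes.
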
##### BERT Large

| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
| 128 | 1 | 1.24 | 1.56 | 1.24 | 1.73 | 2.11 | 1.73 |
| 128 | 2 | 1.49 | 1.49 | 1.49 | 2.20 | 2.20 | 2.20 |
| 128 | 4 | 1.91 | 1.92 | 1.91 | 3.22 | 3.23 | 3.22 |
| 128 | 8 | 2.94 | 2.94 | 2.93 | 4.84 | 4.84 | 4.83 |
| 128 | 12 | 3.34 | 3.34 | 3.34 | 5.95 | 5.96 | 5.90 |
| 128 | 16 | 4.63 | 4.64 | 4.62 | 7.98 | 7.99 | 7.90 |
| 128 | 24 | 5.87 | 5.88 | 5.87 | 11.05 | 11.08 | 10.94 |
| 128 | 32 | 7.99 | 7.99 | 7.98 | 14.74 | 14.77 | 14.59 |
| 128 | 64 | 14.74 | 17.74 | 14.56 | 28.09 | 28.25 | 27.85 |
| 128 | 128 | 28.32 | 23.38 | 28.03 | 54.38 | 54.40 | 54.12 |
| 384 | 1 | 2.80 | 2.80 | 2.80 | 3.49 | 3.49 | 3.48 |
| 384 | 2 | 3.12 | 3.13 | 3.12 | 4.71 | 4.72 | 4.71 |
| 384 | 4 | 4.27 | 4.27 | 4.27 | 6.70 | 6.71 | 6.70 |
| 384 | 8 | 7.66 | 7.67 | 7.66 | 12.41 | 12.53 | 12.37 |
| 384 | 12 | 10.07 | 10.08 | 10.07 | 17.63 | 17.76 | 17.56 |
| 384 | 16 | 13.34 | 13.34 | 13.33 | 23.40 | 23.46 | 23.19 |
| 384 | 24 | 19.36 | 19.38 | 19.22 | 34.34 | 34.36 | 34.10 |
| 384 | 32 | 25.56 | 25.60 | 25.56 | 44.94 | 44.98 | 44.78 |
| 384 | 64 | 49.84 | 49.92 | 49.60 | 87.26 | 87.56 | 86.77 |
| 384 | 128 | 97.66 | 97.78 | 97.06 | 170.85 | 171.00 | 170.08 |

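
To compare precisions at a given configuration, the ratio of FP16 to INT8 average latency gives the INT8 speedup. The sketch below computes it for three rows of the A100 BERT Large table at sequence length 384, with the averages copied from that table; the helper name is illustrative.

```python
from typing import List, Tuple


def int8_speedup(int8_avg_ms: float, fp16_avg_ms: float) -> float:
    """How many times faster INT8 is than FP16 for the same configuration (FP16 / INT8 latency)."""
    if int8_avg_ms <= 0:
        raise ValueError("INT8 latency must be positive")
    return fp16_avg_ms / int8_avg_ms


if __name__ == "__main__":
    # (batch size, INT8 average ms, FP16 average ms) for BERT Large, sequence length 384, on A100,
    # copied from the table above.
    rows: List[Tuple[int, float, float]] = [(1, 2.80, 3.48), (8, 7.66, 12.37), (128, 97.06, 170.08)]
    for batch_size, int8_ms, fp16_ms in rows:
        print(f"batch {batch_size:>3}: INT8 speedup {int8_speedup(int8_ms, fp16_ms):.2f}x")
```

On these rows the INT8 engine is about 1.24x faster at batch size 1 and about 1.75x faster at batch size 128, so the benefit of INT8 grows with batch size for this configuration.
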
#### Inference performance: NVIDIA T4 (16GB)

Our results were obtained by running `scripts/inference_benchmark.sh --gpu Turing` in the container built from the TensorRT OSS Dockerfile, on a single NVIDIA T4 (16 GB) GPU.

##### BERT Base

| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
| 128 | 1 | 1.55 | 1.57 | 1.33 | 2.00 | 2.06 | 1.93 |
| 128 | 2 | 1.78 | 2.06 | 1.75 | 2.54 | 2.58 | 2.49 |
| 128 | 4 | 2.80 | 2.88 | 2.74 | 4.25 | 4.34 | 4.16 |
| 128 | 8 | 4.48 | 4.56 | 4.42 | 8.13 | 8.74 | 7.88 |
| 128 | 12 | 6.28 | 6.31 | 6.12 | 11.67 | 12.12 | 11.30 |
| 128 | 16 | 8.92 | 9.11 | 8.78 | 17.24 | 17.79 | 16.70 |
| 128 | 24 | 12.70 | 12.84 | 12.53 | 24.48 | 24.85 | 24.90 |
| 128 | 32 | 17.90 | 18.41 | 17.59 | 33.02 | 33.51 | 32.65 |
| 128 | 64 | 34.80 | 34.83 | 34.31 | 65.38 | 65.43 | 64.28 |
| 128 | 128 | 68.16 | 68.46 | 67.05 | 130.77 | 131.01 | 129.19 |
| 384 | 1 | 2.47 | 2.53 | 2.43 | 3.76 | 3.81 | 3.69 |
| 384 | 2 | 3.87 | 3.95 | 3.81 | 6.31 | 6.43 | 6.21 |
| 384 | 4 | 7.15 | 7.18 | 6.97 | 12.16 | 12.22 | 12.03 |
| 384 | 8 | 14.09 | 12.11 | 13.73 | 25.45 | 25.83 | 24.94 |
| 384 | 12 | 20.99 | 21.12 | 20.66 | 38.15 | 38.38 | 37.51 |
| 384 | 16 | 27.49 | 27.65 | 27.08 | 50.90 | 51.36 | 50.04 |
| 384 | 24 | 41.93 | 42.17 | 41.36 | 77.25 | 78.16 | 76.05 |
| 384 | 32 | 54.65 | 54.87 | 54.06 | 102.44 | 103.09 | 101.30 |
| 384 | 64 | 109.78 | 110.42 | 108.24 | 200.58 | 201.20 | 198.68 |
| 384 | 128 | 227.46 | 228.80 | 223.92 | 401.33 | 402.14 | 399.24 |

##### BERT Large

| Sequence Length | Batch Size | INT8 Latency (ms) | | | FP16 Latency (ms) | | |
|-----------------|------------|-----------------|-----------------|---------|-----------------|-----------------|---------|
| | | 95th Percentile | 99th Percentile | Average | 95th Percentile | 99th Percentile | Average |
| 128 | 1 | 3.59 | 3.61 | 3.51 | 5.10 | 5.18 | 5.02 |
| 128 | 2 | 4.93 | 5.03 | 4.83 | 7.72 | 7.73 | 7.58 |
| 128 | 4 | 8.15 | 8.19 | 7.93 | 13.67 | 13.85 | 13.56 |
| 128 | 8 | 14.21 | 14.23 | 13.89 | 26.88 | 27.66 | 26.35 |
| 128 | 12 | 22.41 | 22.47 | 21.91 | 41.04 | 41.29 | 40.30 |
| 128 | 16 | 29.30 | 29.83 | 28.82 | 55.04 | 55.27 | 54.05 |
| 128 | 24 | 44.60 | 44.63 | 43.92 | 81.24 | 82.28 | 79.59 |
| 128 | 32 | 60.88 | 61.48 | 58.97 | 114.13 | 114.47 | 112.78 |
| 128 | 64 | 111.78 | 112.02 | 110.77 | 224.24 | 225.02 | 221.97 |
| 128 | 128 | 223.99 | 224.28 | 222.33 | 417.56 | 418.54 | 415.33 |
| 384 | 1 | 7.18 | 7.27 | 7.07 | 11.74 | 11.96 | 11.51 |
| 384 | 2 | 12.22 | 12.25 | 11.92 | 21.47 | 21.61 | 20.97 |
| 384 | 4 | 35.95 | 36.43 | 35.63 | 42.03 | 42.35 | 41.36 |
| 384 | 8 | 47.06 | 47.22 | 46.41 | 83.16 | 83.51 | 82.06 |
| 384 | 12 | 66.04 | 66.04 | 65.89 | 127.10 | 127.99 | 127.46 |
| 384 | 16 | 87.98 | 88.45 | 87.13 | 164.13 | 165.12 | 161.96 |
| 384 | 24 | 132.56 | 132.96 | 131.24 | 262.76 | 263.68 | 258.96 |
| 384 | 32 | 179.44 | 180.61 | 176.66 | 329.99 | 331.67 | 325.59 |
| 384 | 64 | 352.81 | 353.39 | 350.21 | 684.19 | 686.39 | 674.76 |
| 384 | 128 | 706.85 | 707.73 | 704.38 | 1318.74 | 1320.22 | 1315.10 |
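
Because the 95th percentile of any sample set can never exceed its 99th percentile, that ordering is a cheap sanity check when copying benchmark output into tables like these. The sketch below parses rows in this README's table layout and reports every data row and precision where p95 > p99. The function names are illustrative. It deliberately does not compare the average against the percentiles, since a few slow outliers can legitimately pull the mean above the 95th (or even the 99th) percentile. The two sample rows are copied from the T4 BERT Base table.

```python
from typing import List, Tuple

# Latency columns after seq len and batch size: (p95, p99, avg) for INT8, then for FP16.
PRECISIONS = ("INT8", "FP16")


def parse_row(line: str) -> List[str]:
    """Split one Markdown table row into stripped cell strings."""
    return [cell.strip() for cell in line.strip().strip("|").split("|")]


def percentile_order_violations(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (seq_len, batch_size, precision) for every data row whose p95 exceeds its p99."""
    violations: List[Tuple[str, str, str]] = []
    for line in lines:
        cells = parse_row(line)
        # Header, separator, and sub-header rows do not start with an integer; skip them.
        if len(cells) < 8 or not cells[0].isdigit():
            continue
        seq_len, batch_size = cells[0], cells[1]
        for i, precision in enumerate(PRECISIONS):
            p95, p99 = cells[2 + 3 * i], cells[3 + 3 * i]
            if not p95 or not p99:
                continue  # empty cells mean no measurement for this precision
            if float(p95) > float(p99):
                violations.append((seq_len, batch_size, precision))
    return violations


if __name__ == "__main__":
    # Two rows copied from the T4 BERT Base table.
    sample = """
| 384 | 4 | 7.15 | 7.18 | 6.97 | 12.16 | 12.22 | 12.03 |
| 384 | 8 | 14.09 | 12.11 | 13.73 | 25.45 | 25.83 | 24.94 |
""".splitlines()
    for seq_len, batch_size, precision in percentile_order_violations(sample):
        print(f"seq {seq_len}, batch {batch_size}, {precision}: p95 > p99")
```

Run on the two sample rows, it flags the INT8 columns of the batch-size-8 row.
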