Commit 2d517d2

TensorRT OSS v8.2 Early Access Release
Signed-off-by: Rajeev Rao <rajeevrao@nvidia.com>
1 parent 80674b3 commit 2d517d2

278 files changed, +432506 -56929 lines changed


CHANGELOG.md

Lines changed: 39 additions & 1 deletion
@@ -1,5 +1,44 @@
 # TensorRT OSS Release Changelog

+## [8.2.0 EA](https://docs.nvidia.com/deeplearning/tensorrt/release-notes/tensorrt-8.html#rel-8-2-0-EA) - 2021-10-05
+### Added
+- [Demo applications](demo/HuggingFace) showcasing TensorRT inference of [HuggingFace Transformers](https://huggingface.co/transformers).
+  - Support is currently extended to GPT-2 and T5 models.
+- Added support for the following ONNX operators:
+  - `Einsum`
+  - `IsNan`
+  - `GatherND`
+  - `Scatter`
+  - `ScatterElements`
+  - `ScatterND`
+  - `Sign`
+  - `Round`
+- Added support for building the TensorRT Python API on Windows.
+
+### Updated
+- Notable API updates in the TensorRT 8.2.0.6 EA release. See the [TensorRT Developer Guide](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html) for details.
+- Added three new APIs, `IExecutionContext::getEnqueueEmitsProfile()`, `setEnqueueEmitsProfile()`, and `reportToProfiler()`, which can be used to collect layer profiling info when inference is launched as a CUDA graph.
+- Eliminated the global logger; each `Runtime`, `Builder`, or `Refitter` now has its own logger.
+- Added new operators: `IAssertionLayer`, `IConditionLayer`, `IEinsumLayer`, `IIfConditionalBoundaryLayer`, `IIfConditionalOutputLayer`, `IIfConditionalInputLayer`, and `IScatterLayer`.
+- Added new `IGatherLayer` modes: `kELEMENT` and `kND`
+- Added new `ISliceLayer` modes: `kFILL`, `kCLAMP`, and `kREFLECT`
+- Added new `IUnaryLayer` operators: `kSIGN` and `kROUND`
+- Added new runtime class `IEngineInspector` that can be used to inspect the detailed information of an engine, including the layer parameters, the chosen tactics, the precision used, etc.
+- `ProfilingVerbosity` enums have been updated to show their functionality more explicitly.
+- Updated TensorRT OSS container defaults to CUDA 11.4.
+- Updated CMake to target C++14 builds.
+- Updated the following ONNX operators:
+  - `Gather` and `GatherElements` implementations to natively support negative indices
+  - `Pad` layer to support ND padding, along with `edge` and `reflect` padding mode support
+  - `If` layer with general performance improvements.
+
+### Removed
+- Removed `sampleMLP`.
+- Several trtexec flags have been deprecated:
+  - `--explicitBatch` has been deprecated and has no effect. When the input model is in UFF or Caffe prototxt format, the implicit batch dimension mode is used automatically; when the input model is in ONNX format, the explicit batch mode is used automatically.
+  - `--explicitPrecision` has been deprecated and has no effect. When the input ONNX model contains Quantization/Dequantization nodes, TensorRT automatically uses explicit precision mode.
+  - `--nvtxMode=[verbose|default|none]` has been deprecated in favor of `--profilingVerbosity=[detailed|layer_names_only|none]` to show its functionality more explicitly.
+
 ## [21.10](https://github.com/NVIDIA/TensorRT/releases/tag/21.10) - 2021-10-05
 ### Added
 - Benchmark script for demoBERT-Megatron
@@ -33,7 +72,6 @@
 - Mark BOOL tiles as unsupported
 - Remove unnecessary shape tensor checks

-
 ### Removed
 - N/A

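The changelog above introduces the `IEngineInspector` runtime class, the updated `ProfilingVerbosity` enums, and per-object loggers. As a rough illustration only (not taken from this commit), the sketch below shows how a serialized engine built with `ProfilingVerbosity::kDETAILED` might be inspected; the `engineBlob` buffer and the `StderrLogger` class are assumed helpers, and error handling is omitted.

```cpp
// Minimal sketch, assuming TensorRT 8.2 headers and a serialized engine that
// was built with ProfilingVerbosity::kDETAILED. `engineBlob` is an assumed
// input buffer; error handling is omitted for brevity.
#include <iostream>
#include <memory>
#include <vector>

#include "NvInfer.h"

// Per-object logger, since the changelog notes the global logger was eliminated.
class StderrLogger : public nvinfer1::ILogger
{
    void log(Severity severity, const char* msg) noexcept override
    {
        if (severity <= Severity::kWARNING)
        {
            std::cerr << msg << std::endl;
        }
    }
};

void inspectEngine(std::vector<char> const& engineBlob)
{
    StderrLogger logger;
    auto runtime = std::unique_ptr<nvinfer1::IRuntime>(nvinfer1::createInferRuntime(logger));
    auto engine = std::unique_ptr<nvinfer1::ICudaEngine>(
        runtime->deserializeCudaEngine(engineBlob.data(), engineBlob.size()));

    // IEngineInspector reports layer parameters, chosen tactics, precisions, etc.
    auto inspector = std::unique_ptr<nvinfer1::IEngineInspector>(engine->createEngineInspector());
    std::cout << inspector->getEngineInformation(nvinfer1::LayerInformationFormat::kJSON)
              << std::endl;
}
```

The level of detail reported depends on the `ProfilingVerbosity` the engine was built with; `kDETAILED` exposes the per-layer information described in the changelog entry.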
CMakeLists.txt

Lines changed: 3 additions & 1 deletion
@@ -58,9 +58,11 @@ option(BUILD_PLUGINS "Build TensorRT plugin" ON)
 option(BUILD_PARSERS "Build TensorRT parsers" ON)
 option(BUILD_SAMPLES "Build TensorRT samples" ON)

-set(CMAKE_CXX_STANDARD 11)
+# C++14
+set(CMAKE_CXX_STANDARD 14)
 set(CMAKE_CXX_STANDARD_REQUIRED ON)
 set(CMAKE_CXX_EXTENSIONS OFF)
+
 set(CMAKE_CXX_FLAGS "-Wno-deprecated-declarations ${CMAKE_CXX_FLAGS} -DBUILD_SYSTEM=cmake_oss")

 ############################################################################################

README.md

Lines changed: 23 additions & 24 deletions
@@ -15,12 +15,12 @@ This repository contains the Open Source Software (OSS) components of NVIDIA Ten
 To build the TensorRT-OSS components, you will first need the following software packages.

 **TensorRT GA build**
-* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.0.3.4
+* [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download) v8.2.0.6

 **System Packages**
 * [CUDA](https://developer.nvidia.com/cuda-toolkit)
   * Recommended versions:
-    * cuda-11.3.1 + cuDNN-8.2
+    * cuda-11.4.x + cuDNN-8.2
     * cuda-10.2 + cuDNN-8.2
 * [GNU make](https://ftp.gnu.org/gnu/make/) >= v4.1
 * [cmake](https://github.com/Kitware/CMake/releases) >= v3.13
@@ -34,16 +34,16 @@ To build the TensorRT-OSS components, you will first need the following software
 * [Docker](https://docs.docker.com/install/) >= 19.03
 * [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker)
 * Toolchains and SDKs
-  * (Cross compilation for Jetson platform) [NVIDIA JetPack](https://developer.nvidia.com/embedded/jetpack) >= 4.6 (July 2021)
+  * (Cross compilation for Jetson platform) [NVIDIA JetPack](https://developer.nvidia.com/embedded/jetpack) >= 4.6 (current support only for TensorRT 8.0.1)
   * (For Windows builds) [Visual Studio](https://visualstudio.microsoft.com/vs/older-downloads/) 2017 Community or Enterprise edition
   * (Cross compilation for QNX platform) [QNX Toolchain](https://blackberry.qnx.com/en)
 * PyPI packages (for demo applications/tests)
-  * [onnx](https://pypi.org/project/onnx/) 1.8.0
+  * [onnx](https://pypi.org/project/onnx/) 1.9.0
   * [onnxruntime](https://pypi.org/project/onnxruntime/) 1.8.0
-  * [tensorflow-gpu](https://pypi.org/project/tensorflow/) >= 2.4.1
-  * [Pillow](https://pypi.org/project/Pillow/) >= 8.1.2
-  * [pycuda](https://pypi.org/project/pycuda/) < 2020.1
-  * [numpy](https://pypi.org/project/numpy/) 1.21.0
+  * [tensorflow-gpu](https://pypi.org/project/tensorflow/) >= 2.5.1
+  * [Pillow](https://pypi.org/project/Pillow/) >= 8.3.2
+  * [pycuda](https://pypi.org/project/pycuda/) < 2021.1
+  * [numpy](https://pypi.org/project/numpy/)
   * [pytest](https://pypi.org/project/pytest/)
 * Code formatting tools (for contributors)
   * [Clang-format](https://clang.llvm.org/docs/ClangFormat.html)
@@ -66,27 +66,27 @@ To build the TensorRT-OSS components, you will first need the following software

 Else download and extract the TensorRT GA build from [NVIDIA Developer Zone](https://developer.nvidia.com/nvidia-tensorrt-download).

-**Example: Ubuntu 18.04 on x86-64 with cuda-11.3**
+**Example: Ubuntu 18.04 on x86-64 with cuda-11.4**

 ```bash
 cd ~/Downloads
-tar -xvzf TensorRT-8.0.3.4.Ubuntu-18.04.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
-export TRT_LIBPATH=`pwd`/TensorRT-8.0.3.4
+tar -xvzf TensorRT-8.2.0.6.Linux.x86_64-gnu.cuda-11.4.cudnn8.2.tar.gz
+export TRT_LIBPATH=`pwd`/TensorRT-8.2.0.6
 ```

-**Example: Windows on x86-64 with cuda-11.3**
+**Example: Windows on x86-64 with cuda-11.4**

 ```powershell
 cd ~\Downloads
-Expand-Archive .\TensorRT-8.0.3.4.Windows10.x86_64.cuda-11.3.cudnn8.2.zip
-$Env:TRT_LIBPATH = '$(Get-Location)\TensorRT-8.0.3.4'
+Expand-Archive .\TensorRT-8.2.0.6.Windows10.x86_64.cuda-11.4.cudnn8.2.zip
+$Env:TRT_LIBPATH = '$(Get-Location)\TensorRT-8.2.0.6'
 $Env:PATH += 'C:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\MSBuild\15.0\Bin\'
 ```

 3. #### (Optional - for Jetson builds only) Download the JetPack SDK
 1. Download and launch the JetPack SDK manager. Log in with your NVIDIA developer account.
-2. Select the platform and target OS (example: Jetson AGX Xavier, `Linux Jetpack 4.4`), and click Continue.
+2. Select the platform and target OS (example: Jetson AGX Xavier, `Linux Jetpack 4.6`), and click Continue.
 3. Under `Download & Install Options` change the download folder and select `Download now, Install later`. Agree to the license terms and click Continue.
 4. Move the extracted files into the `<TensorRT-OSS>/docker/jetpack_files` folder.

@@ -98,13 +98,13 @@ For Linux platforms, we recommend that you generate a docker container for build
 1. #### Generate the TensorRT-OSS build container.
 The TensorRT-OSS build container can be generated using the supplied Dockerfiles and build script. The build container is configured for building TensorRT OSS out-of-the-box.

-**Example: Ubuntu 18.04 on x86-64 with cuda-11.3**
+**Example: Ubuntu 18.04 on x86-64 with cuda-11.4.2 (default)**
 ```bash
-./docker/build.sh --file docker/ubuntu-18.04.Dockerfile --tag tensorrt-ubuntu18.04-cuda11.3 --cuda 11.3.1
+./docker/build.sh --file docker/ubuntu-18.04.Dockerfile --tag tensorrt-ubuntu18.04-cuda11.4
 ```
-**Example: CentOS/RedHat 8 on x86-64 with cuda-10.2**
+**Example: CentOS/RedHat 7 on x86-64 with cuda-10.2**
 ```bash
-./docker/build.sh --file docker/centos-8.Dockerfile --tag tensorrt-centos8-cuda10.2 --cuda 10.2
+./docker/build.sh --file docker/centos-7.Dockerfile --tag tensorrt-centos7-cuda10.2 --cuda 10.2
 ```
 **Example: Ubuntu 18.04 cross-compile for Jetson (aarch64) with cuda-10.2 (JetPack SDK)**
 ```bash
@@ -114,7 +114,7 @@ For Linux platforms, we recommend that you generate a docker container for build
 2. #### Launch the TensorRT-OSS build container.
 **Example: Ubuntu 18.04 build container**
 ```bash
-./docker/launch.sh --tag tensorrt-ubuntu18.04-cuda11.3 --gpus all
+./docker/launch.sh --tag tensorrt-ubuntu18.04-cuda11.4 --gpus all
 ```
 > NOTE:
 1. Use the `--tag` corresponding to the build container generated in Step 1.
@@ -125,7 +125,7 @@ For Linux platforms, we recommend that you generate a docker container for build
 ## Building TensorRT-OSS
 * Generate Makefiles or VS project (Windows) and build.

-**Example: Linux (x86-64) build with default cuda-11.3**
+**Example: Linux (x86-64) build with default cuda-11.4.2**
 ```bash
 cd $TRT_OSSPATH
 mkdir -p build && cd build
@@ -156,21 +156,20 @@ For Linux platforms, we recommend that you generate a docker container for build
 msbuild ALL_BUILD.vcxproj
 ```
 > NOTE:
-1. The default CUDA version used by CMake is 11.3.1. To override this, for example to 10.2, append `-DCUDA_VERSION=10.2` to the cmake command.
+1. The default CUDA version used by CMake is 11.4.2. To override this, for example to 10.2, append `-DCUDA_VERSION=10.2` to the cmake command.
 2. If samples fail to link on CentOS7, create this symbolic link: `ln -s $TRT_OUT_DIR/libnvinfer_plugin.so $TRT_OUT_DIR/libnvinfer_plugin.so.8`
 * Required CMake build arguments are:
   - `TRT_LIB_DIR`: Path to the TensorRT installation directory containing libraries.
   - `TRT_OUT_DIR`: Output directory where generated build artifacts will be copied.
 * Optional CMake build arguments:
   - `CMAKE_BUILD_TYPE`: Specify if binaries generated are for release or debug (contain debug symbols). Values consist of [`Release`] | `Debug`
-  - `CUDA_VERSION`: The version of CUDA to target, for example [`11.3.1`].
+  - `CUDA_VERSION`: The version of CUDA to target, for example [`11.4.2`].
   - `CUDNN_VERSION`: The version of cuDNN to target, for example [`8.2`].
   - `PROTOBUF_VERSION`: The version of Protobuf to use, for example [`3.0.0`]. Note: Changing this will not configure CMake to use a system version of Protobuf, it will configure CMake to download and try building that version.
   - `CMAKE_TOOLCHAIN_FILE`: The path to a toolchain file for cross compilation.
   - `BUILD_PARSERS`: Specify if the parsers should be built, for example [`ON`] | `OFF`. If turned OFF, CMake will try to find precompiled versions of the parser libraries to use in compiling samples. First in `${TRT_LIB_DIR}`, then on the system. If the build type is Debug, then it will prefer debug builds of the libraries before release versions if available.
   - `BUILD_PLUGINS`: Specify if the plugins should be built, for example [`ON`] | `OFF`. If turned OFF, CMake will try to find a precompiled version of the plugin library to use in compiling samples. First in `${TRT_LIB_DIR}`, then on the system. If the build type is Debug, then it will prefer debug builds of the libraries before release versions if available.
   - `BUILD_SAMPLES`: Specify if the samples should be built, for example [`ON`] | `OFF`.
-  - `CUB_VERSION`: The version of CUB to use, for example [`1.8.0`].
   - `GPU_ARCHS`: GPU (SM) architectures to target. By default we generate CUDA code for all major SMs. Specific SM versions can be specified here as a quoted space-separated list to reduce compilation time and binary size. Table of compute capabilities of NVIDIA GPUs can be found [here](https://developer.nvidia.com/cuda-gpus). Examples:
     - NVidia A100: `-DGPU_ARCHS="80"`
     - Tesla T4, GeForce RTX 2080: `-DGPU_ARCHS="75"`

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-8.0.3.4
+8.2.0.6

cmake/modules/set_ifndef.cmake

Lines changed: 0 additions & 1 deletion
@@ -13,7 +13,6 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-
 function (set_ifndef variable value)
   if(NOT DEFINED ${variable})
     set(${variable} ${value} PARENT_SCOPE)

cmake/toolchains/cmake_aarch64-android.toolchain

Lines changed: 2 additions & 6 deletions
@@ -20,8 +20,7 @@ set(CMAKE_SYSTEM_PROCESSOR aarch64)
 set(CMAKE_C_COMPILER $ENV{AARCH64_ANDROID_CC})
 set(CMAKE_CXX_COMPILER $ENV{AARCH64_ANDROID_CC})

-set(CMAKE_C_FLAGS "$ENV{AARCH64_ANDROID_CFLAGS} -pie -fPIE"
-    CACHE STRING "" FORCE)
+set(CMAKE_C_FLAGS "$ENV{AARCH64_ANDROID_CFLAGS} -pie -fPIE" CACHE STRING "" FORCE)
 set(CMAKE_CXX_FLAGS "${CMAKE_C_FLAGS}" CACHE STRING "" FORCE)

 set(CMAKE_C_COMPILER_TARGET aarch64-none-linux-android)
@@ -37,11 +36,8 @@ set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_CXX_COMPILER} CACHE STRING "" FORCE)
 set(CMAKE_CUDA_FLAGS "-I${CUDA_INCLUDE_DIRS} -Xcompiler=\"-fPIC ${CMAKE_CXX_FLAGS}\"" CACHE STRING "" FORCE)
 set(CMAKE_CUDA_COMPILER_FORCED TRUE)

-
 set(CUDA_LIBS -L${CUDA_ROOT}/lib64)

-set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcublas -lcudart -lnvToolsExt -lculibos -lcudadevrt -llog)
-
+set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcudart -lnvToolsExt -lculibos -lcudadevrt -llog)

-set(DISABLE_SWIG TRUE)
 set(TRT_PLATFORM_ID "aarch64-android")

cmake/toolchains/cmake_aarch64.toolchain

Lines changed: 15 additions & 10 deletions
@@ -16,22 +16,29 @@

 set(CMAKE_SYSTEM_NAME Linux)
 set(CMAKE_SYSTEM_PROCESSOR aarch64)
+
 set(TRT_PLATFORM_ID "aarch64")
-set(CUDA_PLATFORM_ID "aarch64-linux")

-set(CMAKE_C_COMPILER /usr/bin/aarch64-linux-gnu-gcc)
-set(CMAKE_CXX_COMPILER /usr/bin/aarch64-linux-gnu-g++)
+if("$ENV{ARMSERVER}" AND "${CUDA_VERSION}" VERSION_GREATER_EQUAL 11.0)
+  set(CUDA_PLATFORM_ID "sbsa-linux")
+else()
+  set(CUDA_PLATFORM_ID "aarch64-linux")
+endif()
+
+set(CMAKE_C_COMPILER $ENV{AARCH64_CC})
+set(CMAKE_CXX_COMPILER $ENV{AARCH64_CC})

-set(CMAKE_C_FLAGS "" CACHE STRING "" FORCE)
-set(CMAKE_CXX_FLAGS "" CACHE STRING "" FORCE)
+set(CMAKE_C_FLAGS "$ENV{AARCH64_CFLAGS}" CACHE STRING "" FORCE)
+set(CMAKE_CXX_FLAGS "$ENV{AARCH64_CFLAGS}" CACHE STRING "" FORCE)

-set(CMAKE_C_COMPILER_TARGET aarch64)
-set(CMAKE_CXX_COMPILER_TARGET aarch64)
+set(CMAKE_C_COMPILER_TARGET aarch64-linux-gnu)
+set(CMAKE_CXX_COMPILER_TARGET aarch64-linux-gnu)

 set(CMAKE_C_COMPILER_FORCED TRUE)
 set(CMAKE_CXX_COMPILER_FORCED TRUE)

 set(CUDA_ROOT /usr/local/cuda-${CUDA_VERSION}/targets/${CUDA_PLATFORM_ID} CACHE STRING "CUDA ROOT dir")
+
 set(CUDNN_ROOT_DIR /pdk_files/cudnn)
 set(BUILD_LIBRARY_ONLY 1)

@@ -46,6 +53,4 @@ set(CMAKE_CUDA_COMPILER_FORCED TRUE)

 set(CUDA_LIBS -L${CUDA_ROOT}/lib)

-set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcublas -lcudart -lstdc++ -lm)
-
-set(DISABLE_SWIG TRUE)
+set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcudart -lstdc++ -lm)

cmake/toolchains/cmake_ppc64le.toolchain

Lines changed: 3 additions & 2 deletions
@@ -19,9 +19,10 @@ set(CMAKE_SYSTEM_PROCESSOR ppc64le)

 set(CMAKE_C_COMPILER powerpc64le-linux-gnu-gcc)
 set(CMAKE_CXX_COMPILER powerpc64le-linux-gnu-g++)
+set(CMAKE_AR /usr/bin/ar CACHE STRING "" FORCE)

-set(CMAKE_C_COMPILER_TARGET ppc64le)
-set(CMAKE_CXX_COMPILER_TARGET ppc64le)
+set(CMAKE_C_COMPILER_TARGET powerpc64le-linux-gnu)
+set(CMAKE_CXX_COMPILER_TARGET powerpc64le-linux-gnu)

 set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_CXX_COMPILER} CACHE STRING "" FORCE)
 set(CMAKE_CUDA_FLAGS "-I${CUDA_ROOT}/include -Xcompiler=\"-fPIC ${CMAKE_CXX_FLAGS}\"" CACHE STRING "" FORCE)

cmake/toolchains/cmake_qnx.toolchain

Lines changed: 4 additions & 6 deletions
@@ -14,7 +14,7 @@
 # limitations under the License.
 #

-set(CMAKE_SYSTEM_NAME qnx)
+set(CMAKE_SYSTEM_NAME QNX)
 set(CMAKE_SYSTEM_PROCESSOR aarch64)

 if(DEFINED ENV{QNX_BASE})
@@ -39,8 +39,8 @@ message(STATUS "QNX_TARGET = ${QNX_TARGET}")
 set(CMAKE_C_COMPILER ${QNX_HOST}/usr/bin/aarch64-unknown-nto-qnx7.0.0-gcc)
 set(CMAKE_CXX_COMPILER ${QNX_HOST}/usr/bin/aarch64-unknown-nto-qnx7.0.0-g++)

-set(CMAKE_C_COMPILER_TARGET aarch64)
-set(CMAKE_CXX_COMPILER_TARGET aarch64)
+set(CMAKE_C_COMPILER_TARGET aarch64-unknown-nto-qnx)
+set(CMAKE_CXX_COMPILER_TARGET aarch64-unknown-nto-qnx)

 set(CMAKE_C_COMPILER_FORCED TRUE)
 set(CMAKE_CXX_COMPILER_FORCED TRUE)
@@ -54,8 +54,6 @@ set(CMAKE_CUDA_COMPILER_FORCED TRUE)

 set(CUDA_LIBS -L${CUDA_ROOT}/lib)

-set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcublas -lcudart)
-# Disable swig
-set(DISABLE_SWIG TRUE)
+set(ADDITIONAL_PLATFORM_LIB_FLAGS ${CUDA_LIBS} -lcudart)

 set(TRT_PLATFORM_ID "aarch64-qnx")

cmake/toolchains/cmake_x64_win.toolchain

Lines changed: 1 addition & 2 deletions
@@ -36,13 +36,12 @@ set(W10_LIBRARY_SUFFIXES .lib .dll)
 set(W10_CUDA_ROOT ${CUDA_TOOLKIT_ROOT_DIR})
 set(W10_LINKER ${MSVC_COMPILER_DIR}/bin/amd64/link)

-
 set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_NVCC_COMPILER} CACHE STRING "" FORCE)

 set(ADDITIONAL_PLATFORM_INCL_FLAGS "-I${MSVC_COMPILER_DIR}/include -I${MSVC_COMPILER_DIR}/../ucrt/include")
 set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${NV_TOOLS}/ddk/wddmv2/official/17134/Lib/10.0.17134.0/um/x64")
 set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${MSVC_COMPILER_DIR}/lib/amd64" )
 set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${MSVC_COMPILER_DIR}/../ucrt/lib/x64")
-set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${W10_CUDA_ROOT}/lib/x64 cudart.lib cublas.lib")
+set(ADDITIONAL_PLATFORM_LIB_FLAGS ${ADDITIONAL_PLATFORM_LIB_FLAGS} "-LIBPATH:${W10_CUDA_ROOT}/lib/x64 cudart.lib")

 set(TRT_PLATFORM_ID "win10")

samples/sampleMLP/CMakeLists.txt renamed to cmake/toolchains/cmake_x86_64_agnostic.toolchain

Lines changed: 12 additions & 5 deletions
@@ -13,10 +13,17 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 #
-set(SAMPLE_SOURCES
-    sampleMLP.cpp
-)

-set(SAMPLE_PARSERS "caffe")
+set(CMAKE_SYSTEM_NAME Linux)
+set(CMAKE_SYSTEM_PROCESSOR x86_64)

-include(../CMakeSamplesTemplate.txt)
+set(CMAKE_C_COMPILER /opt/rh/devtoolset-8/root/usr/bin/gcc)
+set(CMAKE_CXX_COMPILER /opt/rh/devtoolset-8/root/usr/bin/g++)
+
+if(DEFINED CUDA_ROOT)
+  set(CUDA_TOOLKIT_ROOT_DIR ${CUDA_ROOT})
+endif()
+
+set(CUDA_INCLUDE_DIRS ${CUDA_ROOT}/include)
+
+set(TRT_PLATFORM_ID "x86_64")

demo/HuggingFace/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+*.pyc
+__pycache__/

demo/HuggingFace/GPT2/.gitkeep

Whitespace-only changes.
