
Support remote inference on Triton Inference Server with ease of use #536


Merged
12 commits merged into main on Apr 21, 2025

Conversation

MMelQin (Collaborator) commented on Apr 19, 2025

This pull request adds support for performing inference on a remote Triton Inference Server, with ease of use:

  • Updated the TritonModel class to fully support and encapsulate the Triton client for the specific model hosted on the Triton Inference Server. This is done by parsing the key elements of the Triton model's config.pbtxt file (the actual model files are not required) and dynamically setting the input and output parameters from the parsed model configuration, so the user app is relieved from setting any arguments except the server network location (and, in future versions, the TLS keys for secure communication). A minimal sketch of this approach is shown after this list.
  • Enhanced the NamedModel class to support a Triton model repository (which may contain multiple folders, each for a specific model), making it possible to support hybrid inference scenarios, i.e. locally in-process hosted models as well as Triton-hosted models, all transparent to the user app.
  • Updated the app context class to support setting the Triton server network location, as well as calling the instantiated TritonModel to connect to the remote server.
  • Added and tested an example app showcasing remote inference on the Triton Inference Server.
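
For reference, the following is a minimal sketch of the parse-then-infer flow described above; it is not the actual TritonModel implementation in this PR. The function names (`load_model_config`, `remote_infer`), the single-input/single-output assumption, and the server address are illustrative; only the public tritonclient and protobuf APIs are used.

```python
# Minimal sketch (not the TritonModel code in this PR): parse the key elements
# of a model's config.pbtxt on the client side, then use the parsed names and
# data types to drive a Triton client, so the app itself only needs the server
# network location.
import numpy as np
import tritonclient.http as httpclient
from google.protobuf import text_format
from tritonclient.grpc import model_config_pb2


def load_model_config(config_path: str) -> model_config_pb2.ModelConfig:
    """Parse config.pbtxt; the actual model weight files are not needed on the client."""
    config = model_config_pb2.ModelConfig()
    with open(config_path) as f:
        text_format.Parse(f.read(), config)
    return config


def remote_infer(netloc: str, config: model_config_pb2.ModelConfig, data: np.ndarray) -> np.ndarray:
    """Run inference against the Triton server at `netloc`, e.g. "localhost:8000"."""
    client = httpclient.InferenceServerClient(url=netloc)

    # Input/output names and data types come from the parsed config, so the
    # calling app does not hard-code any of them (single input/output assumed).
    inp_cfg, out_cfg = config.input[0], config.output[0]
    # config.pbtxt uses TYPE_FP32 etc.; the client API expects "FP32" etc.
    dtype = model_config_pb2.DataType.Name(inp_cfg.data_type).replace("TYPE_", "")

    infer_input = httpclient.InferInput(inp_cfg.name, list(data.shape), dtype)
    infer_input.set_data_from_numpy(data)  # data dtype must match `dtype`
    requested_output = httpclient.InferRequestedOutput(out_cfg.name)

    response = client.infer(config.name, inputs=[infer_input], outputs=[requested_output])
    return response.as_numpy(out_cfg.name)
```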

Also added the following files to the example application folder:

  • An example client-side Triton model repo, i.e. containing only the model folder and its config.pbtxt file (an illustrative layout is shown after this list).
  • An example shell script to set the required environment variables; if these are not set, the command-line options can be used instead.
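
For illustration only (the model name, tensor names, and dims below are hypothetical, not the exact files added in this PR), a client-side Triton model repository needs just the model folder and its config.pbtxt; the model weights stay on the server:

```
models/                      # client-side Triton model repository
└── spleen_ct_seg/           # hypothetical model name
    └── config.pbtxt         # only the config is needed; no model.pt on the client
```

A minimal config.pbtxt for a TorchScript model might look like:

```
name: "spleen_ct_seg"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 1, 1, 96, 96, 96 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1, 2, 96, 96, 96 ]
  }
]
```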

The Quality Gate failure is due to duplicated code in the example app, which is expected: it uses exactly the same app inference operator as the Spleen Seg example app with in-process Torch model hosting. Each example app folder contains all the code it needs for ease of use, so the few hundred lines of duplication have to be tolerated.

MMelQin added 3 commits on April 18, 2025:
  • Adding requirements for Triton client impl
  • Updated/added core classes to support Triton remote inference, and added a new example
  • GitHub build server complains about conflicts for tritonclient[]>=2.54 for no specific reasons
(each commit signed off by M Q <mingmelvinq@nvidia.com>)
MMelQin added 8 commits on April 19, 2025:
  • Fix flake8 complaints
  • Fix pytype complaints by simplifying code
  • Remove now unused imports
  • Addressed all pytype and mypy complaint in new code in the dev env
  • No complaint in local dev env, but on GitHub
  • Add model confgi.pbtxt and example env settings
  • Doc update
  • update license dates
(each commit signed off by M Q <mingmelvinq@nvidia.com>)
MMelQin changed the title from "[WIP] Support remote inference on Triton Inference Server with ease of use" to "Support remote inference on Triton Inference Server with ease of use" on Apr 21, 2025
MMelQin requested a review from mocsharp on Apr 21, 2025
mocsharp (Collaborator) left a comment:

LGTM. Thank you.

MMelQin requested a review from rahul-imaging on Apr 21, 2025
MMelQin added 1 commit: Updated the copyright year of new files (signed off by M Q <mingmelvinq@nvidia.com>)
Quality Gate failed

Failed conditions
10.2% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

MMelQin merged commit 0844f17 into main on Apr 21, 2025, with 3 of 4 checks passed.
MMelQin added a commit that referenced this pull request on Apr 22, 2025: Support remote inference on Triton Inference Server with ease of use (#536)

* Adding requirements for Triton client impl

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Updated/added core classes to support Triton remote inference, and added a new example

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* GitHub build server complains about conflicts for tritonclient[]>=2.54 for no specific reasons

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Fix flake8 complaints

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Fix pytype complaints by simplifying code

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Remove now unused imports

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Addressed all pytype and mypy complaint in new code in the dev env

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* No complaint in local dev env, but on GitHub

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Add model confgi.pbtxt and example env settings

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Doc update

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* update license dates

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Updated the copyright year of new files

Signed-off-by: M Q <mingmelvinq@nvidia.com>

---------

Signed-off-by: M Q <mingmelvinq@nvidia.com>