
Support remote inference on Triton Inference Server with ease of use #536


Merged
12 commits merged into main on Apr 21, 2025

Conversation

MMelQin (Collaborator) commented on Apr 19, 2025

This pull request adds support for performing inference on a remote Triton Inference Server, with ease of use:

  • Updated the TritonModel class to fully support and encapsulate the Triton client for the specific model hosted on the Triton Inference Server. This is done by parsing the key elements of the Triton model's config.pbtxt file (the actual model files are not required) and dynamically setting the input and output parameters from the parsed model configuration, so the user app is relieved from setting any arguments except the server network location (and, in future versions, the TLS keys for secure communication). A minimal sketch of this approach is shown after this list.
  • Enhanced the NamedModel class to support a Triton model repository (which may contain multiple folders, each for a specific model), making it possible to support hybrid inference scenarios, i.e. locally in-process hosted models as well as Triton-hosted models, all transparent to the user app.
  • Updated the app context class to support setting the Triton server network location, as well as calling the instantiated TritonModel to connect to the remote server.
  • Added and tested an example app showcasing remote inference on the Triton Inference Server.
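
For reference, the following is a minimal sketch of the parse-then-infer flow described above; it is not the actual TritonModel implementation in this PR. The function names (`load_model_config`, `remote_infer`), the single-input/single-output assumption, and the server address are illustrative; only the public tritonclient and protobuf APIs are used.

```python
# Minimal sketch (not the TritonModel code in this PR): parse the key elements
# of a model's config.pbtxt on the client side, then use the parsed names and
# data types to drive a Triton client, so the app itself only needs the server
# network location.
import numpy as np
import tritonclient.http as httpclient
from google.protobuf import text_format
from tritonclient.grpc import model_config_pb2


def load_model_config(config_path: str) -> model_config_pb2.ModelConfig:
    """Parse config.pbtxt; the actual model weight files are not needed on the client."""
    config = model_config_pb2.ModelConfig()
    with open(config_path) as f:
        text_format.Parse(f.read(), config)
    return config


def remote_infer(netloc: str, config: model_config_pb2.ModelConfig, data: np.ndarray) -> np.ndarray:
    """Run inference against the Triton server at `netloc`, e.g. "localhost:8000"."""
    client = httpclient.InferenceServerClient(url=netloc)

    # Input/output names and data types come from the parsed config, so the
    # calling app does not hard-code any of them (single input/output assumed).
    inp_cfg, out_cfg = config.input[0], config.output[0]
    # config.pbtxt uses TYPE_FP32 etc.; the client API expects "FP32" etc.
    dtype = model_config_pb2.DataType.Name(inp_cfg.data_type).replace("TYPE_", "")

    infer_input = httpclient.InferInput(inp_cfg.name, list(data.shape), dtype)
    infer_input.set_data_from_numpy(data)  # data dtype must match `dtype`
    requested_output = httpclient.InferRequestedOutput(out_cfg.name)

    response = client.infer(config.name, inputs=[infer_input], outputs=[requested_output])
    return response.as_numpy(out_cfg.name)
```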

Also added the following files to the example application folder:

  • An example client-side Triton model repo, i.e. containing only the model folder and its config.pbtxt file (an illustrative layout is shown after this list).
  • An example shell script to set the required environment variables; if these are not set, the command-line options can be used instead.
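
For illustration only (the model name, tensor names, and dims below are hypothetical, not the exact files added in this PR), a client-side Triton model repository needs just the model folder and its config.pbtxt; the model weights stay on the server:

```
models/                      # client-side Triton model repository
└── spleen_ct_seg/           # hypothetical model name
    └── config.pbtxt         # only the config is needed; no model.pt on the client
```

A minimal config.pbtxt for a TorchScript model might look like:

```
name: "spleen_ct_seg"
platform: "pytorch_libtorch"
max_batch_size: 0
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 1, 1, 96, 96, 96 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1, 2, 96, 96, 96 ]
  }
]
```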

The Quality Gate failure is due to duplicated code in the example app, which is expected: it uses exactly the same app inference operator as the Spleen Seg example app with in-process Torch model hosting. Each example app folder contains all the code it needs for ease of use, so the few hundred lines of duplication have to be tolerated.

MMelQin added 3 commits on April 18, 2025:
  • Adding requirements for Triton client impl
  • Updated/added core classes to support Triton remote inference, and added a new example
  • GitHub build server complains about conflicts for tritonclient[]>=2.54 for no specific reasons
(each commit signed off by M Q <mingmelvinq@nvidia.com>)
MMelQin added 8 commits on April 19, 2025:
  • Fix flake8 complaints
  • Fix pytype complaints by simplifying code
  • Remove now unused imports
  • Addressed all pytype and mypy complaint in new code in the dev env
  • No complaint in local dev env, but on GitHub
  • Add model confgi.pbtxt and example env settings
  • Doc update
  • update license dates
(each commit signed off by M Q <mingmelvinq@nvidia.com>)
MMelQin changed the title from "[WIP] Support remote inference on Triton Inference Server with ease of use" to "Support remote inference on Triton Inference Server with ease of use" on Apr 21, 2025
MMelQin requested a review from mocsharp on Apr 21, 2025
mocsharp (Collaborator) left a comment:

LGTM. Thank you.

MMelQin requested a review from rahul-imaging on Apr 21, 2025
MMelQin added 1 commit: Updated the copyright year of new files (signed off by M Q <mingmelvinq@nvidia.com>)
Quality Gate failed

Failed conditions
10.2% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

MMelQin merged commit 0844f17 into main on Apr 21, 2025, with 3 of 4 checks passed.
MMelQin added a commit that referenced this pull request on Apr 22, 2025: Support remote inference on Triton Inference Server with ease of use (#536)

* Adding requirements for Triton client impl

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Updated/added core classes to support Triton remote inference, and added a new example

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* GitHub build server complains about conflicts for tritonclient[]>=2.54 for no specific reasons

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Fix flake8 complaints

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Fix pytype complaints by simplifying code

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Remove now unused imports

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Addressed all pytype and mypy complaint in new code in the dev env

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* No complaint in local dev env, but on GitHub

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Add model confgi.pbtxt and example env settings

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Doc update

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* update license dates

Signed-off-by: M Q <mingmelvinq@nvidia.com>

* Updated the copyright year of new files

Signed-off-by: M Q <mingmelvinq@nvidia.com>

---------

Signed-off-by: M Q <mingmelvinq@nvidia.com>