utility to dump details of all nodes in a cluster, into a csv file #652

Merged
merged 6 commits into from
May 10, 2025

Conversation

amitosaurus
Contributor

@amitosaurus amitosaurus commented Apr 25, 2025

Issue #, if available:

Description of changes:
Creating a 'tools' directory for utility scripts and adding a 'dump_cluster_nodes_info.py' utility (originally named 'list_cluster_nodes.py') that dumps details of all nodes in a cluster into a CSV file.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@amitosaurus amitosaurus marked this pull request as draft April 25, 2025 19:53
@amitosaurus amitosaurus marked this pull request as ready for review April 25, 2025 19:59
Collaborator

@KeitaW KeitaW left a comment

Thanks for submitting, but I don't see why we would want such a utility script.

 aws sagemaker list-cluster-nodes --region us-west-2 --cluster-name ml-cluster-trn1

should be enough. Could you elaborate on the motivation?

@amitosaurus
Contributor Author

The list-cluster-nodes command does not provide the primary IP of each node, which we have found to be essential when troubleshooting critical issues.
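
For illustration only (this is not the code from the PR): a minimal sketch of how the primary IP could be looked up for a single node with boto3's SageMaker `describe_cluster_node` call. The helper name `get_private_primary_ip` is hypothetical, and the sketch assumes the response carries the address in `NodeDetails.PrivatePrimaryIp`.

```python
def get_private_primary_ip(sagemaker_client, cluster_name, node_id):
    """Return the node's primary private IP, or None if the field is absent.

    `sagemaker_client` is assumed to be a boto3 SageMaker client, e.g.
    boto3.client("sagemaker", region_name="us-west-2").
    """
    response = sagemaker_client.describe_cluster_node(
        ClusterName=cluster_name,
        NodeId=node_id,
    )
    # Assumption: the IP is exposed as NodeDetails.PrivatePrimaryIp.
    return response.get("NodeDetails", {}).get("PrivatePrimaryIp")
```

The client is passed in rather than created inside the function so the lookup can be exercised with a stub when no AWS credentials are available.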

@amitosaurus amitosaurus requested a review from KeitaW April 28, 2025 16:51
@shimomut
Collaborator

@KeitaW
So this script is list_cluster_nodes() to list all nodes with pagination handling, plus describe_cluster_node() for each node.
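
The flow described above can be sketched roughly as follows. This is a hedged illustration, not the merged script: the function names, the `FIELDS` column selection, and the CSV layout are assumptions, and the client is expected to be a boto3 SageMaker client (or any object with the same two methods).

```python
import csv

# Hypothetical column selection; the merged script may export other fields.
FIELDS = ["InstanceGroupName", "InstanceId", "InstanceType", "Status", "PrivatePrimaryIp"]


def iter_node_ids(client, cluster_name):
    """Yield every node ID in the cluster, following NextToken across pages."""
    next_token = None
    while True:
        kwargs = {"ClusterName": cluster_name}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.list_cluster_nodes(**kwargs)
        for summary in page.get("ClusterNodeSummaries", []):
            yield summary["InstanceId"]
        next_token = page.get("NextToken")
        if not next_token:
            break


def dump_nodes_to_csv(client, cluster_name, out_file):
    """Describe each node and write one CSV row per node; return the row count."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for node_id in iter_node_ids(client, cluster_name):
        details = client.describe_cluster_node(
            ClusterName=cluster_name, NodeId=node_id
        )["NodeDetails"]
        writer.writerow({
            "InstanceGroupName": details.get("InstanceGroupName", ""),
            "InstanceId": details.get("InstanceId", node_id),
            "InstanceType": details.get("InstanceType", ""),
            "Status": details.get("InstanceStatus", {}).get("Status", ""),
            "PrivatePrimaryIp": details.get("PrivatePrimaryIp", ""),
        })
        count += 1
    return count
```

With real credentials, this could be called with `boto3.client("sagemaker", region_name="us-west-2")`, the cluster name `ml-cluster-trn1` from the CLI example earlier in the thread, and a file opened with `open("nodes.csv", "w", newline="")`.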

@shimomut
Collaborator

@amitosaurus, to make the intention of this script clearer for users, would it make sense to rename the script to something like "dump_cluster_nodes.py" or "list_cluster_nodes_in_detail.py"?

updated script name to better reflect its functionality
@amitosaurus
Contributor Author

Updated the script name to "dump_cluster_nodes_info.py" to better reflect its functionality.

@KeitaW
Collaborator

KeitaW commented May 1, 2025

Noted. Kindly add a README inside the tools directory. Thank you!

Adding README.md that provides guidelines for usage of utility script(s) in the "tools" folder
@amitosaurus
Contributor Author

README.md file added under the tools folder

@KeitaW KeitaW self-requested a review May 10, 2025 00:29
Collaborator

@KeitaW KeitaW left a comment

Looks good to me. Kindly update the README to avoid using a table (descriptions for future commands could be too long to fit into one), and then we can merge the PR.

amitosaurus and others added 2 commits May 10, 2025 07:47
@amitosaurus
Contributor Author

This makes sense. Updated README.md file based on the suggestion.

@KeitaW KeitaW merged commit 5c563b0 into aws-samples:main May 10, 2025