utility to dump details of all nodes in a cluster, into a csv file #652
Thanks for submitting, but I don't see why we need such a utility script.
aws sagemaker list-cluster-nodes --region us-west-2 --cluster-name ml-cluster-trn1
should be enough. Could you elaborate on the motivation?
The list-cluster-nodes command does not return the primary IP of each node, which we have found to be essential when troubleshooting critical issues.
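To illustrate the gap described above, here is a minimal sketch of how per-node details, including the primary IP, could be collected and written to CSV. It is not the script from this PR. It assumes a boto3 SageMaker client whose `list_cluster_nodes` and `describe_cluster_node` calls return `ClusterNodeSummaries` and `NodeDetails` (with `PrivatePrimaryIp`) respectively; the function names, the column selection, and the output file name are hypothetical.

```python
import csv

# Columns chosen for illustration only; the PR's script may emit different ones.
FIELDS = ["InstanceGroupName", "InstanceId", "InstanceType", "Status", "PrivatePrimaryIp"]


def iter_node_ids(client, cluster_name):
    """Yield every instance ID in the cluster, following NextToken pagination."""
    kwargs = {"ClusterName": cluster_name}
    while True:
        page = client.list_cluster_nodes(**kwargs)
        for summary in page.get("ClusterNodeSummaries", []):
            yield summary["InstanceId"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def dump_nodes_to_csv(client, cluster_name, out):
    """Write one CSV row per node to the file-like object `out`; return the row count."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    count = 0
    for node_id in iter_node_ids(client, cluster_name):
        # The per-node describe call is assumed to carry the primary IP
        # that the list call omits.
        details = client.describe_cluster_node(
            ClusterName=cluster_name, NodeId=node_id
        )["NodeDetails"]
        writer.writerow({
            "InstanceGroupName": details.get("InstanceGroupName", ""),
            "InstanceId": details.get("InstanceId", node_id),
            "InstanceType": details.get("InstanceType", ""),
            "Status": details.get("InstanceStatus", {}).get("Status", ""),
            "PrivatePrimaryIp": details.get("PrivatePrimaryIp", ""),
        })
        count += 1
    return count


# Example usage (requires boto3 and AWS credentials; file name is hypothetical):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   with open("nodes.csv", "w", newline="") as f:
#       dump_nodes_to_csv(client, "ml-cluster-trn1", f)
```

Passing the client in as an argument keeps the sketch testable without AWS access. It makes one describe call per node, which may be slow on large clusters.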
@KeitaW
@amitosaurus, to make the intention of this script clearer for users, does it make sense to rename the script to something like "dump_cluster_nodes.py" or "list_cluster_nodes_in_detail.py"?
Updated script name to better reflect its functionality
Updated the script name to "dump_cluster_nodes_info.py" to better reflect its functionality.
Noted. Kindly add README inside the |
Adding README.md that provides guidelines for usage of utility script(s) in the "tools" folder
README.md file added under the |
Looks good to me. Kindly update the README to avoid using a table (descriptions of future commands could be too long to fit), and then we can merge the PR.
Co-authored-by: Keita Watanabe <keitaw09@gmail.com>
This makes sense. Updated the README.md file based on the suggestion.
Issue #, if available:
Description of changes:
Creating a 'tools' directory for utility scripts, and adding a 'dump_cluster_nodes_info.py' utility (originally named 'list_cluster_nodes.py') to dump details of all nodes in a cluster into a CSV file.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.