|
| 1 | +# Base Image |
| 2 | +This code is intended to be used as the basis for derived transformers and Docker images. |
| 3 | + |
| 4 | +- The file named entrypoint.py is expected to be kept for all transformers. |
| 5 | + |
| 6 | +- For each environment (such as Clowder, TERRA REF, CyVerse) the transformer_class.py file is replaced. |
| 7 | + |
| 8 | +- For each transformer the transformer.py file is replaced. |
| 9 | + |
| 10 | +- Additionally, the entrypoint.py script can be called from a different script allowing pre- and post- processing (see [entrypoint.py](#entrypoint) below). |
| 11 | + |
| 12 | +It is expected that this arrangement will provide reusable code not only within a single environment, but across transformers in different environments as well. |
| 13 | + |
| 14 | +## Quick Start |
| 15 | +Create a new repository to hold the code specific to your environment or transformer. |
| 16 | + |
| 17 | +For a new environment: |
| 18 | +1. create a new transformer_class.py file specific for your environment |
| 19 | +2. create and fill in any methods and data needed to support transformers |
| 20 | +3. if using Docker images, create a new Dockerfile that uses the base_image Docker image as its starting point, add needed executables and libraries, and overwrite the existing transformer_class.py file in your new image |
| 21 | + |
| 22 | +For a new transformer: |
| 23 | +1. create a new transformer.py file specific for your transformer with the needed function signatures |
| 24 | +2. add the code to do your work |
| 25 | +3. if using Docker images, create a new dockerfile that uses the appropriate starting docker image, add needed executables and libraries, and overwrite the existing transformer.py file in your new image |
| 26 | + |
| 27 | +## Meet the Files |
| 28 | +- Dockerfile: contains the build instructions for a docker image |
| 29 | +- configuration.py: contains configuration information for transformers. Can be overridden by derived code as long as existing variables aren't lost |
| 30 | +- entrypoint.py: entrypoint for the transformers and docker images. More on this file below |
| 31 | +- transformer.py: stub of expected transformer interface. More on this file below as well |
| 32 | +- transformer_class.py: stub of class used to provide environment for code in transformer.py |
| 33 | + |
| 34 | +### configuration.py |
| 35 | +Unless documented here, the contents of this file are required by `entrypoint.py`. |
| 36 | +If you are replacing this file with your own version, be sure to keep existing code (and its associated comments). |
| 37 | + |
| 38 | +### entrypoint.py <a name="entrypoint" /> |
| 39 | +This file can be executed as an independent script, or called by other Python code. |
| 40 | +If calling into this script, the entry point is a function named `do_work`. |
| 41 | +The `do_work` function expects an instance of `argparse.ArgumentParser` as its first parameter. |
| 42 | +Additional named parameters can also be passed in as kwargs; these are then passed to the new instance of transformer_class.Transformer at initialization. |
| 43 | + |
| 44 | +Calling `do_work` returns a dict of the result. |
| 45 | +Briefly, the 'code' key of the return value indicates the result of the call, and the presence of an 'error' key indicates that an error occurred. |
| 46 | + |
| 47 | +To provide environmental context to a transformer, the transformer_class.py file can be replaced with something more meaningful. |
| 48 | +The transformer_class.py file in this repo defines a class that has methods that will be called by entrypoint.py if they're defined. |
| 49 | +The class methods are not required but can provide convenient hooks for customization. |
| 50 | +An instance of this class is passed to the transformer code in [transformer.py](#transformer). |
| 51 | + |
| 52 | +### transformer.py <a name="transformer" /> |
| 53 | +This is the file that performs all the work. |
| 54 | +It is expected that this file will be replaced with a meaningful one for particular transformers. |
| 55 | +The transformer.py file in this repo contains the functions that can be called by the main transformer script [entrypoint.py](#entrypoint). |
| 56 | +The only required function in this file is the `perform_process` function. |
| 57 | + |
| 58 | +### transformer_class.py <a name="transformer_class" /> |
| 59 | +This is the file that provides the environment for transformers. |
| 60 | +It is expected that for different environments, this file will be replaced with a meaningful one. |
| 61 | +For example, in the CyVerse environment this file could be replaced with one containing iRODS support for any files generated by the transformer. |
| 62 | + |
| 63 | +It is the responsibility of this class to appropriately handle any command line arguments for the transformer instance. |
| 64 | +The easiest way to achieve this is to store the parameters as part of the class instance. |
| 65 | + |
| 66 | +## Transformer Control Flow |
| 67 | +In this section we cover the flow of control for a transformer. |
| 68 | +We assume that this transformer is started by running the [entrypoint.py](#entrypoint) script. |
| 69 | + |
| 70 | +1. Initialization of Parameters: |
| 71 | +The first thing that happens is the initialization of an instance of `argparse.ArgumentParser` and the creation of a `transformer_class.Transformer` instance. |
| 72 | +The entrypoint.py script adds its parameters first, then the transformer_class.Transformer instance adds its own, and finally the transformer can add its own. |
| 73 | +The parse_args() method is called on the ArgumentParser instance and the resulting argument values are stored in memory. |
| 74 | + |
| 75 | +2. Loading of Metadata: |
| 76 | +One of the parameters required by entrypoint is the path to a JSON file containing metadata. |
| 77 | +After the parameters are parsed, the entire contents of the JSON file are loaded and stored in memory. |
| 78 | + |
| 79 | +3. Getting Parameters for transformer function calls: |
| 80 | +If the transformer_class.Transformer instance has a method named `get_transformer_params()` it is called with the command line arguments and the loaded metadata. |
| 81 | +The dictionary returned by get_transformer_params() is used to pass parameters to the functions defined in [transformer.py](#transformer). |
| 82 | +This allows the customization of parameters between an environment and a transformer. |
| 83 | +If get_transformer_params() is not defined by transformer_class.Transformer, no additional parameters are passed to the transformer functions. |
| 84 | + |
| 85 | +4. Check to Continue: |
| 86 | +If the transformer.py file has a function named `check_continue`, it is called with the transformer_class.Transformer instance and any parameters defined in the previous step. |
| 87 | +The return from the check_continue() function is used to determine if processing should continue. |
| 88 | +If the function is not defined, processing will continue automatically. |
| 89 | + |
| 90 | +5. Processing: |
| 91 | +The `perform_process` function in transformer.py is called with the transformer_class.Transformer instance and any parameters previously defined. |
| 92 | + |
| 93 | +6. Result Handling: |
| 94 | +The above steps may produce warnings, errors, or successful results. |
| 95 | +These results can be stored in a file, printed to standard output, and/or returned to the caller of `do_work`. |
| 96 | +In the default case that we're exploring here, the return value from do_work is ignored. |
| 97 | + |
| 98 | +## Defined Command Line Parameters |
| 99 | +The following command line parameters are defined for all transformers. |
| 100 | + |
| 101 | +* --debug, -d: (optional parameter) enable debug level logging messages |
| 102 | +* -h: (optional parameter) display help message (automatically defined by argparse) |
| 103 | +* --info, -i: (optional parameter) enable info level logging messages |
| 104 | +* --result: (optional parameter) how to handle the result of processing; one or more comma-separated strings of: all, file, print |
| 105 | +* --metadata: mandatory path to file containing JSON metadata |
| 106 | +* --working_space: path to folder to use as a working space and file store |
| 107 | +* the "file_list" argument contains all additional parameters (which are assumed to be file names but may not be) |
| 108 | + |
| 109 | +*Pro Tip* - Use the `-h` parameter against the script or docker container to see all the command line options for a transformer. |
| 110 | + |
| 111 | +## Conventions |
| 112 | +**error return code ranges**: |
| 113 | +- [entrypoint.py](#entrypoint) returns error values in the range of `-1` to `-99` |
| 114 | +- [transformer_class.py](#transformer_class) returns error values in the range of `-100` to `-999` |
| 115 | +- [transformer.py](#transformer) returns error values of `-1000` and beyond (that is, `-1000` or lower) |
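
The error code ranges above can be used to tell which file reported a failure. The following is a minimal sketch, not code from this repository: the helper names `error_source` and `describe_result` are hypothetical, and it assumes that non-negative codes are not errors, that the transformer.py range means `-1000` or lower, and that the result dict follows the `code` and `error` key convention described for `do_work`.

```python
def error_source(code: int) -> str:
    """Return the file whose error range contains the given return code."""
    if code >= 0:
        # Assumption: non-negative codes are not errors
        return "none"
    if -99 <= code <= -1:
        return "entrypoint.py"
    if -999 <= code <= -100:
        return "transformer_class.py"
    # Assumption: everything at -1000 or lower belongs to transformer.py
    return "transformer.py"


def describe_result(result: dict) -> str:
    """Summarize a result dict shaped like the one returned by do_work."""
    if "error" not in result:
        # The presence of an 'error' key is what signals a failure
        return "success"
    code = result.get("code")
    source = error_source(code) if isinstance(code, int) else "unknown source"
    return f"error from {source}: {result['error']}"
```

A caller that wraps `do_work` for post-processing could pass the returned dict to `describe_result` to log where a failure originated.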