This code is intended to be used as the basis for derived transformers and Docker images.
- The file named entrypoint.py is expected to be kept for all transformers.
- For each environment (such as Clowder, TERRA REF, or CyVerse) the transformer_class.py file is replaced.
- For each transformer the transformer.py file is replaced.
- Additionally, the entrypoint.py script can be called from a different script, allowing pre- and post-processing (see entrypoint.py below).
It is expected that this arrangement will provide reusable code not only within a single environment, but across transformers in different environments as well.
Create a new repository to hold the code specific to your environment or transformer.
For a new environment:
- create a new transformer_class.py file specific to your environment
- fill in and create any methods and data necessary to support transformers
- if using Docker images, create a new Dockerfile that uses the base_image Docker image as its starting point, add needed executables and libraries, and overwrite the existing transformer_class.py file in your new image
For a new transformer:
- create a new transformer.py file specific to your transformer with the needed function signatures
- add the code to do your work
- if using Docker images, create a new Dockerfile that uses the appropriate starting Docker image, add needed executables and libraries, and overwrite the existing transformer.py file in your new image
- Dockerfile: contains the build instructions for a Docker image
- configuration.py: contains configuration information for transformers. Can be overridden by derived code as long as existing variables aren't lost
- entrypoint.py: entrypoint for the transformers and docker images. More on this file below
- transformer.py: stub of expected transformer interface. More on this file below as well
- transformer_class.py: stub of class used to provide environment for code in transformer.py
Unless documented here, the contents of this file are required by entrypoint.py.
If you are replacing this file with your own version, be sure to keep existing code (and its associated comments).
This file can be executed as an independent script, or called by other Python code.
If calling into this script, the entry point is a function named do_work.
The do_work function expects an instance of argparse.ArgumentParser passed in as its first parameter.
Additional named parameters can also be passed in as kwargs; these are then passed to the new instance of transformer_class.Transformer at initialization.
Calling do_work returns a dict describing the result.
Briefly, the 'code' key of the return value indicates the result of the call, and the presence of an 'error' key indicates an error occurred.
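For example, a wrapper script that adds pre- and post-processing might look like the following sketch. Only the ArgumentParser parameter and the 'code'/'error' keys are described above; the extra keyword argument shown is purely illustrative.

```python
# Sketch of calling do_work() from a wrapper script to allow pre- and
# post-processing. Any extra keyword arguments (an illustrative one is shown)
# are passed to the transformer_class.Transformer instance at initialization.
import argparse

import entrypoint


def main() -> None:
    parser = argparse.ArgumentParser(description='wrapper around a transformer')

    # ... pre-processing happens here ...

    result = entrypoint.do_work(parser, example_setting='illustrative value')

    # ... post-processing happens here ...
    if 'error' in result:
        print("Transformer returned code %s: %s" % (result.get('code'), result['error']))


if __name__ == "__main__":
    main()
```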
To provide environmental context to a transformer, the transformer_class.py file can be replaced with something more meaningful. The transformer_class.py file in this repo defines a class with methods that will be called by entrypoint.py if they're defined. The class methods are not required, but can provide convenient hooks for customization. An instance of this class is passed to the transformer code in transformer.py.
This is the file that performs all the work.
It is expected that this file will be replaced with a meaningful one for particular transformers.
The transformer.py file in this repo contains the functions that can be called by the main transformer script entrypoint.py.
The only required function in this file is the perform_process function.
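As a point of reference, here is a minimal transformer.py sketch. The perform_process() and check_continue() names come from this README; the use of **kwargs, the return shapes, and the comments about conventions are assumptions that depend on what your environment's get_transformer_params() returns.

```python
# Illustrative transformer.py sketch; not the stub shipped in this repo.
# The keyword parameters must match the keys of the dictionary returned by
# transformer_class.Transformer.get_transformer_params() (shown as **kwargs).

def check_continue(transformer, **kwargs):
    """Optional: determines whether processing should continue"""
    # The exact return convention is defined by entrypoint.py; a (code, message)
    # tuple is shown here as an assumption
    return 0, "processing can continue"


def perform_process(transformer, **kwargs) -> dict:
    """Required: performs the work of the transformer"""
    # ... the transformer-specific work goes here ...
    # A dictionary mirroring the do_work() result described above is assumed:
    # a 'code' of 0 for success, or a negative 'code' (-1000 and beyond) plus
    # an 'error' key when something goes wrong
    return {'code': 0}
```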
This is the file that provides the environment for transformers. It is expected that for different environments, this file will be replaced with a meaningful one. For example, in the CyVerse environment this file could be replaced with one containing iRODS support for any files generated by the transformer.
It is the responsibility of this class to appropriately handle any command line arguments for the transformer instance. The easiest way to achieve this is to store the parameters as part of the class instance.
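For illustration, here is a minimal sketch of what a replacement transformer_class.py could look like. The get_transformer_params() and retrieve_files() method names come from the flow described below; the add_parameters() name, the return values, and the dictionary keys shown are assumptions rather than part of a required interface.

```python
# Illustrative sketch of a transformer_class.py; not the stub shipped in this
# repo. Only the hooks documented in this README are meaningful here; the
# add_parameters() name, return values, and dictionary keys are assumptions.
import argparse


class Transformer:
    """Provides the environment for the code in transformer.py"""

    def __init__(self, **kwargs):
        # Extra keyword arguments passed to do_work() are forwarded here
        self.kwargs = kwargs
        self.args = None

    def add_parameters(self, parser: argparse.ArgumentParser) -> None:
        """Hypothetical hook for adding environment-specific command line arguments"""
        parser.add_argument('--example_flag', action='store_true',
                            help='illustrative environment-specific flag')

    def get_transformer_params(self, args: argparse.Namespace, metadata) -> dict:
        """Optional hook: builds the keyword parameters passed to transformer.py functions"""
        # Storing the parsed arguments on the instance is the easiest way to
        # handle command line arguments for this transformer instance
        self.args = args
        # The keys of this dictionary become the parameter names of the
        # transformer.py function calls (illustrative keys shown)
        return {'working_folder': args.working_space, 'full_md': metadata}

    def retrieve_files(self, transformer_params: dict, metadata) -> tuple:
        """Optional hook: downloads any data needed before processing begins"""
        # The return convention is defined by entrypoint.py; a (code, message)
        # tuple is shown here as an assumption
        return 0, "no files need to be retrieved"
```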
In this section we cover the flow of control for a transformer. We assume that the transformer is started by running the entrypoint.py script. A simplified sketch of this flow follows the numbered steps below.
1. Initialization of Parameters: The first thing that happens is the initialization of an instance of argparse.ArgumentParser and the creation of a transformer_class.Transformer instance. The entrypoint.py script adds its parameters, followed by the transformer_class.Transformer instance, and finally the transformer can add its own. The parse_args() method is then called on the ArgumentParser instance and the resulting argument values are stored in memory.
2. Loading of Metadata: One of the parameters required by entrypoint is the path to a JSON file containing metadata. After the parameters are parsed, the entire contents of the JSON file are loaded and stored in memory.
3. Getting Parameters for transformer function calls: If the transformer_class.Transformer instance has a method named get_transformer_params(), it is called with the command line arguments and the loaded metadata. The dictionary returned by get_transformer_params() is used to pass parameters to the functions defined in transformer.py. This allows the customization of parameters between an environment and a transformer. If get_transformer_params() is not defined by transformer_class.Transformer, no additional parameters are passed to the transformer functions.
4. Check to Continue: If the transformer.py file has a function named check_continue(), it is called with the transformer_class.Transformer instance and any parameters defined in the above step. The return value from check_continue() is used to determine whether processing should continue. If the function is not defined, processing continues automatically.
5. Retrieve Files: If the transformer_class.Transformer instance has a method named retrieve_files(), it is called with the dictionary returned by transformer_class.Transformer.get_transformer_params() (see step 3) and the loaded metadata. This allows data to be downloaded once the transformer has determined it can proceed (see step 4). If this method is not defined, processing continues automatically.
6. Processing: The perform_process() function in transformer.py is called with the transformer_class.Transformer instance and any parameters previously defined (see step 3). This performs the actual processing of the data. It's important to note that the dictionary returned in step 3 is used to populate the parameter list of the perform_process() call.
7. Result Handling: The result of the above steps may produce warnings, errors, or successful results. These results can be stored in a file, printed to standard output, and/or returned to the caller of do_work. In the default case that we're exploring here, the return value from do_work is ignored.
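The following condensed sketch ties the steps above together. It is illustrative only and is not the actual entrypoint.py implementation; the hasattr() checks, the interpretation of check_continue()'s return value, the handling of a repeated --metadata argument, and the omission of parameter registration are all assumptions based on the description above.

```python
# Condensed, illustrative view of the flow described above; not the actual
# entrypoint.py code. Hook names match this README; details are assumptions.
import argparse
import json

import transformer
import transformer_class


def simplified_do_work(parser: argparse.ArgumentParser, **kwargs) -> dict:
    # Step 1: create the Transformer instance and parse the command line
    # (registration of entrypoint, environment, and transformer parameters is omitted)
    transformer_instance = transformer_class.Transformer(**kwargs)
    args = parser.parse_args()

    # Step 2: load the metadata from the JSON file(s) named on the command line
    metadata = []
    for metadata_path in args.metadata:
        with open(metadata_path, 'r', encoding='utf-8') as in_file:
            metadata.append(json.load(in_file))

    # Step 3: optionally build the parameters passed to the transformer.py functions
    params = {}
    if hasattr(transformer_instance, 'get_transformer_params'):
        params = transformer_instance.get_transformer_params(args, metadata)

    # Step 4: optionally ask the transformer whether processing should continue
    if hasattr(transformer, 'check_continue'):
        continue_result = transformer.check_continue(transformer_instance, **params)
        # Treating a negative code as "do not continue" is an assumption
        code = continue_result[0] if isinstance(continue_result, tuple) else continue_result
        if code < 0:
            return {'code': code, 'error': 'check_continue indicated processing should stop'}

    # Step 5: optionally retrieve any files needed for processing
    if hasattr(transformer_instance, 'retrieve_files'):
        transformer_instance.retrieve_files(params, metadata)

    # Step 6: perform the actual processing
    result = transformer.perform_process(transformer_instance, **params)

    # Step 7: result handling (write to file, print, and/or return) happens here
    return result
```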
The following command line parameters are defined for all transformers; an illustrative argparse sketch follows the list.
- --debug, -d: (optional parameter) enable debug level logging messages
- -h: (optional parameter) display help message (automatically defined by argparse)
- --info, -i: (optional parameter) enable info level logging messages
- --result: (optional parameter) how to handle the result of processing; one or more comma-separated strings of: all, file, print
- --metadata: mandatory path to file containing JSON metadata; can be specified multiple times
- --working_space: path to folder to use as a working space and file store
- the "file_list" argument contains all additional parameters (which are assumed to be file names but may not be)
Pro Tip - Use the -h parameter against the script or Docker container to see all the command line options for a transformer.
Error return code ranges:
- entrypoint.py returns error values in the range of -1 to -99
- transformer_class.py returns error values in the range of -100 to -999
- transformer.py returns error values in the range of -1000 and beyond