
Documentation

This repository assists users in preparing training datasets, training models, and performing inference with trained models. We cover several use cases, including pixel-wise segmentation, image classification, image enhancement, and machine-based reading order detection. For each use case, we provide guidance on how to generate the corresponding training dataset. All of these use cases are utilized in the Eynollah workflow. In short, the following three tasks can be accomplished with this repository:

  • Generate training dataset
  • Train a model
  • Inference with the trained model

Generate training dataset

The script generate_gt_for_training.py is used to generate training datasets. As the output of the following command shows, the dataset generator provides three different commands:

python generate_gt_for_training.py --help

These three commands are:

  • image-enhancement
  • machine-based-reading-order
  • pagexml2label

image-enhancement

Generating a training dataset for image enhancement is quite straightforward. All that is needed is a set of high-resolution images. The training dataset can then be generated using the following command:

python generate_gt_for_training.py image-enhancement -dis "dir of high resolution images" -dois "dir where degraded images will be written" -dols "dir where the corresponding high resolution image will be written as label" -scs "degrading scales json file"

The scales JSON file is a dictionary with a single key, 'scales', whose value is a list of scale factors smaller than 1. Each image is downscaled by these factors and then upscaled back to its original size, so it loses resolution to a different degree for each scale. The degraded images are used as input images, and the original high-resolution images serve as labels. The enhancement model can then be trained on this generated dataset. The scales JSON file looks like this:

{ "scales": [0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9] }

machine-based-reading-order

For machine-based reading order, the goal is to determine the reading priority between two text regions. The model's input is a three-channel image: the first and last channels contain information about each of the two text regions, while the middle channel encodes prominent layout elements relevant to reading order, such as separators and headers. To generate the training dataset, our script requires a PAGE-XML file that specifies the image layout with the correct reading order.

For output images, it is necessary to specify the width and height. Additionally, a minimum text region size can be set to filter out regions smaller than this minimum size. This minimum size is defined as the ratio of the text region area to the image area, with a default value of zero. To run the dataset generator, use the following command:

python generate_gt_for_training.py machine-based-reading-order -dx "dir of GT xml files" -domi "dir where output images will be written" -docl "dir where the labels will be written" -ih "height" -iw "width" -min "min area ratio"
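To make the encoding described above concrete, here is a hedged numpy sketch of what such a three-channel input could look like; the masks and coordinates are hypothetical and do not come from the script itself:

import numpy as np

height, width = 448, 448  # example output size, as passed via -ih / -iw

region_a = np.zeros((height, width), dtype=np.uint8)  # first text region mask
region_b = np.zeros((height, width), dtype=np.uint8)  # second text region mask
layout = np.zeros((height, width), dtype=np.uint8)    # separators, headers, ...

region_a[50:150, 30:400] = 1   # hypothetical region above the separator
region_b[200:300, 30:400] = 1  # hypothetical region below the separator
layout[160:170, 0:width] = 1   # e.g. a horizontal separator between them

# First and last channels hold the two text regions; the middle channel
# holds the prominent layout elements. The label encodes which region
# should be read first.
model_input = np.stack([region_a, layout, region_b], axis=-1)
print(model_input.shape)  # (448, 448, 3)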

pagexml2label

pagexml2label is specialized for all pixel-wise segmentation tasks, such as layout, textline, or page detection. To train a pixel-wise segmentation model, we require images along with their corresponding labels. Our training script expects a PNG image where each pixel corresponds to a label, represented by an integer. The background is always labeled as zero, while other elements are assigned different integers. For instance, if we have ground truth data with four elements including the background, the classes would be labeled as 0, 1, 2, and 3 respectively.

In binary segmentation scenarios such as textline or page extraction, the background is encoded as 0, and the desired element is automatically encoded as 1 in the PNG label.
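A quick way to sanity-check a generated label PNG, assuming Pillow and numpy are available (the file name is a placeholder):

import numpy as np
from PIL import Image

label = np.array(Image.open("label.png"))
# For a four-class layout ground truth this should print [0 1 2 3];
# for binary tasks such as textline or page extraction, [0 1].
print(np.unique(label))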

To specify the desired use case and the elements to be extracted in the PNG labels, a custom JSON file can be passed. For example, in the case of 'textline' detection, the JSON file would resemble this:

{ "use_case": "textline" }

For layout segmentation, a possible custom config JSON file looks like this:

{ "use_case": "layout", "textregions":{"rest_as_paragraph":1 , "drop-capital": 1, "header":2, "heading":2, "marginalia":3}, "imageregion":4, "separatorregion":5, "graphicregions" :{"rest_as_decoration":6} }
