This directory contains the test suite for the METS XML Synthesis Pipeline. The tests are designed to verify the functionality of the pipeline components and ensure that they work together correctly. The test suite provides comprehensive coverage of the pipeline's functionality, from data loading to XML validation.
The test suite consists of:
- `conftest.py`: Contains pytest fixtures that set up the test environment
- `smoke_test.py`: A comprehensive smoke test that verifies the basic functionality of all pipeline components
The smoke test includes individual test functions for each component of the pipeline (a hypothetical sketch of one such test follows the list):

- `test_data_loader`: Tests the loading and validation of input JSON files
- `test_metadata_builder`: Tests the construction of SDV-compatible metadata
- `test_model`: Tests the training of a generative model
- `test_sampler`: Tests the generation of synthetic data
- `test_reassembler`: Tests the conversion of synthetic data to XML
- `test_validator`: Tests the validation of the generated XML
- `test_pipeline`: Tests the end-to-end pipeline execution
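For orientation, each per-component test consumes the matching fixture and asserts on its output. The sketch below is hypothetical (the real assertions live in `smoke_test.py`) and assumes the `synthetic_data` fixture yields a dictionary of pandas DataFrames:

```python
import pandas as pd


def test_sampler(synthetic_data, logger):
    """Hypothetical sketch; the real test_sampler may assert more."""
    assert synthetic_data is not None
    # Assumed shape: a dict mapping table names to synthetic DataFrames.
    for name, df in synthetic_data.items():
        logger.info("synthetic table %r has %d rows", name, len(df))
        assert isinstance(df, pd.DataFrame)
        assert not df.empty
```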
The test fixtures in `conftest.py` are designed to provide reusable components for tests. They follow a dependency chain that mirrors the pipeline's execution flow:

- `config` (session scope): Provides a test configuration with temporary output paths
- `logger` (session scope): Provides a logger for tests
- `dmdsec_df`, `file_df`, `structmap_df` (module scope): Provide the input DataFrames
- `tables` (module scope): Combines the DataFrames into a dictionary
- `metadata` (module scope): Builds metadata from the input DataFrames
- `model` (module scope): Trains a model on the input data
- `synthetic_data` (function scope): Generates synthetic data from the model
- `xml_output_path` (function scope): Reassembles the synthetic data into XML
The fixtures use different scopes to optimize test execution (see the sketch after this list):

- Session scope: Created once per test session (e.g., `config`, `logger`)
- Module scope: Created once per test module (e.g., `dmdsec_df`, `metadata`, `model`)
- Function scope: Created for each test function (e.g., `synthetic_data`, `xml_output_path`)
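The following is a minimal sketch of how such a fixture chain can be declared with explicit scopes, assuming a dictionary-based test configuration. The fixture names and scopes match the lists above, but the bodies are placeholders rather than the project's real implementations:

```python
"""Hypothetical sketch of conftest.py's fixture chain. The real fixtures load the
project's input JSON files and train an SDV model; the bodies below are stand-ins."""
import logging

import pandas as pd
import pytest


@pytest.fixture(scope="session")
def config(tmp_path_factory):
    # Created once per session: test configuration with temporary output paths.
    return {"output_dir": str(tmp_path_factory.mktemp("outputs")), "random_seed": 42}


@pytest.fixture(scope="session")
def logger():
    # Created once per session: shared logger for all tests.
    return logging.getLogger("tests")


@pytest.fixture(scope="module")
def dmdsec_df():
    # Created once per module: stand-in for the dmdSec input DataFrame.
    return pd.DataFrame({"dmd_id": ["dmd-1"], "title": ["Example record"]})


@pytest.fixture(scope="module")
def tables(dmdsec_df):
    # Combines the input DataFrames into a dictionary (file_df and structmap_df omitted here).
    return {"dmdsec": dmdsec_df}


@pytest.fixture(scope="function")
def synthetic_data(tables, config):
    # Created for each test function: the real fixture samples from the trained
    # model; this placeholder just copies the input tables.
    return {name: df.copy() for name, df in tables.items()}
```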
To run the smoke test:

```bash
# Run with pytest (recommended)
pytest tests/smoke_test.py -v

# Or run the script directly
./tests/smoke_test.py
```

To run a specific test function:

```bash
pytest tests/smoke_test.py::test_data_loader -v
```

To run all tests with detailed output:

```bash
pytest tests/ -v
```
Understanding the difference between testing and production usage is important:
| Aspect | Testing (pytest) | Production |
|---|---|---|
| Entry Point | `pytest tests/smoke_test.py` | `python -m src.data_archive_ml_synthesizer.pipeline` |
| Configuration | Test configuration created in `conftest.py` | Configuration from `config.yaml` |
| Output Files | Temporary files | Files specified in `config.yaml` |
| Purpose | Verify code functionality | Generate synthetic METS XML |
| Environment | Test fixtures automatically set up | Manual configuration required |
When to use each approach:
- Use pytest during development to verify that your code changes don't break existing functionality
- Use production mode when you want to generate actual synthetic METS XML documents for your use case
When writing new tests, you can use the fixtures defined in `conftest.py` to set up the test environment. For example:

```python
def test_example(config, synthetic_data, logger):
    """Example test using fixtures."""
    # Your test code here
    assert synthetic_data is not None
```

The fixtures will be automatically created and injected into your test function.
- Use fixtures: Leverage the existing fixtures to set up the test environment
- Test one thing at a time: Each test function should focus on testing a single aspect of the code
- Use descriptive names: Test function names should clearly describe what they're testing
- Include assertions: Every test should include at least one assertion to verify the expected behavior
- Handle exceptions: Use pytest's exception handling (`pytest.raises`) to test for expected exceptions; a short sketch follows this list
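For the last point, `pytest.raises` asserts that a block of code raises a specific exception. A minimal, self-contained sketch, using a plain file read as a stand-in for the pipeline's real loader:

```python
import json

import pytest


def test_loader_rejects_missing_file(tmp_path):
    """Illustrative only: assert that an expected exception is raised."""
    missing = tmp_path / "does_not_exist.json"

    # Replace this stand-in with a call to the pipeline's real data loader.
    with pytest.raises(FileNotFoundError):
        with open(missing) as fh:
            json.load(fh)
```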
The test configuration is created in `conftest.py` using the `create_test_config()` function from `smoke_test.py`. It uses the same input files as the main configuration but creates temporary output paths to avoid modifying the project's output files.
The test configuration includes:
- Input paths pointing to the same input files as the main configuration
- Temporary output paths for generated files
- A fixed random seed for reproducibility
- Validation settings matching the main configuration
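As a rough illustration, assuming the configuration is passed around as a plain dictionary (the real schema is defined by `config.yaml` and `create_test_config()` in `smoke_test.py`), it could be assembled like this; all keys and paths below are placeholders:

```python
import tempfile
from pathlib import Path


def create_test_config() -> dict:
    # Hypothetical sketch; the real create_test_config() in smoke_test.py defines
    # the actual keys, paths, and validation settings.
    out_dir = Path(tempfile.mkdtemp(prefix="mets_test_"))
    return {
        "input": {
            # Placeholder paths: same input files as the main configuration.
            "dmdsec": "data/input/dmdsec.json",
            "file": "data/input/file.json",
            "structmap": "data/input/structmap.json",
        },
        "output": {
            # Temporary paths so the project's real outputs are never overwritten.
            "xml": str(out_dir / "synthetic_mets.xml"),
        },
        "random_seed": 42,                 # fixed seed for reproducibility
        "validation": {"enabled": True},   # mirrors the main configuration
    }
```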