Python code for generating synthetic sequence data: DNASEQ and RNASEQ reads for use as standards in genomics data analysis pipelines

Replicon-genetics/rg_exploder_shared

rg_exploder_shared

Summary
Python code for generating synthetic sequence data (DNASEQ or RNASEQ reads), using either a Tkinter or a Vue.js graphical user interface.

What is this for?
This repository at https://github.com/snowlizardz/rg_exploder_shared/ holds Python code, data and metadata for fragmenting DNA sequences to emulate NGS-style sequencing reads. An explanation of why this is useful is at https://repliconevaluation.com/about/ . Note that repliconevaluation.com redirects to replicongenomics.com, at least until September 2025.
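
To make the idea of fragmentation concrete, here is a toy sketch that cuts a sequence into fixed-length, overlapping reads. It is an illustration only and is not the repository's algorithm; the function name `fragment` and its parameters `read_length` and `step` are hypothetical.

```python
def fragment(sequence, read_length, step):
    """Return fixed-length substrings ("reads") taken every `step` bases."""
    if read_length <= 0 or step <= 0:
        raise ValueError("read_length and step must be positive")
    # Last position at which a full-length read still fits in the sequence.
    last_start = len(sequence) - read_length
    return [sequence[i:i + read_length] for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    for read in fragment("ACGTACGTAC", read_length=4, step=3):
        print(read)
```

A sequence shorter than `read_length` yields an empty list in this sketch.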

Licence conditions
This code is free software, as defined by https://www.fsf.org/ , released under the AGPL-3.0 license (a copyleft license, not a Public Domain dedication). This repository is a tidied-up subset of the development repository at https://github.com/snowlizardz/rg_exploder (currently private).

Getting started
In /exploder_python:
a) Execute the Python module RG_exploder_globals_make.py to set the correct output filepath. This needs to be done only once, and again after changing input data sets.
b) Not visible in the repository are two required folders: /data_sources/exploder_output_38_1000 and /data_sources/exploder_output_37_1000 ; if not present, please create them!
c) Execute the Python Tkinter GUI module RG_exploder_gui.py
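
Step (b) can be scripted. The following is a minimal sketch, assuming the two folder paths are relative to the repository root; the function name `ensure_output_folders` and its `repo_root` argument are hypothetical and not part of the repository's code.

```python
from pathlib import Path

# Folder names taken from step (b) above.
OUTPUT_FOLDERS = (
    "data_sources/exploder_output_38_1000",
    "data_sources/exploder_output_37_1000",
)


def ensure_output_folders(repo_root="."):
    """Create the required output folders if absent; return their paths."""
    folders = []
    for name in OUTPUT_FOLDERS:
        folder = Path(repo_root) / name
        # exist_ok=True makes this safe to re-run when the folder already exists.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders


if __name__ == "__main__":
    for folder in ensure_output_folders():
        print(f"ready: {folder}")
```

Run it from the repository root, or pass the root path explicitly if running from elsewhere.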

Dependencies
The Python modules require Biopython, plus Pillow (PIL) to support the Tkinter GUI. There may be an X11 dependency on some platforms.
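
A quick way to check that these dependencies can be found before launching the GUI is sketched below. The helper name `missing_modules` is hypothetical. The import names are `Bio` for Biopython and `PIL` for Pillow. Note that this only checks that each module can be located, not that it imports and runs correctly on your platform.

```python
import importlib.util

# Import names: Biopython installs as "Bio", Pillow as "PIL".
REQUIRED_MODULES = ("Bio", "PIL", "tkinter")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules found.")
```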

Maintaining and updating genomic data
/data_sources holds genomic data downloaded from Ensembl, then processed.
/documents includes a "Data_management" guide for pre-processing new genomic data using the maintenance scripts in folders /helper_python & /helper_scripts.
In presentations, there's a LibreOffice document explaining concepts and implementation.

Alternative GUI
The Python code & data here are those used by the Vue.js implementation at https://repliconevaluation.wordpress.com/replicon-genetics (initially released in 2021). The full source for building the Vue.js implementation is held at https://github.com/Replicon-genetics/rg_exploder, currently private. A subset of critical definition files is present in this repository: see folders /pyodide, /webdist/public & /webdist/src.

Origins
Code and documentation were developed between September 2018 and March 2025 by Cary O'Donnell, originally for Replicon Genetics, a company set up by Dr Gillian Ellison and Jane Theaker in 2018 but de-registered in 2023; IP is due to those named. No AI-generation tools were used at any point.

Advice on improving access, offers of collaboration, or other feedback is welcome; please email syrgenreads@gmail.com

Cary O'Donnell 22nd April 2025