Publish to cloud tooling providers like Dockstore, AnVIL, etc #188
Right now, the PharmCAT-Pipeline workflow only works on a single VCF file and does not handle outside call files. The pipeline script relies on naming conventions among files in the same directory, but this doesn't work because there's no concept of a directory in the cloud. Part 1: accept a
Part 2: accept
Can either provide Documentation tasks:
Super bonus part 3: do we want to support URLs in addition to files? There is probably no good reason to do so from the user's perspective, but it would mean that we can write tests for the WDL and they would run automatically (I think).
Note: I messed up the Dockstore integration in the last release. It should be fixed in the next release though.
Proposal for Aligning and Simplifying the PharmCAT Pipeline

To simplify the maintenance of the PharmCAT_Pipeline and ensure it remains robust, I propose we keep the WDL focused on its core functionality of processing a single VCF file at a time. By doing this, we maintain clarity and ease of maintenance in the WDL itself, while offloading the complexity of file management to earlier workflow steps.

For handling issues like multiple files, compressed formats, and file naming conventions, we can delegate these tasks to upstream workflows within Terra or AnVIL. These workflows can manage tasks such as:

By leveraging Terra and AnVIL’s ability to orchestrate custom workflows, users can create preprocessing steps that handle file management and preparation before invoking the PharmCAT_Pipeline for each individual file. This modular approach keeps the pipeline clean and focused while allowing flexibility for diverse file formats and workflows.

Next Suggested Steps:

Use Case Simulations: We can simulate a few use cases involving multiple files, compressed files, and naming conventions. Then, we’ll build workflows that manage these tasks before calling the PharmCAT_Pipeline. This will ensure the process is flexible and can handle different scenarios.

Comprehensive Documentation: We should document these workflows to guide users on how to set up file preprocessing workflows in Terra or AnVIL. This documentation will include examples of how to manage files and pass them to the WDL one by one.

Delegating file handling logic to other parts of the workflow reduces the complexity within the pipeline itself, which simplifies both maintenance and usability across multiple platforms.
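To make the upstream file-management idea concrete, here is a minimal Python sketch of a preprocessing step that takes a flat list of cloud object URIs and pairs each VCF with its outside call file by shared base name, so the single-VCF workflow can then be invoked once per pair. The suffixes (`.vcf`, `.vcf.gz`, `.vcf.bgz`, `.outside.tsv`) and the function names are assumptions for illustration only; the conventions PharmCAT actually uses are described in its input documentation.

```python
from pathlib import PurePosixPath

# Assumed suffixes, for illustration only; check the PharmCAT docs for the real ones.
VCF_SUFFIXES = (".vcf.bgz", ".vcf.gz", ".vcf")
OUTSIDE_SUFFIX = ".outside.tsv"


def split_name(uri):
    """Return (base_name, suffix) for a recognized file, else None.

    Only the final path component is used, because objects in a bucket
    may sit under unrelated prefixes rather than in one directory.
    """
    name = PurePosixPath(uri.split("://", 1)[-1]).name
    for suffix in VCF_SUFFIXES + (OUTSIDE_SUFFIX,):
        if name.endswith(suffix):
            return name[: -len(suffix)], suffix
    return None


def pair_inputs(uris):
    """Pair each VCF URI with the outside call URI sharing its base name.

    Returns a list of (vcf_uri, outside_uri_or_None), sorted by base name.
    Unrecognized files are ignored.
    """
    vcfs, outside = {}, {}
    for uri in uris:
        parsed = split_name(uri)
        if parsed is None:
            continue
        base, suffix = parsed
        if suffix == OUTSIDE_SUFFIX:
            outside[base] = uri
        else:
            vcfs[base] = uri
    return [(vcfs[base], outside.get(base)) for base in sorted(vcfs)]
```

Each resulting pair could then become one row of inputs (for example, one entry in a Terra data table) for a call to the single-file workflow.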
Details on file inputs: https://pharmcat.org/using/Running-PharmCAT-Pipeline/#inputs
This is the link to the PharmCAT tutorial. It includes some real-world VCFs and outside call files.
Hi all, apologies for the delay! I took some time to dive deeper into the PharmCAT_Pipeline code, and it’s clear that it isn’t fully optimized for cloud environments. You had mentioned this issue before, but it really hit home after reviewing the code more closely. I’m currently working on creating individual WDLs for each of the 4 modules, trying to replicate the logic of the PharmCAT_Pipeline in AnVIL. I’m not entirely sure if we’ll be able to replicate it 100%, but I do think having these modules separated could be valuable for future use cases. That said, what do you think about developing a version of PharmCAT_Pipeline specifically designed to work in cloud environments?
Yes, the pipeline script is meant as a very simple wrapper around our main tools. Using it was the quickest way to get going in Dockstore. You're welcome to create a better WDL script, but let's review because maybe we can then enable more functionality.
@markwoon, I created a new WDL https://dockstore.org/workflows/github.com/AndreRico/PharmCAT_Dockstore/PharmCAT-VCF_Preprocessor:main?tab=files with two tasks: one to convert the cloud environment into a Path environment, and a second to receive this path environment and run the vcf-preprocessor.
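One possible reading of "convert the cloud environment into a Path environment" is a staging step that gathers the files a workflow engine has localized (often into separate, engine-chosen directories) into one working directory, so that tools relying on same-directory naming conventions can find them. The Python sketch below illustrates that idea only; it is not the code of the linked WDL, and `stage_inputs` and its arguments are hypothetical names.

```python
import shutil
from pathlib import Path


def stage_inputs(localized_files, work_dir):
    """Copy already-localized input files into a single working directory.

    Hypothetical helper: file names are kept unchanged so that any
    name-based pairing still works, and a name clash is treated as an
    error rather than silently overwriting a file.
    """
    work = Path(work_dir)
    work.mkdir(parents=True, exist_ok=True)
    staged = []
    for src in map(Path, localized_files):
        dest = work / src.name
        if dest.exists():
            raise FileExistsError(f"duplicate input file name: {src.name}")
        shutil.copy2(src, dest)
        staged.append(dest)
    return staged
```

A second task could then receive `work_dir` and run the preprocessor against it as if it were an ordinary local directory.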
I assume this is On the other hand, now that I'm thinking of this, this would also resolve the original problems I had with the |
We want to make PharmCAT easily available on cloud genomics analysis platforms. We already publish a Docker image to Docker Hub so it should be relatively easy to make that image available to different cloud providers. For example, we want to enable access from AnVIL.
After doing some research it seems the best route is publishing a workflow through Dockstore. This will make it available through AnVIL but also DNAStack, DNAnexus, and others.
Current questions are: