README.md (+30 -1)
@@ -4,4 +4,33 @@ NCBI now provides [a clustered nr database](https://ncbiinsights.ncbi.nlm.nih.go
 We were interested in using this database to reduce search times and to increase the taxonomic diversity of returned sequences when doing BLAST searches.
 However, as of March 2023, the database is not available for download.
 Therefore, we re-made this database ourselves.
-The [README.sh](./README.sh) file in this repository documents how we performed the clustering and created a taxonomy sheet that annotates the lowest common ancestor for each protein cluster.
+The [Snakefile](./Snakefile) in this repository documents how we performed the clustering and created a taxonomy sheet that annotates the lowest common ancestor for each protein cluster.
+
+## Getting started with this repository
+
+This repository uses snakemake to run the pipeline and conda to manage software environments and installations.
+You can find operating system-specific instructions for installing miniconda [here](https://docs.conda.io/en/latest/miniconda.html).
+We executed the pipeline on AWS EC2 with an Ubuntu image (ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-20230208).
+
+```
+curl -JLO https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh # download the miniconda installation script
+bash Miniconda3-latest-Linux-x86_64.sh # run the miniconda installation script. Accept the license and follow the defaults.
+source ~/.bashrc # source the .bashrc for miniconda to be available in the environment
+# configure miniconda channel order
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+conda config --set channel_priority strict # make channel priority strict so snakemake doesn't yell at you
+conda install mamba # install mamba for faster software installation.
+
+conda env create -n nr -f environment.yml
+conda activate nr
+```
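Each `conda config --add channels` call prepends to the channel list, so the three calls in the block above should leave conda-forge with the highest priority, followed by bioconda and then defaults. A quick way to check the result is sketched below. The `EXPECTED_ORDER` variable is illustrative only and is not part of the repository.

```shell
# Illustrative only: the order the three --add calls are expected to produce,
# highest priority first (each --add prepends to the list).
EXPECTED_ORDER="conda-forge bioconda defaults"
echo "expected channel order: ${EXPECTED_ORDER}"

if command -v conda >/dev/null 2>&1; then
    conda config --show channels          # print the configured channel list
    conda config --show channel_priority  # should report "strict"
else
    echo "conda not found on PATH; open a new shell or source ~/.bashrc" >&2
fi
```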
+
+After cloning the repository, you can then run the snakefile with:
+where `-j` sets the maximum number of CPU cores (parallel jobs) to use, `--use-conda` uses conda to manage software environments, `--rerun-incomplete` re-runs jobs whose output is incomplete, `-k` tells the pipeline to continue with independent steps when one step fails, and `-n` performs a dry run first.
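The command these flags belong to does not appear in this view. The sketch below is consistent with the description but rests on two assumptions: a thread count of 4 (the original value for `-j` is not shown), and `snakemake` being available once the `nr` environment is active. The `THREADS` and `FLAGS` variable names are illustrative.

```shell
# Assumption: 4 threads; the README excerpt does not give a value for -j.
THREADS=4
FLAGS="-j ${THREADS} --use-conda --rerun-incomplete -k"

if command -v snakemake >/dev/null 2>&1; then
    # Dry run (-n): list the jobs that would run without executing them.
    # FLAGS is left unquoted on purpose so it splits into separate arguments.
    snakemake ${FLAGS} -n
else
    echo "snakemake not found on PATH; is the conda environment active?" >&2
fi
```

Once the dry run lists the expected jobs, running the same command without `-n` would execute the pipeline.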