File Format Descriptions¶
Create module¶
Input files
–i-alignment: This is an alignment of all sequences the user would like to evaluate in FASTA format
- Example
>S003715306
GCGGTA-AT-ACGTAG-GGAGCAAGCGTTGTC-CGG-ATTTATTGG-GCGTAAA-GAGCTCGTAG-G-CGGCTT-GGCAAGT-CGGATGTGAAA-CC-CCCAGG-CTTAACC-TGGGG-C–C- GCCATTCGA-TAC-TGC-TATGG-C-TT-GAGTTCGGTA-GGGGAT-TG-TGGA-ATT-CC-C-GGTGTAGCGGTGAAATGCGCAG-ATATCG-GG-AGGA-ACACC-AATG-GCGAAGGCAG- CAAT-CTGGGC-CGACACT-GA-CGCTGA-GG–A-GCGAAA-GCGTGGG-G-AGCAAA-CAGGATTAGATA
>S003614093
GCGGTA-AT-ACGTAG-GGAGCAAGCGTTGTC-CGG-AATTATTGG-GCGTAAA-GAGCTCGTAG-G-CGGTTC-GGTAAGT-CGGGTGTGAAA-AC-TCAAGG-CTCAACC-TTGAG-A–C- GCCACTCGA-TAC-TGC-CGTGA-C-TT-GAGTCCGGTA-GAGGAG-TG-TGGA-ATT-CC-T-GGTGTAGCGGTGAAATGCGCAG-ATATCA-GG-AGGA-ACACC-AGCG-GCGAAGGCGG- CACT-CTGGGC-CGGTACT-GA-CGCTGA-GG–A-GCGAAA-GCATGGG-G-AGCAAA-CAGGATTAGATA
>S001611178
GCGGTA-AT-ACGTAG-GGCGCGAGCGTTGTC-CGG-AATTATTGG-GCGTAAA-GGGCTCGTAG-G-CGGCTT-GTTGCGC-CTGCTGTGAAA-AC-GCGGGG-CTTAACT-CCGCG-C-GT- G-CAGTGGG-TAC-GGG-CA-GG-C-TT-GAGTGTGGTA-GGGGTG-AC-TGGA-ATT-CC-A-GGTGTAGCGGTGGAATGCGCAG-ATATCT-GG-AGGA-ACACCGAT-G-GCGAAGGCAG- GTCA-CTGGGC-CATTACT-GA-CGCTGA-GG–A-GCGAAA-GCGTGGG-T-AGCGAA-CAGGATTAGATA
–p-include-strains: A list of community members the community the user would like added. Each sequence identifier (must match what is in alignment)is on its own line
- Example
S003715306
S003614093
S001611178
–i-metadata: Information to combine with the community output. File must contain a header, be tab-delimited, and contain the identifiers in the first column
- Example
ID Phylum Class
S003715306 Actinobacteria Actinobacteria
S003614093 Actinobacteria Actinobacteria
S001611178 Actinobacteria Actinobacteria
S000014419 Actinobacteria Actinobacteria
–i-distance-database: Pre-calculated distance database of sequences, this is created when a previous create command has been run with the same strains. It is a tab delimited file with each line comparing two sequences. The sequence identifiers are in the first two columns and the edit distance between them is in the third
- Example
S003715306 S003614093 28
S003715306 S001611178 50
S003715306 S000014419 48
S003715306 S000015295 49
S003715306 S000022350 42
S003715306 S000129061 44
Output files
–o-community-list: A tab delimited list of strains, with each strain on its own line with a header line. If metadata is supplied it will be combined with this output
- Example
ID Phylum Class
S003715306 Actinobacteria Actinobacteria
S003614093 Actinobacteria Actinobacteria
S001611178 Actinobacteria Actinobacteria
S000014419 Actinobacteria Actinobacteria
–o-fasta: A FASTA file containing only the strains in the constructed community
Subsample module¶
Input files
–i-input-community: Tab seperated file with taxa ids in the first column with metadata in additional columns, output of create module
- Example
ID Phylum Class
S003715306 Actinobacteria Actinobacteria
S003614093 Actinobacteria Actinobacteria
S001611178 Actinobacteria Actinobacteria
S000014419 Actinobacteria Actinobacteria
–p-proportion: File of the relative proportions of each taxonomic rank desired in final community. Each rank is contained on its own line. The rank and the proportion are seperated by a tab.
- Example
Actinobacteria 0.1
Aquificae 0.001
Bacteroidia 0.05
Flavobacteriia 0.001
Sphingobacteriia 0.003
Output files
A file with each sequence identifier on its own lines for the subsampled community