comments
Overview
This command generates a structured comment file based on MIUVIG standards. It merges taxonomy, features, MIUVIG, and assembly data into a unified output file that will add structured comments to the GenBank submission.
MIUVIG structured comment
MIUVIG taxonomy
The --taxonomy file should contain information on the genome structure and genome type of your viral sequences. These are mandatory parameters in the MIUVIG standard. You can obtain them with suvtk virus-info if you determined your own taxonomy, or with suvtk taxonomy to get both the taxonomy and the predicted genome structure and type for your sequences. Both commands output miuvig_taxonomy.tsv, which contains this information for all sequences.
Example

| contig | pred_genome_type | pred_genome_struc |
|--------|------------------|-------------------|
| Seq1 | uncharacterized | undetermined |
| Seq2 | ssRNA(+) | non-segmented |
| Seq3 | uncharacterized | undetermined |
| Seq4 | dsRNA | segmented |
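
Before passing a taxonomy file to the command, it can help to check that it has the expected shape. The following is a minimal sketch, assuming the file has exactly the three columns shown in the example above (a real miuvig_taxonomy.tsv may carry more); the helper name read_taxonomy is hypothetical and not part of suvtk.

```python
import csv
import io

# Column names taken from the example table; assumed to be the required ones.
REQUIRED_COLUMNS = ("contig", "pred_genome_type", "pred_genome_struc")


def read_taxonomy(handle):
    """Return {contig: (genome_type, genome_structure)} from a tab-separated handle."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing column(s): {', '.join(missing)}")
    return {
        row["contig"]: (row["pred_genome_type"], row["pred_genome_struc"])
        for row in reader
    }


# In-memory stand-in for miuvig_taxonomy.tsv, using two rows from the example.
example = (
    "contig\tpred_genome_type\tpred_genome_struc\n"
    "Seq1\tuncharacterized\tundetermined\n"
    "Seq2\tssRNA(+)\tnon-segmented\n"
)
taxonomy = read_taxonomy(io.StringIO(example))
print(taxonomy["Seq2"])
```

For a file on disk, the same function can be called with open("miuvig_taxonomy.tsv", newline="").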
MIUVIG features
The --features input file should contain information on the software, database and method used to annotate the features in your viral sequences. The suvtk features command generates miuvig_features.tsv, which lists all tools and databases used by suvtk, with their versions and parameters.
Example

| MIUVIG_parameter | value |
|------------------|-------|
| feat_pred | pyrodigal-gv;0.3.2;-g 1, default otherwise |
| ref_db | BFVD;2023_02;https://bfvd.steineggerlab.workers.dev |
| sim_search_meth | MMseqs2;17.b804f;-s 7.5, default otherwise |
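
The values in the example follow a semicolon-separated pattern of name, version, and a third field (parameters or a URL). The sketch below splits such a value into its parts; it assumes every value has exactly this three-part layout, which is inferred from the examples only, and the names ToolEntry and parse_tool_value are hypothetical.

```python
from typing import NamedTuple


class ToolEntry(NamedTuple):
    name: str
    version: str
    details: str  # parameters or, for databases, a URL


def parse_tool_value(value: str) -> ToolEntry:
    """Split 'name;version;details' on the first two semicolons only."""
    parts = value.split(";", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 'name;version;details', got: {value!r}")
    return ToolEntry(*(part.strip() for part in parts))


entry = parse_tool_value("pyrodigal-gv;0.3.2;-g 1, default otherwise")
print(entry.name, entry.version, entry.details, sep=" | ")
```

Splitting at most twice keeps any further semicolons inside the third field intact.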
Global MIUVIG parameters
The MIUVIG TSV file should contain the global parameters of your study, that is, those that apply to all your sequences. It must have two tab-separated columns: MIUVIG_parameter and value. Allowed parameters and their values, including which parameters are mandatory, can be found here, and the corresponding MIUVIG_parameter strings can be found here.
Example

| MIUVIG_parameter | value |
|------------------|-------|
| source_uvig | viral fraction metagenome (virome) |
| assembly_software | metaSPAdes;v3.15.3;kmer set 21,33,55,77, default otherwise |
| vir_ident_software | genomad;1.7.0;score-calibration, default otherwise |
| size_frac | 0-0.8 um |
| virus_enrich_appr | filtration + centrifugation + DNAse + RNAse |
| nucl_acid_ext | 10.1038/srep16532 |
| wga_amp_appr | mda based |
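
Because several values contain commas and spaces, writing this file by hand in a spreadsheet can introduce stray delimiters. As a minimal sketch, the file can be produced with Python's csv module using a tab delimiter; the function name write_parameter_tsv is hypothetical, and the parameters are a subset of the example above.

```python
import csv
import io

# A subset of the example parameters; extend with the rest of your study metadata.
params = {
    "source_uvig": "viral fraction metagenome (virome)",
    "assembly_software": "metaSPAdes;v3.15.3;kmer set 21,33,55,77, default otherwise",
    "vir_ident_software": "genomad;1.7.0;score-calibration, default otherwise",
}


def write_parameter_tsv(params, handle, key_header="MIUVIG_parameter"):
    """Write a two-column, tab-separated parameter table to an open handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow([key_header, "value"])
    writer.writerows(params.items())


# Written to memory here; pass open("miuvig.tsv", "w", newline="") to write a file.
buffer = io.StringIO()
write_parameter_tsv(params, buffer)
miuvig_tsv = buffer.getvalue()
print(miuvig_tsv)
```

The file name miuvig.tsv in the comment is only a placeholder; any name can be passed to the command.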
CheckV quality_summary.tsv
Providing the quality_summary.tsv output of CheckV is optional, but it adds more information to the MIUVIG structured comment. For example, the quality is taken from CheckV’s quality estimation (High-quality or Genome-fragment) and the completeness score is added. Whether a sequence is a provirus (UpViG) is also taken into account. If you do not provide this file, sequences are considered ‘Genome fragment(s)’ and ‘independent sequence (UViG)’ by default.
Note
Providing the CheckV quality_summary.tsv file is mostly useful for bacteriophage sequences, as CheckV cannot reliably estimate the completeness score for eukaryotic (RNA) viruses.
Assembly comment
The assembly comment file is a TSV file that contains the information needed for the Assembly structured comment. Like the MIUVIG file, it has two columns: Assembly_parameter and value. There are three possible Assembly_parameter values: StructuredCommentPrefix, which should always be Assembly-Data; Assembly Method, which names the assembly software you used; and Sequencing Technology, which names the sequencing platform used to generate your data.
Example

| Assembly_parameter | value |
|--------------------|-------|
| StructuredCommentPrefix | Assembly-Data |
| Assembly Method | metaSPAdes v. 3.15.3 |
| Sequencing Technology | Illumina NovaSeq 6000 |
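
The constraints described above are simple enough to check before running the command. The following sketch reports unknown parameter names and a wrong prefix; it is an illustration of the stated rules, not suvtk's own validation, and check_assembly_tsv is a hypothetical helper.

```python
import csv
import io

# The three parameter names listed in the text above.
ALLOWED = {"StructuredCommentPrefix", "Assembly Method", "Sequencing Technology"}


def check_assembly_tsv(handle):
    """Return a list of human-readable problems found in an assembly TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = {row["Assembly_parameter"]: row["value"] for row in reader}
    problems = [f"unknown parameter: {name}" for name in sorted(set(rows) - ALLOWED)]
    prefix = rows.get("StructuredCommentPrefix", "Assembly-Data")
    if prefix != "Assembly-Data":
        problems.append(f"StructuredCommentPrefix should be Assembly-Data, got {prefix}")
    return problems


good = (
    "Assembly_parameter\tvalue\n"
    "StructuredCommentPrefix\tAssembly-Data\n"
    "Assembly Method\tmetaSPAdes v. 3.15.3\n"
    "Sequencing Technology\tIllumina NovaSeq 6000\n"
)
good_problems = check_assembly_tsv(io.StringIO(good))
print(good_problems)
```

An empty list means the file follows the three rules described in this section.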
Required Input
-t, --taxonomy: MIUVIG TSV file produced by the taxonomy (or virus-info) subcommand. (Required)
-f, --features: MIUVIG TSV file produced by the features subcommand. (Required)
-m, --miuvig: TSV file containing MIUVIG information. (Required)
-a, --assembly: TSV file with GenBank assembly information. (Required)
-o, --output: Output filename (the script appends .cmt to the provided name). (Required)
Optional Parameters
-c, --checkv: CheckV’s quality_summary.tsv file.
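
When the command is called from a pipeline script, the options above can be assembled programmatically. This is a minimal sketch that only uses the flags listed in this section; it assumes the command is invoked as suvtk comments, and the helper name and the input file names miuvig.tsv and assembly.tsv are placeholders.

```python
import shlex


def build_comments_command(taxonomy, features, miuvig, assembly, output, checkv=None):
    """Return the argument list for the comments command (assumed: `suvtk comments`)."""
    command = [
        "suvtk", "comments",
        "--taxonomy", taxonomy,
        "--features", features,
        "--miuvig", miuvig,
        "--assembly", assembly,
        "--output", output,  # the tool appends .cmt to this name
    ]
    if checkv is not None:
        command += ["--checkv", checkv]  # optional CheckV quality_summary.tsv
    return command


command = build_comments_command(
    "miuvig_taxonomy.tsv",
    "miuvig_features.tsv",
    "miuvig.tsv",    # placeholder name
    "assembly.tsv",  # placeholder name
    "output",
    checkv="quality_summary.tsv",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) on a system where suvtk is installed.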
Output
A structured comment file (e.g., output.cmt) that consolidates the various data fields and meets MIUVIG standards.
Example Usage