table2asn


Overview

Generates a submission file (.sqn) for GenBank by integrating sequence data, feature tables, and structured comment files. It leverages the table2asn utility and validates the submission file.

Fasta file

Your input sequences, preferably the reoriented_nucleotide_sequences.fna from the features subcommand.

Source file

This file contains the metadata of your sequences as required by NCBI. You will have to generate this file yourself from the output of the suvtk taxonomy subcommand and the metadata of your study. The taxonomy.tsvfile contains the Sequence_ID and the Organism values, you will have to provide:

  • Isolate (required; unique identifiers for each sequence, preferably as a single string of at least six alphanumerical characters (e.g., blue53F), using hyphens and underscores to tie separate elements together, e.g., “0815_Eier-kuchen”)

  • Collection_date (required; in format [DD-Mon-]YYYY)

  • geo_loc_name (required; Country of origin)

  • Lat_Lon (required; coordinates in decimal degrees format: XX.XXX N/S XX.XXX E/W)

  • Bioproject (Bioproject accession)

  • Biosample (Biosample accession)

  • SRA (SRA accession)

  • Segment (Indicates to which segment this contig belongs, the co-occurrence module can help with determining more segments of a virus. Leave blank for non-segmented viruses)

  • Metagenomic (should always be TRUE)

  • Metagenome_source (origin of sample, eg. human gut metagenome, soil metagenome, etc.)

The order of the columns does not matter.

Example

Sequence_ID

Organism

Isolate

Collection_date

geo_loc_name

Metagenome_source

Lat_Lon

Biosample

SRA

Segment

Metagenomic

Seq1

Riboviria sp.

Sample1_bazTh

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472416

SRR28387210

TRUE

Seq2

Leviviricetes sp.

Sample2_8xzwR

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472429

SRR28387198

TRUE

Seq3

unclassified viruses

Sample3_xliVj

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472417

SRR28387209

TRUE

Seq4

Chrysoviridae sp.

Sample4_qC6AD

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472428

SRR28387197

1

TRUE

Seq5

Chrysoviridae sp.

Sample4_qC6AD

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472428

SRR28387197

2

TRUE

Seq6

Chrysoviridae sp.

Sample4_qC6AD

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472428

SRR28387197

3

TRUE

Seq7

Chrysoviridae sp.

Sample4_qC6AD

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472428

SRR28387197

4

TRUE

Seq8

Negarnaviricota sp.

Sample1_IowNh

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472416

SRR28387210

TRUE

Seq9

Riboviria sp.

Sample3_o7K62

Jul-21

Cameroon

blackfly metagenome

4.352433 N 11.63255 E

SAMN40472417

SRR28387209

TRUE

Note

Note that for the segmented virus, the Isolate value is the same for all segments.

Template file

You can generate a template file here to include sequence author information in the Genbank submission.

Comments file

This file contains structured comments that provide additional metadata for your submission. It should be generated from the comments subcommand.


Required Input

  • -i, --input: Input FASTA file. (Required)

  • -o, --output: Output prefix (the resulting file will have a .sqn extension). (Required)

  • -s, --src-file: File containing source modifiers (.src). (Required)

  • -f, --features: Feature table file (.tbl). (Required)

  • -t, --template: Template file with author information (.sbt). (Required)

  • -c, --comments: Structured comment file (.cmt) with MIUVIG information. (Required)

Output

  • <output>.sqn: The submission file ready for GenBank.

  • <output>.val: A validation file listing warnings, info, or errors from the submission check.

Example Usage

suvtk table2asn -i sequences.fasta -o submission -s source.src -f features.tbl -t template.sbt -c comments.cmt