vdb-dump extended help

(1) dumping a vdb-table:

The only mandatory option to vdb-dump is the name of the object to dump:

    vdb-dump OBJECT

The OBJECT can be:

a) an absolute or relative path to a vdb-table (a directory)
    on linux:
        vdb-dump /path/to//SRR000001
    on windows:
        vdb-dump \\data\sra\sra0\SRR\000000\SRR000001
        vdb-dump Y:\sra0\SRR\000000\SRR000001

b) an absolute or relative path to a file containing a vdb-table
    on linux/windows:
        vdb-dump SRR044989.sra

c) an accession
    on linux/windows:
        vdb-dump SRR000001
    Outside NCBI you need internet access to reach accessions stored at NCBI,
    and remote access must be enabled in your configuration.

If you specify only the object, vdb-dump will dump all columns for all rows
to the standard-output.

The --table / -T option:
========================
vdb-dump is designed to operate on a vdb-database. A vdb-database can contain
more than one table. If you do not specify the table-name, vdb-dump will first
try to interpret the given object as a vdb-database (and try to dump the table
"SEQUENCE"; if that table does not exist, it dumps the first table it finds in
the database). If this attempt fails (silently) because the given object is
not a database, vdb-dump will try to interpret the given object as a table.
If the object is neither a vdb-database nor a vdb-table, the tool will fail.

The --rows / -R option:
=======================
With this option you can restrict which rows will be dumped.

    vdb-dump file.sra -R 5 ...
        will dump only row number 5
    vdb-dump file.sra -R 5-20 ...
        will dump rows number 5 to number 20 (16 rows)

Single rows and ranges can be mixed:

    vdb-dump file.sra -R 5,7-20,200-201,300,305 ...
        will dump these rows/ranges

If you omit this option, vdb-dump will output all rows.

The --columns -C option:
========================
With this option you can restrict which columns per row will be dumped.

    vdb-dump file.sra -C NAME,READ ...
        will dump only the columns NAME and READ per row

The --exclude -x option:
========================
Use this option if you want to dump all columns except some specific ones.

    vdb-dump file.sra -x READ,RD_FILTER ...
        will dump all columns but the READ-column and the RD_FILTER-column.

The --row_id_on -I option:
==========================
By default vdb-dump does not output the row-id; it has to be switched on
with this option:

    vdb-dump SRR000001 -R1 -CNAME,SPOT_LEN
    NAME: EM7LVYS01C1LWG
    SPOT_LEN: 255

    vdb-dump SRR000001 -R1 -CNAME,SPOT_LEN -I
    ROW-ID = 1
    NAME: EM7LVYS01C1LWG
    SPOT_LEN: 255

The --line_feed -l option:
==========================
By default vdb-dump separates the rows by one empty line (line-feed):

    vdb-dump SRR000001 -R1-3 -CNAME,SPOT_LEN
    NAME: EM7LVYS01C1LWG
    SPOT_LEN: 255

    NAME: EM7LVYS01B2EMP
    SPOT_LEN: 248

    NAME: EM7LVYS01C2YO0
    SPOT_LEN: 307

With this option you can change the number of line-feeds between rows:

    vdb-dump SRR000001 -R1-3 -CNAME,SPOT_LEN -l2
    NAME: EM7LVYS01C1LWG
    SPOT_LEN: 255


    NAME: EM7LVYS01B2EMP
    SPOT_LEN: 248


    NAME: EM7LVYS01C2YO0
    SPOT_LEN: 307

The --colname_off -N option:
============================
vdb-dump prints the name of every column in front of its data:

    vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN
    NAME: EM7LVYS01C1LWG
    SPOT_LEN: 255

    NAME: EM7LVYS01B2EMP
    SPOT_LEN: 248

With this option it prints only the data:

    vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -N
    EM7LVYS01C1LWG
    255

    EM7LVYS01B2EMP
    248

The --in_hex -X option:
=======================
With this option all numeric outputs are printed as hexadecimal numbers:

    vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -X
    NAME: EM7LVYS01C1LWG
    SPOT_LEN: 0xFF

    NAME: EM7LVYS01B2EMP
    SPOT_LEN: 0xF8

The --dna_bases -D option:
==========================
With this option you can force columns to be printed as DNA-bases "ACGT",
but only if the column has a datatype with more than one dimension.
If a column has a datatype with a dimension of 2, each dimension 1 bit,
it is automatically printed as DNA-bases.
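
The default output shown in the sections above is line-oriented: one
"COLUMN: value" line per column, rows separated by empty lines, and an
optional "ROW-ID = n" line when -I is given. As a rough illustration only,
the following Python sketch splits such text into one dictionary per row.
The function name parse_default_dump is hypothetical and not part of
vdb-dump. The sketch assumes that column names are printed (no -N) and that
values are not wrapped over several lines (no -i).

```python
def parse_default_dump(text):
    """Split default-format vdb-dump output into one dict per row."""
    rows = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # One or more empty lines end the current row (works for -l2 too).
            if current:
                rows.append(current)
                current = {}
            continue
        if line.startswith("ROW-ID = "):
            # Only present when the dump was made with -I.
            current["ROW-ID"] = line[len("ROW-ID = "):]
            continue
        # Split on the first ": " only, so values may contain colons.
        name, sep, value = line.partition(": ")
        if sep:
            current[name] = value
    if current:
        rows.append(current)
    return rows


# Sample text copied from the -I and -l examples; all values stay strings.
sample = """ROW-ID = 1
NAME: EM7LVYS01C1LWG
SPOT_LEN: 255


NAME: EM7LVYS01B2EMP
SPOT_LEN: 248
"""

rows = parse_default_dump(sample)
for row in rows:
    print(row)
```

For machine processing the csv or json formats of the --format option are
probably a better starting point than the default format.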

The --max_length -M option:
===========================
With this option you can truncate the output of columns longer than this limit.

    vdb-dump SRR000001 -R1-2 -CREAD
    READ: TCAGGGGGGAGCTTAAATTTGAAACTAGAAAAATTTTGAACAAAATAATCATAATTGTTAGCTGATGAAAAACTAGAAAAGATTTTCTGAGTGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACGGTATCCCGTAGTGTGCATTCATCCCTGCTCTGGATACAGTCAGCTCCCAAATTCCATAAACAACTCCTTTGTAAGTAACCTCCTTTTGACAGGGGGTACTGAGCGGGCTGGCAAGGCN

    READ: TCAGGGGGGGGTTACACGTGCAGATTTGTTACACGGGTGTACTGTGAGGTTTGGGGTACGAATGATCCCGTTACCTAGATAGTGAGCATGGAACCCGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACAATGTGCAGGGCTCAGGTCAGCATTAGGGTCAGGTTCTTAGGAAAAGAAAGAGCAAAAACAATGAAACACAATACAAAGTAAAGAACACTGAGCGGGCTGGCAAGGCN

    vdb-dump SRR000001 -R1-2 -CREAD -M40
    READ: TCAGGGGGGAGCTTAAATTTGAAACTAGAA ...

    READ: TCAGGGGGGGGTTACACGTGCAGATTTGTT ...

The --indent_width -i option:
=============================
With this option you can limit the length of the output-line and force
a left-edge indentation of the continuation lines.

    vdb-dump SRR000001 -R1-2 -CREAD -i80
    READ: TCAGGGGGGAGCTTAAATTTGAAACTAGAAAAATTTTGAACAAAATAATCATAATTGTTAGCTGATGAAAAACT
          AGAAAAGATTTTCTGAGTGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACGGTATCCCGTAG
          TGTGCATTCATCCCTGCTCTGGATACAGTCAGCTCCCAAATTCCATAAACAACTCCTTTGTAAGTAACCTCCTT
          TTGACAGGGGGTACTGAGCGGGCTGGCAAGGCN

    READ: TCAGGGGGGGGTTACACGTGCAGATTTGTTACACGGGTGTACTGTGAGGTTTGGGGTACGAATGATCCCGTTAC
          CTAGATAGTGAGCATGGAACCCGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACAATGTGCA
          GGGCTCAGGTCAGCATTAGGGTCAGGTTCTTAGGAAAAGAAAGAGCAAAAACAATGAAACACAATACAAAGTAA
          AGAACACTGAGCGGGCTGGCAAGGCN

The --format -f option:
=======================
This selects an output format other than the default one:

csv = comma-separated on one line
---------------------------------
    vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fcsv
    EM7LVYS01C1LWG,255
    EM7LVYS01B2EMP,248

xml = xml-section
-----------------
    vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fxml
    EM7LVYS01C1LWG 255 EM7LVYS01B2EMP 248

json = json format
------------------
    vdb-dump SRR000001 -R1-2 \
        -CNAME,SPOT_LEN -fjson
    {
    "row_id": 1,
    "NAME":"EM7LVYS01C1LWG",
    "SPOT_LEN":255
    },
    {
    "row_id": 2,
    "NAME":"EM7LVYS01B2EMP",
    "SPOT_LEN":248
    },

piped = format friendly to being piped into other processes
-----------------------------------------------------------
    vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fpiped
    1, NAME: "EM7LVYS02FOYNU"
    1, SPOT_LEN: 284
    2, NAME: "EM7LVYS02GCAPL"
    2, SPOT_LEN: 262

sra-dump = simulates the output of a deprecated tool
----------------------------------------------------
    vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fsra-dump
    1, NAME: EM7LVYS02FOYNU
    1, SPOT_LEN: 284
    2, NAME: EM7LVYS02GCAPL
    2, SPOT_LEN: 262

fastq = produces fastq-output ( the table needs to have
        a READ- and a QUALITY column, no splitting supported )
-------------------------------------------------------
    vdb-dump SRR000001 -R1 -ffastq
    @SRR000001.1 EM7LVYS02FOYNU length=284
    TCAGATTCTCCTAGCCTACATCCGTACGAGTTAGCGTGGGATTACGAGGTGCACACCATTTCATTCCGTACGGGTAAATTTTTGTATTTTTAGCAGACGGCAGGGTTTCACCATGGTTGACCAACGTACTAATCTTGAACTCCTGACCTCAAGTGATTTGCCTGCCTTCAGCCTCCCAAAGTGACTGGGTATTACAGATGTGAGCGAGTTTGTGCCCAAGCCTTATAAGTAAATTTATAAATTTACATAATTTAAATGACTTATGCTTAGCGAAATAGGGTAAG
    +SRR000001.1 EM7LVYS02FOYNU length=284
    =<8<85)9=9/3-8?68<7=8<3657747==49==+;FB2;A;5:'*>69<:74)9.;C?+;*GC8/%9<=GC8.#=2:5:16D==*6?7<:77>:1+CA138?<)C@2166:A:%<<9<;33<;6?9;<;4=:%<$CA1+1%1

fasta = produces fasta-output ( the table needs to have
        a READ column )
-------------------------------------------------------
    vdb-dump SRR000001 -R1 -f fasta
    >SRR000001.1 EM7LVYS02FOYNU length=284
    TCAGATTCTCCTAGCCTACATCCGTACGAGTTAGCGTGGGATTACGAGGTGCACACCATTTCATTCCGTA
    CGGGTAAATTTTTGTATTTTTAGCAGACGGCAGGGTTTCACCATGGTTGACCAACGTACTAATCTTGAAC
    TCCTGACCTCAAGTGATTTGCCTGCCTTCAGCCTCCCAAAGTGACTGGGTATTACAGATGTGAGCGAGTT
    TGTGCCCAAGCCTTATAAGTAAATTTATAAATTTACATAATTTAAATGACTTATGCTTAGCGAAATAGGG
    TAAG

The --without_sra -n option:
============================
With this option you can switch off the special treatment (translation)
of certain column-types:

    vdb-dump SRR000001 -R1 -C SPOT_DESC,PLATFORM
    SPOT_DESC: spot_len=255, fixed_len=0, signal_len=400, clip_qual_right=235, num_reads=4
    PLATFORM: SRA_PLATFORM_454

    vdb-dump SRR000001 -R1 -C SPOT_DESC,PLATFORM -n
    SPOT_DESC: [255, 0, 0, 0, 144, 1, 235, 0, 4, 0, 0, 0, 0, 0, 0, 0]
    PLATFORM: 1

The --table_enum -E option:
===========================
If the object is a vdb-database, this option enumerates the tables it contains.

The --version -V option:
========================
Prints the version of the vdb-manager used by vdb-dump.

    vdb-dump -V
    vdb-dump: 2.5.1

The --column_enum_short -o option:
==================================
Enumerates the columns and the default type of each column.

    vdb-dump SRR000001 -o
    BASE_COUNT (U64)
    BIO_BASE_COUNT (U64)
    CLIP_ADAPTER_LEFT (INSDC:coord:one)
    etc.

The --column_enum -O option:
============================
Enumerates the columns and all available types of each column.

    vdb-dump SRR000001 -O
    SRR000001.01 : (032 bits [01], Int) CLIP_QUALITY_LEFT (INSDC:coord:one)
        CLIP_QUALITY_LEFT.type[0] = INSDC:coord:one (dflt)
        CLIP_QUALITY_LEFT.type[1] = U16
        CLIP_QUALITY_LEFT.type[2] = INSDC:coord:zero
    SRR000001.02 : (032 bits [01], Int) CLIP_QUALITY_RIGHT (INSDC:coord:one)
        CLIP_QUALITY_RIGHT.type[0] = INSDC:coord:one (dflt)
        CLIP_QUALITY_RIGHT.type[1] = U16
        CLIP_QUALITY_RIGHT.type[2] = INSDC:coord:zero
    SRR000001.03 : (008 bits [01], Uint) COLOR_MATRIX (U8)
        COLOR_MATRIX.type[0] = U8 (dflt)
    etc.

The --id_range -r option:
=========================
Prints the row-range that a table contains.
    vdb-dump SRR000001 -r
    id-range: first-row = 1, row-count = 470985

The --info option:
==================
Prints a summary of meta-data about the accession.

    vdb-dump SRR000001 --info
    acc    : SRR000001
    path   : /somepath/SRR/000000/SRR000001
    size   : 312,527,083
    type   : Table
    platf  : SRA_PLATFORM_454
    SEQ    : 470,985
    SCHEMA : NCBI:SRA:_454_:tbl:v2#1.0.7
    TIME   : 0x0000000055248a41 (04/07/2015 21:54)
    FMT    : SFF
    FMTVER : 2.4.5
    LDR    : sff-load.2.4.5
    LDRVER : 2.4.5
    LDRDATE: Feb 25 2015 (2/25/2015 0:0)
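
The --info summary is a list of "key : value" lines, so it is easy to
post-process. The following is a minimal Python sketch, assuming the layout
shown above; parse_info is a hypothetical helper name and not part of
vdb-dump. It splits on the first colon only, because values such as TIME
or SCHEMA contain colons themselves.

```python
def parse_info(text):
    """Turn 'key : value' lines of a --info summary into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # partition() splits at the first colon; the rest stays in the value.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# A few lines copied from the example above (spacing is an assumption).
sample = """acc    : SRR000001
size   : 312,527,083
SEQ    : 470,985
SCHEMA : NCBI:SRA:_454_:tbl:v2#1.0.7
TIME   : 0x0000000055248a41 (04/07/2015 21:54)
LDRDATE: Feb 25 2015 (2/25/2015 0:0)
"""

info = parse_info(sample)

# Numbers are printed with thousands separators; remove them before converting.
seq_rows = int(info["SEQ"].replace(",", ""))
print(info["acc"], seq_rows)
```

In this example the SEQ value matches the row-count reported by the -r
option for the same accession.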