vdb-dump extended help
(1) dumping a vdb-table:
The only mandatory argument to vdb-dump is the name of the object to dump:
vdb-dump OBJECT
OBJECT can be:
a) absolute or relative path to a vdb-table (a directory)
on linux: vdb-dump /path/to/SRR000001
on windows: vdb-dump \\data\sra\sra0\SRR\000000\SRR000001
vdb-dump Y:\sra0\SRR\000000\SRR000001
b) absolute or relative path to a file containing a vdb-table
on linux/windows: vdb-dump SRR044989.sra
c) an accession
on linux/windows: vdb-dump SRR000001
Outside NCBI, you need internet access to reach accessions stored at NCBI, and remote access
must be enabled in your configuration.
If you specify only the object, vdb-dump dumps all columns of all rows to standard output.
The --table / -T option:
========================
vdb-dump is designed to operate on a vdb-database. A vdb-database can contain more than one table.
If you do not specify a table name, vdb-dump first tries to interpret the given object as a vdb-database
(and dumps the table "SEQUENCE"; if that table does not exist, it dumps the first table it finds in the database).
If this attempt fails (silently) because the given object is not a database, vdb-dump tries to interpret
the object as a table. If the object is neither a vdb-database nor a vdb-table, the tool fails.
Use this option to name the table explicitly, e.g.: vdb-dump OBJECT -T SEQUENCE
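The default-table rule above can be modelled with a small shell function. This is only a sketch of the documented behaviour, not vdb-dump's own code: the function name `pick_table` is hypothetical, the table names in the example are illustrative, and it assumes the table names of a database arrive one per line on standard input, in the order vdb-dump would find them.

```sh
# pick_table: hypothetical helper modelling the documented default-table rule.
# Reads table names (one per line) on stdin and prints the table that would
# be dumped when no --table / -T is given: SEQUENCE if present, else the first.
pick_table() {
    awk '
        NR == 1 { first = $0 }            # remember the first table name
        $0 == "SEQUENCE" { found = 1 }    # the preferred default table
        END {
            if (found) print "SEQUENCE"
            else if (NR > 0) print first  # fallback: first table found
        }
    '
}

printf '%s\n' PRIMARY_ALIGNMENT SEQUENCE REFERENCE | pick_table   # prints SEQUENCE
printf '%s\n' REFERENCE PRIMARY_ALIGNMENT | pick_table            # prints REFERENCE
```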
The --rows / -R option:
=======================
With this option you can restrict which rows will be dumped.
vdb-dump file.sra -R 5 ... will dump only row number 5
vdb-dump file.sra -R 5-20 ... will dump rows number 5 to number 20 (16 rows, both ends included)
The ranges can be mixed:
vdb-dump file.sra -R 5,7-20,200-201,300,305 ... will dump these rows/ranges
If you omit this option, vdb-dump outputs all rows.
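To check which rows a -R argument selects, the range syntax can be expanded with a small shell sketch. The helper `expand_rows` is hypothetical and only mirrors the syntax described above (single rows and inclusive FROM-TO ranges, separated by commas); it does not call vdb-dump.

```sh
# expand_rows SPEC: hypothetical helper that expands a -R style row
# specification (comma-separated single rows and inclusive FROM-TO ranges)
# into one row id per line.
expand_rows() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -F- '
        NF == 1 { print $1 + 0; next }                    # single row, e.g. 5
        { for (i = $1 + 0; i <= $2 + 0; i++) print i }    # inclusive range, e.g. 5-20
    '
}

expand_rows 5-20 | awk 'END { print NR }'                    # prints 16
expand_rows 5,7-20,200-201,300,305 | awk 'END { print NR }'  # prints 19
```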
The --columns -C option:
========================
With this option you can restrict which columns per row will be dumped.
vdb-dump file.sra -C NAME,READ ... will dump only the columns NAME and READ per row
The --exclude -x option:
========================
Use this option if you want to dump all columns except some specific ones:
vdb-dump file.sra -x READ,RD_FILTER ... will dump all columns but the READ-column
and the RD_FILTER-column.
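The effect of -x can be pictured as subtracting the excluded names from the full column list, which yields a list you could pass to -C instead. The helper `remaining_columns` and the five-column list in the example are hypothetical; in practice the full list would come from the table itself.

```sh
# remaining_columns ALL EXCLUDE: hypothetical helper illustrating what -x does.
# ALL and EXCLUDE are comma-separated column lists; prints ALL minus EXCLUDE,
# keeping the original order, as a comma-separated list usable with -C.
remaining_columns() {
    printf '%s\n' "$1" | tr ',' '\n' | awk -v excl="$2" '
        BEGIN { n = split(excl, e, ","); for (i = 1; i <= n; i++) skip[e[i]] = 1 }
        !($0 in skip) { out = (out == "" ? $0 : out "," $0) }   # keep non-excluded
        END { print out }
    '
}

remaining_columns NAME,READ,QUALITY,RD_FILTER,SPOT_LEN READ,RD_FILTER
# prints NAME,QUALITY,SPOT_LEN
```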
The --row_id_on -I option:
==========================
vdb-dump does not output the row-id by default; it has to be switched on with this option:
vdb-dump SRR000001 -R1 -CNAME,SPOT_LEN
NAME: EM7LVYS01C1LWG
SPOT_LEN: 255
vdb-dump SRR000001 -R1 -CNAME,SPOT_LEN -I
ROW-ID = 1
NAME: EM7LVYS01C1LWG
SPOT_LEN: 255
The --line_feed -l option:
==========================
vdb-dump separates the rows by one empty line (line-feed) by default:
vdb-dump SRR000001 -R1-3 -CNAME,SPOT_LEN
NAME: EM7LVYS01C1LWG
SPOT_LEN: 255

NAME: EM7LVYS01B2EMP
SPOT_LEN: 248

NAME: EM7LVYS01C2YO0
SPOT_LEN: 307
With this option you can change the number of empty lines between rows:
vdb-dump SRR000001 -R1-3 -CNAME,SPOT_LEN -l2
NAME: EM7LVYS01C1LWG
SPOT_LEN: 255


NAME: EM7LVYS01B2EMP
SPOT_LEN: 248


NAME: EM7LVYS01C2YO0
SPOT_LEN: 307
The --colname_off -N option:
============================
vdb-dump prints the name of every column in front of its data:
vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN
NAME: EM7LVYS01C1LWG
SPOT_LEN: 255
NAME: EM7LVYS01B2EMP
SPOT_LEN: 248
With this option it prints only the data:
vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -N
EM7LVYS01C1LWG
255
EM7LVYS01B2EMP
248
The --in_hex -X option:
=======================
With this option all numeric outputs are printed as hexadecimal numbers:
vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -X
NAME: EM7LVYS01C1LWG
SPOT_LEN: 0xFF
NAME: EM7LVYS01B2EMP
SPOT_LEN: 0xF8
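The hexadecimal values printed with -X can be converted back with the shell's own printf. The helper `hex_field_to_dec` is hypothetical and assumes lines of the form `COLUMN: 0x..` as in the sample above.

```sh
# printf converts between the decimal default output and the -X notation.
printf '0x%X\n' 255     # prints 0xFF (SPOT_LEN of row 1)
printf '0x%X\n' 248     # prints 0xF8 (SPOT_LEN of row 2)

# hex_field_to_dec LINE: hypothetical helper; takes one "COLUMN: 0x.." line as
# printed with -X and prints the same line with a decimal value.
hex_field_to_dec() {
    name=${1%%:*}     # text before the first colon, e.g. SPOT_LEN
    value=${1#*: }    # text after the first ": ", e.g. 0xFF
    printf '%s: %d\n' "$name" "$value"   # printf accepts 0x.. as a number
}

hex_field_to_dec 'SPOT_LEN: 0xFF'   # prints SPOT_LEN: 255
```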
The --dna_bases -D option:
==========================
With this option you can force columns to be printed as DNA bases ("ACGT"),
but only if the column's datatype has more than one dimension.
A column whose datatype has a dimension of 2 with 1 bit each (2 bits per element)
is printed as DNA bases automatically.
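Two bits per element allow exactly four values (0 to 3), which is why such a column can be shown as one base per element. The sketch below maps raw values to letters; the helper `decode_2bit` is hypothetical, and the value-to-base order 0=A, 1=C, 2=G, 3=T is an assumption (the common NCBI 2na order), not something this help text specifies.

```sh
# decode_2bit VALUES: hypothetical helper; reads 2-bit values (0-3) separated
# by spaces or commas and prints them as bases, ASSUMING 0=A 1=C 2=G 3=T.
decode_2bit() {
    printf '%s\n' "$1" | tr ', ' '\n\n' | awk '
        BEGIN { split("A C G T", base, " ") }   # base[1]="A" ... base[4]="T"
        NF == 0 { next }                         # skip empty fields
        { s = s base[$1 + 1] }                   # value v maps to base[v+1]
        END { print s }
    '
}

decode_2bit "3, 1, 0, 2"   # prints TCAG
```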
The --max_length -M option:
===========================
With this option you can truncate the output of columns that are longer than the given limit.
vdb-dump SRR000001 -R1-2 -CREAD
READ: TCAGGGGGGAGCTTAAATTTGAAACTAGAAAAATTTTGAACAAAATAATCATAATTGTTAGCTGATGAAAAACTAGAAAAGATTTTCTGAGTGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACGGTATCCCGTAGTGTGCATTCATCCCTGCTCTGGATACAGTCAGCTCCCAAATTCCATAAACAACTCCTTTGTAAGTAACCTCCTTTTGACAGGGGGTACTGAGCGGGCTGGCAAGGCN
READ: TCAGGGGGGGGTTACACGTGCAGATTTGTTACACGGGTGTACTGTGAGGTTTGGGGTACGAATGATCCCGTTACCTAGATAGTGAGCATGGAACCCGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACAATGTGCAGGGCTCAGGTCAGCATTAGGGTCAGGTTCTTAGGAAAAGAAAGAGCAAAAACAATGAAACACAATACAAAGTAAAGAACACTGAGCGGGCTGGCAAGGCN
vdb-dump SRR000001 -R1-2 -CREAD -M40
READ: TCAGGGGGGAGCTTAAATTTGAAACTAGAA ...
READ: TCAGGGGGGGGTTACACGTGCAGATTTGTT ...
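Counting the characters of the -M40 sample shows that each truncated line, column name and trailing " ..." included, is exactly 40 characters long ("READ: " plus 30 bases plus " ..."). The hypothetical awk filter `truncate_line` below reproduces this sample; whether vdb-dump applies exactly this rule in every case (for example to lines of exactly MAX characters) is an assumption.

```sh
# truncate_line MAX: hypothetical helper reproducing the -M40 sample above:
# lines longer than MAX are cut to MAX-4 characters and " ..." is appended,
# so the column name and the marker count toward the limit.
truncate_line() {
    awk -v max="$1" '
        length($0) > max { print substr($0, 1, max - 4) " ..."; next }
        { print }                                  # short lines pass unchanged
    '
}

echo 'READ: TCAGGGGGGAGCTTAAATTTGAAACTAGAAAAATTTTGAACAAAATAATC' | truncate_line 40
# prints READ: TCAGGGGGGAGCTTAAATTTGAAACTAGAA ...
```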
The --indent_width -i option:
=============================
With this option you can limit the length of each output line; longer data is wrapped
and continued on lines indented from the left edge.
vdb-dump SRR000001 -R1-2 -CREAD -i80
READ: TCAGGGGGGAGCTTAAATTTGAAACTAGAAAAATTTTGAACAAAATAATCATAATTGTTAGCTGATGAAAAACT
      AGAAAAGATTTTCTGAGTGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACGGTATCCCGTAG
      TGTGCATTCATCCCTGCTCTGGATACAGTCAGCTCCCAAATTCCATAAACAACTCCTTTGTAAGTAACCTCCTT
      TTGACAGGGGGTACTGAGCGGGCTGGCAAGGCN
READ: TCAGGGGGGGGTTACACGTGCAGATTTGTTACACGGGTGTACTGTGAGGTTTGGGGTACGAATGATCCCGTTAC
      CTAGATAGTGAGCATGGAACCCGTTGGAACCGAAAGGGTTTGAATTCAAACCCTTTCGGTTCCAACAATGTGCA
      GGGCTCAGGTCAGCATTAGGGTCAGGTTCTTAGGAAAAGAAAGAGCAAAAACAATGAAACACAATACAAAGTAA
      AGAACACTGAGCGGGCTGGCAAGGCN
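In the -i80 sample, the first line of each READ is 80 characters long including the "READ: " prefix, and each further line carries at most 74 bases. The hypothetical awk filter `wrap_field` below reproduces that layout under the assumption that continuation lines are indented by the length of the prefix; it is a model of the output, not vdb-dump's code.

```sh
# wrap_field WIDTH: hypothetical helper; reads "NAME: DATA" lines and wraps
# DATA so that no output line exceeds WIDTH characters, indenting the
# continuation lines by the length of the "NAME: " prefix.
wrap_field() {
    awk -v width="$1" '
        {
            p = index($0, ": ") + 1          # prefix ends after ": "
            prefix = substr($0, 1, p)
            data = substr($0, p + 1)
            pad = sprintf("%" p "s", "")     # p spaces
            chunk = width - p                # data characters per line
            print prefix substr(data, 1, chunk)
            for (i = chunk + 1; i <= length(data); i += chunk)
                print pad substr(data, i, chunk)
        }
    '
}

echo 'READ: ACGTACGTAC' | wrap_field 10
# prints:
# READ: ACGT
#       ACGT
#       AC
```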
The --format -f option:
=======================
This option selects an output format other than the default:
csv = comma-separated on one line
---------------------------------
vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fcsv
EM7LVYS01C1LWG,255
EM7LVYS01B2EMP,248
xml = xml-section
-----------------
vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fxml
EM7LVYS01C1LWG
255
EM7LVYS01B2EMP
248
json = json format
------------------
vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fjson
{
    "row_id": 1,
    "NAME":"EM7LVYS01C1LWG",
    "SPOT_LEN":255
},
{
    "row_id": 2,
    "NAME":"EM7LVYS01B2EMP",
    "SPOT_LEN":248
},
piped = format friendly to being piped into other processes
------------------------------------------------------------
vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fpiped
1, NAME: "EM7LVYS02FOYNU"
1, SPOT_LEN: 284
2, NAME: "EM7LVYS02GCAPL"
2, SPOT_LEN: 262
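Since each line of the piped format carries the row id, the output is easy to regroup with standard tools. As a sketch, the hypothetical filter `piped_to_tsv` folds the sample above into one tab-separated line per row (row id first, then the values in column order); it assumes every line has the form `ROW, COLUMN: VALUE` and that the lines of one row are adjacent.

```sh
# piped_to_tsv: hypothetical helper that folds "-f piped" output
# ("ROW, COLUMN: VALUE" lines) into one tab-separated line per row id.
# Double quotes around string values are removed.
piped_to_tsv() {
    awk '
        {
            comma = index($0, ", ")
            row = substr($0, 1, comma - 1)               # e.g. 1
            rest = substr($0, comma + 2)                 # e.g. NAME: "EM7..."
            value = substr(rest, index(rest, ": ") + 2)
            gsub(/"/, "", value)                         # drop quotes
            if (row != prev && NR > 1) { print line; line = "" }
            line = (line == "" ? row : line) "\t" value
            prev = row
        }
        END { if (NR > 0) print line }
    '
}

printf '%s\n' '1, NAME: "EM7LVYS02FOYNU"' '1, SPOT_LEN: 284' \
              '2, NAME: "EM7LVYS02GCAPL"' '2, SPOT_LEN: 262' | piped_to_tsv
```

With vdb-dump available, the same filter would be fed directly, e.g. vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fpiped | piped_to_tsv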
sra-dump = simulates the output of a deprecated tool
------------------------------------------------------------
vdb-dump SRR000001 -R1-2 -CNAME,SPOT_LEN -fsra-dump
1, NAME: EM7LVYS02FOYNU
1, SPOT_LEN: 284
2, NAME: EM7LVYS02GCAPL
2, SPOT_LEN: 262
fastq = produces fastq-output
(the table needs to have a READ and a QUALITY column; splitting is not supported)
-------------------------------------------------------
vdb-dump SRR000001 -R1 -ffastq
@SRR000001.1 EM7LVYS02FOYNU length=284
TCAGATTCTCCTAGCCTACATCCGTACGAGTTAGCGTGGGATTACGAGGTGCACACCATTTCATTCCGTACGGGTAAATTTTTGTATTTTTAGCAGACGGCAGGGTTTCACCATGGTTGACCAACGTACTAATCTTGAACTCCTGACCTCAAGTGATTTGCCTGCCTTCAGCCTCCCAAAGTGACTGGGTATTACAGATGTGAGCGAGTTTGTGCCCAAGCCTTATAAGTAAATTTATAAATTTACATAATTTAAATGACTTATGCTTAGCGAAATAGGGTAAG
+SRR000001.1 EM7LVYS02FOYNU length=284
=<8<85)9=9/3-8?68<7=8<3657747==49==+;FB2;A;5:'*>69<:74)9.;C?+;*GC8/%9<=GC8.#=2:5:16D==*6?7<:77>:1+CA138?<)C@2166:A:%<<9<;33<;6?9;<;4=:%<$CA1+1%1
fasta = produces fasta-output
(the table needs to have a READ column)
-------------------------------------------------------
vdb-dump SRR000001 -R1 -f fasta
>SRR000001.1 EM7LVYS02FOYNU length=284
TCAGATTCTCCTAGCCTACATCCGTACGAGTTAGCGTGGGATTACGAGGTGCACACCATTTCATTCCGTA
CGGGTAAATTTTTGTATTTTTAGCAGACGGCAGGGTTTCACCATGGTTGACCAACGTACTAATCTTGAAC
TCCTGACCTCAAGTGATTTGCCTGCCTTCAGCCTCCCAAAGTGACTGGGTATTACAGATGTGAGCGAGTT
TGTGCCCAAGCCTTATAAGTAAATTTATAAATTTACATAATTTAAATGACTTATGCTTAGCGAAATAGGG
TAAG
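The fastq and fasta samples above carry the same header text and the same bases, so fastq output can be turned into the fasta layout with a short awk filter. This sketch (hypothetical name `fastq_to_fasta`, shortened example record) assumes unwrapped 4-line fastq records as in the -ffastq sample and wraps bases at 70 per line, the width seen in the -f fasta sample.

```sh
# fastq_to_fasta: hypothetical helper; converts 4-line FASTQ records (as
# printed by -ffastq, no line wrapping) into FASTA with 70 bases per line.
fastq_to_fasta() {
    awk '
        NR % 4 == 1 { print ">" substr($0, 2) }          # "@name" -> ">name"
        NR % 4 == 2 {                                    # sequence line
            for (i = 1; i <= length($0); i += 70) print substr($0, i, 70)
        }
        # lines 3 ("+...") and 4 (qualities) of each record are dropped
    '
}

printf '%s\n' '@read1 example' 'TCAGATTCTC' '+read1 example' '=<8<85)9=9' | fastq_to_fasta
# prints:
# >read1 example
# TCAGATTCTC
```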
The --without_sra -n option:
============================
With this option you can switch off the special treatment (translation) of certain column types, so that their raw values are printed:
vdb-dump SRR000001 -R1 -C SPOT_DESC,PLATFORM
SPOT_DESC: spot_len=255, fixed_len=0, signal_len=400, clip_qual_right=235, num_reads=4
PLATFORM: SRA_PLATFORM_454
vdb-dump SRR000001 -R1 -C SPOT_DESC,PLATFORM -n
SPOT_DESC: [255, 0, 0, 0, 144, 1, 235, 0, 4, 0, 0, 0, 0, 0, 0, 0]
PLATFORM: 1
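Comparing the two SPOT_DESC lines above suggests how the raw bytes map to the translated fields: the values fit little-endian 16-bit fields for spot_len, fixed_len, signal_len and clip_qual_right, followed by an 8-bit num_reads (for signal_len, 144 + 256 * 1 = 400). That layout is inferred from this one sample, not taken from the SRA schema; the hypothetical `decode_spot_desc` only makes the arithmetic explicit.

```sh
# decode_spot_desc BYTES: hypothetical helper; decodes the raw byte list that
# -n prints for SPOT_DESC. The field layout is INFERRED from the sample above.
decode_spot_desc() {
    printf '%s\n' "$1" | awk '{
        gsub(/[^0-9,]/, "")       # keep only digits and commas
        split($0, b, ",")         # b[1] .. b[16] are the raw bytes
        printf "spot_len=%d, fixed_len=%d, signal_len=%d, clip_qual_right=%d, num_reads=%d\n",
            b[1] + 256 * b[2], b[3] + 256 * b[4], b[5] + 256 * b[6], b[7] + 256 * b[8], b[9]
    }'
}

decode_spot_desc '[255, 0, 0, 0, 144, 1, 235, 0, 4, 0, 0, 0, 0, 0, 0, 0]'
# prints spot_len=255, fixed_len=0, signal_len=400, clip_qual_right=235, num_reads=4
```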
The --table_enum -E option:
===========================
If the object is a vdb-database, enumerate the tables it contains.
The --version -V option:
========================
Print the version of the vdb-manager used by vdb-dump.
vdb-dump -V
vdb-dump: 2.5.1
The --column_enum_short -o option:
==================================
Enumerates the columns and the default type of each column:
vdb-dump SRR000001 -o
BASE_COUNT (U64)
BIO_BASE_COUNT (U64)
CLIP_ADAPTER_LEFT (INSDC:coord:one)
etc.
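The -o listing can be turned into an argument for --columns / -C, for example to start from the full list and then remove a few names by hand. The helper `columns_to_list` is hypothetical and assumes one `COLUMN (TYPE)` entry per line, as in the sample.

```sh
# columns_to_list: hypothetical helper; turns "-o" output lines
# ("COLUMN (TYPE)") into a comma-separated list suitable for -C.
columns_to_list() {
    awk 'NF >= 2 { print $1 }' | paste -s -d, -   # keep lines with a type, join names
}

printf '%s\n' 'BASE_COUNT (U64)' 'BIO_BASE_COUNT (U64)' \
              'CLIP_ADAPTER_LEFT (INSDC:coord:one)' | columns_to_list
# prints BASE_COUNT,BIO_BASE_COUNT,CLIP_ADAPTER_LEFT
```

With vdb-dump available, the input would come from vdb-dump SRR000001 -o | columns_to_list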
The --column_enum -O option:
============================
Enumerates the columns and all available types of each column:
vdb-dump SRR000001 -O
SRR000001.01 : (032 bits [01], Int) CLIP_QUALITY_LEFT
(INSDC:coord:one)
CLIP_QUALITY_LEFT.type[0] = INSDC:coord:one (dflt)
CLIP_QUALITY_LEFT.type[1] = U16
CLIP_QUALITY_LEFT.type[2] = INSDC:coord:zero
SRR000001.02 : (032 bits [01], Int) CLIP_QUALITY_RIGHT
(INSDC:coord:one)
CLIP_QUALITY_RIGHT.type[0] = INSDC:coord:one (dflt)
CLIP_QUALITY_RIGHT.type[1] = U16
CLIP_QUALITY_RIGHT.type[2] = INSDC:coord:zero
SRR000001.03 : (008 bits [01], Uint) COLOR_MATRIX
(U8)
COLOR_MATRIX.type[0] = U8 (dflt)
etc.
The --id_range -r option:
=========================
Print the row-range that a table contains.
vdb-dump SRR000001 -r
id-range: first-row = 1, row-count = 470985
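The id-range is enough to address rows relative to the end of a table. The hypothetical helper `last_rows` parses the -r line shown above and prints a -R argument for the last N rows; it assumes exactly that line format and that row ids run contiguously from first-row to first-row + row-count - 1.

```sh
# last_rows N LINE: hypothetical helper; reads the "id-range:" line printed
# by -r and prints a -R argument selecting the last N rows of the table.
last_rows() {
    printf '%s\n' "$2" | awk -v n="$1" '{
        # fields: id-range: first-row = 1, row-count = 470985
        first = $4 + 0; count = $7 + 0    # "1," converts to 1
        last = first + count - 1          # assumes contiguous row ids
        from = last - n + 1
        if (from < first) from = first
        print from "-" last
    }'
}

last_rows 10 'id-range: first-row = 1, row-count = 470985'   # prints 470976-470985
```

Assuming -r prints only that one line, the two steps could be combined as vdb-dump SRR000001 -R "$(last_rows 10 "$(vdb-dump SRR000001 -r)")"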
The --info option:
==================
Prints a summary of metadata about the accession:
vdb-dump SRR000001 --info
acc : SRR000001
path : /somepath/SRR/000000/SRR000001
size : 312,527,083
type : Table
platf : SRA_PLATFORM_454
SEQ : 470,985
SCHEMA : NCBI:SRA:_454_:tbl:v2#1.0.7
TIME : 0x0000000055248a41 (04/07/2015 21:54)
FMT : SFF
FMTVER : 2.4.5
LDR : sff-load.2.4.5
LDRVER : 2.4.5
LDRDATE: Feb 25 2015 (2/25/2015 0:0)