============================================================================
    sam-analyze - tool
============================================================================

The purpose of this tool is to analyze SAM- or BAM-files.
If a BAM-file has to be analyzed, it has to be converted into the SAM-format.
( In the future there might be an option to let this tool perform that task too.
  Another possible future function would be to import directly from a cSRA-accession. )

----------------------------------------------------------------------------

The tool can:

1) import a SAM-file and write it into a SQLITE-database

2) analyze the SQLITE-database for problems ( FLAGS etc. ), print a report

3) export all or some of the SPOTS written to the SQLITE-database in SAM-format

4) create a reference-report ( frequencies of reference-usage by spots )

----------------------------------------------------------------------------

The tool can be used to:

- explore what is wrong with a given SAM/BAM-file

- predict if loading a SAM/BAM-file with bam-load would succeed
    ( faster than probing with bam-load itself )

- fix problems before running bam-load ( for instance fixing invalid names )

- create short examples from existing SAM/BAM-files or cSRA-accessions
    for the purpose of creating fast tests for other tools

----------------------------------------------------------------------------

The tool leaves a SQLITE-database behind, which can be manually examined
with 'sqlite3' - the SQLITE-commandline-tool or other SQL-explorer-tools.
( In the future there might be an option to remove the database after
 the EXPORT-step. )

The sqlite-library is statically linked into the tool - it does not need
to be installed for the tool to function.

----------------------------------------------------------------------------

IMPORT:
=======

just import the SAM-file 'filename.SAM'
    $./sam-analyze -i filename.SAM
    ( the produced SQLITE-database defaults to 'sam.db' )

import the same file, but show progress
    $./sam-analyze -i filename.SAM -p

import the same file, show progress and print report
    $./sam-analyze -i filename.SAM -pr

import the same file, use 'example.db' as output
    $./sam-analyze -i filename.SAM -d example.db

import the same file, into an in-memory-database
    $./sam-analyze -i filename.SAM -d :memory:
    ( this makes no sense on its own, but if the import is followed by analyze and export
      and the machine has lots of RAM, this can be faster )

import the same file, increase transaction size to 100k ( default=50k )
    $./sam-analyze -i filename.SAM -t 100000
    ( the transaction-size is in lines imported, the higher the faster )

import the same file via stdin
    $cat filename.SAM | ./sam-analyze -i stdin
    ( the special name 'stdin' is used to import via a pipe )

import the same file, but limit the number of alignments read to 1000
    $./sam-analyze -i filename.SAM -l 1000
    ( the limit is on the number of alignments, all headers are imported,
      this can result in half-aligned spots, even if the source does not have them )

TBD:
    - require a config-file ( identical to bam-load )
    - import the used/all references for later tests
    - mark external vs. internal references

----------------------------------------------------------------------------

ANALYZE:
========

import the SAM-file 'filename.SAM' and analyze it
    $./sam-analyze -i filename.SAM -a

analyze a previously created database 'sam.db' ( the default name )
    $./sam-analyze -a

analyze a previously created database named 'other-name.db'
    $./sam-analyze -d other-name.db -a

TBD:
    - add plugable tests to the analyze-step
    - test if secondary alignments have the same sequence as primary ones
    - test for invalid flags
    - test if reference + cigar matches the sequence
    - test if alignments refer to unknown references
    - test if alignments refer to out-of-bounds positions on references
    - test if all references mentioned or used are available

----------------------------------------------------------------------------

EXPORT:
=======

export from the default database into a file 'out.SAM'
    $./sam-analyze -e out.SAM

export from the default database into a file 'out.SAM' with progress and report
    $./sam-analyze -e out.SAM -pr

export from a previously created database 'other-name.db'
    $./sam-analyze -d other-name.db -e out.SAM

export via stdout
    $./sam-analyze -e stdout

export into 'out.SAM', but write only used references in header-section
    $./sam-analyze -e out.SAM -u

export into 'out.SAM', and fix invalid QNAMES ( convert spaces to underscores )
    $./sam-analyze -e out.SAM -f

export into 'out.SAM', and sort output by reference-position
    $./sam-analyze -e out.SAM -s
    ( default is not sorting at all, the alignments are written in the same order
     as they have been imported )

export into 'out.SAM', and sort output by QNAME
    $./sam-analyze -e out.SAM -n
    ( sorting by reference-position and by QNAME are mutualy exclusive )

export into 'out.SAM', but export only 1000 spots
    $./sam-analyze -e out.SAM -E 1000
    ( The limit is on the number of spots not alignments! Helpful for creating tests. )

export into 'out.SAM', and produce a reference-report-file
    $./sam-analyze -e out.SAM -R ref-report.txt
    ( the reference-report-file lists how often each reference is used,
    unused references are omitted )

TBD:
    - use smallest number of references if output is restricted to a subset of spots
    - generate coverage report ( which reference-position --> coverage )
    - generate hot-spot report ( which reference-positions have to highest coverage )
    - drive the general-loader to write cSRA-format
    - add plugable filters into export-stream

----------------------------------------------------------------------------

The 3 steps ( import - analyze - export ) can be combined:
    $./sam-analyze -i filename.SAM -a -e out.SAM

TBD:
    - if no import/export files are given the tool does nothing and quits
    ( it should produce an error message )