If these threads are starved, they will sleep (the quantification threads do not busy wait), but there is a point beyond which allocating more threads will not speed up alignment-based quantification. For quasi-mapping-based Salmon, the story is somewhat different. Generally, performance continues to improve as more threads are made available. This is because determining the potential mapping locations of each read is generally the slowest step in quasi-mapping-based quantification.

Since this process is trivially parallelizable (and well-parallelized within Salmon), more threads generally equate to faster quantification.

However, there may be a limit to the return on invested threads, reached when Salmon begins to process fragments more quickly than the parser can provide them.

One of the novel and innovative features of Salmon is its ability to accurately quantify transcripts without the reads having been aligned beforehand, by using its fast, built-in selective-alignment algorithm.

Further details about the selective alignment algorithm can be found here. If you want to use Salmon in mapping-based mode, then you first have to build an index for your transcriptome.

We generally recommend that you build a decoy-aware transcriptome file. This can be done with e.g. MashMap2, and we provide some simple scripts to greatly simplify this whole process. Specifically, you can use the generateDecoyTranscriptome. The second is to use the entire genome of the organism as the decoy sequence. This can be done by concatenating the genome to the end of the transcriptome you want to index and populating the decoys.
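The second option above (appending the genome to the transcriptome and recording the decoys) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not one of the scripts shipped with Salmon: the function name and all four path arguments are hypothetical, the inputs are assumed to be uncompressed FASTA, and the decoy list is assumed to hold one genome sequence name per line (the header text up to the first whitespace).

```python
from pathlib import Path


def build_decoy_inputs(transcriptome_fa, genome_fa, combined_fa, decoy_names_path):
    """Append the genome to the transcriptome and record the genome sequence names.

    Hypothetical helper: all four paths are chosen by the caller.
    """
    decoy_names = []
    with open(combined_fa, "w") as out:
        # Transcripts are written first, so the decoy (genome) records come last.
        with open(transcriptome_fa) as transcripts:
            for line in transcripts:
                out.write(line if line.endswith("\n") else line + "\n")
        with open(genome_fa) as genome:
            for line in genome:
                if line.startswith(">"):
                    # Keep only the sequence name: header text up to the first whitespace.
                    fields = line[1:].split()
                    if fields:
                        decoy_names.append(fields[0])
                out.write(line if line.endswith("\n") else line + "\n")
    # One decoy name per line (assumed format for the decoy list).
    Path(decoy_names_path).write_text("".join(name + "\n" for name in decoy_names))
    return decoy_names
```

The combined FASTA and the list of names are then what you would hand to the indexing step; because the whole genome is included, expect the memory cost noted for this scheme.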

Detailed instructions on how to build this type of decoy sequence are available here. This scheme provides a more comprehensive set of decoys, but requires considerably more memory to build the index.

Finally, pre-built versions of both the partial decoy and full decoy (i.e. genome) indices are available. While the mapping algorithms will make use of arbitrarily long matches between the query and reference, the k size selected here will act as the minimum acceptable length for a valid match. Thus, a smaller value of k may improve sensitivity.

We find that a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads. So, if you are seeing a smaller mapping rate than you might expect, consider building the index with a slightly smaller k. Then, you can quantify any set of reads (say, paired-end reads in files reads1 and reads2). This is because the contents of the library type flag are used to determine how the reads should be interpreted.
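To see why a smaller k can help with short or error-containing reads, the following sketch does the arithmetic on the minimum match length. It is a simplified model, not Salmon's mapping logic: it assumes substitution errors only, and both function names are hypothetical.

```python
def kmer_windows(read_length, k):
    """Number of length-k windows in a read; zero means no match of length k can exist."""
    if read_length <= 0 or k <= 0:
        raise ValueError("read_length and k must be positive")
    return max(0, read_length - k + 1)


def guaranteed_exact_window(read_length, k, mismatches):
    """True if every placement of the mismatches leaves an error-free stretch of length >= k."""
    if not 0 <= mismatches <= read_length:
        raise ValueError("mismatches must be between 0 and read_length")
    # m mismatches split the read into m + 1 error-free segments whose lengths
    # sum to read_length - m, so the longest one is at least the ceiling of
    # (read_length - m) / (m + 1). That bound is reached by evenly spaced errors.
    error_free_bases = read_length - mismatches
    longest_segment_lower_bound = -(-error_free_bases // (mismatches + 1))
    return longest_segment_lower_bound >= k
```

Under this model a 25bp read contains no window of length 31 at all, and a 75bp read with two evenly spaced mismatches is only guaranteed an exact stretch of 25 bases, which is too short for k = 31 but long enough for k = 21.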

You can, of course, pass a number of options to control things such as the number of threads used or the different cutoffs used for counting reads. Also, one may wish to quantify multiple replicates or samples together, treating them as if they are one library.

When the input is paired-end reads, the order of the files in the left and right lists must be the same. There are a number of ways to provide salmon with multiple read files, and treat these as a single library. Both methods work, and are acceptable ways to merge the files. The latter method (i.e. providing multiple files) is preferred. Salmon does not currently have built-in support for interleaved FASTQ files (i.e. files where the paired-end reads are interleaved). We provide a script that can be used to run salmon with interleaved input.
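For the multi-file case, a small wrapper can assemble the command line and catch mismatched left and right lists before anything is run. This is a minimal sketch: the function name and the file names in the usage note are hypothetical, it only checks that the two lists have the same length (it cannot verify that the order actually matches), and it places the library type before the read files so that it is known when the reads are interpreted.

```python
def salmon_quant_command(index_dir, lib_type, left_files, right_files, out_dir, threads=None):
    """Build an argument list for quantifying several paired-end files as one library."""
    if not left_files:
        raise ValueError("at least one pair of read files is required")
    if len(left_files) != len(right_files):
        raise ValueError("left and right file lists must have the same length")
    # The library type comes before the read files.
    cmd = ["salmon", "quant", "-i", index_dir, "-l", lib_type]
    cmd += ["-1", *left_files]   # all left mates, in order
    cmd += ["-2", *right_files]  # all right mates, in the same order
    if threads is not None:
        cmd += ["-p", str(threads)]
    cmd += ["-o", out_dir]
    return cmd
```

For example, calling it with left files `a_1.fq`, `b_1.fq` and right files `a_2.fq`, `b_2.fq` yields a list that can be passed directly to `subprocess.run`.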

However, this script assumes that the input reads are perfectly synchronized. That is, the input cannot contain any un-paired reads. This contains the quantification results from the run, and the columns it contains are similar to those of Sailfish (and self-explanatory where they differ).
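The synchronization requirement for interleaved input can be illustrated with a small splitter that separates an interleaved FASTQ stream into left and right mates and stops at the first un-paired read. This is an illustrative stand-in, not the script provided with Salmon: the function names are hypothetical, records are assumed to span exactly four lines, and mates are assumed to share a name apart from an optional `/1` or `/2` suffix.

```python
def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        record = [handle.readline().rstrip("\n") for _ in range(4)]
        if not record[0]:
            return  # end of input
        if not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {record[0]}")
        yield tuple(record)


def mate_name(header):
    """Read name without the leading '@', any description, or a trailing /1 or /2."""
    fields = header[1:].split()
    name = fields[0] if fields else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def split_interleaved(handle, left_out, right_out):
    """Write alternating records to left_out and right_out; return the number of pairs."""
    records = read_fastq(handle)
    pairs = 0
    for first in records:
        # The record immediately after a left mate must be its right mate.
        second = next(records, None)
        if second is None or mate_name(first[0]) != mate_name(second[0]):
            raise ValueError(f"un-paired read in interleaved input: {first[0]}")
        left_out.write("\n".join(first) + "\n")
        right_out.write("\n".join(second) + "\n")
        pairs += 1
    return pairs
```

All three arguments are open text handles, so the same function works on files or in-memory streams.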

For the full set of options that can be passed to Salmon in its alignment-based mode, and a description of each, run salmon quant --help-alignment.

Salmon expects that the alignment files provided are with respect to the transcripts given in the corresponding fasta file. That is, Salmon expects that the reads have been aligned directly to the transcriptome (like RSEM, eXpress, etc.). If you have reads that have already been aligned to the genome, there are currently 3 options for converting them for use with Salmon.

Third, you could use a tool like sam-xlate to try and convert the genome-coordinate BAM files directly into transcript coordinates. This avoids the necessity of having to re-map the reads. However, we have very limited experience with this tool so far.

If your alignments for the sample you want to quantify appear in multiple files, Salmon will automatically read through these one after the other and quantify transcripts using the alignments contained therein. However, it is currently the case that these separate files must (1) all be of the same type and (2) all be aligned with respect to the same reference (i.e. transcriptome).

Salmon exposes a number of useful optional command-line parameters to the user. The particularly important ones are explained here, but you can always run salmon quant -h to see them all.

Enables selective alignment of the sequencing reads when mapping them to the transcriptome. This can improve both the sensitivity and specificity of mapping and, as a result, can improve quantification accuracy. If you pass the --validateMappings flag to salmon, in addition to using a more sensitive and accurate mapping algorithm, it will run an extension alignment dynamic program on the potential mappings it produces.

The alignment procedure used to validate these mappings makes use of the highly-efficient and SIMD-parallelized ksw2 library. Moreover, salmon makes use of an intelligent alignment cache to avoid computing alignment scores against the same transcript sequences (e.g. when multiple reads map to the same set of transcripts).

The exact parameters used for scoring alignments, and the cutoff used for which mappings should be reported at all, are controllable by parameters described below.

These settings essentially disallow indels in the resulting alignments. This flag (which should only be used with selective alignment) turns off soft filtering and range-factorized equivalence classes, and removes all but the equally highest scoring mappings from the equivalence class label for each fragment. While we recommend soft filtering (the default) for quantification, this flag can produce easier-to-understand equivalence classes if that is the primary object of study. Related to the above, this flag will stop execution before the actual quantification algorithm is run.
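The rule of keeping only the equally highest scoring mappings for a fragment can be written down in a few lines. This is a conceptual sketch rather than Salmon's implementation: the function name, the `(transcript, score)` tuple layout, and the example transcript names are all assumptions made for illustration.

```python
def keep_best_scoring(mappings):
    """Keep only the mappings that tie for the highest alignment score.

    `mappings` is a list of (transcript_name, alignment_score) tuples for one fragment.
    """
    if not mappings:
        return []
    best_score = max(score for _, score in mappings)
    # Every mapping that ties for the best score survives; all others are dropped.
    return [(name, score) for name, score in mappings if score == best_score]
```

For a fragment with mappings `[("txA", 98), ("txB", 100), ("txC", 100)]`, only `txB` and `txC` remain, so the fragment's equivalence class label would contain just those two transcripts.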

Dovetailing mappings and alignments are considered discordant and discarded by default - this is the same behavior that is adopted by default in Bowtie2.



