Data Requirements

  1. You must have an approximate genome size estimate for your group

    • Note: If genome size estimates vary greatly within your group, use an estimate on the larger end
  2. Raw or trimmed NGS reads for each taxon

    • Currently, starting NGS reads should either be all untrimmed or all trimmed
    • Paired-end read files must share the same basename and end in _1.fastq.gz and _2.fastq.gz to be recognized as a pair. Any other naming convention will trigger single-end assignment
    • SISRS default parameters require 3X depth to call a site, so higher per-taxon coverage is ideal. We recommend at least 10X coverage. The pipeline can be run with less, but site recovery will decline as coverage drops
    • All read files should be of the same ‘type’ (e.g., don’t mix DNA-seq and RNA-seq)
    • Ensure high sequence data quality prior to analysis (low read quality and high sequence duplication levels are both red flags for analysis). Note that the built-in trimming scripts are fairly conservative
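
The naming and coverage requirements above can be checked before starting a run. The following is a minimal sketch, not part of SISRS itself: `classify_reads` and `estimated_coverage` are hypothetical helper names, the regular expression encodes only the `_1/_2.fastq.gz` rule stated above, and coverage is approximated as total sequenced bases divided by the genome size estimate. Treating a lone `_1` or `_2` file as single-end is an assumption of this sketch, not documented SISRS behavior.

```python
import re
from collections import defaultdict

# Assumption: a paired file is named "<basename>_1.fastq.gz" or "<basename>_2.fastq.gz".
PAIRED_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<mate>[12])\.fastq\.gz$")


def classify_reads(filenames):
    """Split read file names into complete pairs and single-end files."""
    mates = defaultdict(dict)
    singles = []
    for name in filenames:
        match = PAIRED_SUFFIX.match(name)
        if match:
            mates[match.group("base")][match.group("mate")] = name
        else:
            # Any other naming convention is treated as single-end.
            singles.append(name)

    paired = {}
    for base, found in mates.items():
        if set(found) == {"1", "2"}:
            paired[base] = (found["1"], found["2"])
        else:
            # Assumption of this sketch: a file without its mate counts as single-end.
            singles.extend(found.values())
    return paired, sorted(singles)


def estimated_coverage(total_bases, genome_size):
    """Approximate depth of coverage as sequenced bases per genome base."""
    return total_bases / genome_size


if __name__ == "__main__":
    files = ["A_1.fastq.gz", "A_2.fastq.gz", "B_R1.fastq.gz", "C.fastq.gz"]
    paired, singles = classify_reads(files)
    print("paired:", paired)
    print("single-end:", singles)
    # Hypothetical numbers: 35 Gb of reads against a 3.5 Gb genome estimate.
    print("coverage:", estimated_coverage(35_000_000_000, 3_500_000_000))
```

A taxon whose estimated coverage falls below the recommended 10X can still be included, but fewer sites are likely to be recovered for it.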