How To Write an RNA-Seq Methods Section Reviewers Accept
A methods section interrogation is not a pleasant experience. A PI asks why you set a particular parameter in STAR, and you remember choosing it from a forum post two months ago but cannot find the source. A reviewer returns the manuscript asking for the reference genome version, the Ensembl release, the exact filter applied before differential expression, and the multiple testing correction method. None of those questions are unfair. They are the minimum information needed to reproduce the analysis. The problem is that most RNA-seq methods sections are written after the analysis is complete, from memory, by someone who cannot fully recall every decision they made.
This post is a template and a checklist. Use it before you write the methods section, and use it again as a final pass before submission. The goal is not to write a long methods section. It is to write one specific enough that a reviewer cannot send it back asking for information you already have.
What Reviewers Actually Flag
Reviewers who know bioinformatics are looking for a small set of things that appear consistently in rejection comments. Understanding this list means you can pre-empt every objection before it arrives.
Missing tool versions. Citing DESeq2 without specifying the version tells the reviewer nothing useful. Defaults and bug fixes change between releases, so DESeq2 1.36 and DESeq2 1.40 may not produce identical numbers from the same input, and a reader trying to reproduce your analysis needs to know which version produced yours. The same applies to every tool in the pipeline: fastp, STAR, Salmon, featureCounts, clusterProfiler. Version numbers are not optional detail; they are the minimum scaffold for reproducibility.
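One way to avoid reconstructing versions from memory is to record them when the pipeline runs. The following is a minimal sketch, assuming each tool is on your PATH and accepts the version flag shown; the `PIPELINE` dictionary and the `tool_version` helper are illustrative names, not part of any tool.

```python
import shutil
import subprocess
import sys


def tool_version(command):
    """Return the first line a tool prints for its version flag, or None.

    `command` is an argv list such as ["fastp", "--version"]. Some tools
    print the version to stderr rather than stdout, so both are checked.
    """
    if shutil.which(command[0]) is None:
        return None  # tool is not installed in this environment
    result = subprocess.run(command, capture_output=True, text=True)
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


# Hypothetical pipeline: adjust the entries to the tools you actually ran.
PIPELINE = {
    "python": [sys.executable, "--version"],
    "fastp": ["fastp", "--version"],
    "STAR": ["STAR", "--version"],
    "salmon": ["salmon", "--version"],
}

for name, command in PIPELINE.items():
    print(f"{name}\t{tool_version(command) or 'NOT FOUND'}")
```

Saving this output next to the results of each run gives you a dated version record to copy from when you write the paragraph.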
Missing reference genome version. “Reads were aligned to the human genome” tells a reviewer you used a human genome. It does not tell them which assembly (GRCh38, T2T-CHM13), which patch level (GRCh38.p14), whether you used the primary assembly or the full assembly with ALT contigs, or which Ensembl or GENCODE release provided the gene annotation. Each of those choices affects the gene count, the number of alignable reads, and the list of genes that appear in your count matrix.
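Keeping the reference details as a small structured record makes it harder to leave one out. This is only a sketch: the `ReferenceSpec` class and its field names are hypothetical, and the values in the example are illustrative rather than a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceSpec:
    organism: str
    assembly: str            # assembly name with patch level, e.g. "GRCh38.p14"
    sequence_set: str        # "primary assembly" or "full assembly with ALT contigs"
    annotation_source: str   # "GENCODE" or "Ensembl"
    annotation_release: str  # release number as a string

    def sentence(self):
        """Render the record as a methods-section sentence."""
        return (
            f"Reads were aligned to the {self.organism} reference genome "
            f"({self.assembly}, {self.sequence_set}) with "
            f"{self.annotation_source} release {self.annotation_release} "
            f"gene annotation."
        )


# Illustrative values only; substitute the files you actually used.
ref = ReferenceSpec("human", "GRCh38.p14", "primary assembly", "GENCODE", "44")
print(ref.sentence())
```

Because every field is required, constructing the record fails with a `TypeError` if any of the choices a reviewer will ask about is missing.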
Multiple testing correction omitted or unspecified. If the methods say “genes with p < 0.05 were considered significant,” a reviewer familiar with the field will immediately ask whether that is a raw p-value or an adjusted one. Omitting the word “adjusted” is the most common single error in RNA-seq methods sections, and it is the one most likely to trigger a reproducibility concern because raw p-value thresholds in a multi-gene test are scientifically indefensible.
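The gap between raw and adjusted thresholds is easy to see on a toy example. The sketch below implements the Benjamini-Hochberg adjustment in plain Python; the ten p-values are made up for illustration, and in a real analysis DESeq2 or edgeR computes the adjusted values for you.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw p-values for ten genes.
raw = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.5, 0.9]
padj = benjamini_hochberg(raw)

n_raw = sum(p < 0.05 for p in raw)
n_adj = sum(p < 0.05 for p in padj)
print(n_raw, n_adj)  # prints: 5 2
```

Five genes pass a raw threshold of 0.05 but only two pass after adjustment, which is why a methods sentence that leaves out the word "adjusted" cannot be interpreted.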
The enrichment background. Methods sections for enrichment analysis routinely say “gene ontology analysis was performed using clusterProfiler” and stop there. They do not specify whether ORA or GSEA was used, what background gene set was supplied, or what significance threshold was applied. All three choices materially affect the result.
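To see why the background matters, consider over-representation analysis as a hypergeometric test. The sketch below is a plain-Python illustration, not clusterProfiler's implementation, and every count in it is invented: 500 differentially expressed genes, 15 of which fall in a term that has 200 annotated genes genome-wide but only 150 among the genes actually tested.

```python
from math import comb


def ora_pvalue(overlap, de_genes, term_genes, background):
    """Upper-tail hypergeometric p-value, P(X >= overlap).

    overlap     DE genes annotated to the term
    de_genes    total DE genes
    term_genes  genes annotated to the term within the background
    background  size of the background gene set
    """
    total = comb(background, de_genes)
    upper = min(de_genes, term_genes)
    tail = sum(
        comb(term_genes, k) * comb(background - term_genes, de_genes - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Same DE list and same overlap, two different backgrounds (invented counts).
p_genome = ora_pvalue(15, 500, 200, 20000)  # all annotated genes
p_tested = ora_pvalue(15, 500, 150, 12000)  # genes that passed filtering

print(f"whole-genome background: {p_genome:.2e}")
print(f"tested-universe background: {p_tested:.2e}")
```

With these numbers the whole-genome background yields the smaller p-value, because it counts genes that were never expressed as evidence that the term is rare. A reader cannot judge an enrichment result without knowing which of the two was used.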
Citing a URL instead of the tool paper. Citing https://github.com/alexdobin/STAR in the references section is not a citation in the bibliographic sense; it is a URL that may break. Every major bioinformatics tool has a citable primary paper. Cite the paper.
The five things reviewers flag most often in RNA-seq methods
1. Tool versions absent for every step in the pipeline.
2. Reference genome assembly and annotation release unspecified.
3. p-values reported without clarifying they are adjusted (or not).
4. Enrichment analysis background and method unspecified.
5. Tools cited by URL or software name alone rather than by their primary publication.
The One-Paragraph-Per-Stage Rule
The structure that works best for reviewers and for reproducibility is one paragraph per analysis stage, in the order the analysis was performed. This is not a rigid rule, but it is a reliable default for most bulk RNA-seq manuscripts.
The stages for a standard bulk RNA-seq analysis are: library preparation and sequencing (or a pointer to the core facility’s protocol), quality control and trimming, alignment or pseudoalignment and quantification, differential expression analysis, and enrichment analysis. Each stage gets its own paragraph. Each paragraph names the tool, specifies the version, states the key parameters, and names the reference file where applicable.
What does not belong in the main text methods: figures of the pipeline itself, long lists of every parameter passed to a tool, detailed output statistics (those belong in supplementary tables), and descriptions of how a tool works when you ran it entirely with well-established defaults and are citing its primary paper.
What belongs in supplementary methods rather than main text: custom scripts, non-standard parameter choices that require extended justification, organism-specific decisions that most readers will not encounter, and the full sessionInfo() output from R.
A Working Template
The following template covers a standard paired-end poly(A)-selected bulk RNA-seq experiment. Bracketed items are placeholders to fill in with your specific values. Adapt the structure for your organism, library type, and specific tools.
Quality control and read trimming. Raw sequencing reads were assessed for quality using FastQC (version [X.X]) and MultiQC (version [X.X]). Adapter sequences and low-quality bases were removed using fastp (version [X.X]) with the following parameters: [list parameters differing from defaults, e.g., --detect_adapter_for_pe, --cut_tail, --qualified_quality_phred 20]. Post-trimming quality was re-assessed with FastQC.
Alignment and quantification. Trimmed reads were aligned to the [GRCh38.p14 / GRCm39 / specify assembly] reference genome using STAR (version [X.X]) with GENCODE [release number] gene annotation. Alternatively: Transcript-level abundances were estimated using Salmon (version [X.X]) in [selective alignment / quasi-mapping] mode against a decoy-aware transcriptome index built from GENCODE [release number] transcript sequences and the [GRCh38.p14] genome as decoy. Gene-level counts were obtained using tximport (version [X.X]) with countsFromAbundance set to [no / lengthScaledTPM / scaledTPM].
Differential expression analysis. Differential expression analysis was performed using DESeq2 (version [X.X]) in R (version [X.X]). The design formula was [~condition / ~batch + condition / specify]. Genes with fewer than [10] total counts summed across all samples were excluded prior to analysis. Fold change estimates were shrunk using the apeglm method via the lfcShrink function. Genes with a Benjamini-Hochberg adjusted p-value below [0.05] and an absolute log2 fold change exceeding [1] were considered differentially expressed.
Enrichment analysis. Gene ontology and pathway enrichment analysis was performed using clusterProfiler (version [X.X]). Over-representation analysis was performed using the enrichGO function with the tested gene universe as the background, Biological Process ontology, and Benjamini-Hochberg correction applied to all tested terms. Gene set enrichment analysis was performed using gseGO with genes ranked by DESeq2 Wald statistic. A significance threshold of adjusted p-value below 0.05 was applied to both analyses. Redundant GO terms were reduced using the simplify function with a similarity cutoff of [0.7].
Adjust the quantification paragraph for your approach: if you used featureCounts instead of Salmon, specify the strandedness setting, the feature type used for counting, and whether multi-mapping reads were included. If you used edgeR instead of DESeq2, specify the normalization method (TMM), the test (quasi-likelihood F-test or likelihood ratio test), and the dispersion estimation approach.
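If you keep the template as text, a few lines of code can fill it in and refuse to produce output while any placeholder is still empty. This is a sketch under assumptions: the named placeholders (`[deseq2_version]` and so on) are a hypothetical variant of the bracketed items above, and the values passed in are examples, not recommendations.

```python
import re

# Shortened version of the differential expression paragraph,
# rewritten with named placeholders (hypothetical names).
TEMPLATE = (
    "Differential expression analysis was performed using DESeq2 "
    "(version [deseq2_version]) in R (version [r_version]). "
    "The design formula was [design]. Genes with a Benjamini-Hochberg "
    "adjusted p-value below [padj] and an absolute log2 fold change "
    "exceeding [lfc] were considered differentially expressed."
)

PLACEHOLDER = re.compile(r"\[([A-Za-z0-9_]+)\]")


def fill_template(template, values):
    """Substitute every [name] placeholder, failing loudly if one is missing."""
    missing = sorted(
        {name for name in PLACEHOLDER.findall(template) if name not in values}
    )
    if missing:
        raise KeyError(f"unfilled placeholders: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


# Example values only; take yours from the version record of the actual run.
paragraph = fill_template(TEMPLATE, {
    "deseq2_version": "1.40.2",
    "r_version": "4.3.1",
    "design": "~batch + condition",
    "padj": 0.05,
    "lfc": 1,
})
print(paragraph)
```

Raising an error on a missing value is deliberate: a paragraph that still contains `[X.X]` should never reach the manuscript.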
Required vs Optional Elements
The following table distinguishes what is mandatory for reproducibility from what is good practice but not universally required by reviewers.
| Element | Required | Goes in main text | Goes in supplementary | Notes |
|---|---|---|---|---|
| Tool name and version | Yes | Yes | Also in sessionInfo() | Every tool, no exceptions |
| Reference genome assembly | Yes | Yes | - | Assembly name and patch level |
| Annotation release | Yes | Yes | - | Ensembl or GENCODE release number |
| Key non-default parameters | Yes | Yes | Full list in supplement | State parameters that affect results |
| Design formula | Yes | Yes | - | Exact formula string |
| Pre-filtering criterion | Yes | Yes | - | e.g., min 10 counts across samples |
| Fold change shrinkage method | Yes | Yes | - | apeglm, ashr, or normal |
| Significance thresholds | Yes | Yes | - | Both padj and log2FC cutoffs |
| Multiple testing correction method | Yes | Yes | - | Must say Benjamini-Hochberg explicitly |
| Enrichment background | Yes | Yes | - | Tested universe, not full genome |
| Enrichment method (ORA vs GSEA) | Yes | Yes | - | Both if both were used |
| Custom scripts or code | Yes | - | Yes + repo link | GitHub or Zenodo DOI |
| Full sessionInfo() or conda env | Yes | - | Yes | Version record for all packages |
| All default parameters | No | No | Optional | Assumed from cited tool paper |
| Pipeline DAG or flowchart | No | Optional | Optional | Useful but not required |
| Raw QC metrics per sample | No | No | Yes | As supplementary table |
| Alignment rate per sample | No | No | Yes | As supplementary table |
Citing Tools Correctly
Every tool in your pipeline has a citable paper. The following are the primary citations for the most commonly used tools. Check that the version you used is covered by the citation and use a more recent paper if the tool has been substantially updated since its original publication.
STAR: Dobin et al., 2013, Bioinformatics.
Salmon: Patro et al., 2017, Nature Methods.
HISAT2: Kim et al., 2019, Nature Biotechnology.
featureCounts: Liao et al., 2014, Bioinformatics.
tximport: Soneson et al., 2015, F1000Research.
DESeq2: Love et al., 2014, Genome Biology.
edgeR: Robinson et al., 2010, Bioinformatics; Chen et al., 2016, F1000Research for the quasi-likelihood extension.
limma-voom: Law et al., 2014, Genome Biology; Ritchie et al., 2015, Nucleic Acids Research.
clusterProfiler: Yu et al., 2012, OMICS; Wu et al., 2021, The Innovation for the updated version.
fgsea: Korotkevich et al., 2021, bioRxiv; Sergushichev, 2016, bioRxiv for the earlier version.
fastp: Chen et al., 2018, Bioinformatics.
Trimmomatic: Bolger et al., 2014, Bioinformatics.
FastQC: Andrews, 2010, available at bioinformatics.babraham.ac.uk.
MultiQC: Ewels et al., 2016, Bioinformatics.
When in doubt, look up the tool on Bioconductor or its GitHub page. Most tools link their primary paper prominently. If a tool has been updated significantly since its original publication and you used a recent version, cite both the original paper and the methods paper describing the update.
The Final Pre-Submission Check
Before submitting, read your methods section against this checklist. It takes less than five minutes and will pre-empt the majority of reviewer revision requests on the computational side.
- Every tool in the pipeline is named with its version number.
- The reference genome assembly and annotation release are specified by name and release number, not by organism alone.
- The design formula is written out explicitly.
- The filtering step before differential expression is described.
- The fold change shrinkage method is named.
- The significance thresholds state that p-values are adjusted, and name the correction method.
- The enrichment analysis specifies whether ORA or GSEA was used, what background was supplied, and what significance threshold was applied.
- Every tool is cited by its primary publication, not by URL alone.
- Custom analysis code is deposited in a repository with a DOI.
If every item on that list is present, your methods section will not come back from reviewers with requests for missing information. It may come back with requests to change the analysis, but that is a different kind of revision.
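Parts of this check can be automated as a rough first pass. The sketch below scans a draft for a few of the omissions discussed in this post using simple pattern matching; the `TOOLS` list and the `lint_methods` function are illustrative, the patterns assume versions are written as "(version X.Y)" as in the template, and a clean result does not replace reading the section yourself.

```python
import re

# Tool names to look for; extend this list to match your own pipeline.
TOOLS = ["FastQC", "MultiQC", "fastp", "STAR", "Salmon", "featureCounts",
         "tximport", "DESeq2", "edgeR", "clusterProfiler"]


def lint_methods(text):
    """Return a list of warnings for common omissions in a methods draft."""
    warnings = []

    # Bracketed template placeholders that were never filled in.
    for placeholder in re.findall(r"\[[^\]]+\]", text):
        warnings.append(f"unfilled placeholder: {placeholder}")

    # Tools that are mentioned but never followed by "(version N..." or "(vN...".
    for tool in TOOLS:
        name = re.escape(tool)
        mentioned = re.search(rf"\b{name}\b", text)
        versioned = re.search(rf"\b{name}\s*\((?:version\s*|v)\d", text)
        if mentioned and not versioned:
            warnings.append(f"no version given for {tool}")

    # Sentences that state a p-value threshold without saying it is adjusted.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        has_p = re.search(r"\bp\s*<|p-value", sentence, re.IGNORECASE)
        qualified = re.search(r"adjusted|padj|FDR", sentence, re.IGNORECASE)
        if has_p and not qualified:
            warnings.append(f"p-value not stated as adjusted: {sentence.strip()}")

    # URLs, which may be standing in for a citation.
    for url in re.findall(r"https?://\S+", text):
        warnings.append(f"URL in text, check the primary paper is cited: {url}")

    return warnings


draft = (
    "Reads were aligned with STAR (https://github.com/alexdobin/STAR). "
    "Differential expression used DESeq2 (version [X.X]). "
    "Genes with p < 0.05 were considered significant."
)
for warning in lint_methods(draft):
    print(warning)
```

The example draft triggers five warnings: one placeholder, two tools without versions, one unqualified p-value, and one URL.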
NotchBio generates a complete methods paragraph for every run automatically, including tool versions, parameter choices, reference genome version, and formatted citations for each tool used. If you are running analyses on the platform, the methods text is ready to paste into your manuscript the moment the run completes.