Back to blog
Tutorial

How to Set Up a Bulk RNA-Seq Analysis Environment on Ubuntu and macOS

By Abdullah Shahid · · 13 min read

Most RNA-seq tutorials start with a pipeline diagram. This one does not.

Before you can run a single line of DESeq2 code, you need a working environment. That means conda, the right channels, a locked-down set of tool versions, and an R installation that can actually compile Bioconductor packages without failing at gfortran.

This first post in the series shows you exactly how to build that environment from a clean laptop. Linux, macOS Intel, and Apple Silicon are all covered. The goal is a reproducible setup you can rebuild from a single environment.yml file six months from now.

Diagram showing the bulk RNA-seq environment stack: OS layer at the bottom (Ubuntu, macOS Intel, macOS Apple Silicon), Miniforge and mamba as the package manager, conda-forge and bioconda as the channels, a named rnaseq environment containing FastQC, fastp, Salmon, STAR, samtools, and MultiQC on top, with a parallel R 4.4 layer containing DESeq2, tximeta, and edgeR from Bioconductor
Figure 1: The full RNA-seq analysis stack. Miniforge + mamba manage the conda environments. The conda-forge and bioconda channels supply the command-line tools. R 4.4 and Bioconductor supply the statistical packages for differential expression.

What You Need Before You Start

Your machine needs to meet a few minimum specs. Skip this check and you will hit a wall at the indexing step.

RAM: 16 GB minimum for Salmon. 64 GB for STAR with a human genome. If you have 8 GB, you can still do QC and DESeq2 but you cannot build a STAR index locally.

Disk: 100 GB free for a typical 6-sample human experiment. Reference files alone take around 30 GB.

OS: Ubuntu 22.04 or newer on Linux. macOS 13 Ventura or newer. Windows works only through WSL2.

Terminal: You need to be comfortable running commands in bash or zsh. If you have never opened Terminal, start with a shell tutorial first.

Apple Silicon users, read this first

If you have an M1, M2, M3, or M4 Mac, you must install the ARM64 version of Miniforge. Do not use the Intel installer, even with Rosetta. Bioconda has full ARM64 support since 2024, so most tools install natively. A few legacy R packages still need extra setup. We cover that below.

How to Install Miniforge on Linux and macOS

We use Miniforge instead of Anaconda or Miniconda. Three reasons.

Miniforge ships with conda-forge pre-configured and no Anaconda defaults channel. Since 2024, Anaconda Inc. requires a paid license for research institutions using their defaults channel. Miniforge sidesteps this entirely.

Miniforge also bundles mamba, which is a drop-in replacement for conda that resolves dependencies about 10x faster. For any non-trivial bioinformatics environment this matters a lot.

Linux x86_64 installation

Terminal window
# Download the installer
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh"
# Run it (accept defaults, answer yes when asked about conda init)
bash Miniforge3-Linux-x86_64.sh
# Close and reopen your terminal, then verify
conda --version
# conda 24.11.3
mamba --version
# mamba 1.5.10

macOS installation (Intel and Apple Silicon)

Terminal window
# Apple Silicon (M1/M2/M3/M4)
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-arm64.sh"
bash Miniforge3-MacOSX-arm64.sh
# Intel Mac
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-MacOSX-x86_64.sh"
bash Miniforge3-MacOSX-x86_64.sh

Restart your terminal after the installer finishes.

Verify the installation. Your prompt should now show (base) at the start.

/Users/yourname/miniforge3/bin/conda
which conda
which mamba
# /Users/yourname/miniforge3/bin/mamba

Never install packages into the base environment

The (base) environment is for managing other environments, nothing else. Always create a named environment for your project. This keeps your setup reproducible and makes it easy to wipe and rebuild when something breaks.

How to Configure the Bioconda Channel Correctly

Bioconda is the channel that hosts most RNA-seq tools. Configure it once per machine.

The channel priority matters. Bioconda’s official docs recommend this exact order (Bioconda usage documentation).

Terminal window
conda config --add channels bioconda
conda config --add channels conda-forge
conda config --set channel_priority strict

The last command is the critical one. Strict priority prevents conda from mixing packages across channels in ways that break your environment.

Verify the configuration.

Terminal window
cat ~/.condarc

You should see:

channels:
- conda-forge
- bioconda
channel_priority: strict

Why conda-forge goes first, not bioconda

Older tutorials tell you to put bioconda on top. That recommendation is outdated. Bioconda depends on conda-forge for most of its base libraries. If bioconda has higher priority, conda will pull in older R and Python versions from bioconda that do not match newer tools. The current recommended order is conda-forge first, bioconda second, strict priority.

Two-panel diagram: left panel shows the old incorrect channel configuration with bioconda on top of conda-forge causing version conflicts in red; right panel shows the correct 2024 configuration with conda-forge on top of bioconda and strict priority set, rendered in green with a checkmark indicating the correct setup
Figure 2: Channel priority for Bioconda changed in 2024. conda-forge must be listed first (higher priority), bioconda second, with strict priority enabled. Older tutorials show the reversed order, which causes dependency conflicts.

How to Create a Reproducible RNA-Seq conda Environment with environment.yml

Now for the actual environment. We define it in a YAML file instead of running a series of conda install commands.

Why a file? Because in six months you will want to rebuild this exact environment on a new machine. A conda install history is lost the moment your terminal closes. An environment.yml is version-controlled and reproducible.

Create a file called rnaseq-env.yml:

name: rnaseq
channels:
- conda-forge
- bioconda
dependencies:
# Core Python
- python=3.11
- pip
# QC tools
- fastqc=0.12.1
- fastp=0.24.0
- multiqc=1.25.1
# Quantification and alignment
- salmon=1.10.3
- star=2.7.11b
- hisat2=2.2.1
# BAM handling
- samtools=1.21
- subread=2.0.8 # for featureCounts
# Data download
- sra-tools=3.1.1
- parallel=20240922 # GNU parallel for batch jobs
# Python data science stack
- pandas=2.2.3
- numpy=2.1.3
- scipy=1.14.1
- scikit-learn=1.5.2
- matplotlib=3.9.2
- seaborn=0.13.2
- plotly=5.24.1
# Python DEA
- pip:
- pydeseq2==0.5.4
- gseapy==1.1.4
- mygene==3.2.2

Every package is pinned to a specific version. This is intentional.

A pandas=2.2 pin means you always get the same minor version. When you publish your results, another lab can recreate your exact stack by running one command.

Create the environment

Terminal window
# Use mamba for speed (10x faster than conda here)
mamba env create -f rnaseq-env.yml
# Activate it
conda activate rnaseq
# Your prompt now shows (rnaseq) instead of (base)

This takes 5-15 minutes depending on your network. Mamba downloads and resolves everything in parallel.

Verify the tools work

Terminal window
fastqc --version # FastQC v0.12.1
fastp --version # fastp 0.24.0
salmon --version # salmon 1.10.3
STAR --version # 2.7.11b
samtools --version | head -1 # samtools 1.21
multiqc --version # multiqc, version 1.25.1

If any of these fail, see the error section at the end.

CategoryTools in the EnvironmentWhat They Do
QCFastQC, fastp, MultiQCRead quality checks and trimming
AlignmentSTAR, HISAT2Map reads to a reference genome
QuantificationSalmon, Subread (featureCounts)Count reads per gene
HandlingsamtoolsView, sort, and index BAM files
Data fetchsra-tools, GNU parallelDownload public FASTQ from SRA
Python analysispandas, scikit-learn, plotly, PyDESeq2, gseapyLoad counts, run DE and enrichment

How to Install R 4.4 and Bioconductor Packages for DESeq2

R goes in its own environment. Mixing R into the bioinformatics tools environment tends to cause library version conflicts.

Create a second YAML file, r-rnaseq.yml:

name: r-rnaseq
channels:
- conda-forge
- bioconda
dependencies:
- r-base=4.4.2
- r-essentials=4.4
- r-tidyverse=2.0.0
- r-ggplot2=3.5.1
- r-ggrepel=0.9.6
- r-patchwork=1.3.0
- r-pheatmap=1.0.12
- r-biocmanager=1.30.25
# Some Bioconductor packages are pre-built on bioconda
- bioconductor-deseq2=1.46.0
- bioconductor-tximeta=1.24.0
- bioconductor-tximport=1.34.0
- bioconductor-edger=4.4.0
- bioconductor-limma=3.62.1
- bioconductor-apeglm=1.28.0
- bioconductor-enhancedvolcano=1.24.0
- bioconductor-clusterprofiler=4.14.4
- bioconductor-org.hs.eg.db=3.20.0
- bioconductor-org.mm.eg.db=3.20.0

Create and activate:

Terminal window
mamba env create -f r-rnaseq.yml
conda activate r-rnaseq

Verify R and DESeq2 work:

Terminal window
R --version
# R version 4.4.2
R -e "library(DESeq2); sessionInfo()"
# Should print DESeq2 1.46.0 in the loaded packages list

Why install DESeq2 from bioconda instead of BiocManager

Installing DESeq2 through BiocManager::install("DESeq2") inside R works, but it compiles from source. That takes 20 to 40 minutes and fails on Apple Silicon without gfortran manually installed. The bioconductor-deseq2 package on bioconda is pre-compiled for your platform, so it installs in seconds. Use the R route only if you need a package not yet on bioconda.

Directory tree diagram showing the structure of two conda environments: rnaseq environment containing command-line tools like FastQC, fastp, Salmon, STAR, samtools, multiqc, and a PyDESeq2 python package tree; and a separate r-rnaseq environment containing R 4.4 with Bioconductor packages DESeq2, tximeta, edgeR, limma, and apeglm; with arrows showing that both environments live under miniforge3/envs/ and are activated with conda activate
Figure 3: Two parallel environments keep the command-line stack (conda activate rnaseq) separate from the R analysis stack (conda activate r-rnaseq). This isolation prevents dependency conflicts and makes each environment independently rebuildable.

Fixing the Most Common Install Errors

A few errors come up often enough to cover here. These are the ones that have eaten the most grad student hours.

”Unable to find conflicting dependencies”

This almost always means your channel priority is wrong. Run:

Terminal window
conda config --show channels

If you see defaults anywhere in the list, remove it:

Terminal window
conda config --remove channels defaults

Then rebuild the environment from the YAML file.

DESeq2 compilation fails on Apple Silicon

This happens when you install DESeq2 from inside R on an M-series Mac without gfortran. You will see an error like clang: error: linker command failed with exit code 1.

The fix: install DESeq2 through bioconda as shown above. If you must use BiocManager, install gfortran first from gcc.gnu.org/wiki/GFortranBinaries, then create a ~/.R/Makevars file pointing to it.

Packages install but R cannot find them

You activated the wrong environment. Run conda deactivate until your prompt shows (base), then conda activate r-rnaseq. The R in your PATH should then be the one inside the environment:

Terminal window
which R
# Should be /Users/yourname/miniforge3/envs/r-rnaseq/bin/R
# NOT /usr/local/bin/R or /Library/Frameworks/R.framework/...

Environment creation takes hours and never finishes

You are probably using conda env create instead of mamba env create. Cancel it with Ctrl+C and use mamba. If you already have conda but not mamba, install mamba into your base environment:

Terminal window
conda install -n base -c conda-forge mamba

macOS: “operation not permitted” when running tools

macOS Gatekeeper is blocking binaries from conda. First time you run each tool, you may need to right-click the binary in Finder and select Open. Usually happens once per binary, not repeatedly.

Do not mix pip install with conda install for the same package

Installing a package with pip after conda has already installed it creates two parallel copies. R imports get confused, Python imports break randomly. If a package is available in conda-forge or bioconda, always use conda. Reserve pip for packages like PyDESeq2 that are not yet on bioconda.

Exporting and Sharing Your Environment

Once your environment works, export it so you can rebuild or share it.

Terminal window
conda activate rnaseq
conda env export > rnaseq-env-lock.yml
# Or without the build strings (more portable across OS)
conda env export --no-builds > rnaseq-env.yml

Commit this file to your git repository. Anyone who clones your repo can rebuild the exact environment with:

Terminal window
mamba env create -f rnaseq-env.yml

That is the whole reason we pinned versions upfront. Reproducibility is not a bonus, it is the reason to use conda at all.

Manual Setup vs NotchBio: What You Actually Have to Maintain

Setting up this environment takes between one and three hours for someone who has done it before. Closer to a full day for someone who has not.

That is just the first time. You also have to maintain it. New versions of tools arrive every few months. A fresh macOS release sometimes breaks an installer. Bioconductor releases new versions of DESeq2 twice a year.

If you are running RNA-seq analyses regularly, this maintenance is part of the job. If you are running it occasionally or you just want to focus on the biology, it is pure overhead.

NotchBio handles the environment for you. The same tools covered in this post run on our infrastructure with pinned versions and verified compatibility. You upload FASTQ files or a count matrix, and the pipeline runs without you touching conda once.

Side-by-side comparison diagram: left side labeled Manual Setup shows a frustrated developer icon with a timeline of 2-4 hours including stages for Miniforge install, channel config, YAML creation, conda environment build, R install, and debugging errors, with a warning icon noting ongoing maintenance needed; right side labeled NotchBio shows a browser window icon with the text Upload FASTQ and Run, timing of 5 minutes with a green checkmark, and text noting that all tools, references, and versions are managed for you
Figure 4: Manual setup front-loads hours of work before you can analyze anything. NotchBio skips the entire stack setup so you go from FASTQ to results in minutes, with the same pipeline tools running on managed infrastructure.

The comparison below shows where each approach pays off.

DimensionManual Setup (This Tutorial)NotchBio
First-time setup1 to 3 hours, longer for beginners0 minutes
Install errorsYou debug conda conflicts, gfortran, channel priorityNone, stack is pre-built
OS supportLinux and macOS; Windows needs WSL2Any browser on any OS
Apple SiliconMostly works; a few R packages still need workaroundsFully supported out of the box
Version lockManual, via environment.yml you maintainBuilt in, every run is version-tagged
Tool updatesYou decide when to upgrade and rebuildManaged centrally with changelogs
ReproducibilityGood, if you commit your YAML fileGuaranteed, every run is reproducible
Time to first result4 to 8 hours for a new user5 minutes after upload
Maintenance costOngoing: ~1 day per month on active projectsZero
Good fit forCustom pipelines, advanced users, HPC clustersFast results, busy labs, teaching settings

If the setup above is costing you more time than the biology it supports, try notchbio.app instead. Upload your FASTQ files and the same tools from this tutorial run automatically on our infrastructure.

Further reading

Read another related post

View all posts