Back to blog
Tutorial

Pathway Enrichment Analysis: GSEA and ORA in R and Python

By Abdullah Shahid · · 13 min read

A list of 1,200 differentially expressed genes is not a result. It is raw material.

Without pathway context, you cannot answer the biological question your experiment was designed to address. Is the treatment activating an inflammatory response? Suppressing a metabolic pathway? Inducing an epithelial-to-mesenchymal transition? Pathway enrichment analysis answers these questions by testing whether biologically coherent gene sets are systematically shifted in your data.

This tutorial covers two complementary approaches: Gene Set Enrichment Analysis (GSEA), which uses the full ranked gene list, and Over-Representation Analysis (ORA), which tests a significant DEG list against a background. Both are implemented in R with clusterProfiler and in Python with gseapy. We use the results from Blog 7 and Blog 8 as input.

Two-column overview diagram comparing GSEA and ORA approaches: left column labeled GSEA shows a ranked gene list with a running enrichment score curve below it, annotated with NES and leading edge genes; right column labeled ORA shows a Venn diagram of DEG list overlapping with a pathway gene set, with a 2x2 contingency table and Fisher's exact test notation below; center divider shows inputs with arrows pointing to each method
Figure 1: GSEA and ORA test different questions. GSEA asks whether a pathway is systematically enriched across the full ranked list of genes. ORA asks whether your significant DEG list contains more pathway members than expected by chance. Use GSEA when you have a continuous ranking metric. Use ORA when you have a hard DEG cutoff and want to interpret that specific list.

GSEA vs ORA: Which One to Use

People often run both and report whichever gives more impressive results. That is not the right approach.

GSEA requires no significance threshold. It uses every gene in your experiment, ranked by a continuous metric. It is more sensitive to subtle, coordinated shifts across a pathway. It is the preferred method when you want to capture changes in biological processes, not just the loudest individual genes.

ORA requires a defined DEG list. It asks whether the genes you called significant happen to cluster into known pathways more than random chance would predict. It is faster to interpret but loses information about genes just below the significance threshold. Use it when your DEG list has clear biological meaning you want to annotate.

In practice, GSEA with the full ranked list is the primary analysis. ORA on the significant DEGs is a useful complement for generating pathway hypotheses.

Which ranking metric to use for GSEA

The Wald test statistic (the stat column in DESeq2 results) is a better ranking metric than log2 fold change alone. It encodes both effect size and statistical confidence because it is the fold change divided by its standard error. Genes with large, well-supported fold changes rank higher than genes with large but noisy fold changes. Always use stat for ranking when it is available.

How to Run GSEA in R with clusterProfiler and MSigDB Hallmark Gene Sets

Install required packages

# Install once
BiocManager::install(c("clusterProfiler", "enrichplot", "org.Hs.eg.db"))
install.packages("msigdbr")

Load the packages and results

library(clusterProfiler)
library(enrichplot)
library(msigdbr)
library(org.Hs.eg.db)
library(ggplot2)
library(dplyr)
library(readr)
# Load DESeq2 results from Blog 5
res <- read_csv("results/deseq2/all_genes.csv") |>
filter(!is.na(stat), !is.na(symbol))

Build the ranked gene list

# Use the Wald stat as ranking metric
# Strip version numbers from Ensembl IDs
res <- res |>
mutate(gene_id_clean = sub("\\..*", "", gene_id))
# Map Ensembl IDs to Entrez IDs
# clusterProfiler needs Entrez for most MSigDB lookups
id_map <- bitr(
res$gene_id_clean,
fromType = "ENSEMBL",
toType = "ENTREZID",
OrgDb = org.Hs.eg.db
)
res_mapped <- res |>
left_join(id_map, by = c("gene_id_clean" = "ENSEMBL")) |>
filter(!is.na(ENTREZID)) |>
distinct(ENTREZID, .keep_all = TRUE)
# Named numeric vector: names = Entrez IDs, values = Wald stat
ranked_list <- res_mapped$stat
names(ranked_list) <- res_mapped$ENTREZID
ranked_list <- sort(ranked_list, decreasing = TRUE)
cat("Ranked list length:", length(ranked_list), "\n")
# Typically 15,000-19,000 genes with Entrez mappings

Fetch MSigDB Hallmark gene sets

# MSigDB Hallmark gene sets: 50 curated pathways
# Each represents a specific biological state or process
h_gene_sets <- msigdbr(species = "Homo sapiens", category = "H") |>
dplyr::select(gs_name, entrez_gene) |>
dplyr::rename(ont = gs_name, gene = entrez_gene)
# Alternatively, use C2 (KEGG, Reactome, Biocarta):
# c2_gene_sets <- msigdbr(species = "Homo sapiens", category = "C2")
# Or GO Biological Processes:
# c5_gene_sets <- msigdbr(species = "Homo sapiens", category = "C5", subcategory = "BP")
cat("Hallmark gene sets loaded:", length(unique(h_gene_sets$ont)), "\n")
# 50 pathways

Run GSEA

set.seed(2024)
gsea_result <- GSEA(
geneList = ranked_list,
TERM2GENE = h_gene_sets,
minGSSize = 15, # minimum genes in a set after filtering
maxGSSize = 500, # maximum genes per set
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
eps = 0, # more precise p-values (slower but important)
seed = TRUE,
verbose = FALSE
)
# View significant pathways
as.data.frame(gsea_result) |>
arrange(p.adjust) |>
dplyr::select(ID, NES, pvalue, p.adjust, setSize, core_enrichment) |>
head(10)

The key columns in the output are NES (normalized enrichment score), p.adjust (BH-corrected p-value), and core_enrichment (the leading edge genes).

A positive NES means the pathway is enriched in the up-regulated direction. A negative NES means enrichment in the down-regulated direction.

What NES means and what it does not mean

The normalized enrichment score corrects for gene set size, so you can compare NES values across pathways with different numbers of genes. However, you cannot meaningfully compare NES values across different experiments or different gene set collections. Always report both NES and the adjusted p-value. Never rank pathways by NES alone.

Visualize with dotplot and ridgeplot

library(enrichplot)
# Dotplot of top enriched pathways
p_dot <- dotplot(
gsea_result,
showCategory = 15,
x = "NES",
color = "p.adjust",
size = "setSize",
label_format = 40
) +
scale_color_gradient(low = "#0D9488", high = "#F97316",
name = "Adjusted p") +
labs(title = "GSEA: MSigDB Hallmark (Treated vs Control)") +
theme_minimal(base_size = 11)
ggsave("results/figures/gsea_dotplot.pdf",
p_dot, width = 8, height = 7, dpi = 300, device = "pdf")
Horizontal dotplot from clusterProfiler GSEA showing 15 enriched MSigDB Hallmark pathways on the y-axis and NES on the x-axis; dots are colored from teal for low adjusted p-value to orange for high adjusted p-value; dot size represents gene set size; pathways like HALLMARK_INTERFERON_GAMMA_RESPONSE and HALLMARK_INFLAMMATORY_RESPONSE are at the right with positive NES values; HALLMARK_G2M_CHECKPOINT and HALLMARK_E2F_TARGETS are at the left with negative NES values; a vertical dashed line marks NES equals zero
Figure 2: GSEA dotplot from clusterProfiler. Pathways to the right of zero are enriched in the up-regulated direction. Pathways to the left are enriched in the down-regulated direction. Dot color shows significance (adjusted p-value) and size shows how many genes from that set were in your ranked list.

How to Run Over-Representation Analysis (ORA) with enrichGO

ORA tests whether a list of significant genes is enriched for specific GO terms compared to the background of all tested genes.

# Significant DEGs only
sig_genes <- res_mapped |>
filter(padj < 0.05, abs(log2FoldChange) > 1) |>
pull(ENTREZID)
# Background: all genes that were tested (not just significant ones)
background_genes <- res_mapped$ENTREZID
cat("Significant DEGs for ORA:", length(sig_genes), "\n")
cat("Background genes:", length(background_genes), "\n")
# Run GO Biological Process over-representation analysis
ora_result <- enrichGO(
gene = sig_genes,
universe = background_genes,
OrgDb = org.Hs.eg.db,
ont = "BP", # Biological Process; use "MF" or "CC" for others
pvalueCutoff = 0.05,
pAdjustMethod = "BH",
qvalueCutoff = 0.2,
readable = TRUE # converts Entrez IDs to gene symbols in output
)
# Simplify redundant GO terms
ora_simple <- simplify(ora_result, cutoff = 0.7, by = "p.adjust")
# View top results
as.data.frame(ora_simple) |>
arrange(p.adjust) |>
dplyr::select(Description, GeneRatio, BgRatio, p.adjust, geneID) |>
head(10)

Always provide a background gene list for ORA

The default background in enrichGO() is all genes in the organism database, which is around 20,000 for human. Your actual background should be the genes you actually tested, not all genes that exist. If you only measured 15,000 genes due to low-count filtering, using all 20,000 as background overstates statistical significance. Always set universe = background_genes explicitly.

How to Run GSEA in Python with gseapy

GSEApy is a comprehensive Python package for gene set enrichment analysis. Its core is written in Rust for performance, and it provides six tools: gsea for full GSEA, prerank for pre-ranked lists, ssgsea for single sample GSEA, enrichr for over-representation analysis, and several others.

Install it if you have not already:

Terminal window
pip install gseapy

Build the ranked gene list

import pandas as pd
import numpy as np
import gseapy as gp
# Load PyDESeq2 results from Blog 6
res = pd.read_csv("results/pydeseq2/all_genes.csv", index_col=0)
res = res.dropna(subset=["stat", "symbol"])
# Use Wald statistic for ranking
# The stat column in PyDESeq2 matches the stat column in R DESeq2
ranked = (
res[["symbol", "stat"]]
.dropna()
.drop_duplicates(subset="symbol", keep="first")
.sort_values("stat", ascending=False)
.set_index("symbol")["stat"]
)
print(f"Ranked list length: {len(ranked)}")
print(ranked.head())
# IFNG 18.43
# STAT1 16.72
# IRF1 15.89
# ISG15 14.34
# IFIT1 13.91

Run pre-ranked GSEA with Hallmark gene sets

# gseapy can pull gene sets directly from MSigDB or Enrichr
# No need to download GMT files manually
gsea_result = gp.prerank(
rnk = ranked,
gene_sets = "MSigDB_Hallmark_2020", # fetches from Enrichr automatically
threads = 4,
min_size = 15,
max_size = 500,
permutation_num = 1000,
outdir = "results/gsea_prerank",
seed = 2024,
verbose = False
)
# Access the results DataFrame
result_df = gsea_result.res2d.sort_values("FDR q-val")
print(result_df[["Term","NES","NOM p-val","FDR q-val","Lead_genes"]].head(10))

Available gene set libraries you can pass as a string include: "MSigDB_Hallmark_2020", "KEGG_2021_Human", "Reactome_2022", "GO_Biological_Process_2023". The library name is passed to the Enrichr API and the current list can be retrieved with gp.get_library_name().

Visualize the enrichment score plot for one pathway

# Enrichment plot for a specific pathway
terms = gsea_result.res2d.Term.tolist()
# Find the most significant pathway
top_term = result_df.iloc[0]["Term"]
ax = gp.gseaplot(
rank_metric = gsea_result.ranking,
term = top_term,
**gsea_result.results[top_term],
figsize = (6, 4)
)
ax[0].figure.savefig("results/figures/gsea_enrichment_plot.pdf",
dpi=300, bbox_inches="tight")
GSEA enrichment score plot for HALLMARK_INTERFERON_GAMMA_RESPONSE: top panel shows a green running enrichment score curve that rises sharply from the left, reaches a peak NES of 2.34 around one third of the way through the ranked list, then gradually falls; middle panel shows a black tick mark for every gene in the set, densely clustered at the left side of the ranked list corresponding to up-regulated genes; bottom panel shows a gradient from orange on the left for highly ranked genes to blue on the right for down-ranked genes, representing the ranking metric
Figure 3: GSEA enrichment score plot for a top-enriched pathway. The green curve rising sharply at the left indicates the pathway genes are disproportionately clustered among the highest-ranked genes in the experiment. The tick marks in the middle panel show exactly which genes are in the set and where they fall in the ranked list.

How to Run ORA in Python with gseapy.enrichr

ORA in Python does not require a ranked list. Pass your significant gene symbols directly to gseapy.enrichr.

# Significant DEG symbols
sig_symbols = (
res.query("padj < 0.05 and abs(log2FoldChange) > 1")
["symbol"]
.dropna()
.tolist()
)
print(f"ORA input: {len(sig_symbols)} significant genes")
# Run ORA against GO Biological Process and Hallmark
enr_result = gp.enrichr(
gene_list = sig_symbols,
gene_sets = ["GO_Biological_Process_2023", "MSigDB_Hallmark_2020"],
organism = "human",
outdir = "results/ora_enrichr",
cutoff = 0.05
)
# Access results
enr_df = enr_result.results.sort_values("Adjusted P-value")
print(enr_df[["Gene_set","Term","Overlap","Adjusted P-value","Genes"]].head(10))

Unlike GSEA, enrichr hits the Enrichr web API and returns results within seconds. An internet connection is required.

Export and summarize results

# Save all enrichment results to CSV
result_df.to_csv("results/gsea_prerank/gsea_hallmark_results.csv", index=False)
enr_df.to_csv("results/ora_enrichr/ora_results.csv", index=False)
# Quick dotplot with gseapy
ax = gp.dotplot(
gsea_result.res2d,
column = "FDR q-val",
title = "GSEA: MSigDB Hallmark",
cmap = "RdYlGn_r",
size = 6,
top_term = 15,
figsize = (6, 8)
)
ax.figure.savefig("results/figures/gsea_dotplot_python.pdf",
dpi=300, bbox_inches="tight")

Comparing GSEA and ORA Results

Running both methods together gives you a more complete picture.

GSEA captures gradual shifts in pathway activity that may not reach the DEG threshold. A pathway where 80 of its 120 genes show modest but consistent up-regulation will score well in GSEA but have no genes in your DEG list at padj < 0.05.

ORA captures the pathways most concentrated in your strongest DEGs. If five of your top 20 significant genes are interferon-stimulated genes, ORA flags interferon signaling even if the pathway is not strongly enriched across the entire ranked list.

Comparison dimensionGSEAORA
InputFull ranked gene list (all tested genes)Significant DEG list only
Ranking metricWald stat, log2FC, or any continuous scoreNot applicable
Threshold sensitivityNone: uses all genesDepends heavily on padj/LFC cutoffs
Best forDetecting coordinated pathway shiftsAnnotating your significant hit list
Reference setsMSigDB Hallmark, KEGG, Reactome, GOSame sets
Key outputNES, leading edge genesOverlap, p-value, gene list
R implementationclusterProfiler::GSEA(), fgsea::fgsea()clusterProfiler::enrichGO()
Python implementationgseapy.prerank()gseapy.enrichr()
Recommended backgroundN/A: all genes are usedAll tested genes, not the organism genome

A practical workflow is to run GSEA first to identify which biological themes are systematically perturbed, then use ORA to confirm those themes in your high-confidence DEG list. When the two methods agree, that is strong evidence. When they disagree, it usually means either your significance cutoff is too stringent for the pathway signal you are chasing, or a single outlier gene is driving the ORA result.

Manual Enrichment Pipeline vs NotchBio

Running enrichment analysis end-to-end takes around an hour: bitr mapping for Entrez IDs, choosing between GSEA and ORA, downloading the right MSigDB collection, tuning the min/max gene set sizes, and generating the dotplots.

NotchBio runs both GSEA and ORA automatically as part of every differential expression report. Results include Hallmark, KEGG, Reactome, and GO Biological Process for each comparison.

StepManual (R or Python)NotchBio
Install clusterProfiler / gseapyBiocManager + conda, 5-10 minNot required
Map gene IDsbitr() in R, mygene in PythonAutomatic
Build ranked listSort by stat, handle NA, deduplicateAutomatic
Fetch gene setsmsigdbr() or Enrichr APIPre-loaded for human, mouse, rat
Run GSEAGSEA() or gseapy.prerank()Automatic for Hallmark, KEGG, GO
Run ORAenrichGO() or gseapy.enrichr()Automatic
Simplify redundant termssimplify() or manual filteringAutomated term clustering
Generate dotplotsdotplot() + ggsave() at 300 dpiIncluded in report
Export resultsManual CSV exportsDownload from dashboard
Time to enrichment results45-90 min5 min after DE analysis

If you want pathway enrichment results without managing Bioconductor package installs or Enrichr API calls, notchbio.app runs the full GSEA and ORA pipeline as part of every differential expression analysis.

Further reading

Read another related post

View all posts