Getting started with GEOquery

Sean Davis

2026-06-13

GEOquery is the bridge between the NCBI Gene Expression Omnibus (GEO) and Bioconductor: it downloads and parses GEO records into Bioconductor objects. This page is a short quick-start and an index; the in-depth, narrative documentation lives in the articles listed below.

Install

if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("GEOquery")

Quick start

library(GEOquery)
library(SummarizedExperiment)  # provides assay(), colData(), rowData()

# A GSE via the fast Series Matrix path -> a list of SummarizedExperiment,
# one per platform. (Pass returnType = "ExpressionSet" for the legacy class.)
gse <- getGEO("GSE2553")
se <- gse[[1]]
assay(se)      # expression matrix
colData(se)    # sample metadata
rowData(se)    # feature annotation

# Other entity types parse to GEOquery's S4 classes:
getGEO("GSM11805")   # a sample
getGEO("GPL96")      # a platform
getGEO("GDS507")     # a curated dataset

# See what supplementary files a study has, without downloading:
getGEOSuppFiles("GSE63137", fetch_files = FALSE)
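The non-series entity types in the quick start can be inspected once parsed. The following is a minimal sketch, not a definitive recipe: it assumes the long-standing GEOquery accessors `Meta()`, `Table()`, and `Columns()` apply to the S4 object returned for a GSM, and that network access to GEO is available. The variable names `gsm` and `meta` are illustrative only.

```r
library(GEOquery)

# Assumption: the parsed sample exposes the classic GEOquery accessors.
gsm <- getGEO("GSM11805")

meta <- Meta(gsm)     # named list of header fields from the GEO record
names(meta)           # which metadata fields are available
head(Table(gsm))      # first rows of the sample's data table
Columns(gsm)          # descriptions of the data table's columns
```

The same accessors are generally expected to work on platform (GPL) and dataset (GDS) objects, though the fields they return differ by entity type.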

In-depth articles

The articles go beyond the how and into the why: the structure of GEO, the file formats, and how a GEOquery object connects to downstream Bioconductor workflows.

Getting help