Extract data from GEO2R

geo
ncbi
gene expression
bioinformatics
Published

August 31, 2024

Gene Expression Omnibus (GEO) is a public repository where the scientific community uploads gene expression datasets, which anyone can access for free. Here’s how to extract data from the platform using GEO2R.

1. Visit https://www.ncbi.nlm.nih.gov/geo/

image.png

2. Search for keyword(s) of interest

Enter your keyword(s) in the search bar [1], then click the search result for the GEO DataSets database [2].

image.png

3. Filter search results

Choose the DataSets entry type [3], then select the organism(s) you wish to investigate [4].

image.png
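The search and filter steps above can also be expressed as a query against NCBI's E-utilities `esearch` endpoint, which exposes the GEO DataSets database as `db=gds`. The snippet below is a minimal sketch that only builds the request URL; the function name `build_geo_search_url`, the example keyword, and the exact filter syntax (`[Organism]`, `[Entry Type]`) are assumptions to verify against the NCBI documentation before relying on them.

```python
from urllib.parse import urlencode

# E-utilities search endpoint; db=gds is the GEO DataSets database.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(keyword, organism=None, entry_type=None, retmax=20):
    """Build (but do not send) a search URL for the GEO DataSets database.

    The field tags used here are assumed to mirror the filters in the
    web interface (organism and entry type).
    """
    terms = [keyword]
    if organism:
        terms.append(f'"{organism}"[Organism]')
    if entry_type:
        terms.append(f"{entry_type}[Entry Type]")
    params = {
        "db": "gds",
        "term": " AND ".join(terms),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


# Example keyword and filters are illustrative only.
url = build_geo_search_url("breast cancer", organism="Homo sapiens", entry_type="gds")
print(url)
```

Opening the printed URL in a browser (or fetching it with any HTTP client) should return a JSON list of matching record IDs, roughly equivalent to the filtered result page in the screenshot.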

4. Choose dataset(s) to investigate

Click the Series hyperlink on the chosen dataset you wish to proceed with.

image.png

5. Click on Analyze with GEO2R

image.png
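If you already know the Series accession, the GEO2R page can usually be opened directly instead of clicking through. The sketch below assumes the `?acc=` URL pattern that GEO2R uses in the browser address bar; `geo2r_url` is a hypothetical helper and `GSE12345` is a placeholder accession, not a dataset used in this post.

```python
import re


def geo2r_url(accession):
    """Return the GEO2R page URL for a Series accession such as 'GSE12345'."""
    # GEO Series accessions are the letters GSE followed by digits.
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"Not a GEO Series accession: {accession!r}")
    return f"https://www.ncbi.nlm.nih.gov/geo/geo2r/?acc={accession}"


# Placeholder accession for illustration.
print(geo2r_url("GSE12345"))
```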

6. Split samples into Control & Treatment groups

To create a new group, click Define groups and enter a name for the group. Once the group has been defined, select the row(s) in the table that you want to add to it. Hold down Shift to select multiple rows at once.

Once rows have been added to a group, the group name appears in the Group column instead of -.

image.png

image.png

image.png

After splitting the samples into two groups (or more, if needed), your table should look similar to this:

image.png
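It can help to keep a record of the grouping outside the browser, for example to double-check that no sample was left unassigned. The snippet below is a small sketch of that bookkeeping; the sample accessions are made up, `summarize_groups` is a hypothetical helper, and the minimum of two samples per group is an assumption rather than a GEO2R rule.

```python
from collections import Counter

# Made-up sample accessions; "-" mirrors the Group column of an unassigned row.
assignments = {
    "GSM0000001": "Control",
    "GSM0000002": "Control",
    "GSM0000003": "Treatment",
    "GSM0000004": "Treatment",
    "GSM0000005": "-",
}


def summarize_groups(assignments, min_per_group=2):
    """Count samples per group and flag unassigned samples or small groups."""
    counts = Counter(g for g in assignments.values() if g != "-")
    unassigned = [s for s, g in assignments.items() if g == "-"]
    too_small = [g for g, n in counts.items() if n < min_per_group]
    return counts, unassigned, too_small


counts, unassigned, too_small = summarize_groups(assignments)
print(dict(counts))
print("Unassigned:", unassigned)
print("Groups below minimum size:", too_small)
```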

7. Analyze sample data using GEO2R tool

Click Analyze to view the results for the samples. You can download the full table as a TSV file by clicking Download full table below it.

image.png

image-2.png
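Once the TSV file is downloaded, it can be read with Python's standard `csv` module. The example below is a minimal sketch that filters rows by adjusted p-value and log fold change. It assumes the export has `ID`, `adj.P.Val`, and `logFC` columns, which may differ depending on the dataset type, so check the header of your own file first. The sample data and thresholds are made up for illustration.

```python
import csv
import io

# Tiny made-up sample shaped like a GEO2R export; values are illustrative only.
SAMPLE_TSV = (
    "ID\tadj.P.Val\tP.Value\tlogFC\tGene.symbol\n"
    "probe_1\t0.001\t0.00001\t2.5\tGENE_A\n"
    "probe_2\t0.2\t0.05\t-0.3\tGENE_B\n"
    "probe_3\t0.03\t0.002\t-1.8\tGENE_C\n"
)


def significant_rows(handle, max_adj_p=0.05, min_abs_logfc=1.0):
    """Return rows passing the adjusted p-value and |logFC| thresholds."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        try:
            adj_p = float(row["adj.P.Val"])
            logfc = float(row["logFC"])
        except (TypeError, ValueError):
            # Skip rows with missing or non-numeric values (for example "NA").
            continue
        if adj_p <= max_adj_p and abs(logfc) >= min_abs_logfc:
            hits.append(row)
    return hits


hits = significant_rows(io.StringIO(SAMPLE_TSV))
print([row["ID"] for row in hits])
```

To run this on a real download, replace `io.StringIO(SAMPLE_TSV)` with an opened file, for example `open("your_file.tsv", newline="")`, where the filename is whatever you saved the table as.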

References & Credits:
1. Andrew Gao’s Udemy Course: Gene Expression
2. NCBI Tutorials
3. NCBI GEO Overview
4. Saint Louis University: GEO Tutorial