Extract data from GEO2R
Gene Expression Omnibus (GEO) is a public repository where we can get gene expression datasets uploaded by the scientific community, for free. Here’s how we can extract data from the platform.
1. Visit https://www.ncbi.nlm.nih.gov/geo/
2. Search for keyword(s) of interest
Search for keyword(s) in the search bar [1] and click on the search result for the GEO DataSets database [2].
3. Filter search results
Choose the DataSets entry type [3]. Select the organism(s) you wish to investigate [4].
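The same search and filters can also be expressed as a query against NCBI's E-utilities. The snippet below is a minimal sketch, not part of the original walkthrough. It only builds the request URL, and it assumes the `gds` database name plus the `[Organism]` and `[Entry Type]` field tags for GEO DataSets. The keyword and organism are example values.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(keywords, organism=None, entry_type="gds", retmax=20):
    """Build an esearch URL that mirrors the keyword search and filters above."""
    terms = [keywords]
    if organism:
        # Field tag assumed to match the organism filter in the web interface.
        terms.append(f"{organism}[Organism]")
    if entry_type:
        # "gds" = curated DataSets, "gse" = Series (assumed field values).
        terms.append(f"{entry_type}[Entry Type]")
    query = {
        "db": "gds",  # the GEO DataSets database
        "term": " AND ".join(terms),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(query)}"


# Example values only; replace with your own keyword(s) and organism.
url = build_geo_search_url("breast cancer", organism="Homo sapiens")
print(url)
```

Opening the printed URL in a browser returns the matching record IDs. The website remains the easier route for browsing the results.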
4. Choose dataset(s) to investigate
Click on the Series hyperlink of the chosen dataset we wish to proceed with.
5. Click on Analyze with GEO2R
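If we already know the Series accession, the GEO2R page can also be opened directly. This is a small sketch, assuming GEO2R keeps accepting the accession through the `acc` query parameter, which is how its page URLs look at the time of writing. `GSE12345` is only a placeholder accession.

```python
import re

GEO2R_BASE = "https://www.ncbi.nlm.nih.gov/geo/geo2r/"


def geo2r_url(accession):
    """Return the GEO2R URL for a Series accession such as 'GSE12345'."""
    accession = accession.strip().upper()
    # GEO2R works on Series records, so only accept GSE accessions here.
    if not re.fullmatch(r"GSE\d+", accession):
        raise ValueError(f"Expected a Series accession like GSE12345, got {accession!r}")
    return f"{GEO2R_BASE}?acc={accession}"


# Placeholder accession; use the Series you picked in step 4.
print(geo2r_url("GSE12345"))
```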
6. Split samples into Control & Treatment groups
To create a new group, click Define groups and enter a name for the group. Once the group has been defined, select row(s) from the table to add to it. Hold down Shift to select multiple rows at once.
Once rows have been added to groups, you will see the group name under the Group column instead of -.
After splitting the samples into two (or more, if needed) groups, your table should be similar to this:
7. Analyze sample data using GEO2R tool
Click on Analyze to check out the sample data. You can download the full table as a TSV file by clicking Download full table below.
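Once the full table has been downloaded, it can be explored outside the browser. The sketch below uses only the Python standard library to keep rows that pass an adjusted p-value and fold-change cut-off. The column names `adj.P.Val` and `logFC` are assumptions based on GEO2R's microarray output, and RNA-seq results may use different names, so check the header of your own file first. The thresholds are arbitrary, and the sample rows are made up purely for illustration.

```python
import csv
import io


def significant_genes(tsv_file, p_column="adj.P.Val", fc_column="logFC",
                      alpha=0.05, min_abs_logfc=1.0):
    """Return rows whose adjusted p-value and |log fold change| pass the cut-offs."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    missing = {p_column, fc_column} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"Column(s) not found in table: {sorted(missing)}")

    hits = []
    for row in reader:
        try:
            p_value = float(row[p_column])
            log_fc = float(row[fc_column])
        except (TypeError, ValueError):
            continue  # skip rows with empty or non-numeric values such as "NA"
        if p_value < alpha and abs(log_fc) >= min_abs_logfc:
            hits.append(row)
    return hits


# Made-up rows standing in for a downloaded table (not real GEO data).
sample = io.StringIO(
    "ID\tadj.P.Val\tP.Value\tlogFC\tGene.symbol\n"
    "probe_1\t0.001\t0.00001\t2.5\tGENE_A\n"
    "probe_2\t0.20\t0.01\t1.8\tGENE_B\n"
    "probe_3\t0.03\t0.001\t-1.4\tGENE_C\n"
    "probe_4\tNA\tNA\tNA\tGENE_D\n"
    "probe_5\t0.04\t0.002\t0.3\tGENE_E\n"
)
hits = significant_genes(sample)
print([row["ID"] for row in hits])
```

For a real download, pass an open file instead of the in-memory sample, for example `open("results.tsv", newline="")`, where `results.tsv` is whatever name you saved the table under.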
References & Credits:
1. Andrew Gao’s Udemy Course: Gene Expression
2. NCBI Tutorials
3. NCBI GEO Overview
4. Saint Louis University: GEO Tutorial