How can I convert the Ensembl identifier to a gene symbol in R?

I have a data.frame containing Ensembl identifiers in a single column; I would like to find the corresponding gene symbols for the values of this column and add them to a new column in my data frame. I used biomaRt, but it could not find any of the Ensembl IDs!

Here is my sample data (df[1:2,]):

    row.names organism     gene
    41        Homo-Sapiens ENSP00000335357
    115       Homo-Sapiens ENSP00000227378

and I want to get something like this:

    row.names organism     gene            id
    41        Homo-Sapiens ENSP00000335357 CDKN3
    115       Homo-Sapiens ENSP00000227378 HSPA8

and here is my code:

    library('biomaRt')
    mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
    genes <- df$genes
    df$id <- NA
    G_list <- getBM(filters = "ensembl_gene_id",
                    attributes = c("ensembl_gene_id", "entrezgene", "description"),
                    values = genes, mart = mart)

Then I get this when I check G_list:

    [1] ensembl_gene_id entrezgene      description
    <0 rows> (or 0-length row.names)

So I cannot add G_list to my df, because there is nothing to add!

Thanks in advance,

1 answer

This is because the values in your gene column are not gene identifiers but peptide identifiers (they start with ENSP). To get the information you need, replace ensembl_gene_id with ensembl_peptide_id:

    G_list <- getBM(filters = "ensembl_peptide_id",
                    attributes = c("ensembl_peptide_id", "entrezgene", "description"),
                    values = genes, mart = mart)
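Since the fix hinges on reading the identifier prefix, a small check before querying can catch this kind of mismatch. The following is only a sketch: `ensembl_filter_for` is a hypothetical helper (not part of biomaRt), and it assumes human identifiers, where the prefixes are ENSG (gene), ENST (transcript) and ENSP (peptide). Other species carry an extra species code in the prefix and would need different handling.

```r
# Hypothetical helper (not a biomaRt function): choose a biomaRt filter name
# from the prefix of human Ensembl identifiers.
ensembl_filter_for <- function(ids) {
  # The first four characters distinguish gene, transcript and peptide IDs.
  prefix <- unique(substr(ids, 1, 4))
  if (length(prefix) != 1) {
    stop("expected identifiers of exactly one type")
  }
  switch(prefix,
    ENSG = "ensembl_gene_id",
    ENST = "ensembl_transcript_id",
    ENSP = "ensembl_peptide_id",
    stop("unrecognised identifier prefix: ", prefix)
  )
}

# The IDs from the question resolve to the peptide filter.
ensembl_filter_for(c("ENSP00000335357", "ENSP00000227378"))
```

The returned string could then be passed as the `filters` argument of `getBM()` instead of hard-coding it.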

In addition, the attribute you are really looking for is hgnc_symbol.

Here is the complete code to get your output:

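    library('biomaRt')
    mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
    genes <- df$gene   # the column is named "gene" in the sample data
    df$id <- NULL      # drop the placeholder id column created earlier
    # Query by peptide ID and ask for the HGNC symbol
    G_list <- getBM(filters = "ensembl_peptide_id",
                    attributes = c("ensembl_peptide_id", "hgnc_symbol"),
                    values = genes, mart = mart)
    merge(df, G_list, by.x = "gene", by.y = "ensembl_peptide_id")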
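One thing to be aware of with the merge step: by default `merge()` drops rows that have no match in the lookup table and re-sorts the result by the key column. The sketch below illustrates two alternatives without needing a network connection. The `G_list` here is a hand-written stand-in for the `getBM()` result, not real query output; its single ID-to-symbol pair is copied from the desired output in the question, and the second ID is left out on purpose to show the unmatched case.

```r
# Sample data from the question.
df <- data.frame(
  organism = c("Homo-Sapiens", "Homo-Sapiens"),
  gene = c("ENSP00000335357", "ENSP00000227378"),
  stringsAsFactors = FALSE
)

# Stand-in for the getBM() result (assumption: one ID was not found).
G_list <- data.frame(
  ensembl_peptide_id = "ENSP00000335357",
  hgnc_symbol = "CDKN3",
  stringsAsFactors = FALSE
)

# Option 1: left join, so rows without a symbol are kept with NA.
annotated <- merge(df, G_list,
                   by.x = "gene", by.y = "ensembl_peptide_id",
                   all.x = TRUE)

# Option 2: add the column in place with match(), preserving row order.
df$id <- G_list$hgnc_symbol[match(df$gene, G_list$ensembl_peptide_id)]

print(annotated)
print(df)
```

Note that `match()` uses only the first hit per ID, so if a peptide ID maps to more than one symbol in the real query result, the merge-based approach will show all of them while the `match()` approach will not.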

Source: https://habr.com/ru/post/982562/

