I am trying to write code that opens all the data files in a folder and applies a function (or set of functions) to retrieve my data of interest. So far, so good. The problem is that I would like to rename one of the columns I extract from each file using a single element of the file name, and I can't work out how to extract that element.
I have a bunch of files named "YYYY-MM-DD geneName data copy.txt" and would like to extract the "geneName" part of the file name. (For example, I have "2012-05-31 PMA1 data copy.txt".)
The date format is always the same (YYYY-MM-DD), and all file names end with "data copy.txt".
In addition, some file names contain an extra annotation of the experiment, either "E (number)" or "Expt (number)", between the date and geneName (for example, "2012-05-21 E7 PMA1 data copy.txt"); others have an "SDM" between geneName and "data copy.txt".
Here are some file names and my desired result:
- 2012-05-31 CTN1 data copy.txt (want "CTN1")
- 2012-05-21 E7 PMA1 data copy.txt (want "PMA1")
- 2011-11-29 TDH3 SDM data copy.txt (want "TDH3")
- 2012-01-04 POX1 data copy.txt (want "POX1")
Any thoughts on how I can do this without manually deleting the experiment number or "SDM" from some files?
Thanks!
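For concreteness, the naming rules described above can be written as one regular expression. This is only a minimal sketch in Python, a language chosen for illustration because the question does not name one. The names `FILENAME_RE` and `gene_name` are hypothetical, and the pattern assumes that the gene name contains no spaces and that the experiment tag is "E" or "Expt" followed by digits, with or without a space (as in "E7").

```python
import re

# Assumed layout: date, optional experiment tag, gene name, optional "SDM", fixed suffix.
FILENAME_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}\s+"       # date, always YYYY-MM-DD
    r"(?:(?:E|Expt)\s*\d+\s+)?"    # optional "E7" / "Expt 7" annotation (assumed form)
    r"(?P<gene>\S+)"               # gene name, assumed to contain no whitespace
    r"(?:\s+SDM)?"                 # optional "SDM" marker
    r"\s+data copy\.txt\s*$"       # fixed ending; tolerate trailing whitespace
)

def gene_name(filename):
    """Return the gene name from a file name, or None if it does not fit the pattern."""
    match = FILENAME_RE.match(filename)
    return match.group("gene") if match else None

if __name__ == "__main__":
    for name in [
        "2012-05-31 CTN1 data copy.txt",
        "2012-05-21 E7 PMA1 data copy.txt",
        "2011-11-29 TDH3 SDM data copy.txt",
        "2012-01-04 POX1 data copy.txt",
    ]:
        print(name, "->", gene_name(name))
```

Returning `None` for names that do not fit makes unexpected files easy to spot, rather than silently giving them a wrong column name.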