Find and replace characters before ":"

Question

I have a file containing a certain number of lines. Each line looks like this:

TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1

I would like to delete everything before the ":" symbol in order to save only PKMYT1, which is the name of the gene. Since I'm not an expert in regex scripts, can anyone help me do this using Unix (sed or awk) or in R?

+28

unix replace awk r sed

Elb Sep 06 '12 at 10:17

9 answers

A simple regex used with gsub():

 x <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"
 gsub(".*:", "", x)
 # [1] "PKMYT1"

See ?regex or ?gsub for more details.

+10

Andrie Sep 06 '12 at 10:22

In R there are, of course, more than two ways. Here is another:

 unlist(lapply(strsplit(foo, ':', fixed = TRUE), '[', 2))

If the strings are of constant length, I suppose substr will be faster than this or the regex methods.

+9

John Sep 06 '12 at 11:59

Using sed:

 sed 's/.*://' < your_input_file > output_file

This replaces anything followed by a colon with nothing, so it removes everything up to and including the last colon on each line (because * is greedy by default).

As per Josh O'Brien's comment, if you only want to remove everything up to and including the first colon, do this:

 sed "s/[^:]*://"

This matches anything that is not a colon, followed by a single colon, and replaces it with nothing.
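
To see the difference between the two patterns, here is a small sketch. The sample line `a:b:c` is made up for illustration (the lines in the question contain only one colon, where both patterns behave the same).

```shell
# Hypothetical sample line with two colons.
line='a:b:c'

# Greedy: .* runs up to the LAST colon, so only the final field survives.
last=$(printf '%s\n' "$line" | sed 's/.*://')

# [^:]* cannot cross a colon, so only the first field and its colon are removed.
rest=$(printf '%s\n' "$line" | sed 's/[^:]*://')

echo "$last"   # c
echo "$rest"   # b:c
```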

Note that both of these patterns stop at the first match on each line. If you want the replacement to be applied to every match on a line, add the 'g' (global) flag to the end of the command.

Also note that with GNU sed (the default on Linux) you can edit the file in place using -i; the BSD sed shipped with OS X requires an explicit backup-suffix argument after -i (for example -i ''). For example:
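
As a rough illustration of what the g flag changes, the sketch below runs the first-colon pattern with and without it. The sample line `a:b:c` is again an assumption, not taken from the question's file.

```shell
# Hypothetical sample line with three colon-separated fields.
line='a:b:c'

# Without g: only the first "field plus colon" is removed.
once=$(printf '%s\n' "$line" | sed 's/[^:]*://')

# With g: every "field plus colon" is removed, leaving only the last field.
every=$(printf '%s\n' "$line" | sed 's/[^:]*://g')

echo "$once"    # b:c
echo "$every"   # c
```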

 sed -i 's/.*://' your_file

+8

John carter Sep 06 '12 at 10:26

You can use awk as follows:

 awk -F: '{print $2}' /your/file
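
One thing worth checking before relying on this: $2 is only the second colon-separated field, so on a line with more than one colon it is not "everything after the colon". A minimal sketch, using a made-up line `a:b:c` and showing $NF (the last field) as one possible alternative:

```shell
# Hypothetical sample line with two colons.
line='a:b:c'

# $2 is the second field only.
second=$(printf '%s\n' "$line" | awk -F: '{print $2}')

# $NF is the last field, whatever the number of colons.
final=$(printf '%s\n' "$line" | awk -F: '{print $NF}')

echo "$second"   # b
echo "$final"    # c
```

For the lines in the question, which contain a single colon, both commands print the same gene name.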

+5

Costi ciudatu Sep 06 '12 at 10:31

If you have GNU coreutils, use cut:

 cut -d: -f2 infile
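
As with awk, -f2 selects only the second field. If the part after the first colon may itself contain colons, an open-ended field range such as -f2- keeps all of it. A short sketch on a made-up line `a:b:c`:

```shell
# Hypothetical sample line with two colons.
line='a:b:c'

# -f2 selects the second field only.
second=$(printf '%s\n' "$line" | cut -d: -f2)

# -f2- selects the second field through the end of the line.
rest=$(printf '%s\n' "$line" | cut -d: -f2-)

echo "$second"   # b
echo "$rest"     # b:c
```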

+2

Thor Sep 06 '12 at 12:49

The following are two equivalent solutions:

The first uses perl's autosplit option (-a) to split each line into fields on ":" (-F:), populating the field array @F, and prints the second field, $F[1] (fields are counted from 0):

 perl -F: -lane 'print $F[1]' file

The second uses a regular expression substitution, s///, to replace everything from the beginning of the line (^) up to and including a colon (.*:) with nothing:

 perl -pe 's/^.*://' file

0

Chris koknat Oct 9 '15 at 17:59

I worked on a similar problem. The advice from John and Josh O'Brien did the trick. I started with this tibble:

 library(dplyr)
 my_tibble <- tibble(Col1 = c("ABC:Content", "BCDE:MoreContent", "FG:Content:with:colons"))

Looks like:

   | Col1
 1 | ABC:Content
 2 | BCDE:MoreContent
 3 | FG:Content:with:colons

I needed to create this tibble:

   | Col1                   | Col2 | Col3
 1 | ABC:Content            | ABC  | Content
 2 | BCDE:MoreContent       | BCDE | MoreContent
 3 | FG:Content:with:colons | FG   | Content:with:colons

And I did it with this code (R version 3.4.2).

 my_tibble2 <- mutate(my_tibble,
                      Col2 = unlist(lapply(strsplit(Col1, ':', fixed = TRUE), '[', 1)),
                      Col3 = gsub("^[^:]*:", "", Col1))

0

Leslie Nov 30 '17 at 23:32

A very simple variation that I picked up from @Sacha Epskamp's accepted answer is to use the same gsub function to keep everything before the ":" (instead of removing it):

 foo <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"

 # 1st, as in that answer, to remove everything before and up to ":":
 gsub(".*:","",foo)

 # 2nd, to keep only what comes before ":":
 gsub(":.*","",foo)

Basically, it is the same thing: just change the position of ":" inside the pattern argument. Hope this helps.
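
For readers following the Unix answers, the same swap appears to carry over to sed directly. The sketch below applies both patterns to the line from the question; it is an illustration of the idea, not part of any of the original answers.

```shell
# The example line from the question.
line='TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1'

# Remove everything up to and including ":" (keeps the gene name).
after=$(printf '%s\n' "$line" | sed 's/.*://')

# Remove ":" and everything after it (keeps the path part).
before=$(printf '%s\n' "$line" | sed 's/:.*//')

echo "$after"    # PKMYT1
echo "$before"   # TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj
```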

0

Carolina fagundes brinholi Jan 4 '18 at 17:45

Sacha epskamp · Accepted Answer · 2012-09-06T10:23:14+0000

Here are two ways to do this in R:

 foo <- "TF_list_to_test10004/Nus_k0.345_t0.1_e0.1.adj:PKMYT1"

 # Remove all before and up to ":":
 gsub(".*:","",foo)

 # Extract everything behind ":":
 regmatches(foo,gregexpr("(?<=:).*",foo,perl=TRUE))
