Elixir - How to split a string into a list of 3 characters

If I have the string "UGGUGUUAUUAAUGGUUU", how can I turn it into a list of 3-character chunks, i.e. ["UGG", "UGU", "UAU", "UAA", "UGG", "UUU"]?

+5
5 answers

If your string contains only ASCII characters and its byte_size is a multiple of 3, there is a really elegant solution using a lesser-known Elixir feature, binary comprehensions:

 iex(1)> string = "UGGUGUUAUUAAUGGUUU"
 "UGGUGUUAUUAAUGGUUU"
 iex(2)> for <<x::binary-3 <- string>>, do: x
 ["UGG", "UGU", "UAU", "UAA", "UGG", "UUU"]

This splits the string into pieces of 3 bytes. This will be much faster than splitting on code points or graphemes, but will not work correctly if your string contains non-ASCII characters. (In that case, I would go with @michalmuskala's answer.)
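To make these caveats concrete, here is a minimal sketch. The module name `ByteChunks` is a hypothetical wrapper introduced only for illustration. It wraps the same binary comprehension and shows what happens with a leftover tail and with non-ASCII input.

```elixir
defmodule ByteChunks do
  # Hypothetical helper (not part of the answer): wraps the binary
  # comprehension so that it can be called on different inputs.
  def split(string) do
    for <<chunk::binary-3 <- string>>, do: chunk
  end
end

# Byte size is a multiple of 3: every byte ends up in a chunk.
IO.inspect(ByteChunks.split("UGGUGU"))
# ["UGG", "UGU"]

# Byte size is 8: the trailing "UA" is silently dropped.
IO.inspect(ByteChunks.split("UGGUGUUA"))
# ["UGG", "UGU"]

# Each Greek letter takes 2 bytes in UTF-8, so 3-byte chunks cut through
# the middle of a character and are no longer valid strings.
IO.inspect(Enum.all?(ByteChunks.split("αβγ"), &String.valid?/1))
# false
```

If either case can occur in your input, one of the code point or grapheme based approaches is the safer choice.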

Edit: Patrick Oscity's answer reminded me that this can also work for code points:

 iex(1)> string = "αβγδεζηθικλμνξοπρςστυφχψ"
 "αβγδεζηθικλμνξοπρςστυφχψ"
 iex(2)> for <<a::utf8, b::utf8, c::utf8 <- string>>, do: <<a::utf8, b::utf8, c::utf8>>
 ["αβγ", "δεζ", "ηθι", "κλμ", "νξο", "πρς", "στυ", "φχψ"]
+14
 "UGGUGUUAUUAAUGGUUU"
 |> String.codepoints()
 # :discard drops a trailing partial chunk, as the deprecated Enum.chunk/2 did
 |> Enum.chunk_every(3, 3, :discard)
 |> Enum.map(&Enum.join/1)

I am also wondering whether there is a more elegant version.

+10

This can be achieved with the Stream.unfold/2 function. In a way, it is the opposite of reduce: reduce collapses a collection into a single value, while unfold expands a single value into a collection.

As the generator for Stream.unfold/2 we need a function that returns a tuple. The first element is the next member of the generated collection, and the second is the accumulator that is passed to the next iteration. This is exactly what String.split_at/2 returns. Finally, we need a termination condition: String.split_at("", 3) returns {"", ""}. We are not interested in empty strings, so it is enough to consume the generated stream until we meet an empty string, which can be done with Enum.take_while/2.

 string |> Stream.unfold(&String.split_at(&1, 3)) |> Enum.take_while(&(&1 != "")) 
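As a rough illustration of the pipeline above, the sketch below wraps it in a function and prints the intermediate tuple shape. The module name `UnfoldChunks` and the `size` parameter are assumptions made for this example and are not part of the answer.

```elixir
defmodule UnfoldChunks do
  # Hypothetical helper: splits `string` into chunks of `size` graphemes.
  def split(string, size) do
    string
    |> Stream.unfold(&String.split_at(&1, size))
    |> Enum.take_while(&(&1 != ""))
  end
end

# String.split_at/2 returns {chunk, rest}, the shape Stream.unfold/2 expects.
IO.inspect(String.split_at("UGGUGU", 3))
# {"UGG", "UGU"}

# A trailing partial chunk is kept rather than dropped.
IO.inspect(UnfoldChunks.split("UGGUGUUAUU", 3))
# ["UGG", "UGU", "UAU", "U"]
```

Because String.split_at/2 counts graphemes rather than bytes, this version also handles non-ASCII input, at the cost of being slower than the byte-based comprehension.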
+7

Another option is Regex.scan/2:

 iex> string = "abcdef"
 iex> Regex.scan(~r/.{3}/, string)
 [["abc"], ["def"]]

 # In case the number of characters is not evenly divisible by 3
 iex> string = "abcdefg"
 iex> Regex.scan(~r/.{1,3}/, string)
 [["abc"], ["def"], ["g"]]

 # If you need to handle unicode characters, you can add the `u` modifier
 iex> string = "🙈🙉🙊abc"
 iex> Regex.scan(~r/.{1,3}/u, string)
 [["🙈🙉🙊"], ["abc"]]

Or use a recursive function, which is a bit verbose but IMO the best solution if you are not using lazy evaluation:

 defmodule Split do
   def tripels(string), do: do_tripels(string, [])

   defp do_tripels(<<x::utf8, y::utf8, z::utf8, rest::binary>>, acc) do
     do_tripels(rest, [<<x::utf8, y::utf8, z::utf8>> | acc])
   end

   defp do_tripels(_rest, acc) do
     Enum.reverse(acc)
   end
 end

 # in case you actually want the rest in the result, change the last clause to
 defp do_tripels(rest, acc) do
   Enum.reverse([rest | acc])
 end
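One thing to watch with the alternative last clause: when the input length is a multiple of 3, `rest` is the empty string, so the result ends with a trailing "". A possible way around this, sketched here under a hypothetical module name `SplitKeepRest`, is to add a separate clause for the empty remainder:

```elixir
defmodule SplitKeepRest do
  # Hypothetical variant of the Split module above: keeps a shorter final
  # chunk, but never appends an empty string.
  def triples(string), do: do_triples(string, [])

  # At least three code points left: take them as the next chunk.
  defp do_triples(<<x::utf8, y::utf8, z::utf8, rest::binary>>, acc) do
    do_triples(rest, [<<x::utf8, y::utf8, z::utf8>> | acc])
  end

  # Nothing left: return the accumulated chunks in their original order.
  defp do_triples("", acc), do: Enum.reverse(acc)

  # One or two code points left: keep them as a final, shorter chunk.
  defp do_triples(rest, acc), do: Enum.reverse([rest | acc])
end

IO.inspect(SplitKeepRest.triples("abcdef"))
# ["abc", "def"]

IO.inspect(SplitKeepRest.triples("abcdefg"))
# ["abc", "def", "g"]
```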
+3

Try

 List.flatten(Regex.scan(~r/.../, "UGGUGUUAUUAAUGGUUU")) 

You'll get

 ["UGG", "UGU", "UAU", "UAA", "UGG", "UUU"] 

Sources from the documentation:

Regex.scan/2

List.flatten/1

+2

Source: https://habr.com/ru/post/1265994/

