How to combine YAML arrays?

Question

I would like to merge arrays in YAML and load them via Ruby:

    some_stuff: &some_stuff
      - a
      - b
      - c

    combined_stuff:
      <<: *some_stuff
      - d
      - e
      - f

I would like to have a combined array like [a,b,c,d,e,f]

I get an error: did not find expected key while parsing a block mapping

How to combine arrays in YAML?

+73

list data-structures yaml

lfender6445 Jun 06 '14 at 20:33

5 answers

dreftymac · Answer 1 · 2015-06-11 02:51

Update: 2019-07-01 14:06:12

Note: another answer to this question was substantially edited with an update on alternative approaches.
- That updated answer mentions an alternative to the workaround in this answer. It has been added to the See Also section below.

context

This post assumes the following context:

Python 2.7
python yaml parser

problem

lfender6445 wants to merge two or more lists within a YAML file and have the merged lists appear as one single list when parsed.

Solution (Workaround)

This can be achieved simply by assigning YAML anchors to mappings, where the desired lists appear as child elements of the mappings. There are caveats, however (see Pitfalls below).

In the example below we have three mappings (list_one, list_two, list_three) and three anchors and aliases that refer to these mappings where appropriate.

When the YAML file is loaded into the program we get the list we want, but it may require a little modification after loading (see Pitfalls below).

example

Original YAML File

    list_one: &id001
      - a
      - b
      - c

    list_two: &id002
      - e
      - f
      - g

    list_three: &id003
      - h
      - i
      - j

    list_combined:
      - *id001
      - *id002
      - *id003

Result after YAML.safe_load

    ## list_combined
    [
      [
        "a",
        "b",
        "c"
      ],
      [
        "e",
        "f",
        "g"
      ],
      [
        "h",
        "i",
        "j"
      ]
    ]

Pitfalls

- This approach creates a nested list of lists, which may not be the exact desired output, but it can be post-processed with a flatten step (for example Ruby's Array#flatten, or an equivalent helper in Python).
- The usual caveats for YAML anchors and aliases apply regarding uniqueness and declaration order.

Conclusion

This approach allows you to create merged lists using the alias and anchor features of YAML.

Although the output is a nested list of lists, it can easily be transformed with a flatten step.
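The post-processing step mentioned above can be sketched in Python as follows. This is a minimal illustration, not part of the original answer: `list_combined` is a hand-written stand-in for the parsed value, and `flatten_one_level` is a hypothetical helper name.

```python
from itertools import chain

# Hand-written stand-in for the value parsed from `list_combined`.
list_combined = [["a", "b", "c"], ["e", "f", "g"], ["h", "i", "j"]]


def flatten_one_level(nested):
    """Concatenate the inner lists of a list of lists, one level deep."""
    return list(chain.from_iterable(nested))


flat = flatten_one_level(list_combined)
print(flat)
```

Only one level is flattened here, which matches the shape produced by the aliases in the example; deeper nesting would need a recursive variant.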

where s1, s2, s3 are anchors on sequences (not shown) that you want to merge into a new sequence, after which d, e and f are appended to it. But YAML resolves this kind of structure depth first, so no real context is available while a merge key is being processed: there is no array/list at hand to which the processed value (the anchored sequence) could be attached.

You can take the approach proposed by @dreftymac, but it has the huge disadvantage that you somehow need to know which nested sequences to flatten (i.e. by knowing the "path" from the root of the loaded data structure to the parent sequence), or that you recursively walk the loaded data structure searching for nested arrays/lists and indiscriminately flatten all of them.
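The two post-processing options described above might look like this in Python. Both helpers (`flatten_at`, `flatten_deep`) and the sample `data` are hypothetical, written only to show the trade-off: the first needs the path to the list, the second flattens lists that were meant to stay nested.

```python
import copy


def flatten_at(data, path):
    """Flatten, one level deep, only the list found at `path` (a list of keys)."""
    parent = data
    for key in path[:-1]:
        parent = parent[key]
    merged = []
    for item in parent[path[-1]]:
        if isinstance(item, list):
            merged.extend(item)
        else:
            merged.append(item)
    parent[path[-1]] = merged  # modifies `data` in place
    return data


def flatten_deep(value):
    """Return a copy of `value` with every nested list flattened, indiscriminately."""
    if isinstance(value, dict):
        return {k: flatten_deep(v) for k, v in value.items()}
    if isinstance(value, list):
        out = []
        for item in value:
            item = flatten_deep(item)
            if isinstance(item, list):
                out.extend(item)
            else:
                out.append(item)
        return out
    return value


# Made-up loaded data: "combined" should be flattened, "matrix" should not.
data = {
    "combined": [["a", "b"], ["c"], "d"],
    "matrix": [[1, 2], [3, 4]],
}

targeted = flatten_at(copy.deepcopy(data), ["combined"])
everything = flatten_deep(data)
print(targeted["matrix"])    # [[1, 2], [3, 4]]
print(everything["matrix"])  # [1, 2, 3, 4]
```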

The best solution, IMO, is to use tags to load data structures that do the flattening for you. This lets you state clearly what needs to be flattened and what does not, and gives you full control over whether the flattening is done during loading or during access. Which one to choose is a matter of ease of implementation and of efficiency in time and storage space. This is the same trade-off that has to be made when implementing the merge key feature, and there is no single solution that is always the best.

For example, my ruamel.yaml library implements brute-force merging of dicts during loading when using its safe loader, which results in merged dictionaries that are normal Python dicts. This merging has to be done up front and duplicates data (space inefficient), but is fast in value lookup. When using the round-trip loader, you want to be able to dump the merges unmerged, so they need to be kept separate. The dict-like data structure loaded as a result of round-trip loading is space efficient but slower in access, as it needs to look up a key that is not found in the dict itself in the merges (and this is not cached, so it has to be done every time). Of course, such considerations are not very important for relatively small configuration files.

The following implements a merge-like scheme for lists in Python, using objects with the tag flatten which on the fly recurse into items that are lists and are tagged toflatten. Using these two tags you can have a YAML file like:

    l1: &x1 !toflatten
      - 1
      - 2
    l2: &x2
      - 3
      - 4
    m1: !flatten
      - *x1
      - *x2
      - [5, 6]
      - !toflatten [7, 8]

(the use of flow versus block style sequences is completely arbitrary and has no influence on the loaded result).

Iterating over the items that are the value for the key m1 "recurses" into the sequences tagged with toflatten, but yields the other lists (aliased or not) as a single item.

One possible way to achieve this is with Python code:

    from pathlib import Path

    import ruamel.yaml

    yaml = ruamel.yaml.YAML()


    @yaml.register_class
    class Flatten(list):
        yaml_tag = u'!flatten'

        def __init__(self, *args):
            # keep the loaded items as-is; flattening happens during iteration
            self.items = args

        @classmethod
        def from_yaml(cls, constructor, node):
            x = cls(*constructor.construct_sequence(node, deep=True))
            return x

        def __iter__(self):
            for item in self.items:
                if isinstance(item, ToFlatten):
                    # items tagged !toflatten are expanded in place
                    for nested_item in item:
                        yield nested_item
                else:
                    # any other item, including an untagged list, is yielded whole
                    yield item


    @yaml.register_class
    class ToFlatten(list):
        yaml_tag = u'!toflatten'

        @classmethod
        def from_yaml(cls, constructor, node):
            x = cls(constructor.construct_sequence(node, deep=True))
            return x


    data = yaml.load(Path('input.yaml'))
    for item in data['m1']:
        print(item)

which outputs:

    1
    2
    [3, 4]
    [5, 6]
    7
    8

As you can see, in the sequence that needs flattening you can use either an alias to a tagged sequence or a tagged sequence directly. YAML does not allow you to do:

 - !flatten *x2

i.e. tag an aliased sequence, as that would essentially make it a different data structure.

Using explicit tags is IMO better than having some magic going on, as with the YAML merge key <<. If nothing else, you now have to jump through hoops if you happen to have a YAML file with a mapping that has a key << that you do not want to act as a merge key, for example when you map C operators to their descriptions in English (or some other natural language).

Jorge Leitao · Answer 3 · 2019-07-25 19:38

You can achieve this as follows:

    # note: no dash before commands
    some_stuff: &some_stuff |-
      a
      b
      c

    combined_stuff:
      - *some_stuff
      - d
      - e
      - f

I used this in my gitlab-ci.yml (to answer @rink.attendant.6's comment on this question).
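Note that a `|-` block scalar is a single multi-line string, so the alias contributes one list item rather than three. The sketch below uses a hand-written stand-in for the loaded value and a hypothetical helper, `expand_lines`, to show how that string could be split into separate items if an actual six-element list is needed; for script-style lists, where each item is run as shell text, no splitting may be required.

```python
# Hand-written stand-in for the loaded `combined_stuff`:
# the `|-` scalar arrives as one string with embedded newlines.
combined_stuff = ["a\nb\nc", "d", "e", "f"]


def expand_lines(items):
    """Split every multi-line string item into one item per line."""
    out = []
    for item in items:
        out.extend(item.splitlines())
    return out


print(len(combined_stuff))           # 4
print(expand_lines(combined_stuff))  # ['a', 'b', 'c', 'd', 'e', 'f']
```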

Tamlyn · Answer 4 · 2019-05-16 14:37

If you need to combine only one item in a list, you can do

    fruit:
      - &banana
        name: banana
        colour: yellow

    food:
      - *banana
      - name: carrot
        colour: orange

which gives

    fruit:
      - name: banana
        colour: yellow

    food:
      - name: banana
        colour: yellow
      - name: carrot
        colour: orange

sm4rk0 · Answer 5 · 2018-01-19 13:35

You can merge mappings and then convert their keys into a list, under these conditions:

- if you are using jinja2 templating, and
- if the order of the items is not important

    some_stuff: &some_stuff
      a:
      b:
      c:

    combined_stuff:
      <<: *some_stuff
      d:
      e:
      f:

    {{ combined_stuff | list }}
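The same idea can be sketched in plain Python, outside jinja2. The dictionaries below are hand-written stand-ins for what a loader would produce (keys with empty values load as null, i.e. `None`); the ordering of merged keys is loader-dependent, which is why the approach only fits when item order does not matter.

```python
# Stand-in for the anchored mapping; empty YAML values load as None.
some_stuff = {"a": None, "b": None, "c": None}

# Stand-in for the mapping after the merge key has been applied.
combined_stuff = {**some_stuff, "d": None, "e": None, "f": None}

# Converting a dict to a list yields its keys, like the `list` filter above.
combined_list = list(combined_stuff)
print(sorted(combined_list))  # ['a', 'b', 'c', 'd', 'e', 'f']
```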

see also

Updated Alternative Approach @Anthon

flatten method