Skip to content

Adding support for handling YAML Merge Key (#41) - proper keys comparison for merging#1279

Open
azat wants to merge 7 commits intojbeder:masterfrom
azat-ch:merge-operator
Open

Adding support for handling YAML Merge Key (#41) - proper keys comparison for merging#1279
azat wants to merge 7 commits intojbeder:masterfrom
azat-ch:merge-operator

Conversation

@azat
Copy link

@azat azat commented May 3, 2024

NOTE: this is based on #1243, but includes a fix for proper merging (by comparing the key itself, instead of shared_ptrs, see the last patch).

Support for YAML Merge keys ( <<: [*dict1, *dict2] ) is added. The merge
key is a specific scalar with value << (and tag !!merge) that implies
that during node construction, the map (or sequence of maps) are merged
into the current map. The priority rules are that each key from maps
within the value associated with << are added iff the key is not yet
present in the current map (and first map gets higher priority). Test
cases have been added accordingly.

nlescoua-bpk and others added 2 commits November 4, 2023 10:08
Support for YAML Merge keys ( <<: [*dict1, *dict2] ) is added. The merge
key is a specific scalar with value << (and tag !!merge) that implies
that during node construction, the map (or sequence of maps) are merged
into the current map. The priority rules are that each key from maps
within the value associated with  << are added iff the key is not yet
present in the current map (and first map gets higher priority). Test
cases have been added accordingly.
azat added 2 commits May 3, 2024 14:36
…the node)

The problem is that const_map_to.get(*j->first) compares only the
shared_ptr's, while we need to compare the key itself.

v2: remove const iterator
v3: fix use-after-free due to const reference
Consider the following YAML:

    trait1: &t1
      foo: 1

    trait2: &t2
      foo: 2

    merged:
      <<: *t1
      <<: *t2

yq reports:

    $ yq .merged.foo < /tmp/yaml
    2

while the order that yaml-cpp returns is different, since it will
firstly handle 1, and will not replace it with 2:

    $ util/parse < /tmp/yaml
    trait1:
      ? &1 foo
      : &2 1
    trait2:
      foo: 2
    merged:
      *1 : *2

(Don't mix up "*2" with "2", it is trait1)
Copy link
Collaborator

@SGSSGene SGSSGene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested Change:

  • MergeMapCollection must be static or in an anonymous namespace.

I think, the R-Value (&&)/std::move suggestion is a matter of taste. This library is not very strong on move semantics anyways.

Copy link
Collaborator

@SGSSGene SGSSGene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@azat : I only see a few minor changes (which are a matter of taste).
@jbeder: I do not see why this should not be merged. Looks good to me.

Co-authored-by: Simon Gene Gottlieb <simon@gottliebtfreitag.de>
@SGSSGene
Copy link
Collaborator

@jbeder I think this is a great addition, but it falls under "feature" and would need your approval to be merged.
@azat could you resolves the conflicts of this branch?

@jbeder
Copy link
Owner

jbeder commented Mar 19, 2026

I've always been on the fence about this for two reasons:

  1. It's a nonstandard addition (not part of the official spec). So it feels funny to introduce it with no way to turn it off. Maybe an option or something, but we'd have to figure out the API. (is it YAML::Load(..., YAML:::WithMergeKeys)?)
  2. It relies on node equality, which is also nonstandard. For example, if the parent has a key 20, and you merge in a key 0x14, is that the same thing? What about "20"? Same question for 1, "1", true, True, etc? This is all very nonobvious to me.

What do you think?

@SGSSGene
Copy link
Collaborator

I've always been on the fence about this for two reasons:

1. It's a nonstandard addition (not part of the official spec). So it feels funny to introduce it with no way to turn it off. Maybe an option or something, but we'd have to figure out the API. (is it YAML::Load(..., YAML:::WithMergeKeys)?)

2. It relies on node equality, which is also nonstandard. For example, if the parent has a key 20, and you merge in a key 0x14, is that the same thing? What about "20"? Same question for 1, "1", true, True, etc? This is all very nonobvious to me.

What do you think?

All very good questions. I have to admit I have merged #1221 which has the same issue of defining what "equality" is.
Looking at the YAML-specs (Node Comparision) it heavily relies on the tag resolution, which I believe only has limited support in yaml-cpp. Currently it simply compares the scalar content. This is a different definition of 'equal' than the specs provide:
The current main differences are:

  1. Tags
    specs: if tags are different nodes are different
    yaml-cpp: we do not care about tags
    outcome: we would detect "NonUniqueKeys" where strictly speaking they should be unique.
    e.g.: { !!str 12: A, !!int 12: B } should be a map with two keys 12 and "12", we detect a NonUniqueKey.
  2. Cannonical Form
    specs: content must be compared in canonical form
    yaml-cpp: compares their string representation
    outcome: we do not detect "NonUniqueKeys" when their content is different but means the same:
    e.g.: {0x1: A, 1:B} should detect a NonUniqueKey, we see them as different values.

While incorrect, I think these are edge-cases and should raise serious concerns if relied on.
I think it is nice to have a parsing error for non unique keys, even if its not perfect.

@"Merge Keys": For some reason I didn't realize that the Merge Key proposal is just a working draft. I am assuming that @azat and the people giving " 👍" have a use case for it.
Maybe we can go for a compromise, we remove the detection of << and only trigger a merge node if the tag !!merge is given. This seems a lot easier than having another argument to LoadXYZ.

@"UniqueKeys" or "ComparableKeys" I think we should change it to be more standard conforming (not part of this PR). Maybe a comparision that only relies on the Failsafe Schema.
This is almost what we already do, it is only missing an additional check for the tags comparision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants