How to remove duplicate lines in YAML format configuration files?











I have a bunch of manifest/YAML files, some of which contain duplicated key-value pairs like these:



...
app: activity-worker
app: activity-worker
...


I need to search through each of those files and find those duplicates so that I can remove one of them.



Note: I know that to replace a certain string (say, switch service: to app:) in all files of a directory (say, dev) I can run grep -l 'service:' dev/* | xargs sed -i "" 's/service:/app:/g'. I'm looking for a relation between lines.










  • Please provide more context and some examples of what you're trying to accomplish
    – John
    Nov 10 at 1:10

  • @bishop okay I'm open to any other way
    – v1mg1rl
    Nov 10 at 1:13

  • Only thing I've tried to figure out is how to find the duplicate lines. I've tried sort fileName | uniq -d but this command showed me a little more than the duplicate lines I needed.
    – v1mg1rl
    Nov 10 at 1:18

  • There is no such regex. You're looking for a relation between lines, not a pattern in a single line.
    – melpomene
    Nov 10 at 1:25

  • I have focused your question's title and tags; and removed the please and thank you noise. You may roll these edits back if they do not match your intent.
    – bishop
    Nov 10 at 1:29















unix duplicates yaml






edited Nov 10 at 1:34

























asked Nov 10 at 1:07









v1mg1rl

1 Answer
accepted










What you call YAML is not YAML. The YAML specification
very explicitly states that
keys in a mapping must be unique, and yours are not:




The content of a mapping node is an unordered set of key: value node
pairs, with the restriction that each of the keys is unique. YAML
places no further restrictions on the nodes. In particular, keys may
be arbitrary nodes, the same node may be used as the value of
several key: value pairs, and a mapping could even contain itself as
a key or a value (directly or indirectly).




On the other hand, some libraries implement this incorrectly and silently overwrite
any previous value associated with a key with a later one. In your case, since
the values are identical, it doesn't really matter which one is taken.



Also, your block-style representation is not the only way to represent the key-value pairs of a
mapping in "YAML"; these duplicates could also appear in a flow-style mapping, as



{...., app: activity-worker, app: activity-worker, .... }


The two occurrences are not necessarily next to each other, nor on the same line. The
following is also semantically equivalent "YAML" to your input:



{...., app: activity-worker, app: 
activity-worker, .... }
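
To illustrate why a purely line-oriented check is not enough, here is a minimal sketch using only the Python standard library. The helper name adjacent_duplicate_lines and the two sample strings are made up for this illustration; the function flags a non-blank line that is identical to the line directly before it, which catches the block-style case but not the flow-style one:

```python
def adjacent_duplicate_lines(text):
    """Return (line_number, line) for every non-blank line that is
    identical to the line directly before it."""
    hits = []
    previous = None
    for number, line in enumerate(text.splitlines(), start=1):
        if line.strip() and line == previous:
            hits.append((number, line))
        previous = line
    return hits


# Hypothetical sample inputs modelled on the examples in this answer.
block_style = "a: 1\napp: activity-worker\napp: activity-worker\nb: abc\n"
flow_style = "{a: 1, app: activity-worker, app:\nactivity-worker, b: abc}\n"

print(adjacent_duplicate_lines(block_style))  # the repeated line is reported
print(adjacent_duplicate_lines(flow_style))   # nothing is reported
```

The flow-style input contains the same duplicate key, but no two lines are equal, so a check like this (or sort | uniq -d) cannot see it. That is why parsing the file is the more reliable route.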


If you have such faulty "YAML" files, the best way to clean them up is
to use the round-trip capabilities of
ruamel.yaml (disclaimer: I
am the author of that package), together with its option to either raise an exception or merely warn
on faulty input containing duplicate keys. You can install it in your
Python environment (preferably a virtual environment) using:



pip install ruamel.yaml


Assuming your file is called input.yaml and it contains:



a: 1   # some duplicate keys follow
app: activity-worker
app: activity-worker
b: "abc"


You can run the following one-liner:



python -c "import sys; from ruamel.yaml import YAML; yaml = YAML(); yaml.preserve_quotes=yaml.allow_duplicate_keys=True; yaml.dump(yaml.load(open('input.yaml')), sys.stdout)"


to get:



a: 1   # some duplicate keys follow
app: activity-worker
b: "abc"


and if your input were like:



{a: 1, app: activity-worker, app: 
activity-worker, b: "abc"}


the output would be:



{a: 1, app: activity-worker, b: "abc"}
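
Since the question is about a whole directory of files, the one-liner can be wrapped in a small script. The following is only a sketch under some assumptions: the function names (find_yaml_files, clean_file, make_yaml) are invented here, the files are assumed to end in .yaml or .yml, and load_all/dump_all are used so that files with several documents separated by --- are handled as well. It rewrites files in place, so try it on a copy or under version control first:

```python
import sys
from pathlib import Path


def find_yaml_files(directory):
    """Return the .yaml/.yml files directly inside `directory`, sorted."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )


def make_yaml():
    """Build a round-trip YAML instance configured as in the one-liner."""
    # Third-party package: pip install ruamel.yaml
    from ruamel.yaml import YAML

    yaml = YAML()
    yaml.preserve_quotes = True
    yaml.allow_duplicate_keys = True  # do not raise on duplicate keys
    return yaml


def clean_file(path):
    """Load every document in `path` and write it back in place.

    Loading collapses duplicate keys, so the dumped file contains
    each key only once.
    """
    yaml = make_yaml()
    with open(path) as fp:
        documents = list(yaml.load_all(fp))
    with open(path, "w") as fp:
        yaml.dump_all(documents, fp)


if __name__ == "__main__":
    # Usage (script name is hypothetical): python dedupe_yaml.py dev
    for yaml_path in find_yaml_files(sys.argv[1]):
        clean_file(yaml_path)
        print("cleaned", yaml_path)
```

Round-tripping may normalise some formatting (indentation, for example), so it is worth reviewing the resulting diff before committing.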





  • Thanks for a very thorough, informative answer Anthon!
    – v1mg1rl
    Nov 14 at 16:36










  • @v1mg1rl If this solves your problem, please accept this answer by clicking the checkmark (✓) next to the top of this answer. That way others know your question has a working answer (without having to scroll to the comments). That also marks your question as answered in queries.
    – Anthon
    Nov 14 at 18:11











answered Nov 10 at 10:22









Anthon
