How to remove duplicate lines in YAML format configuration files?
I have a bunch of manifest/yaml files that may or may not have these key value pair duplicates:
...
app: activity-worker
app: activity-worker
...
I need to search through each of those files and find those duplicates so that I can remove one of them.
Note: I know that to replace a certain string (say, switch service: to app:) in all files of a directory (say, dev) I can run grep -l 'service:' dev/* | xargs sed -i "" 's/service:/app:/g'. I'm looking for a relation between lines.
unix duplicates yaml
Please provide more context and some examples of what you're trying to accomplish
– John
Nov 10 at 1:10
@bishop okay I'm open to any other way
– v1mg1rl
Nov 10 at 1:13
Only thing I've tried to figure out is how to find the duplicate lines. I've tried sort fileName | uniq -d but this command showed me a little more than the duplicate lines I needed.
– v1mg1rl
Nov 10 at 1:18
There is no such regex. You're looking for a relation between lines, not a pattern in a single line.
– melpomene
Nov 10 at 1:25
I have focused your question's title and tags; and removed the please and thank you noise. You may roll these edits back if they do not match your intent.
– bishop
Nov 10 at 1:29
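A note on the sort fileName | uniq -d attempt mentioned in the comments: sorting brings together identical lines from anywhere in the file, so it also reports lines that merely recur in different places, which is probably the "little more" that showed up. Below is a minimal Python sketch of the "relation between lines" check instead, reporting only lines that repeat the line directly before them. The function names and the *.yaml glob pattern are assumptions made for illustration, and this remains a purely line-based check that knows nothing about YAML structure.

```python
from pathlib import Path


def adjacent_duplicates(lines):
    """Yield (line_number, text) for each line identical to the one before it."""
    previous = None
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        # Ignore blank lines so that runs of empty lines are not reported
        if text.strip() and text == previous:
            yield number, text
        previous = text


def report(directory, pattern="*.yaml"):
    """Print file:line: text for every adjacent duplicate in matching files."""
    for path in sorted(Path(directory).glob(pattern)):
        with path.open() as handle:
            for number, text in adjacent_duplicates(handle):
                print(f"{path}:{number}: {text}")
```

Calling report("dev") would list candidate duplicates in the dev directory from the question without modifying any file.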
edited Nov 10 at 1:34
asked Nov 10 at 1:07
v1mg1rl
1 Answer (accepted)
What you call YAML is not YAML. The YAML specification very explicitly states that keys in a mapping must be unique, and your keys are not:
The content of a mapping node is an unordered set of key: value node
pairs, with the restriction that each of the keys is unique. YAML
places no further restrictions on the nodes. In particular, keys may
be arbitrary nodes, the same node may be used as the value of
several key: value pairs, and a mapping could even contain itself as
a key or a value (directly or indirectly).
On the other hand, some libraries implement this incorrectly and let a later value overwrite any earlier value associated with the same key. In your case, since the values are the same, it doesn't really matter which value is taken.
Also, your block style representation is not the only way to represent the key-value pairs of a mapping in "YAML"; these duplicates could also appear in a flow style mapping, as
{...., app: activity-worker, app: activity-worker, .... }
with the two occurrences not necessarily next to each other, nor on the same line. The following "YAML" is also semantically equivalent to your input:
{...., app: activity-worker, app:
activity-worker, .... }
If you have such faulty "YAML" files, the best way to clean them up is to use the round-trip capabilities of ruamel.yaml (disclaimer: I am the author of that package), together with its ability to switch between raising an exception and warning on faulty input containing duplicate keys. You can install it for your Python (virtual environment) using:
pip install ruamel.yaml
Assuming your file is called input.yaml and it contains:
a: 1 # some duplicate keys follow
app: activity-worker
app: activity-worker
b: "abc"
You can run the following one-liner:
python -c "import sys; from ruamel.yaml import YAML; yaml = YAML(); yaml.preserve_quotes=yaml.allow_duplicate_keys=True; yaml.dump(yaml.load(open('input.yaml')), sys.stdout)"
to get:
a: 1 # some duplicate keys follow
app: activity-worker
b: "abc"
and if your input were like:
{a: 1, app: activity-worker, app:
activity-worker, b: "abc"}
the output would be:
{a: 1, app: activity-worker, b: "abc"}
Thanks for a very thorough, informative answer Anthon!
– v1mg1rl
Nov 14 at 16:36
@v1mg1rl If this solves your problem, please accept this answer by clicking the checkmark (✓) next to the top of this answer. That way others know your question has a working answer (without having to scroll to the comments). That also marks your question as answered in queries.
– Anthon
Nov 14 at 18:11
answered Nov 10 at 10:22
Anthon