Difference between the two highest numbers in a column in R
I have a data frame like this:
  NUM_TURNO CODIGO_MUNICIPIO SIGLA_PARTIDO     SHARE
1         1            81825           PPB 38.713318
2         1            81825          PMDB 61.286682
3         1            09717          PMDB 48.025900
4         1            09717            PL  1.279217
5         1            09717           PFL 50.694883
6         1            61921          PMDB 51.793868
This is a data.frame of election results in Brazil. Grouping by NUM_TURNO and CODIGO_MUNICIPIO, I want to compare the SHARE of the first and second most-voted candidates in each city and round (1 or 2) and store the result in a new column.
My problem: I don't know how to calculate the difference using only the two largest SHARE values in each group.
For the first city, for example, I want something that gives me the difference between 61.286682 and 38.713318, which is 22.573364, and so on.
Something like this (pseudocode):
df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  mutate(Diff = <highest SHARE> - <second-highest SHARE>)
r dataframe calculator difference
edited Nov 8 at 16:48 by Dave2e
asked Nov 8 at 16:43 by Danilo Imbimbo
Something like -diff(sort(SHARE, decreasing = TRUE)[1:2])
– nicola, Nov 8 at 16:47
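The expression in this comment can be tried without any packages. The following is only a sketch in base R: the sample data is copied from the question, CODIGO_MUNICIPIO is stored as text (an assumption, made to keep the leading zero), and `top_two_gap` is a hypothetical helper name that does not come from the thread.

```r
# Sample data copied from the question.
# Assumption: CODIGO_MUNICIPIO is kept as character to preserve "09717".
df <- data.frame(
  NUM_TURNO = c(1, 1, 1, 1, 1, 1),
  CODIGO_MUNICIPIO = c("81825", "81825", "09717", "09717", "09717", "61921"),
  SIGLA_PARTIDO = c("PPB", "PMDB", "PMDB", "PL", "PFL", "PMDB"),
  SHARE = c(38.713318, 61.286682, 48.025900, 1.279217, 50.694883, 51.793868),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the expression from the comment.
top_two_gap <- function(x) {
  # Two largest values; the second element is NA when x has a single value
  top <- sort(x, decreasing = TRUE)[1:2]
  # diff() returns second minus first, so negate to get largest minus second
  -diff(top)
}

# ave() applies the helper per group and repeats the result on every row
df$Diff <- ave(df$SHARE, df$NUM_TURNO, df$CODIGO_MUNICIPIO, FUN = top_two_gap)
```

With this helper, a city that has only one row gets NA rather than an error. The same function could also be called inside `mutate()` after `group_by()` if you prefer to stay in dplyr.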
2 Answers
accepted
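You could arrange your data frame by SHARE in descending order and then slice the first two rows of each group. Then you can use summarise to get the difference between those two values for every group:
library(dplyr)
df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  arrange(desc(SHARE)) %>%          # largest shares first (the column is SHARE, not Share)
  slice(1:2) %>%                    # keep the top two rows of each group
  summarise(Diff = -diff(SHARE))    # diff() gives second minus first, so negate it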
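The question asks for a new column, while the pipeline above returns one row per group. As a rough sketch only, the summary could be joined back onto the original rows. The sample data is copied from the question with CODIGO_MUNICIPIO stored as text (an assumption), and the names `gaps` and `df_with_diff` are made up for illustration. The `if` guard is also an addition: it returns NA for groups with a single row instead of relying on `diff()` of one value.

```r
library(dplyr)

# Sample data copied from the question.
# Assumption: CODIGO_MUNICIPIO is kept as character to preserve "09717".
df <- data.frame(
  NUM_TURNO = c(1, 1, 1, 1, 1, 1),
  CODIGO_MUNICIPIO = c("81825", "81825", "09717", "09717", "09717", "61921"),
  SIGLA_PARTIDO = c("PPB", "PMDB", "PMDB", "PL", "PFL", "PMDB"),
  SHARE = c(38.713318, 61.286682, 48.025900, 1.279217, 50.694883, 51.793868),
  stringsAsFactors = FALSE
)

# One row per group: gap between the two largest shares, NA if only one row
gaps <- df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  arrange(desc(SHARE), .by_group = TRUE) %>%
  slice(1:2) %>%
  summarise(
    Diff = if (n() == 2) -diff(SHARE) else NA_real_,
    .groups = "drop"
  )

# Attach the per-group gap to every original row
df_with_diff <- left_join(df, gaps, by = c("NUM_TURNO", "CODIGO_MUNICIPIO"))
```

`left_join()` keeps the rows of `df` in their original order, so every row of a city carries that city's gap.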
answered Nov 8 at 16:57 by FloSchmo
You can also use top_n from dplyr together with grouping and summarizing. Keep in mind that with the data you provided, summarize throws an error if diff is applied to a group with a single value, hence the use of ifelse.
df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  top_n(2, SHARE) %>%
  # top_n() keeps rows in their original order, so the sign of diff() depends on row order
  summarize(Diff = ifelse(n() == 1, NA, diff(SHARE)))
# A tibble: 3 x 3
# Groups:   NUM_TURNO [?]
  NUM_TURNO CODIGO_MUNICIPIO  Diff
      <dbl>            <dbl> <dbl>
1         1             9717  2.67
2         1            61921 NA
3         1            81825 22.6
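In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`, so a variant along these lines may be preferable on current versions. This is a sketch, not the answerer's code: the sample data is copied from the question with CODIGO_MUNICIPIO stored as text (an assumption), the name `gaps` is made up, and `max() - min()` replaces `diff()` so that the result does not depend on row order.

```r
library(dplyr)

# Sample data copied from the question.
# Assumption: CODIGO_MUNICIPIO is kept as character to preserve "09717".
df <- data.frame(
  NUM_TURNO = c(1, 1, 1, 1, 1, 1),
  CODIGO_MUNICIPIO = c("81825", "81825", "09717", "09717", "09717", "61921"),
  SIGLA_PARTIDO = c("PPB", "PMDB", "PMDB", "PL", "PFL", "PMDB"),
  SHARE = c(38.713318, 61.286682, 48.025900, 1.279217, 50.694883, 51.793868),
  stringsAsFactors = FALSE
)

gaps <- df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  # Keep at most two rows per group; with_ties = FALSE caps it at exactly n
  slice_max(SHARE, n = 2, with_ties = FALSE) %>%
  summarise(
    # Order-independent gap between the two kept values, NA for single-row groups
    Diff = if (n() == 2) max(SHARE) - min(SHARE) else NA_real_,
    .groups = "drop"
  )
```

Setting `with_ties = FALSE` matters here: with ties allowed, a group could keep more than two rows and the `n() == 2` check would send it to NA.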
answered Nov 8 at 17:13 by Jake Kaupp