Difference between two higher numbers in a column in R

I have a data frame like this:

  NUM_TURNO CODIGO_MUNICIPIO SIGLA_PARTIDO     SHARE
1         1            81825           PPB 38.713318
2         1            81825          PMDB 61.286682
3         1            09717          PMDB 48.025900
4         1            09717            PL  1.279217
5         1            09717           PFL 50.694883
6         1            61921          PMDB 51.793868

This is a data.frame of election results in Brazil. Grouping by NUM_TURNO and CODIGO_MUNICIPIO, I want to compare the SHARE of the first and second most-voted politicians in each city and round (1 or 2) and store the result in a new column.

My problem: I don't know how to calculate the difference for only the two largest vote shares.

For the first case, for example, I want something that gives me the difference between 61.286682 and 38.713318, which is 22.573364, and so on.

Something like this (pseudocode):

df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  mutate(Diff = HIGHEST_SHARE - SECOND_HIGHEST_SHARE)  # placeholders, not real columns

Tags: r, dataframe, calculator, difference

edited Nov 8 at 16:48 by Dave2e
asked Nov 8 at 16:43 by Danilo Imbimbo

Something like -diff(sort(SHARE,decreasing=TRUE)[1:2])
– nicola
Nov 8 at 16:47
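
The one-liner in the comment can be turned into the new column the question asks for without any extra packages. The following is a minimal base R sketch, not a definitive implementation: the helper name `top_two_gap` is made up for illustration, and storing CODIGO_MUNICIPIO as character (to keep the leading zero in 09717) is an assumption about the real data.

```r
# Sample data from the question.
# Assumption: CODIGO_MUNICIPIO is character so that "09717" keeps its leading zero.
df <- data.frame(
  NUM_TURNO        = c(1, 1, 1, 1, 1, 1),
  CODIGO_MUNICIPIO = c("81825", "81825", "09717", "09717", "09717", "61921"),
  SIGLA_PARTIDO    = c("PPB", "PMDB", "PMDB", "PL", "PFL", "PMDB"),
  SHARE            = c(38.713318, 61.286682, 48.025900, 1.279217, 50.694883, 51.793868),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the comment's expression:
# highest value minus second-highest value of x.
top_two_gap <- function(x) -diff(sort(x, decreasing = TRUE)[1:2])

# ave() applies the helper within each (round, city) group and
# repeats the single result on every row of that group.
df$Diff <- ave(df$SHARE, df$NUM_TURNO, df$CODIGO_MUNICIPIO, FUN = top_two_gap)

print(df)
```

Because `sort(x, decreasing = TRUE)[1:2]` pads a single value with NA, a municipality with only one row (61921 in the sample) ends up with NA in `Diff` rather than an error.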






2 Answers

You could arrange your data frame by SHARE in descending order and then slice the first two rows of each group. Then you can use summarise to get the difference between the two values for every group:

library(dplyr)

df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  arrange(desc(SHARE)) %>%        # largest shares first
  slice(1:2) %>%                  # keep the top two rows of each group
  summarise(Diff = -diff(SHARE))  # highest minus second-highest

answered Nov 8 at 16:57 by FloSchmo (accepted answer)
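
A note on the pipeline above: `diff()` applied to a single value returns a zero-length vector, so a municipality with only one row (61921 in the sample) does not produce one summary value, and depending on the dplyr version that group may raise an error or be dropped. A minimal sketch of one way around this, assuming dplyr is installed and assuming CODIGO_MUNICIPIO is stored as character, is to index the sorted shares directly so that a missing second value becomes NA. The object name `gaps` is only illustrative.

```r
library(dplyr)

# Sample data from the question.
# Assumption: CODIGO_MUNICIPIO is character so that "09717" keeps its leading zero.
df <- data.frame(
  NUM_TURNO        = c(1, 1, 1, 1, 1, 1),
  CODIGO_MUNICIPIO = c("81825", "81825", "09717", "09717", "09717", "61921"),
  SIGLA_PARTIDO    = c("PPB", "PMDB", "PMDB", "PL", "PFL", "PMDB"),
  SHARE            = c(38.713318, 61.286682, 48.025900, 1.279217, 50.694883, 51.793868),
  stringsAsFactors = FALSE
)

gaps <- df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  summarise(
    # The second element of the sorted vector is NA when a group has a
    # single row, so Diff is NA for that group instead of zero-length.
    Diff = sort(SHARE, decreasing = TRUE)[1] - sort(SHARE, decreasing = TRUE)[2]
  ) %>%
  ungroup()

print(gaps)
```

This returns one row per round and municipality. Swapping `summarise` for `mutate` would instead keep every original row and repeat the group's difference on each of them.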
























You can also use top_n from dplyr with grouping and summarizing. Keep in mind that with the data you provided, summarize will throw an error if diff is applied to a group with a single value, hence the use of ifelse.

library(dplyr)

df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  top_n(2, SHARE) %>%                                    # keep the two largest shares per group
  summarize(Diff = ifelse(n() == 1, NA, diff(SHARE)))    # NA when a group has one row; diff() follows row order

# A tibble: 3 x 3
# Groups:   NUM_TURNO [?]
  NUM_TURNO CODIGO_MUNICIPIO  Diff
      <dbl>            <dbl> <dbl>
1         1             9717  2.67
2         1            61921 NA
3         1            81825 22.6

answered Nov 8 at 17:13 by Jake Kaupp
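
In dplyr 1.0.0 and later, `top_n()` is superseded by `slice_max()`. Note also that `diff()` follows row order, so the sign of the result in the `top_n` pipeline above depends on how the rows happen to be ordered. The following is a hedged sketch of an order-independent variant, assuming dplyr >= 1.0.0 and assuming CODIGO_MUNICIPIO is stored as character (the answer's output shows it read as numeric). The object name `top_two` is only illustrative.

```r
library(dplyr)

# Sample data from the question.
# Assumption: CODIGO_MUNICIPIO is character so that "09717" keeps its leading zero.
df <- data.frame(
  NUM_TURNO        = c(1, 1, 1, 1, 1, 1),
  CODIGO_MUNICIPIO = c("81825", "81825", "09717", "09717", "09717", "61921"),
  SIGLA_PARTIDO    = c("PPB", "PMDB", "PMDB", "PL", "PFL", "PMDB"),
  SHARE            = c(38.713318, 61.286682, 48.025900, 1.279217, 50.694883, 51.793868),
  stringsAsFactors = FALSE
)

top_two <- df %>%
  group_by(NUM_TURNO, CODIGO_MUNICIPIO) %>%
  # Keep the two largest shares per group; with_ties = FALSE caps this at two rows.
  slice_max(SHARE, n = 2, with_ties = FALSE) %>%
  summarise(
    # With exactly two rows left, max - min is highest minus second-highest,
    # whatever order the rows are in. Single-row groups get NA.
    Diff = if (n() < 2) NA_real_ else max(SHARE) - min(SHARE)
  ) %>%
  ungroup()

print(top_two)
```

Using `max(SHARE) - min(SHARE)` rather than `diff(SHARE)` is a design choice made here so that the result is never negative; it is not part of the original answer.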






























                       
