Pandas: Conditionally insert rows into DataFrame while iterating through rows























While iterating through the rows of a specific column in a Pandas DataFrame, I would like to add a new row below the currently iterated row, if the cell in the currently iterated row meets a certain condition.



Say for example:



import pandas as pd

df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})


DataFrame:



      A     B
0  0.15  1500
1  0.15  1500
2  0.70  7000


Attempt:



y = 100                             # An example scalar

i = 1

for x in df['A']:
    if x is not None:               # Values in 'A' are filled atm, but not necessarily.
        df.loc[i] = [None, x*y]     # Should insert None into 'A', and product into 'B'.
        df.index = df.index + 1     # Shift index? According to this S/O answer: https://stackoverflow.com/a/24284680/4909923
    i = i + 1

df.sort_index(inplace=True)         # Sort index?


I have not succeeded so far: the index numbering ends up shifted so that it no longer starts at 0, and the rows do not seem to be inserted in order:



      A     B
3  0.15  1500
4   NaN    70
5  0.70  7000
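The output above is consistent with how label-based assignment behaves: `df.loc[label] = row` overwrites the row when the label already exists and only adds a row when the label is new. A minimal sketch of that behaviour on the same data (the name `demo` and the label `3` are chosen here for illustration and are not from the original post):

```python
import pandas as pd

demo = pd.DataFrame({'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})

# Label 1 already exists, so this overwrites that row instead of inserting.
demo.loc[1] = [None, 15]
rows_after_overwrite = len(demo)

# Label 3 does not exist yet, so this adds a new row at the end.
demo.loc[3] = [None, 70]
rows_after_append = len(demo)

print(rows_after_overwrite, rows_after_append)
```

Because the loop in the attempt always assigns to a label that is already present after the index shift, the frame never grows beyond three rows.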


I also tried several variants of this, including applymap with a lambda function, but could not get any of them to work.



Desired result:



      A     B
0  0.15  1500
1  None    15
2  0.15  1500
3  None    15
4  0.70  7000
5  None    70































  • As it stands, I see no use case for pandas. You're iterating (mostly a no no) and you want to insert rows (also not really a good thing in numpy, which is the underlying structure)
    – roganjosh
    Nov 10 at 10:39












  • @roganjosh I am already using Pandas, this is just a subset of a DataFrame script doing other things also. I need to be able to insert rows as the program will create a DataFrame depending on other factors, therefore not so desirable to preallocate the index (also while inserting rows is inefficient, it doesn't matter as I'm dealing with less than tens of rows, not thousands).
    – Winterflags
    Nov 10 at 10:43












  • So my point still stands. Drop pandas and deal with nested lists. Pandas is just getting in the way here.
    – roganjosh
    Nov 10 at 10:44
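For reference, the nested-list idea from the comments could look roughly like this. It is only a sketch: the list `rows` stands in for the DataFrame contents, the condition is assumed to be "the value in 'A' is not None", and `round` is used here to get whole numbers for 'B'.

```python
y = 100  # example scalar from the question

# Each inner list is one row: [A, B].
rows = [[0.15, 1500], [0.15, 1500], [0.7, 7000]]

i = 0
while i < len(rows):
    a = rows[i][0]
    if a is not None:
        # Insert the new row directly below the current one.
        rows.insert(i + 1, [None, round(a * y)])
        i += 1  # step over the row that was just inserted
    i += 1

print(rows)
```

A `while` loop with an explicit position is used because inserting into a list during a `for` loop over that same list would revisit the inserted rows.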























python pandas
















asked Nov 10 at 10:37









Winterflags



























2 Answers


























accepted










I believe you can use:

import pandas as pd

df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7],
                        'B': [1500, 1500, 7000],
                        'C': [100, 200, 400]})

v = 100
L = []
for i, x in df.to_dict('index').items():
    print(x)
    # append the original row as a dictionary
    L.append(x)
    # append a new dictionary; for the missing keys ('B', 'C') the DataFrame constructor adds NaNs
    L.append({'A': x['A'] * v})

df = pd.DataFrame(L)
print(df)

       A       B      C
0   0.15  1500.0  100.0
1  15.00     NaN    NaN
2   0.15  1500.0  200.0
3  15.00     NaN    NaN
4   0.70  7000.0  400.0
5  70.00     NaN    NaN
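The same dictionary-based idea can be adapted to the layout asked for in the question, with the product placed in 'B' and 'A' left empty. The following is a sketch under assumptions: the condition is taken to be "the value in 'A' is not missing", and the names `records` and `out` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})
y = 100  # example scalar from the question

records = []
for row in df.to_dict('records'):
    records.append(row)
    # Assumed condition: add a follow-up row only when 'A' holds a value.
    if pd.notna(row['A']):
        # New row: 'A' is left missing, 'B' holds the product.
        records.append({'A': None, 'B': round(row['A'] * y)})

out = pd.DataFrame(records)
print(out)
```

Building the list first and constructing the DataFrame once avoids growing the frame row by row, and the resulting index is already numbered from 0.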




























  • @Winterflags - It may be easier to loop over dictionaries; check the edited answer.
    – jezrael
    Nov 10 at 12:00































It doesn't seem you need a manual loop here:

import numpy as np
import pandas as pd

df = pd.DataFrame(data={'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})

y = 100

# copy slice of dataframe
df_extra = df.loc[df['A'].notnull()].copy()

# assign A and B series values
df_extra = df_extra.assign(A=np.nan, B=(df_extra['A']*y).astype(int))

# increment index partially, required for sorting afterwards
df_extra.index += 0.5

# concatenate, sort index, drop index
# (pd.concat is used because DataFrame.append was removed in pandas 2.0)
res = pd.concat([df, df_extra]).sort_index().reset_index(drop=True)

print(res)

      A     B
0  0.15  1500
1   NaN    15
2  0.15  1500
3   NaN    15
4  0.70  7000
5   NaN    70
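A possible variation on this approach, sketched here as an assumption rather than as part of the answer, keeps the integer index and relies on a stable sort instead of the 0.5 offset. The names `mask` and `extra` are illustrative, and `round()` is applied before the integer cast to guard against floating-point results that fall just below a whole number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0.15, 0.15, 0.7], 'B': [1500, 1500, 7000]})
y = 100

# Rows that satisfy the condition; the extra rows reuse their index labels.
mask = df['A'].notna()
extra = pd.DataFrame({'A': np.nan,
                      'B': (df.loc[mask, 'A'] * y).round().astype(int)})

# Both frames share index labels; a stable sort (mergesort) keeps each
# original row ahead of the extra row that carries the same label.
res = (pd.concat([df, extra])
         .sort_index(kind='mergesort')
         .reset_index(drop=True))
print(res)
```

The default `kind='quicksort'` is not guaranteed to be stable, so the explicit `kind='mergesort'` is what makes the interleaving order dependable here.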













    edited Nov 10 at 12:00

























    answered Nov 10 at 10:59









    jezrael























        answered Nov 10 at 12:22









        jpp






























