Eigen rowwise addition/subtraction performance











While profiling my program, I found that the following lines are the bottleneck:



// Some big nested loop
{
const auto inpRow = inpMap.row(counter);
outMap.row(adjRow) -= inpRow;
outMap.row(colInd) += inpRow;
}


outMap and inpMap are Eigen::Map<Eigen::MatrixRX<Scalar>>, where Eigen::MatrixRX is defined as Eigen::Matrix<Scalar, -1, -1, Eigen::RowMajor>, i.e. a row-major matrix.



Is there a way to improve the performance of such operations (other than a parallel for, of course)?










  • Looks like inpRow is a copy. Is there any reason you can't use a reference instead?
    – James Mitchell
    Nov 8 at 9:00






  • @JamesMitchell thank you for the answer! I'm not 100% sure, but Eigen should provide a lazily evaluated proxy object (something like a block), so there should be no copy.
    – Dark_Daiver
    Nov 8 at 9:01










  • #define EIGEN_NO_DEBUG?
    – R zu
    yesterday















c++ performance matrix eigen eigen3






asked Nov 8 at 8:33









Dark_Daiver













1 Answer
accepted










There is not much you can do, as such expressions should already be fully vectorized. Nevertheless, here are some tips:

  • Make sure you have enabled compiler optimizations, e.g. -O3 -march=native.

  • Then measure the time it takes and compute the FLOPS to see how far you are from the theoretical peak performance of your CPU (disable turbo boost for that experiment).

  • If you're very far from the theoretical peak, then you're very likely suffering from cache misses. You might reduce them by splitting the two assignments into chunks smaller than 16 kB. You might get up to a 2x speed-up from that.
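
The measurement and chunking tips above can be sketched as follows. This is only an illustration, written against the raw row-major buffers that the Eigen::Map objects wrap so that it compiles without Eigen. The names chunkedRowUpdate and achievedGflops and the 8 kB default chunk size are assumptions made for this example; they are not part of Eigen or of the original code.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not an Eigen API). Performs
//   out.row(adjRow) -= inp.row(inpRowIdx);
//   out.row(colInd) += inp.row(inpRowIdx);
// on row-major buffers with `cols` columns, one column chunk at a time, so
// that a chunk of the input row is likely still in L1 cache when it is read
// for the second assignment.
template <typename Scalar>
void chunkedRowUpdate(const Scalar* inp, Scalar* out, std::size_t cols,
                      std::size_t inpRowIdx, std::size_t adjRow,
                      std::size_t colInd,
                      std::size_t chunkBytes = 8 * 1024 /* assumed value */) {
    const Scalar* src = inp + inpRowIdx * cols;
    Scalar* sub = out + adjRow * cols;
    Scalar* add = out + colInd * cols;
    // Number of scalars per chunk (at least one).
    const std::size_t chunk =
        std::max<std::size_t>(1, chunkBytes / sizeof(Scalar));
    for (std::size_t start = 0; start < cols; start += chunk) {
        const std::size_t end = std::min(cols, start + chunk);
        for (std::size_t j = start; j < end; ++j) sub[j] -= src[j];
        for (std::size_t j = start; j < end; ++j) add[j] += src[j];
    }
}

// Each update does one subtraction and one addition per column, i.e.
// 2 * cols floating-point operations. Returns the achieved rate in GFLOP/s.
inline double achievedGflops(std::size_t updates, std::size_t cols,
                             double seconds) {
    return 2.0 * static_cast<double>(updates) * static_cast<double>(cols) /
           seconds / 1e9;
}
```

With Eigen itself, the same splitting could probably be written with segment() on the row expressions, e.g. outMap.row(adjRow).segment(start, len) -= inpRow.segment(start, len). Whether chunking helps at all depends on the row length and the cache sizes of the machine, so it is worth comparing the achieved rate before and after.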






        answered Nov 8 at 10:59









        ggael































             
