Eigen rowwise addition/subtraction performance

While profiling my program, I found that the following lines are the bottleneck:

```cpp
// Some big nested loop
{
    const auto inpRow = inpMap.row(counter);
    outMap.row(adjRow) -= inpRow;
    outMap.row(colInd) += inpRow;
}
```

outMap and inpMap are Eigen::Map<Eigen::MatrixRX<Scalar>>, where Eigen::MatrixRX is defined as Eigen::Matrix<Scalar, -1, -1, Eigen::RowMajor> (-1 being Eigen::Dynamic), i.e. a dynamic-size row-major matrix.

Is there a way to improve the performance of such operations (other than a parallel for, of course)?

asked Nov 8 at 8:33

Dark_Daiver


It looks like inpRow is a copy; is there any reason you can't use a reference instead?
– James Mitchell
Nov 8 at 9:00


@JamesMitchell Thank you for the answer! I'm not 100% sure, but Eigen should return a lazily evaluated proxy object (something like a Block), so there should be no copy.
– Dark_Daiver
Nov 8 at 9:01

#define EIGEN_NO_DEBUG?
– R zu
yesterday

c++ performance matrix eigen eigen3

1 Answer
accepted

There is not much you can do, as such expressions should already be fully vectorized. Nevertheless, here are some tips:

Make sure you have enabled compiler optimizations, e.g. -O3 -march=native.

Then measure the time it takes and compute the achieved FLOP rate (floating-point operations per second) to see how far you are from the theoretical peak performance of your CPU (disable Turbo Boost for that experiment).
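The measurement step can be sketched as follows. Each iteration of the question's loop performs one subtraction and one addition per column, i.e. 2 × cols floating-point operations, so the achieved rate is that count divided by the elapsed time. The helper names are hypothetical, the kernel passed in is assumed to run the whole nested loop once, and the peak formula is the usual single-core estimate (clock frequency × FLOPs per cycle, the latter depending on SIMD width, FMA support and the number of FP units).

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// One subtraction and one addition per column and per loop iteration.
inline double loopFlops(std::int64_t iterations, std::int64_t cols) {
  assert(iterations >= 0 && cols >= 0);
  return 2.0 * static_cast<double>(iterations) * static_cast<double>(cols);
}

// Runs `kernel` once (it should execute the whole nested loop) and returns
// the achieved rate in GFLOP/s for the given operation count.
template <typename Kernel>
double measureGflops(Kernel&& kernel, double flops) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Kernel>(kernel)();
  const auto stop = std::chrono::steady_clock::now();
  const double seconds = std::chrono::duration<double>(stop - start).count();
  // A zero reading means the run was too short to time; repeat the kernel.
  return seconds > 0.0 ? flops / seconds * 1e-9 : 0.0;
}

// Single-core theoretical peak in GFLOP/s.
inline double peakGflops(double ghz, double flopsPerCycle) {
  return ghz * flopsPerCycle;
}
```

Because the loop does only two FLOPs per column while touching three rows of memory, it is expected to be limited by memory traffic rather than arithmetic, which is consistent with the cache-miss diagnosis that follows.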

If you are very far from the theoretical peak, you are very likely suffering from cache misses. You might reduce them by splitting the two assignments into chunks smaller than 16 kB, which could give a speed-up of up to 2x.

answered Nov 8 at 10:59

ggael
