Postgres time update on large table
I am working with a very large (5.9 billion rows) table on AWS (RDS m4.large), and I am having some trouble getting the data to a point where I can work with and query it effectively.
The data has a time variable in seconds from the beginning of the record (type: bigint), but I would like to be able to aggregate the data at the daily level (i.e. GROUP BY day).
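Concretely, the kind of query I eventually want to run is a daily rollup along these lines, once a day column (or an equivalent expression) exists; count(*) just stands in for whatever aggregates I actually need:

-- intended daily aggregation (sketch only)
SELECT day, count(*)
FROM tbl
GROUP BY day
ORDER BY day;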
I began this process by simply creating a day column, and updating the table to populate the day field by converting seconds to days:
ALTER TABLE tbl
ADD COLUMN day INTEGER;

UPDATE tbl
SET day = tbl.eventtime / 86400 + 1;
which worked fine in a testing environment (the first 10 days of the 90-day record) but, perhaps not unexpectedly, is not working so well on the full 5.9-billion-row dataset.
After a few attempts at this I realized that my previously created indexes were slowing the process down considerably (the update queries never finished). I have since dropped all indexes and am trying the update again. Twenty hours later the query is still running, and I wonder whether this approach is simply inappropriate for the amount of data I'm working with.
I realize another approach would be to convert the time in seconds to some sort of Postgres timestamp type, but I am unfamiliar with timestamps and am not sure where to begin.
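For example (and this is only a sketch, since I haven't tried it), if the start of the record corresponded to a known timestamp, I could presumably compute a timestamp or a calendar day on the fly, something like:

-- '2018-01-01 00:00:00' is purely a placeholder for the record's actual start time
SELECT TIMESTAMP '2018-01-01 00:00:00' + eventtime * INTERVAL '1 second' AS event_ts,
       (TIMESTAMP '2018-01-01 00:00:00' + eventtime * INTERVAL '1 second')::date AS event_day
FROM tbl
LIMIT 10;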
Is performing an update on a table this large feasible? Is there a different approach that would be more efficient to get the second-resolution data to a point where I could aggregate it by day?
Please let me know if any additional information is needed.
postgresql amazon-web-services amazon-rds large-data
edited Nov 8 at 9:37 by Laurenz Albe
asked Nov 7 at 15:31 by JordanL (new contributor)
1 Answer
Indexes slow down data modifications a lot, but you cannot search a large table efficiently without them.
So the art here is to have as few indexes as possible.
Your idea with an additional column is a bad one. Not only will it bloat the already large table with redundant data, it will also not speed up searches unless you create an index.
If you need to search all rows for a certain day, you could create an index on the corresponding expression:
CREATE INDEX ON tbl ((eventtime / 86400 + 1));
Then any search that uses this expression can use the index, and you don't have to store additional data in the table.
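For example, a day-level lookup written against the same expression should be able to use that index (the day number 42 below is just a placeholder):

-- the WHERE clause matches the indexed expression, so the planner can use the index
SELECT count(*)
FROM tbl
WHERE eventtime / 86400 + 1 = 42;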
This is only useful for querying the data. If you want to aggregate efficiently, an index won't help you much. In that case you should use a materialized view, perhaps one that you keep up to date with a trigger.
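A minimal sketch of such a materialized view might look like this (the name tbl_daily and the count(*) aggregate are placeholders for whatever you actually need):

-- precomputed daily rollup
CREATE MATERIALIZED VIEW tbl_daily AS
SELECT eventtime / 86400 + 1 AS day,
       count(*) AS row_count
FROM tbl
GROUP BY eventtime / 86400 + 1;

-- re-run whenever the base table changes (manually, on a schedule, or from a trigger)
REFRESH MATERIALIZED VIEW tbl_daily;

Querying tbl_daily then avoids scanning the 5.9 billion rows for every daily aggregate.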
answered Nov 8 at 9:34 by Laurenz Albe