PostgreSQL vs Hadoop for large amounts of data storage and retrieval [closed]
Note: this question exists on dba.SE, but has no answers and practically no views. So I'm posting it here in the hopes that it will get wider attention.
I have recently been tasked with migrating a large volume of data stored in various Excel sheets and CSV files into a structured database. The amount of data to process is enormous, well within the range of multiple terabytes. The aim is to provide a quick data retrieval system and to provide statistics about the data.
Since I have years of experience with relational databases, especially Postgres, my first thought was to analyze the data and migrate it to a Postgres DB. However, I recently read about "Big Data", and I see Hadoop being mentioned in many places. I have no experience whatsoever in this field, so I'm inclined not to use these frameworks; however, it looks like this is the standard for storing and processing large amounts of data.
After spending some time on Google, it is still not entirely clear to me what the Big Data paradigm really is and how to "set up a Hadoop cluster". I know that it aims to resolve speed issues when retrieving data from a very large DB, but I still fail to understand where this "DB" is, i.e., is it Hadoop itself, is it some proprietary model, can it be a Postgres DB, ...?
My questions are:
- Is it worth learning the Big Data paradigm and implementing a solution based on Hadoop?
- Can I get away with using a well-structured Postgres database instead?
- Can I migrate my Postgres solution to some kind of Big Data structure if it turns out that it is better?
postgresql hadoop
closed as too broad by Denys Séguret, ChrisF♦ Nov 12 at 13:42
Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.
My questions are clear and concise. Moreover, adequate background information is provided and the ensemble is well structured. Yet it is put on hold as "unclear". This is why this site is losing popularity. Note: the same question has +2 reputation on dba.SE. Figures.
– Flermat
Nov 12 at 14:14
Your question was placed on hold because you ask multiple questions at once and ask for opinion-based answers and resource requests, which are 3 closure reasons this question can fall under. I personally believe your sister question should be closed on the DBA stack too, but I am not a frequent visitor over there.
– WhatsThePoint
Nov 12 at 15:54
I don't see how my questions involve opinions of any kind. There is a problem, and there are two solutions. I need to know which one of the solutions is most suited to the problem. Feel free to block the answers that are opinion-based. The question is not.
– Flermat
Nov 12 at 16:10
"Should I do X" will always be an opinion-based question.
– WhatsThePoint
Nov 12 at 16:29
I totally agree with OP ... someone help the man!
– Veljko89
Nov 13 at 8:18
edited Nov 9 at 15:38
asked Nov 9 at 15:29
Flermat
2 Answers
Migrating from Postgres (or any classic RDBMS) to a "big data" solution is clearly time-consuming. If you have the budget, you can get some help from a public cloud. For example, Amazon offers EMR, which comes with several big-data tools pre-packaged.
Amazon also has Redshift Spectrum, which is easier to use; here is a talk about it.
answered Nov 9 at 16:44
Le farfadet
Big Data is a broad term. It means the data can come from anything, such as articles, news, media and other sources, and there is so much of it that it is called Big Data.
Hadoop is an open-source framework for implementing Big Data. If you ask whether it is worth it: of course, because nowadays data has become so important.
Big Data involves mining data from many sources, as I said before.
Big Data gives you the mined data, and you then need to store it in a database, but that depends on how you implement Big Data. The data can be stored in a NoSQL store or in an RDBMS such as PostgreSQL. But you need some ETL to transform the data because it is so big.
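As a rough illustration of the ETL step mentioned above, here is a minimal sketch in Python. It is only a sketch under stated assumptions: the table name `measurements`, the columns `name` and `value`, and the helpers `clean_row` and `etl` are all hypothetical, and the standard-library `sqlite3` module stands in for Postgres so that the example runs without a server. With a real Postgres database you would use a Postgres driver instead, and for bulk loads you would probably reach for the `COPY` command.

```python
import csv
import io
import sqlite3


def clean_row(raw):
    """Transform step: trim the name and parse the value.

    Returns None for rows that are incomplete or not numeric.
    """
    try:
        return (raw["name"].strip(), float(raw["value"]))
    except (KeyError, TypeError, ValueError, AttributeError):
        return None


def etl(csv_file, conn, batch_size=1000):
    """Stream rows from csv_file into a table, inserting in batches.

    Reading row by row keeps memory use flat no matter how large the
    file is. Returns a (loaded, skipped) pair of row counts.
    """
    # Hypothetical target table; sqlite3 is a stand-in for Postgres here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements (name TEXT, value REAL)"
    )
    batch, loaded, skipped = [], 0, 0
    for raw in csv.DictReader(csv_file):
        row = clean_row(raw)
        if row is None:
            skipped += 1
            continue
        batch.append(row)
        if len(batch) >= batch_size:
            conn.executemany("INSERT INTO measurements VALUES (?, ?)", batch)
            loaded += len(batch)
            batch = []
    if batch:
        # Flush whatever is left after the last full batch.
        conn.executemany("INSERT INTO measurements VALUES (?, ?)", batch)
        loaded += len(batch)
    conn.commit()
    return loaded, skipped


if __name__ == "__main__":
    sample = io.StringIO("name,value\n a ,1.5\nb,oops\nc,2\n")
    connection = sqlite3.connect(":memory:")
    print(etl(sample, connection, batch_size=2))
```

The same extract, transform, load shape applies whether the target is an RDBMS or a NoSQL store; only the load step changes.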
answered Nov 9 at 22:41
dwir182