PostgreSQL vs Hadoop for large amounts of data storage and retrieval [closed]











Note: this question exists on dba.SE, but has no answers and practically no views, so I'm posting it here in the hope that it will get wider attention.

I have recently been tasked with migrating a large volume of data stored in various Excel sheets and CSV files into a structured database. The amount of data to process is enormous, in the range of multiple terabytes. The aim is to provide fast data retrieval and statistics about the data.

Since I have years of experience with relational databases, especially Postgres, my first thought was to analyze the data and migrate it to a Postgres DB. However, I recently read about "Big Data", and I see Hadoop mentioned in many places. I have no experience whatsoever in this field, so I'm inclined not to use these frameworks; however, it looks like this is the standard for storing and processing large amounts of data.

After spending some time on Google, it is still not entirely clear to me what the Big Data paradigm really is and how to "set up a Hadoop cluster". I know that it aims to resolve speed issues when retrieving data from a very large DB, but I still fail to understand where this "DB" is, i.e., is it Hadoop itself, is it some proprietary model, can it be a Postgres DB, ...?

My questions are:

  • Is it worth learning the Big Data paradigm and implementing a solution based on Hadoop?
  • Can I get away with using a well-structured Postgres database instead?
  • Can I migrate my Postgres solution to some kind of Big Data structure if it turns out that it is better?







postgresql hadoop






edited Nov 9 at 15:38
asked Nov 9 at 15:29
Flermat

closed as too broad by Denys Séguret, ChrisF Nov 12 at 13:42


Please edit the question to limit it to a specific problem with enough detail to identify an adequate answer. Avoid asking multiple distinct questions at once. See the How to Ask page for help clarifying this question. If this question can be reworded to fit the rules in the help center, please edit the question.






  • My questions are clear and concise. Moreover, adequate background information is provided and the ensemble is well structured. Yet it is put on hold as "unclear". This is why this site is losing popularity. Note: the same question has +2 reputation on dba.SE. Figures.
    – Flermat
    Nov 12 at 14:14












  • Your question was placed on hold because you ask multiple questions at once and make opinion-based/resource requests, which are three closure reasons this question can fall under. I personally believe your sister question on dba.SE should be closed too, but I am not a frequent visitor there.
    – WhatsThePoint
    Nov 12 at 15:54










  • I don't see how my questions involve opinions of any kind. There is a problem, and there are two solutions. I need to know which one of the solutions is most suited to the problem. Feel free to block the answers that are opinion-based. The question is not.
    – Flermat
    Nov 12 at 16:10










  • "Should I do X?" will always be an opinion-based question.
    – WhatsThePoint
    Nov 12 at 16:29






  • I totally agree with OP ... someone help the man!
    – Veljko89
    Nov 13 at 8:18


















2 Answers
Migrating from Postgres (or any classic RDBMS) to a "big data" solution is clearly time-consuming. If you have the budget, a public cloud can help. For example, Amazon offers EMR, which comes with several big data tools pre-packaged.

Amazon also offers Redshift Spectrum, which is easier to use: here is some discussion of it.







answered Nov 9 at 16:44
Le farfadet
























Big Data is just a term. It means the data can come from anything, such as articles, news, media and other sources, and because it is so big, it is called Big Data.

1. Hadoop is a free, open-source framework for implementing Big Data. As for whether it is worth it: of course, since data has become so important nowadays.

2. Big Data does data mining across many data sources, as I said before.

3. Big Data gives you the mined data, which you then need to store in a database; where depends on how you implement Big Data. The data can be stored in a NoSQL store or in an RDBMS such as PostgreSQL, but you need some ETL to transform the data because it is so big.
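
To make the ETL point a bit more concrete, here is a minimal sketch of a transform step, assuming the source files are CSV and using only Python's standard library. The names `clean_row` and `iter_batches` and the sample data are hypothetical, and the actual load into PostgreSQL or a NoSQL store is left out on purpose.

```python
import csv
import io
from itertools import islice
from typing import Iterable, Iterator


def clean_row(row: dict) -> dict:
    """Transform step: trim whitespace and turn empty strings into None."""
    cleaned = {}
    for key, value in row.items():
        if key is None:
            # Extra unnamed columns are dropped in this sketch.
            continue
        if isinstance(value, str):
            value = value.strip()
        cleaned[key.strip()] = value or None
    return cleaned


def iter_batches(rows: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Yield lists of at most batch_size cleaned rows, one batch at a time."""
    iterator = iter(rows)
    while True:
        batch = [clean_row(row) for row in islice(iterator, batch_size)]
        if not batch:
            return
        yield batch


# Hypothetical sample standing in for one of the CSV files.
sample = io.StringIO("id, name ,city\n1, Ada ,London\n2,Bob,\n3, Eve ,Paris\n")
batches = list(iter_batches(csv.DictReader(sample), batch_size=2))
```

Because the rows are streamed and batched, the whole file never has to fit in memory, which matters at the terabyte scale mentioned in the question. Each batch could then be handed to whatever bulk-load mechanism the target database provides.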







answered Nov 9 at 22:41
dwir182














