Extracting Data from HTML Span using Beautiful Soup











I want to extract "1.02 Crores" and "7864" from the HTML below and save them in separate columns of a CSV file.



Code:



<div class="featuresvap _graybox clearfix"><h3><span><i class="icon-inr"></i>1.02 Crores</span><small> @ <i class="icon-inr"></i><b>7864/sq.ft</b> as per carpet area</small></h3>


      python-3.x html5 web-scraping beautifulsoup






      edited Nov 8 at 21:59









      Ashfaque Marfani

      asked Nov 8 at 17:47









      Nick

          2 Answers
                  accepted






I'm not sure what the real data looks like, so this is just a quick sketch. If you need to fetch the page from a website, add import requests, set url = 'yourwebpagehere' and page = requests.get(url), change the soup line to soup = BeautifulSoup(page.text, 'lxml'), and remove the html variable, which is then no longer needed.



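from bs4 import BeautifulSoup
import csv

html = '<div class="featuresvap _graybox clearfix"><h3><span><i class="icon-inr"></i>1.02 Crores</span><small> @ <i class="icon-inr"></i><b>7864/sq.ft</b> as per carpet area</small></h3>'
soup = BeautifulSoup(html, 'lxml')  # the 'lxml' parser needs the lxml package installed
findSpan = soup.find('span')  # first <span>: the price
findB = soup.find('b')  # first <b>: the rate per sq.ft
# Keep only the text of each tag and drop the '/sq.ft' suffix
values = [findSpan.text, findB.text.replace('/sq.ft', '')]
print(values)

# newline='' keeps the csv module from writing blank rows on Windows
with open('NAMEYOURFILE.csv', 'w', newline='') as f:
    csv_writer = csv.writer(f)
    csv_writer.writerow(["First Column Name", "Second Column Name"])
    csv_writer.writerow(values)  # write the extracted text, not the Tag objects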
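As a possible variation on the snippet above, the lookup can be wrapped in a small function that returns None when a tag is missing, so a page with a different layout does not raise an AttributeError. This is only a sketch: the function name extract_price_and_rate and the SAMPLE_HTML constant are made up for illustration, and the CSS selectors assume the markup looks like the fragment in the question.

```python
from bs4 import BeautifulSoup

# The fragment from the question, with the closing </div> added
SAMPLE_HTML = (
    '<div class="featuresvap _graybox clearfix"><h3><span><i class="icon-inr"></i>'
    '1.02 Crores</span><small> @ <i class="icon-inr"></i><b>7864/sq.ft</b>'
    ' as per carpet area</small></h3></div>'
)


def extract_price_and_rate(html):
    """Return (price, rate) as strings; a value is None if its tag is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    span = soup.select_one('.featuresvap h3 span')
    b = soup.select_one('.featuresvap h3 b')
    price = span.get_text(strip=True) if span else None
    rate = b.get_text(strip=True).replace('/sq.ft', '') if b else None
    return price, rate


print(extract_price_and_rate(SAMPLE_HTML))  # ('1.02 Crores', '7864')
```

The built-in 'html.parser' is used here only so that the sketch has no dependency beyond bs4; 'lxml' should work the same way for this fragment.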






                  answered Nov 8 at 20:37









                  Kamikaze_goldfish

The comments in the code explain each step.



from bs4 import BeautifulSoup

# data for first column
firstCol = []
# data for second column
secondCol = []

# listURL is assumed to be a list of page URLs defined elsewhere
for url in listURL:
    html = '.....'  # downloaded html
    soup = BeautifulSoup(html, 'html.parser')

    # select_one() takes a CSS selector and returns only the first match
    fCol = soup.select_one('.featuresvap h3 span')
    # remove: <i class="icon-inr"></i>
    fCol.find("i").extract()
    sCol = soup.select_one('.featuresvap h3 b')
    firstCol.append(fCol.text)
    secondCol.append(sCol.text.replace('/sq.ft', ''))

with open('results.csv', 'w') as fl:
    # one line per list: prices on the first line, rates on the second
    csvContent = ','.join(firstCol) + '\n' + ','.join(secondCol)
    fl.write(csvContent)

''' sample results
1.02 Crores | 2.34 Crores
7864 | 2475

'''
print('finish')
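The code above puts all prices on one line and all rates on the next. If the goal is one listing per row with the two values in separate columns, as the question asks, the two lists can be paired with zip and written with the csv module. A minimal sketch, assuming the two lists have the same length; the function name write_columns and the header names are hypothetical:

```python
import csv
import io


def write_columns(first_col, second_col, out):
    """Write one CSV row per listing: price in column 1, rate in column 2."""
    writer = csv.writer(out)
    writer.writerow(['price', 'rate_per_sqft'])  # hypothetical header names
    # zip pairs the i-th price with the i-th rate
    writer.writerows(zip(first_col, second_col))


# An in-memory buffer stands in for a real file in this sketch
buffer = io.StringIO()
write_columns(['1.02 Crores', '2.34 Crores'], ['7864', '2475'], buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a file object opened with open('results.csv', 'w', newline='') as the out argument.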






                          answered Nov 9 at 14:18









                          ewwink
