Extracting Data from HTML Span using Beautiful Soup

I want to extract "1.02 Crores" and "7864" from the HTML below and save them in different columns of a CSV file.

Code:

<div class="featuresvap _graybox clearfix"><h3><span><i class="icon-inr"></i>1.02 Crores</span><small> @ <i class="icon-inr"></i><b>7864/sq.ft</b> as per carpet area</small></h3>


edited Nov 8 at 21:59

Ashfaque Marfani

4812

asked Nov 8 at 17:47

Nick

python-3.x html5 web-scraping beautifulsoup


2 Answers

Accepted answer

I'm not sure what your actual data looks like, but here is something I put together quickly. If you need to fetch the page from a website, import requests, add url = 'yourwebpagehere' and page = requests.get(url), change the soup line to soup = BeautifulSoup(page.text, 'lxml'), and remove the html variable, since it is no longer needed.

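from bs4 import BeautifulSoup
import csv

html = '<div class="featuresvap _graybox clearfix"><h3><span><i class="icon-inr"></i>1.02 Crores</span><small> @ <i class="icon-inr"></i><b>7864/sq.ft</b> as per carpet area</small></h3>'

# 'lxml' needs the lxml package installed; 'html.parser' is built in
soup = BeautifulSoup(html, 'lxml')
findSpan = soup.find('span')  # <span> holding the total price
findB = soup.find('b')        # <b> holding the rate per sq.ft

# Take the text of each tag and strip the unit suffix from the rate
price = findSpan.text.strip()
rate = findB.text.replace('/sq.ft', '').strip()
print([price, rate])

# newline='' stops the csv module from writing blank rows on Windows
with open('NAMEYOURFILE.csv', 'w', newline='') as writer:
    csv_writer = csv.writer(writer)
    csv_writer.writerow(["First Column Name", "Second Column Name"])
    # Write the extracted strings, not the Tag objects (those would write raw HTML)
    csv_writer.writerow([price, rate])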
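Stripping '/sq.ft' with str.replace only works while the suffix is exactly that string. A minimal sketch of a more tolerant approach, assuming the value always starts the text, pulls the leading number with a regular expression. The helper name leading_number is hypothetical; you would call it on findB.text in the code above.

```python
import re

# A digit, then optional digits/thousands separators, then an optional decimal part
_NUMBER = re.compile(r"\s*(\d[\d,]*(?:\.\d+)?)")


def leading_number(text: str) -> str:
    """Return the number at the start of text, without thousands separators.

    Hypothetical helper: assumes the value begins the string and raises
    ValueError when no number is found there.
    """
    match = _NUMBER.match(text)
    if match is None:
        raise ValueError(f"no leading number in {text!r}")
    return match.group(1).replace(",", "")


print(leading_number("7864/sq.ft"))   # 7864
print(leading_number("1.02 Crores"))  # 1.02
```

For the price column you probably still want the full "1.02 Crores" text, so this is mainly useful for the rate.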

answered Nov 8 at 20:37

Kamikaze_goldfish

413311


The explanation is in the code comments.

from bs4 import BeautifulSoup

# data for first column
firstCol = []
# data for second column
secondCol = []

# placeholder: fill in the list of page URLs you want to scrape
listURL = []

for url in listURL:
    html = '.....' # downloaded html
    soup = BeautifulSoup(html, 'html.parser')

    # 'select_one' selects with a CSS selector and returns only the first match
    fCol = soup.select_one('.featuresvap h3 span')
    # remove: <i class="icon-inr"></i>
    fCol.find("i").extract()
    sCol = soup.select_one('.featuresvap h3 b')
    firstCol.append(fCol.text)
    secondCol.append(sCol.text.replace('/sq.ft', ''))

with open('results.csv', 'w') as fl:
    # first line: all prices, second line: all rates
    csvContent = ','.join(firstCol) + '\n' + ','.join(secondCol)
    fl.write(csvContent)

''' sample results
1.02 Crores | 2.34 Crores
7864        | 2475
'''

print('finish')
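The question asks for the values in separate columns, whereas the code above writes all prices on one line and all rates on the next, as the sample results show. A minimal sketch of a column-oriented alternative, assuming firstCol and secondCol are built as above: csv.writer plus zip writes one row per scraped page and also quotes any value that contains a comma. The function name write_columns and the header names are hypothetical placeholders.

```python
import csv


def write_columns(path, first_col, second_col):
    """Write two parallel lists as two CSV columns, one row per scraped page."""
    # newline='' lets the csv module control line endings itself
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Price", "Rate per sq.ft"])  # placeholder header names
        # zip pairs the i-th price with the i-th rate and stops at the shorter list
        writer.writerows(zip(first_col, second_col))


write_columns("results.csv", ["1.02 Crores", "2.34 Crores"], ["7864", "2475"])
```

In the scraping script you would call write_columns('results.csv', firstCol, secondCol) in place of the manual join.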

answered Nov 9 at 14:18

ewwink

6,10622233
