Create JSON with XML file using BeautifulSoup
I am using Jupyter Notebook with Python 3. My task is to extract data from an XML file and convert it to JSON (and possibly save the JSON to an output.dat file). I am using BeautifulSoup to navigate through the nodes. I have the following data:
<?xml version='1.0' encoding='UTF-8'?>
<Terms>
  <Term>
    <Title>.177 (4.5mm) Airgun</Title>
    <Description>The standard airgun calibre for international target
shooting.</Description>
    <RelatedTerms>
      <Term>
        <Title>Shooting sport equipment</Title>
        <Relationship>Narrower Term</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
  <Term>
    <Title>1 Kilometre Time Trial</Title>
    <Description>test2</Description>
    <RelatedTerms>
      <Term>
        <Title>1 Kilometre TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>One km Time Trial</Title>
        <Relationship>Used For</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
This is the JSON output I am expecting:
{
    "thesaurus": [
        {
            "Description": "The standard airgun calibre for international target shooting.",
            "RelatedTerms": [
                {
                    "Relationship": "Narrower Term",
                    "Title": "Shooting sport equipment"
                }
            ],
            "Title": ".177 (4.5mm) Airgun"
        },
        {
            "Description": "test2",
            "RelatedTerms": [
                {
                    "Relationship": "Used For",
                    "Title": "1 Kilometre TT"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km Time Trial"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km Time Trial"
                },
                {
                    "Relationship": "Used For",
                    "Title": "1km TT"
                },
                {
                    "Relationship": "Used For",
                    "Title": "One km Time Trial"
                }
            ],
            "Title": "1 Kilometre Time Trial"
        },
I am navigating through the tags so that I can build dictionaries like the ones in the expected output. Since I am new to scraping, this is quite frustrating.
I was able to extract the "Description" tag with the following code:
from bs4 import BeautifulSoup

xml_file = './xml.xml'
with open(xml_file, encoding="utf8") as f:
    btree = BeautifulSoup(f, "xml")
elements = btree.find_all('Description')
descriptionTag = []
for element in elements:
    descriptionTag.append(element.text)
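The flat `find_all('Description')` search works here because only the top-level `Term` elements carry a `Description`. The same flat search on `Title` would also return the titles of the nested related terms, so the lists would no longer line up one entry per top-level term. A minimal illustration of the difference, using the standard-library `xml.etree.ElementTree` in place of BeautifulSoup and a trimmed copy of the sample data (both are assumptions made so the example runs on its own):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample data (the second term keeps only two related terms).
SAMPLE = """<Terms>
  <Term>
    <Title>.177 (4.5mm) Airgun</Title>
    <Description>The standard airgun calibre for international target
shooting.</Description>
    <RelatedTerms>
      <Term>
        <Title>Shooting sport equipment</Title>
        <Relationship>Narrower Term</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
  <Term>
    <Title>1 Kilometre Time Trial</Title>
    <Description>test2</Description>
    <RelatedTerms>
      <Term>
        <Title>1 Kilometre TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
</Terms>"""

root = ET.fromstring(SAMPLE)  # root is the <Terms> element

# Flat search: every <Title> anywhere below <Terms>, nested ones included.
all_titles = [t.text for t in root.iter("Title")]

# Direct-child path: only the <Title> of each top-level <Term>.
top_titles = [t.text for t in root.findall("Term/Title")]

print(len(all_titles))  # 5
print(top_titles)       # ['.177 (4.5mm) Airgun', '1 Kilometre Time Trial']
```

In BeautifulSoup, `find_all('Title')` behaves like the flat search; passing `recursive=False` or using a CSS child selector such as `select('Terms > Term > Title')` restricts the match to direct children.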
I am not sure how to do the same for the "RelatedTerms" tag, i.e. how to build a list of dictionaries from the Term elements stored inside it.
Ideally, I would parse all the tags into a DataFrame and then convert that to JSON.
Can someone help me work out how to extract the information from the "RelatedTerms" tag?
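On the DataFrame idea: a DataFrame is easiest to build from flat rows, one per related term, with the parent term's fields repeated on each row. A sketch of that flattening step, assuming the nested dictionaries have already been extracted (the values are a trimmed copy of the expected output; `flatten` and the `RelatedTitle` column name are hypothetical). Building the DataFrame itself would then be `pandas.DataFrame(rows)`, which is left out so the snippet needs only the standard library.

```python
# Nested records in the shape of the expected output (trimmed copy of its values).
thesaurus = [
    {
        "Title": ".177 (4.5mm) Airgun",
        "Description": "The standard airgun calibre for international target shooting.",
        "RelatedTerms": [
            {"Title": "Shooting sport equipment", "Relationship": "Narrower Term"},
        ],
    },
    {
        "Title": "1 Kilometre Time Trial",
        "Description": "test2",
        "RelatedTerms": [
            {"Title": "1 Kilometre TT", "Relationship": "Used For"},
            {"Title": "1km TT", "Relationship": "Used For"},
        ],
    },
]


def flatten(records):
    """Return one flat row per related term (hypothetical helper).

    Terms without any RelatedTerms produce no rows.
    """
    rows = []
    for rec in records:
        for rel in rec.get("RelatedTerms", []):
            rows.append({
                "Title": rec["Title"],
                "Description": rec["Description"],
                "RelatedTitle": rel["Title"],
                "Relationship": rel["Relationship"],
            })
    return rows


rows = flatten(thesaurus)
print(len(rows))  # 3
print(rows[0])
```

The flat shape trades some duplication (the parent Title and Description repeat) for a table where every row has the same columns, which is what a DataFrame expects.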
json xml beautifulsoup
asked Nov 10 at 9:35
Timetraveller
1 Answer
Accepted answer
To extract RelatedTerms, first select the top-level Term elements with btree.select('Terms > Term'). Then loop over them and, for each term, select the Term elements inside its RelatedTerms with term.select('RelatedTerms > Term'):
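The same two-level walk (top-level `Term` elements first, then the `Term` elements inside each `RelatedTerms`) can also be sketched without BeautifulSoup, using `xml.etree.ElementTree` path expressions as a stand-in for the CSS selectors. This is an alternative under that assumption, not the answer's code. It also collapses the line break inside the first description so the text matches the expected output ("...target shooting."), and passes `sort_keys=True` to reproduce the alphabetical key order shown there. The helper names `clean` and `build_thesaurus` and the trimmed sample are illustrative.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed sample (assumption): the second term keeps only two related terms.
SAMPLE = """<Terms>
  <Term>
    <Title>.177 (4.5mm) Airgun</Title>
    <Description>The standard airgun calibre for international target
shooting.</Description>
    <RelatedTerms>
      <Term>
        <Title>Shooting sport equipment</Title>
        <Relationship>Narrower Term</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
  <Term>
    <Title>1 Kilometre Time Trial</Title>
    <Description>test2</Description>
    <RelatedTerms>
      <Term>
        <Title>1 Kilometre TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
      <Term>
        <Title>1km TT</Title>
        <Relationship>Used For</Relationship>
      </Term>
    </RelatedTerms>
  </Term>
</Terms>"""


def clean(text):
    # Collapse runs of whitespace (including line breaks) into single spaces.
    return " ".join((text or "").split())


def build_thesaurus(root):
    thesaurus = []
    # "Term" matches only direct children of <Terms>, like 'Terms > Term'.
    for term in root.findall("Term"):
        detail = {
            "Description": clean(term.findtext("Description")),
            "Title": clean(term.findtext("Title")),
        }
        # Equivalent of term.select('RelatedTerms > Term').
        related = term.findall("RelatedTerms/Term")
        if related:
            detail["RelatedTerms"] = [
                {
                    "Title": clean(r.findtext("Title")),
                    "Relationship": clean(r.findtext("Relationship")),
                }
                for r in related
            ]
        thesaurus.append(detail)
    return {"thesaurus": thesaurus}


result = build_thesaurus(ET.fromstring(SAMPLE))
print(json.dumps(result, indent=4, sort_keys=True))
```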
import json
from bs4 import BeautifulSoup

xml_file = './xml.xml'
with open(xml_file, encoding='utf8') as f:
    btree = BeautifulSoup(f, "xml")

# Top-level Term elements only (direct children of Terms)
Terms = btree.select('Terms > Term')
jsonObj = {"thesaurus": []}
for term in Terms:
    termDetail = {
        "Description": term.find('Description').text,
        "Title": term.find('Title').text
    }
    # Term elements directly inside this term's RelatedTerms
    RelatedTerms = term.select('RelatedTerms > Term')
    if RelatedTerms:
        termDetail["RelatedTerms"] = []
        for rterm in RelatedTerms:
            termDetail["RelatedTerms"].append({
                "Title": rterm.find('Title').text,
                "Relationship": rterm.find('Relationship').text
            })
    jsonObj["thesaurus"].append(termDetail)

# Python 3: print is a function
print(json.dumps(jsonObj, indent=4))
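The question also mentions saving the JSON to an `output.dat` file. A minimal sketch with the standard `json` module, assuming `jsonObj` is the dictionary built by the answer's loop (a small stand-in dictionary is used here so the snippet runs on its own):

```python
import json

# Stand-in for the dictionary built by the answer's loop (assumption).
jsonObj = {
    "thesaurus": [
        {
            "Description": "test2",
            "Title": "1 Kilometre Time Trial",
            "RelatedTerms": [
                {"Title": "1km TT", "Relationship": "Used For"},
            ],
        }
    ]
}

# json.dump writes straight to the file object; sort_keys gives the
# alphabetical key order used in the expected output, and ensure_ascii=False
# keeps any non-ASCII characters readable in the file.
with open("output.dat", "w", encoding="utf-8") as f:
    json.dump(jsonObj, f, indent=4, sort_keys=True, ensure_ascii=False)

# Read the file back to confirm the round trip.
with open("output.dat", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == jsonObj)  # True
```

The `.dat` extension has no special meaning to `json`; the file is ordinary JSON text.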
edited Nov 10 at 12:29
answered Nov 10 at 12:23
ewwink