Searching XML tables using LXML & XPath
I am trying to find a better way to navigate an XML file in Python. I currently have something like this:
import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
TagSlug = '{http://www.aec.gov.au/xml/schema/mediafeed}'
for child in root:
    DID = child.find('PollingDistrictIdentifier')
    for grandchild in child.getchildren():
        Name = grandchild.find('TagSlug+Name')
        for grandgrandchild in grandchild.getchildren():
            for grandgrandgrandchild in grandgrandchild.getchildren():
                PP = grandgrandchild.find(TagSlug+'PollingPlaceIdentifier')
                print(PP.attrib['Id'], PP.attrib['Name'], DID.attrib['Id'], Name.text)
The XML is structured similarly to the sample below.
<PollingDistrictList Created="2018-10-30T12:01:21.043" xmlns="http://www.aec.gov.au/xml/schema/mediafeed" xmlns:eml="urn:oasis:names:tc:evs:schema:eml" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xal="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0" xmlns:xnl="urn:oasis:names:tc:ciq:xsdschema:xNL:2.0" xmlns:ts="urn:oasis:names:tc:evs:schema:eml:ts" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
  <TransactionId>4C59F7F3-2405-4443-8A1F-3F2BEF6E07C4</TransactionId>
  <eml:EventIdentifier Id="12122">
    <eml:EventName>State Election 2018</eml:EventName>
  </eml:EventIdentifier>
  <PollingDistrict>
    <PollingDistrictIdentifier Id="10153">
      <Name>Albert Park District</Name>
    </PollingDistrictIdentifier>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
        <WheelchairAccess>None</WheelchairAccess>
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13987" Name="Kerferd South" />
        <WheelchairAccess>None</WheelchairAccess>
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13504" Name="Middle Park" />
        <WheelchairAccess>None</WheelchairAccess>
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
  <PollingDistrict>
    <PollingDistrictIdentifier = ....
et cetera
I am trying to print a list of polling place IDs, polling place names, district IDs, and district names, but I am struggling with the final part. I have tried several different things; these are some of the methods:
a = tree.findall('./PollingDistrictList/PollingDistrict/PollingPlaces/PollingPlace')
print(a.text)
a = tree.findall('.//PollingPlace')
print(a.text)
I end up getting errors that 'NoneType' or 'list' has no attribute 'text', and if I remove the '.text', I get nothing. I am looking for a better way to navigate the XML file than this nested 'child in root' looping.
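Both errors can be reproduced on a small inline sample. This is a minimal sketch, assuming a cut-down document with the same default namespace; the `SAMPLE` string and the `NS` mapping (with its arbitrary `mf` prefix) are illustrative names, not part of the original file. It shows that an unqualified tag name matches nothing because every element is in the default namespace, and that `findall` returns a list, so values have to be read from each element rather than from the list. Note also that the root element is `PollingDistrictList` itself, so a path starting with `./PollingDistrictList/` looks for a child of that name and finds none.

```python
import xml.etree.ElementTree as ET

# Hypothetical cut-down sample with the same default namespace as the real feed.
SAMPLE = """<PollingDistrictList xmlns="http://www.aec.gov.au/xml/schema/mediafeed">
  <PollingDistrict>
    <PollingDistrictIdentifier Id="10153">
      <Name>Albert Park District</Name>
    </PollingDistrictIdentifier>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
</PollingDistrictList>"""

root = ET.fromstring(SAMPLE)

# Unqualified names match nothing, because the elements are namespaced.
unqualified = root.findall('.//PollingPlace')  # an empty list
missing = root.find('.//PollingPlace')         # None, so missing.text would raise AttributeError

# Mapping a prefix of our choosing to the namespace URI makes the path match.
NS = {'mf': 'http://www.aec.gov.au/xml/schema/mediafeed'}
places = root.findall('.//mf:PollingPlace', NS)  # a list, even with a single match

# Read attributes element by element rather than from the list itself.
ids = [p.find('mf:PollingPlaceIdentifier', NS).get('Id') for p in places]
print(unqualified, missing, ids)
```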
Ideally, I'd be getting:
[PP1Id], [PP1Name], [District1Id], [District1Name]
[PP2Id], [PP2Name], [District1Id], [District1Name]
...
[PP1Id], [PP1Name], [District2Id], [District2Name]
etc
Any advice would be appreciated.
python xml xpath
asked Nov 9 at 13:06
notanothercliche
1 Answer
accepted
I fixed this with the following, annotated so you can see what it is doing.
import os  # Required to change directory
import xml.etree.ElementTree as ET  # Parses the XML

os.chdir('C:/XMLDataLocation')  # Set the working directory
tree = ET.parse('State2018MediaFilePollingLocations.xml')  # Load the file
root = tree.getroot()
TagSlug = '{http://www.aec.gov.au/xml/schema/mediafeed}'  # Namespace that must be prepended to every tag name

# Go from level 0 (root) to level 1 (PollingDistrict)
PollingDistricts = root.findall(TagSlug+'PollingDistrict')
for PollingDistrict in PollingDistricts:  # Loop, otherwise only the first district would display
    DistrictID = PollingDistrict.find(TagSlug+'PollingDistrictIdentifier')  # Finds the district ID
    Name = DistrictID.find(TagSlug+'Name')  # Finds the name of each electorate (a child of DistrictID)
    PollingPlaces = PollingDistrict.find(TagSlug+'PollingPlaces')  # Only for navigating down the tree
    # Loop, otherwise only the first booth in each electorate would print
    for PollingPlace in PollingPlaces.findall(TagSlug+'PollingPlace'):
        PPID = PollingPlace.find(TagSlug+'PollingPlaceIdentifier')  # Holds both the booth ID and name
        print(PPID.attrib['Id'], PPID.attrib['Name'], DistrictID.attrib['Id'], Name.text)
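The same traversal can also be written with a namespace map instead of string concatenation. This is a minimal sketch rather than the code actually used above: the `mf` prefix, the `polling_place_rows` helper, and the inline `SAMPLE` document are assumptions made for illustration. It relies only on the standard library's `findall`, `find`, and `findtext`, each of which accepts a `namespaces` mapping.

```python
import xml.etree.ElementTree as ET

# 'mf' is an arbitrary prefix chosen here; only the URI has to match the document.
NS = {'mf': 'http://www.aec.gov.au/xml/schema/mediafeed'}

def polling_place_rows(root):
    """Return (place_id, place_name, district_id, district_name) tuples."""
    rows = []
    for district in root.findall('mf:PollingDistrict', NS):
        ident = district.find('mf:PollingDistrictIdentifier', NS)
        district_name = ident.findtext('mf:Name', namespaces=NS)
        # One path expression replaces the two inner navigation steps.
        path = 'mf:PollingPlaces/mf:PollingPlace/mf:PollingPlaceIdentifier'
        for place in district.findall(path, NS):
            rows.append((place.get('Id'), place.get('Name'),
                         ident.get('Id'), district_name))
    return rows

# Hypothetical cut-down sample standing in for the real file.
SAMPLE = """<PollingDistrictList xmlns="http://www.aec.gov.au/xml/schema/mediafeed">
  <PollingDistrict>
    <PollingDistrictIdentifier Id="10153">
      <Name>Albert Park District</Name>
    </PollingDistrictIdentifier>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13987" Name="Kerferd South" />
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
</PollingDistrictList>"""

rows = polling_place_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(', '.join(row))
```

For the real file, `ET.parse(...).getroot()` would be passed to the helper in place of `ET.fromstring(SAMPLE)`.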
answered Nov 10 at 6:41
notanothercliche