Searching XML tables using LXML & XPath

I am trying to find a better way to navigate an XML file in Python. I currently have something like this:

    import xml.etree.ElementTree as ET

    tree = ET.parse('file.xml')
    root = tree.getroot()

    TagSlug = '{http://www.aec.gov.au/xml/schema/mediafeed}'

    for child in root:
        DID = child.find('PollingDistrictIdentifier')
        for grandchild in child.getchildren():
            Name = grandchild.find('TagSlug+Name')
            for grandgrandchild in grandchild.getchildren():
                for grandgrandgrandchild in grandgrandchild.getchildren():
                    PP = grandgrandchild.find(TagSlug+'PollingPlaceIdentifier')
                    print(PP.attrib['Id'], PP.attrib['Name'], DID.attrib['Id'], Name.text)

The XML is structured similarly to the sample below.

    <PollingDistrictList Created="2018-10-30T12:01:21.043" xmlns="http://www.aec.gov.au/xml/schema/mediafeed" xmlns:eml="urn:oasis:names:tc:evs:schema:eml" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xal="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0" xmlns:xnl="urn:oasis:names:tc:ciq:xsdschema:xNL:2.0" xmlns:ts="urn:oasis:names:tc:evs:schema:eml:ts" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
        <TransactionId>4C59F7F3-2405-4443-8A1F-3F2BEF6E07C4</TransactionId>
        <eml:EventIdentifier Id="12122">
            <eml:EventName>State Election 2018</eml:EventName>
        </eml:EventIdentifier>
        <PollingDistrict>
            <PollingDistrictIdentifier Id="10153">
                <Name>Albert Park District</Name>
            </PollingDistrictIdentifier>
            <PollingPlaces>
                <PollingPlace>
                    <PollingPlaceIdentifier Id="13133" Name="Bridport" />
                    <WheelchairAccess>None</WheelchairAccess>
                </PollingPlace>
                <PollingPlace>
                    <PollingPlaceIdentifier Id="13987" Name="Kerferd South" />
                    <WheelchairAccess>None</WheelchairAccess>
                </PollingPlace>
                <PollingPlace>
                    <PollingPlaceIdentifier Id="13504" Name="Middle Park" />
                    <WheelchairAccess>None</WheelchairAccess>
                </PollingPlace>
            </PollingPlaces>
        </PollingDistrict>
        <PollingDistrict>
            <PollingDistrictIdentifier = ....

et cetera

I am trying to print a list of polling place IDs, polling place names, district IDs, and district names, but I am struggling with the final part. I have tried several approaches; these are two of them:

    a = tree.findall('./PollingDistrictList/PollingDistrict/PollingPlaces/PollingPlace')
    print(a.text)

    a = tree.findall('.//PollingPlace')
    print(a.text)

I end up getting errors that 'NoneType' or 'list' has no attribute 'text', and if I remove the '.text' I get nothing. I am looking for a better way to navigate the XML file than these nested 'child in root' loops.

Ideally, I'd be getting:

[PP1Id], [PP1Name], [District1Id], [District1Name]

[PP2Id], [PP2Name], [District1Id], [District1Name]

...

[PP1Id], [PP1Name], [District2Id], [District2Name]

etc

Any advice would be appreciated.

asked Nov 9 at 13:06

notanothercliche


python xml xpath


1 Answer

accepted

Fixed this with the following. Annotated so you can see what it's doing.

    import os  # Needed to change the working directory
    import xml.etree.ElementTree as ET  # Parses the XML

    import lxml  # Not used below
    from lxml import etree  # Not used below
    import requests  # Will be used for the VEC site; not used at this stage

    os.chdir('C:/XMLDataLocation')  # Set the working directory

    tree = ET.parse('State2018MediaFilePollingLocations.xml')  # Load the file
    root = tree.getroot()

    # Namespace in {uri} form; every tag used below is qualified with it
    TagSlug = '{http://www.aec.gov.au/xml/schema/mediafeed}'

    # Go from level 0 (root) to level 1 (PollingDistrict)
    PollingDistricts = root.findall(TagSlug + 'PollingDistrict')

    for PollingDistrict in PollingDistricts:  # Loop, otherwise only the first district would display
        DistrictID = PollingDistrict.find(TagSlug + 'PollingDistrictIdentifier')  # District identifier element
        Name = DistrictID.find(TagSlug + 'Name')  # Electorate name (a child of DistrictID)
        PollingPlaces = PollingDistrict.find(TagSlug + 'PollingPlaces')  # Container, used only for navigation
        for PollingPlace in PollingPlaces:  # Loop, otherwise only the first booth in each electorate would print
            PPID = PollingPlace.find(TagSlug + 'PollingPlaceIdentifier')  # Holds both the booth ID and name
            print(PPID.attrib['Id'], PPID.attrib['Name'], DistrictID.attrib['Id'], Name.text)  # Print one row
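
A more compact variant is sketched below, as an illustration rather than the original fix. It may also explain the errors in the question: findall returns a list, which has no .text, and a path written without the namespace matches nothing, because every element sits in the document's default namespace. The sketch assumes the standard-library ElementTree with a prefix-to-URI mapping. The prefix "m", the function name polling_place_rows, and the trimmed inline sample are all made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed inline sample modelled on the question's XML (an assumption for this
# sketch; a real file would be loaded with ET.parse instead).
SAMPLE = """\
<PollingDistrictList xmlns="http://www.aec.gov.au/xml/schema/mediafeed">
  <PollingDistrict>
    <PollingDistrictIdentifier Id="10153">
      <Name>Albert Park District</Name>
    </PollingDistrictIdentifier>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13987" Name="Kerferd South" />
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
</PollingDistrictList>
"""

# "m" is an arbitrary prefix chosen for this sketch; only the URI has to match.
NS = {"m": "http://www.aec.gov.au/xml/schema/mediafeed"}


def polling_place_rows(root):
    """Return (place id, place name, district id, district name) tuples."""
    rows = []
    for district in root.findall("m:PollingDistrict", NS):
        ident = district.find("m:PollingDistrictIdentifier", NS)
        district_name = ident.findtext("m:Name", namespaces=NS)
        # ".//" searches all descendants, so PollingPlaces and PollingPlace
        # do not have to be walked one level at a time.
        for place in district.findall(".//m:PollingPlaceIdentifier", NS):
            rows.append((place.get("Id"), place.get("Name"),
                         ident.get("Id"), district_name))
    return rows


root = ET.fromstring(SAMPLE)
for row in polling_place_rows(root):
    print(*row)
```

Each result of findall is an element, so attributes and text are read per element inside the loop, not from the list itself.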

answered Nov 10 at 6:41

notanothercliche
