Searching XML tables using LXML & XPath























I'm trying to find a better way to navigate an XML file in Python. Currently I have something that looks like this:



import xml.etree.ElementTree as ET
tree = ET.parse('file.xml')
root = tree.getroot()
TagSlug = '{http://www.aec.gov.au/xml/schema/mediafeed}'
for child in root:
    DID = child.find('PollingDistrictIdentifier')
    for grandchild in child.getchildren():
        Name = grandchild.find('TagSlug+Name')
        for grandgrandchild in grandchild.getchildren():
            for grandgrandgrandchild in grandgrandchild.getchildren():
                PP = grandgrandchild.find(TagSlug+'PollingPlaceIdentifier')
                print(PP.attrib['Id'], PP.attrib['Name'], DID.attrib['Id'], Name.text)


The XML is structured similarly to the sample below.



<PollingDistrictList Created="2018-10-30T12:01:21.043" xmlns="http://www.aec.gov.au/xml/schema/mediafeed" xmlns:eml="urn:oasis:names:tc:evs:schema:eml" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xal="urn:oasis:names:tc:ciq:xsdschema:xAL:2.0" xmlns:xnl="urn:oasis:names:tc:ciq:xsdschema:xNL:2.0" xmlns:ts="urn:oasis:names:tc:evs:schema:eml:ts" xmlns:xs="http://www.w3.org/2001/XMLSchema-instance">
  <TransactionId>4C59F7F3-2405-4443-8A1F-3F2BEF6E07C4</TransactionId>
  <eml:EventIdentifier Id="12122">
    <eml:EventName>State Election 2018</eml:EventName>
  </eml:EventIdentifier>
  <PollingDistrict>
    <PollingDistrictIdentifier Id="10153">
      <Name>Albert Park District</Name>
    </PollingDistrictIdentifier>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
        <WheelchairAccess>None</WheelchairAccess>
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13987" Name="Kerferd South" />
        <WheelchairAccess>None</WheelchairAccess>
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13504" Name="Middle Park" />
        <WheelchairAccess>None</WheelchairAccess>
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
  <PollingDistrict>
    <PollingDistrictIdentifier = ....
et cetera


I am trying to print a list of polling place IDs, polling place names, district IDs, and district names, but I am struggling with the final part. I have tried several different things; these are some of the methods:



a = tree.findall('./PollingDistrictList/PollingDistrict/PollingPlaces/PollingPlace')
print(a.text)


a = tree.findall('.//PollingPlace')
print(a.text)


I end up getting errors that 'NoneType' or 'list' has no attribute 'text', and if I remove the '.text', I get nothing. I am looking for a better way to navigate the XML file than this nested 'child in root' looping.
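
For reference, here is a minimal sketch that reproduces both symptoms on a cut-down sample. The sample string and the variable names are made up for illustration; only the namespace URI and element names come from the feed above. It suggests the likely cause: findall() always returns a list (so it has no .text), and a tag written without its namespace matches nothing when the document declares a default namespace.

```python
import xml.etree.ElementTree as ET

# Cut-down, illustrative sample using the same default namespace as the feed.
SAMPLE = """<PollingDistrictList xmlns="http://www.aec.gov.au/xml/schema/mediafeed">
  <PollingDistrict>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
</PollingDistrictList>"""

root = ET.fromstring(SAMPLE)
NS = '{http://www.aec.gov.au/xml/schema/mediafeed}'

# Without the namespace nothing matches:
# findall() gives an empty list and find() gives None.
unqualified_all = root.findall('.//PollingPlace')
unqualified_one = root.find('.//PollingPlace')

# With the namespace-qualified tag the elements are found.
qualified_all = root.findall('.//' + NS + 'PollingPlace')

# findall() returns a list, so attributes are read element by element.
names = [pp.find(NS + 'PollingPlaceIdentifier').attrib['Name']
         for pp in qualified_all]

print(unqualified_all, unqualified_one, names)
```

Calling .text on unqualified_all or unqualified_one would raise the two AttributeErrors described above.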



Ideally, I'd be getting:



[PP1Id], [PP1Name], [District1Id], [District1Name]
[PP2Id], [PP2Name], [District1Id], [District1Name]
...
[PP1Id], [PP1Name], [District2Id], [District2Name]
etc


Any advice would be appreciated.










      python xml xpath
















      asked Nov 9 at 13:06









      notanothercliche

      83




























          1 Answer
          accepted










          I fixed this with the following code, annotated so you can see what each step is doing.



          import os  # Required to change directory
          import xml.etree.ElementTree as ET  # Parses the XML

          os.chdir('C:/XMLDataLocation')  # Set the working directory
          tree = ET.parse('State2018MediaFilePollingLocations.xml')  # Load the file
          root = tree.getroot()

          # The namespace is prepended to every node name, so keep it in one place
          TagSlug = '{http://www.aec.gov.au/xml/schema/mediafeed}'

          # Go from level 0 (root) to level 1 (PollingDistrict)
          PollingDistricts = root.findall(TagSlug+'PollingDistrict')
          for PollingDistrict in PollingDistricts:  # Loop, otherwise only the first district would display
              DistrictID = PollingDistrict.find(TagSlug+'PollingDistrictIdentifier')  # Finds the district ID
              Name = DistrictID.find(TagSlug+'Name')  # Finds the name of each electorate (a child of DistrictID)
              PollingPlaces = PollingDistrict.find(TagSlug+'PollingPlaces')  # Only needed for navigating the XML file
              for PollingPlace in PollingPlaces.findall(TagSlug+'PollingPlace'):  # Loop, otherwise only the first booth in each electorate would print
                  PPID = PollingPlace.find(TagSlug+'PollingPlaceIdentifier')  # Holds both the booth ID and name
                  print(PPID.attrib['Id'], PPID.attrib['Name'], DistrictID.attrib['Id'], Name.text)  # Prints one row per booth
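
An alternative that avoids concatenating the namespace onto every tag is to pass a prefix map to find()/findall(). The sketch below is a possible variant, not a tested replacement for the code above: the prefix 'mf', the function name polling_place_rows, and the inline sample are all made up for illustration, and it assumes every district has a PollingDistrictIdentifier with a Name child, as in the sample from the question.

```python
import xml.etree.ElementTree as ET

# 'mf' is a hypothetical prefix picked for this sketch; the URI is the feed's default namespace.
NS = {'mf': 'http://www.aec.gov.au/xml/schema/mediafeed'}


def polling_place_rows(root):
    """Return (place id, place name, district id, district name) tuples."""
    rows = []
    for district in root.findall('mf:PollingDistrict', NS):
        ident = district.find('mf:PollingDistrictIdentifier', NS)
        district_name = ident.findtext('mf:Name', namespaces=NS)
        # One path walks PollingPlaces -> PollingPlace -> PollingPlaceIdentifier.
        path = 'mf:PollingPlaces/mf:PollingPlace/mf:PollingPlaceIdentifier'
        for pp in district.findall(path, NS):
            rows.append((pp.get('Id'), pp.get('Name'),
                         ident.get('Id'), district_name))
    return rows


# Illustrative sample cut down from the XML in the question.
SAMPLE = """<PollingDistrictList xmlns="http://www.aec.gov.au/xml/schema/mediafeed">
  <PollingDistrict>
    <PollingDistrictIdentifier Id="10153">
      <Name>Albert Park District</Name>
    </PollingDistrictIdentifier>
    <PollingPlaces>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13133" Name="Bridport" />
      </PollingPlace>
      <PollingPlace>
        <PollingPlaceIdentifier Id="13987" Name="Kerferd South" />
      </PollingPlace>
    </PollingPlaces>
  </PollingDistrict>
</PollingDistrictList>"""

rows = polling_place_rows(ET.fromstring(SAMPLE))
for row in rows:
    print(*row)
```

For the real file, the same function could be called with ET.parse(...).getroot() in place of ET.fromstring(SAMPLE).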





                answered Nov 10 at 6:41









                notanothercliche

                83


































                     
