Python - Web scraping - URL Protected by Kerberos HTTP SPNEGO























Hi guys,



I have a script that tests whether a URL is accessible. The script uses the requests and requests_kerberos modules.



For one specific website, which gives me some details about my Hadoop cluster, it always returns the same message:



HTTP Status: 401 

401


I know that I have to do some configuration, since this URL is protected by Kerberos.



For example, to get access using Firefox I had to follow these steps in the Cloudera documentation:



https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_sg_browser_access_kerberos_protected_url.html


How can I do the same from a Python script? Is there a module that lets me get past this authentication issue?



Thanks!
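
For reference, here is a minimal sketch of how the two modules named above are usually combined. It assumes a valid Kerberos ticket already exists for the current user (for example obtained with kinit) and that requests and requests_kerberos are installed. The helper names kerberos_auth, url_status and main are made up for illustration, and the URL is a placeholder, so treat this as a starting point rather than a confirmed fix for this cluster.

```python
def kerberos_auth():
    """Build a SPNEGO auth handler; needs requests_kerberos and a valid ticket."""
    # Imported lazily so this file can be loaded without the package installed.
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    # OPTIONAL: attempt mutual authentication, but do not fail the request
    # if the server does not authenticate itself back.
    return HTTPKerberosAuth(mutual_authentication=OPTIONAL)


def url_status(session, url, auth=None, timeout=10):
    """Return the HTTP status code for `url`, or None if the request fails."""
    try:
        response = session.get(url, auth=auth, timeout=timeout)
    except Exception as exc:  # broad on purpose: this is only a reachability probe
        print(f"Request failed: {exc}")
        return None
    return response.status_code


def main():
    """Hypothetical entry point: probe one Kerberos-protected URL."""
    import requests

    # Placeholder URL; replace HOSTNAME and the path with the real endpoint.
    url = "https://HOSTNAME:8090/cluster/nodes"
    with requests.Session() as session:
        print("HTTP Status:", url_status(session, url, auth=kerberos_auth()))
```

Calling main() after kinit should print the status code. If it still prints 401, the ticket cache or the service principal for the host is the next thing to check.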




























  • Google for python SPNego and have fun...
    – Samson Scharfrichter
    Nov 9 at 18:00










  • Save yourself a lot of heartache and use curl instead.
    – tk421
    Nov 10 at 6:09










  • With curl I get 'requests.exceptions.SSLError: HTTPSConnectionPool(host='HOSTNAME', port=8090): Max retries exceeded with url: /cluster/nodes (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1045)')))' as an error message
    – Pedro Alves
    Nov 12 at 14:30
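
The traceback quoted in the last comment comes from requests and is a TLS problem, separate from the Kerberos 401: the server's certificate chain contains a certificate that the client does not trust. A possible way around it, sketched below, is to point the verify argument of requests at the CA certificate used by the cluster. The helper names verify_setting and fetch_status and the ca_bundle parameter are hypothetical, and the sketch assumes you can obtain that CA certificate as a PEM file.

```python
import os


def verify_setting(ca_bundle=None):
    """Choose the value passed to requests as `verify`.

    Returns True (default verification) when no bundle is given,
    otherwise the bundle path after checking that it exists.
    """
    if ca_bundle is None:
        return True
    if not os.path.exists(ca_bundle):
        raise FileNotFoundError(f"CA bundle not found: {ca_bundle}")
    return ca_bundle


def fetch_status(session, url, ca_bundle=None, auth=None, timeout=10):
    """Return the HTTP status code, verifying TLS against `ca_bundle` if given."""
    response = session.get(
        url,
        auth=auth,
        verify=verify_setting(ca_bundle),
        timeout=timeout,
    )
    return response.status_code
```

With a real requests.Session this would be called as fetch_status(session, url, ca_bundle="/path/to/cluster-ca.pem", auth=...), where the path is again a placeholder. Passing verify=False would also silence the error, but it turns certificate checking off entirely, so the CA file is the safer route.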















python hadoop web-scraping kerberos cloudera
















asked Nov 9 at 16:21









Pedro Alves














