Python - Web scraping - URL Protected by Kerberos HTTP SPNEGO

Hi guys,

I have a script that tests whether a URL is accessible. The script uses the requests and requests_kerberos modules.

For one specific website, which gives me some details about my Hadoop cluster, it always returns the same message:
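For reference, a minimal sketch of what such a check can look like. It is not the original script: the function names are placeholders, and it assumes a valid Kerberos ticket already exists in the credential cache (for example from `kinit`), since requests_kerberos does not take a password.

```python
def describe_status(status_code):
    """Map an HTTP status code to a short accessibility verdict."""
    if status_code == 401:
        return "unauthorized: SPNEGO negotiation did not succeed"
    if 200 <= status_code < 400:
        return "accessible"
    return "not accessible"


def check_url(url, timeout=10):
    """Return the HTTP status code of a GET request made with SPNEGO auth."""
    # Imported here so describe_status() can be used without these libraries.
    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    # The auth handler answers the server's "WWW-Authenticate: Negotiate"
    # challenge using the ticket in the current credential cache.
    auth = HTTPKerberosAuth(mutual_authentication=OPTIONAL)
    response = requests.get(url, auth=auth, timeout=timeout)
    return response.status_code
```

Calling `describe_status(check_url(url))` with the cluster URL then gives a one-line verdict; a 401 at this point usually means there was no usable ticket or the service principal did not match the host name.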

HTTP Status: 401 



401

I know that I have to make some configurations since this URL is protected by Kerberos.

For example, to get access with Firefox I had to follow the steps in the Cloudera documentation:

https://www.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_sg_browser_access_kerberos_protected_url.html

How can I do the same from a Python script? Is there a module that lets me get past this authentication?

Thanks!

asked Nov 9 at 16:21

Pedro Alves

Google for python SPNego and have fun...
– Samson Scharfrichter
Nov 9 at 18:00

Save yourself a lot of heartache and use curl instead.
– tk421
Nov 10 at 6:09

With curl I get 'requests.exceptions.SSLError: HTTPSConnectionPool(host='HOSTNAME', port=8090): Max retries exceeded with url: /cluster/nodes (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1045)')))' as an error message.
– Pedro Alves
Nov 12 at 14:30
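
That traceback is raised by requests during TLS certificate verification, before any Kerberos negotiation takes place. A minimal sketch of one way around it, assuming the cluster's CA certificate has been exported to a PEM file; the function names and any path passed in are hypothetical:

```python
def tls_verify_setting(ca_bundle_path=None):
    """Return the value to pass as the `verify` argument of requests.

    A path makes requests trust that CA bundle; True keeps the default
    system trust store.
    """
    return ca_bundle_path if ca_bundle_path else True


def fetch_status(url, ca_bundle_path=None, timeout=10):
    """GET the URL with SPNEGO auth, verifying TLS against the given bundle."""
    # Imported here so tls_verify_setting() works without these libraries.
    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    response = requests.get(
        url,
        auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
        verify=tls_verify_setting(ca_bundle_path),
        timeout=timeout,
    )
    return response.status_code
```

Pointing `verify` at the CA file keeps certificate checking on, which is generally preferable to `verify=False` for a self-signed chain.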

python hadoop web-scraping kerberos cloudera