Matlab - URLREAD2 - User Agent and Cookies
I'm at a loss as to how to get this sample code working, and I was hoping someone would be able to review it and assess which of my assumptions may be wrong.
Problem: I want to use MATLAB to access a webpage that is protected by a login screen. I was able to use wget and it works fine but, as far as I know, wget does not load the AJAX/JavaScript etc. embedded within the page. Therefore, I have turned to using the urlread2 function available on the MATLAB File Exchange. Hereafter, all examples are based on this function.
Example:
I am trying to log in to a financial website, but upon testing other sites I get the same error. Therefore, the example I am going to use is fitbit.com. To mimic the behaviour of a browser, I pass the following combined headers to urlread2 (I have split the code up to make it easier to see what I'm doing):
value = 'https://www.fitbit.com';
header = http_createHeader('Host',value);
value = 'keep-alive';
header2 = http_createHeader('Connection',value);
value = '278';
header3 = http_createHeader('Content-Length',value);
value = 'max-age=0';
header4 = http_createHeader('Cache-Control',value);
value = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
header5 = http_createHeader('Accept',value);
value = 'https://www.fitbit.com';
header6 = http_createHeader('Origin',value);
value = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36';
header7 = http_createHeader('User-Agent',value);
value = 'application/x-www-form-urlencoded';
header8 = http_createHeader('Content-Type',value);
value = 'https://www.fitbit.com/login';
header9 = http_createHeader('Referer',value);
value = 'gzip, deflate';
header10 = http_createHeader('Accept-Encoding',value);
value = 'en-US,en;q=0.8';
header11 = http_createHeader('Accept-Language',value);
% Generate the combined header required by urlread2
combined_header = [header header2 header3 header4 header5 header6 header7 header8 header9 header10 header11];
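The same header list can also be built from one table of name/value pairs, which makes it easier to add or remove a header while testing. This is only a sketch: it assumes that http_createHeader returns a struct with the fields name and value, so the struct array is created directly rather than through the File Exchange function. If that assumption does not hold for your copy of urlread2, call http_createHeader on each row instead.

```matlab
% Header names and values copied from the listing above, one pair per row.
pairs = {
    'Host',            'https://www.fitbit.com'
    'Connection',      'keep-alive'
    'Content-Length',  '278'
    'Cache-Control',   'max-age=0'
    'Accept',          'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
    'Origin',          'https://www.fitbit.com'
    'User-Agent',      'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'
    'Content-Type',    'application/x-www-form-urlencoded'
    'Referer',         'https://www.fitbit.com/login'
    'Accept-Encoding', 'gzip, deflate'
    'Accept-Language', 'en-US,en;q=0.8'
    };

% ASSUMPTION: urlread2 headers are structs with fields 'name' and 'value'.
% Passing 1-by-N cell arrays to struct() yields a 1-by-N struct array.
combined_header = struct('name', pairs(:,1)', 'value', pairs(:,2)');

% Print the headers to check what will be sent.
for k = 1:numel(combined_header)
    fprintf('%s: %s\n', combined_header(k).name, combined_header(k).value);
end
```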
With the header information defined, I generate the query string required (this is a POST operation):
querystring = 'email=myemail&password=mypassword&login=log+in';
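A hand-written query string only works while the values contain no reserved characters. As a minimal sketch, the string can be assembled with MATLAB's built-in urlencode so that characters such as @ in a real email address are percent-encoded. The field names and placeholder values are the ones from the line above; the params table is just an illustrative layout.

```matlab
% Form fields as name/value rows (placeholder values from the question).
params = {
    'email',    'myemail'
    'password', 'mypassword'
    'login',    'log in'
    };

% Encode each name and value, then join the pairs with '&'.
encoded = cell(1, size(params, 1));
for k = 1:size(params, 1)
    encoded{k} = [urlencode(params{k,1}) '=' urlencode(params{k,2})];
end
querystring = strjoin(encoded, '&');

disp(querystring)   % email=myemail&password=mypassword&login=log+in
```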
Finally, I bring in the urlread2 function:
[output,extras] = urlread2('https://www.fitbit.com/login','POST',querystring,combined_header);
The following response is embedded within the returned HTML:
'The owner of this website (www.fitbit.com) has banned your access based on your browser''s signature (2659bb18cf10354e-ua21).'
Possible problem 1:
It may be that I'm passing in the headers incorrectly, because when I mimic these headers via Firefox the page works correctly. Any advice on this would be appreciated.
Possible problem 2:
I think the problem may be down to cookies, with urlread2 (and every other function in MATLAB) not supporting cookies. If this is the case, does anyone have suggestions on how to tackle this?
The problem isn't the user agent. I was able to verify this by trying a handful of user agent values that should have worked. Instead, the problem is the one you described as problem 2. In other words, Cloudflare requires the HTTP header to contain a valid cookie name/value pair.
This line of the urlread2 output is what tells me this is the case:
<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>
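To tell the two failure modes apart without reading the whole page, the response text can be searched for the markers quoted in this thread. The sketch below runs on a sample string standing in for the urlread2 output; in practice you would apply the same checks to the output variable returned by the call in the question. The variable names needsCookies and bannedBySignature are only illustrative.

```matlab
% Sample fragment standing in for the urlread2 response body.
output = ['<div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" ' ...
          'data-translate="enable_cookies">Please enable cookies.</div>'];

% Marker class on the Cloudflare cookie warning shown above.
needsCookies = ~isempty(strfind(output, 'cf-cookie-error'));

% Phrase from the "banned ... browser's signature" message in the question.
bannedBySignature = ~isempty(strfind(output, 'based on your browser'));

if needsCookies
    disp('Response asks for cookies to be enabled.');
end
if bannedBySignature
    disp('Response reports a ban based on the browser signature.');
end
```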
To see which cookies fitbit.com is using, add the View Cookies add-on to Firefox. By my count, the login page sets 36 cookies, and my guess is that you will be barred entry if you're missing at least some of them. One thing you could do is take the cookie values from your browser and manually add them to the HTTP header as name/value pairs, but it would be better to let the website set the cookies in a PHP script. Here is a Stack Overflow post that describes how that would work: "How can I scrape website content in PHP from a website that requires a cookie login?" It's not easy, but it's not impossible. Let me know if you need more help.
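For the manual route, the cookies copied from the browser go into a single Cookie header whose value is a list of name=value pairs separated by a semicolon and a space. Here is a rough sketch of that step. The cookie names and values are hypothetical placeholders, and the struct layout (fields name and value) is an assumption about what http_createHeader returns, so check it against your copy of urlread2.

```matlab
% HYPOTHETICAL cookie names and values; replace them with the pairs
% shown by the browser for the login page.
cookies = {
    'cookie_a', 'value1'
    'cookie_b', 'value2'
    };

% Build 'name=value' strings and join them with '; '.
parts = cell(1, size(cookies, 1));
for k = 1:size(cookies, 1)
    parts{k} = [cookies{k,1} '=' cookies{k,2}];
end
cookieValue = strjoin(parts, '; ');

% ASSUMPTION: same struct layout that http_createHeader is expected to return.
cookieHeader = struct('name', 'Cookie', 'value', cookieValue);

disp(cookieHeader.value)   % cookie_a=value1; cookie_b=value2
```

The resulting cookieHeader would then be appended to the combined header from the question, for example combined_header = [combined_header cookieHeader], before calling urlread2. Cookies copied this way expire, so they may need to be refreshed from the browser.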