python 2.7 - Why does Beautiful Soup Return filename instead of full link?

- June 15, 2011

This question already has an answer here:

Reconstructing absolute URLs from relative URLs on a page (2 answers)

Using the simple code below, I'm facing the following problem: why does Beautiful Soup return file names rather than full link addresses?

    from bs4 import BeautifulSoup
    import urllib2

    url = 'http://www.gks.ru/bgd/free/b00_25/isswww.exe/stg/d000/i000650r.htm'
    data = urllib2.urlopen(url).read()
    page = BeautifulSoup(data, 'lxml')

    # Print the href attribute of every anchor tag on the page
    for link in page.findAll('a'):
        l = link.get('href')
        print l

All I'm getting as output is:

    i000660r.htm
    i000670r.htm
    i000680r.htm
    i000690r.htm
    i000700r.htm
    i000706r.htm
    i000707r.htm
    i000708r.htm
    i000709r.htm
    000710.htm
    000711.htm
    000712.htm
    000713.htm
    000714.htm
    000715.htm

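Problem solved: the links on the page are relative, so Beautiful Soup returns them exactly as written in the HTML. I concatenated each result with the root of the URL to get the full address. Thanks.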
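For anyone landing here later, one way to do that joining without manual string concatenation is the standard library's urljoin, which resolves a relative href against the URL of the page it came from. This is only a minimal sketch, not the exact code I ended up with: the helper name absolutize is made up for illustration, and the try/except import is there just so the snippet runs on both Python 2.7 and Python 3.

```python
try:
    from urllib.parse import urljoin   # Python 3
except ImportError:
    from urlparse import urljoin       # Python 2.7


def absolutize(base_url, hrefs):
    """Resolve each href against base_url (hypothetical helper).

    Anchors without an href attribute come back from link.get('href')
    as None, so those entries are skipped.
    """
    return [urljoin(base_url, href) for href in hrefs if href]


url = 'http://www.gks.ru/bgd/free/b00_25/isswww.exe/stg/d000/i000650r.htm'

# In the original script these values would come from link.get('href');
# two of the file names from the output above are used here as sample input.
for full_link in absolutize(url, ['i000660r.htm', '000710.htm']):
    print(full_link)
```

Unlike plain concatenation, urljoin leaves hrefs that are already absolute untouched, so a page that mixes both kinds of links does not need special handling.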