python - Scrape data from reduced table

- May 15, 2015

Using Beautiful Soup, and isolating the web source data inside the 'p' tag, I managed to retrieve the data I need. Now I'd like to iterate over the remaining data inside the variable 'table' (over each row and each cell) and scrape the data into a list. Can anyone show me how to achieve this? I've read several other posts but was not able to apply them to my specific issue. Thanks.

from bs4 import BeautifulSoup
import urllib2

url = "http://www.gks.ru/bgd/free/b00_25/isswww.exe/stg/d000/000715.htm"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page.read(), 'html.parser')
table = soup.findAll('p', text=True)
print(table)

Assuming you want the per-month price data, you need to find the tr elements inside the table and skip the first 3 (the header rows). Note that html.parser did not work for me, but lxml did (see "Differences between parsers" in the Beautiful Soup documentation):

soup = BeautifulSoup(page, 'lxml')  # requires 'lxml' to be installed

table = soup.find("center").find("table")
for row in table.find_all("tr")[3:]:  # skip the 3 header rows
    cells = [cell.get_text(strip=True) for cell in row.find_all("td")]
    print(cells)

Prints:

['january', '469,4', '15,0', '3,9']
['february', '479,8', '16,7', '2,2']
['march', '485,6', '16,9', '1,2']
['april', '487,8', '16,4', '0,5']
['may', '489,5', '15,8', '0,4']
['june', '490,5', '15,3', '0,2']
['july', '494,4', '15,6', '0,8']
['august', '496,1', '15,8', '0,4']
['september', '499,0', '15,7', '0,6']
['october', '502,7', '15,6', '0,7']
['november', '506,4', '15,0', '0,8']
['december', '', '', '']
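
Since the question asks for the data in a list, here is a minimal sketch of one way to collect the printed rows and turn the comma-decimal strings into numbers. The helper names parse_decimal and rows_to_records are hypothetical, not part of Beautiful Soup, and the sample rows are copied from the output above. It assumes every row has the month name in the first cell and that an empty cell means "no value yet".

```python
def parse_decimal(text):
    """Convert a string such as '469,4' to a float; return None for an empty cell."""
    text = text.strip()
    if not text:
        return None
    # The page uses a decimal comma, so swap it for a dot before converting.
    return float(text.replace(",", "."))


def rows_to_records(rows):
    """Turn [month, value, value, ...] string rows into (month, [floats]) tuples."""
    records = []
    for cells in rows:
        if not cells:  # skip rows that contained no <td> cells
            continue
        month, values = cells[0], cells[1:]
        records.append((month, [parse_decimal(v) for v in values]))
    return records


# Two sample rows copied from the printed output above.
sample_rows = [
    ['january', '469,4', '15,0', '3,9'],
    ['december', '', '', ''],
]
records = rows_to_records(sample_rows)
print(records)
```

In the loop from the answer, you would append each cells list to a rows list instead of printing it, then pass that list to rows_to_records.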
