Decoding a Python 2 `tempfile` with python-future
I'm attempting to write a Python 2/3 compatible routine to fetch a CSV file, decode it from latin_1 into Unicode, and feed it to `csv.DictReader`, in a robust, scalable manner.
- For Python 2/3 support, I'm using python-future, including importing `open` from `builtins` and importing `unicode_literals` for consistent behaviour.
- I'm hoping to handle exceptionally large files by spilling to disk, using `tempfile.SpooledTemporaryFile`.
- I'm using `io.TextIOWrapper` to handle decoding from the latin_1 encoding before feeding the `DictReader`.

This works fine under Python 3.
The problem is that `TextIOWrapper` expects to wrap a stream that conforms to `BufferedIOBase`. Unfortunately, under Python 2, although I have imported the Python 3-style `open`, vanilla Python 2 `tempfile.SpooledTemporaryFile` still of course returns a Python 2 `cStringIO.StringO`, instead of the Python 3 `io.BytesIO` required by `TextIOWrapper`.
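Stripped of the class context, a minimal illustration of the mismatch (my own sketch, using the same private `_file` access as the routine below):

    import io
    import tempfile

    raw_file = tempfile.SpooledTemporaryFile(max_size=1024)
    raw_file.write(b'col_a,col_b\r\n1,2\r\n')
    raw_file.seek(0)

    # Python 3: raw_file._file is an io.BytesIO, so the wrap succeeds.
    # Python 2: raw_file._file is a cStringIO.StringO, which lacks the
    # readable()/writable()/seekable() methods that TextIOWrapper probes for.
    text_file = io.TextIOWrapper(raw_file._file, encoding='latin_1')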
I can think of these possible approaches:

- Wrap the Python 2 `cStringIO.StringO` in something that behaves like a Python 3-style `io.BytesIO`. I'm not sure how to approach this - would I need to write such a wrapper, or does one already exist?
- Find a Python 2 alternative that wraps the `cStringIO.StringO` stream for decoding (see the sketch after this list). I haven't found one yet.
- Do away with `SpooledTemporaryFile` and decode entirely in memory. How big would the CSV file need to be before operating entirely in memory becomes a concern?
- Do away with `SpooledTemporaryFile` and implement my own spill-to-disk. That would allow me to call the `open` from python-future, but I'd rather not - it's tedious and less secure.
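For the second option, here is a rough, untested sketch using the stdlib codecs module - `codecs.getreader` only needs a `.read()` method on the underlying object, so it accepts the Python 2 spooled file as well as Python 3's BytesIO-backed one:

    import codecs
    import csv

    def decode_spooled(raw_file, encoding='latin_1'):
        # Hypothetical helper: raw_file is a rewound SpooledTemporaryFile.
        # codecs.getreader(encoding) returns a StreamReader class; calling it
        # with the spooled file gives a reader that yields unicode text.
        reader = codecs.getreader(encoding)(raw_file)
        # Caveat: the stdlib csv module on Python 2 doesn't officially
        # support unicode input; backports.csv (see the answer below) does.
        return csv.DictReader(reader)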
What's the best way forward? Have I missed anything?
Imports:

    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)
    from builtins import (ascii, bytes, chr, dict, filter, hex, input,  # noqa
                          int, map, next, oct, open, pow, range, round,  # noqa
                          str, super, zip)  # noqa
    import csv
    import tempfile
    from io import TextIOWrapper
    import requests
Init:

    ...
    self._session = requests.Session()
    ...
Routine:

    def _fetch_csv(self, path):
        raw_file = tempfile.SpooledTemporaryFile(
            max_size=self._config.get('spool_size')
        )
        csv_r = self._session.get(self.url + path)
        for chunk in csv_r.iter_content():
            raw_file.write(chunk)
        raw_file.seek(0)
        text_file = TextIOWrapper(raw_file._file, encoding='latin_1')
        return csv.DictReader(text_file)
Error:

    ... in _fetch_csv
        text_file = TextIOWrapper(raw_file._file, encoding='utf-8')
    AttributeError: 'cStringIO.StringO' object has no attribute 'readable'
Not sure whether this is useful - my situation was vaguely analogous to yours.

I wanted to use `NamedTemporaryFile` to create a CSV encoded in UTF-8 and with OS-native line endings (possibly not quite standard, but accommodated by using Python 3 style `io.open`).

The difficulty is that `NamedTemporaryFile` in Python 2 opens a byte stream, causing problems with line endings. The solution I settled on, which I think is a bit nicer than handling separate cases for Python 2 and 3, is to create the temp file, close it, and reopen it with `io.open`. The final piece is the excellent `backports.csv` library, which provides Python 3 style csv handling in Python 2.
    from __future__ import absolute_import, division, print_function, unicode_literals
    from builtins import str
    import csv, tempfile, io, os
    from backports import csv

    data = [["1", "1", "John Coltrane", 1926],
            ["2", "1", "Miles Davis", 1926],
            ["3", "1", "Bill Evans", 1929],
            ["4", "1", "Paul Chambers", 1935],
            ["5", "1", "Scott LaFaro", 1936],
            ["6", "1", "Sonny Rollins", 1930],
            ["7", "1", "Kenny Burrel", 1931]]

    ## create csv file
    with tempfile.NamedTemporaryFile(delete=False) as temp:
        filename = temp.name

    with io.open(filename, mode='w', encoding="utf-8", newline='') as temp:
        writer = csv.writer(temp, quoting=csv.QUOTE_NONNUMERIC,
                            lineterminator=str(os.linesep))
        headers = ['x', 'y', 'name', 'born']
        writer.writerow(headers)
        for row in data:
            print(row)
            writer.writerow(row)
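Reading the file back follows the same pattern - a sketch continuing from the snippet above, feeding an `io.open` text stream to `backports.csv`'s `DictReader`:

    ## read the csv back, decoding via io.open
    with io.open(filename, mode='r', encoding="utf-8", newline='') as f:
        for row in csv.DictReader(f):
            print(row['name'], row['born'])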