Decoding a Python 2 `tempfile` with python-future


I'm attempting to write a Python 2/3 compatible routine to fetch a CSV file, decode it from latin_1 to Unicode, and feed it to `csv.DictReader` in a robust, scalable manner.

  • For Python 2/3 support, I'm using python-future, including importing `open` from `builtins`, and importing `unicode_literals` for consistent behaviour
  • I'm hoping to handle exceptionally large files by spilling to disk, using `tempfile.SpooledTemporaryFile`
  • I'm using `io.TextIOWrapper` to handle decoding from the latin_1 encoding before feeding the `DictReader`

This works fine under Python 3.

The problem is that `TextIOWrapper` expects to wrap a stream that conforms to `BufferedIOBase`. Unfortunately, under Python 2, although I have imported the Python 3-style `open`, vanilla Python 2 `tempfile.SpooledTemporaryFile` still of course returns a Python 2 `cStringIO.StringO`, instead of the Python 3 `io.BytesIO` required by `TextIOWrapper`.
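For contrast, here is a minimal Python 3 sketch of what `TextIOWrapper` expects, and of the failure mode when the wrapped object lacks the `BufferedIOBase` interface. The `FakeStringO` class is a hypothetical stand-in for Python 2's `cStringIO.StringO`:

```python
import io

class FakeStringO(object):
    """Hypothetical stand-in for cStringIO.StringO: it has read()/seek(),
    but none of the probe methods like readable() that TextIOWrapper calls."""
    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, n=-1):
        return self._buf.read(n)

    def seek(self, pos, whence=0):
        return self._buf.seek(pos, whence)

# Wrapping a real io.BytesIO works: TextIOWrapper decodes latin_1 on the fly.
text = io.TextIOWrapper(io.BytesIO(u"café\n".encode("latin_1")), encoding="latin_1")
print(text.read())  # café

# Wrapping an object without readable() fails in the same way as the error below:
try:
    io.TextIOWrapper(FakeStringO(b"x"), encoding="latin_1")
except AttributeError as exc:
    print(exc)  # 'FakeStringO' object has no attribute 'readable'
```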

I can think of these possible approaches:

  1. Wrap the Python 2 `cStringIO.StringO` as a Python 3-style `io.BytesIO`. I'm not sure how to approach this - would I need to write such a wrapper, or does one already exist?
  2. Find a Python 2 alternative that wraps the `cStringIO.StringO` stream for decoding. I haven't found one yet.
  3. Do away with `SpooledTemporaryFile` and decode entirely in memory. How big would the CSV file need to be before operating entirely in memory becomes a concern?
  4. Do away with `SpooledTemporaryFile` and implement my own spill-to-disk. This would allow me to call the python-future `open`, but I'd rather not - it would be tedious and less secure.
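As a sketch of option 1: under Python 3, `TextIOWrapper` only duck-types its buffer (I would expect, though haven't verified, the same of the `io` module under Python 2), so a thin shim that supplies the probe methods and delegates everything else may suffice. `ReadableWrapper` is a hypothetical name, and `io.BytesIO` stands in here for the spooled file's internal raw stream:

```python
import csv
import io

class ReadableWrapper(object):
    """Hypothetical shim (option 1): supplies the probe methods that
    TextIOWrapper calls, and delegates everything else to the raw stream."""
    def __init__(self, raw):
        self._raw = raw

    def readable(self):
        return True

    def writable(self):
        return False

    def seekable(self):
        return True

    def __getattr__(self, name):
        # Fall through to the wrapped stream for read(), seek(), close(), ...
        return getattr(self._raw, name)

# io.BytesIO stands in for the raw spooled byte stream:
raw = io.BytesIO(u"name;born\r\nJohn Coltrane;1926\r\n".encode("latin_1"))
text = io.TextIOWrapper(ReadableWrapper(raw), encoding="latin_1")
row = next(csv.DictReader(text, delimiter=";"))
print(row["name"])  # John Coltrane
```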

What's the best way forward? Have I missed anything?


imports:

    from __future__ import (absolute_import, division,
                            print_function, unicode_literals)
    from builtins import (ascii, bytes, chr, dict, filter, hex, input,  # noqa
                          int, map, next, oct, open, pow, range, round,  # noqa
                          str, super, zip)  # noqa

    import csv
    import tempfile
    from io import TextIOWrapper

    import requests

init:

    ...
    self._session = requests.Session()
    ...

routine:

    def _fetch_csv(self, path):
        raw_file = tempfile.SpooledTemporaryFile(
            max_size=self._config.get('spool_size')
        )
        csv_r = self._session.get(self.url + path)
        for chunk in csv_r.iter_content():
            raw_file.write(chunk)
        raw_file.seek(0)
        text_file = TextIOWrapper(raw_file._file, encoding='latin_1')
        return csv.DictReader(text_file)

error:

    ... in _fetch_csv
        text_file = TextIOWrapper(raw_file._file, encoding='utf-8')
    AttributeError: 'cStringIO.StringO' object has no attribute 'readable'

Not sure whether this will be useful, but my situation was vaguely analogous to yours.

I wanted to use `NamedTemporaryFile` to create a CSV encoded in utf-8 and with OS-native line endings, which, though possibly not-quite-standard, I accommodated using Python 3 style `io.open`.

The difficulty was that `NamedTemporaryFile` in Python 2 opens a byte stream, causing problems with line endings. The solution I settled on, which I think is a bit nicer than separate cases for Python 2 and 3, was to create the temp file, close it, and reopen it with `io.open`. The final piece was the excellent `backports.csv` library, which provides Python 3 style csv handling in Python 2.

    from __future__ import absolute_import, division, print_function, unicode_literals
    from builtins import str
    import tempfile, io, os
    from backports import csv

    data = [["1", "1", "John Coltrane",  1926],
            ["2", "1", "Miles Davis",    1926],
            ["3", "1", "Bill Evans",     1929],
            ["4", "1", "Paul Chambers",  1935],
            ["5", "1", "Scott LaFaro",   1936],
            ["6", "1", "Sonny Rollins",  1930],
            ["7", "1", "Kenny Burrel",   1931]]

    ## create csv file
    with tempfile.NamedTemporaryFile(delete=False) as temp:
        filename = temp.name

    with io.open(filename, mode='w', encoding="utf-8", newline='') as temp:
        writer = csv.writer(temp, quoting=csv.QUOTE_NONNUMERIC, lineterminator=str(os.linesep))
        headers = ['x', 'y', 'name', 'born']
        writer.writerow(headers)
        for row in data:
            print(row)
            writer.writerow(row)
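To check the round trip, the file can be reopened the same way and read back. This sketch uses the stdlib `csv` (on Python 2 you'd substitute `backports.csv` as above); note that with `QUOTE_NONNUMERIC` the writer leaves the numbers unquoted, but a plain reader still returns every field as a string:

```python
import csv
import io
import os
import tempfile

# Write via the create-then-close-then-reopen-with-io.open pattern described above...
with tempfile.NamedTemporaryFile(delete=False) as temp:
    filename = temp.name

with io.open(filename, mode='w', encoding='utf-8', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC, lineterminator=str(os.linesep))
    writer.writerow(['x', 'y', 'name', 'born'])
    writer.writerow(['1', '1', 'John Coltrane', 1926])

# ...then reopen with io.open to read it back.
with io.open(filename, encoding='utf-8', newline='') as f:
    rows = list(csv.reader(f))
os.remove(filename)

print(rows)  # [['x', 'y', 'name', 'born'], ['1', '1', 'John Coltrane', '1926']]
```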
