regex - How do I extract a range of lines from a text file using sed -n but in Python?
Say I have a 10 GB file that has 20,000 lines filled with digits of pi.
- 123123
- 12312312
- 123123
- 123123
- 12312312
- 123123
How do I extract lines 10,000 to 20,000 using the Unix command sed -n?
I'd like each line to keep its newline character, and to export the result to a file using the code below.
So far, I have the following:
    com = "sed -n \' " + str(window[0]) + "," + str(window[1]) + "p\' " + "sample.txt" + ">" + "output.txt"
    os.system(com)
But it's throwing concatenation errors.
How should I phrase the sed -n command in Python in the program below?
    inputfilename = "sample.txt"

    import itertools
    import linecache

    def sliding_window(window_size, step_size, last_window_start):
        for i in xrange(0, last_window_start, step_size):
            yield (i, i + window_size)
        yield (last_window_start, total_pi_digits)

    def picrop(window_size, step_size):
        f = open(inputfilename, 'r')
        first_line = f.readline().split()
        total_pi_digits = int(first_line[0])
        last_window_start = total_pi_digits-(total_pi_digits%window_size)
        lastcounter = (total_pi_digits//window_size)*(window_size/step_size)
        flags = [False for i in range(lastcounter)]
        first_line[0] = str(window_size)
        second_line = f.readline().split()
        offset = int(round(float(second_line[0].strip('\n'))))
        first_line = " ".join(first_line)
        f.close()
        with open(inputfilename, 'r') as f:
            header = f.readline()
            for counter, window in enumerate(sliding_window(window_size,step_size,last_window_start)):
                with open('picrop_{}.txt'.format(counter), 'w') as output:
                    if (flags[counter] == False):
                        flags[counter] = True
                        headerline = float(linecache.getline(inputfilename, window[1]+1)) - offset
                        output.write(str(window_size) + " " + str("{0:.4f}".format(headerline)) + " " + 'l' + '\n')
                    com = "sed -n \' " + str(window[0]) + "," + str(window[1]) + "p\' " + "sample.txt" + ">" + "output.txt"
                    os.system(com)

    picrop(1000,500)
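You can yield each line of the file:
    def lines(filename):
        # Iterating over the file object reads one line at a time,
        # so the whole file is never loaded into memory.
        with open(filename) as f:
            for line in f:
                yield line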
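Because the function above is a generator, a consumer can stop early and the rest of the file is never read, which matters for a file of this size. A minimal sketch of that behaviour follows; head is a hypothetical helper introduced only for illustration, not part of the original answer.

```python
def lines(filename):
    """Yield the lines of a file one at a time, newline included."""
    with open(filename) as f:
        for line in f:
            yield line


def head(filename, n):
    """Return the first n lines (hypothetical helper, for illustration only)."""
    result = []
    for number, line in enumerate(lines(filename), start=1):
        if number > n:
            # Stop consuming the generator; later lines are never read.
            break
        result.append(line)
    return result
```

Each yielded line keeps its trailing newline, so the lines can be written to another file unchanged.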
And you can slice the sequence using islice:
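    from itertools import islice

    with open('picrop.txt', 'w') as output:
        # islice counts from 0 and excludes the stop index, so sed's
        # 1-based lines 10,000 to 20,000 are islice(..., 9999, 20000).
        for line in islice(lines('sample.txt'), 9999, 20000):
            output.write(line)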
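To reuse this for every window, the slice can be wrapped in a function that takes sed-style bounds (1-based, inclusive). The sketch below also shows one way to call sed itself without building a shell string by concatenation, which is where the question's command ran into trouble. Both function names are hypothetical, and the second one assumes a sed binary is available on PATH.

```python
import subprocess
from itertools import islice


def extract_lines(src, dst, first, last):
    """Copy 1-based lines first..last (inclusive) from src to dst.

    Intended to match the output of: sed -n 'first,lastp' src > dst
    """
    with open(src) as infile, open(dst, "w") as outfile:
        # islice counts from 0 and excludes the stop index.
        outfile.writelines(islice(infile, first - 1, last))


def extract_lines_with_sed(src, dst, first, last):
    """Same result using sed itself (assumes sed is installed and on PATH)."""
    with open(dst, "w") as outfile:
        # Passing the arguments as a list avoids shell quoting and
        # manual string concatenation; stdout replaces the '>' redirect.
        subprocess.run(
            ["sed", "-n", "{},{}p".format(first, last), src],
            stdout=outfile,
            check=True,
        )
```

For the case in the question, extract_lines('sample.txt', 'output.txt', 10000, 20000) would write lines 10,000 to 20,000. The pure-Python version avoids starting a new process per window; the sed version may still be convenient if the rest of a pipeline is already shell-based.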