Using stdopen¶
stdopen handles the logic of choosing between STDIN/STDOUT and a file. It is best used in scripts that take input from STDIN or a file and write output to STDOUT or a file. It can also write to a temp file and then seamlessly copy it over to its desired location on success, or delete it on failure. It is a very simple context manager function that just implements this logic. As it is a function and not a class, it must be used solely as a context manager.
[1]:
# The only import you need to worry about in your code is
# `import stdopen`. All the others are for illustration
from contextlib import contextmanager
from stdopen.example_data import examples
import stdopen
import csv
import gzip
import io
import sys
import tempfile
import os
import shutil
import time
[2]:
# Here are all the example datasets. We will use test_text_path
# and compressed_test_text_path
examples.list_datasets()
[2]:
[('dummy_data', 'A dummy dataset function that returns a small list.'),
('dummy_load_data',
'A dummy dataset function that loads a string from a file.'),
('test_text_path', 'Return a file path to a compressed test file.'),
('compressed_test_text_path',
'Return a file path to a compressed test file.')]
[3]:
DELIMITER = "\t"
Reading files¶
Reading from a file is very similar to using a normal open call, and if that is all you ever want to do then there is no advantage to using stdopen. The advantage comes when the input may sometimes be a file name and sometimes STDIN.
[4]:
# A regular file (no real difference to using open)
with stdopen.open(examples.get_data("test_text_path")) as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print(row)
['colA', 'colB', 'colC', 'colD']
['row1colA', 'row1colB', 'row1colC', 'row1colD']
['row2colA', 'row2colB', 'row2colC', 'row2colD']
['row3colA', 'row3colB', 'row3colC', 'row3colD']
['row4colA', 'row4colB', 'row4colC', 'row4colD']
['row5colA', 'row5colB', 'row5colC', 'row5colD']
['row6colA', 'row6colB', 'row6colC', 'row6colD']
What if the input file is gzip compressed? In that case you can supply the function used to open it via the method argument.
[5]:
# A compressed file
with stdopen.open(examples.get_data("compressed_test_text_path"), method=gzip.open) as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print(row)
['colA', 'colB', 'colC', 'colD']
['row1colA', 'row1colB', 'row1colC', 'row1colD']
['row2colA', 'row2colB', 'row2colC', 'row2colD']
['row3colA', 'row3colB', 'row3colC', 'row3colD']
['row4colA', 'row4colB', 'row4colC', 'row4colD']
['row5colA', 'row5colB', 'row5colC', 'row5colD']
['row6colA', 'row6colB', 'row6colC', 'row6colD']
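A script that accepts both plain and gzip-compressed input needs some way to decide what to pass as method. The following is a minimal sketch of one option, checking the two-byte gzip magic number at the start of the file. The helper name pick_opener is hypothetical and is not part of stdopen.

```python
import gzip

# The first two bytes of every gzip stream
GZIP_MAGIC = b"\x1f\x8b"


def pick_opener(path):
    """Hypothetical helper: return gzip.open for gzip files, else built-in open."""
    # Peek at the first two bytes in binary mode
    with open(path, 'rb') as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    return gzip.open if is_gzip else open
```

The returned function could then be supplied as the method argument in the same way gzip.open is in the cell above. Note that this only works for real files, since STDIN cannot be reopened to peek at it.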
Reading from STDIN¶
stdopen.open can easily be re-configured to read from STDIN by changing the file name to any of: None, '' (empty string), or '-' (similar to the Unix command line).
To demonstrate this we will have to mock some input from STDIN. You can ignore the fake_stdin function in your code.
[6]:
@contextmanager
def fake_stdin():
    """Temporarily override STDIN and restore it after the test."""
    # Slurp all the input in, closing the gzip file afterwards
    with gzip.open(
        examples.get_data("compressed_test_text_path"), 'rt'
    ) as infile:
        data = io.StringIO(infile.read())
    # Back up and override STDIN
    bak = sys.stdin
    sys.stdin = data
    try:
        # Yield anything
        yield True
    finally:
        # Restore STDIN even if the with-body raised
        sys.stdin = bak
[7]:
# You can ignore this call
with fake_stdin():
    # Should also work with NoneType and ''
    with stdopen.open('-') as infile:
        reader = csv.reader(infile, delimiter=DELIMITER)
        for row in reader:
            print(row)
['colA', 'colB', 'colC', 'colD']
['row1colA', 'row1colB', 'row1colC', 'row1colD']
['row2colA', 'row2colB', 'row2colC', 'row2colD']
['row3colA', 'row3colB', 'row3colC', 'row3colD']
['row4colA', 'row4colB', 'row4colC', 'row4colD']
['row5colA', 'row5colB', 'row5colC', 'row5colD']
['row6colA', 'row6colB', 'row6colC', 'row6colD']
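To make the file-name convention concrete, here is a minimal sketch of the kind of dispatch involved, written with the standard library only. It is not stdopen's implementation: the helper name open_or_stdin and the STD_NAMES constant are hypothetical, and it only covers reading in text mode.

```python
import sys
from contextlib import contextmanager

# File names treated as "use STDIN" (the same three values listed above)
STD_NAMES = (None, '', '-')


@contextmanager
def open_or_stdin(path, method=open):
    """Hypothetical helper: yield sys.stdin or a file opened with `method`."""
    if path in STD_NAMES:
        # STDIN is not owned by this function, so it is not closed on exit
        yield sys.stdin
    else:
        # A real file is opened in text mode and closed when the block exits
        with method(path, 'rt') as infile:
            yield infile
```

Because both branches yield a readable text stream, the calling code (for example a csv.reader loop) does not need to know where its input came from.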
Writing files¶
stdopen allows you to write via temp files as well. This is useful if you do not want the output file to appear until after the process has been successful.
[8]:
# We will write each of these elements to file
my_list = ['A', 'B', 'C', 'D']
# Create a working directory in your home dir
# We can delete this later
working_dir = tempfile.mkdtemp(prefix='stdopen', dir=os.path.expanduser('~'))
# Test file
test_file = os.path.join(working_dir, "my_test_file.txt")
try:
    # Ensure the test file does not exist
    os.unlink(test_file)
except FileNotFoundError:
    pass
Here we demonstrate that the output file does not exist until the context manager has exited.
[9]:
# Write to the output file via a tmp file
with stdopen.open(test_file, 'wt', use_tmp=True, tmpdir=working_dir) as outfile:
    for i in my_list:
        print("Output file exists:", os.path.exists(test_file))
        # Each element is a single string, so write() is sufficient
        outfile.write(i)
        time.sleep(1)
print("Output file exists:", os.path.exists(test_file))
Output file exists: False
Output file exists: False
Output file exists: False
Output file exists: False
Output file exists: True
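The write-to-temp-then-move pattern shown above can be sketched with the standard library alone. This is an illustration under stated assumptions rather than stdopen's actual code: the helper name write_via_tmp is hypothetical, and it handles text mode only.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def write_via_tmp(path, tmpdir=None):
    """Hypothetical helper: write to a temp file, move it to `path` on success."""
    # Create the temp file up front; the final path is not touched yet
    fd, tmp_path = tempfile.mkstemp(dir=tmpdir)
    try:
        with os.fdopen(fd, 'wt') as outfile:
            yield outfile
        # Only reached if the with-body finished without an exception
        shutil.move(tmp_path, path)
    except BaseException:
        # On failure, remove the partial temp file and re-raise
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Placing the temp file in the same directory (or at least the same file system) as the final path keeps the move cheap, which is one reason a tmpdir option is useful.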
Writing to STDOUT¶
As with STDIN, we can redirect output to STDOUT by setting the file name to None, '', or '-'. The cell below writes to STDOUT in text mode. Outside of Jupyter, we can also write to STDOUT in binary mode.
[10]:
# Write to STDOUT in text mode
with stdopen.open('-', 'wt') as outfile:
    for i in my_list:
        outfile.writelines(i)
        time.sleep(1)
ABCD
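For binary output, plain Python exposes the underlying byte stream of STDOUT as sys.stdout.buffer. The sketch below shows how a text or binary view might be selected from the mode string. The helper name stdout_for_mode is hypothetical and this is not necessarily how stdopen does it. The buffer attribute can be missing when sys.stdout has been replaced by a text-only object, which may be what lies behind the Jupyter caveat above.

```python
import sys


def stdout_for_mode(mode='wt'):
    """Hypothetical helper: pick the text or binary view of STDOUT."""
    if 'b' in mode:
        # Underlying binary stream; raises AttributeError if the current
        # sys.stdout replacement does not provide one
        return sys.stdout.buffer
    return sys.stdout
```

In a normal terminal session, stdout_for_mode('wb').write(b"ABCD\n") would write raw bytes, while stdout_for_mode('wt') returns the usual text stream.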
[11]:
# Delete the temp dir
shutil.rmtree(working_dir)