Writing to an Empty File
To write text to a file, you need to call open() with a second argument telling Python that you want to write to the file. To see how this works, let’s write a simple message and store it in a file instead of printing it to the screen:
filename = 'programming.txt'
with open(filename, 'w') as file_object:
    file_object.write("I love programming.")
The second argument, 'w', tells Python that we want to open the file in write mode. You can open a file in read mode ('r'), write mode ('w'), append mode ('a'), or a mode that allows you to read and write to the file ('r+'). If you omit the mode argument, Python opens the file in read-only mode by default.
The open() function automatically creates the file you're writing to if it doesn't already exist. However, be careful when opening a file in write mode ('w'): if the file does exist, Python will erase its contents before returning the file object.
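To make the difference between write mode and append mode concrete, here is a minimal sketch. It works inside a temporary directory so nothing on disk is overwritten; the file name demo.txt is only an illustrative assumption.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.txt')  # hypothetical file name

    # 'w' creates the file if it does not exist yet.
    with open(path, 'w') as f:
        f.write("first\n")

    # Opening in 'w' again erases the old contents before writing.
    with open(path, 'w') as f:
        f.write("second\n")

    # 'a' keeps the existing contents and adds to the end.
    with open(path, 'a') as f:
        f.write("third\n")

    # No mode argument means read mode ('r').
    with open(path) as f:
        contents = f.read()

print(repr(contents))  # 'second\nthird\n'
```

The line "first" is gone because the second open() in write mode truncated the file, while the append-mode write left "second" in place.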
In the example above, close() is called implicitly when the with block ends. You can also skip the with statement and call close() explicitly:
>>> f = open('data.txt', 'w')   # Make a new file in output mode ('w' is write)
>>> f.write('Hello\n')          # Write strings of characters to it
6
>>> f.write('world\n')          # Return number of items written in Python 3.X
6
>>> f.close()                   # Close to flush output buffers to disk
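When you manage the file yourself, an exception raised between open() and close() can leave the file unclosed. One common pattern, sketched below, wraps the work in try/finally, which is roughly what the with statement does for you. The temporary directory is an assumption added so the sketch leaves no files behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.txt')

    f = open(path, 'w')
    try:
        written = f.write('Hello\n')  # write() returns the number of characters written
    finally:
        f.close()                     # runs even if write() raises an exception

    closed_after = f.closed           # True once close() has been called

print(written, closed_after)  # 6 True
```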
Writing Multiple Lines
The write() method doesn't add any newlines to the text you write. So if you write more than one line without including newline characters, your file may not look the way you want it to:
filename = 'programming.txt'
with open(filename, 'w') as file_object:
    file_object.write("I love programming.")
    file_object.write("I love creating new games.")
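Because write() adds no line breaks, the two calls above put both sentences on one line. The sketch below shows that result and then the usual fix, which is to end each string with '\n' yourself. It writes into a temporary directory (an assumption made here to keep the example self-contained).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'programming.txt')

    # Without newlines, the two sentences run together.
    with open(path, 'w') as file_object:
        file_object.write("I love programming.")
        file_object.write("I love creating new games.")
    with open(path) as file_object:
        run_together = file_object.read()

    # With explicit newlines, each sentence gets its own line.
    with open(path, 'w') as file_object:
        file_object.write("I love programming.\n")
        file_object.write("I love creating new games.\n")
    with open(path) as file_object:
        lines = file_object.read().splitlines()

print(run_together)  # I love programming.I love creating new games.
print(lines)         # ['I love programming.', 'I love creating new games.']
```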
Binary Bytes Files
Python 3.X draws a sharp distinction between text and binary data in files: text files represent content as normal str strings and perform Unicode encoding and decoding automatically when writing and reading data, while binary files represent content as a special bytes string and allow you to access file content unaltered.
For example, binary files are useful for processing media, accessing data created by C programs, and so on. To illustrate, Python’s struct module can both create and unpack packed binary data—raw bytes that record values that are not Python objects—to be written to a file in binary mode.
>>> import struct
>>> packed = struct.pack('>i4sh', 7, b'spam', 8) # Create packed binary data
>>> packed # 10 bytes, not objects or text
b'\x00\x00\x00\x07spam\x00\x08'
>>>
>>> file = open('data.bin', 'wb') # Open binary output file
>>> file.write(packed) # Write packed binary data
10
>>> file.close()
Reading binary data back is essentially symmetric; not all programs need to tread so deeply into the low-level realm of bytes, but binary files make this easy in Python:
>>> data = open('data.bin', 'rb').read() # Open/read binary data file
>>> data # 10 bytes, unaltered
b'\x00\x00\x00\x07spam\x00\x08'
>>> data[4:8] # Slice bytes in the middle
b'spam'
>>> list(data) # A sequence of 8-bit bytes
[0, 0, 0, 7, 115, 112, 97, 109, 0, 8]
>>> struct.unpack('>i4sh', data) # Unpack into objects again
(7, b'spam', 8)
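The same round trip can be written as a script using with blocks. This sketch also uses struct.calcsize to confirm the 10-byte record size, and shows that a file opened in binary mode rejects a str. The temporary directory is an assumption added for illustration.

```python
import os
import struct
import tempfile

FORMAT = '>i4sh'  # big-endian: 4-byte int, 4-byte string, 2-byte short

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'data.bin')

    with open(path, 'wb') as f:
        f.write(struct.pack(FORMAT, 7, b'spam', 8))
        try:
            f.write('spam')        # a str, not bytes
        except TypeError:
            rejected_str = True    # binary files accept only bytes-like objects
        else:
            rejected_str = False

    with open(path, 'rb') as f:
        record = struct.unpack(FORMAT, f.read())

size = struct.calcsize(FORMAT)
print(size, record, rejected_str)  # 10 (7, b'spam', 8) True
```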
Unicode Text Files
Python text files automatically encode on writes and decode on reads per the encoding scheme name you provide. In Python 3.X:
>>> S = 'sp\xc4m' # Non-ASCII Unicode text
>>> S
'spÄm'
>>> S[2] # Sequence of characters
'Ä'
>>> S.encode('utf-8')
b'sp\xc3\x84m'
>>> S.encode('utf-8').decode('utf-8')
'spÄm'
>>> file = open('unidata.txt', 'w', encoding='utf-8') # Write/encode UTF-8 text
>>> file.write(S) # 4 characters written
4
>>> file.close()
>>> text = open('unidata.txt', encoding='utf-8').read() # Read/decode UTF-8 text
>>> text
'spÄm'
>>> len(text) # 4 chars (code points)
4
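To see what the text file actually stores, you can reopen it in binary mode, which skips decoding. The sketch below repeats the write in a temporary directory (an assumption for self-containment) and compares the 4 characters in memory with the 5 bytes on disk.

```python
import os
import tempfile

S = 'sp\xc4m'  # the same 4-character string as above

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'unidata.txt')

    with open(path, 'w', encoding='utf-8') as f:
        f.write(S)

    with open(path, 'rb') as f:               # binary mode: raw encoded bytes
        raw = f.read()

    with open(path, encoding='utf-8') as f:   # text mode: decoded characters
        text = f.read()

print(raw, len(raw))    # b'sp\xc3\x84m' 5
print(len(text))        # 4
```

The non-ASCII character takes two bytes in UTF-8, which is why the byte count is one larger than the character count.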
It is also useful to see how text files would automatically encode the same string differently under different encoding names. This provides a way to translate data between encodings: the bytes in the files differ, but they decode to the same string in memory if you provide the proper encoding name:
>>> text = 'spÄm'
>>> text.encode('latin-1')
b'sp\xc4m'
>>> text.encode('utf-16')
b'\xff\xfes\x00p\x00\xc4\x00m\x00'
>>> len(text.encode('latin-1')), len(text.encode('utf-16'))
(4, 10)
>>> b'\xff\xfes\x00p\x00\xc4\x00m\x00'.decode('utf-16')
'spÄm'
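Translating a file from one encoding to another follows directly from this: decode with the source encoding on the way in, and encode with the target encoding on the way out. A minimal sketch, using hypothetical file names in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, 'latin1.txt')  # hypothetical file names
    dst = os.path.join(tmpdir, 'utf16.txt')

    with open(src, 'w', encoding='latin-1') as f:
        f.write('sp\xc4m')

    # Decode with the source encoding, re-encode with the target encoding.
    with open(src, encoding='latin-1') as fin:
        with open(dst, 'w', encoding='utf-16') as fout:
            fout.write(fin.read())

    with open(src, 'rb') as f:
        src_bytes = f.read()
    with open(dst, 'rb') as f:
        dst_bytes = f.read()
    with open(dst, encoding='utf-16') as f:
        round_trip = f.read()

print(len(src_bytes), len(dst_bytes))  # 4 10
```

The two files hold different bytes, yet both decode to the same 4-character string.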
Storing Native Python Objects: pickle
Using eval to convert strings back into objects is a powerful tool. In fact, sometimes it's too powerful: eval will happily run any Python expression, even one that might delete all the files on your computer, given the necessary permissions. If you want to store native Python objects without passing file contents through eval, Python's standard library pickle module is a better fit. Keep in mind that pickle is not safe for untrusted input either, because unpickling can also run arbitrary code; only load pickle files that come from a source you trust.
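For strings that contain only Python literals (numbers, strings, lists, dictionaries, and so on), the standard library's ast.literal_eval is a narrower alternative to eval. It is not part of the original discussion; the sketch below is included only to illustrate how it rejects anything that is not a plain literal.

```python
import ast

line = "{'a': 1, 'b': [2, 3]}"
obj = ast.literal_eval(line)      # parses literals only

try:
    # A function call is not a literal, so this is refused rather than run.
    ast.literal_eval("__import__('os').getcwd()")
except (ValueError, SyntaxError):
    blocked = True
else:
    blocked = False

print(obj, blocked)  # {'a': 1, 'b': [2, 3]} True
```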
The pickle module is a more advanced tool that allows us to store almost any Python object in a file directly, with no to- or from-string conversion requirement on our part. It’s like a super-general data formatting and parsing utility. To store a dictionary in a file, for instance, we pickle it directly:
>>> D = {'a': 1, 'b': 2}
>>> F = open('datafile.pkl', 'wb')
>>> import pickle
>>> pickle.dump(D, F) # Pickle any object to file
>>> F.close()
Then, to get the dictionary back later, we simply use pickle again to re-create it:
>>> F = open('datafile.pkl', 'rb')
>>> E = pickle.load(F) # Load any object from file
>>> E
{'a': 1, 'b': 2}
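The same steps can be written with with blocks so the files are closed automatically. This sketch also uses pickle.dumps and pickle.loads, which do the same conversion to and from a bytes string in memory. The nested test object and the temporary directory are assumptions chosen for illustration.

```python
import os
import pickle
import tempfile

D = {'a': 1, 'b': [2, 3], 'c': ('x', 4.5)}  # nested object for illustration

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'datafile.pkl')

    with open(path, 'wb') as f:   # pickle files are binary
        pickle.dump(D, f)

    with open(path, 'rb') as f:
        E = pickle.load(f)

blob = pickle.dumps(D)            # pickle to a bytes string instead of a file
F = pickle.loads(blob)

print(E == D, E is D)             # True False
```

The loaded object is equal to the original but is a new copy, and types such as tuples survive the trip unchanged.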
Storing Python Objects in JSON Format
JSON is a text-based data format that programs in many languages can read, which makes it more portable than pickle. A Python dictionary with nested structures, for example, is very similar to JSON data, though Python's variables and expressions support richer structuring options (any part of the following can be an arbitrary expression in Python code):
>>> name = dict(first='Bob', last='Smith')
>>> rec = dict(name=name, job=['dev', 'mgr'], age=40.5)
>>> rec
{'job': ['dev', 'mgr'], 'name': {'last': 'Smith', 'first': 'Bob'}, 'age': 40.5}
The final dictionary format displayed here is a valid literal in Python code, and almost passes for JSON when printed as is, but the json module makes the translation official. Here it translates Python objects to and from a JSON serialized string representation in memory:
>>> import json
>>> json.dumps(rec)
'{"job": ["dev", "mgr"], "name": {"last": "Smith", "first": "Bob"}, "age": 40.5}'
>>> S = json.dumps(rec)
>>> S
'{"job": ["dev", "mgr"], "name": {"last": "Smith", "first": "Bob"}, "age": 40.5}'
>>> O = json.loads(S)
>>> O
{'job': ['dev', 'mgr'], 'name': {'last': 'Smith', 'first': 'Bob'}, 'age': 40.5}
>>> O == rec
True
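The json module can write to and read from files as well, using json.dump and json.load. A minimal sketch, assuming a hypothetical file name in a temporary directory:

```python
import json
import os
import tempfile

rec = dict(name=dict(first='Bob', last='Smith'), job=['dev', 'mgr'], age=40.5)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'testjson.txt')  # hypothetical file name

    with open(path, 'w', encoding='utf-8') as f:
        json.dump(rec, f, indent=4)              # indent makes the file readable

    with open(path, encoding='utf-8') as f:
        loaded = json.load(f)

# JSON has no tuple type, so tuples come back as lists.
pair = json.loads(json.dumps((1, 2)))

print(loaded == rec, pair)  # True [1, 2]
```

Unlike pickle, JSON supports only a limited set of types, so some Python objects change type or cannot be stored at all.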
Reference
- Python Crash Course (2nd Edition): A Hands-On, Project-Based Introduction to Programming
- Learning Python