Reading an Entire File
To begin, we need a file with a few lines of text in it. Let’s start with a file that contains pi to 30 decimal places with 10 decimal places per line:
# pi_digits.txt
3.1415926535
8979323846
2643383279
Here’s a program that opens this file, reads it, and prints the contents of the file to the screen:
# file_reader.py
with open('pi_digits.txt') as file_object:
    contents = file_object.read()
    print(contents)
The first line of this program has a lot going on. Let’s start by looking at the open() function.
To do any work with a file, even just printing its contents, you first need to open the file to access it. The open() function needs one argument: the name of the file you want to open. When you pass just a filename, Python looks for the file in the current working directory, which is the directory where the running program is stored when you launch the program from that directory. In this example, file_reader.py is run from its own directory, so Python looks for pi_digits.txt in the directory where file_reader.py is stored. The open() function returns an object representing the file. Here, open('pi_digits.txt') returns an object representing pi_digits.txt. Python stores this object in file_object, which we’ll work with later in the program.
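Strictly speaking, open() resolves a bare filename against the current working directory, which matches the program’s directory when you launch the program from there. The following is a minimal sketch for checking where Python will look; it uses the standard pathlib module, which the original example does not use:

```python
from pathlib import Path

# The directory Python resolves relative filenames against.
working_dir = Path.cwd()

# The full path that open('pi_digits.txt') would try to open.
# resolve() does not require the file to exist.
full_path = Path('pi_digits.txt').resolve()

print(working_dir)
print(full_path)
```

If the second path is not where pi_digits.txt actually lives, open() raises a FileNotFoundError.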
The keyword with closes the file once access to it is no longer needed. Notice how we call open() in this program but not close(). You could open and close the file by calling open() and close(), but if a bug in your program prevents the close() statement from being executed, the file may never close. This may seem trivial, but improperly closed files can cause data to be lost or corrupted. And if you call close() too early in your program, you’ll find yourself trying to work with a closed file (a file you can’t access), which leads to more errors. It’s not always easy to know exactly when you should close a file, but with the structure shown here, Python will figure that out for you. All you have to do is open the file and work with it as desired, trusting that Python will close it automatically when the time is right.
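To make the comparison concrete, here is a hedged sketch of the manual open() and close() approach next to the with version. The temporary sample file is an assumption made only so the example runs on its own:

```python
import tempfile
from pathlib import Path

# Create a small sample file in a temporary directory so the sketch is
# self-contained (this location is an assumption for the example).
sample_path = Path(tempfile.mkdtemp()) / 'pi_digits.txt'
sample_path.write_text('3.1415926535\n8979323846\n2643383279\n')

# Manual version: we must call close() ourselves, and try/finally
# makes sure it runs even if read() raises an error.
file_object = open(sample_path)
try:
    manual_contents = file_object.read()
finally:
    file_object.close()

# Equivalent version using with: Python closes the file for us.
with open(sample_path) as file_object:
    with_contents = file_object.read()

print(manual_contents == with_contents)  # True
print(file_object.closed)                # True
```

Both versions read the same text, but the with version needs no explicit cleanup code.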
File Paths
Relative Path
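Suppose your program files are stored in a folder called python_work, and the file you want to open is in a folder called text_files inside python_work. Because text_files is inside python_work, you can use a relative file path to open a file from text_files. A relative file path tells Python to look for a location relative to the current working directory, which here is the directory where the currently running program file is stored. On Linux and macOS, you’d write: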
with open('text_files/filename.txt') as file_object:
This line tells Python to look for the desired .txt file in the folder text_files and assumes that text_files is located inside python_work (which it is). On Windows systems, you use a backslash (\) instead of a forward slash (/) in the file path. Because the backslash is also Python’s escape character (\f, for example, is a form feed), write the path as a raw string by prefixing it with r:
with open(r'text_files\filename.txt') as file_object:
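Backslashes in Windows paths need care because Python treats a backslash in an ordinary string as the start of an escape sequence. This short sketch shows what happens to the path above with and without that care:

```python
# In a normal string, \f is an escape sequence, so the backslash and
# the letter f are replaced by a single form feed character.
broken_path = 'text_files\filename.txt'

# Two ways to keep the backslash: a raw string or a doubled backslash.
raw_path = r'text_files\filename.txt'
escaped_path = 'text_files\\filename.txt'

print(len(broken_path))          # 22
print(len(raw_path))             # 23
print(raw_path == escaped_path)  # True
```

The broken path is one character shorter, and open() would not find a file by that name.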
Absolute Path
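An absolute file path gives the exact location of a file on your computer, regardless of where the running program is stored. Absolute paths are usually longer than relative paths, so it’s helpful to store them in a variable and then pass that variable to open(). On Linux and macOS, absolute paths look like this: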
file_path = '/home/ehmatthes/other_files/text_files/filename.txt'
with open(file_path) as file_object:
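On Windows, they look like this (the r prefix makes a raw string, so Python doesn’t treat sequences such as \U or \t as escape codes):
file_path = r'C:\Users\ehmatthes\other_files\text_files\filename.txt'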
with open(file_path) as file_object:
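As an alternative that the text above does not cover, the standard pathlib module can build paths from their parts so you don’t have to choose a separator yourself. A minimal sketch, reusing the folder and file names from the examples above:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Build the relative path from its parts; the / operator joins them
# with the correct separator for the system the program runs on.
relative_path = Path('text_files') / 'filename.txt'
print(relative_path)

# The pure path classes show how the same parts render on each system.
print(PurePosixPath('text_files') / 'filename.txt')    # text_files/filename.txt
print(PureWindowsPath('text_files') / 'filename.txt')  # text_files\filename.txt
```

open() accepts these path objects directly, so open(relative_path) works the same way as passing a string.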
Reading Line by Line
When you’re reading a file, you’ll often want to examine each line of the file. You might be looking for certain information in the file, or you might want to modify the text in the file in some way. For example, you might want to read through a file of weather data and work with any line that includes the word sunny in the description of that day’s weather. In a news report, you might look for any line that carries a particular tag and rewrite that line with a specific kind of formatting.
You can use a for loop on the file object to examine each line from a file one at a time:
filename = 'pi_digits.txt'
with open(filename) as file_object:
    for line in file_object:
        print(line)
We store the name of the file we’re reading from in the variable filename. This is a common convention when working with files. Because the variable filename doesn’t represent the actual file (it’s just a string telling Python where to find the file), you can easily swap out 'pi_digits.txt' for the name of another file you want to work with. After we call open(), an object representing the file and its contents is stored in the variable file_object. We again use the with syntax to let Python open and close the file properly. To examine the file’s contents, we work through each line in the file by looping over the file object.
When we print each line, we find a blank line after each one:
3.1415926535

8979323846

2643383279
These blank lines appear because an invisible newline character is at the end of each line in the text file. The print() function adds its own newline each time we call it, so we end up with two newline characters at the end of each line: one from the file and one from print(). Calling rstrip() on each line before printing it, as in print(line.rstrip()), eliminates these extra blank lines.
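The loop with rstrip() applied can be sketched as follows. The temporary copy of pi_digits.txt and the printed_lines list are assumptions added so the example is self-contained and easy to check:

```python
import tempfile
from pathlib import Path

# Write a sample pi_digits.txt to a temporary directory
# (an assumption so the sketch runs on its own).
filename = Path(tempfile.mkdtemp()) / 'pi_digits.txt'
filename.write_text('3.1415926535\n8979323846\n2643383279\n')

printed_lines = []
with open(filename) as file_object:
    for line in file_object:
        # rstrip() removes trailing whitespace, including the newline
        # read from the file, so print() adds the only newline.
        stripped = line.rstrip()
        printed_lines.append(stripped)
        print(stripped)
```

With the trailing newline removed, each line of the file appears on its own line with no blank lines in between.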
Making a List of Lines from a File
When you use with, the file object returned by open() is only available inside the with block that contains it. If you want to retain access to a file’s contents outside the with block, you can store the file’s lines in a list inside the block and then work with that list. You can process parts of the file immediately and postpone some processing for later in the program.
The following example stores the lines of pi_digits.txt in a list inside the with block and then prints the lines outside the with block:
filename = 'pi_digits.txt'
with open(filename) as file_object:
    lines = file_object.readlines()
for line in lines:
    print(line.rstrip())
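Because the list outlives the with block, you can keep working with the lines after the file has been closed. As one possible illustration (the temporary file and the name pi_string are assumptions for this sketch), the stored lines can be joined into a single string of digits:

```python
import tempfile
from pathlib import Path

# Write a sample pi_digits.txt to a temporary directory
# (an assumption so the sketch runs on its own).
filename = Path(tempfile.mkdtemp()) / 'pi_digits.txt'
filename.write_text('3.1415926535\n8979323846\n2643383279\n')

with open(filename) as file_object:
    lines = file_object.readlines()

# The file is closed here, but the list of lines is still available.
print(file_object.closed)  # True
print(len(lines))          # 3

# Join the stripped lines into one string of digits.
pi_string = ''.join(line.strip() for line in lines)
print(pi_string)  # 3.141592653589793238462643383279
```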
Reference
- Python Crash Course (2nd Edition) : A Hands-On, Project-Based Introduction to Programming
- Learning Python