More Files

Opening Files

We've been using Python's pathlib.Path objects to read file contents into a string and write strings to a file. This works, but it's a bit of a shortcut.

Let's learn about Python's built-in open function, which is what Python's pathlib module uses under the hood.

Reading Files

Let's read from the file declaration-of-independence.txt.

>>> declaration_file = open('declaration-of-independence.txt')
>>> print(declaration_file.read())
>>> declaration_file.close()

First we open the file, then we read the contents of the file and print them out, then we close the file.

Let's make a program file_stats.py that reads a given file and gives us statistics on its text.

filename = input('What is the name of the file you want to know about? ')

def print_file_stats(filename):
    stat_file = open(filename)
    contents = stat_file.read()
    stat_file.close()
    word_count = len(contents.split())
    print("Number of Words: {}".format(word_count))

print_file_stats(filename)

Let's try it out:

$ python3 file_stats.py
What is the name of the file you want to know about? declaration-of-independence.txt
Number of Words: 1338

It works!

Note

Hey, what's that def print_file_stats(filename) thing about?

That's a function definition!

Functions allow us to put a bunch of code in one block, and call it later with a single line. It's great for functionality you need to use over and over again, because it keeps you from having to repeat yourself.

Learn more about functions in the bonus section.
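
As a small illustration of defining a function once and calling it several times, here is a minimal sketch. The function name count_words and the sample sentences are made up for this example.

```python
def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())

# Define the function once, then call it as often as we need.
short_count = count_words("We hold these truths")
long_count = count_words("to be self-evident, that all men are created equal")

print(short_count)  # 4
print(long_count)   # 9
```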

Closing Files

We need to remember to always close our files. This isn't as important when reading files, but it will be very important when writing files, because data we write may not actually reach the disk until the file is closed.

This is such a common concern in Python that the open function supports a special syntax for it:

filename = input('What is the name of the file you want to know about? ')

def print_file_stats(filename):
    with open(filename) as stat_file:
        contents = stat_file.read()
    word_count = len(contents.split())
    print("Number of Words: {}".format(word_count))

print_file_stats(filename)

This with block uses a context manager. Context managers allow us to ensure that particular cleanup tasks occur whenever a block of code is exited, even if an error occurs inside it. Once our with block is exited, the stat_file file object is closed automatically, so we no longer need to call close ourselves.

Don't worry about understanding context managers fully. Just remember that from now on we will always use the with open syntax for opening files.
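
To see that cleanup happen, here is a minimal sketch that checks the closed attribute of a file object inside and after a with block. The file example.txt and its contents are made up for this illustration, and it lives in a temporary directory so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    # Create a small throwaway file so the example is self-contained.
    example_path = os.path.join(temp_dir, "example.txt")
    with open(example_path, mode="wt", encoding="utf-8") as example_file:
        example_file.write("We hold these truths\n")

    with open(example_path, mode="rt", encoding="utf-8") as example_file:
        contents = example_file.read()
        closed_inside = example_file.closed   # still open inside the block
    closed_after = example_file.closed        # closed once the block is exited

    # The file is closed even when an error happens inside the block.
    try:
        with open(example_path, mode="rt", encoding="utf-8") as example_file:
            raise ValueError("something went wrong")
    except ValueError:
        pass
    closed_after_error = example_file.closed

print(closed_inside)       # False
print(closed_after)        # True
print(closed_after_error)  # True
```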

Mode and Encoding

The last things we'll learn about for reading files are the mode and encoding arguments. Files are opened in read text mode ('rt') by default. The encoding defaults to a value that depends on your system. This is utf-8 on my machine, but it can be different on yours.

Let's make our code a little more explicit about these values:

filename = input('What is the name of the file you want to know about? ')

def print_file_stats(filename):
    with open(filename, mode='rt', encoding='utf-8') as stat_file:
        contents = stat_file.read()
    word_count = len(contents.split())
    print("Number of Words: {}".format(word_count))

print_file_stats(filename)
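
To show why being explicit about the encoding can matter, here is a minimal sketch that writes a file as utf-8 and then reads it back with two different encodings. The file name cafe.txt and the sample text are made up for this illustration, and latin-1 is just one example of a different encoding.

```python
import os
import tempfile

text = "café"

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "cafe.txt")

    with open(path, mode="wt", encoding="utf-8") as cafe_file:
        cafe_file.write(text)

    # Reading with the same encoding gives back the original text.
    with open(path, mode="rt", encoding="utf-8") as cafe_file:
        same_encoding = cafe_file.read()

    # Reading with a different encoding silently gives garbled text.
    with open(path, mode="rt", encoding="latin-1") as cafe_file:
        wrong_encoding = cafe_file.read()

print(same_encoding == text)           # True
print(wrong_encoding == text)          # False
print(len(text), len(wrong_encoding))  # 4 5
```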

Writing Files

Let's make a program that writes to a file.

To write to a file, we need to open it with a w in the mode argument. Then we can use the write() method to write to the file:

>>> with open('test.txt', mode='wt', encoding='utf-8') as test_file:
...     test_file.write("Hello world!\n")
...
13

The write method on our file object writes every character we give it to the file. It returns the number of characters it wrote, which is 13 here (the newline counts as one character).
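
Here is a minimal sketch of the same idea as a script, which also shows that opening an existing file with w replaces its old contents. It writes to test.txt inside a temporary directory, and the variable names are made up for this illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "test.txt")

    with open(path, mode="wt", encoding="utf-8") as test_file:
        first_count = test_file.write("Hello world!\n")

    # Opening with "w" again empties the file before we write to it.
    with open(path, mode="wt", encoding="utf-8") as test_file:
        second_count = test_file.write("Bye!\n")

    with open(path, mode="rt", encoding="utf-8") as test_file:
        final_contents = test_file.read()

print(first_count)           # 13
print(second_count)          # 5
print(repr(final_contents))  # 'Bye!\n'
```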

Let's pause here a second so you can all try this on your own, and we'll see what kind of problems arise. Let's do some exercises!

More File Exercises

Find TODOs

Copy and paste this into a file called todos.py:

from argparse import ArgumentParser
from pathlib import Path

parser = ArgumentParser()  # TODO Write usage information
parser.add_argument("file", type=Path)
args = parser.parse_args()

path = args.file
print("The given path is", path)

# TODO read the path into a "contents" string

# TODO break up the "contents" string into lines

# TODO loop over each line

# TODO check whether the line includes the string "TODO"

# TODO print out the line if it does!

Try out this program like this (yes we're passing the file into itself!):

$ python3 todos.py todos.py
The given path is todos.py

Now work through the remaining TODOs one at a time (don't worry about the first one) until the program prints out every line that includes the string "TODO".

Hint

You may want to use the string splitlines method.
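
As a quick sketch of what splitlines does, here it is applied to a made-up string (not the contents of todos.py), next to split("\n") for comparison:

```python
contents = "first line\nsecond line\nthird line\n"

# splitlines breaks the string at line endings and drops them.
lines = contents.splitlines()
print(lines)  # ['first line', 'second line', 'third line']

# split("\n") leaves an extra empty string after the final newline.
pieces = contents.split("\n")
print(pieces)  # ['first line', 'second line', 'third line', '']
```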

Tidier Capital Guesser

Update your program capital_guesser.py to use the file us-state-capitals.csv instead of the list of lists of states and their capitals.

Hint

Here's how you can read all lines from a file into a list:

# The with block closes the file for us once the lines are read.
with open("us-state-capitals.csv", mode="rt", encoding="utf-8") as capitals_file:
    lines = capitals_file.read().splitlines()

Contact Creator

Create a program my_contacts.py that allows the user to enter a contact's name, email address, phone number, and Twitter handle, and writes them as one line to the file contacts.csv.

Make sure that it appends to the end of the file instead of overwriting the whole file so you can call the program more than once.

Example usage:

$ python3 my_contacts.py
What is the contact's name? Brenda
What is Brenda's email address? brenda@bebrenda.com
What is Brenda's phone number? 619-867-5309
What is Brenda's Twitter handle? @bebrenda

Line added to contacts.csv:

Brenda,brenda@bebrenda.com,619-867-5309,@bebrenda

Tip

Opening a file with mode='a' allows you to append to a file, instead of writing over it.
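
Here is a minimal sketch of append mode on an unrelated, made-up file (log.txt in a temporary directory). Each time the file is opened with a, the new text is added after whatever is already there, and the file is created if it doesn't exist yet.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as temp_dir:
    path = os.path.join(temp_dir, "log.txt")

    # Open, append, and close the file twice, as if the program ran twice.
    for entry in ["first run", "second run"]:
        with open(path, mode="at", encoding="utf-8") as log_file:
            log_file.write(entry + "\n")

    with open(path, mode="rt", encoding="utf-8") as log_file:
        log_lines = log_file.read().splitlines()

print(log_lines)  # ['first run', 'second run']
```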

Nice Troll

Make a program nice_troll.py that asks for a file, reads it, and replaces any adjectives that appear in an angry_words list with a random adjective from a nice_words list.

Note

We've been asking for file names as input thus far, but you can also do this with sys.argv.

If you're ready for a new challenge, feel free to read the Command Line Arguments section in the bonus material.
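
As a rough sketch of the sys.argv approach: sys.argv is a list whose first item is the program name, followed by any arguments typed after it on the command line. The helper name filename_from_args is made up for this illustration.

```python
import sys


def filename_from_args(argv):
    """Return the first command line argument, or None if there isn't one."""
    if len(argv) > 1:
        return argv[1]
    return None


# sys.argv[0] is the program name, so the file name would be sys.argv[1].
filename = filename_from_args(sys.argv)
print(filename)
```

Running a program like this as python3 nice_troll.py comments.txt (where comments.txt is a hypothetical file name) would print comments.txt.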

ASCIIbetical Contacts

Add to your program my_contacts.py so it sorts the contacts in contacts.csv ASCIIbetically after each new contact is added.

Tip

The built-in sorted() function will be of use here.
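
To show what ASCIIbetical order looks like, here is a minimal sketch using a made-up list of names. By default sorted() compares strings character by character, so all uppercase letters come before all lowercase ones.

```python
names = ["brenda", "Zed", "alice", "Bob"]

# Default string sorting is ASCIIbetical: uppercase sorts before lowercase.
asciibetical = sorted(names)
print(asciibetical)  # ['Bob', 'Zed', 'alice', 'brenda']

# sorted() returns a new list and leaves the original unchanged.
print(names)  # ['brenda', 'Zed', 'alice', 'Bob']
```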