Use the built-in open function to open files for reading. For the regular expressions, the official Python documentation for the re module is a great reference.
For the babynames.py exercise, write an extract_names(filename) function which takes the filename
of a baby1990.html file and returns the data from the file as a single list -- the year string at the start of
the list followed by the name-rank strings in alphabetical order, e.g.
['2006', 'Aaliyah 91', 'Aaron 57', 'Abagail 895', ...].
Modify main() so it calls your extract_names function and prints what it returns
(main already has the code for the command line argument parsing). If you get stuck working out
the regular expressions for the year and each name, solution regular expression patterns are shown at the end
of this document. Note that for parsing webpages in general, regular expressions don't do a good job, but
these webpages have a simple and consistent format.
If no filenames are given on the command line, main prints usage text and calls sys.exit(0). Here are some suggested
milestones:
- Build the [year, 'name rank', ... ] list and print it.
- Fix main so that it calls extract_names.
- Have main call extract_names for each command line argument and print a text summary.

To make the list into a reasonable looking summary text, here's a clever use of join:
text = '\n'.join(mylist) + '\n'
The summary for one file looks like this:

2006
Aaliyah 91
Aaron 57
Abagail 895
Abbey 695
Abbie 650
...
When the --summaryfile flag is present, do the following: for each input file foo.html, create a new file
foo.html.summary that contains the summary text for that file.
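As a minimal sketch (assuming the extract_names function described above; the helper name write_summary is hypothetical):

def write_summary(filename):
    # Sketch: build the summary text and write it to filename + '.summary'.
    names = extract_names(filename)
    summary_text = '\n'.join(names) + '\n'
    with open(filename + '.summary', 'w') as f:
        f.write(summary_text)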
Once the --summaryfile feature is working, run the program on all the files using *
like this:
python babynames.py --summaryfile baby*.html
(The shell expands the baby*.html pattern into a list of matching filenames, and then the shell runs babynames.py,
passing in all those filenames in the sys.argv list.)
$ grep 'Trinity ' *.summary
$ grep 'Nick ' *.summary
$ grep 'Miguel ' *.summary
$ grep 'Emily ' *.summary
year:
r'Popularity\sin\s(\d\d\d\d)'
names:
r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>'
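For reference, a minimal sketch of one way to apply those patterns inside extract_names (the dict used to collect names is just one reasonable approach):

import re

def extract_names(filename):
    # Sketch: read the whole file, pull out the year and the (rank, boy, girl) rows.
    with open(filename) as f:
        text = f.read()
    year = re.search(r'Popularity\sin\s(\d\d\d\d)', text).group(1)
    names_to_rank = {}
    for rank, boy, girl in re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text):
        for name in (boy, girl):
            if name not in names_to_rank:   # keep the first (best) rank seen
                names_to_rank[name] = rank
    return [year] + sorted(name + ' ' + rank for name, rank in names_to_rank.items())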
os
The official Python documentation for the os module is a great reference. Some highlights:
- os.listdir
- os.mkdir
- os.makedirs
- os.remove
- os.rename
- os.rmdir
- os.walk

os.path
Check out the official Python documentation for the os.path module. Some highlights:
- os.path.join - very useful for uniform handling of paths on Linux (where directories are separated by /) and Windows (where directories are separated by \)
- os.path.abspath
- os.path.basename
- os.path.dirname
- os.path.exists
- os.path.isfile
- os.path.isdir

shutil
The shutil module most notably has the shutil.copy function for copying files
from one location to another. Check out the official Python documentation for the
shutil module.
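As a small illustration of these highlights (the directory names here are hypothetical):

import os
import shutil

src_dir = 'day2'                      # hypothetical directory names
dest_dir = '/tmp/backup'
os.makedirs(dest_dir, exist_ok=True)  # create dest_dir (and parents) if needed
for name in os.listdir(src_dir):      # bare filenames, not full paths
    path = os.path.abspath(os.path.join(src_dir, name))
    if os.path.isfile(path):
        shutil.copy(path, dest_dir)   # copy the file into dest_dir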
zipfile
Python also has a zipfile module that can be used to work with zip files. Check out the official
Python documentation for the zipfile module. Using this module, it's easy to create zip files like so:
from zipfile import ZipFile

with ZipFile('spam.zip', 'w') as myzip:
    myzip.write('eggs.txt')
This is simpler than relying on an external zip program that can be called from your script.
The copyspecial exercise below gives you practice with the os, os.path, shutil, and zipfile modules.
urllib
The urllib module can be used to make requests and parse URLs (among other things). Check out
the official Python documentation for the urllib module. Some highlights:
- urllib.request.urlretrieve
- urllib.parse.urlparse

The logpuzzle exercise below gives you practice with the urllib module, and is also a chance to try the requests package in Python.
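A quick illustration of those two highlights, using one of the puzzle URLs from the logpuzzle exercise below:

import urllib.parse
import urllib.request

url = 'http://code.google.com/something/puzzle-animal-baaa.jpg'
parts = urllib.parse.urlparse(url)
print(parts.netloc)   # code.google.com
print(parts.path)     # /something/puzzle-animal-baaa.jpg
urllib.request.urlretrieve(url, 'img0')   # download the URL into a local file 'img0' (requires a reachable URL)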
venv
The venv module can be used to create a Python virtual environment. Check out the
official Python documentation for the venv module. In the logpuzzle exercise you'll create a
virtual environment and install the requests package within that virtual environment.
subprocess
Python also has the subprocess module, which can be used to run another program
from your script. Python has come a long way over the years and now has the
subprocess.run
function, which is a very convenient way to run external programs in Python.
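For example, a minimal use of subprocess.run that captures the output of an external command as text:

import subprocess

result = subprocess.run(['git', '--version'], capture_output=True, text=True)
print(result.returncode)   # 0 on success
print(result.stdout)       # e.g. 'git version 2.39.2'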
argparse
The argparse module is used in a lot of Python scripts! Check out the official Python
documentation for the
argparse module.
In the gitconfig exercise below, note how using the argparse module differs
from the direct use of sys.argv (which can be found in main functions throughout this
class).
In the copyspecial.py exercise, we'll say that a file is "special" if its name contains the pattern __w__ somewhere, where the w is one
or more word chars. The provided main includes code to parse the command line arguments, but the
rest is up to you. Write functions to implement the features below and modify main to call your
functions.
get_special_paths(dir: str) -> list[str] - returns a list of the absolute paths of the
special files in the given directory
copy_to(paths: list[str], dir: str) -> None - given a list of paths, copies those files
into the given directory
zip_to(paths: list[str], zippath: str) -> None - given a list of paths, zip those files up into the
given zipfile
Gather a list of the absolute paths of the special files in all the directories given on the command line (the . after the command is a single argument indicating the current
directory). Print one absolute path per line.
$ python copyspecial.py .
/Users/miller-time/pycourse/day2/xyz__hello__.txt
/Users/miller-time/pycourse/day2/zz__something__.jpg
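A minimal sketch of get_special_paths, assuming the __w__ pattern described above:

import os
import re

def get_special_paths(dir: str) -> list[str]:
    # Sketch: absolute paths of files in dir whose name contains __w__ .
    paths = []
    for name in os.listdir(dir):
        if re.search(r'__(\w+)__', name):
            paths.append(os.path.abspath(os.path.join(dir, name)))
    return paths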
If the --todir dir option is present, do not print anything and instead copy the files to the
given directory, creating it if necessary. Use the Python shutil module for file copying.
$ python copyspecial.py --todir /tmp/fooby .
$ ls /tmp/fooby
xyz__hello__.txt    zz__something__.jpg
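A sketch of copy_to, creating the destination directory first if needed:

import os
import shutil

def copy_to(paths: list[str], dir: str) -> None:
    # Sketch: copy each file into dir, creating dir if it doesn't exist yet.
    os.makedirs(dir, exist_ok=True)
    for path in paths:
        shutil.copy(path, dir)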
If the --tozip zipfile option is present, create the zip file using the Python
zipfile module.
$ python copyspecial.py --tozip tmp.zip .
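A sketch of zip_to; passing arcname=os.path.basename(path) stores each file under its bare filename rather than its full absolute path:

import os
import zipfile

def zip_to(paths: list[str], zippath: str) -> None:
    # Sketch: write each file into the zip archive at zippath.
    with zipfile.ZipFile(zippath, 'w') as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))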
In the logpuzzle.py exercise, you'll read an Apache web server log file and use the puzzle URLs inside it to re-create an image.
The filename of each log file tells you which server it came from: for example, the animal_code.google.com file is the log from the code.google.com server
(formally, we'll say that the server name is whatever follows the first underbar). The
animal_code.google.com log file contains the data for the "animal" puzzle image. Although the data in
the log files has the syntax of a real Apache web server, the data beyond what's needed for the puzzle is
randomized data from a real log file. Here is one line from the log file:
10.254.254.28 - - [06/Aug/2007:00:14:08 -0700] "GET /foo/talks/ HTTP/1.1" 200 5910 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"
Each request line in the log contains a string of the form GET path HTTP showing the path of a web request received by the server. The path itself never
contains spaces, and is separated from the GET and HTTP by spaces (regex suggestion:
\S (upper case S) matches any non-space char). Find the lines in the log where the string
"puzzle" appears inside the path, ignoring the many other lines in the log.
Write a read_urls function that extracts the puzzle URLs from inside a log file. Find all
the "puzzle" path URLs in the log file. Combine the path from each URL with the server name from the filename
to form a full URL, e.g. http://www.example.com/path/puzzle/from/inside/file. Screen out duplicates, so each
full URL appears only once. The read_urls function should return the list of full URLs, sorted into
alphabetical order and without duplicates. Taking the URLs in alphabetical order will yield the image slices
in the correct left-to-right order to re-create the original image. In the simplest case, main
should just print the URLs, one per line.
$ python logpuzzle.py animal_code.google.com
http://code.google.com/something/puzzle-animal-baaa.jpg
http://code.google.com/something/puzzle-animal-baab.jpg
...
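A minimal sketch of read_urls under the rules above (server name taken from the filename after the first underbar, duplicates removed with a set):

import os
import re

def read_urls(filename):
    # Sketch: extract unique, sorted puzzle URLs from the given log file.
    server = os.path.basename(filename).split('_', 1)[1]
    with open(filename) as f:
        text = f.read()
    paths = re.findall(r'GET (\S+) HTTP', text)
    urls = {'http://' + server + path for path in paths if 'puzzle' in path}
    return sorted(urls)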
Write a download_images function which takes a sorted list of URLs and a directory. Download
the image from each URL into the given directory, creating the directory first if necessary (see the
os module to create a directory, and urllib.request.urlretrieve for downloading a
url). Name the local image files with a simple scheme like "img0", "img1", "img2", and so on. You may wish to
print a little "Retrieving..." status output line while downloading each image since it can be slow and it's
nice to have some indication that the program is working. Each image is a little vertical slice from the
original. How to put the slices together to re-create the original? It can be solved nicely with a little html
(knowledge of HTML is not required).
Your download_images function should also create an index.html file in the directory with
an img tag to show each local image file. The img tags should all be on one line
together without separation. In this way, the browser displays all the slices together seamlessly. You do not
need knowledge of HTML to do this; just create an index.html file that looks like this:
<html>
<body>
<img src="/edu/python/exercises/img0"><img src="/edu/python/exercises/img1"><img src="/edu/python/exercises/img2">...
</body>
</html>
$ python logpuzzle.py --todir animaldir animal_code.google.com
$ ls animaldir
img0  img1  img2  img3  img4  img5  img6  img7  img8  img9  index.html
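A sketch of download_images along those lines (img tags written on a single line, local names img0, img1, ...):

import os
import urllib.request

def download_images(img_urls, dest_dir):
    # Sketch: download each URL into dest_dir/imgN and write an index.html.
    os.makedirs(dest_dir, exist_ok=True)
    img_tags = []
    for i, url in enumerate(img_urls):
        local_name = 'img%d' % i
        print('Retrieving...', url)
        urllib.request.urlretrieve(url, os.path.join(dest_dir, local_name))
        img_tags.append('<img src="%s">' % local_name)
    with open(os.path.join(dest_dir, 'index.html'), 'w') as f:
        f.write('<html>\n<body>\n' + ''.join(img_tags) + '\n</body>\n</html>\n')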
For the second puzzle, the URLs need a custom sort: if a URL ends with the pattern -wordchars-wordchars.jpg, e.g.
http://example.com/foo/puzzle/bar-abab-baaa.jpg, then the sort should use the second word (e.g.
"baaa") for that URL. So sorting a list of URLs each ending with the word-word.jpg pattern should
order the URLs by the second word.
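One way to express that ordering is a key function passed to sorted(); this sketch assumes every URL in the list ends with the -word-word.jpg pattern (the name url_sort_key is hypothetical):

import re

def url_sort_key(url):
    # Sketch: sort by the second word of a trailing -word-word.jpg, if present.
    match = re.search(r'-(\w+)-(\w+)\.jpg$', url)
    if match:
        return match.group(2)
    return url   # fall back to the whole URL

# e.g. inside read_urls:  return sorted(urls, key=url_sort_key)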
The official Python documentation for urllib.request
recommends using the Requests package
for a higher-level HTTP client interface. This is a great opportunity to practice using third-party libraries
in your scripts. First, create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
(use .venv\Scripts\activate on Windows)
Create a requirements.txt file with the following line:

requests==2.27

This pins version 2.27 of this package. A newer version
of the package might be available in the future -- feel free to experiment with newer versions if you'd like.
Then install the requirements into your virtual environment:
pip install -r requirements.txt
import requests
Modify your download_images function to use the requests package instead of urllib.request.urlretrieve. The requests
package supports downloading a file from a given URL with the following approach:
r = requests.get(url, stream=True)
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
$ python logpuzzle.py --todir animaldir animal_code.google.com
$ python logpuzzle.py --todir placedir place_code.google.com
In the gitconfig.py exercise, we'll practice using the subprocess module and the argparse module.
$ git config --global user.name me
$ git config --global user.email my@email.biz
Your ArgumentParser will need an argument for specifying that you just want to print a listing.
parser.add_argument('--list', action='store_true', help='list existing git configs')
With action='store_true', the value of the parsed args.list will now either be
True or False depending on whether --list was passed to your script. In
addition to simplifying the parsing of arguments, argparse also automatically provides great help
text that can be accessed with -h or --help.
$ python exercises/gitconfig/solution/gitconfig.py -h
usage: gitconfig.py [-h] [--list]

options:
  -h, --help  show this help message and exit
  --list      list existing git configs
When --list is passed, use the subprocess.run function to run the following command:
git config --global --list
$ python gitconfig.py --list
user.email=my@email.biz
user.name=me
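A sketch of the listing path, capturing the output of the git command and printing it (the function name list_configs is hypothetical):

import subprocess

def list_configs():
    # Sketch: run 'git config --global --list' and print whatever it outputs.
    result = subprocess.run(['git', 'config', '--global', '--list'],
                            capture_output=True, text=True)
    print(result.stdout, end='')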
To support setting config values, your ArgumentParser will need some more arguments.
parser.add_argument('--set', help='config key to set')
parser.add_argument('--value', help='config value to set')
After parsing, args.set will be the key and
args.value will be the value. Now you can construct a command like so:
git config --global {key} {value}
$ python gitconfig.py --set push.default --value current
$ python gitconfig.py --set merge.conflictstyle --value diff3
$ python gitconfig.py --set rebase.autostash --value true
$ python gitconfig.py --set rebase.autosquash --value true
$ python gitconfig.py --set rebase.missingcommitscheck --value error
$ python gitconfig.py --set color.ui --value true
$ python gitconfig.py --list
user.email=my@email.biz
user.name=me
push.default=current
merge.conflictstyle=diff3
rebase.autostash=true
rebase.autosquash=true
rebase.missingcommitscheck=error
color.ui=true
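And a sketch of the setting path, passing the key and value to git config as separate arguments (the function name set_config is hypothetical):

import subprocess

def set_config(key, value):
    # Sketch: run 'git config --global <key> <value>'; check=True raises on failure.
    subprocess.run(['git', 'config', '--global', key, value], check=True)

# e.g. set_config(args.set, args.value)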