Use the built-in open function to open files for reading. For the regular expressions, the official Python documentation for the re module is a great reference.
For the babynames.py exercise, write an extract_names(filename) function which takes the filename
of a baby1990.html file and returns the data from the file as a single list -- the year string at the start of
the list followed by the name-rank strings in alphabetical order, e.g.
['2006', 'Aaliyah 91', 'Aaron 57', 'Abagail 895', ...].
Modify main() so it calls your extract_names function and prints what it returns
(main already has the code for the command line argument parsing). If you get stuck working out
the regular expressions for the year and each name, solution regular expression patterns are shown at the end
of this document. Note that for parsing webpages in general, regular expressions don't do a good job, but
these webpages have a simple and consistent format.
If no filenames are given on the command line, main prints usage text and calls sys.exit(0). Here are some suggested
milestones:
- Build the [year, 'name rank', ... ] list and print it.
- Fix main so that it calls extract_names.
- Have main call extract_names for each command line argument and print a text summary.

To make the list into a reasonable looking summary text, here's a clever use of join:
text = '\n'.join(mylist) + '\n'
The summary for one file looks like this:

2006
Aaliyah 91
Aaron 57
Abagail 895
Abbey 695
Abbie 650
...
When the --summaryfile flag is present, do the following: for each input file foo.html, create a new file
foo.html.summary that contains the summary text for that file.
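As a minimal sketch (assuming the extract_names function described above; the helper name write_summary is hypothetical):

def write_summary(filename):
    # Sketch: build the summary text and write it to filename + '.summary'.
    names = extract_names(filename)
    summary_text = '\n'.join(names) + '\n'
    with open(filename + '.summary', 'w') as f:
        f.write(summary_text)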
Once the --summaryfile feature is working, run the program on all the files using *
like this:
python babynames.py --summaryfile baby*.html
(The shell expands the baby*.html pattern into a list of matching filenames, and then the shell runs babynames.py,
passing in all those filenames in the sys.argv list.)
$ grep 'Trinity ' *.summary
$ grep 'Nick ' *.summary
$ grep 'Miguel ' *.summary
$ grep 'Emily ' *.summary
year:
r'Popularity\sin\s(\d\d\d\d)'
names:
r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>'
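For reference, a minimal sketch of one way to apply those patterns inside extract_names (the dict used to collect names is just one reasonable approach):

import re

def extract_names(filename):
    # Sketch: read the whole file, pull out the year and the (rank, boy, girl) rows.
    with open(filename) as f:
        text = f.read()
    year = re.search(r'Popularity\sin\s(\d\d\d\d)', text).group(1)
    names_to_rank = {}
    for rank, boy, girl in re.findall(r'<td>(\d+)</td><td>(\w+)</td><td>(\w+)</td>', text):
        for name in (boy, girl):
            if name not in names_to_rank:   # keep the first (best) rank seen
                names_to_rank[name] = rank
    return [year] + sorted(name + ' ' + rank for name, rank in names_to_rank.items())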
os
The official Python documentation for the os module is a great reference. Some highlights:
- os.listdir
- os.mkdir
- os.makedirs
- os.remove
- os.rename
- os.rmdir
- os.walk

os.path
Check out the official Python documentation for the os.path module. Some highlights:
- os.path.join - very useful for uniform handling of paths on Linux (where directories are separated by /) and Windows (where directories are separated by \)
- os.path.abspath
- os.path.basename
- os.path.dirname
- os.path.exists
- os.path.isfile
- os.path.isdir

shutil
The shutil module most notably has the shutil.copy function for copying files
from one location to another. Check out the official Python documentation for the
shutil module.
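As a small illustration of these highlights (the directory names here are hypothetical):

import os
import shutil

src_dir = 'day2'                      # hypothetical directory names
dest_dir = '/tmp/backup'
os.makedirs(dest_dir, exist_ok=True)  # create dest_dir (and parents) if needed
for name in os.listdir(src_dir):      # bare filenames, not full paths
    path = os.path.abspath(os.path.join(src_dir, name))
    if os.path.isfile(path):
        shutil.copy(path, dest_dir)   # copy the file into dest_dir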
zipfile
Python also has a zipfile module that can be used to work with zip files. Check out the official
Python documentation for the zipfile module. Using this module, it's easy to create zip files like so:
from zipfile import ZipFile

with ZipFile('spam.zip', 'w') as myzip:
    myzip.write('eggs.txt')
This is simpler than relying on an external zip program that can be called from your script.
The copyspecial exercise below gives you practice with the os, os.path, shutil, and zipfile modules.
urllib
The urllib module can be used to make requests and parse URLs (among other things). Check out
the official Python documentation for the urllib module. Some highlights:
- urllib.request.urlretrieve
- urllib.parse.urlparse

The logpuzzle exercise below gives you practice with the urllib module, and is also a chance to try the requests package in Python.
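A quick illustration of those two highlights, using one of the puzzle URLs from the logpuzzle exercise below:

import urllib.parse
import urllib.request

url = 'http://code.google.com/something/puzzle-animal-baaa.jpg'
parts = urllib.parse.urlparse(url)
print(parts.netloc)   # code.google.com
print(parts.path)     # /something/puzzle-animal-baaa.jpg
urllib.request.urlretrieve(url, 'img0')   # download the URL into a local file 'img0' (requires a reachable URL)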
venv
The venv module can be used to create a Python virtual environment. Check out the
official Python documentation for the venv module. In the logpuzzle exercise you'll create a
virtual environment and install the requests package within that virtual environment.
subprocess
Python also has the subprocess module, which can be used to run another program
from your script. Python has come a long way over the years and now has the
subprocess.run
function, which is a very convenient way to run external programs in Python.
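For example, a minimal use of subprocess.run that captures the output of an external command as text:

import subprocess

result = subprocess.run(['git', '--version'], capture_output=True, text=True)
print(result.returncode)   # 0 on success
print(result.stdout)       # e.g. 'git version 2.39.2'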
argparse
The argparse module is used in a lot of Python scripts! Check out the official Python
documentation for the
argparse module.
In the gitconfig exercise below, note how using the argparse module differs
from the direct use of sys.argv (which can be found in main functions throughout this
class).
In the copyspecial.py exercise, we'll say that a file is "special" if its name contains the pattern __w__ somewhere, where the w is one
or more word chars. The provided main includes code to parse the command line arguments, but the
rest is up to you. Write functions to implement the features below and modify main to call your
functions.
get_special_paths(dir: str) -> list[str] - returns a list of the absolute paths of the
special files in the given directory
copy_to(paths: list[str], dir: str) -> None - given a list of paths, copies those files
into the given directory
zip_to(paths: list[str], zippath: str) -> None - given a list of paths, zip those files up into the
given zipfile
Gather a list of the absolute paths of the special files in all the directories given on the command line (the . after the command is a single argument indicating the current
directory). Print one absolute path per line.
$ python copyspecial.py .
/Users/miller-time/pycourse/day2/xyz__hello__.txt
/Users/miller-time/pycourse/day2/zz__something__.jpg
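A minimal sketch of get_special_paths, assuming the __w__ pattern described above:

import os
import re

def get_special_paths(dir: str) -> list[str]:
    # Sketch: absolute paths of files in dir whose name contains __w__ .
    paths = []
    for name in os.listdir(dir):
        if re.search(r'__(\w+)__', name):
            paths.append(os.path.abspath(os.path.join(dir, name)))
    return paths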
If the --todir dir option is present, do not print anything and instead copy the files to the
given directory, creating it if necessary. Use the Python shutil module for file copying.
$ python copyspecial.py --todir /tmp/fooby .
$ ls /tmp/fooby
xyz__hello__.txt    zz__something__.jpg
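A sketch of copy_to, creating the destination directory first if needed:

import os
import shutil

def copy_to(paths: list[str], dir: str) -> None:
    # Sketch: copy each file into dir, creating dir if it doesn't exist yet.
    os.makedirs(dir, exist_ok=True)
    for path in paths:
        shutil.copy(path, dir)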
If the --tozip zipfile option is present, create the zip file using the Python
zipfile module.
$ python copyspecial.py --tozip tmp.zip .
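A sketch of zip_to; passing arcname=os.path.basename(path) stores each file under its bare filename rather than its full absolute path:

import os
import zipfile

def zip_to(paths: list[str], zippath: str) -> None:
    # Sketch: write each file into the zip archive at zippath.
    with zipfile.ZipFile(zippath, 'w') as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))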
In the logpuzzle.py exercise, you'll read an Apache web server log file and use the puzzle URLs inside it to re-create an image.
The filename of each log file tells you which server it came from: for example, the animal_code.google.com file is the log from the code.google.com server
(formally, we'll say that the server name is whatever follows the first underbar). The
animal_code.google.com log file contains the data for the "animal" puzzle image. Although the data in
the log files has the syntax of a real Apache web server, the data beyond what's needed for the puzzle is
randomized data from a real log file. Here is one line from the log file:
10.254.254.28 - - [06/Aug/2007:00:14:08 -0700] "GET /foo/talks/ HTTP/1.1" 200 5910 "-" "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-US; rv:1.8.1.4) Gecko/20070515 Firefox/2.0.0.4"
Each request line in the log contains a string of the form GET path HTTP showing the path of a web request received by the server. The path itself never
contains spaces, and is separated from the GET and HTTP by spaces (regex suggestion:
\S (upper case S) matches any non-space char). Find the lines in the log where the string
"puzzle" appears inside the path, ignoring the many other lines in the log.
Write a read_urls function that extracts the puzzle URLs from inside a log file. Find all
the "puzzle" path URLs in the log file. Combine the path from each URL with the server name from the filename
to form a full URL, e.g. http://www.example.com/path/puzzle/from/inside/file. Screen out duplicates, so each
full URL appears only once. The read_urls function should return the list of full URLs, sorted into
alphabetical order and without duplicates. Taking the URLs in alphabetical order will yield the image slices
in the correct left-to-right order to re-create the original image. In the simplest case, main
should just print the URLs, one per line.
$ python logpuzzle.py animal_code.google.com
http://code.google.com/something/puzzle-animal-baaa.jpg
http://code.google.com/something/puzzle-animal-baab.jpg
...
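A minimal sketch of read_urls under the rules above (server name taken from the filename after the first underbar, duplicates removed with a set):

import os
import re

def read_urls(filename):
    # Sketch: extract unique, sorted puzzle URLs from the given log file.
    server = os.path.basename(filename).split('_', 1)[1]
    with open(filename) as f:
        text = f.read()
    paths = re.findall(r'GET (\S+) HTTP', text)
    urls = {'http://' + server + path for path in paths if 'puzzle' in path}
    return sorted(urls)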
Write a download_images function which takes a sorted list of URLs and a directory. Download
the image from each URL into the given directory, creating the directory first if necessary (see the
os module to create a directory, and urllib.request.urlretrieve for downloading a
url). Name the local image files with a simple scheme like "img0", "img1", "img2", and so on. You may wish to
print a little "Retrieving..." status output line while downloading each image since it can be slow and it's
nice to have some indication that the program is working. Each image is a little vertical slice from the
original. How to put the slices together to re-create the original? It can be solved nicely with a little html
(knowledge of HTML is not required).
Your download_images function should also create an index.html file in the directory with
an img tag to show each local image file. The img tags should all be on one line
together without separation. In this way, the browser displays all the slices together seamlessly. You do not
need knowledge of HTML to do this; just create an index.html file that looks like this:
<html>
<body>
<img src="/edu/python/exercises/img0"><img src="/edu/python/exercises/img1"><img src="/edu/python/exercises/img2">...
</body>
</html>
$ python logpuzzle.py --todir animaldir animal_code.google.com
$ ls animaldir
img0  img1  img2  img3  img4  img5  img6  img7  img8  img9  index.html
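A sketch of download_images along those lines (img tags written on a single line, local names img0, img1, ...):

import os
import urllib.request

def download_images(img_urls, dest_dir):
    # Sketch: download each URL into dest_dir/imgN and write an index.html.
    os.makedirs(dest_dir, exist_ok=True)
    img_tags = []
    for i, url in enumerate(img_urls):
        local_name = 'img%d' % i
        print('Retrieving...', url)
        urllib.request.urlretrieve(url, os.path.join(dest_dir, local_name))
        img_tags.append('<img src="%s">' % local_name)
    with open(os.path.join(dest_dir, 'index.html'), 'w') as f:
        f.write('<html>\n<body>\n' + ''.join(img_tags) + '\n</body>\n</html>\n')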
For the second puzzle, the URLs need a custom sort: if a URL ends with the pattern -wordchars-wordchars.jpg, e.g.
http://example.com/foo/puzzle/bar-abab-baaa.jpg, then the sort should use the second word (e.g.
"baaa") for that URL. So sorting a list of URLs each ending with the word-word.jpg pattern should
order the URLs by the second word.
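One way to express that ordering is a key function passed to sorted(); this sketch assumes every URL in the list ends with the -word-word.jpg pattern (the name url_sort_key is hypothetical):

import re

def url_sort_key(url):
    # Sketch: sort by the second word of a trailing -word-word.jpg, if present.
    match = re.search(r'-(\w+)-(\w+)\.jpg$', url)
    if match:
        return match.group(2)
    return url   # fall back to the whole URL

# e.g. inside read_urls:  return sorted(urls, key=url_sort_key)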
The official Python documentation for urllib.request
recommends using the Requests package
for a higher-level HTTP client interface. This is a great opportunity to practice using third-party libraries
in your scripts. First, create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate
(use .venv\Scripts\activate on Windows)
Create a requirements.txt file with the following line:

requests==2.27

This pins version 2.27 of this package. A newer version
of the package might be available in the future -- feel free to experiment with newer versions if you'd like.
Then install the requirements into your virtual environment:
pip install -r requirements.txt
import requests
Modify your download_images function to use the requests package instead of urllib.request.urlretrieve. The requests
package supports downloading a file from a given URL with the following approach:
r = requests.get(url, stream=True)
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)
$ python logpuzzle.py --todir animaldir animal_code.google.com
$ python logpuzzle.py --todir placedir place_code.google.com
In the gitconfig.py exercise, we'll practice using the subprocess module and the argparse module.
$ git config --global user.name me
$ git config --global user.email my@email.biz
Your ArgumentParser will need an argument for specifying that you just want to print a listing.
parser.add_argument('--list', action='store_true', help='list existing git configs')
With action='store_true', the value of the parsed args.list will now either be
True or False depending on whether --list was passed to your script. In
addition to simplifying the parsing of arguments, argparse also automatically provides great help
text that can be accessed with -h or --help.
$ python exercises/gitconfig/solution/gitconfig.py -h
usage: gitconfig.py [-h] [--list]

options:
  -h, --help  show this help message and exit
  --list      list existing git configs
When --list is passed, use the subprocess.run function to run the following command:
git config --global --list
$ python gitconfig.py --list
user.email=my@email.biz
user.name=me
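A sketch of the listing path, capturing the output of the git command and printing it (the function name list_configs is hypothetical):

import subprocess

def list_configs():
    # Sketch: run 'git config --global --list' and print whatever it outputs.
    result = subprocess.run(['git', 'config', '--global', '--list'],
                            capture_output=True, text=True)
    print(result.stdout, end='')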
To support setting config values, your ArgumentParser will need some more arguments.
parser.add_argument('--set', help='config key to set')
parser.add_argument('--value', help='config value to set')
After parsing, args.set will be the key and
args.value will be the value. Now you can construct a command like so:
git config --global {key} {value}
$ python gitconfig.py --set push.default --value current
$ python gitconfig.py --set merge.conflictstyle --value diff3
$ python gitconfig.py --set rebase.autostash --value true
$ python gitconfig.py --set rebase.autosquash --value true
$ python gitconfig.py --set rebase.missingcommitscheck --value error
$ python gitconfig.py --set color.ui --value true
$ python gitconfig.py --list
user.email=my@email.biz
user.name=me
push.default=current
merge.conflictstyle=diff3
rebase.autostash=true
rebase.autosquash=true
rebase.missingcommitscheck=error
color.ui=true
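And a sketch of the setting path, passing the key and value to git config as separate arguments (the function name set_config is hypothetical):

import subprocess

def set_config(key, value):
    # Sketch: run 'git config --global <key> <value>'; check=True raises on failure.
    subprocess.run(['git', 'config', '--global', key, value], check=True)

# e.g. set_config(args.set, args.value)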