Dictionaries - Exercises Discussion

Before you try out the programs, download the files:

mbox.txt : https://www.py4e.com/code3/mbox.txt
mbox-short.txt : https://www.py4e.com/code3/mbox-short.txt


Know-how about files and lists


All the programs require you to read from a file.
fhand = open('file.txt') returns a file handle that you can use to read the file's contents (files are opened for reading by default).
We usually put the open() call in a try... except block so that the program prints a friendly message instead of a traceback if the file is not found or can't be opened.
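
A minimal sketch of that pattern. The helper name open_or_none and the missing file name are made up for illustration; the except clause catches OSError, which is the family of exceptions open() raises when a file is missing or unreadable.

```python
def open_or_none(fname):
    """Return a file handle, or None if the file cannot be opened."""
    try:
        return open(fname)
    except OSError:
        # Covers FileNotFoundError, PermissionError and similar failures.
        print("Invalid file name:", fname)
        return None


# Hypothetical file name that is assumed not to exist.
fhand = open_or_none('no-such-file-for-this-demo.txt')
print(fhand)
```

Returning None instead of calling exit() is only a choice made here so the snippet can be tried interactively; the exercise programs further down simply exit.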

Reading files can be done in several ways; we learnt two:
1. Read line by line (recommended), since the program won't run out of memory even if the file is larger than the main memory.
2. Read all the file contents at once.
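
The two approaches can be compared on a small throwaway file. This is only a sketch: the file name read_demo.txt and its three lines are invented so the example can run on its own.

```python
import os
import tempfile

# Write a small throwaway file so the example runs on its own.
path = os.path.join(tempfile.gettempdir(), 'read_demo.txt')  # hypothetical name
out = open(path, 'w')
out.write('first line\nsecond line\nthird line\n')
out.close()

# 1. Line by line: the file is consumed a line at a time,
#    so it never has to fit in memory as a whole.
line_count = 0
fhand = open(path)
for line in fhand:
    line_count = line_count + 1
fhand.close()

# 2. All at once: read() returns the whole file as one string.
fhand = open(path)
contents = fhand.read()
fhand.close()

print(line_count)            # 3
print(contents.count('\n'))  # 3

os.remove(path)
```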

To read line by line from the file handle object fhand, you would do:
for line in fhand:
    print(line)

A line usually ends with a newline character, '\n'.
For usual text processing, we don't need it.
rstrip() removes the newline, along with any other trailing whitespace.

for line in fhand:
    line = line.rstrip()
    print(line)
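
A quick way to see what rstrip() does is to look at the end of a line with repr(). The sketch below uses the sample line from the exercises with a newline added by hand.

```python
line = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n'

print(repr(line[-5:]))      # '2008\n'
stripped = line.rstrip()
print(repr(stripped[-4:]))  # '2008'

# With no argument, rstrip() removes all trailing whitespace, not only '\n'.
print(repr('abc \t\n'.rstrip()))  # 'abc'
```

Without rstrip(), print(line) shows a blank line after every line, because print() adds its own newline on top of the one already in the string.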

If you're trying to filter out lines, you can test each line with a condition,
e.g. if line.startswith("From"):
or
if 'uct.ac.za' in line:
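
Both filters can be tried on a few lines without opening a file. The list below is illustrative data in the style of mbox-short.txt (the Subject line is invented), not an excerpt from it.

```python
# Illustrative lines in the style of mbox-short.txt.
lines = [
    'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008',
    'Subject: a made-up subject line',
    'From: david.horwitz@uct.ac.za',
    'From: louis@media.berkeley.edu',
]

starts_with_from = []
mentions_uct = []
for line in lines:
    if line.startswith('From'):
        starts_with_from.append(line)
    if 'uct.ac.za' in line:
        mentions_uct.append(line)

print(len(starts_with_from))  # 3
print(len(mentions_uct))      # 2
```

Note that 'From' matches both the "From " lines and the "From:" lines. To pick only one kind, include the trailing space or the colon in the string you test for.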

To extract a particular string from a line, there are again several ways, but what we learnt in the lecture was to split the line into words and access the word by its index.

So let's say that in the following line, you wanted to extract the month, 'Jan':
stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
You would do:
words = line.split()
month = words[2]

By default, split() separates a line at runs of whitespace (spaces, tabs, newlines) and returns a list of the words.
It also takes an optional argument, a delimiter, and then splits at that delimiter instead.

For example:
If we extracted the time, '09:14:16', from the above line
t = words[4]
Now, if you wanted only the hour, '09', you could extract it further by

tl = t.split(':')
hour = tl[0]
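
Putting the two steps together on the sample line gives the sketch below. The last part is an added illustration of how split() with no argument differs from split(' ').

```python
line = 'stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'

words = line.split()
print(words)
# ['stephen.marquard@uct.ac.za', 'Sat', 'Jan', '5', '09:14:16', '2008']

month = words[2]
hour = words[4].split(':')[0]
print(month, hour)  # Jan 09

# With the leading 'From' word, as in the mbox files, every index moves up by one.
full = ('From ' + line).split()
print(full[3], full[5].split(':')[0])  # Jan 09

# split() collapses runs of whitespace; split(' ') does not.
print('Jan  5'.split())     # ['Jan', '5']
print('Jan  5'.split(' '))  # ['Jan', '', '5']
```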

Dictionaries
- how to create a dictionary
d = dict()
or
d = {}

- how to add items to a dictionary
d['r'] = 'red'

- check if a key is present in the dictionary
'r' in d

- how to print a dictionary
print(d)
or
for key in d:
   print(key, d[key])
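
The operations above, run together as a short sketch. The second key 'g' and the replacement value 'rose' are made up for illustration.

```python
d = dict()
d['r'] = 'red'
d['g'] = 'green'

print('r' in d)    # True
print('red' in d)  # False, because `in` checks keys, not values

d['r'] = 'rose'    # assigning to an existing key replaces its value

for key in d:
    print(key, d[key])
```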

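You also need to know these methods:
keys() returns the keys of the dictionary
get(key, defaultValue) returns the value for key, or defaultValue if the key is not present

d.keys() returns a view of the keys (wrap it in list() if you need an actual list)
d.get('r', 'n/a')
returns the value if the key 'r' is present in the dictionary, or 'n/a' if the key is not found. It does not print anything by itself.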
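
A short sketch of both methods, followed by the counting idiom built on get() that all the exercise programs rely on. The list of days is made up for illustration.

```python
d = {'r': 'red', 'g': 'green'}

print(d.get('r', 'n/a'))  # red
print(d.get('b', 'n/a'))  # n/a
print(list(d.keys()))     # ['r', 'g']

# Counting idiom: get() supplies 0 the first time a key is seen.
counts = {}
for day in ['Sat', 'Fri', 'Fri', 'Thu', 'Fri']:
    counts[day] = counts.get(day, 0) + 1
print(counts)  # {'Sat': 1, 'Fri': 3, 'Thu': 1}
```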


Write a program to look for lines that start with “From”, then look for the third word and keep a running count of each of the days of the week. At the end of the program print out the contents of your dictionary (order does not matter).
Sample Line:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Sample Execution:
python dow.py
Enter a file name: mbox-short.txt
{'Fri': 20, 'Thu': 6, 'Sat': 1}

Program:
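fname = input("Enter a file name:")
try:
    fhand = open(fname)
except:
    print("Invalid file name")
    exit()
# dday maps each day of the week to the number of times it was seen
dday = dict()
for line in fhand:
    line = line.rstrip()
    # "From " (with the space) skips the "From:" header lines
    if line.startswith("From "):
        words = line.split()
        d = words[2]    # third word is the day, e.g. 'Sat'
        dday[d] = dday.get(d, 0) + 1
print(dday)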

Output:

python ex2.py
Enter a file name:mbox-short.txt
{'Sat': 1, 'Fri': 20, 'Thu': 6}
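
The program tests for "From " with a trailing space because the "From:" header lines contain only two words, so words[2] would raise an IndexError on them. The sketch below wraps the same counting logic in a function so it can be tried on a list of lines. The name count_days, the length guard, and the two Friday timestamps are additions made here for illustration.

```python
def count_days(lines):
    """Count the day-of-week word on every line that starts with 'From '."""
    counts = dict()
    for line in lines:
        if not line.startswith('From '):
            continue
        words = line.split()
        if len(words) < 3:
            continue  # defensive guard, not part of the original program
        day = words[2]
        counts[day] = counts.get(day, 0) + 1
    return counts


# Illustrative lines; the second one is a "From:" header and is skipped.
sample = [
    'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n',
    'From: stephen.marquard@uct.ac.za\n',
    'From louis@media.berkeley.edu Fri Jan 4 18:10:48 2008\n',
    'From zqian@umich.edu Fri Jan 4 16:10:39 2008\n',
]
print(count_days(sample))  # {'Sat': 1, 'Fri': 2}
```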

Write a program to read through a mail log, build a histogram using a dictionary to count how many messages have come from each email address, and print the dictionary. (Look for lines that start with “From:”.)
Enter file name: mbox-short.txt
{'gopal.ramasammycook@gmail.com': 1, 'louis@media.berkeley.edu': 3,
'cwen@iupui.edu': 5, 'antranig@caret.cam.ac.uk': 1,
'rjlowe@iupui.edu': 2, 'gsilver@umich.edu': 3,
'david.horwitz@uct.ac.za': 4, 'wagnermr@iupui.edu': 1,
'zqian@umich.edu': 4, 'stephen.marquard@uct.ac.za': 2,
'ray@media.berkeley.edu': 1}

Program:

fname = input("Enter a file name:")
try:
    fhand = open(fname)
except:
    print("Invalid file name")
    exit()
email = dict()
for line in fhand:
    line = line.rstrip()
    if line.startswith("From:"):
        words = line.split()
        em = words[1]
        email[em] = email.get(em, 0) + 1
print(email)

Output:
python ex3_email.py
Enter a file name:mbox-short.txt
{'stephen.marquard@uct.ac.za': 2, 'louis@media.berkeley.edu': 3, 'zqian@umich.edu': 4, 'rjlowe@iupui.edu': 2, 'cwen@iupui.edu': 5, 'gsilver@umich.edu': 3, 'wagnermr@iupui.edu': 1, 'antranig@caret.cam.ac.uk': 1, 'gopal.ramasammycook@gmail.com': 1, 'david.horwitz@uct.ac.za': 4, 'ray@media.berkeley.edu': 1}

Add code to the above program to figure out who has the most messages in the file. After all the data has been read and the dictionary has been created, look through the dictionary using a maximum loop (see Chapter 5: Maximum and minimum loops) to find who has the most messages and print how many messages the person has.
Enter a file name: mbox-short.txt
cwen@iupui.edu 5
Enter a file name: mbox.txt
zqian@umich.edu 195

Program:
fname = input("Enter a file name:")
try:
    fhand = open(fname)
except:
    print("Invalid file name")
    exit()
email = dict()
for line in fhand:
    line = line.rstrip()
    if line.startswith("From:"):
        words = line.split()
        em = words[1]
        email[em] = email.get(em, 0) + 1
# Maximum loop: remember the largest count seen so far and who it belongs to
max_address = None
max_count = 0
for key in email:
    if email[key] > max_count:
        max_count = email[key]
        max_address = key
print(max_address, max_count)

Output:
python ex3_email.py
Enter a file name:mbox-short.txt
cwen@iupui.edu 5
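
For comparison, here is the maximum loop next to Python's built-in max(), which is not what the exercise asks for but gives the same answer. The dictionary is a four-entry subset of the histogram printed earlier, and the variable names top, top_count and best are chosen here for illustration.

```python
email = {
    'stephen.marquard@uct.ac.za': 2,
    'louis@media.berkeley.edu': 3,
    'zqian@umich.edu': 4,
    'cwen@iupui.edu': 5,
}

# Maximum loop, as the exercise asks for.
top = None
top_count = 0
for key in email:
    if email[key] > top_count:
        top_count = email[key]
        top = key
print(top, top_count)  # cwen@iupui.edu 5

# Built-in alternative: compare the keys by their counts.
best = max(email, key=email.get)
print(best, email[best])  # cwen@iupui.edu 5
```

One difference to keep in mind: on an empty dictionary the loop leaves top as None, while max() raises a ValueError.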

Write a program that records the domain name each message was sent from, instead of the full email address of the sender. At the end of the program, print out the contents of your dictionary.
python schoolcount.py
Enter a file name: mbox-short.txt
{'media.berkeley.edu': 4, 'uct.ac.za': 6, 'umich.edu': 7,
'gmail.com': 1, 'caret.cam.ac.uk': 1, 'iupui.edu': 8}

Program:

fname = input("Enter a file name:")
try:
    fhand = open(fname)
except:
    print("Invalid file name")
    exit()
domain = dict()
for line in fhand:
    line = line.rstrip()
    if line.startswith("From:"):
        words = line.split()
        # 'name@domain' splits into ['name', 'domain']
        em = words[1].split('@')
        dom = em[1]
        domain[dom] = domain.get(dom, 0) + 1
print(domain)

Output:
python dom.py
Enter a file name:mbox-short.txt
{'uct.ac.za': 6, 'media.berkeley.edu': 4, 'umich.edu': 7, 'iupui.edu': 8, 'caret.cam.ac.uk': 1, 'gmail.com': 1}
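
As a cross-check, the same domain counts can be derived from the per-address histogram printed by the earlier exercise, by splitting each address at '@' and adding up the counts. This sketch hard-codes that histogram instead of reading the file.

```python
# Per-address counts, copied from the output of the email histogram program.
email = {
    'stephen.marquard@uct.ac.za': 2, 'louis@media.berkeley.edu': 3,
    'zqian@umich.edu': 4, 'rjlowe@iupui.edu': 2, 'cwen@iupui.edu': 5,
    'gsilver@umich.edu': 3, 'wagnermr@iupui.edu': 1,
    'antranig@caret.cam.ac.uk': 1, 'gopal.ramasammycook@gmail.com': 1,
    'david.horwitz@uct.ac.za': 4, 'ray@media.berkeley.edu': 1,
}

domain = dict()
for address in email:
    dom = address.split('@')[1]  # the part after the '@'
    domain[dom] = domain.get(dom, 0) + email[address]

print(domain)
```

The totals agree with the output above: for example, uct.ac.za gets 2 + 4 = 6 and iupui.edu gets 2 + 5 + 1 = 8.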
