Tuples with Dictionary - Exercises

Some helpful hints are given for solving each problem.

This post will be updated with the solutions after the assignment is submitted.

Part A : Tuples with Dictionary (Class Assignment) 

Question 1

Open a file to read. Download mbox.txt and mbox-short.txt. 
Read and parse the “From” lines and pull out the addresses from the line.
Count the number of messages from each person using a dictionary. 
After all the data has been read, print the person with the most commits by creating a list of (count, email) tuples from the dictionary. 
Then sort the list in reverse order and print out the person who has the most commits.

Sample Line: From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008 

Enter a file name: mbox-short.txt
cwen@iupui.edu 5
Enter a file name: mbox.txt 
zqian@umich.edu 195


Please note that there are two types of From lines in the sample file:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
From: stephen.marquard@uct.ac.za

We are interested in the first format which starts with From without the ':'

line.startswith('From ')


After splitting the line on whitespace, the email address is the 2nd element, which can be accessed with index 1.
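
As a quick check of the two hints above, here is a minimal sketch that applies them to the sample line. The helper name extract_sender is made up for illustration and is not part of the assignment.

```python
def extract_sender(line):
    """Return the address from a 'From ' line, or None for any other line."""
    # 'From ' (with the trailing space) skips the 'From:' header lines
    if not line.startswith('From '):
        return None
    words = line.split()
    # guard against a malformed line that has no second field
    if len(words) < 2:
        return None
    return words[1]

sample = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
print(extract_sender(sample))                              # the address
print(extract_sender('From: stephen.marquard@uct.ac.za'))  # None, wrong line type
```

The length check is an extra precaution; the program in this post indexes words[1] directly.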


Program:

fname = input('Enter a file name:')

try:
    fhand = open(fname)
except OSError:
    # a missing file or a permission problem ends up here
    print('Invalid file')
    exit()
    
#Sample line From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
d={}
for line in fhand:
    line = line.rstrip()
    if line.startswith('From '):
        words = line.split()
        email = words[1]
        
        d[email] = d.get(email,0) +1

#decorate, create a list of tuples(count,email)
lst=[]
for email,count in d.items():
    lst.append( (count,email) )

#sort
lst.sort(reverse=True)

#undecorate

m_count, m_email = lst[0]

print(m_email,m_count)
        

Output:

python ex1.py

Enter a file name:mbox.txt
zqian@umich.edu 195

python ex1.py

Enter a file name:mbox-short.txt
cwen@iupui.edu 5
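
The decorate, sort, undecorate steps used in the program can also be tried on a small dictionary. The sketch below uses invented addresses and counts (they do not come from mbox.txt) and shows what happens when two senders tie.

```python
# Hypothetical counts, invented purely for illustration
counts = {'a@example.com': 2, 'b@example.com': 5, 'c@example.com': 5}

# decorate: put the count first so that it drives the sort
pairs = [(count, email) for email, count in counts.items()]

# sort: tuples compare element by element, so a tie on count
# is broken by comparing the email strings
pairs.sort(reverse=True)

# undecorate: unpack the first (largest) tuple
top_count, top_email = pairs[0]
print(top_email, top_count)

# a possible shortcut: max() with a key function;
# on a tie it keeps the first key in dictionary order instead
print(max(counts, key=counts.get))
```

With a tie, the two approaches can name different senders, so it is worth deciding which tie-breaking rule you want.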

Question 2

Open a file to read. Download mbox.txt and mbox-short.txt.  
This program counts the distribution of the hour of the day for each of the messages.
You can pull the hour from the “From” line by finding the time string and then splitting that string into parts using the colon character.
Once you have accumulated the counts for each hour, print out the counts, one per line, sorted by hour as shown below.

python timeofday.py
Enter a file name: mbox-short.txt
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1

Explanation:

Please note that there are two types of From lines in the sample file:
From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
From: stephen.marquard@uct.ac.za

We are interested in the first format which starts with From without the ':'

line.startswith('From ')

After splitting the line on whitespace, the time is the 6th element, which can be accessed with index 5.

Split the time again using ':' and take the first part to extract the hour.
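
The two splits can be checked on the sample line with a short sketch. The helper name extract_hour is made up for illustration.

```python
def extract_hour(line):
    """Return the hour, as a two-character string, from a 'From ' line."""
    words = line.split()
    time = words[5]              # e.g. '09:14:16'
    return time.split(':')[0]    # the part before the first colon

sample = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
hour = extract_hour(sample)
print(hour)
```

The hour stays a zero-padded string such as '04' or '10', so a plain string sort already puts the hours in numeric order and no conversion to int is needed.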


Program:


fname = input('Enter a file name:')

try:
    fhand = open(fname)
except OSError:
    # a missing file or a permission problem ends up here
    print('Invalid file')
    exit()
    
#Sample line From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
d={}
for line in fhand:
    line = line.rstrip()
    if line.startswith('From '):
        words = line.split()
        time = words[5]
        hour = time.split(':')[0]
        
        d[hour] = d.get(hour,0) +1

#decorate, create a list of tuples(hour,count)
lst=[]
for hour,count in d.items():
    lst.append( (hour,count) )

#sort
lst.sort()

#undecorate

for hour,count in lst:
    print(hour,count)


Output:


python ex2.py
Enter a file name:mbox-short.txt
04 3
06 1
07 1
09 2
10 3
11 6
14 1
15 2
16 4
17 2
18 1
19 1

Question 3

Write a program that reads a file and prints the letters in decreasing order of frequency. 
Your program should convert all the input to lower case and only count the letters a-z.
Your program should not count spaces, digits, punctuation, or anything other than the letters a-z.
Use lower() to convert the text to lower case.
Check against https://en.wikipedia.org/wiki/Letter_frequency

Pretty interesting stuff on letter frequencies in various languages and dialects, their impact on the design of the keyboard, etc.

Filter out empty lines, newline characters at the end of a line, and punctuation before starting.
Use isalpha() to check whether a character is a letter.
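
One thing to watch: isalpha() also returns True for letters outside a-z, such as 'é'. If the input may contain such characters and only a-z should be counted, one option is to test against string.ascii_lowercase instead. The sketch below assumes that stricter rule; the helper name count_letters is made up for illustration.

```python
import string

def count_letters(text):
    """Count only the letters a-z in text, ignoring case."""
    counts = {}
    for ch in text.lower():
        # string.ascii_lowercase is 'abcdefghijklmnopqrstuvwxyz'
        if ch in string.ascii_lowercase:
            counts[ch] = counts.get(ch, 0) + 1
    return counts

print(count_letters('Hello, World! 123'))
print('é'.isalpha())   # True, although it is not in a-z
```

For a file that contains only plain English text, both checks give the same counts.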


Program:

import string
fname = input('Enter a file name:')

try:
    fhand = open(fname)
except OSError:
    # a missing file or a permission problem ends up here
    print('Invalid file')
    exit()
    
d={}
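for line in fhand:
    line = line.rstrip()
    line = line.lower()
    # strip punctuation; str.maketrans builds the deletion table
    line = line.translate(str.maketrans('','',string.punctuation))
    for c in line:
        # isalpha() also skips digits and spaces
        if c.isalpha():
            d[c] = d.get(c,0) + 1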
    
all_ch = d.values()
total_ch=sum(all_ch)

#decorate, create a list of tuples(frequency, ch)
lst=[]
for ch,count in d.items():
    frequency = (count/total_ch)*100
    lst.append ( (frequency, ch ) )

#sort
lst.sort(reverse=True)

#undecorate

for frequency,ch in lst:
    print('%s %0.2f' % (ch, frequency) )


Output:

Enter a file name:mbox.txt
e 9.60
a 8.44
i 7.52
o 7.34
t 7.19
r 6.70
s 6.56
c 5.66
u 5.05
n 4.53
m 4.23
p 4.00
d 3.52
l 3.44
h 2.47
k 2.07
b 2.03
v 1.78
f 1.76
g 1.63
j 1.22
y 1.14
w 1.11
x 0.81
q 0.10
z 0.09

