Exercises 3
Find the most common words
Download a text file, such as Alice in Wonderland from Project Gutenberg, count the number of times each word appears, and print the 50 most common words.
Take into account that the text of Alice in Wonderland only starts after the line “CHAPTER I.” and ends at the line “THE END”.
A solution using a dictionary to keep the counts.
A solution using the Counter class from the collections module.
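Both approaches can be sketched as follows. This is a minimal sketch, not a reference solution: the function names are made up for illustration, a "word" is simply taken to be a run of the letters a to z, and the markers are matched against whole lines, which may need adjusting to the layout of the edition you download. The functions accept any iterable of lines, so an open file works as well as the small sample list used here.

```python
import re
from collections import Counter

START_MARKER = 'CHAPTER I.'
END_MARKER = 'THE END'


def iter_body_lines(lines):
    """Yield only the lines found between the start and end markers."""
    in_body = False
    for line in lines:
        stripped = line.strip()
        if not in_body:
            if stripped == START_MARKER:
                in_body = True
            continue
        if stripped == END_MARKER:
            break
        yield line


def split_words(line):
    # Simplification: a word is any run of the letters a-z, ignoring case
    return re.findall(r'[a-z]+', line.lower())


def count_words_with_dict(lines):
    """Count the words keeping the counts in a plain dictionary."""
    counts = {}
    for line in iter_body_lines(lines):
        for word in split_words(line):
            counts[word] = counts.get(word, 0) + 1
    return counts


def count_words_with_counter(lines):
    """Count the words using collections.Counter."""
    counts = Counter()
    for line in iter_body_lines(lines):
        counts.update(split_words(line))
    return counts


# Made-up sample lines standing in for the downloaded file
sample = ['Some preface', 'CHAPTER I.', 'Alice was beginning',
          'alice was tired', 'THE END', 'Some license text']

# With a dictionary the items have to be sorted by count by hand
dict_counts = count_words_with_dict(sample)
most_common = sorted(dict_counts.items(), key=lambda item: item[1], reverse=True)
for word, count in most_common[:50]:
    print(word, count)

# Counter already provides most_common()
for word, count in count_words_with_counter(sample).most_common(50):
    print(word, count)
```

To run it on the real text, pass the open file instead of the sample list, for instance with open('alice.txt') as fhand: count_words_with_counter(fhand), where alice.txt is a hypothetical name for the downloaded file.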
Guess the number game
Write a program that chooses a random number between 1 and 10 and asks the user to guess it, allowing several attempts. The program should finish once the user guesses correctly or once the available attempts have run out.
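One possible shape for this game is sketched below. The function name, the default of three attempts, the "too low" and "too high" hints, and the ask and secret parameters are all assumptions added for illustration; the last two only exist so that the game can be exercised without a keyboard.

```python
import random


def play_guessing_game(max_attempts=3, ask=input, secret=None):
    """Return True if the player guesses the secret number in time."""
    if secret is None:
        secret = random.randint(1, 10)  # randint includes both ends
    for attempts_left in range(max_attempts, 0, -1):
        prompt = f'Guess a number between 1 and 10 ({attempts_left} attempts left): '
        answer = ask(prompt)
        try:
            guess = int(answer)
        except ValueError:
            # In this sketch an invalid answer still uses up an attempt
            print('That is not a whole number.')
            continue
        if guess == secret:
            print('Correct!')
            return True
        print('Too low.' if guess < secret else 'Too high.')
    print('No attempts left. The number was', secret)
    return False
```

Calling play_guessing_game() with no arguments starts an interactive game that reads the guesses with input.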
Write a Hangman game
import random

DICTIONARY = ['hola', 'caracola', 'casa', 'barco']
INITIAL_NUM_ATTEMPTS = 10


def create_revealed_string(secret_word, guessed_letters):
    # Show the letters already guessed and hide the rest with dashes
    revealed_string = ''
    for letter in secret_word:
        if letter.upper() in guessed_letters:
            revealed_string += letter
        else:
            revealed_string += '-'
    return revealed_string


def play_game():
    secret_word = random.choice(DICTIONARY)
    guessed_letters = set()
    num_attempts = INITIAL_NUM_ATTEMPTS
    won = None
    while True:
        guessed_letter = input('Pick a letter ')
        guessed_letters.add(guessed_letter.upper())
        revealed_string = create_revealed_string(secret_word, guessed_letters)
        # Every guess, right or wrong, uses up one attempt
        num_attempts -= 1
        num_letters_to_guess = revealed_string.count('-')
        if not num_letters_to_guess:
            won = True
            break
        if not num_attempts:
            won = False
            break
        print('The guess so far: ', revealed_string, guessed_letters)
    if won:
        print('Congratulations, you have won the game')
    else:
        print('You have lost')
        print('The secret word was: ', secret_word)


play_game()
Read a fasta file
In bioinformatics we use fasta files to store DNA sequences. This is an example of a fasta file with three sequences.
>seq1
CGCTAGCTAGTCTATCGATCTAGTCTAGCT
>seq2 some description after the space
TGTCGATCGTAGTCATCTGATCGACGTATCTA
CTCGAGTCATGCTATCATCATGCTAG
>seq3
TCAGTCGATGCTATCATCGTAGCTGATCGATCTGGCA
CTAGCAGTCGATC
Write a program that reads the sequences found in a fasta file and stores them in a list of dictionaries, one dictionary per sequence.
Now, create functions capable of calculating the percentage of GCs in the sequences and their lengths.
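A minimal sketch of such a reader and of the two calculations is shown below. The function names and the dictionary keys ('name', 'description' and 'seq') are choices made for this example, and the sketch assumes that every header line starts with '>' and that a sequence may span several lines, as in the example file above.

```python
def read_fasta(lines):
    """Parse FASTA lines into a list of dictionaries, one per sequence."""
    sequences = []
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # The name ends at the first whitespace, the rest is the description
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ''
            description = parts[1] if len(parts) > 1 else ''
            current = {'name': name, 'description': description, 'seq': ''}
            sequences.append(current)
        elif current is not None:
            # Sequence lines are joined until the next header appears
            current['seq'] += line
    return sequences


def calc_length(sequence):
    """Return the number of letters in the sequence."""
    return len(sequence['seq'])


def calc_gc_percent(sequence):
    """Return the percentage of G and C letters in the sequence."""
    seq = sequence['seq'].upper()
    if not seq:
        return 0.0
    return 100 * (seq.count('G') + seq.count('C')) / len(seq)


# Made-up lines for a quick check, an open file can be passed instead
example = ['>seq1', 'GGCC', '>seq2 some description after the space', 'AT', 'GC']
for sequence in read_fasta(example):
    print(sequence['name'], calc_length(sequence), calc_gc_percent(sequence))
```

Because read_fasta only iterates over lines, it can be given an open file directly, for instance with open('sequences.fasta') as fhand: read_fasta(fhand), where the file name is hypothetical.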
Gene expression
We have carried out a study in which some patients were given a drug and others were not (the placebo group). We have measured the expression of three genes, and now we want to know the mean expression of each gene in the treated and placebo groups. We have also recorded the sex of each patient, and we want the mean expression of each gene for males and for females, as well as for the combined effect of sex and treatment: male-treated, male-placebo, female-treated, and female-placebo.
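The exercise does not fix a data format, so the sketch below assumes one: a list of dictionaries, one per patient, with the hypothetical keys 'sex', 'group', 'gene1', 'gene2' and 'gene3'. The numbers are invented only to make the example runnable and are not real measurements. A single grouping function is enough to cover the three questions, since they only differ in the keys used to build the group label.

```python
from collections import defaultdict
from statistics import mean

GENES = ['gene1', 'gene2', 'gene3']  # hypothetical gene names

# Invented records for illustration only, not real measurements
PATIENTS = [
    {'sex': 'male', 'group': 'treated', 'gene1': 10.0, 'gene2': 5.0, 'gene3': 1.0},
    {'sex': 'male', 'group': 'placebo', 'gene1': 6.0, 'gene2': 5.0, 'gene3': 3.0},
    {'sex': 'female', 'group': 'treated', 'gene1': 12.0, 'gene2': 7.0, 'gene3': 1.0},
    {'sex': 'female', 'group': 'placebo', 'gene1': 4.0, 'gene2': 3.0, 'gene3': 3.0},
]


def mean_expression(patients, keys):
    """Return {label: {gene: mean}}, grouping the patients by the given keys."""
    values = defaultdict(lambda: defaultdict(list))
    for patient in patients:
        # With keys ['sex', 'group'] the label looks like 'male-treated'
        label = '-'.join(patient[key] for key in keys)
        for gene in GENES:
            values[label][gene].append(patient[gene])
    return {label: {gene: mean(vals) for gene, vals in genes.items()}
            for label, genes in values.items()}


by_group = mean_expression(PATIENTS, ['group'])
by_sex = mean_expression(PATIENTS, ['sex'])
by_sex_and_group = mean_expression(PATIENTS, ['sex', 'group'])

for label, means in by_sex_and_group.items():
    print(label, means)
```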