Exercises 3 – Introduction to programming with Python

Find the most common words

Download a text file, such as Alice in Wonderland from Project Gutenberg, count the number of times each word appears, and print the 50 most common words.

Take into account that the actual text of Alice in Wonderland only starts after the line “CHAPTER I.” and ends with the line “THE END”. Everything before and after those lines should be left out of the count.
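One possible approach, shown here only as a sketch: read the file line by line, ignore everything outside the two marker lines, and let collections.Counter do the counting. The function name count_words, its parameters, and the small sample text are made up for this illustration. The sketch also assumes that each marker sits alone on its line and that a word is a run of letters, optionally with one apostrophe inside it.

```python
import re
from collections import Counter


def count_words(lines, start_marker='CHAPTER I.', end_marker='THE END'):
    """Count the words found between the start and end marker lines."""
    counts = Counter()
    in_body = False
    for line in lines:
        stripped = line.strip()
        if not in_body:
            # Skip everything up to and including the start marker
            if stripped == start_marker:
                in_body = True
            continue
        if stripped == end_marker:
            break
        # Lower-case the line so that 'The' and 'the' count as one word
        counts.update(re.findall(r"[a-z]+(?:'[a-z]+)?", stripped.lower()))
    return counts


# Tiny made-up text standing in for the downloaded book
sample = [
    'Project Gutenberg header text',
    'CHAPTER I.',
    'The cat saw the dog.',
    'The dog ran.',
    'THE END',
    'License text the the the',
]
word_counts = count_words(sample)
print(word_counts.most_common(3))
```

With the real book, the same function could be given an open file instead of the sample list, and most_common(50) would return the 50 most common words. If the downloaded edition also lists “CHAPTER I.” in a table of contents, the start condition would need adjusting.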

Guess the number game

Write a program that chooses a random number between 1 and 10 and gives the user several attempts to guess it. The program should finish once the user guesses correctly or once the available attempts have run out.
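A minimal sketch of such a game follows. The function name play_guessing_game and its parameters (max_attempts, read_guess, secret) are hypothetical. They are there so the game can be run with scripted answers; by default it would read from the keyboard with input and pick the number with random.randint. The "Too low" and "Too high" hints are an addition that the exercise does not ask for.

```python
import random


def play_guessing_game(max_attempts=3, read_guess=input, secret=None):
    """Return True if the number is guessed before the attempts run out."""
    if secret is None:
        secret = random.randint(1, 10)  # both 1 and 10 are possible
    for attempts_left in range(max_attempts, 0, -1):
        answer = read_guess(f'Guess the number ({attempts_left} attempts left): ')
        try:
            guess = int(answer)
        except ValueError:
            print('Please type a whole number')
            continue  # an invalid entry still uses up an attempt
        if guess == secret:
            print('Correct!')
            return True
        print('Too low' if guess < secret else 'Too high')
    print('No attempts left, the number was', secret)
    return False


# Scripted answers instead of keyboard input, so the example runs unattended
answers = iter(['5', '8', '7'])
result = play_guessing_game(max_attempts=3,
                            read_guess=lambda prompt: next(answers),
                            secret=7)
print(result)
```

Calling play_guessing_game() with no arguments would play the interactive version.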

Write a Hangman game

Tip

import random

DICTIONARY = ['hola', 'caracola', 'casa', 'barco']
INITIAL_NUM_ATTEMPTS = 10


def create_revealed_string(secret_word, guessed_letters):
    # Show the letters guessed so far and hide the rest with '-'
    revealed_string = ''
    for letter in secret_word:
        if letter.upper() in guessed_letters:
            revealed_string += letter
        else:
            revealed_string += '-'
    return revealed_string


def play_game():
    secret_word = random.choice(DICTIONARY)
    guessed_letters = set()  # stored in upper case
    num_attempts = INITIAL_NUM_ATTEMPTS
    won = False
    while num_attempts > 0:
        guessed_letter = input('Pick a letter ')
        guessed_letters.add(guessed_letter.upper())
        # Every guess, right or wrong, uses up one attempt
        num_attempts -= 1

        revealed_string = create_revealed_string(secret_word, guessed_letters)
        if '-' not in revealed_string:
            # No hidden letters left: the word has been guessed
            won = True
            break
        if num_attempts:
            print('The guess so far: ', revealed_string, guessed_letters)

    if won:
        print('Congratulations, you have won the game')
    else:
        print('You have lost')
    print('The secret word was: ', secret_word)


play_game()

Read a fasta file

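In bioinformatics we use FASTA files to store DNA sequences. Each sequence starts with a header line that begins with “>”, followed by one or more lines of sequence. This is an example of a FASTA file with three sequences.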

>seq1
CGCTAGCTAGTCTATCGATCTAGTCTAGCT
>seq2 some description after the space
TGTCGATCGTAGTCATCTGATCGACGTATCTA
CTCGAGTCATGCTATCATCATGCTAG
>seq3
TCAGTCGATGCTATCATCGTAGCTGATCGATCTGGCA
CTAGCAGTCGATC

Write a program that reads the sequences found in a FASTA file and stores them in a list of dictionaries, one dictionary per sequence.

Now, create functions that calculate the length of each sequence and its GC content, that is, the percentage of its bases that are G or C.
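The following sketch shows one way to approach both parts, using the example sequences above. The function names (read_fasta, seq_length, gc_percent) and the dictionary keys ('name', 'description', 'seq') are choices made for this illustration, not something the exercise prescribes. It assumes that the sequence name is the header text up to the first space and that a sequence may span several lines.

```python
def read_fasta(lines):
    """Return a list of dicts with the keys 'name', 'description' and 'seq'."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # Header: the name is everything up to the first space
            name, _, description = line[1:].partition(' ')
            records.append({'name': name,
                            'description': description,
                            'seq': ''})
        elif records:
            # Sequence lines are appended to the most recent record
            records[-1]['seq'] += line
    return records


def seq_length(record):
    """Number of bases in the sequence."""
    return len(record['seq'])


def gc_percent(record):
    """Percentage of bases that are G or C (0.0 for an empty sequence)."""
    seq = record['seq'].upper()
    if not seq:
        return 0.0
    return 100 * (seq.count('G') + seq.count('C')) / len(seq)


fasta_lines = """>seq1
CGCTAGCTAGTCTATCGATCTAGTCTAGCT
>seq2 some description after the space
TGTCGATCGTAGTCATCTGATCGACGTATCTA
CTCGAGTCATGCTATCATCATGCTAG
>seq3
TCAGTCGATGCTATCATCGTAGCTGATCGATCTGGCA
CTAGCAGTCGATC
""".splitlines()

records = read_fasta(fasta_lines)
for record in records:
    print(record['name'], seq_length(record), round(gc_percent(record), 1))
```

To read from disk, read_fasta could be given an open file object instead of the list of lines, since both yield one line at a time.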

Gene expression

We have done a study in which some patients were given a drug and the others (the placebo group) were not. We have measured the expression of three genes, and now we want to know the mean expression of each gene in the treated and placebo groups. We have also recorded the sex of each patient, and we want the mean expression of each gene for males and for females, as well as for each combination of sex and treatment: male-treated, male-placebo, female-treated, and female-placebo.
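The exercise does not supply any data, so the sketch below uses a handful of made-up patients purely to illustrate the grouping. The record layout (the keys 'sex', 'treatment' and 'expression'), the gene names, and the function mean_expression are all assumptions of this example. The idea is to collect the expression values under a key made of the chosen grouping fields plus the gene, and then average each list.

```python
from collections import defaultdict
from statistics import mean

# Made-up measurements, only to illustrate the grouping
patients = [
    {'sex': 'male', 'treatment': 'treated',
     'expression': {'gene1': 10.0, 'gene2': 4.0, 'gene3': 7.0}},
    {'sex': 'male', 'treatment': 'placebo',
     'expression': {'gene1': 6.0, 'gene2': 4.0, 'gene3': 5.0}},
    {'sex': 'female', 'treatment': 'treated',
     'expression': {'gene1': 12.0, 'gene2': 2.0, 'gene3': 7.0}},
    {'sex': 'female', 'treatment': 'placebo',
     'expression': {'gene1': 8.0, 'gene2': 6.0, 'gene3': 3.0}},
    {'sex': 'female', 'treatment': 'treated',
     'expression': {'gene1': 14.0, 'gene2': 4.0, 'gene3': 9.0}},
]


def mean_expression(patients, keys):
    """Mean expression per gene for every combination of the given keys."""
    values = defaultdict(list)
    for patient in patients:
        # For example ('female', 'treated') when keys is ['sex', 'treatment']
        group = tuple(patient[key] for key in keys)
        for gene, level in patient['expression'].items():
            values[(group, gene)].append(level)
    return {key: mean(levels) for key, levels in values.items()}


by_treatment = mean_expression(patients, ['treatment'])
by_sex = mean_expression(patients, ['sex'])
by_sex_and_treatment = mean_expression(patients, ['sex', 'treatment'])

for (group, gene), value in sorted(by_sex_and_treatment.items()):
    print('-'.join(group), gene, value)
```

Passing the grouping fields as a list lets the same function answer all three questions: by treatment, by sex, and by both combined.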