Exercises-2

Assignment - Similarity of Sequences

Write a program in an editor that calculates the distance between two sequences.

seq1 = "ACGT"
seq2 = "AGGT"

A simple program (without functions or modules) is sufficient.
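
As a starting point, here is a minimal sketch of such a program. It assumes that "distance" means the number of positions at which the two sequences differ (the Hamming distance); the exercise text does not define the measure, so treat this as one possible reading.

```python
seq1 = "ACGT"
seq2 = "AGGT"

# Assumption: the distance is the number of positions that differ.
distance = 0
for a, b in zip(seq1, seq2):
    if a != b:
        distance += 1

print("Distance between", seq1, "and", seq2, "is", distance)
```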

  1. Calculate the distance between seq1 and seq2 and print out the result.
    Then change your program so that it reads two aligned sequences from the command line. The following sequences are already aligned, so the distance between them can be calculated directly. Test your program with them:
a) ACGT and A-GT
b) AC-GT and AGT--
c) AC-CGT and AGT---
d) ACCGT and TGCCA
e) GATT-ACA and TACCATAC
f) --GA--TT--AC-A and TA--CC--AT--CA
  2. Extend the program so that the aligned sequences are printed in addition to their distance.
  3. Extend the program so that the distance between two sequences is calculated only when both sequences have the same length. Test your program with the following input sequences:
a) ACGT and AGT
b) ACCGT and TGCCA
  4. Extend the program so that the second sequence is inverted and assigned to a third sequence. Read the first and second sequences from the command line. Calculate the distance between the first and the second sequence and between the first and the third sequence.

Compare the two distances and print the alignment with the smaller distance. If the distances are equal, print the alignment of the first and second sequences.

Test your program with the following sequences:

a) ACGT and A-GT
b) AC-GT and AGT--
c) ACCGT and TGCCA
d) GATT-ACA and TACCATAC
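
The steps above (command-line input, length check, inverted third sequence, printing the closer alignment) can be combined roughly as follows. This is a sketch under several assumptions: "inverted" is read as "reversed", a gap character counts as an ordinary mismatch, and the helper names count_mismatches and best_alignment as well as the file name distance.py are made up for illustration. Functions are used only to keep the sketch compact; the same logic can be written inline.

```python
import sys


def count_mismatches(s, t):
    """Number of positions at which s and t differ (gaps count as characters)."""
    return sum(1 for a, b in zip(s, t) if a != b)


def best_alignment(seq1, seq2):
    """Return (seq1, partner, distance) for the closer of seq2 and its reverse.

    Returns None when the two sequences differ in length.
    """
    if len(seq1) != len(seq2):
        return None
    seq3 = seq2[::-1]  # assumption: "inverted" means reversed
    d12 = count_mismatches(seq1, seq2)
    d13 = count_mismatches(seq1, seq3)
    # On a tie the first/second pair is kept, as the exercise asks.
    if d13 < d12:
        return seq1, seq3, d13
    return seq1, seq2, d12


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: python distance.py SEQ1 SEQ2")  # hypothetical file name
    else:
        result = best_alignment(sys.argv[1], sys.argv[2])
        if result is None:
            print("Sequences differ in length; no distance calculated.")
        else:
            first, partner, dist = result
            print(first)
            print(partner)
            print("Distance:", dist)
```

It could be called, for example, as python distance.py ACGT A-GT.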

 

Bonus Exercises

Functions

Open an editor and save your new program. In this program we will create a few functions.

1.1 Define the two functions similarity and distance:


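$$
\mathrm{similarity}(a,b) =
\begin{cases}
1, & \text{if } a = b \\
0.5, & \text{if } a \neq b \text{ and } a, b \text{ are both purines or both pyrimidines} \\
0, & \text{otherwise}
\end{cases}
$$

$$
\mathrm{distance}(a,b) =
\begin{cases}
0, & \text{if } a = b \\
0.5, & \text{if } a \neq b \text{ and } a, b \text{ are both purines or both pyrimidines} \\
1, & \text{otherwise}
\end{cases}
$$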

Note: Purines are A and G, pyrimidines are C and T.
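
One way to turn the two definitions into code is sketched below. The set names PURINES and PYRIMIDINES are illustrative choices. Since the two case tables mirror each other, distance is written here as 1 minus similarity; writing out the three cases explicitly would work just as well.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def similarity(a, b):
    """Similarity of two single nucleotides as defined in 1.1."""
    if a == b:
        return 1
    both_purines = a in PURINES and b in PURINES
    both_pyrimidines = a in PYRIMIDINES and b in PYRIMIDINES
    if both_purines or both_pyrimidines:
        return 0.5
    return 0


def distance(a, b):
    """Distance of two single nucleotides; each case equals 1 - similarity."""
    return 1 - similarity(a, b)


print(similarity("A", "G"), distance("A", "G"))
```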

1.2 Write two functions, sequence_similarity and sequence_distance, which calculate the similarity and distance of two whole sequences.
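
The exercise does not say how the per-character values are combined into a value for whole sequences. The sketch below assumes they are summed over all aligned positions and that both sequences must have the same length; other conventions (for example, an average) are equally possible. The per-character functions are repeated in compact form so the block runs on its own.

```python
PURINES = set("AG")
PYRIMIDINES = set("CT")


def similarity(a, b):
    if a == b:
        return 1
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return 0.5 if same_class else 0


def distance(a, b):
    return 1 - similarity(a, b)


def sequence_similarity(s, t):
    """Assumption: sum of per-position similarities of two aligned sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(similarity(a, b) for a, b in zip(s, t))


def sequence_distance(s, t):
    """Assumption: sum of per-position distances of two aligned sequences."""
    if len(s) != len(t):
        raise ValueError("sequences must have the same length")
    return sum(distance(a, b) for a, b in zip(s, t))


print(sequence_similarity("ACGT", "AGGT"), sequence_distance("ACGT", "AGGT"))
```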

1.3 Calculate the similarity and distance for the following sequences.
Read these sequences from the command line and print out their similarity and distance.

Modules

In this exercise we will write three different programs.

2.1 Write a new Python file (module) called sequence_tools.py that contains the two functions similarity and distance as defined previously.

2.2 Write another Python file that uses the module defined previously to calculate the similarity and distance for each combination of two sequences stored in the list l.

l = ["ATCCGGT", "GCGTTAC", "CTACTGC", "TTGCAGT", "AGTCACC"]
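
A possible way to walk over every pair is itertools.combinations from the standard library. In the actual exercise the functions would be imported from sequence_tools.py; they are defined inline here only so that the sketch runs on its own. The whole-sequence values are again assumed to be sums over positions, and the dictionary name results is illustrative.

```python
from itertools import combinations

PURINES = set("AG")
PYRIMIDINES = set("CT")


def similarity(a, b):
    if a == b:
        return 1
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return 0.5 if same_class else 0


def sequence_similarity(s, t):
    # Assumption: sum of per-position similarities.
    return sum(similarity(a, b) for a, b in zip(s, t))


def sequence_distance(s, t):
    # Assumption: sum of per-position distances (1 - similarity each).
    return sum(1 - similarity(a, b) for a, b in zip(s, t))


l = ["ATCCGGT", "GCGTTAC", "CTACTGC", "TTGCAGT", "AGTCACC"]

# combinations(l, 2) yields each unordered pair exactly once.
results = {}
for s, t in combinations(l, 2):
    results[(s, t)] = (sequence_similarity(s, t), sequence_distance(s, t))
    print(s, t, "similarity:", results[(s, t)][0], "distance:", results[(s, t)][1])
```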

 

2.3 Extend your program. Determine the pair of sequences with the highest similarity among all sequences stored in the list l. Write these two sequences and their alignment into a new file called similar_sequences.txt.

For example, for the two sequences "ATC" and "ACC", the alignment would be:

ATC
| |
ACC

And this alignment should be written to a new output file.
Hint: In Python, a line break can be made by adding '\n' to the end of the line.
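
Exercise 2.3 could be approached along the following lines. The function names alignment_text and write_best_pair are hypothetical, the whole-sequence similarity is assumed to be the sum over positions, and when several pairs share the highest similarity the first one found is kept (max returns the first maximal item). The similarity functions would normally come from sequence_tools.py and are inlined here so the sketch is self-contained.

```python
from itertools import combinations

PURINES = set("AG")
PYRIMIDINES = set("CT")


def similarity(a, b):
    if a == b:
        return 1
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return 0.5 if same_class else 0


def sequence_similarity(s, t):
    # Assumption: sum of per-position similarities.
    return sum(similarity(a, b) for a, b in zip(s, t))


def alignment_text(s, t):
    """Three lines: s, a marker line with '|' at identical positions, and t."""
    markers = "".join("|" if a == b else " " for a, b in zip(s, t))
    return s + "\n" + markers + "\n" + t + "\n"


def write_best_pair(sequences, path):
    """Write the alignment of the most similar pair to path and return the pair."""
    best = max(combinations(sequences, 2),
               key=lambda pair: sequence_similarity(*pair))
    with open(path, "w") as handle:
        handle.write(alignment_text(*best))
    return best


print(alignment_text("ATC", "ACC"), end="")
```

Calling write_best_pair(l, "similar_sequences.txt") would then create the requested output file.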