Example 4: How to Generate Base Conversion Tables between Samples
In this example, you will take two samples, generate a profile for each of them, and compute the probability of base conversions between them.
Convert a Multi-FASTA file into a profile:
In this section, we will convert two multi-FASTA files into their respective profiles.
1. Go (cd) into the ANDES/example_data directory.
2. Run Clustalw2 on each of the two sets of sequences to generate two .aln files:
clustalw2 -infile=20090416.fasta -quicktree
clustalw2 -infile=20081201.fasta -quicktree
This will generate the 20090416.aln and 20081201.aln files, and their respective .dnd files.
3. Convert the .aln files to profiles:
../ClustalALN_to_PositionProfile.pl -a 20090416.aln
../ClustalALN_to_PositionProfile.pl -a 20081201.aln
This will generate the 20090416.prof and 20081201.prof files.
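Steps 2 and 3 repeat the same two commands for each sample, so they can be scripted. The following is a minimal sketch, not part of ANDES: the function names build_commands and run_all and the dry_run switch are hypothetical, while the command names and flags are copied from the steps above. It assumes you run it from the ANDES/example_data directory and that clustalw2 is on your PATH.

```python
import subprocess

# Sample names taken from the example above.
SAMPLES = ["20090416", "20081201"]


def build_commands(sample):
    """Return the alignment and profile-conversion commands for one sample."""
    return [
        ["clustalw2", f"-infile={sample}.fasta", "-quicktree"],
        ["../ClustalALN_to_PositionProfile.pl", "-a", f"{sample}.aln"],
    ]


def run_all(samples, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for sample in samples:
        for cmd in build_commands(sample):
            print(" ".join(cmd))
            if not dry_run:
                # check=True stops the script at the first failing command.
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default: only prints the commands that would be executed.
    run_all(SAMPLES, dry_run=True)
```

Calling run_all(SAMPLES, dry_run=False) would actually execute the commands; the dry run is the default so you can check the command lines first.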
Compute the Base Conversion probabilities for two profiles:
4. Compute the base conversion probabilities between the two profiles you just generated:
../Compute_BaseConversion_Tables.pl -o bc.table 20090416.prof 20081201.prof
This will generate a file named bc.table. It should look something like this:
20090416->20090416:
     A        T        G        C        -
A    99.83%   0.04%    0.10%    0.03%    0.00%
T    0.06%    99.81%   0.03%    0.10%    0.00%
G    0.16%    0.03%    99.80%   0.00%    0.00%
C    0.06%    0.13%    0.00%    99.80%   0.00%
-    0.00%    0.00%    0.00%    0.00%    0.00%
20090416->20081201:
     A        T        G        C        -
A    74.18%   4.23%    13.90%   2.92%    4.78%
T    5.33%    77.53%   0.93%    14.75%   1.45%
G    19.92%   2.82%    71.89%   2.05%    3.33%
C    5.60%    18.69%   3.31%    70.21%   2.18%
-    0.00%    0.00%    0.00%    0.00%    0.00%
20081201->20090416:
     A        T        G        C        -
A    79.80%   3.78%    13.34%   3.08%    0.00%
T    6.34%    76.69%   2.63%    14.35%   0.00%
G    22.80%   1.01%    73.40%   2.78%    0.00%
C    5.85%    19.52%   2.56%    72.08%   0.00%
-    53.57%   10.71%   23.21%   12.50%   0.00%
20081201->20081201:
     A        T        G        C        -
A    99.39%   0.06%    0.50%    0.05%    0.00%
T    0.09%    99.73%   0.04%    0.15%    0.00%
G    0.76%    0.04%    99.20%   0.01%    0.00%
C    0.09%    0.20%    0.01%    99.70%   0.00%
-    0.00%    0.00%    0.00%    0.00%    100.00%
Remember that these are conditional probabilities, so the percentages will sum to 100% going across, but not going down. Four tables are generated so you can see the base conversions within each sample and between the two samples. The profile names are given in the label before each table.
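To see the "across, not down" behaviour for yourself, you can sum the rows and columns of one table. The sketch below is only an illustration: it embeds the 20090416->20081201 values shown above as a string and assumes a whitespace-separated layout, which may not match the exact spacing of a real bc.table file. The names parse_table, row_sums and col_sums are hypothetical.

```python
# Values copied from the 20090416->20081201 table above.
# Assumption: columns are separated by whitespace.
TABLE = """\
     A        T        G        C        -
A    74.18%   4.23%    13.90%   2.92%    4.78%
T    5.33%    77.53%   0.93%    14.75%   1.45%
G    19.92%   2.82%    71.89%   2.05%    3.33%
C    5.60%    18.69%   3.31%    70.21%   2.18%
-    0.00%    0.00%    0.00%    0.00%    0.00%
"""


def parse_table(text):
    """Parse one table into (column labels, {row label: {column label: percent}})."""
    lines = [line.split() for line in text.strip().splitlines()]
    header = lines[0]
    rows = {}
    for fields in lines[1:]:
        label, values = fields[0], fields[1:]
        rows[label] = {col: float(v.rstrip("%")) for col, v in zip(header, values)}
    return header, rows


header, rows = parse_table(TABLE)

# Sum across each row, then down each column.
row_sums = {label: sum(row.values()) for label, row in rows.items()}
col_sums = {col: sum(row[col] for row in rows.values()) for col in header}

for label, total in row_sums.items():
    print(f"row {label}: {total:.2f}%")
for col, total in col_sums.items():
    print(f"column {col}: {total:.2f}%")
```

The A, T, G and C rows each come out within rounding error of 100%, while the column totals do not. The - row in this particular table is all zeros, so it sums to 0% rather than 100%.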