Example 4:

 

How to Generate Base Conversion Tables between Samples

In this example, you will take two samples, generate a profile for each of them, and compute the probabilities of base conversions between them.

 

Convert a Multi-FASTA file into a profile:

In this section, we will convert two multi-FASTA files into their respective profiles.

 

1.       Go (cd) into the ANDES/example_data directory.

2.       Run clustalw2 on each of the two sets of sequences to generate two .aln files:

 

clustalw2 -infile=20090416.fasta -quicktree

clustalw2 -infile=20081201.fasta -quicktree

 

                This will generate the 20090416.aln and 20081201.aln files, and their respective .dnd files.

 

3.       Convert the .aln files to profiles. 

 

../ClustalALN_to_PositionProfile.pl -a 20090416.aln

../ClustalALN_to_PositionProfile.pl -a 20081201.aln

 

This will generate the 20090416.prof and 20081201.prof files. 
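
Steps 2 and 3 can also be scripted if you have more than a couple of samples. The following is a minimal sketch, not part of ANDES: the names SAMPLES, build_commands and run_pipeline are hypothetical, and it assumes you run it from ANDES/example_data with clustalw2 on your PATH. It uses only the command-line options shown above, and by default it only prints the commands instead of running them.

```python
import subprocess

# Sample names taken from the steps above; extend this list for your own data.
SAMPLES = ["20090416", "20081201"]


def build_commands(sample):
    """Return the alignment and profile command lines for one sample."""
    return [
        # Step 2: align the multi-FASTA file (writes <sample>.aln and <sample>.dnd).
        ["clustalw2", f"-infile={sample}.fasta", "-quicktree"],
        # Step 3: convert the alignment into a position profile (<sample>.prof).
        ["../ClustalALN_to_PositionProfile.pl", "-a", f"{sample}.aln"],
    ]


def run_pipeline(samples=SAMPLES, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for sample in samples:
        for cmd in build_commands(sample):
            print(" ".join(cmd))
            if not dry_run:
                # check=True stops the pipeline at the first failing command.
                subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline()  # pass dry_run=False to actually run the tools
```

Keeping the default as a dry run lets you check the generated command lines against the ones in steps 2 and 3 before anything is executed.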

 

Compute the base conversion probabilities for two profiles:

 

4.       Compute the base conversion probabilities between the two profiles you just generated:

 

../Compute_BaseConversion_Tables.pl -o bc.table 20090416.prof 20081201.prof

 

                This will generate a file named bc.table.

                It should look something like this:

 

20090416->20090416:

        A       T       G       C       -

A       99.83%   0.04%   0.10%   0.03%   0.00%

T        0.06%  99.81%   0.03%   0.10%   0.00%

G        0.16%   0.03%  99.80%   0.00%   0.00%

C        0.06%   0.13%   0.00%  99.80%   0.00%

-        0.00%   0.00%   0.00%   0.00%   0.00%

 

20090416->20081201:

        A       T       G       C       -

A       74.18%   4.23%  13.90%   2.92%   4.78%

T        5.33%  77.53%   0.93%  14.75%   1.45%

G       19.92%   2.82%  71.89%   2.05%   3.33%

C        5.60%  18.69%   3.31%  70.21%   2.18%

-        0.00%   0.00%   0.00%   0.00%   0.00%

 

20081201->20090416:

        A       T       G       C       -

A       79.80%   3.78%  13.34%   3.08%   0.00%

T        6.34%  76.69%   2.63%  14.35%   0.00%

G       22.80%   1.01%  73.40%   2.78%   0.00%

C        5.85%  19.52%   2.56%  72.08%   0.00%

-       53.57%  10.71%  23.21%  12.50%   0.00%

 

20081201->20081201:

        A       T       G       C       -

A       99.39%   0.06%   0.50%   0.05%   0.00%

T        0.09%  99.73%   0.04%   0.15%   0.00%

G        0.76%   0.04%  99.20%   0.01%   0.00%

C        0.09%   0.20%   0.01%  99.70%   0.00%

-        0.00%   0.00%   0.00%   0.00%  100.00%

 

 

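Remember that these are conditional probabilities, so the percentages in each row sum to 100% (up to rounding), but the percentages in each column generally do not. The exception is a row that is entirely 0.00%, as with the gap (-) row in some of the tables above. Four tables are generated so that you can see the base conversions both within each sample and between the two samples. The label before each table names the pair of profiles it compares.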
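
To check the row and column sums yourself, you can parse the table text and add up the percentages. The sketch below is an illustration only: parse_tables, row_sums and column_sums are hypothetical helper names, and the parser assumes the file is laid out exactly like the sample output above (a "name->name:" label line, a header line of bases, then one row per base). The embedded text is the 20090416->20081201 table copied from this example; to check your own output, read bc.table into a string instead.

```python
# One table copied from the sample output above; real files may differ in detail.
SAMPLE = """\
20090416->20081201:

        A       T       G       C       -

A       74.18%   4.23%  13.90%   2.92%   4.78%

T        5.33%  77.53%   0.93%  14.75%   1.45%

G       19.92%   2.82%  71.89%   2.05%   3.33%

C        5.60%  18.69%   3.31%  70.21%   2.18%

-        0.00%   0.00%   0.00%   0.00%   0.00%
"""


def parse_tables(text):
    """Parse table text into {label: {row_base: {column_base: percent}}}."""
    tables = {}
    label = None
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines between rows
        if len(fields) == 1 and fields[0].endswith(":") and "->" in fields[0]:
            # Label line such as "20090416->20081201:" starts a new table.
            label = fields[0][:-1]
            tables[label] = {}
            columns = None
        elif label is not None and columns is None:
            columns = fields  # header line listing the column bases
        elif label is not None:
            row_base, values = fields[0], fields[1:]
            tables[label][row_base] = {
                col: float(val.rstrip("%")) for col, val in zip(columns, values)
            }
    return tables


def row_sums(table):
    """Sum the percentages across each row."""
    return {base: round(sum(cells.values()), 2) for base, cells in table.items()}


def column_sums(table):
    """Sum the percentages down each column."""
    sums = {}
    for cells in table.values():
        for col, val in cells.items():
            sums[col] = sums.get(col, 0.0) + val
    return {col: round(total, 2) for col, total in sums.items()}


if __name__ == "__main__":
    for name, table in parse_tables(SAMPLE).items():
        print(name)
        print("  row sums:   ", row_sums(table))
        print("  column sums:", column_sums(table))
```

For this table, the A, T, G and C rows each come out within 0.01 of 100, the gap row sums to 0, and the A column sums to about 105, which is the row-versus-column behavior described above.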