How to Generate a Consensus FASTA File from a Multi-FASTA of a Sample
In this example, you will take the reads from a sample and convert them into a consensus FASTA file. You can find some sample input files in ANDES/example_data. Remember that the multi-FASTA contains all the reads for a given sample, so you are generating a consensus sequence for that sample. This example also shows you how to remove noise from a sample, so you don't have too many ambiguity codes in your consensus.
Convert a Multi-FASTA file into a profile:
In this example, we will convert a multi-FASTA into a profile.
1. Go (cd) into the ANDES/example_data directory.
2. Run ClustalW2 on the sequences to generate an .aln file:
clustalw2 -infile=20081201.fasta -quicktree
This will generate the 20081201.dnd and 20081201.aln files.
3. Convert the .aln file to a profile:
../ClustalALN_to_PositionProfile.pl -a 20081201.aln
This will generate the 20081201.prof file.
4. Convert the profile to a consensus FASTA file:
../Profile_To_ConsensusFASTA.pl -c 20081201.cons.fasta -p 20081201.prof
This will generate the 20081201.cons.fasta file.
Let's say that there are so many ambiguity codes in this FASTA file that you can't find it useful. You can filter out low-frequency alleles.
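If you have more than one sample, steps 2 through 4 can be scripted. The following is a minimal sketch in Python, not part of ANDES: the function names consensus_commands and run_all are hypothetical, and it assumes each tool names its output files the same way as in the steps above (for example, that the .aln file for sample X produces X.prof). It only uses the command-line options shown in this example.

```python
import subprocess

def consensus_commands(sample, script_dir=".."):
    """Build the command lines for steps 2-4 for a sample base name such as '20081201'."""
    return [
        # Step 2: align the reads (writes <sample>.aln and <sample>.dnd)
        ["clustalw2", f"-infile={sample}.fasta", "-quicktree"],
        # Step 3: convert the alignment to a profile (writes <sample>.prof)
        [f"{script_dir}/ClustalALN_to_PositionProfile.pl", "-a", f"{sample}.aln"],
        # Step 4: convert the profile to a consensus FASTA file
        [f"{script_dir}/Profile_To_ConsensusFASTA.pl",
         "-c", f"{sample}.cons.fasta", "-p", f"{sample}.prof"],
    ]

def run_all(commands):
    """Run each command in order, stopping at the first one that fails."""
    for argv in commands:
        subprocess.run(argv, check=True)

# Print the commands for inspection; call run_all(...) to actually execute them.
for argv in consensus_commands("20081201"):
    print(" ".join(argv))
```

Printing the commands first lets you compare them against the steps above before running anything.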
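To see where the ambiguity codes come from, here is a small Python sketch of consensus calling with IUPAC nucleotide codes. It is an illustration only and may not match what Profile_To_ConsensusFASTA.pl does internally. It assumes a profile is a list of per-position base counts (the real .prof layout may differ), ignores gaps, and the name consensus_from_profile is hypothetical.

```python
# IUPAC nucleotide codes, keyed by the set of bases observed at a position.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def consensus_from_profile(profile):
    """Return one IUPAC code per position, covering every base with a nonzero count."""
    codes = []
    for counts in profile:
        present = frozenset(base for base, n in counts.items() if n > 0)
        # An empty column falls back to N; this is an assumption of the sketch.
        codes.append(IUPAC.get(present, "N"))
    return "".join(codes)

# Toy 3-position profile (made-up counts across 20 aligned reads).
toy_profile = [
    {"A": 20, "C": 0, "G": 0, "T": 0},   # unanimous A
    {"A": 19, "C": 0, "G": 1, "T": 0},   # one read disagrees, so A/G becomes R
    {"A": 0, "C": 10, "G": 0, "T": 10},  # a real C/T mix becomes Y
]
print(consensus_from_profile(toy_profile))  # prints ARY
```

Note that in the second position a single read out of 20 is enough to turn A into R. That is the kind of noise a frequency filter is meant to remove.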
Apply a percentage filter to the profile to remove noise:
5. Run a percent threshold filter on the profile:
../Filter_Profile_By_Threshold.pl -i 20081201.prof -o 20081201.gt10p.prof -p 10
This will remove any alleles from the profile that are not present at a frequency greater than 10%, and write a new profile named 20081201.gt10p.prof.
6. Regenerate a new consensus with the filtered profile:
../Profile_To_ConsensusFASTA.pl -c 20081201.cons.gt10p.fasta -p 20081201.gt10p.prof
This will generate a new consensus FASTA file named 20081201.cons.gt10p.fasta.
You can sanity check the results by running diff on the two consensus files. You should see that the filtered FASTA file has fewer ambiguity codes in it.
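The idea behind the percentage filter in step 5 can be sketched in a few lines of Python. This is an illustration of the concept, not the code in Filter_Profile_By_Threshold.pl. It assumes a profile is a list of per-position base counts (the real .prof layout may differ) and that an allele exactly at the threshold is removed. The name filter_profile_by_percent is hypothetical.

```python
def filter_profile_by_percent(profile, percent):
    """Zero out any base whose share of a position's reads is not above `percent`."""
    filtered = []
    for counts in profile:
        total = sum(counts.values())
        kept = {
            # Keep the count only if the base is strictly above the threshold.
            base: (n if total > 0 and 100.0 * n / total > percent else 0)
            for base, n in counts.items()
        }
        filtered.append(kept)
    return filtered

# Toy profile with made-up counts across 20 aligned reads.
toy_profile = [
    {"A": 19, "C": 0, "G": 1, "T": 0},   # G at 5%: treated as noise and removed
    {"A": 0, "C": 10, "G": 0, "T": 10},  # C and T at 50% each: both kept
]
filtered = filter_profile_by_percent(toy_profile, 10)
print(filtered)
```

After filtering, the first position contains only A, so a consensus built from it no longer needs an ambiguity code there, while the genuine C/T mix in the second position is preserved.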
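If the diff output is hard to read, you can also count the ambiguity codes in each file directly. Here is a minimal Python sketch, assuming standard FASTA text (header lines start with ">") and treating every IUPAC code other than A, C, G, and T as ambiguous. The function name count_ambiguity_codes and the toy sequences are made up for illustration.

```python
# IUPAC codes that stand for more than one nucleotide.
AMBIGUITY_CODES = set("RYSWKMBDHVN")

def count_ambiguity_codes(fasta_text):
    """Count ambiguity codes in the sequence lines of FASTA-formatted text."""
    count = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # skip header lines
        count += sum(1 for ch in line.strip().upper() if ch in AMBIGUITY_CODES)
    return count

# Toy stand-ins for the unfiltered and filtered consensus files.
unfiltered = ">sample\nARYACGT\n"
filtered = ">sample\nAAYACGT\n"
print(count_ambiguity_codes(unfiltered), count_ambiguity_codes(filtered))  # prints 2 1
```

For the real files, pass in the file contents instead, for example open("20081201.cons.fasta").read() and open("20081201.cons.gt10p.fasta").read().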