morFeus is a program to search for remotely conserved orthologs.
Based on relaxed BLAST searches, it aims to find remote orthologs of sequence orphans by:
- Clustering query-hit alignments based on their similarity to each other
- Performing iterative back-BLAST searches to verify orthology based on reciprocal best hit relationships
- Establishing a measure of similarity that is independent of the E-value; this is achieved by calculating a network score that represents the connectivity pattern of the hits with each other
The diagram below shows the pipeline of a morFeus search.
The strength of morFeus is four-fold:
Because morFeus uses extremely relaxed BLAST parameters (E-value threshold of 100), it can pick up more remotely conserved homologs than standard searches.
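As a rough illustration of what a relaxed search looks like, the sketch below builds a command line with the E-value threshold raised to 100. It assumes the NCBI BLAST+ `blastp` program; the text does not say which BLAST implementation morFeus calls, and the function name and file names are hypothetical.

```python
def relaxed_blastp_command(query_fasta, database, evalue=100):
    """Build a blastp command with a relaxed E-value threshold.

    Assumption: NCBI BLAST+ is used; morFeus itself may invoke BLAST differently.
    """
    return [
        "blastp",
        "-query", query_fasta,    # FASTA file with the query protein
        "-db", database,          # name of a formatted protein database
        "-evalue", str(evalue),   # relaxed threshold; the default cut-off is far stricter
        "-outfmt", "6",           # tabular output, one line per hit
    ]


cmd = relaxed_blastp_command("orphan.fasta", "proteomes")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.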
Because the alignments of the hits with the query protein are clustered by their similarity to each other, we can filter for hits with a similar conservation pattern. Sequences (or rather, their alignments to the query) that belong to the same protein family are therefore clustered according to their conserved pattern of amino acid residues.
Clustering is optional and can be turned off in the command-line version of morFeus. In that case, back-BLAST searches are started based on the E-value rather than on closeness in the hierarchical tree.
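The idea of grouping alignments by conservation pattern can be sketched as follows. This is a minimal illustration and not the morFeus implementation: the text does not state which similarity measure or tree-building method is used, so the Jaccard similarity, the single-linkage grouping, the 0.5 threshold, and all function names here are assumptions.

```python
def conservation_pattern(query_aln, hit_aln):
    """Return the set of query positions where the aligned hit residue is identical."""
    positions = set()
    qpos = 0
    for q, h in zip(query_aln, hit_aln):
        if q == "-":
            continue  # gap in the query: no query position is consumed
        if q == h:
            positions.add(qpos)
        qpos += 1
    return positions


def jaccard(a, b):
    """Similarity of two conservation patterns (assumed measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_alignments(patterns, threshold=0.5):
    """Single-linkage grouping: join hits whose patterns are similar enough."""
    clusters = []
    for name, pattern in patterns.items():
        merged = {name}
        for c in [c for c in clusters
                  if any(jaccard(pattern, patterns[m]) >= threshold for m in c)]:
            merged |= c
            clusters.remove(c)
        clusters.append(merged)
    return clusters


query = "MKTAYIAK"
hits = {"h1": "MKTAYLAK", "h2": "MKTAYIGK", "h3": "QRSGWIAK"}
patterns = {name: conservation_pattern(query, aln) for name, aln in hits.items()}
print(sorted(sorted(c) for c in cluster_alignments(patterns)))
```

In this toy example, `h1` and `h2` conserve the same N-terminal residues and end up together, while `h3` matches the query only at its C-terminus and stays separate.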
We perform back-BLAST searches to verify the orthology relationships of potential hits, and morFeus performs these back-BLASTs iteratively. When a sequence hit is picked up in more than two back-BLAST searches, it is automatically included for back-BLASTing in the next round. In this way, we generate a cluster of orthologs that are all entitled to select hits for reciprocal BLAST searches, which maximises the number of hits that are tested for orthology. Likewise, confirmed orthologs (confirmed by their reciprocal best hit relationships) can exclude sequences as orthology candidates, for instance if a much more closely related sequence from the same species has already been assigned as an ortholog to a hit.
Exclusion is decided by vote: a hit is only excluded if more than a third of the verified orthologs reject it.
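The two thresholds described above (promotion after more than two back-BLAST searches, exclusion when more than a third of verified orthologs reject a hit) can be written down as a small sketch. The data structures and function names are hypothetical and chosen for illustration; how morFeus stores its search results is not described here.

```python
from collections import Counter


def hits_for_next_round(back_blast_results, already_searched, min_count=2):
    """Select hits found in more than `min_count` back-BLAST searches.

    back_blast_results maps the ID of each back-BLASTed sequence to the
    IDs of the hits that search returned (assumed representation).
    """
    counts = Counter()
    for hits in back_blast_results.values():
        counts.update(set(hits))  # count each hit once per search
    return sorted(h for h, n in counts.items()
                  if n > min_count and h not in already_searched)


def is_excluded(rejections, verified_orthologs):
    """A hit is excluded only if more than a third of verified orthologs reject it."""
    return verified_orthologs > 0 and 3 * rejections > verified_orthologs


results = {
    "query": ["a", "b", "c"],
    "a": ["b", "c"],
    "b": ["c", "a", "d"],
}
print(hits_for_next_round(results, already_searched={"query", "a", "b"}))
print(is_excluded(rejections=2, verified_orthologs=6))
print(is_excluded(rejections=3, verified_orthologs=6))
```

Here only `c` appears in three searches and is promoted; two rejections out of six verified orthologs is exactly a third and does not exclude a hit, while three rejections does.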
We have introduced a network score as an independent measure of the degree of relatedness between sequences. Based on the results of the initial BLAST and all back-BLAST searches, we generate an orthology network that contains the relationships of the hits to the query and to each other. Using the connectivity of the nodes, we calculate a network score based on eigenvector centrality.
The network score rates each hit by its orthologous relationships to the other nodes (proteins) in the orthology network. The higher the network score, the more connections a protein has in the network.
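To make the notion of eigenvector centrality concrete, here is a minimal power-iteration sketch on a toy undirected network. It illustrates the general technique only: how morFeus weights edges, normalises scores, or derives the final network score from the centrality is not specified here, and the node names and the scaling to a maximum of 1.0 are assumptions.

```python
def eigenvector_centrality(edges, iterations=200, tol=1e-9):
    """Eigenvector centrality of an undirected graph by power iteration."""
    nodes = sorted({n for edge in edges for n in edge})
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Adding the node's own score (iterating with A + I) keeps the same
        # eigenvectors but avoids oscillation on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in neighbours[n]) for n in nodes}
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}  # scale so the best node is 1.0
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Toy orthology network: an edge means two proteins hit each other.
edges = [("query", "h1"), ("query", "h2"), ("query", "h3"),
         ("h1", "h2"), ("h3", "h4")]
scores = eigenvector_centrality(edges)
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

In this example the query, which is linked to three hits, ranks highest, and `h4`, which is attached only to `h3`, ranks lowest. A node's score depends on how well connected its neighbours are as well as on how many neighbours it has.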