About PROUST-II and how to use the server


A very simple description

Given:


This program:



More detail/background

Proteins often belong to large homologous families.  While most proteins in such a family share similar molecular functions, they can show important differences, often related to details such as enzymatic substrate specificity or preferred interacting partners.  In such instances it is important to understand how the family has evolved to perform these different functions.  For "orphan" sequences, it is also very useful to be able to predict which of the sometimes many different functions performed by a family a given sequence is most likely to have.

PROUST-II uses HMMer hidden Markov model profiles to do just this.  A very detailed description can be found in the reference below (Hannenhalli & Russell, 2000). Essentially, the method subtracts profiles for sequences that are not in a group from the profile for the sequences in the group.  This leaves only those parts of the profile that are unique to the group.  After identifying the positions that are most likely to confer the sub-types, it uses them to predict sub-types for those sequences that are in the alignment but are not in any group ("orphans").
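
To make the idea of group-specific positions concrete, here is a minimal Python sketch. It is not the PROUST-II implementation: it assumes raw, pseudocount-smoothed column frequencies in place of HMMer profiles, and it assumes relative entropy as the measure of how much a column in the group differs from the same column outside the group. The function names (column_frequencies, relative_entropy, position_z_scores) are hypothetical and chosen only for illustration; see the reference for the real method.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_frequencies(column, pseudocount=1.0):
    """Smoothed amino-acid frequencies for one alignment column (gaps ignored)."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    return {a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS}


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) of p with respect to q."""
    return sum(p[a] * math.log(p[a] / q[a]) for a in AMINO_ACIDS)


def position_z_scores(in_group, out_group):
    """Score every column by how much the group differs from the rest.

    in_group and out_group are lists of aligned sequences of equal length.
    Returns one Z-score per column: high values mark columns whose residue
    distribution is unusual for the group (an assumption of this sketch,
    standing in for the profile subtraction described above).
    """
    length = len(in_group[0])
    scores = []
    for i in range(length):
        p = column_frequencies([s[i] for s in in_group])
        q = column_frequencies([s[i] for s in out_group])
        scores.append(relative_entropy(p, q))
    mean = sum(scores) / length
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / length)
    return [(x - mean) / sd if sd > 0 else 0.0 for x in scores]


# Toy alignment: only the middle column separates the two groups.
toy_z = position_z_scores(["AKL", "AKL", "AKL"], ["AEL", "AEL", "AEL"])
```

In the toy alignment only the middle column differs between the two groups, so it receives the highest Z-score; columns above a chosen threshold would be reported as candidate sub-type determining positions.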


Using the PROUST-II server

Go to:

Format for alignments

PROUST-II reads most standard alignment formats. The full list is:

There is no need to specify the format in advance, as the program will guess it. However, if you get a message saying that it can't read the alignment, PROUST-II has failed to guess the correct format and you may need to specify it explicitly.


Format for groups

A groups file must contain one line for each identifier (e.g. 100K_RAT, P123456, gi1234323, etc.) from the alignment that is known to be in a group. Each line must contain at least two fields, one with the identifier and one with the group. By default, the first field is expected to be the identifier and the second the group, but you can change this on the submit form. For example:

CYAB_STIAU/9-194    ADC
CYAG_DICDI/387-572  ADC
CYAG_DICDI/655-825  ADC
CYG2_RAT/399-582    GUC
CYG3_BOVIN/473-662  GUC
CYG3_CAEEL/889-1077 GUC
So here there are two groups, ADC (= adenylate cyclase) and GUC (= guanylate cyclase), each with three identifiers.
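
If you want to check a groups file before submitting it, a small Python sketch like the following may help. It is only an illustration of the format described above, not code from the server: the function name parse_groups and its parameters id_field and group_field are hypothetical, and it assumes fields are separated by whitespace.

```python
EXAMPLE_GROUPS = """\
CYAB_STIAU/9-194    ADC
CYAG_DICDI/387-572  ADC
CYAG_DICDI/655-825  ADC
CYG2_RAT/399-582    GUC
CYG3_BOVIN/473-662  GUC
CYG3_CAEEL/889-1077 GUC
"""


def parse_groups(text, id_field=0, group_field=1):
    """Map each group name to the list of identifiers assigned to it.

    id_field and group_field are zero-based column indices, mirroring the
    option to change which field holds the identifier and which the group.
    Fields are assumed to be whitespace-separated; blank lines are skipped.
    """
    needed = max(id_field, group_field) + 1
    groups = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue
        if len(fields) < needed:
            raise ValueError(f"line {line_no}: expected at least {needed} fields")
        groups.setdefault(fields[group_field], []).append(fields[id_field])
    return groups


example_groups = parse_groups(EXAMPLE_GROUPS)
```

For the example file this gives two groups, ADC and GUC, with three identifiers each. Any identifier in the alignment that does not appear in the result would be treated as an orphan.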



A worked example

Let's have a look at how PROUST-II works. Imagine that you are presented with this sequence:

>AAK45954 purine cyclase-related protein [Mycobacterium tuberculosis CDC1551]
MRLVPQTPRSSLPGSARTTYPCHVEVGPQDSESGAPDETATAMASPVPRQRSALRWLRTVNRSPGLVSFI
HRARRLLPGDPEFGDPLSTAGEGGPRAAARAADRLLRDRDAASREVGLSVLQVWQALTEAVSRRPANPEV
TLVFTDLVGFSTWSLHAGDDATLTLLRQVARAVESPLLDAGGHIVKRLGDGIMAVFRNPTVALRAVLVAQ
DAVKSLEVQGYTPRMRIGIHTGRPQRLAADWLGVDVNIAARVMERATKGGIMISQPTLDLIPQSELDALG
VVARRVRKPVFASKPTGIPPDLAIYRIKTVSESTAADNFDEMSPDAQ

Say also that you manage to find out (e.g. by BLAST) that it belongs to a family of proteins that contains adenylate and guanylate cyclases, and you get an alignment of this sequence to this family (e.g. see the file cyclase.aln; adapted from an alignment from Pfam). Moreover, you know that this family performs a cyclisation reaction that takes either ATP or GTP and converts it to cAMP or cGMP, and you have also assigned a set of other sequences to their groups (e.g. see the file cyclase.gr). The "either" is important here, as this is what we are going to try to predict using PROUST-II. Note that this was probably also a problem for the genome annotators: the sequence is called a "purine" (i.e. adenylate or guanylate) cyclase. I suspect that this is because the matches (with BLAST) to the other members of the family are quite weak, making it difficult to state whether it is ATP or GTP specific by looking, for example, at BLAST E-values.

So, we would like to know first of all what determines whether a cyclase is ATP or GTP specific, and secondly what is the likely specificity of this new protein. Try running the program with these files, or just trust me that what you get (or should get) with the defaults will look something like what appears below.


PROUST-II: PRediction Of Unknown Sub-Types
Please cite: S.S. Hannenhalli & R.B. Russell 2000
Analysis and prediction of protein sub-types from multiple sequence alignments
J. Mol. Biol. 303, 61-76, 2000.
Alignment & Group data summary
Alignment format was Clustal with 71 sequences and a length of 390
Group GUC has 29 members, that score between -4.6 and -3.1 (mean -3.5, sd 0.4)
Group ADC has 46 members, that score between -11.7 and -2.9 (mean -4.4, sd 1.5)
A total of 6 positions were found with Z>= 2.0

Predictions
Sequence  Prediction  Confidence  Pred score  Individual scores
AAK45954  ADC         LOW         -4.69       GUC -21.2  ADC -11.6

Predicted sub-type distinguishing positions
The known or predicted group is given to the right of the identifier.
Groups prefixed with a "p" denote predictions given above.
Residues having P>=Pmin are coloured according to type:
dark blue = positive; red = negative; yellow = hydrophobic; light-blue = small.
Numbers below are positional Z-scores rounded to the nearest integer.

ANPA_HUMAN/867-1053     GUC ------------------------------------------------------------
ANPA_MOUSE/863-1049     GUC ------------------------------------------------------------
ANPA_RAT/863-1049       GUC ------------------------------------------------------------
ANPB_ANGJA/856-1042     GUC ------------------------------------------------------------
ANPB_BOVIN/852-1038     GUC ------------------------------------------------------------
ANPB_HUMAN/852-1038     GUC ------------------------------------------------------------
ANPB_RAT/852-1038       GUC ------------------------------------------------------------
CYG2_RAT/399-582        GUC ------------------------------------------------------------
CYG3_BOVIN/473-662      GUC ------------------------------------------------------------
CYG3_CAEEL/889-1077     GUC ------------------------------------------------------------
CYG3_RAT/471-660        GUC ------------------------------------------------------------
CYG4_HUMAN/512-701      GUC ------------------------------------------------------------
CYG5_HUMAN/469-658      GUC ------------------------------------------------------------
CYGD_BOVIN/876-1063     GUC ------------------------------------------------------------
CYGD_HUMAN/871-1058     GUC ------------------------------------------------------------
CYGE_MOUSE/874-1061     GUC ------------------------------------------------------------
CYGE_RAT/874-1061       GUC ------------------------------------------------------------
CYGF_HUMAN/875-1062     GUC ------------------------------------------------------------
CYGF_RAT/875-1062       GUC ------------------------------------------------------------
CYGH_DROME/456-645      GUC ------------------------------------------------------------
CYGS_STRPU/905-1091     GUC ------------------------------------------------------------
CYGX_RAT/884-1071       GUC ------------------------------------------------------------
HSER_HUMAN/815-1002     GUC ------------------------------------------------------------
HSER_PIG/815-1002       GUC ------------------------------------------------------------
HSER_RAT/814-1001       GUC ------------------------------------------------------------
KSGC_RAT/225-411        GUC ------------------------------------------------------------
CYG1_BOVIN/412-605      GUC ------------------------------------------------------------
CYG1_HUMAN/412-605      GUC ------------------------------------------------------------
CYG1_RAT/412-605        GUC ------------------------------------------------------------
1azsa                   ADC ------------------------------------------------------------
CYA1_BOVIN/296-457      ADC ------------------------------------------------------------
CYA1_DROME/266-427      ADC ------------------------------------------------------------
CYA1_DROME/954-1153     ADC ------------------------------------------------------------
CYA1_HUMAN/14-175       ADC ------------------------------------------------------------
CYA2_HUMAN/263-463      ADC ------------------------------------------------------------
CYA2_RAT/280-447        ADC ------------------------------------------------------------
CYA2_RAT/877-1077       ADC ------------------------------------------------------------
CYA3_RAT/310-472        ADC ------------------------------------------------------------
CYA3_RAT/914-1121       ADC ------------------------------------------------------------
CYA4_RAT/264-418        ADC ------------------------------------------------------------
CYA4_RAT/853-1053       ADC ------------------------------------------------------------
CYA5_CANFA/382-543      ADC ------------------------------------------------------------
CYA5_RABIT/463-624      ADC ------------------------------------------------------------
CYA5_RAT/297-458        ADC ------------------------------------------------------------
CYA6_CANFA/368-528      ADC ------------------------------------------------------------
CYA6_MOUSE/368-528      ADC ------------------------------------------------------------
CYA6_RAT/368-529        ADC ------------------------------------------------------------
CYA7_HUMAN/270-432      ADC ------------------------------------------------------------
CYA7_MOUSE/272-434      ADC ------------------------------------------------------------
CYA8_HUMAN/405-589      ADC ------------------------------------------------------------
CYA8_RAT/402-586        ADC ------------------------------------------------------------
CYA9_MOUSE/385-565      ADC ------------------------------------------------------------
CYAA_ANACY/311-502      ADC ------------------------------------------------------------
CYAG_DICDI/387-572      ADC ------------------------------------------------------------
CYA1_BOVIN/862-1059     ADC ------------------------------------------------------------
CYA1_HUMAN/579-776      ADC ------------------------------------------------------------
CYA5_CANFA/985-1179     ADC ------------------------------------------------------------
CYA5_RABIT/1065-1259    ADC ------------------------------------------------------------
CYA5_RAT/899-1093       ADC ------------------------------------------------------------
CYA6_CANFA/967-1161     ADC ------------------------------------------------------------
CYA6_MOUSE/967-1161     ADC ------------------------------------------------------------
CYA6_RAT/968-1162       ADC ------------------------------------------------------------
CYA1_DROME/954-1153-64  ADC ------------------------------------------------------------
CYA2_HUMAN/263-463-65   ADC ------------------------------------------------------------
CYA2_RAT/877-1077-66    ADC ------------------------------------------------------------
CYA4_RAT/853-1053-67    ADC ------------------------------------------------------------
CYA7_HUMAN/870-1069     ADC ------------------------------------------------------------
CYA7_MOUSE/889-1088     ADC ------------------------------------------------------------
CYA3_RAT/914-1121-70    ADC ------------------------------------------------------------
CYA8_HUMAN/973-1172     ADC ------------------------------------------------------------
CYA8_RAT/970-1169       ADC ------------------------------------------------------------
CYA9_MOUSE/1049-1244    ADC ------------------------------------------------------------
CYAA_DICDI/1180-1356    ADC ------------------------------------------------------------
1ab8a                   ADC ------------------------------------------------------------
1azsb                   ADC ------------------------------------------------------------
AAK45954               pADC MRLVPQTPRSSLPGSARTTYPCHVEVGPQDSESGAPDETATAMASPVPRQRSALRWLRTV
Z score                     ____________________________________________________________

ANPA_HUMAN/867-1053     GUC ------------------------------------------------------------
ANPA_MOUSE/863-1049     GUC ------------------------------------------------------------
ANPA_RAT/863-1049       GUC ------------------------------------------------------------
ANPB_ANGJA/856-1042     GUC ------------------------------------------------------------
ANPB_BOVIN/852-1038     GUC ------------------------------------------------------------
ANPB_HUMAN/852-1038     GUC ------------------------------------------------------------
ANPB_RAT/852-1038       GUC ------------------------------------------------------------
CYG2_RAT/399-582        GUC ------------------------------------------------------------
CYG3_BOVIN/473-662      GUC ------------------------------------------------------------
CYG3_CAEEL/889-1077     GUC ------------------------------------------------------------
CYG3_RAT/471-660        GUC ------------------------------------------------------------
CYG4_HUMAN/512-701      GUC ------------------------------------------------------------
CYG5_HUMAN/469-658      GUC ------------------------------------------------------------
CYGD_BOVIN/876-1063     GUC ------------------------------------------------------------
CYGD_HUMAN/871-1058     GUC ------------------------------------------------------------
CYGE_MOUSE/874-1061     GUC ------------------------------------------------------------
CYGE_RAT/874-1061       GUC ------------------------------------------------------------
CYGF_HUMAN/875-1062     GUC ------------------------------------------------------------
CYGF_RAT/875-1062       GUC ------------------------------------------------------------
CYGH_DROME/456-645      GUC ------------------------------------------------------------
CYGS_STRPU/905-1091     GUC ------------------------------------------------------------
CYGX_RAT/884-1071       GUC ------------------------------------------------------------
HSER_HUMAN/815-1002     GUC ------------------------------------------------------------
HSER_PIG/815-1002       GUC ------------------------------------------------------------
HSER_RAT/814-1001       GUC ------------------------------------------------------------
KSGC_RAT/225-411        GUC ------------------------------------------------------------
CYG1_BOVIN/412-605      GUC ------------------------------------------------------------
CYG1_HUMAN/412-605      GUC ------------------------------------------------------------
CYG1_RAT/412-605        GUC ------------------------------------------------------------
1azsa                   ADC ------------------------------------------------------------
CYA1_BOVIN/296-457      ADC ------------------------------------------------------------
CYA1_DROME/266-427      ADC ------------------------------------------------------------
CYA1_DROME/954-1153     ADC ------------------------------------------------------------
CYA1_HUMAN/14-175       ADC ------------------------------------------------------------
CYA2_HUMAN/263-463      ADC ------------------------------------------------------------
CYA2_RAT/280-447        ADC ------------------------------------------------------------
CYA2_RAT/877-1077       ADC ------------------------------------------------------------
CYA3_RAT/310-472        ADC ------------------------------------------------------------
CYA3_RAT/914-1121       ADC ------------------------------------------------------------
CYA4_RAT/264-418        ADC ------------------------------------------------------------
CYA4_RAT/853-1053       ADC ------------------------------------------------------------
CYA5_CANFA/382-543      ADC ------------------------------------------------------------
CYA5_RABIT/463-624      ADC ------------------------------------------------------------
CYA5_RAT/297-458        ADC ------------------------------------------------------------
CYA6_CANFA/368-528      ADC ------------------------------------------------------------
CYA6_MOUSE/368-528      ADC ------------------------------------------------------------
CYA6_RAT/368-529        ADC ------------------------------------------------------------
CYA7_HUMAN/270-432      ADC ------------------------------------------------------------
CYA7_MOUSE/272-434      ADC ------------------------------------------------------------
CYA8_HUMAN/405-589      ADC ------------------------------------------------------------
CYA8_RAT/402-586        ADC ------------------------------------------------------------
CYA9_MOUSE/385-565      ADC ------------------------------------------------------------
CYAA_ANACY/311-502      ADC ------------------------------------------------------------
CYAG_DICDI/387-572      ADC ------------------------------------------------------------
CYA1_BOVIN/862-1059     ADC ------------------------------------------------------------
CYA1_HUMAN/579-776      ADC ------------------------------------------------------------
CYA5_CANFA/985-1179     ADC ------------------------------------------------------------
CYA5_RABIT/1065-1259    ADC ------------------------------------------------------------
CYA5_RAT/899-1093       ADC ------------------------------------------------------------
CYA6_CANFA/967-1161     ADC ------------------------------------------------------------
CYA6_MOUSE/967-1161     ADC ------------------------------------------------------------
CYA6_RAT/968-1162       ADC ------------------------------------------------------------
CYA1_DROME/954-1153-64  ADC ------------------------------------------------------------
CYA2_HUMAN/263-463-65   ADC ------------------------------------------------------------
CYA2_RAT/877-1077-66    ADC ------------------------------------------------------------
CYA4_RAT/853-1053-67    ADC ------------------------------------------------------------
CYA7_HUMAN/870-1069     ADC ------------------------------------------------------------
CYA7_MOUSE/889-1088     ADC ------------------------------------------------------------
CYA3_RAT/914-1121-70    ADC ------------------------------------------------------------
CYA8_HUMAN/973-1172     ADC ------------------------------------------------------------
CYA8_RAT/970-1169       ADC ------------------------------------------------------------
CYA9_MOUSE/1049-1244    ADC ------------------------------------------------------------
CYAA_DICDI/1180-1356    ADC ------------------------------------------------------------
1ab8a                   ADC ------------------------------------------------------------
1azsb                   ADC ------------------------------------------------------------
AAK45954               pADC NRSPGLVSFIHRARRLLPGDPEFGDPLSTAGEGGPRAAARAADRLLRDRDAASREVGLSV
Z score                     ____________________________________________________________

ANPA_HUMAN/867-1053     GUC -----------------VQAEAFDSVTIYFSDIVGFTALSAESTP----MQVVTLLNDLY
ANPA_MOUSE/863-1049     GUC -----------------VQAEAFDSVTIYFSDIVGFTALSAESTP----MQVVTLLNDLY
ANPA_RAT/863-1049       GUC -----------------VQAEAFDSVTIYFSDIVGFTALSAESTP----MQVVTLLNDLY
ANPB_ANGJA/856-1042     GUC -----------------VQAEAFDSVTIYFSDIVGFTSMSAESTP----LQVVTLLNDLY
ANPB_BOVIN/852-1038     GUC -----------------VQAEAFDSVTIYFSDIVGFTALSAESTP----MQVVTLLNDLY
ANPB_HUMAN/852-1038     GUC -----------------VQAEAFDSVTIYFSDIVGFTALSAESTP----MQVVTLLNDLY
ANPB_RAT/852-1038       GUC -----------------VQAEAFDSVTIYFSDIVGFTALSAESTP----MQVVTLLNDLY
CYG2_RAT/399-582        GUC -----------------VAAGEFETCTILFSDVVTFTNICAACEP----IQIVNMLNSMY
CYG3_BOVIN/473-662      GUC -----------------VQAKRFGNVTMLFSDIVGFTAICSQCSP----LQVITMLNALY
CYG3_CAEEL/889-1077     GUC -----------------VEPEGFDSVTVFFSDVVKFTILASKCSP----FQTVNLLNDLY
CYG3_RAT/471-660        GUC -----------------VQAKKFNEVTMLFSDIVGFTAICSQCSP----LQVITMLNALY
CYG4_HUMAN/512-701      GUC -----------------VQARKFDDVTMLFSDIVGFTAICAQCTP----MQVISMLNELY
CYG5_HUMAN/469-658      GUC -----------------VQAKKFSNVTMLFSDIVGFTAICSQCSP----LQVITMLNALY
CYGD_BOVIN/876-1063     GUC -----------------VEPEYFEEVTLYFSDIVGFTTISAMSEP----IEVVDLLNDLY
CYGD_HUMAN/871-1058     GUC -----------------VEPEYFEQVTLYFSDIVGFTTISAMSEP----IEVVDLLNDLY
CYGE_MOUSE/874-1061     GUC -----------------VEPEYFEEVTLYFSDIVGFTTISAMSEP----IEVVDLLNDLY
CYGE_RAT/874-1061       GUC -----------------VEPEYFEEVTLYFSDIVGFTTISAMSEP----IEVVDLLNDLY
CYGF_HUMAN/875-1062     GUC -----------------VEPEGFDLVTLYFSDIVGFTTISAMSEP----IEVVDLLNDLY
CYGF_RAT/875-1062       GUC -----------------VEPEGFDLVTLYFSDIVGFTTISAMSEP----IEVVDLLNDLY
CYGH_DROME/456-645      GUC -----------------IDAKTYPDVTILFSDIVGFTSICSRATP----FMVISMLEGLY
CYGS_STRPU/905-1091     GUC -----------------VLPETFEMVSIFFSDIVGFTALSAASTP----IQVVNLLNDLY
CYGX_RAT/884-1071       GUC -----------------VEPEYFDQVTIYFSDIVGFTTISALSEP----IEVVGFLNDLY
HSER_HUMAN/815-1002     GUC -----------------VEPELYEEVTIYFSDIVGFTTICKYSTP----MEVVDMLNDIY
HSER_PIG/815-1002       GUC -----------------VEPELYEEVTIYFSDIVGFTTICKYSTP----MEVVDMLNDIY
HSER_RAT/814-1001       GUC -----------------VEPELYEEVTIYFSDIVGFTTICKYSTP----MEVVDMLNDIY
KSGC_RAT/225-411        GUC -----------------VEPEHFESVTIFFSDIVGFTKLCSLSSP----LQVVKLLNDLY
CYG1_BOVIN/412-605      GUC -----------------VPAKRYDNVTILFSGIVGFNAFCSKHASGEGAMKIVNLLNDLY
CYG1_HUMAN/412-605      GUC -----------------VPAKRYDNVTILFSGIVGFNAFCSKHASGEGAMKIVNLLNDLY
CYG1_RAT/412-605        GUC -----------------VPAKRYDNVTILFSGIVGFNAFCSKHASGEGAMKIVNLLNDLY
1azsa                   ADC -----------DMMFHKIYIQKHDNVSILFADIEGFTSLASQCTA----QELVMTLNELF
CYA1_BOVIN/296-457      ADC -----------------IYIQRHDNVSILFADIVGFTGLASQCTA----QELVKLLNELF
CYA1_DROME/266-427      ADC -----------------IYIQKHENVSILFADIVGFTVLSSQCSA----QELVRLLNELF
CYA1_DROME/954-1153     ADC -----------------LYHQSYAKVGVIFASVPNFNEFYTEMDGSDQGLECLRLLNEII
CYA1_HUMAN/14-175       ADC -----------------IYIQRHDNVSILFADIVGFTGLASQCTA----QELVKLLNELF
CYA2_HUMAN/263-463      ADC -----------------LYHQSYDCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEII
CYA2_RAT/280-447        ADC -----------------LYVKRHTNVSILYADIVGFTRLASDCSP----GELVHMLNELF
CYA2_RAT/877-1077       ADC -----------------LYHQSYDCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEII
CYA3_RAT/310-472        ADC -----------------MYMYRHENVSILFADIVGFTQLSSACSA----QELVKLLNELF
CYA3_RAT/914-1121       ADC -----------------LYSQSYDEIGVMFASLPNFADFYTEESINNGGIECLRFLNEII
CYA4_RAT/264-418        ADC -----------------LYVKRHQGVSVLYADIVGFTRLASECSP----KELVLMLNELF
CYA4_RAT/853-1053       ADC -----------------LYHQSYECVCVLFASIPDFKEFYSESNINHEGLECLRLLNEII
CYA5_CANFA/382-543      ADC -----------------IYIQKHDNVSILFADIEGFTSLASQCTA----QELVMTLNELF
CYA5_RABIT/463-624      ADC -----------------IYIQKHDNVSILFADIEGFTSLASQCTA----QELVMTLNELF
CYA5_RAT/297-458        ADC -----------------IYIQKHDNVSILFADIEGFTSLASQCTA----QELVMTLNELF
CYA6_CANFA/368-528      ADC -----------------IYIQKHDNVSILFADIEGFTSLASQCTA----QELVMTLNELF
CYA6_MOUSE/368-528      ADC -----------------IYIQKHDNVSILFADIEGFTSLASQCTA----QELVMTLNELF
CYA6_RAT/368-529        ADC -----------------IYIQKHDNVSILFADIEGFTSLASQCTA----QELVMTLNELF
CYA7_HUMAN/270-432      ADC -----------------LYVKRHQNVSILYADIVGFTQLASDCSP----KELVVVLNELF
CYA7_MOUSE/272-434      ADC -----------------LYVKRHQNVSILYADIVGFTRLASDCSP----KELVVVLNELF
CYA8_HUMAN/405-589      ADC -----------------IYIHRYENVSILFADVKGFTNLSTTLSA----QELVRMLNELF
CYA8_RAT/402-586        ADC -----------------IYIHRYENVSILFADVKGFTNLSTTLSA----QELVRMLNELF
CYA9_MOUSE/385-565      ADC -----------------FKMQQIEEVSILFADIVGFTKMSANKSA----HALVGLLNDLF
CYAA_ANACY/311-502      ADC -----------------VGAASTRRMTILFCDIRGYTSMSEAMEP----IEIFRFLNDYL
CYAG_DICDI/387-572      ADC -----------------VVAERSNNACVFFLDIAGFTRFSSIHSP----EQVIQVLIKIF
CYA1_BOVIN/862-1059     ADC -----------------LYYQSYSQVGVMFASIPNFNDFYIELDGNNMGVECLRLLNEII
CYA1_HUMAN/579-776      ADC -----------------LYYQSYSQVGVMFASIPNFNDFYIELDGNNMGVECLRLLNEII
CYA5_CANFA/985-1179     ADC -----------------LYYQSCECVAVMFASIANFSEFYVELEANNEGVECLRVLNEII
CYA5_RABIT/1065-1259    ADC -----------------LYYQSCECVAVMFASIANFSEFYVELEANNEGVECLRLLNEII
CYA5_RAT/899-1093       ADC -----------------LYYQSCECVAVMFASIANFSEFYVELEANNEGVECLRLLNEII
CYA6_CANFA/967-1161     ADC -----------------LYYQSCECVAVMFASIANFSEFYVELEANNEGVECLRLLNEII
CYA6_MOUSE/967-1161     ADC -----------------LYYQSCECVAVMFASIANFSEFYVELEANNEGVECLRLLNEII
CYA6_RAT/968-1162       ADC -----------------LYYQSCECVAVMFASIANFSEFYVELEANNEGVECLRLLNEII
CYA1_DROME/954-1153-64  ADC -----------------LYHQSYAKVGVIFASVPNFNEFYTEMDGSDQGLECLRLLNEII
CYA2_HUMAN/263-463-65   ADC -----------------LYHQSYDCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEII
CYA2_RAT/877-1077-66    ADC -----------------LYHQSYDCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEII
CYA4_RAT/853-1053-67    ADC -----------------LYHQSYECVCVLFASIPDFKEFYSESNINHEGLECLRLLNEII
CYA7_HUMAN/870-1069     ADC -----------------WYHQSYDCVCVMFASVPDFKVFYTECDVNKEGLECLRLLNEII
CYA7_MOUSE/889-1088     ADC -----------------WYHQSYDCVCVMFASVPDFKVFYTECDVNKEGLECLRLLNEII
CYA3_RAT/914-1121-70    ADC -----------------LYSQSYDEIGVMFASLPNFADFYTEESINNGGIECLRFLNEII
CYA8_HUMAN/973-1172     ADC -----------------LYSQSYDAVGVMFASIPGFADFYSQTEMNNQGVECLRLLNEII
CYA8_RAT/970-1169       ADC -----------------LYSQSYDAVGVMFASIPGFADFYSQTEMNNQGVECLRLLNEII
CYA9_MOUSE/1049-1244    ADC -----------------TYSKNHDSGGVIFASIVNFSEFYEENY--EGGKECYRVLNELI
CYAA_DICDI/1180-1356    ADC -----------------VYVQPHQDVSIMFIQIAGFQEYDEPKD-------LIKKLNDIF
1ab8a                   ADC -----------------LYHQSYDCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEII
1azsb                   ADC -------------------HQSYDCVCVMFASIPDFKEFYTESDVNKEGLECLRLLNEII
AAK45954               pADC LQVWQALTEAVSRRPAN------PEVTLVFTDLVGFSTWSLHAGD----DATLTLLRQVA
Z score                     _________________-41001---1-1-1--0---001-00-0____1-1-1-----2

ANPA_HUMAN/867-1053     GUC TCFDAVIDNFD---VYKVETIGDAYMVVSGLPVRNGRL-----------------HACEV
ANPA_MOUSE/863-1049     GUC TCFDAVIDNFD---VYKVETIGDAYMVVSGLPVRNGQL-----------------HAREV
ANPA_RAT/863-1049       GUC TCFDAVIDNFD---VYKVETIGDAYMVVSGLPVRNGQL-----------------HAREV
ANPB_ANGJA/856-1042     GUC TCFDAIIDNFD---VYKVETIGDAYMVVSDSQSRNGKL-----------------HAREI
ANPB_BOVIN/852-1038     GUC TCFDAIIDNFD---VYKVETIGDAYMVVSGLPGRNGQR-----------------HAPEI
ANPB_HUMAN/852-1038     GUC TCFDAIIDNFD---VYKVETIGDAYMVVSGLPGRNGQR-----------------HAPEI
ANPB_RAT/852-1038       GUC TCFDAIIDNFD---VYKVETIGDAYMVVSGLPGRNGQR-----------------HAPEI
CYG2_RAT/399-582        GUC SKFDRLTSVHD---VYKVETIGDAYMVVGGVPVPVES------------------HAQRV
CYG3_BOVIN/473-662      GUC TRFDRQCGELD---VYKVETIGDAYCVAGGLHKESDT------------------HAVQI
CYG3_CAEEL/889-1077     GUC SNFDTIIEQHG---VYKVESIGDGYLCVSGLPTRNGYA-----------------HIKQI
CYG3_RAT/471-660        GUC TRFDQQCGELD---VYKVETIGDAYCVAGGLHRESDT------------------HAVQI
CYG4_HUMAN/512-701      GUC TRFDHQCGFLD---IYKVETIGDAYCVAAGLHRKSLC------------------HAKPI
CYG5_HUMAN/469-658      GUC TRFDQQCGELD---VYKVETIAMPIVWLGGLHKESDT------------------HAVQI
CYGD_BOVIN/876-1063     GUC TLFDAIIGSHD---VYKVETIGDAYMVASGLPQRNGHR-----------------HAAEI
CYGD_HUMAN/871-1058     GUC TLFDAIIGSHD---VYKVETIGDAYMVASGLPQRNGQR-----------------HAAEI
CYGE_MOUSE/874-1061     GUC TLFDAIIGAHD---VYKVETIGDAYMVASGLPQRNGQR-----------------HAAEI
CYGE_RAT/874-1061       GUC TLFDAIIGSHD---VYKVETIGDAYMVASGLPQRNGQR-----------------HAAEI
CYGF_HUMAN/875-1062     GUC TLFDAIIGSHD---VYKVETIGDAYMVASGLPKRNGSR-----------------HAAEI
CYGF_RAT/875-1062       GUC TLFDAIIGSHD---VYKVETIGDAYMVASGLPKRNGSR-----------------HAAEI
CYGH_DROME/456-645      GUC KDFDEFCDFFD---VYKVETIGDAYCVASGLHRASIY------------------DAHRC
CYGS_STRPU/905-1091     GUC TLFDAIISNYD---VYKVETIGDAYMLVSGLPLRNGDR-----------------HAGQI
CYGX_RAT/884-1071       GUC TMFDAVLDSHD---VYKVETIGDAYMVASGLPRRNGNR-----------------HAAEI
HSER_HUMAN/815-1002     GUC KSFDHIVDHHD---VYKVETIGDAYMVASGLPKRNGNR-----------------HAIDI
HSER_PIG/815-1002       GUC KSFDHILDHHD---VYKVETIGDAYMVASGLPKRNGNR-----------------HAIDI
HSER_RAT/814-1001       GUC KSFDQIVDHHD---VYKVETIGDAYVVASGLPMRNGNR-----------------HAVDI
KSGC_RAT/225-411        GUC SLFDHTIQTHD---VYKVETIGDAYMVASGLPIRNGAQ-----------------HADEI
CYG1_BOVIN/412-605      GUC TRFDTLTDSRKNPFVYKVETVGDKYMTVSGLPEPCIH------------------HARSI
CYG1_HUMAN/412-605      GUC TRFDTLTDSRKNPFVYKVETVGDKYMTVSGLPEPCIH------------------HARSI
CYG1_RAT/412-605        GUC TRFDTLTDSRKNPFVYKVETVGDKYMTVSGLPEPCIH------------------HARSI
1azsa                   ADC ARFDKLAAENH---CLRIKILGDCYYCVSGLPEARAD------------------HAHCC
CYA1_BOVIN/296-457      ADC GKFDELATENH---CRRIKILGDCYYCVSGLTQPKTD------------------HAHCC
CYA1_DROME/266-427      ADC GRFDQLAHDNH---CLRIKILGDCYYCVSGLPEPRKD------------------HAKCA
CYA1_DROME/954-1153     ADC ADFDELLKEDRFRGIDKIKTVGSTYMAVVGLIPEYKIQPNDP--------NSVRRHMTAL
CYA1_HUMAN/14-175       ADC GKFDELATENH---CRRIKILGDCYYCVSGLTQPKTD------------------HAHCC
CYA2_HUMAN/263-463      ADC ADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAVPSQEHSQEP-------ERQYMHIGTM
CYA2_RAT/280-447        ADC GKFDQIAKENE---CMRIKILGDCYYCVSGLPISLPN------------------HAKNC
CYA2_RAT/877-1077       ADC ADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAIPSQEHAQEP-------ERQYMHIGTM
CYA3_RAT/310-472        ADC ARFDKLAAKYH---QLRIKILGDCYYCICGLPDYRED------------------HAVCS
CYA3_RAT/914-1121       ADC SDFDSLLDNPKFRVITKIKTIGSTYMAASGVTPDVNTNGFTSSSKEEKSDKERWQHLADL
CYA4_RAT/264-418        ADC GKFDQIAKEHE---CMRIKILGDCYYCVSGLPLSLPD------------------HAINC
CYA4_RAT/853-1053       ADC ADFDELLSKPKFSGVEKIKTIGSTYMAATGLNATPGQDTQQDA-------ERSCSHLGTM
CYA5_CANFA/382-543      ADC ARFDKLAAENH---CLRIKILGDCYYCVSGLPEARAD------------------HAHCC
CYA5_RABIT/463-624      ADC ARFDKLAAENH---CLRIKILGDCYYCVSGLPEARAD------------------HAHCC
CYA5_RAT/297-458        ADC ARFDKLAAENH---CLRIKILGDCYYCVSGLPEARAD------------------HAHCC
CYA6_CANFA/368-528      ADC ARFDKLAAENH---CLRIKILGDCYYCVSGLPEARAD------------------HAHCC
CYA6_MOUSE/368-528      ADC ARFDKLAAENH---CLRIKILGDCYYCVSGLPEARAD------------------HAHCC
CYA6_RAT/368-529        ADC ARFDKLAAENH---CLRIKILGDCYYCVSGLPEARAD------------------HAHCC
CYA7_HUMAN/270-432      ADC GKFDQIAKANE---CMRIKILGDCYYCVSGLPVSLPT------------------HARNC
CYA7_MOUSE/272-434      ADC GKFDQIAKANE---CMRIKILGDCYYCVSGLPVSLPT------------------HARNC
CYA8_HUMAN/405-589      ADC ARFDRLAHEHH---CLRIKILGDCYYCVSGLPEPRQD------------------HAHCC
CYA8_RAT/402-586        ADC ARFDRLAHEHH---CLRIKILGDCYYCVSGLPEPRQD------------------HAHCC
CYA9_MOUSE/385-565      ADC GRFDRLCEQTK---CEKISTLGDCYYCVAGCPEPRAD------------------HAYCC
CYAA_ANACY/311-502      ADC ACMGKAIDEAG---GFIDKYIGDAIMALFDDGNTDCAL-----------------HAAIL
CYAG_DICDI/387-572      ADC NSMDLLCAKHG---IEKIKTIGDAYMATCGIFPKCDDIR---------------HNTYKM
CYA1_BOVIN/862-1059     ADC ADFDELMDKDFYKDLEKIKTIGSTYMAAVGLAPTAGTKAKK----------CISSHLSTL
CYA1_HUMAN/579-776      ADC ADFDELMEKDFYKDIEKIKTIGSTYMAAVGLAPTSGTKAKK----------SISSHLSTL
CYA5_CANFA/985-1179     ADC ADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVG-------------KTHIKAL
CYA5_RABIT/1065-1259    ADC ADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKVG-------------KTHIKAL
CYA5_RAT/899-1093       ADC ADFDEIISEDRFRQLEKIKTIGSTYMAASGLNDSTYDKAG-------------KTHIKAL
CYA6_CANFA/967-1161     ADC ADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQAG-------------RSHITAL
CYA6_MOUSE/967-1161     ADC ADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQVG-------------RSHITAL
CYA6_RAT/968-1162       ADC ADFDEIISEERFRQLEKIKTIGSTYMAASGLNASTYDQVG-------------RSHITAL
CYA1_DROME/954-1153-64  ADC ADFDELLKEDRFRGIDKIKTVGSTYMAVVGLIPEYKIQPNDP--------NSVRRHMTAL
CYA2_HUMAN/263-463-65   ADC ADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAVPSQEHSQEP-------ERQYMHIGTM
CYA2_RAT/877-1077-66    ADC ADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAIPSQEHAQEP-------ERQYMHIGTM
CYA4_RAT/853-1053-67    ADC ADFDELLSKPKFSGVEKIKTIGSTYMAATGLNATPGQDTQQDA-------ERSCSHLGTM
CYA7_HUMAN/870-1069     ADC ADFDELLLKPKFSGVEKIKTIGSTYMAAAGLSVASGHENQEL--------ERQHAHIGVM
CYA7_MOUSE/889-1088     ADC ADFDELLLKPKFSGVEKIKTIGSTYMAAAGLSAPSGHENQDL--------ERKHVHIGVL
CYA3_RAT/914-1121-70    ADC SDFDSLLDNPKFRVITKIKTIGSTYMAASGVTPDVNTNGFTSSSKEEKSDKERWQHLADL
CYA8_HUMAN/973-1172     ADC ADFDELLGEDRFQDIEKIKTIGSTYMAVSGLSPEKQQCED------------KWGHLCAL
CYA8_RAT/970-1169       ADC ADFDELLGEDRFQDIEKIKTIGSTYMAVSGLSPEKQQCED------------KWGHLCAL
CYA9_MOUSE/1049-1244    ADC GDFDELLSKPDYNSIEKIKTIGATYMAASGLNTAQCQEGGH-----------PQEHLRIL
CYAA_DICDI/1180-1356    ADC SFFDGLLNQKYGGTVEKIKTIGNTYMAVSGLDGSPSF-------------------LEKM
1ab8a                   ADC ADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAIPSQQY---------------MHIGTM
1azsb                   ADC ADFDDLLSKPKFSGVEKIKTIGSTYMAATGLSAIRQYM-----------------HIGTM
AAK45954               pADC RAVESPLLDAG---GHIVKRLGDGIMAVFRNPTVALR-------------------AVLV
Z score                     1---0---111___03-12----1-01----00100-__________________---11

ANPA_HUMAN/867-1053     GUC ARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVV-GLKMPRYC--LFGDTVNTA
ANPA_MOUSE/863-1049     GUC ARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVV-GLKMPRYC--LFGDTVNTA
ANPA_RAT/863-1049       GUC ARMALALLDAVRSFRIRHRPQEQLRLRIGIHTGPVCAGVV-GLKMPRYC--LFGDTVNTA
ANPB_ANGJA/856-1042     GUC AGMSLALLEQVKTFKIRHRPNDQLRLRIGIHTGPVCAGVV-GLKMPRYC--LFGDTVNTA
ANPB_BOVIN/852-1038     GUC ARMALALLDAVSSFRIRHRPHDQLRLRIGVHTGPVCAGVV-GLKMPRYC--LFGDTVNTA
ANPB_HUMAN/852-1038     GUC ARMALALLDAVSSFRIRHRPHDQLRLRIGVHTGPVCAGVV-GLKMPRYC--LFGDTVNTA
ANPB_RAT/852-1038       GUC ARMALALLDAVSSFRIRHRPHDQLRLRIGVHTGPVCAGVV-GLKMPRYC--LFGDTVNTA
CYG2_RAT/399-582        GUC ANFALGMRISAKEVMNPV-TGEPIQIRVGIHTGPVLAGVV-GDKMPRYC--LFGDTVNTA
CYG3_BOVIN/473-662      GUC ALMALKMMELSHEVVSPH--GEPIKMRIGLHSGSVFAGVV-GVKMPRYC--LFGNNVTLA
CYG3_CAEEL/889-1077     GUC VDMSLKFMEYCKSFNIPHLPRENVELRIGVNSGPCVAGVV-GLSMPRYC--LFGDTVNTA
CYG3_RAT/471-660        GUC ALMALKMMELSNEVMSPH--GEPIKMRIGLHSGSVFAGVV-GVKMPRYC--LFGNNVTLA
CYG4_HUMAN/512-701      GUC ALMALKMMELSEEVLTPD--GRPIQMRIGIHSGSVLAGVV-GVRMPRYC--LFGNNVTLA
CYG5_HUMAN/469-658      GUC ALMALKMMELSDEVMSPH--GEPIKMRIGLHSGSVFAGVV-GVKMPRYC--LFGNNVTLA
CYGD_BOVIN/876-1063     GUC ANMALDILSAVGTFRMRHMPEVPVRIRIGLHSGPCVAGVV-GLTMPRYC--LFGDTVNTA
CYGD_HUMAN/871-1058     GUC ANMSLDILSAVGTFRMRHMPEVPVRIRIGLHSGPCVAGVV-GLTMPRYC--LFGDTVNTA
CYGE_MOUSE/874-1061     GUC ANMSLDILSAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVV-GLTMPRYC--LFGDTVNTA
CYGE_RAT/874-1061       GUC ANMSLDILSAVGSFRMRHMPEVPVRIRIGLHSGPCVAGVV-GLTMPRYC--LFGDTVNTA
CYGF_HUMAN/875-1062     GUC ANMSLDILSSVGTFKMRHMPEVPVRIRIGLHSGPVVAGVV-GLTMPRYC--LFGDTVNTA
CYGF_RAT/875-1062       GUC ANMSLDILSSVGTFKMRHMPEVPVRIRIGLHTGPVVAGVV-GLTMPRYC--LFGDTVNTA
CYGH_DROME/456-645      GUC LD-GLKMIDACSKHITHD--GEQIKMRIGLHTGTVLAGVV-GRKMPRYC--LFGHSVTIA
CYGS_STRPU/905-1091     GUC ASTAHHLLESVKGFIVPHKPEVFLKLRIGIHSGSCVAGVV-GLTMPRYC--LFGDTVNTA
CYGX_RAT/884-1071       GUC ANMALEILSYAGNFRMRHAPDVPIRVRAGLHSGPCVAGVV-GLTMPRYC--LFGDTVNTA
HSER_HUMAN/815-1002     GUC AKMALEILSFMGTFELEHLPGLPIWIRIGVHSGPCAAGVV-GIKMPRYC--LFGDTVNTA
HSER_PIG/815-1002       GUC AKMALDILSFMGTFELEHLPGLPIWIRIGIHSGPCAAGVV-GIKMPRYC--LFGDTVNTA
HSER_RAT/814-1001       GUC SKMALDILSFMGTFELEHLPGLPVWIRIGVHSGPCAAGVV-GIKMPRYC--LFGDTVNTA
KSGC_RAT/225-411        GUC ATMSLHLLSVTTNFQIGHMPEERLKLRIGLHTGPVVAGVV-GITMPRYC--LFGDTVNMA
CYG1_BOVIN/412-605      GUC CHLALDMMEIAGQVQVD---GESVQITIGIHTGEVVTGVI-GQRMPRYC--LFGNTVNLT
CYG1_HUMAN/412-605      GUC CHLALDMMEIAGQVQVD---GESVQITIGIHTGEVVTGVI-GQRMPRYC--LFGNTVNLT
CYG1_RAT/412-605        GUC CHLALDMMEIAGQVQVD---GESVQITIGIHTGEVVTGVI-GQRMPRYC--LFGNTVNLT
1azsa                   ADC VEMGMDMIEAISLVREMT--GVNVNMRVGIHSGRVHCGVL-GLRKWQFD--VWSNDVTLA
CYA1_BOVIN/296-457      ADC VEMGLDMIDTITSVAEAT--EVDLNMRVGLHTGRVLCGVL-GLRKWQYD--VWSNDVTLA
CYA1_DROME/266-427      ADC VEMGLDMIDAIATVVEAT--DVILNMRVGIHTGRVLCGVL-GLRKWQFD--VWSNDVTLA
CYA1_DROME/954-1153     ADC IEYVKAMRHSLQEINSHS--YNNFMLRVGINIGPVVAGVI-GARKPQYD--IWGNTVNVA
CYA1_HUMAN/14-175       ADC VEMGLDMIDTITSVAEAT--EVDLNMRVGLHTGRVLCGVL-GLRKWQYD--VWSNDVTLA
CYA2_HUMAN/263-463      ADC VEFAFALVGKLDAINKHS--FNDFKLRVGINHGPVIAGVI-GAQKPQYD--IWGNTVNVA
CYA2_RAT/280-447        ADC VKMGLDMCEAIKKVRDAT--GVDINMRVGVHSGNVLCGVI-GLQKWQYD--VWSHDVTLA
CYA2_RAT/877-1077       ADC VEFAYALVGKLDAINKHS--FNDFKLRVGINHGPVIAGVI-GAQKPQYD--IWGNTVNVA
CYA3_RAT/310-472        ADC ILMGLAMVEAISYVREKT--KTGVDMRVGVHTGTVLGGVL-GQKRWQYD--VWSTDVTVA
CYA3_RAT/914-1121       ADC ADFALAMKDTLTNINNQS--FNNFMLRIGMNKGGVLAGVI-GARKPHYD--IWGNTVNVA
CYA4_RAT/264-418        ADC VRMGLDMCRAIRKLRVAT--GVDINMRVGVHSGSVLCGVI-GLQKWQYD--VWSHDVTLA
CYA4_RAT/853-1053       ADC VEFAVALGSKLGVINKHS--FNNFRLRVGLNHGPVVAGVI-GAQKPQYD--IWGNTVNVA
CYA5_CANFA/382-543      ADC VEMGMDMIEAISLVREVT--GVNVNMRVGIHSGRVHCGVL-GLRKWQFD--VWSNDVTLA
CYA5_RABIT/463-624      ADC VEMGMDMIEAISLVREVT--GVNVNMRVGIHSGRVHCGVL-GLRKWQFD--VWSNDVTLA
CYA5_RAT/297-458        ADC VEMGMDMIEAISSVREVT--GVNVNMRVGIHSGRVHCGVL-GLRKWQFD--VWSNDVTLA
CYA6_CANFA/368-528      ADC VEMGVDMIEAISLVREVT--GVNVNMRVGIHSGRVHCGVL-GLRKWQFD--VWSNDVTLA
CYA6_MOUSE/368-528      ADC VEMGVDMIEAISLVREVT--GVNVNMRVGIHSGRVHCGVL-GLRKWQFD--VWSNDVTLA
CYA6_RAT/368-529        ADC VEMGVDMIEAISLVREVT--GVNVNMRVGIHSGRVHCGVL-GLRKWQFD--VWSNDVTLA
CYA7_HUMAN/270-432      ADC VKMGLDMCQAIKQVREAT--GVDINMRVGIHSGNVLCGVI-GLRKWQYD--VWSHDVSLA
CYA7_MOUSE/272-434      ADC VKMGLDICEAIKQVREAT--GVDISMRVGIHSGNVLCGVI-GLRKWQYD--VWSHDVSLA
CYA8_HUMAN/405-589      ADC VEMGLSMIKTIRYVRSRT--KHDVDMRIGIHSGSVLCGVL-GLRKWQFD--VWSWDVDIA
CYA8_RAT/402-586        ADC VEMGLSMIKTIRFVRSRT--KHDVDMRIGIHSGSVLCGVL-GLRKWQFD--VWSWDVDIA
CYA9_MOUSE/385-565      ADC IEMGLGMIKAIEQFCQEK--KEMVNMRVGVHTGTVLCGIL-GMRRFKFD--VWSNDVNLA
CYAA_ANACY/311-502      ADC MQQALDKFNDERSMQTGKTGLPRISVGIGIHRGTVVMGTV-GFTS-RIDSTVIGDAVNVA
CYAG_DICDI/387-572      ADC LGFAMDVLEFIPKEMSFH---LGLQVRVGIHCGPVISGVISGYAKPHFD--VWGDTVNVA
CYA1_BOVIN/862-1059     ADC ADFAIEMFDVLDEINYQS--YNDFVLRVGINVGPVVAGVI-GARRPQYD--IWGNTVNVA
CYA1_HUMAN/579-776      ADC ADFAIEMFDVLDEINYQS--YNDFVLRVGINVGPVVAGVI-GARRPQYD--IWGNTVNVA
CYA5_CANFA/985-1179     ADC ADFAMKLMDQMKYINEHS--FNNFQMKIGLNIGPVVAGVI-GARKPQYD--IWGNTVNVA
CYA5_RABIT/1065-1259    ADC ADFAMKLMDQMKYINEHS--FNNFQMKIGLNIGPVVAGVI-GARKPQYD--IWGNTVNVA
CYA5_RAT/899-1093       ADC ADFAMKLMDQMKYINEHS--FNNFQMKIGLNIGPVVAGVI-GARKPQYD--IWGNTVNVA
CYA6_CANFA/967-1161     ADC ADYAMRLMEQMKHINEHS--FNNFQMKIGLNMGPVVAGVI-GARKPQYD--IWGNTVNVS
CYA6_MOUSE/967-1161     ADC ADYAMRLMEQMKHINEHS--FNNFQMKIGLNMGPVVAGVI-GARKPQYD--IWGNTVNVS
CYA6_RAT/968-1162       ADC ADYAMRLMEQMKHINEHS--FNNFQMKIGLNMGPVVAGVI-GARKPQYD--IWGNTVNVS
CYA1_DROME/954-1153-64  ADC IEYVKAMRHSLQEINSHS--YNNFMLRVGINIGPVVAGVI-GARKPQYD--IWGNTVNVA
CYA2_HUMAN/263-463-65   ADC VEFAFALVGKLDAINKHS--FNDFKLRVGINHGPVIAGVI-GAQKPQYD--IWGNTVNVA
CYA2_RAT/877-1077-66    ADC VEFAYALVGKLDAINKHS--FNDFKLRVGINHGPVIAGVI-GAQKPQYD--IWGNTVNVA
CYA4_RAT/853-1053-67    ADC VEFAVALGSKLGVINKHS--FNNFRLRVGLNHGPVVAGVI-GAQKPQYD--IWGNTVNVA
CYA7_HUMAN/870-1069     ADC VEFSIALMSKLDGINRHS--FNSFRLRVGINHGPVIAGVI-GARKPQYD--IWGNTVNVA
CYA7_MOUSE/889-1088     ADC VEFSMALMSKLDGINRHS--FNSFRLRVGINHGPVIAGVI-GARKPQYD--IWGNTVNVA
CYA3_RAT/914-1121-70    ADC ADFALAMKDTLTNINNQS--FNNFMLRIGMNKGGVLAGVI-GARKPHYD--IWGNTVNVA
CYA8_HUMAN/973-1172     ADC ADFSLALTESIQEINKHS--FNNFELRIGISHGSVVAGVI-GAKKPQYD--IWGKTVNLA
CYA8_RAT/970-1169       ADC ADFSLALTESIQEINKHS--FNNFELRIGISHGSVVAGVI-GAKKPQYD--IWGKTVNLA
CYA9_MOUSE/1049-1244    ADC FEFAKEMMRVVDDFNNN-MLWFNFKLRVGFNHGPLTAGVI-GTTKLLYD--IWGDTVNIA
CYAA_DICDI/1180-1356    ADC SDFALDVKAYTNSVAISR------VVRIGISHGPLVAGCI-GISRAKFD--VWGDTANTA
1ab8a                   ADC VEFAYALVGKLDAINKHS--FNDFKLRVGINHGPVIAGVI-GAQKPQYD--IWGNTVNVA
1azsb                   ADC VEFAYALVGKLDAINKHS--FNDFKLRVGINHGPVIAGVI-GAQKPQYD--IWGNTVNVA
AAK45954               pADC AQ------DAVKSLEVQG--YTP-RMRIGIHTGR--------PQRLAAD--WLGVDVNIA
Z score                     -------0--1--0-001__10100------0--0-0---_-0-311-6__15-----0-

ANPA_HUMAN/867-1053     GUC SR-MESNGEAL--KIHLSSETKAV-----LEEFGG------FELE--LR-GDVEMKGK--
ANPA_MOUSE/863-1049     GUC SR-MESNGEAL--RIHLSSETKAV-----LEEFDG------FELE--LR-GDVEMKGK--
ANPA_RAT/863-1049       GUC SR-MESNGEAL--KIHLSSETKAV-----LEEFDG------FELE--LR-GDVEMKGK--
ANPB_ANGJA/856-1042     GUC SR-MESNGEAL--KIHLSSATKEV-----LDEFGY------FDLQ--LR-GDVEMKGK--
ANPB_BOVIN/852-1038     GUC SR-MESNGQAL--KIHVSSTTKDA-----LDELGC------FQLE--LR-GDVEMKGK--
ANPB_HUMAN/852-1038     GUC SR-MESNGQAL--KIHVSSTTKDA-----LDELGC------FQLE--LR-GDVEMKGK--
ANPB_RAT/852-1038       GUC SR-MESNGQAL--KIHVSSTTKDA-----LDELGC------FQLE--LR-GDVEMKGK--
CYG2_RAT/399-582        GUC SR-MESHGLPS--KVHLSPTAHRA-----LKNKG-------FEIV--RR-GEIEVKGK--
CYG3_BOVIN/473-662      GUC NK-FESCSVPR--KINVSPTTYRL-----LKDCPGFVFTPRSREE--LP-PNFPSDIP--
CYG3_CAEEL/889-1077     GUC SR-MESNGKPS--LIHLTNDAHSL-----LTTHYP----NQYETS--SR-GEVIIKGK--
CYG3_RAT/471-660        GUC NK-FESCSVPR--KINVSPTTYRL-----LKDCPGFVFTPRSREE--LP-PNFPSDIP--
CYG4_HUMAN/512-701      GUC SK-FESGSHPR--RINVSPTTYQL-----LKREESFTFIPRSREE--LP-DNFPKEIP--
CYG5_HUMAN/469-658      GUC NK-FESCSVPR--KINVSPTTYRL-----LKDCPGFVFTPRSREE--LP-PNFPSEIP--
CYGD_BOVIN/876-1063     GUC SA-MESTGLPY--RIHVNRSTVQI-----LSALNE-----GFLTE--VR-GRTELKGK--
CYGD_HUMAN/871-1058     GUC SR-MESTGLPY--RIHVNLSTVGI-----LRALDS-----GYQVE--LR-GRTELKGK--
CYGE_MOUSE/874-1061     GUC SR-MESTGLPY--RIHVNMSTVRI-----LRSLDQ-----GFQME--CR-GRTELKGK--
CYGE_RAT/874-1061       GUC SR-MESTGLPY--RIHVNMSTVRI-----LRALDQ-----GFQME--CR-GRTELKGK--
CYGF_HUMAN/875-1062     GUC SR-MESTGLPY--RIHVSLSTVTI-----LQNLSE-----GYEVE--LR-GRTELKGK--
CYGF_RAT/875-1062       GUC SR-MESTGLPY--RIHVSLSTVTI-----LRTLSE-----GYEVE--LR-GRTELKGK--
CYGH_DROME/456-645      GUC NK-FESGSEAL--KINVSPTTKDW-----LTKHEG------FEFE--LQ-PRDPSFLPKE
CYGS_STRPU/905-1091     GUC SR-MESNGLAL--RIHVSPWCKQV-----LDKLGG------YELE--DR-GLVPMNGK--
CYGX_RAT/884-1071       GUC SR-MESTGLPY--RIHVSRNTVQA-----LLSLDE-----GYKID--VR-GQTELKGK--
HSER_HUMAN/815-1002     GUC SR-MESTGLPL--RIHVSGSTIAI-----LKRTEC-----QFLYE--VR-GETYLKGR--
HSER_PIG/815-1002       GUC SR-MESTGLPL--RIHVSGSTIAI-----LKRTEC-----QFLYE--VR-GETYLKGR--
HSER_RAT/814-1001       GUC SR-MESTGLPL--RIHMSSSTIAI-----LRRTDC-----QFLYE--VR-GETYLKGR--
KSGC_RAT/225-411        GUC SR-MESSSLPL--RIHVSQSTARA-----LLVAGG------YHLQ--KR-GTISVKGK--
CYG1_BOVIN/412-605      GUC SR-TETTGEKG--KINVSEYTYRC-----LMTPENS--DPQFHLE--HR-GPVSMKGK--
CYG1_HUMAN/412-605      GUC SR-TETTGEKG--KINVSEYTYRC-----LMSPENS--DPQFHLE--HR-GPVSMKGK--
CYG1_RAT/412-605        GUC SR-TETTGEKG--KINVSEYTYRC-----LMSPENS--DPQFHLE--HR-GPVSMKGK--
1azsa                   ADC NH-MEAGGKAG--RIHITKATLSY-----LNG-D-------YEVEPGCG-GERNAYLKEH
CYA1_BOVIN/296-457      ADC NV-MEAAGLPG--KVHITKTTLAC-----LNG-D-------YEVE---------------
CYA1_DROME/266-427      ADC NH-MESGGEPG--RVHVTRATLDS-----LSG-E-------YEVE---------------
CYA1_DROME/954-1153     ADC SR-MDSTGVPG--YSQVTQEVVDS-----LVGSH-------FEFR--CR-GTIKVKGK--
CYA1_HUMAN/14-175       ADC NV-MEAAGLPG--KVHITKTTLAC-----LNG-D-------YEVE---------------
CYA2_HUMAN/263-463      ADC SR-MDSTGVLD--KIQVTEETSLV-----LQTLG-------YTCT--CR-GIINVKGK--
CYA2_RAT/280-447        ADC NH-MEAGGVPG--RVHISSVTLEH-----LNGAY-------KVEE--GD-GEI-------
CYA2_RAT/877-1077       ADC SR-MDSTGVLD--KIQVTEETSLI-----LQTLG-------YTCT--CR-GIINVKGK--
CYA3_RAT/310-472        ADC NK-MEAGGIPG--RVHISQSTMDC-----LKG-E-------FDVE------------P--
CYA3_RAT/914-1121       ADC SR-MESTGVMG--NIQVVEETQVI-----LREYG-------FRFV--RR-GPIFVKGK--
CYA4_RAT/264-418        ADC NH-MEAGGVPG--RVHITGATLAL-----L------------------------------
CYA4_RAT/853-1053       ADC SR-MESTGVLG--KIQVTEETARA-----LQSLG-------YTCY--SR-GVIKVKGK--
CYA5_CANFA/382-543      ADC NH-MEAGGKAG--RIHITKATLSY-----LNG-D-------YEVE---------------
CYA5_RABIT/463-624      ADC NH-MEAGGKAG--RIHITKATLNY-----LNG-D-------YEVE---------------
CYA5_RAT/297-458        ADC NH-MEAGGKAG--RIHITKATLNY-----LNG-D-------YEVE---------------
CYA6_CANFA/368-528      ADC NH-MEAAR-AG--RIHITRATLQY-----LNG-D-------YEVE---------------
CYA6_MOUSE/368-528      ADC NH-MEAGG-GR--RIHITRATLQY-----LNG-D-------YEVE---------------
CYA6_RAT/368-529        ADC NH-MEAGGRAG--RIHITRATLQY-----LNG-D-------YEVE---------------
CYA7_HUMAN/270-432      ADC NR-MEAAGVPG--RVHITEATLKH-----LDK-A-------YEVE-------------D-
CYA7_MOUSE/272-434      ADC NR-MEAAGVPG--RVHITEATLNH-----LDK-A-------YEVE-------------D-
CYA8_HUMAN/405-589      ADC NK-LESGGIPG--RIHISKATLDC-----LNGDYN-----VEEGH--GK-ERNEFLRK--
CYA8_RAT/402-586        ADC NK-LESGGIPG--RIHISKATLDC-----LSGDYN-----VEEGH--GK-ERNEFLRK--
CYA9_MOUSE/385-565      ADC NL-MEQLGVAG--KVHISEATAKY-----LDD-R-------YEME---D-GRVIERLG--
CYAA_ANACY/311-502      ADC SR-IEGLTKQYGCNILITESVVRN-----LSCPES-----FSLRL--ID-KSVKVKGKD-
CYAG_DICDI/387-572      ADC SR-MESTGIAG--QIHVSDRVYQL-----GKE-D-------FNFS--ERCDIIHVKGK--
CYA1_BOVIN/862-1059     ADC SR-MDSTGVQG--RIQVTEEVHRL-----LRR-GS------YRFV--CR-GKVSVKGK--
CYA1_HUMAN/579-776      ADC SR-MDSTGVQG--RIQVTEEVHRL-----LRRCP-------YHFV--CR-GKVSVKGK--
CYA5_CANFA/985-1179     ADC SR-MDSTGVPD--RIQVTTDMYQV-----LAANT-------YQLE--CR-GVVKVKGK--
CYA5_RABIT/1065-1259    ADC SR-MDSTGVPD--RIQVTTDMYQV-----LAANT-------YQLE--CR-GVVKVKGK--
CYA5_RAT/899-1093       ADC SR-MDSTGVPD--RIQVTTDMYQV-----LAANT-------YQLE--CR-GVVKVKGK--
CYA6_CANFA/967-1161     ADC SR-MDSTGVPD--RIQVTTDLYQV-----LAAKR-------YQLE--CR-GVVKVKGK--
CYA6_MOUSE/967-1161     ADC SR-MDSTGVPD--RIQVTTDLYQV-----LAAKG-------YQLE--CR-GVVKVKGK--
CYA6_RAT/968-1162       ADC SR-MDSTGVPD--RIQVTTDLYQV-----LAAKG-------YQLE--CR-GVVKVKGK--
CYA1_DROME/954-1153-64  ADC SR-MDSTGVPG--YSQVTQEVVDS-----LVGSH-------FEFR--CR-GTIKVKGK--
CYA2_HUMAN/263-463-65   ADC SR-MDSTGVLD--KIQVTEETSLV-----LQTLG-------YTCT--CR-GIINVKGK--
CYA2_RAT/877-1077-66    ADC SR-MDSTGVLD--KIQVTEETSLI-----LQTLG-------YTCT--CR-GIINVKGK--
CYA4_RAT/853-1053-67    ADC SR-MESTGVLG--KIQVTEETARA-----LQSLG-------YTCY--SR-GVIKVKGK--
CYA7_HUMAN/870-1069     ADC SR-MESTGELG--KIQVTEETCTI-----LQGLG-------YSCE--CR-GLINVKGK--
CYA7_MOUSE/889-1088     ADC SR-MESTGELG--KIQVTEETCTI-----LQGLG-------YSCE--CR-GLINVKGK--
CYA3_RAT/914-1121-70    ADC SR-MESTGVMG--NIQVVEETQVI-----LREYG-------FRFV--RR-GPIFVKGK--
CYA8_HUMAN/973-1172     ADC SR-MDSTGVSG--RIQVPEETYLI-----LKDQG-------FAFD--YR-GEIYVKGISE
CYA8_RAT/970-1169       ADC SR-MDSTGVSG--RIQVPEETYLI-----LKDQG-------FAFD--YR-GEIYVKGISE
CYA9_MOUSE/1049-1244    ADC SR-MDTTGVEC--RIQVSEESYRV-----LSKMG-------YDFD--YR-GTVNVKGK--
CYAA_DICDI/1180-1356    ADC SR-MQSNAQDN--EIMVTHSVYER-----LNKL--------FYFD--DE-KEILVKGK--
1ab8a                   ADC SR-MDSTGVLD--KIQVTEETSLI-----LQTLG-------YTCT-------------CF
1azsb                   ADC SR-MDSTGVLD--KIQVTEETSLI-----LQTLG-------YTCT--CR-GIINVKGKGD
AAK45954               pADC ARVMERATKGG---IMISQPTLDLIPQSELDALG------VVARR--VR-KPVFASKPTG
Z score                     --_-----0-1__-----00----_____---_-_______--0-__1-_---00---__

ANPA_HUMAN/867-1053     GUC ------------------------------
ANPA_MOUSE/863-1049     GUC ------------------------------
ANPA_RAT/863-1049       GUC ------------------------------
ANPB_ANGJA/856-1042     GUC ------------------------------
ANPB_BOVIN/852-1038     GUC ------------------------------
ANPB_HUMAN/852-1038     GUC ------------------------------
ANPB_RAT/852-1038       GUC ------------------------------
CYG2_RAT/399-582        GUC ------------------------------
CYG3_BOVIN/473-662      GUC ------------------------------
CYG3_CAEEL/889-1077     GUC ------------------------------
CYG3_RAT/471-660        GUC ------------------------------
CYG4_HUMAN/512-701      GUC ------------------------------
CYG5_HUMAN/469-658      GUC ------------------------------
CYGD_BOVIN/876-1063     GUC ------------------------------
CYGD_HUMAN/871-1058     GUC ------------------------------
CYGE_MOUSE/874-1061     GUC ------------------------------
CYGE_RAT/874-1061       GUC ------------------------------
CYGF_HUMAN/875-1062     GUC ------------------------------
CYGF_RAT/875-1062       GUC ------------------------------
CYGH_DROME/456-645      GUC F-----------------------------
CYGS_STRPU/905-1091     GUC ------------------------------
CYGX_RAT/884-1071       GUC ------------------------------
HSER_HUMAN/815-1002     GUC ------------------------------
HSER_PIG/815-1002       GUC ------------------------------
HSER_RAT/814-1001       GUC ------------------------------
KSGC_RAT/225-411        GUC ------------------------------
CYG1_BOVIN/412-605      GUC ------------------------------
CYG1_HUMAN/412-605      GUC ------------------------------
CYG1_RAT/412-605        GUC ------------------------------
1azsa                   ADC SIETFLIL----------------------
CYA1_BOVIN/296-457      ADC ------------------------------
CYA1_DROME/266-427      ADC ------------------------------
CYA1_DROME/954-1153     ADC ------------------------------
CYA1_HUMAN/14-175       ADC ------------------------------
CYA2_HUMAN/263-463      ADC ------------------------------
CYA2_RAT/280-447        ADC ------------------------------
CYA2_RAT/877-1077       ADC ------------------------------
CYA3_RAT/310-472        ADC ------------------------------
CYA3_RAT/914-1121       ADC ------------------------------
CYA4_RAT/264-418        ADC ------------------------------
CYA4_RAT/853-1053       ADC ------------------------------
CYA5_CANFA/382-543      ADC ------------------------------
CYA5_RABIT/463-624      ADC ------------------------------
CYA5_RAT/297-458        ADC ------------------------------
CYA6_CANFA/368-528      ADC ------------------------------
CYA6_MOUSE/368-528      ADC ------------------------------
CYA6_RAT/368-529        ADC ------------------------------
CYA7_HUMAN/270-432      ADC ------------------------------
CYA7_MOUSE/272-434      ADC ------------------------------
CYA8_HUMAN/405-589      ADC ------------------------------
CYA8_RAT/402-586        ADC ------------------------------
CYA9_MOUSE/385-565      ADC ------------------------------
CYAA_ANACY/311-502      ADC ------------------------------
CYAG_DICDI/387-572      ADC ------------------------------
CYA1_BOVIN/862-1059     ADC ------------------------------
CYA1_HUMAN/579-776      ADC ------------------------------
CYA5_CANFA/985-1179     ADC ------------------------------
CYA5_RABIT/1065-1259    ADC ------------------------------
CYA5_RAT/899-1093       ADC ------------------------------
CYA6_CANFA/967-1161     ADC ------------------------------
CYA6_MOUSE/967-1161     ADC ------------------------------
CYA6_RAT/968-1162       ADC ------------------------------
CYA1_DROME/954-1153-64  ADC ------------------------------
CYA2_HUMAN/263-463-65   ADC ------------------------------
CYA2_RAT/877-1077-66    ADC ------------------------------
CYA4_RAT/853-1053-67    ADC ------------------------------
CYA7_HUMAN/870-1069     ADC ------------------------------
CYA7_MOUSE/889-1088     ADC ------------------------------
CYA3_RAT/914-1121-70    ADC ------------------------------
CYA8_HUMAN/973-1172     ADC ------------------------------
CYA8_RAT/970-1169       ADC ------------------------------
CYA9_MOUSE/1049-1244    ADC ------------------------------
CYAA_DICDI/1180-1356    ADC ------------------------------
1ab8a                   ADC VN----------------------------
1azsb                   ADC LKTYFVNT----------------------
AAK45954               pADC IPPDLAIYRIKTVSESTAADNFDEMSPDAQ
Z score                     ______________________________


Click here to download the raw Proust2 output (text) of the above

The output above is a modified version of the original alignment, in which sequences have been grouped according to their known or predicted sub-types. The sub-types (in this case GUC and ADC, for GUanylate and ADenylate Cyclase respectively) are shown just to the right of the identifiers, prefixed with a lower case "p" if they are predicted rather than known. Below each block is a single-character representation of the Z-score at each position: scores are rounded to the nearest integer, or shown as "-" (<= zero) or "+" (>= 10). Positions with a Z-score above the threshold given (in this case 2.5) are coloured according to the amino acids responsible for the high score, i.e. amino acids that have a high chance of occurring in the sub-set with a particular sub-type and not in the other class(es).
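The single-character Z-score encoding described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the code PROUST-II itself uses. The function names are hypothetical, and two details are assumptions: halves are rounded up, and values that would round to 10 are capped at 9 so that each position stays one character wide. The "_" characters that appear in the Z score lines are not covered, since their meaning is not described here.

```python
import math


def z_to_char(z: float) -> str:
    """Map one Z-score to the single character shown under the alignment."""
    if z <= 0:
        return "-"   # zero or negative scores
    if z >= 10:
        return "+"   # scores of ten or more
    # Assumption: round half up, and cap at 9 to keep one character per column.
    return str(min(9, math.floor(z + 0.5)))


def z_line(scores):
    """Build the 'Z score' line for a list of per-position Z-scores."""
    return "".join(z_to_char(z) for z in scores)


# Hypothetical scores for five alignment positions.
print(z_line([-1.2, 0.3, 2.6, 6.0, 12.4]))  # prints "-036+"
```

With the threshold of 2.5 used in this example, only positions whose raw (unrounded) score exceeds 2.5 would be coloured, so a position can show "3" either just below or just above the threshold.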

Note that there is also a link to a text (raw) version of the output, in case you are interested in reformatting it, etc. This is essentially the same output you would get if you ran a downloaded copy of the program yourself.

The program also makes predictions for all sequences in the alignment for which there is no group information (so in this case, just the "purine" cyclase). The method for deciding whether a protein belongs to one group or another has not yet been fully developed. For the moment, the program simply scores each unknown sequence against every group, using profile information only from those parts of the alignment that have high Z-scores. It also calculates these scores for the sequences known to belong to each group and reports the min/max/mean/standard deviation (e.g. "Group GUC has 29 members, that score between -9.0 and -7.1 (mean -7.9, sd 0.4)" above). It then calculates how many standard deviations the unknown sequence's score lies below the group mean, and assigns a crude confidence value based on this "Z-score" (HIGH: Z >= -1.0; MEDIUM: Z >= -3; LOW: Z < -3).
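As a rough illustration of the confidence rule just described, the following Python sketch computes the group summary and the confidence label for an unknown sequence. It is a sketch under stated assumptions rather than the program's actual code: the function names are hypothetical, the population standard deviation is used (the text does not say which form PROUST-II uses), and the example scores for the unknown sequence are invented. Only the mean (-7.9) and sd (0.4) are taken from the GUC example quoted above.

```python
from statistics import mean, pstdev


def group_summary(scores):
    """Return (min, max, mean, sd) for the scores of a group's known members."""
    # Assumption: population standard deviation.
    return min(scores), max(scores), mean(scores), pstdev(scores)


def confidence(score, group_mean, group_sd):
    """Return (Z, label) for an unknown sequence scored against one group."""
    if group_sd == 0:
        raise ValueError("standard deviation is zero; Z is undefined")
    z = (score - group_mean) / group_sd
    if z >= -1.0:
        label = "HIGH"
    elif z >= -3.0:
        label = "MEDIUM"
    else:
        label = "LOW"
    return z, label


# Group GUC in the example: mean -7.9, sd 0.4. The three scores are hypothetical.
for score in (-8.1, -8.9, -10.0):
    z, label = confidence(score, -7.9, 0.4)
    print(f"score {score}: Z = {z:.2f}, confidence {label}")
```

Because the thresholds are fixed multiples of the group's standard deviation, a tightly clustered group (small sd) will give LOW confidence to a sequence that scores only slightly below its known members.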

In the above example, two of the positions identified agree well with those found by site-directed mutagenesis: a double mutant changes the specificity from GTP to ATP (specifically, the E->K change in the sequence VYKVETIG in ANPA_HUMAN, and the C->D change in the sequence GLKMPRYC--LF). See Tucker et al., 1998 for more information. The method has also identified other positions, many of which are near to these in the known three-dimensional structures, suggesting that they might also play a role in substrate specificity.

To our knowledge, it is not known whether the "purine cyclase" sequence is ATP or GTP specific, but the method's prediction of ATP agrees with what is known from the experimental studies above, and indeed with one's intuition (i.e. just look at the sequences).


About the authors

The first version of PROUST was written by Sridhar Hannenhalli and Rob Russell while at SmithKline Beecham Pharmaceuticals. Sridhar is now at Celera Genomics and Rob is at EMBL, Heidelberg (and indeed, SmithKline Beecham is now GlaxoSmithKline). The second version was written by Rob Russell.


Citing PROUST

When publishing data obtained from this site or program, please cite: S.S. Hannenhalli & R.B. Russell, Analysis and prediction of protein sub-types from multiple sequence alignments, J. Mol. Biol. 303, 61-76, 2000. PubMed
