Bioinformatic sequence analysis
How the programs in the PHYLIP package work
General principles:
Every program is started by typing its name at the command line
Sequence files: PHYLIP format (interleaved)
Convert with READSEQ (or use CLUSTALW's PHYLIP output)
The programs ALWAYS read a file named infile
The programs write the following files:
outfile results
treefile tree topologies in parenthesized (Newick) notation
plotfile graphics file
Because each program takes the output of another as its input,
it is ESSENTIAL to rename outfile (or treefile, when the next
program reads trees) to infile at each step
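The renaming rule can be scripted. The following is a minimal sketch, assuming the PHYLIP programs are run in a single working directory; the helper name `promote` is hypothetical and not part of PHYLIP.

```python
import tempfile
from pathlib import Path


def promote(workdir, produced="outfile", consumed="infile"):
    """Rename the file the previous program wrote so the next one reads it.

    Hypothetical helper: PHYLIP itself only knows the fixed names
    infile, outfile, treefile and plotfile.
    """
    src = Path(workdir) / produced
    dst = Path(workdir) / consumed
    if not src.exists():
        raise FileNotFoundError(f"{src} is missing: did the previous program run?")
    # Path.replace overwrites an existing infile, like `mv outfile infile`.
    return src.replace(dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        (Path(d) / "infile").write_text("alignment used by the previous step\n")
        (Path(d) / "outfile").write_text("distance matrix from the previous step\n")
        new_input = promote(d)
        print(new_input.name, "now contains:", new_input.read_text().strip())
```

Calling `promote(d, produced="treefile")` covers the case where the next program (consense, for instance) reads trees rather than a data matrix.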
Parsimony
dnapars (nucleic acids) or protpars (proteins)
Requires PHYLIP files (of aligned sequences)
use CLUSTALW's PHYLIP output-format option
Your choice: 9
********* Format of Alignment Output *********
1. Toggle CLUSTAL format output = ON
2. Toggle NBRF/PIR format output = OFF
3. Toggle GCG/MSF format output = OFF
4. Toggle PHYLIP format output = ON
5. Toggle GDE format output = OFF
6. Toggle GDE output case = LOWER
7. Toggle output order = INPUT FILE
8. Create alignment output file(s) now?
9. Toggle parameter output = OFF
H. HELP
fichier.aln
CLUSTAL W(1.6) multiple sequence alignment
CHKHBA_J00 ----------------------ACACAGAGGTGCAACCATGGTGCTGTCCGCTGCTGACA
DUKHBADWP CGCAACCCCGTCAGTTGCCAGCCTGCCACACCGCTGCCGCCATGCTGACCGCCGAGGACA
SMRHBAA_M1 -------------------------AACCACCGCAAACATGAAGCTGACTGCCGAAGATA
XELHBA_J00 -----------------TGCACAACACAAACAGGAACCATGCTTCTTTCAGCCGATGACA
DAVAGL_M14 -----------------------------------------GTGCTCTCGGATGCTGACA
** * * * ** *
CHKHBA_J00 AGAACAACGTCAAGGGCATCTTCACCAAAATCGCCGGCCATGCTGAGGAGTATGGCGCCG
DUKHBADWP AGAAGCTCATCACGCAGTTGTGGGAGAAGGTGGCTGGCCACCAGGAGGAATTCGGAAGTG
SMRHBAA_M1 AACATAATGTGAAGGCCATCTGGGATCATGTCAAAGGACATGAAGAGGCGATTGGTGCAG
XELHBA_J00 AGAAACACATCAAGGCAATTATGCCTCCTATCGCTGCCCATGGCGACAAATTTGGGGGAG
DAVAGL_M14 AGACTCACGTGAAAGCCATCTGGGGTAAGGTGGGAGGCCACGCCGGTGCCTACGCAGCTG
* * * * * * ** * * *
fichier.phy
lovelace$ more tofasta.phy
5 589
CHKHBA_J00 ---------- ---------- --ACACAGAG GTGCAACCAT GGTGCTGTCC
DUKHBADWP CGCAACCCCG TCAGTTGCCA GCCTGCCACA CCGCTGCCGC CATGCTGACC
SMRHBAA_M1 ---------- ---------- -----AACCA CCGCAAACAT GAAGCTGACT
XELHBA_J00 ---------- -------TGC ACAACACAAA CAGGAACCAT GCTTCTTTCA
DAVAGL_M14 ---------- ---------- ---------- ---------- -GTGCTCTCG
GCTGCTGACA AGAACAACGT CAAGGGCATC TTCACCAAAA TCGCCGGCCA
GCCGAGGACA AGAAGCTCAT CACGCAGTTG TGGGAGAAGG TGGCTGGCCA
GCCGAAGATA AACATAATGT GAAGGCCATC TGGGATCATG TCAAAGGACA
GCCGATGACA AGAAACACAT CAAGGCAATT ATGCCTCCTA TCGCTGCCCA
GATGCTGACA AGACTCACGT GAAAGCCATC TGGGGTAAGG TGGGAGGCCA
TGCTGAGGAG TATGGCGCCG AGACCTTGGA AAGGATGTTC ACCACCTACC
CCAGGAGGAA TTCGGAAGTG AAGCTCTGCA GAGGATGTTC CTCGCCTACC
TGAAGAGGCG ATTGGTGCAG AAGCTCTTTA CAGGATGTTC TGTTGTATGC
TGGCGACAAA TTTGGGGGAG AAGCTTTGTA CAGGATGTTC ATAGTCAACC
CGCCGGTGCC TACGCAGCTG AAGCTCTTGC CAGAACCTTC CTCTCCTTCC
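To make the interleaved layout concrete, here is a minimal sketch that writes aligned sequences in that shape: a header with the number of sequences and the alignment length, then blocks of 50 sites in groups of 10, with the 10-character names only in the first block. The function name and the exact spacing are assumptions chosen for illustration; they are not CLUSTALW's or READSEQ's code.

```python
def to_phylip_interleaved(seqs, block=50, group=10):
    """Return interleaved PHYLIP text for a dict {name: aligned sequence}."""
    names = list(seqs)
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all sequences must be aligned to the same length")

    lines = [f" {len(names)} {length}"]
    for start in range(0, length, block):
        for name in names:
            chunk = seqs[name][start:start + block]
            groups = " ".join(chunk[i:i + group] for i in range(0, len(chunk), group))
            if start == 0:
                # Names are truncated or padded to exactly 10 characters.
                lines.append(f"{name[:10]:<10} {groups}")
            else:
                # Later blocks carry sequence data only, in the same order.
                lines.append(groups)
        lines.append("")  # blank line between blocks
    return "\n".join(lines).rstrip("\n") + "\n"


if __name__ == "__main__":
    toy = {"CHKHBA_J00": "ACGT" * 15, "DUKHBADWP": "TTGA" * 15}
    print(to_phylip_interleaved(toy), end="")
```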
lovelace$ protpars
protpars: can't read infile
Please enter a new filename>fmts.phy
Protein parsimony algorithm, version 3.55c
Setting for this run:
U Search for best tree? Yes
J Randomize input order of sequences? No. Use input order
O Outgroup root? No, use as outgroup species 1
T Use Threshold parsimony? No, use ordinary parsimony
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
5 Print sequences at all nodes of tree No
6 Write out trees onto tree file? Yes
Are these settings correct? (type Y or the letter for one to change)
Y
Adding species:
CHKHBA_J00
DUKHBADWP
SMRHBAA_M1
XELHBA_J00
DAVAGL_M14
Doing global rearrangements
!---------!
.........
Output written to output file
Trees also written onto file
lovelace$ more outfile
Protein parsimony algorithm, version 3.55c
One most parsimonious tree found:
        +-----XELHBA_J00
     +--3
     !  !  +--DAVAGL_M14
  +--2  +--4
  !  !     +--SMRHBAA_M1
--1  !
  !  +--------DUKHBADWP
  !
  +-----------CHKHBA_J00
remember: this is an unrooted tree!
requires a total of 1400.000
lovelace$ more treefile
(((XELHBA_J00,(DAVAGL_M14,SMRHBAA_M1)),DUKHBADWP),CHKHBA_J00);
lovelace$
tree without distances (topology only, no branch lengths)
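The idea behind the "requires a total of ..." line can be illustrated with the textbook Fitch counting rule: for each site, walk up the tree and add one step whenever the state sets of two children do not intersect. The sketch below applies it to the topology in the treefile above, with a made-up three-site alignment. It is plain Fitch counting on a fixed tree, not protpars itself, which applies its own amino-acid step-counting rules and also searches over topologies, so it will not reproduce the 1400 steps of the listing.

```python
def fitch_site(tree, states):
    """Return (possible ancestral states, steps) for one site.

    tree is a nested tuple of tip names (binary); states maps tip -> character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_site(left, states)
    right_set, right_steps = fitch_site(right, states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    # Disjoint sets: one change is needed on one of the two branches.
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_score(tree, alignment):
    """Total number of steps over all sites of an alignment {tip: sequence}."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    # Topology copied from the treefile; the sequences are invented.
    tree = ((("XELHBA_J00", ("DAVAGL_M14", "SMRHBAA_M1")), "DUKHBADWP"), "CHKHBA_J00")
    toy = {
        "XELHBA_J00": "AAG",
        "DAVAGL_M14": "ACT",
        "SMRHBAA_M1": "ACC",
        "DUKHBADWP": "AAA",
        "CHKHBA_J00": "AAA",
    }
    print(parsimony_score(tree, toy))
```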
Distances
lovelace$ dnadist
dnadist: can't read infile
Please enter a new filename>tofasta.phy
Nucleic acid sequence Distance Matrix program, version 3.55c
Settings for this run:
D Distance (Kimura, Jin/Nei, ML, J-C)? Kimura 2-parameter
T Transition/transversion ratio? 2.0
C One category of substitution rates? Yes
L Form of distance matrix? Square
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
Are these settings correct? (type Y or letter for one to change)
Y
Distances calculated for species
CHKHBA_J00 ....
DUKHBADWP ...
SMRHBAA_M1 ..
XELHBA_J00 .
DAVAGL_M14
Distances written to file
lovelace$ more outfile
5
CHKHBA_J00 0.0000 0.5962 0.9649 0.7203 0.6094
DUKHBADWP 0.5962 0.0000 1.0130 0.7741 0.5435
SMRHBAA_M1 0.9649 1.0130 0.0000 0.9289 0.9209
XELHBA_J00 0.7203 0.7741 0.9289 0.0000 0.8969
DAVAGL_M14 0.6094 0.5435 0.9209 0.8969 0.0000
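For reference, the textbook closed form of the Kimura 2-parameter distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the observed proportions of transitions and transversions. The sketch below implements that formula only. dnadist takes the transition/transversion ratio from option T instead of estimating it from each pair, so its values are not expected to be identical to this calculation.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Closed-form Kimura 2-parameter distance between two aligned sequences."""
    # Keep only columns where both sequences carry an unambiguous base.
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    n = len(pairs)
    transitions = sum(
        1 for a, b in pairs
        if a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    )
    transversions = sum(1 for a, b in pairs if a != b) - transitions
    p = transitions / n
    q = transversions / n
    # math.log raises ValueError when the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


if __name__ == "__main__":
    print(k2p_distance("ACGTACGTAC", "GCGTACGTAC"))  # one transition in 10 sites
    print(k2p_distance("ACGTACGTAC", "CCGTACGTAC"))  # one transversion in 10 sites
```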
lovelace$ mv outfile infile
lovelace$ fitch
Fitch-Margoliash method version 3.55c
Settings for this run:
U Search for best tree? Yes
P Power? 2.00000
- Negative branch lengths allowed? No
O Outgroup root? No, use as outgroup species 1
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
G Global rearrangements? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes
Are these settings correct? (type Y or the letter for one to change)
y
Adding species:
CHKHBA_J00
DUKHBADWP
SMRHBAA_M1
XELHBA_J00
DAVAGL_M14
Output written to output file
Tree also written onto file
lovelace$ more outfile
5 Populations
Fitch-Margoliash method version 3.55c
                  __ __             2
                  \  \   (Obs - Exp)
Sum of squares =  /_ /_  ------------
                                2
                   i  j      Obs
Negative branch lengths not allowed
      +----------------DAVAGL_M14
  +---3
  !   +--------------DUKHBADWP
  !
  !     +----------------------XELHBA_J00
--1-----2
  !     +--------------------------------SMRHBAA_M1
  !
  +---------------CHKHBA_J00
remember: this is an unrooted tree!
Sum of squares = 0.03950
Average percent standard deviation = 4.68447
examined 15 trees
Between And Length
------- --- ------
1 3 0.06233
3 DAVAGL_M14 0.28139
3 DUKHBADWP 0.26211
1 2 0.09924
2 XELHBA_J00 0.37775
2 SMRHBAA_M1 0.55115
1 CHKHBA_J00 0.26879
lovelace$ more treefile
((DAVAGL_M14:0.28139,DUKHBADWP:0.26211):0.06233,(XELHBA_J00:0.37775,
SMRHBAA_M1:0.55115):0.09924,CHKHBA_J00:0.26879);
lovelace$
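The Fitch-Margoliash criterion can be checked by hand from the two listings above. The sketch below recomputes the sum of squares from the dnadist matrix (Obs) and from the path lengths along the fitch tree (Exp), with power P = 2, summing over every ordered pair i ≠ j. The average percent standard deviation is computed as 100·sqrt(SS / (N − 2)) with N = 20 off-diagonal distances; that formula is an assumption on my part, though it appears to match the figure printed by the program. Function names are illustrative only.

```python
import math

names = ["CHKHBA_J00", "DUKHBADWP", "SMRHBAA_M1", "XELHBA_J00", "DAVAGL_M14"]

# Observed distances, copied from the dnadist outfile.
observed = [
    [0.0000, 0.5962, 0.9649, 0.7203, 0.6094],
    [0.5962, 0.0000, 1.0130, 0.7741, 0.5435],
    [0.9649, 1.0130, 0.0000, 0.9289, 0.9209],
    [0.7203, 0.7741, 0.9289, 0.0000, 0.8969],
    [0.6094, 0.5435, 0.9209, 0.8969, 0.0000],
]

# Branch lengths, copied from the "Between / And / Length" table of fitch.
edges = {
    ("1", "3"): 0.06233,
    ("3", "DAVAGL_M14"): 0.28139,
    ("3", "DUKHBADWP"): 0.26211,
    ("1", "2"): 0.09924,
    ("2", "XELHBA_J00"): 0.37775,
    ("2", "SMRHBAA_M1"): 0.55115,
    ("1", "CHKHBA_J00"): 0.26879,
}


def build_adjacency(edge_lengths):
    adj = {}
    for (u, v), w in edge_lengths.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def tree_distance(adj, start, goal, seen=None):
    """Sum of branch lengths on the unique path from start to goal."""
    if start == goal:
        return 0.0
    seen = (seen or set()) | {start}
    for nxt, w in adj[start].items():
        if nxt not in seen:
            rest = tree_distance(adj, nxt, goal, seen)
            if rest is not None:
                return w + rest
    return None


def sum_of_squares(tips, obs, adj, power=2.0):
    total = 0.0
    for i, a in enumerate(tips):
        for j, b in enumerate(tips):
            if i != j:
                expected = tree_distance(adj, a, b)
                total += (obs[i][j] - expected) ** 2 / obs[i][j] ** power
    return total


adjacency = build_adjacency(edges)
ss = sum_of_squares(names, observed, adjacency)
n_distances = len(names) * (len(names) - 1)
apsd = 100 * math.sqrt(ss / (n_distances - 2))

if __name__ == "__main__":
    print(f"sum of squares = {ss:.5f}")
    print(f"average percent standard deviation = {apsd:.5f}")
```

Both values should land very close to the 0.03950 and 4.68447 reported in the outfile, up to the rounding of the printed branch lengths.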
lovelace$ neighbor
Neighbor-Joining/UPGMA method version 3.5
Settings for this run:
N Neighbor-joining or UPGMA tree? Neighbor-joining
O Outgroup root? No, use as outgroup species 1
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes
Are these settings correct? (type Y or the letter for one to change)
y
CYCLE 2: OTU 3 ( 0.54903) JOINS OTU 4 ( 0.37987)
CYCLE 1: OTU 1 ( 0.27209) JOINS NODE 3 ( 0.10606)
LAST CYCLE:
NODE 1 ( 0.05896) JOINS OTU 2 ( 0.26461) JOINS OTU 5 ( 0.27889)
Output written on output file
Tree written on tree file
lovelace$ more outfile
5 Populations
Neighbor-Joining/UPGMA method version 3.55c
Neighbor-joining method
Negative branch lengths allowed
  +---------------DUKHBADWP
  !
--3----------------DAVAGL_M14
  !
  !   +---------------CHKHBA_J00
  +---2
      !     +--------------------------------SMRHBAA_M1
      +-----1
            +----------------------XELHBA_J00
remember: this is an unrooted tree!
Between And Length
------- --- ------
3 DUKHBADWP 0.26461
3 DAVAGL_M14 0.27889
3 2 0.05896
2 CHKHBA_J00 0.27209
2 1 0.10606
1 SMRHBAA_M1 0.54903
1 XELHBA_J00 0.37987
lovelace$ more treefile
(DUKHBADWP:0.26461,DAVAGL_M14:0.27889,(CHKHBA_J00:0.27209,
(SMRHBAA_M1:0.54903,XELHBA_J00:0.37987):0.10606):0.05896);
lovelace$
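The first line of the neighbor log ("CYCLE 2: OTU 3 ... JOINS OTU 4 ...") can be reproduced from the dnadist matrix with the standard neighbor-joining rule: pick the pair minimising Q(i,j) = (n − 2)·d(i,j) − r(i) − r(j), where r is the row sum, then split d(i,j) between the two new branches. The sketch below performs that single step only; it is a textbook illustration, not the program's source, and the function name is hypothetical.

```python
names = ["CHKHBA_J00", "DUKHBADWP", "SMRHBAA_M1", "XELHBA_J00", "DAVAGL_M14"]

# Distance matrix copied from the dnadist outfile.
d = [
    [0.0000, 0.5962, 0.9649, 0.7203, 0.6094],
    [0.5962, 0.0000, 1.0130, 0.7741, 0.5435],
    [0.9649, 1.0130, 0.0000, 0.9289, 0.9209],
    [0.7203, 0.7741, 0.9289, 0.0000, 0.8969],
    [0.6094, 0.5435, 0.9209, 0.8969, 0.0000],
]


def nj_first_join(tips, dist):
    """Return (tip_i, branch_i, tip_j, branch_j) for the first pair joined."""
    n = len(tips)
    r = [sum(row) for row in dist]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            q = (n - 2) * dist[i][j] - r[i] - r[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from each joined tip to the new interior node.
    length_i = dist[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    length_j = dist[i][j] - length_i
    return tips[i], length_i, tips[j], length_j


if __name__ == "__main__":
    a, la, b, lb = nj_first_join(names, d)
    print(f"{a} ({la:.5f}) joins {b} ({lb:.5f})")
```

On this matrix the selected pair is species 3 and 4 (SMRHBAA_M1 and XELHBA_J00), with branch lengths that agree with the 0.54903 and 0.37987 printed by neighbor.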
lovelace$ kitsch
Fitch-Margoliash method with contemporary tips, version 3.55c
Settings for this run:
U Search for best tree? Yes
P Power? 2.00000
- Negative branch lengths allowed? No
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes
Are these settings correct? (type Y or the letter for one to change)
y
Adding species:
CHKHBA_J00
DUKHBADWP
SMRHBAA_M1
XELHBA_J00
DAVAGL_M14
Doing global rearrangements
!---------!
.........
Output written to output file
Tree also written onto file
lovelace$ more outfile
5 Populations
Fitch-Margoliash method with contemporary tips, version 3.55c
                  __ __             2
                  \  \   (Obs - Exp)
Sum of squares =  /_ /_  ------------
                                2
                   i  j      Obs
negative branch lengths not allowed
                +---------------DAVAGL_M14
             +--4
       +-----1  +---------------DUKHBADWP
       !     !
  +----3     +-----------------CHKHBA_J00
  !    !
--2    +-----------------------XELHBA_J00
  !
  +----------------------------SMRHBAA_M1
Sum of squares = 0.059
Average percent standard deviation = 5.73593
examined 72 trees
From To Length Time
---- -- ------ ----
4 DAVAGL_M14 0.27175 0.47712
1 4 0.02958 0.20537
4 DUKHBADWP 0.27175 0.47712
3 1 0.09078 0.17580
1 CHKHBA_J00 0.30133 0.47712
2 3 0.08501 0.08501
3 XELHBA_J00 0.39211 0.47712
2 SMRHBAA_M1 0.47712 0.47712
lovelace$ more treefile
((((DAVAGL_M14:0.27175,DUKHBADWP:0.27175):0.02958,CHKHBA_J00:0.30133):0.09078,
XELHBA_J00:0.39211):0.08501,SMRHBAA_M1:0.47712);
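kitsch assumes contemporary tips, so every tip should sit at the same distance from the root (the 0.47712 in the Time column). A quick way to check this from the treefile is to parse the parenthesized notation and add up branch lengths from the root to each tip. The parser below is a minimal sketch that only handles the plain `name:length` form seen in these listings (no quoted labels, no comments).

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, length, children) tuples."""
    text = "".join(text.split()).rstrip(";")  # drop line breaks and the final ';'
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        label = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return (label, length, children)

    return node()


def root_to_tip(tree, depth=0.0):
    """Map each tip label to the summed branch length from the root."""
    label, length, children = tree
    here = depth + length
    if not children:
        return {label: here}
    depths = {}
    for child in children:
        depths.update(root_to_tip(child, here))
    return depths


KITSCH_TREE = """((((DAVAGL_M14:0.27175,DUKHBADWP:0.27175):0.02958,CHKHBA_J00:0.30133):0.09078,
XELHBA_J00:0.39211):0.08501,SMRHBAA_M1:0.47712);"""

if __name__ == "__main__":
    for tip, depth in root_to_tip(parse_newick(KITSCH_TREE)).items():
        print(f"{tip:<12}{depth:.5f}")
```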
lovelace$ dnaml
Nucleic acid sequence Maximum Likelihood method, version 3.55c
Settings for this run:
U Search for best tree? Yes
T Transition/transversion ratio: 2.0000
F Use empirical base frequencies? Yes
C One category of substitution rates? Yes
G Global rearrangements? No
J Randomize input order of sequences? No. Use input order
O Outgroup root? No, use as outgroup species 1
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes
Are these settings correct? (type Y or the letter for one to change)
Y
Adding species:
CHKHBA
DUKHBADWP
SMRHBAA
XELHBA
DAVAGL
Output written to output file
Tree also written onto file
lovelace$ more outfile
Nucleic acid sequence Maximum Likelihood method, version 3.55c
Empirical Base Frequencies:
A 0.25368
C 0.29449
G 0.23346
T(U) 0.21838
Transition/transversion ratio = 2.000000
(Transition/transversion parameter = 1.523022)
        +--------------DAVAGL
  +-----3
  !     +-----------------DUKHBADWP
  !
  !         +--------------------XELHBA
--1---------2
  !         +-------------------------------SMRHBAA
  !
  +--------------CHKHBA
remember: this is an unrooted tree!
Ln Likelihood = -3145.55232
Examined 15 trees
Between And Length Approx. Confidence Limits
------- --- ------ ------- ---------- ------
1 3 0.09292 ( 0.03404, 0.15218) **
3 DAVAGL 0.26355 ( 0.19312, 0.33542) **
3 DUKHBADWP 0.30752 ( 0.23199, 0.38496) **
1 2 0.16329 ( 0.09148, 0.23605) **
2 XELHBA 0.34539 ( 0.25789, 0.43510) **
2 SMRHBAA 0.53168 ( 0.42197, 0.64816) **
1 CHKHBA 0.25619 ( 0.18690, 0.32797) **
* = significantly positive, P < 0.05
** = significantly positive, P < 0.01
lovelace$ more treefile
((DAVAGL:0.26355,DUKHBADWP:0.30752):0.09292,(XELHBA:0.34539,
SMRHBAA:0.53168):0.16329,CHKHBA:0.25619);
lovelace$ more fmt.phy
5 340
ECFMT_2 MSESLRIIFA GTPDFAARHL DALLS-SGHN VVGVFTQPDR PAGRGKKLMP
HI32745_2 -MKSLNIIFA GTPDFAAQHL QAILN-SQHN VIAVYTQPDK PAGRGKKLQA
TTDEFFMT_3 ----MRVAFF GTPLWAVPVL DALR--KRHQ VVLVVSQPDK PQGRGLRPAP
MG39721_2 ---MFKIVFF GTSTLSKKCL EQLFYDNDFE ICAVVTQPDK INHRNNKIVP
SSCPNC ---MMKTVFF GTPDFAVPTL EALLGHPDID VLAVVSQPDR RRGRGSKLIP
SPVKVLAEEK GLPVFQP-VS LRPQENQQLV AELQADVMVV VAYGLILPKA
SPVKQLAEQN NIPVYQP-KS LRKEEAQSEL KALNADVMVV VAYGLILPKA
SPVARYAEAE GLPLLRP-AR LREEAFLEAL RQAAPEVAVV AAYGKLIPKE
SDVKSFCLEK NITFFQP--K QS-ISIKADL EKLKADIGIC VSFGQYLHQD
SPVKEVAVQA GIPVWQPERV KRCQETLAKL KNCQADFFVV VAYGQLLSPE
lovelace$ seqboot
lovelace$ cp fmt.phy infile
lovelace$ seqboot
Random number seed (must be odd)?
11
Bootstrapped sequences algorithm, version 3.55c
Settings for this run:
D Sequence, Morph, Rest., Gene Freqs? Molecular sequences
J Bootstrap, Jackknife, or Permute? Bootstrap
R How many replicates? 100
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
Are these settings correct? (type Y or the letter for one to change)
R
Number of replicates?
10
..
completed replicate number 1
completed replicate number 2
completed replicate number 3
completed replicate number 4
completed replicate number 5
Output written to output file
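What seqboot does in Bootstrap mode can be summarised as: draw alignment columns at random, with replacement, until a new alignment of the same length is obtained, and repeat for each replicate. The sketch below shows that resampling on a toy alignment. It uses Python's own random generator, so even with the same seed (11) the replicates will not be the ones seqboot writes; the function names are illustrative.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample the columns of {name: aligned sequence} with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    # The same column indices are applied to every sequence, so each
    # resampled site stays a genuine column of the original alignment.
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed=11):
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    toy = {"seq_a": "MSESLRIIFA", "seq_b": "-MKSLNIIFA", "seq_c": "----MRVAFF"}
    for number, replicate in enumerate(bootstrap_replicates(toy, 3), start=1):
        print("replicate", number)
        for name, seq in replicate.items():
            print(f"{name:<10} {seq}")
```

Each replicate is then analysed as its own data set, which is why the programs run afterwards need option M set to the number of replicates.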
lovelace$ mv outfile infile
lovelace$ protdist
Protein distance algorithm, version 3.55c
Settings for this run:
P Use PAM, Kimura or categories model? Dayhoff PAM matrix
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
Are these settings correct? (type Y or the letter for one to change)
M
How many data sets?
10
Y
Computing distances:
ECFMT_2
HI32745_2 .
TTDEFFMT_3 ..
MG39721_2 ...
SSCPNC ....
Output written to output file
Data set # 2:
Computing distances:
ECFMT_2
HI32745_2 .
TTDEFFMT_3 ..
MG39721_2 ...
SSCPNC ....
Output written to output file
Data set # 3:
Computing distances:
ECFMT_2
...
Data set # 5:
Computing distances:
ECFMT_2
HI32745_2 .
TTDEFFMT_3 ..
MG39721_2 ...
SSCPNC ....
Output written to output file
lovelace$ mv outfile infile
lovelace$ neighbor
Neighbor-Joining/UPGMA method version 3.5
Settings for this run:
N Neighbor-joining or UPGMA tree? Neighbor-joining
O Outgroup root? No, use as outgroup species 1
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes
Are these settings correct? (type Y or the letter for one to change)
M
How many data sets?
10
...
Output written on output file
Tree written on tree file
Data set # 10:
CYCLE 2: OTU 1 ( 0.15957) JOINS OTU 2 ( 0.31701)
CYCLE 1: NODE 1 ( 0.29776) JOINS OTU 3 ( 0.57794)
LAST CYCLE:
NODE 1 ( 0.11937) JOINS OTU 4 ( 1.38576) JOINS OTU 5 ( 0.68429)
Output written on output file
Tree written on tree file
lovelace$ mv treefile infile
lovelace$ consense
Majority-rule and strict consensus tree program, version 3.55c
Settings for this run:
O Outgroup root? No, use as outgroup species 1
R Trees to be treated as Rooted? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the sets of species Yes
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes
Are these settings correct? (type Y or the letter for one to change)
Y
Output written to output file
Tree also written onto file
lovelace$ more outfile
Majority-rule and strict consensus tree program, version 3.55c
Species in order:
HI32745 2
TTDEFFMT 3
MG39721 2
SSCPNC
ECFMT 2
Sets included in the consensus tree
Set (species in order) How many times out of 10.00
.***. 10.00
..**. 8.00
Sets NOT included in consensus tree:
Set (species in order) How many times out of 10.00
.**.. 2.00
CONSENSUS TREE:
the numbers at the forks indicate the number
of times the group consisting of the species
which are to the right of that fork occurred
among the trees, out of 10.00 trees
            +---------TTDEFFMT 3
       +-10.0
       !    !    +----SSCPNC
  +--9.0    +--8.0
  !    !         +----MG39721 2
  !    !
  !    +--------------ECFMT 2
  !
  +-------------------HI32745 2
remember: this is an unrooted tree!
lovelace$ more treefile
(((TTDEFFMT_3:10.0,(SSCPNC:10.0,MG39721_2:10.0):8.0):10.0,ECFMT_2:10.0):9.0,
HI32745_2:10.0);
lovelace$
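The "Sets included in the consensus tree" table is a count of how many input trees contain each group of species, keeping the groups found in more than half of them. The sketch below does that count on ten hand-made trees, written as nested tuples and constructed to mirror the table above (8 trees grouping SSCPNC with MG39721_2, 2 grouping TTDEFFMT_3 with MG39721_2); they are not the actual bootstrap trees. It assumes every tree is written with the outgroup HI32745_2 attached at the base, so that groups can be read as simple clades.

```python
from collections import Counter


def clades(tree):
    """Return (leaves under this node, list of leaf sets for every node below)."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found.extend(child_found)
    found.append(leaves)
    return leaves, found


def majority_clades(trees):
    """Count non-trivial groups and keep those present in over half the trees."""
    counts = Counter()
    for tree in trees:
        all_leaves, found = clades(tree)
        for group in set(found):
            if 1 < len(group) < len(all_leaves):
                counts[group] += 1
    kept = {g: k for g, k in counts.items() if k > len(trees) / 2}
    return kept, counts


# Toy input mirroring the counts in the table (assumption, see lead-in).
tree_a = ("HI32745_2", "ECFMT_2", ("TTDEFFMT_3", ("SSCPNC", "MG39721_2")))
tree_b = ("HI32745_2", "ECFMT_2", ("SSCPNC", ("TTDEFFMT_3", "MG39721_2")))
trees = [tree_a] * 8 + [tree_b] * 2

if __name__ == "__main__":
    kept, counts = majority_clades(trees)
    for group, k in sorted(counts.items(), key=lambda item: -item[1]):
        status = "included" if group in kept else "NOT included"
        print(f"{k:2d} / {len(trees)}  {status:<13} {sorted(group)}")
```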