Bioinformatic sequence analysis


How the programs included in the PHYLIP package work




General principles:

Every program is run by typing its name as a command at the shell prompt.

Sequence files: PHYLIP format (interleaved)
Obtained by conversion with READSEQ (or as CLUSTALW output)

The programs ALWAYS read a file named infile
The programs write the following files:
outfile results
treefile tree topologies (parenthesized notation)
plotfile graphics file

Because each program's output serves as the input to another,
it is ESSENTIAL to rename the output file (outfile, or treefile when the
next program reads trees) to infile at every step
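
The renaming rule above can be scripted. The following is a minimal Python sketch, not part of PHYLIP: the helper name promote_outfile and its workdir argument are assumptions made for illustration, and the demonstration uses a stand-in outfile in a scratch directory.

```python
import os
import tempfile


def promote_outfile(workdir, source="outfile", target="infile"):
    """Rename one program's output so the next program reads it as input."""
    src = os.path.join(workdir, source)
    dst = os.path.join(workdir, target)
    if not os.path.exists(src):
        raise FileNotFoundError(f"{src} not found: did the previous program run?")
    # os.replace overwrites an existing infile, like `mv outfile infile`.
    os.replace(src, dst)
    return dst


# Demonstration in a scratch directory with a stand-in outfile.
with tempfile.TemporaryDirectory() as scratch:
    with open(os.path.join(scratch, "outfile"), "w") as handle:
        handle.write("    5\n")
    print(os.path.basename(promote_outfile(scratch)))
```

Passing source="treefile" covers the case where the next program (consense, for instance) reads trees rather than distances or sequences.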

Parsimony

dnapars (nucleic acids) or protpars (proteins)
Both require PHYLIP files (of aligned sequences)
Use the PHYLIP output option of CLUSTALW

Your choice: 9

********* Format of Alignment Output *********

1. Toggle CLUSTAL format output = ON
2. Toggle NBRF/PIR format output = OFF
3. Toggle GCG/MSF format output = OFF
4. Toggle PHYLIP format output = ON
5. Toggle GDE format output = OFF
6. Toggle GDE output case = LOWER
7. Toggle output order = INPUT FILE
8. Create alignment output file(s) now?
9. Toggle parameter output = OFF
H. HELP

file.aln
CLUSTAL W(1.6) multiple sequence alignment

CHKHBA_J00 ----------------------ACACAGAGGTGCAACCATGGTGCTGTCCGCTGCTGACA
DUKHBADWP CGCAACCCCGTCAGTTGCCAGCCTGCCACACCGCTGCCGCCATGCTGACCGCCGAGGACA
SMRHBAA_M1 -------------------------AACCACCGCAAACATGAAGCTGACTGCCGAAGATA
XELHBA_J00 -----------------TGCACAACACAAACAGGAACCATGCTTCTTTCAGCCGATGACA
DAVAGL_M14 -----------------------------------------GTGCTCTCGGATGCTGACA
** * * * ** *

CHKHBA_J00 AGAACAACGTCAAGGGCATCTTCACCAAAATCGCCGGCCATGCTGAGGAGTATGGCGCCG
DUKHBADWP AGAAGCTCATCACGCAGTTGTGGGAGAAGGTGGCTGGCCACCAGGAGGAATTCGGAAGTG
SMRHBAA_M1 AACATAATGTGAAGGCCATCTGGGATCATGTCAAAGGACATGAAGAGGCGATTGGTGCAG
XELHBA_J00 AGAAACACATCAAGGCAATTATGCCTCCTATCGCTGCCCATGGCGACAAATTTGGGGGAG
DAVAGL_M14 AGACTCACGTGAAAGCCATCTGGGGTAAGGTGGGAGGCCACGCCGGTGCCTACGCAGCTG
* * * * * * ** * * *
file.phy

lovelace$ more tofasta.phy
5 589
CHKHBA_J00 ---------- ---------- --ACACAGAG GTGCAACCAT GGTGCTGTCC
DUKHBADWP CGCAACCCCG TCAGTTGCCA GCCTGCCACA CCGCTGCCGC CATGCTGACC
SMRHBAA_M1 ---------- ---------- -----AACCA CCGCAAACAT GAAGCTGACT
XELHBA_J00 ---------- -------TGC ACAACACAAA CAGGAACCAT GCTTCTTTCA
DAVAGL_M14 ---------- ---------- ---------- ---------- -GTGCTCTCG

GCTGCTGACA AGAACAACGT CAAGGGCATC TTCACCAAAA TCGCCGGCCA
GCCGAGGACA AGAAGCTCAT CACGCAGTTG TGGGAGAAGG TGGCTGGCCA
GCCGAAGATA AACATAATGT GAAGGCCATC TGGGATCATG TCAAAGGACA
GCCGATGACA AGAAACACAT CAAGGCAATT ATGCCTCCTA TCGCTGCCCA
GATGCTGACA AGACTCACGT GAAAGCCATC TGGGGTAAGG TGGGAGGCCA

TGCTGAGGAG TATGGCGCCG AGACCTTGGA AAGGATGTTC ACCACCTACC
CCAGGAGGAA TTCGGAAGTG AAGCTCTGCA GAGGATGTTC CTCGCCTACC
TGAAGAGGCG ATTGGTGCAG AAGCTCTTTA CAGGATGTTC TGTTGTATGC
TGGCGACAAA TTTGGGGGAG AAGCTTTGTA CAGGATGTTC ATAGTCAACC
CGCCGGTGCC TACGCAGCTG AAGCTCTTGC CAGAACCTTC CTCTCCTTCC
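
The interleaved layout shown above (a header with the number of sequences and sites, a first block carrying the names, then blocks of sequence only) can be read back with a few lines of Python. This is a minimal sketch under stated assumptions: the function name parse_interleaved_phylip is hypothetical, names are taken to contain no blanks and to be separated from the sequence by whitespace as in the listing (strict PHYLIP uses a fixed 10-character name field), and the sample is an abridged two-sequence extract with the header adjusted to match.

```python
def parse_interleaved_phylip(text):
    """Return (n_seqs, n_sites, {name: sequence}) from interleaved PHYLIP text."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_seqs, n_sites = (int(token) for token in lines[0].split()[:2])
    names, chunks = [], []
    for index, line in enumerate(lines[1:]):
        if index < n_seqs:
            # First block: the first token is the name (assumption, see above).
            name, rest = line.split(None, 1)
            names.append(name)
            chunks.append(["".join(rest.split())])
        else:
            # Later blocks: sequence only, in the same order as the first block.
            chunks[index % n_seqs].append("".join(line.split()))
    seqs = {name: "".join(parts) for name, parts in zip(names, chunks)}
    for name, seq in seqs.items():
        if len(seq) != n_sites:
            raise ValueError(f"{name}: {len(seq)} sites, header says {n_sites}")
    return n_seqs, n_sites, seqs


# Abridged sample: two sequences, two blocks of 20 columns each.
SAMPLE_PHY = """\
 2 40
CHKHBA_J00 ---------- ----------
DUKHBADWP  CGCAACCCCG TCAGTTGCCA

GCTGCTGACA AGAACAACGT
GCCGAGGACA AGAAGCTCAT
"""

n_seqs, n_sites, seqs = parse_interleaved_phylip(SAMPLE_PHY)
for name, seq in seqs.items():
    print(name, len(seq))
```

The length check catches the most common problem when a file is edited by hand: a block that is missing a line or has lost some columns.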

lovelace$ protpars
protpars: can't read infile
Please enter a new filename>fmts.phy

Protein parsimony algorithm, version 3.55c

Setting for this run:
U Search for best tree? Yes
J Randomize input order of sequences? No. Use input order
O Outgroup root? No, use as outgroup species 1
T Use Threshold parsimony? No, use ordinary parsimony
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
5 Print sequences at all nodes of tree No
6 Write out trees onto tree file? Yes

Are these settings correct? (type Y or the letter for one to change)

Y

Adding species:
CHKHBA_J00
DUKHBADWP
SMRHBAA_M1
XELHBA_J00
DAVAGL_M14

Doing global rearrangements
!---------!
.........

Output written to output file

Trees also written onto file

lovelace$ more outfile
Protein parsimony algorithm, version 3.55c



One most parsimonious tree found:




+-----XELHBA_J00
+--3
! ! +--DAVAGL_M14
+--2 +--4
! ! +--SMRHBAA_M1
--1 !
! +--------DUKHBADWP
!
+-----------CHKHBA_J00

remember: this is an unrooted tree!


requires a total of 1400.000

lovelace$ more treefile
(((XELHBA_J00,(DAVAGL_M14,SMRHBAA_M1)),DUKHBADWP),CHKHBA_J00);
lovelace$

Tree without distances (no branch lengths)
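
The parenthesized treefile written by protpars carries the topology only. As an illustration, the following Python sketch turns such a string into nested tuples; the function name parse_topology is an assumption, and the parser handles only trees without branch lengths or internal labels, like the one above.

```python
def parse_topology(newick):
    """Parse a parenthesized tree without branch lengths into nested tuples."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
            return tuple(children)
        # A leaf: read the name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos]

    return node()


TREEFILE = "(((XELHBA_J00,(DAVAGL_M14,SMRHBAA_M1)),DUKHBADWP),CHKHBA_J00);"
tree = parse_topology(TREEFILE)
print(tree)
```

Each tuple is one fork of the tree and each string is a tip, so the grouping of DAVAGL_M14 with SMRHBAA_M1 appears as the innermost tuple.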

Distances

lovelace$ dnadist
dnadist: can't read infile
Please enter a new filename>tofasta.phy

Nucleic acid sequence Distance Matrix program, version 3.55c

Settings for this run:
D Distance (Kimura, Jin/Nei, ML, J-C)? Kimura 2-parameter
T Transition/transversion ratio? 2.0
C One category of substitution rates? Yes
L Form of distance matrix? Square
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes

Are these settings correct? (type Y or letter for one to change)

Y

Distances calculated for species
CHKHBA_J00 ....
DUKHBADWP ...
SMRHBAA_M1 ..
XELHBA_J00 .
DAVAGL_M14

Distances written to file

lovelace$ more outfile
5
CHKHBA_J00 0.0000 0.5962 0.9649 0.7203 0.6094
DUKHBADWP 0.5962 0.0000 1.0130 0.7741 0.5435
SMRHBAA_M1 0.9649 1.0130 0.0000 0.9289 0.9209
XELHBA_J00 0.7203 0.7741 0.9289 0.0000 0.8969
DAVAGL_M14 0.6094 0.5435 0.9209 0.8969 0.0000

lovelace$ mv outfile infile
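
Before passing the matrix on to fitch, neighbor or kitsch, it can be worth checking that it is square and symmetric. The sketch below reads the dnadist output shown above; the function names are hypothetical, and the reader assumes names without blanks and one matrix row per line, which holds for this listing but not for every PHYLIP matrix (long rows may be wrapped).

```python
MATRIX_TEXT = """\
    5
CHKHBA_J00 0.0000 0.5962 0.9649 0.7203 0.6094
DUKHBADWP 0.5962 0.0000 1.0130 0.7741 0.5435
SMRHBAA_M1 0.9649 1.0130 0.0000 0.9289 0.9209
XELHBA_J00 0.7203 0.7741 0.9289 0.0000 0.8969
DAVAGL_M14 0.6094 0.5435 0.9209 0.8969 0.0000
"""


def read_square_matrix(text):
    """Return (names, rows) from a square distance matrix, one row per line."""
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0].split()[0])
    names, rows = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        names.append(fields[0])
        rows.append([float(value) for value in fields[1:n + 1]])
    return names, rows


def is_symmetric(rows, tol=1e-9):
    """True when d[i][j] == d[j][i] and the diagonal is zero."""
    n = len(rows)
    return all(
        abs(rows[i][j] - rows[j][i]) <= tol and abs(rows[i][i]) <= tol
        for i in range(n)
        for j in range(n)
    )


names, rows = read_square_matrix(MATRIX_TEXT)
print(names, is_symmetric(rows))
```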

lovelace$ fitch

Fitch-Margoliash method version 3.55c

Settings for this run:
U Search for best tree? Yes
P Power? 2.00000
- Negative branch lengths allowed? No
O Outgroup root? No, use as outgroup species 1
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
G Global rearrangements? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes

Are these settings correct? (type Y or the letter for one to change)
y

Adding species:
CHKHBA_J00
DUKHBADWP
SMRHBAA_M1
XELHBA_J00
DAVAGL_M14

Output written to output file

Tree also written onto file

lovelace$ more outfile

5 Populations

Fitch-Margoliash method version 3.55c

__ __ 2
\ \ (Obs - Exp)
Sum of squares = /_ /_ ------------
2
i j Obs

Negative branch lengths not allowed


+----------------DAVAGL_M14
+---3
! +--------------DUKHBADWP
!
! +----------------------XELHBA_J00
--1-----2
! +--------------------------------SMRHBAA_M1
!
+---------------CHKHBA_J00


remember: this is an unrooted tree!

Sum of squares = 0.03950

Average percent standard deviation = 4.68447

examined 15 trees

Between And Length
------- --- ------
1 3 0.06233
3 DAVAGL_M14 0.28139
3 DUKHBADWP 0.26211
1 2 0.09924
2 XELHBA_J00 0.37775
2 SMRHBAA_M1 0.55115
1 CHKHBA_J00 0.26879


lovelace$ more treefile
((DAVAGL_M14:0.28139,DUKHBADWP:0.26211):0.06233,(XELHBA_J00:0.37775,
SMRHBAA_M1:0.55115):0.09924,CHKHBA_J00:0.26879);
lovelace$
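
The sum of squares printed by fitch can be recomputed from the two listings above: the observed distances from dnadist and the branch lengths of the fitted tree. The Python sketch below is an illustration, not PHYLIP's code. It assumes that the sum runs over every ordered pair i, j (so each pair counts twice) and that the average percent standard deviation is 100 * sqrt(SS / (N - 2)) with N the number of off-diagonal distances; with those assumptions the results should come out close to the 0.03950 and 4.68447 reported above.

```python
import math

NAMES = ["CHKHBA_J00", "DUKHBADWP", "SMRHBAA_M1", "XELHBA_J00", "DAVAGL_M14"]
# Observed distances, copied from the dnadist outfile.
OBSERVED = [
    [0.0000, 0.5962, 0.9649, 0.7203, 0.6094],
    [0.5962, 0.0000, 1.0130, 0.7741, 0.5435],
    [0.9649, 1.0130, 0.0000, 0.9289, 0.9209],
    [0.7203, 0.7741, 0.9289, 0.0000, 0.8969],
    [0.6094, 0.5435, 0.9209, 0.8969, 0.0000],
]
# Branch table copied from the fitch outfile: (node, node or tip, length).
BRANCHES = [
    ("1", "3", 0.06233),
    ("3", "DAVAGL_M14", 0.28139),
    ("3", "DUKHBADWP", 0.26211),
    ("1", "2", 0.09924),
    ("2", "XELHBA_J00", 0.37775),
    ("2", "SMRHBAA_M1", 0.55115),
    ("1", "CHKHBA_J00", 0.26879),
]


def tree_distance(branches, start, goal):
    """Sum the branch lengths along the unique path between two nodes."""
    neighbours = {}
    for a, b, length in branches:
        neighbours.setdefault(a, []).append((b, length))
        neighbours.setdefault(b, []).append((a, length))
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, dist = stack.pop()
        if node == goal:
            return dist
        for nxt, length in neighbours[node]:
            if nxt != parent:
                stack.append((nxt, node, dist + length))
    raise ValueError(f"no path from {start} to {goal}")


def sum_of_squares(names, observed, branches, power=2.0):
    """Sum over ordered pairs of (Obs - Exp)^2 / Obs^power."""
    total = 0.0
    for i, a in enumerate(names):
        for j, b in enumerate(names):
            if i != j:
                expected = tree_distance(branches, a, b)
                total += (observed[i][j] - expected) ** 2 / observed[i][j] ** power
    return total


ss = sum_of_squares(NAMES, OBSERVED, BRANCHES)
n_distances = len(NAMES) * (len(NAMES) - 1)
apsd = 100.0 * math.sqrt(ss / (n_distances - 2))
print(f"Sum of squares = {ss:.5f}")
print(f"Average percent standard deviation = {apsd:.3f}")
```

The power argument plays the role of the P option in the fitch menu: 2.0 gives the Fitch-Margoliash weighting used in this run.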


lovelace$ neighbor

Neighbor-Joining/UPGMA method version 3.5

Settings for this run:
N Neighbor-joining or UPGMA tree? Neighbor-joining
O Outgroup root? No, use as outgroup species 1
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes

Are these settings correct? (type Y or the letter for one to change)
y

CYCLE 2: OTU 3 ( 0.54903) JOINS OTU 4 ( 0.37987)
CYCLE 1: OTU 1 ( 0.27209) JOINS NODE 3 ( 0.10606)
LAST CYCLE:
NODE 1 ( 0.05896) JOINS OTU 2 ( 0.26461) JOINS OTU 5 ( 0.27889)

Output written on output file

Tree written on tree file


lovelace$ more outfile

5 Populations

Neighbor-Joining/UPGMA method version 3.55c


Neighbor-joining method

Negative branch lengths allowed


+---------------DUKHBADWP
!
--3----------------DAVAGL_M14
!
! +---------------CHKHBA_J00
+---2
! +--------------------------------SMRHBAA_M1
+-----1
+----------------------XELHBA_J00


remember: this is an unrooted tree!

Between And Length
------- --- ------
3 DUKHBADWP 0.26461
3 DAVAGL_M14 0.27889
3 2 0.05896
2 CHKHBA_J00 0.27209
2 1 0.10606
1 SMRHBAA_M1 0.54903
1 XELHBA_J00 0.37987

lovelace$ more treefile
(DUKHBADWP:0.26461,DAVAGL_M14:0.27889,(CHKHBA_J00:0.27209,
(SMRHBAA_M1:0.54903,XELHBA_J00:0.37987):0.10606):0.05896);
lovelace$
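
The first line of the neighbor progress report (OTU 3 joins OTU 4) can be checked by hand. The Python sketch below applies the usual neighbor-joining selection criterion, Q(i,j) = (n - 2) d(i,j) - r(i) - r(j) with r the row sums, to the dnadist matrix and computes the two branch lengths of the first join. It is an illustration of the textbook method under that assumption, not a transcription of PHYLIP's implementation, and the function name first_nj_join is hypothetical.

```python
NJ_NAMES = ["CHKHBA_J00", "DUKHBADWP", "SMRHBAA_M1", "XELHBA_J00", "DAVAGL_M14"]
# Distances copied from the dnadist outfile.
NJ_DIST = [
    [0.0000, 0.5962, 0.9649, 0.7203, 0.6094],
    [0.5962, 0.0000, 1.0130, 0.7741, 0.5435],
    [0.9649, 1.0130, 0.0000, 0.9289, 0.9209],
    [0.7203, 0.7741, 0.9289, 0.0000, 0.8969],
    [0.6094, 0.5435, 0.9209, 0.8969, 0.0000],
]


def first_nj_join(names, d):
    """Return ((name_i, length_i), (name_j, length_j)) for the first NJ join."""
    n = len(names)
    row_sums = [sum(row) for row in d]
    best = None
    for i in range(n):
        for j in range(i + 1, n):
            # Selection criterion: the pair with the smallest Q is joined.
            q = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
            if best is None or q < best[0]:
                best = (q, i, j)
    _, i, j = best
    # Branch lengths from each member of the pair to the new internal node.
    length_i = d[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    length_j = d[i][j] - length_i
    return (names[i], length_i), (names[j], length_j)


(first, len_first), (second, len_second) = first_nj_join(NJ_NAMES, NJ_DIST)
print(f"{first} ({len_first:.5f}) joins {second} ({len_second:.5f})")
```

For these data the pair selected is SMRHBAA_M1 and XELHBA_J00, species 3 and 4 in input order, in agreement with the first cycle reported by neighbor.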

lovelace$ kitsch

Fitch-Margoliash method with contemporary tips, version 3.55c

Settings for this run:
U Search for best tree? Yes
P Power? 2.00000
- Negative branch lengths allowed? No
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes

Are these settings correct? (type Y or the letter for one to change)
y

Adding species:
CHKHBA_J00
DUKHBADWP
SMRHBAA_M1
XELHBA_J00
DAVAGL_M14

Doing global rearrangements
!---------!
.........

Output written to output file

Tree also written onto file

lovelace$ more outfile

5 Populations

Fitch-Margoliash method with contemporary tips, version 3.55c

__ __ 2
\ \ (Obs - Exp)
Sum of squares = /_ /_ ------------
2
i j Obs

negative branch lengths not allowed


+---------------DAVAGL_M14
+--4
+-----1 +---------------DUKHBADWP
! !
+----3 +-----------------CHKHBA_J00
! !
--2 +-----------------------XELHBA_J00
!
+----------------------------SMRHBAA_M1

Sum of squares = 0.059

Average percent standard deviation = 5.73593

examined 72 trees

From To Length Time
---- -- ------ ----

4 DAVAGL_M14 0.27175 0.47712
1 4 0.02958 0.20537
4 DUKHBADWP 0.27175 0.47712
3 1 0.09078 0.17580
1 CHKHBA_J00 0.30133 0.47712
2 3 0.08501 0.08501
3 XELHBA_J00 0.39211 0.47712
2 SMRHBAA_M1 0.47712 0.47712

lovelace$ more treefile
((((DAVAGL_M14:0.27175,DUKHBADWP:0.27175):0.02958,CHKHBA_J00:0.30133):0.09078,
XELHBA_J00:0.39211):0.08501,SMRHBAA_M1:0.47712);
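
kitsch assumes contemporary tips, so every tip of its tree should lie at the same distance from the root (the 0.47712 in the Time column above). The Python sketch below verifies this on the treefile by summing branch lengths from the root to each tip. The function name tip_depths is an assumption, and the small parser handles only what this treefile contains: tip names, parentheses, commas and branch lengths.

```python
def tip_depths(newick):
    """Return {tip: distance from the root} for a tree with branch lengths."""
    text = "".join(newick.split()).rstrip(";")
    pos = 0

    def read_length():
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1  # skip ":"
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            return float(text[start:pos])
        return 0.0  # the root has no branch above it

    def node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            below = node()
            while text[pos] == ",":
                pos += 1  # skip ","
                below.update(node())
            pos += 1  # skip ")"
        else:
            start = pos
            while pos < len(text) and text[pos] not in ":,()":
                pos += 1
            below = {text[start:pos]: 0.0}
        # The branch above this node adds to the depth of every tip below it.
        length = read_length()
        return {tip: depth + length for tip, depth in below.items()}

    return node()


KITSCH_TREE = """((((DAVAGL_M14:0.27175,DUKHBADWP:0.27175):0.02958,CHKHBA_J00:0.30133):0.09078,
XELHBA_J00:0.39211):0.08501,SMRHBAA_M1:0.47712);"""

depths = tip_depths(KITSCH_TREE)
for tip, depth in depths.items():
    print(f"{tip:12s}{depth:.5f}")
```

Running the same check on the fitch or neighbor treefile gives unequal depths, which is the practical difference between those methods and kitsch.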



lovelace$ dnaml

Nucleic acid sequence Maximum Likelihood method, version 3.55c

Settings for this run:
U Search for best tree? Yes
T Transition/transversion ratio: 2.0000
F Use empirical base frequencies? Yes
C One category of substitution rates? Yes
G Global rearrangements? No
J Randomize input order of sequences? No. Use input order
O Outgroup root? No, use as outgroup species 1
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes

Are these settings correct? (type Y or the letter for one to change)
Y
Adding species:
CHKHBA
DUKHBADWP
SMRHBAA
XELHBA
DAVAGL


Output written to output file

Tree also written onto file

lovelace$ more outfile


Nucleic acid sequence Maximum Likelihood method, version 3.55c

Empirical Base Frequencies:

A 0.25368
C 0.29449
G 0.23346
T(U) 0.21838

Transition/transversion ratio = 2.000000

(Transition/transversion parameter = 1.523022)


+--------------DAVAGL
+-----3
! +-----------------DUKHBADWP
!
! +--------------------XELHBA
--1---------2
! +-------------------------------SMRHBAA
!
+--------------CHKHBA


remember: this is an unrooted tree!

Ln Likelihood = -3145.55232

Examined 15 trees

Between And Length Approx. Confidence Limits
------- --- ------ ------- ---------- ------

1 3 0.09292 ( 0.03404, 0.15218) **
3 DAVAGL 0.26355 ( 0.19312, 0.33542) **
3 DUKHBADWP 0.30752 ( 0.23199, 0.38496) **
1 2 0.16329 ( 0.09148, 0.23605) **
2 XELHBA 0.34539 ( 0.25789, 0.43510) **
2 SMRHBAA 0.53168 ( 0.42197, 0.64816) **
1 CHKHBA 0.25619 ( 0.18690, 0.32797) **

* = significantly positive, P < 0.05
** = significantly positive, P < 0.01


lovelace$ more treefile
((DAVAGL:0.26355,DUKHBADWP:0.30752):0.09292,(XELHBA:0.34539,
SMRHBAA:0.53168):0.16329,CHKHBA:0.25619);
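
With option F set to Yes, dnaml reports base frequencies estimated from the data. As a rough illustration of what such an estimate involves, the Python sketch below counts A, C, G and T over a set of aligned sequences and ignores gaps. It is only an approximation under stated assumptions: the function name base_frequencies is hypothetical, ambiguity codes are simply skipped (dnaml may treat them differently), and the two sequences are short extracts from the alignment shown earlier, not the file used in the dnaml run, so the values will not match those above.

```python
from collections import Counter


def base_frequencies(sequences):
    """Return {base: frequency} for A, C, G, T, ignoring gaps and other symbols."""
    counts = Counter()
    for seq in sequences:
        counts.update(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A, C, G or T found in the sequences")
    return {base: counts[base] / total for base in "ACGT"}


# First 50 columns of two sequences from the PHYLIP listing shown earlier.
EXCERPT = [
    "CGCAACCCCGTCAGTTGCCAGCCTGCCACACCGCTGCCGCCATGCTGACC",
    "-----------------TGCACAACACAAACAGGAACCATGCTTCTTTCA",
]

freqs = base_frequencies(EXCERPT)
for base, value in freqs.items():
    print(f"{base} {value:.5f}")
```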

lovelace$ more fmt.phy
5 340
ECFMT_2 MSESLRIIFA GTPDFAARHL DALLS-SGHN VVGVFTQPDR PAGRGKKLMP
HI32745_2 -MKSLNIIFA GTPDFAAQHL QAILN-SQHN VIAVYTQPDK PAGRGKKLQA
TTDEFFMT_3 ----MRVAFF GTPLWAVPVL DALR--KRHQ VVLVVSQPDK PQGRGLRPAP
MG39721_2 ---MFKIVFF GTSTLSKKCL EQLFYDNDFE ICAVVTQPDK INHRNNKIVP
SSCPNC ---MMKTVFF GTPDFAVPTL EALLGHPDID VLAVVSQPDR RRGRGSKLIP

SPVKVLAEEK GLPVFQP-VS LRPQENQQLV AELQADVMVV VAYGLILPKA
SPVKQLAEQN NIPVYQP-KS LRKEEAQSEL KALNADVMVV VAYGLILPKA
SPVARYAEAE GLPLLRP-AR LREEAFLEAL RQAAPEVAVV AAYGKLIPKE
SDVKSFCLEK NITFFQP--K QS-ISIKADL EKLKADIGIC VSFGQYLHQD
SPVKEVAVQA GIPVWQPERV KRCQETLAKL KNCQADFFVV VAYGQLLSPE

lovelace$ seqboot

lovelace$ cp fmt.phy infile
lovelace$ seqboot

Random number seed (must be odd)?
11

Bootstrapped sequences algorithm, version 3.55c

Settings for this run:
D Sequence, Morph, Rest., Gene Freqs? Molecular sequences
J Bootstrap, Jackknife, or Permute? Bootstrap
R How many replicates? 100
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes

Are these settings correct? (type Y or the letter for one to change)
R
Number of replicates?
10

..

completed replicate number 1
completed replicate number 2
completed replicate number 3
completed replicate number 4
completed replicate number 5

Output written to output file
lovelace$ mv outfile infile
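
What seqboot does in Bootstrap mode can be summarized as: draw alignment columns at random, with replacement, until the replicate has as many columns as the original. The Python sketch below illustrates the idea on the first ten columns of fmt.phy. It is a sketch only: the function name bootstrap_alignment is hypothetical, and Python's random generator is not the one used by seqboot, so the same seed will not reproduce seqboot's replicates.

```python
import random

# First ten columns of each sequence in fmt.phy, as listed above.
ALIGNMENT = {
    "ECFMT_2": "MSESLRIIFA",
    "HI32745_2": "-MKSLNIIFA",
    "TTDEFFMT_3": "----MRVAFF",
    "MG39721_2": "---MFKIVFF",
    "SSCPNC": "---MMKTVFF",
}


def bootstrap_alignment(seqs, rng):
    """Return one replicate: columns resampled with replacement."""
    n_sites = len(next(iter(seqs.values())))
    # The same column draw is applied to every sequence, so columns stay intact.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in columns) for name, seq in seqs.items()}


rng = random.Random(11)  # fixed seed so the run is repeatable
replicates = [bootstrap_alignment(ALIGNMENT, rng) for _ in range(10)]
for name, seq in replicates[0].items():
    print(f"{name:12s}{seq}")
```

Because the output holds several data sets, the programs that read it next must be told so through their M option, as in the protdist and neighbor runs that follow.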

lovelace$ protdist
Protein distance algorithm, version 3.55c

Settings for this run:
P Use PAM, Kimura or categories model? Dayhoff PAM matrix
M Analyze multiple data sets? No
I Input sequences interleaved? Yes
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes

Are these settings correct? (type Y or the letter for one to change)
M
How many data sets?
10
Y

Computing distances:
ECFMT_2
HI32745_2 .
TTDEFFMT_3 ..
MG39721_2 ...
SSCPNC ....

Output written to output file

Data set # 2:

Computing distances:
ECFMT_2
HI32745_2 .
TTDEFFMT_3 ..
MG39721_2 ...
SSCPNC ....

Output written to output file

Data set # 3:

Computing distances:
ECFMT_2
...
Data set # 5:

Computing distances:
ECFMT_2
HI32745_2 .
TTDEFFMT_3 ..
MG39721_2 ...
SSCPNC ....

Output written to output file

lovelace$ mv outfile infile
lovelace$ neighbor

Neighbor-Joining/UPGMA method version 3.5

Settings for this run:
N Neighbor-joining or UPGMA tree? Neighbor-joining
O Outgroup root? No, use as outgroup species 1
L Lower-triangular data matrix? No
R Upper-triangular data matrix? No
S Subreplicates? No
J Randomize input order of species? No. Use input order
M Analyze multiple data sets? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the data at start of run No
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes

Are these settings correct? (type Y or the letter for one to change)
M
How many data sets?
10
...

Output written on output file

Tree written on tree file
Data set # 10:

CYCLE 2: OTU 1 ( 0.15957) JOINS OTU 2 ( 0.31701)
CYCLE 1: NODE 1 ( 0.29776) JOINS OTU 3 ( 0.57794)
LAST CYCLE:
NODE 1 ( 0.11937) JOINS OTU 4 ( 1.38576) JOINS OTU 5 ( 0.68429)

Output written on output file

Tree written on tree file

lovelace$ mv treefile infile
lovelace$ consense

Majority-rule and strict consensus tree program, version 3.55c

Settings for this run:
O Outgroup root? No, use as outgroup species 1
R Trees to be treated as Rooted? No
0 Terminal type (IBM PC, VT52, ANSI)? ANSI
1 Print out the sets of species Yes
2 Print indications of progress of run Yes
3 Print out tree Yes
4 Write out trees onto tree file? Yes

Are these settings correct? (type Y or the letter for one to change)
Y

Output written to output file

Tree also written onto file

lovelace$ more outfile

Majority-rule and strict consensus tree program, version 3.55c

Species in order:

HI32745 2
TTDEFFMT 3
MG39721 2
SSCPNC
ECFMT 2

Sets included in the consensus tree

Set (species in order) How many times out of 10.00

.***. 10.00
..**. 8.00

Sets NOT included in consensus tree:

Set (species in order) How many times out of 10.00

.**.. 2.00
CONSENSUS TREE:
the numbers at the forks indicate the number
of times the group consisting of the species
which are to the right of that fork occurred
among the trees, out of 10.00 trees

+---------TTDEFFMT 3
+-10.0
! ! +----SSCPNC
+--9.0 +--8.0
! ! +----MG39721 2
! !
! +--------------ECFMT 2
!
+-------------------HI32745 2


remember: this is an unrooted tree!


lovelace$ more treefile
(((TTDEFFMT_3:10.0,(SSCPNC:10.0,MG39721_2:10.0):8.0):10.0,ECFMT_2:10.0):9.0,
HI32745_2:10.0);
lovelace$
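
The table of sets in the consense outfile can be read as follows: each pattern marks with a star the species that form a group, and the number gives how many of the 10 bootstrap trees contain that group. The Python sketch below decodes the patterns and applies a plain majority rule (a group is kept when it occurs in more than half of the trees). The function names are hypothetical and the rule is the textbook one, stated here as an assumption about what consense applies; the counts are copied from the outfile above.

```python
SPECIES = ["HI32745_2", "TTDEFFMT_3", "MG39721_2", "SSCPNC", "ECFMT_2"]
# (pattern, number of trees containing the group), from the consense outfile.
SET_COUNTS = [(".***.", 10.0), ("..**.", 8.0), (".**..", 2.0)]
N_TREES = 10


def decode(pattern, species):
    """Translate a dot/star pattern into the set of species it marks."""
    return frozenset(name for mark, name in zip(pattern, species) if mark == "*")


def majority_rule(set_counts, species, n_trees):
    """Keep the groups found in more than half of the trees, with their support."""
    return {
        decode(pattern, species): count / n_trees
        for pattern, count in set_counts
        if count > n_trees / 2
    }


consensus = majority_rule(SET_COUNTS, SPECIES, N_TREES)
for group, support in consensus.items():
    print(sorted(group), f"{support:.0%}")
```

The group SSCPNC plus MG39721_2 is thus supported by 8 replicates out of 10, which is the 8.0 shown at the corresponding fork of the consensus tree.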

© Université de TOURS - GENET

Document modified on 21 November 2006