
BOXSHADE 3.2

This is a new release of the PASCAL program BOXSHADE, intended for shading multiple aligned sequence files. This is not a completely 'official' release, but it makes the new facilities in v3.0 available until Kay can make time to tie up a full release. Currently, executables are only available for VMS, OSF1 and DOS (thanks to Dr. G. P. Thomas for compiling v3.0 for DOS).

*-N.B.-* We need someone with a Turbo Pascal compiler to make an executable for v3.2, please! If anyone wants to make other executables available at EBI or EMBL, please feel free to do so. This program is largely the work of Kay Hofmann (address below); the new shading/thresholding algorithm, and a couple of other bits and pieces, are the work of Michael D. Baron (address below). Address queries to the one you think most likely to help.

------------------------------------------------------------------------

What's new in v3.2?

The major change is the addition of PICT file output. PICT files are standard on Macs, and are also read by most of the PC drawing packages (e.g. Corel, Harvard Graphics, FreeLance for Windows). The files have been tested on every Mac and PC program I could get my hands on, and import correctly. Extra notation, outlining regions, etc., can now be done to the shaded alignment, which can also be in colour (text and shading).

*Nothing* is perfect in this world, however, and an editable figure containing (for 6 sequences of 300 amino acids each) about 4000 editable objects will give most computers a headache unless they are very fast and have buckets of free RAM. The image that is imported is 'grouped', and can be moved around and sized, and things put over or behind it. If you 'ungroup' it, you will have problems. Be warned!!

A few other small things have been tidied up: the 'master' sequence can be set in the command line for comparison to a single sequence, and /split is enforced for EPSF and PICT output. I tried to take care of the situation where the block is bigger than the page (anyone out there aligning 100 sequences?). Note that the program will still allow you to put too many characters on the page for PS, EPSF, HPGL, FIG and PICT output, where different font sizes are possible, but, hey, we're all intelligent people, right?

------------------------------------------------------------------------

What's new in v3.0?

This is version 3.0 rather than 2.8 because the shading algorithm has been completely rewritten, and is now more sophisticated. Basically, MDB wrote it to do what he would do if he was shading an alignment by hand. As an unsophisticated molecular biologist, he hopes it will meet the requirement of all the other unsophisticated molecular biologists out there. The full description of the shading process is below. This version also introduces a 'threshold' fraction of residues that must agree before there is a consensus at a position.

Two more output formats have been included, with Kay threatening to add more in the next version. These are a .FIG file for the Unix program xfig, and ASCII text output, with varying or conserved residues printed.

------------------------------------------------------------------------

What is new in 2.7?

------------------------------------------------------------------------

The new version BOXSHADE 2.7 replaces the previous versions BOXSHADE/DOS 2.6 and BOXSHADE/VMS 2.4.

- The biggest difference to the previous versions is important only for me as the program author; the user is not supposed to even notice it. I restructured the source code in order to use the same code for the DOS, VMS and OSF1 versions. Considering the lack of standardization of PASCAL, that turned out to be a major challenge and involved some tricks. The implementation-dependent parts are concentrated in a small compilation unit; the program source code itself uses only very little conditional compilation (which I have to emulate for some of the compilers).
- As a side-effect of this code merging, the new version supports all devices and formats of both previous versions. Even if some of them are common only in one particular type of environment, they don't hurt when using different systems. Supported output devices are now: PS, EPS, HPGL, RTF, VT100, ANSI, ReGIS, PC-CRT, DEC-LJ250.
- To take account of the increasing importance of networking, particularly of NFS-mounting of remote disks, all versions of BOXSHADE can now read parameter and alignment files in both MSDOS and UNIX text file format. Unix and MSDOS text files use different line terminators, and PASCAL programs usually choke on files having the wrong format. It is now possible for the DOS, UNIX and VMS versions of BOXSHADE to share the same parameter files. It is also no problem for BOXSHADE/DOS to read an alignment file created by UNIX-Pileup directly from an NFS-mounted volume.
- With the new command line parameters /unix and /dos, BOXSHADE can be forced to write its output files in either unix or DOS style. If neither flag is used, BOXSHADE writes in the native style of the machine it is running on.
- The new command line parameter /check gives information on all allowed command line parameters and allows adding them later. This functionality is similar to the GCG programs.
- Of all the helpful and important suggestions that unsatisfied users of BOXSHADE sent to me, I picked only a single one for implementing in V2.7 (but I promise to include more in V2.8!). When using the option of 'shading according to a master sequence', it is now possible to hide this master sequence in the final output. This gives more influence on the look of the final alignment graph. For example, it is possible to remove (unconserved) parts of the master sequence and thus get shading only in the remaining parts of the alignment.
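
The line-terminator handling described in the list above can be pictured with a small sketch. This is not BOXSHADE's PASCAL code; it is a hypothetical Python illustration, and the names read_lines, write_lines and TERMINATORS are invented here. It only shows the idea of accepting either terminator on input and choosing one on output, in the manner of /unix and /dos.

```python
import os

# Terminators selected by the /unix and /dos flags (names are illustrative).
TERMINATORS = {"unix": b"\n", "dos": b"\r\n"}


def read_lines(data):
    """Split raw file bytes into lines, accepting LF or CR/LF terminators."""
    # bytes.splitlines() treats "\n", "\r\n" and "\r" as line boundaries.
    return [line.decode("ascii") for line in data.splitlines()]


def write_lines(lines, style=None):
    """Join lines with the requested terminator, or the native one if unset."""
    end = TERMINATORS.get(style, os.linesep.encode("ascii"))
    return b"".join(line.encode("ascii") + end for line in lines)


# The same parameter lines are recovered from either text format.
unix_text = b"first line\nsecond line\n"
dos_text = b"first line\r\nsecond line\r\n"
print(read_lines(unix_text) == read_lines(dos_text))
```

Reading is terminator-agnostic, while writing defaults to the native style unless a style is forced, which mirrors the behaviour the list describes for the two flags.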

------------------------------------------------------------------------

CONTENTS of the packages

------------------------------------------------------------------------

GENERIC SOURCE-CODE PACKAGE

===========================

This archive file contains the universal source code for all BOXSHADE implementations, together with appropriate makefiles or scripts for building the program. Since not all PASCAL implementations support conditional compilation, a set of 'preprocessors' is provided for the creation of the machine-specific source code out of the generic one. The package contains the following files:

boxshade.doc   : this file
dep_xxx.pas    : a set of files containing machine-specific parts
bx_types.pas   : a compilation unit containing type and var definitions
bx_read.pas    : the input routines for several alignment formats
bx_dev.pas     : the output routines for several file/device formats
box.pas        : the main program
makefile.osf1  : a makefile for AXP/OSF1 systems
makefile.com   : a command file for VMS systems
pp_vms.pas     : a kind of precompiler for VAX/VMS systems
pp_osf1.pas    : a kind of precompiler for AXP/OSF1 systems
                 (MS-DOS/Turbo Pascal need no precompiler)
box_pep.sim    : files containing amino-acid/DNA similarities
box_dna.sim
box_pep.par    : files containing the default parameters for PEPTIDE/DNA
box_dna.par      alignments (shading modes, devices etc.)
box_pep.grp    : files containing the amino acid/DNA groupings for
box_dna.grp      consensus by similarity

EXECUTABLE PACKAGES

===================

There are several such packages for different machines/operating systems. Each contains the following files:

boxshade.doc   : this file
BOX.EXE        : the BOXSHADE executable (or box)
box_pep.sim    : files containing amino-acid/DNA similarities
box_dna.sim
box_pep.par    : files containing the default parameters for PEPTIDE/DNA
box_dna.par      alignments (shading modes, devices etc.)

------------------------------------------------------------------------

COMPILATION

------------------------------------------------------------------------

If you have a PASCAL compiler on your system, it may be advantageous to compile the sources on your machine instead of using the precompiled and statically linked executables supplied. It also allows you to make modifications to the source code. There is only one kind of generic source code available; it has to be preprocessed in order to get machine-specific source code that can be understood by your PASCAL compiler. This 'preprocessing' is done by a simple program supplied with this package; the operation is included in the makefile or building script, so normally you don't have to worry about this. The generic source code follows the conditional compilation rules of Borland Pascal.

- If you use an MSDOS system with Borland Pascal, use the integrated development environment. You have to define the logical 'Borland_Pascal'. The program consists of four units (dep_tp, bx_types, bx_read, bx_write) and the main program box. After compilation, move the resulting BOX.EXE to a directory contained in your PATH statement. Move the parameter files *.par and *.sim to any directory and include the statement

      set boxdir=c:\mydir

  in your autoexec.bat file if you want to run BOXSHADE from a different directory.

- If you use an AXP/OSF1 system with DECpascal, rename Makefile.OSF1 into Makefile and subsequently run the make utility. If you are not using the makefile, note that the defines 'dec' and 'unix' have to be set. Install the program into any directory contained in the PATH statement and make sure to define the environment variable BOXDIR pointing to the installation. The program consists of four modules (dep_dec, bx_types, bx_read, bx_write) and the main program box. In a first step, the preprocessor pp_osf1 is made. In the second step, the *.pas files are preprocessed, resulting in the corresponding *.p files. These are forwarded to the PASCAL compiler, which compiles them into several *.pen and *.o files and finally creates the output file box. After successful compilation you may remove all files *.o, *.p and *.pen created by the compiler.

- If you are on a VAX/VMS system using DECpascal, use the build script makefile.com. After compilation, install the program as a VMS symbol, e.g.

      box:=="$mydisk:[mydir]box.exe"

  and make sure to define a logical BOXDIR pointing to the directory containing the parameter files. The program consists of four modules (dep_dec, bx_types, bx_read, bx_write) and the main program box. In a first step, the preprocessor pp_vms.exe is made. In the second step, the *.pas files are preprocessed, resulting in the corresponding *.p files. These are forwarded to the PASCAL compiler, which compiles them into several *.pen and *.obj files and finally creates the output file box.exe. After successful compilation you may remove all files *.obj, *.p and *.pen created by the compiler.

------------------------------------------------------------------------

INSTALLATION

------------------------------------------------------------------------

If you use one of the packages containing the executables, this is easily done. For running BOXSHADE you need the following files:

- BOX.EXE (or box)
- box_pep.par, box_dna.par
- box_pep.sim, box_dna.sim
- box_pep.grp, box_dna.grp

Copy those files into the directory you want to run BOXSHADE from. If you plan to run BOXSHADE from different directories, you have to take the following precautions:

- Make the directory holding the files part of your PATH statement.
  On unix systems, it is recommended to copy the executable to /usr/local/bin and the parameter files (*.par, *.grp and *.sim) to an arbitrary directory, e.g. /usr/local/box
  On VMS systems, install the program as a symbol, e.g.

      box :== "$mydisk:[mydir]box.exe"

- Create a logical (or environment variable) called BOXDIR pointing to the directory holding the parameter files (*.par, *.grp and *.sim).
  On unix systems, make sure that the value ends with a slash (/), e.g. use the statement

      setenv BOXDIR /usr/local/box/

  in your .cshrc file.
  On VMS systems, create a logical using the command

      define BOXDIR mydisk:[mydir]

  in your login.com file.
  On MSDOS systems, include the line

      set BOXDIR=c:\mydir

  in the autoexec.bat file.

If you do not set these logicals, the program looks for the parameter files only in the current directory and, in case of failure, asks you to specify the whole filename.
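
The lookup described above can be sketched as follows. This is a hypothetical Python illustration, not the program's PASCAL code: the function name find_parameter_file is invented, the search order (BOXDIR first, then the current directory) is an assumption, and so is the plain string concatenation, which is only inferred from the requirement that the unix value end with a slash.

```python
import os


def find_parameter_file(name, environ=None, exists=os.path.exists):
    """Return the first existing candidate path for a parameter file, or None."""
    environ = os.environ if environ is None else environ
    candidates = []
    boxdir = environ.get("BOXDIR", "")
    if boxdir:
        # Assumption: the directory and file name are simply concatenated,
        # which is why a missing trailing slash would break the lookup.
        candidates.append(boxdir + name)
    candidates.append(name)  # fall back to the current directory
    for path in candidates:
        if exists(path):
            return path
    return None  # the real program would now ask for the whole filename


# Demonstration with a fake file system holding one parameter file.
fake_files = {"/usr/local/box/box_pep.par"}
print(find_parameter_file("box_pep.par",
                          environ={"BOXDIR": "/usr/local/box/"},
                          exists=fake_files.__contains__))
```

With BOXDIR set to /usr/local/box (no trailing slash), the same call would build the path /usr/local/boxbox_pep.par and find nothing, which illustrates why the slash matters under this assumption.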

-------------------------------------------------------------------------

SHORT PROGRAM DESCRIPTION

-------------------------------------------------------------------------

- Purpose -

BOXSHADE is a program for creating good looking printouts from multiple-aligned protein or DNA sequences. The program does no alignment by itself; it has to take as input a file preprocessed by a multiple alignment program or a multiple sequence editor. See below for a list of supported input formats and output devices. In the standard BOXSHADE output, identical and similar residues in the multiple-alignment chart are represented by different colors or shadings. There are some more options concerning the kind of shading to be applied, sequence numbering, consensus output and so on.

The user interface is a bit clumsy at the moment; one has to answer a lot of questions in order to get the desired output. There is, however, the possibility to use default parameters from a standard parameter file or to supply the program with parameters from the command line. At the moment, the VMS and DOS versions of BOXSHADE have identical user interfaces.

- Input formats -

BOXSHADE 3.2 knows about the following input file formats (some of them are generally used only on MSDOS or VMS systems):

+ CLUSTAL and CLUSTALV, multiple alignment program, DOS/VMS/MAC
  default extension .ALN
+ ESEE, multiple sequence editor, DOS
  default extension .ESE
+ PHYLIP, phylogenetic analysis package, DOS, VMS, UNIX
  default extension .PHY
+ PILEUP and PRETTY of the GCG sequence analysis package, VMS/UNIX
  default extensions .MSF and .PRE
  NB!! You are strongly encouraged NOT to use the PRETTY format as input; it may be incompatible with the revised version of .MSF input. We can't actually think why anyone would use this format now; .MSF files are more useful generally.
+ MALIGNED, multiple sequence editor, VMS only
  default extension .MAL

BOXSHADE tries to determine the file type from the extension but will also work if different extensions are used.

- Output devices -

BOXSHADE 3.2 supports the following output devices:

+ POSTSCRIPT/EPS creates POSTSCRIPT(TM) files for printing on a laser printer or for further conversion with a POSTSCRIPT interpreter (like GHOSTSCRIPT).
+ HPGL for export to various graphics programs or for conversion/printing with the shareware program PRINTGL. Plotting BOXSHADE output on a plotter is generally not recommended.
+ RTF for export to various word-processing and graphics programs.
+ CRT, uses direct screen writes to the PC monitor. Possible options depend on the graphics adapter used. This output device is supported only in the MSDOS version.
+ ANSI. On a PC, this option uses an ANSI device driver (ANSI.SYS) that has to be loaded in CONFIG.SYS beforehand. Possible character renditions are reverse, bold, underlined, blinking etc. On non-DOS systems, this option behaves more or less like the VT100 output mode.
+ VT100 for display on a VT100 compatible terminal or emulator.
+ ReGISterm for display on a ReGIS compatible graphics terminal or emulator.
+ ReGISfile for later conversion by the program RETOS (copyright DEC) in order to print on DIGITAL's printer series.
+ LJ250 for printing on DIGITAL's LJ250 color printer.
+ ASCII output showing either the conserved residues or the varying ones (others as '-').
+ FIG file for xfig 2.1.
+ PICT files for import to Mac and PC graphics programs.

Some of the formats above offer the possibility of scaling the characters and of rotating the plot. Character size has to be entered in 'point' units. Normal output orientation is portrait mode (PS/EPS/HPGL/PICT only); to obtain output in landscape orientation, 'rotate plot = y' has to be chosen.

When creating multi-page output, all pages are contained in a single output file. If one page per file is desired, one has to use the command line parameter /SPLIT.
This is enforced when requesting EPSF or PICT file output, as multi-page EPSFs are a contradiction of the purpose of an EPSF and large PICT files would probably be too big for most personal computers. While using the terminal as output device, the 'RETURN' key has to be pressed to obtain the next page of output.

- Sequence numbering -

Starting with version 2.2 there is the possibility to add numbering to the output files. The numbers are printed between the sequence names and the sequence itself. Since most of the input files either use no numbering or always number the first position in the alignment with a "1" (which does not necessarily reflect the numbers within the original sequence), the user is asked to enter the starting position for each sequence. The command line flag /NUMDEF suppresses that question; a starting position of 1 is assumed for all sequences. BOXSHADE starts with the value entered for the leftmost position and continues numbering every valid symbol, skipping blanks, '-', '.' and stuff like that.

- Default parameters -

Several people using previous releases of BOXSHADE pointed out to me the need for default parameters for the various questions asked by the program. They argued that most sites only use one type of input file, one output device and one choice of colors for the output. I therefore added a management of default parameters allowing two levels of assistance to the user.

1) All default parameters are contained in an ASCII file that can be modified easily to accommodate the user's taste. The format is roughly documented within the file header; it resembles the keyboard input one has to make if using the program interactively. There are two such files supplied with this release of BOXSHADE, BOX_DNA.PAR and BOX_PEP.PAR, holding some example parameters for peptide and DNA comparisons. There are no big differences between these two; the major one is that when shading DNA comparisons one doesn't care about "similar" residues.

2) To run the program with minimal user interaction, I have added the possibility to use command line parameters. At the moment, you can use:

/check     : lists all allowed command line parameters (this list) and allows parameters to be added
/def       : program runs without questions, BOX_PEP.PAR is used as default
/dna       : makes the program use BOX_DNA.PAR as parameter file
/pep       : makes the program use BOX_PEP.PAR as parameter file
/in=xxx    : makes the program take xxx as input file
/out=yyy   : makes the program take yyy as output file (note1)
/par=zzz   : makes the program use zzz as a default parameter file
/type=1    : makes the program assume an input file of type 1 (PRETTY/MSF)
/dev=1     : makes the program assume an output device of type 1 (CRT)
/numdef    : use default numbering (all sequences starting with "1")
/thr       : threshold fraction of residues that must agree for a consensus
/split     : forces one page per file output, creates multiple output files
/cons      : makes the program create an additional consensus line (see below)
/symbcons= : influences the way the consensus line is displayed (see below)
/unix      : writes output files in unix style (LF only) (note2)
/dos       : writes output files in DOS style (CR/LF) (note2)

note1: for terminal output, use out=OUTPUT on unix machines, out=con: on DOS machines and out=tt: on VMS machines.
note2: if no mode is specified, the native style of the machine is used.

******* ATTENTION ! ********************************************************

On unix systems, the dash (-) instead of the slash (/) has to be used as the separator character for command line parameters. For example, a valid unix command line is:

    box -def -numdef -cons -symbcons=" .*"

**************************************************************************
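
For anyone sharing command lines between DOS/VMS and unix machines, the slash-to-dash rule above can be applied mechanically. The following is a small hypothetical helper in Python (the name to_unix_style is invented here and is not part of BOXSHADE); it assumes each parameter is passed as a separate list element.

```python
def to_unix_style(args):
    """Swap the leading '/' of each DOS/VMS-style parameter for '-'."""
    converted = []
    for arg in args:
        if arg.startswith("/"):
            # Only the first character changes, so a value such as
            # /in=/data/test.msf keeps the slashes inside its path.
            converted.append("-" + arg[1:])
        else:
            converted.append(arg)
    return converted


dos_args = ["/def", "/numdef", "/cons", "/symbcons=B.*"]
print(" ".join(["box"] + to_unix_style(dos_args)))
# prints: box -def -numdef -cons -symbcons=B.*
```

The example uses the 'B' form of the SYMBCONS string so that no blank has to survive the command line.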

- Shading strategies (similarity to consensus or single sequence) -

Starting with version 3, BOXSHADE has a new shading system. The first difference is the introduction of a threshold fraction of residues that must agree for there to be a consensus. Previously, the program assumed that SOME residue was always the consensus. If no two residues were the same, the first sequence provided the consensus residue. This threshold fraction can be any number between 0.0 and 1.0. The number of sequences that must agree for there to be a consensus is, as you might expect, this fraction times the total number of sequences in the alignment (fractions of a sequence count as one, e.g. 3.2 becomes 4).

The second difference is the idea of 'consensus by similarity'; this tries to take account of the situations where all the sequences may have (for example) R or K at a position, but neither in a majority. It would not be logical to shade one type of residue as 'identical' and the other as 'similar'; the threshold function might also eliminate both as being in too small numbers. Therefore, if there is not a single residue that is conserved (greater than the threshold) at a position, the program looks for a 'group' of amino acids that fulfills the requirements. 'Groups' are defined in the .grp files. Users can tailor these to their personal prejudices. Any amino acid not listed is assumed not to be in a group. All members of a group are considered to be mutually similar, unlike the .sim files, described below. If a consensus by similarity is found, all the residues in the consensus are shaded using the 'similar' shading defined by the user. If the user does not select 'shading by similarity', only identity-type consensus is looked at.

If an identity-type consensus is found, and similarity shading is in operation, the program looks to see if the remaining residues are similar to the consensus residue. Here the box_xxx.sim files are used. The main difference between relationships in these files and those in the .grp files is as follows. In a .grp file the line

    STA

means that all three a.a.s are mutually similar. In a .sim file

    S TA

means that both T and A are considered similar to S, where there is a conserved S residue in more than the threshold number of sequences. However, it does NOT mean that T and A are similar to each other.

Note that cases where two residues, or groups of residues, fulfill the threshold requirements (as could happen with values of the threshold fraction less than or equal to 0.5) are treated as having no consensus.

This describes the main shading model, 'shading according to a consensus'. The alternative model is called 'shading according to a master sequence'. In this case the user is prompted for a sequence of the alignment, and that sequence is then taken to be the 'consensus'. Only those residues that are identical or similar to the chosen sequence become shaded. Output obtained with this option tends to be less shaded and neglects similarities between the other (non-chosen) sequences. Starting in V2.7, this 'master sequence' can be hidden. Thus, it only influences the shading of the other sequences without being shown itself.

- Consensus display -

Starting with version 2.5, BOXSHADE offers the possibility to create an additional line holding a consensus symbol. This line can be obtained either by using the command line qualifier /CONS or interactively by answering the question 'create consensus?'. The way this consensus line is displayed can be modified by the command line parameter SYMBCONS=xyz, by editing the respective entry in the .PAR file, or interactively. Since the SYMBCONS syntax is not intuitive, here is a brief description. The SYMBCONS parameter consists of exactly three symbols:

+ the first one stands for 'normal' sequence residues that are not involved in any similar/identical relationship.
+ the second symbol represents positions that are similar in all sequences of the alignment. See the files BOX_PEP.SIM and BOX_DNA.SIM to see what residues are considered similar.
+ the third symbol represents positions that are identical in all sequences of the alignment.

A SYMBCONS parameter string " .*" (blank/point/asterisk) means: label all positions in the alignment with totally identical residues by an asterisk, all positions with all similar residues by a point, and do not mark the other positions. The letter 'B' can be used instead of the blank; this is necessary e.g. when using the command line option /SYMBCONS=B.* which gives the same result as the above example. The option /SYMBCONS= .* would result in unexpected behaviour because MSDOS squeezes blanks out of the command line.

Besides points, asterisks and other symbols, two characters have a special meaning when they appear in the SYMBCONS string: 'L' and 'U'. An 'L' means that a lowercase representation of the most abundant residue at that position is to be used instead of a fixed consensus symbol, while a 'U' means an uppercase representation of that residue. A possible application would be the SYMBCONS string " LU", where similar residues are represented by lowercase characters and identical ones by uppercase characters.
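
As a rough illustration of the threshold rule, the consensus-by-similarity fallback and the SYMBCONS symbols described above, here is a minimal Python sketch. It is not the program's PASCAL code. The function names are invented, the set of ignored gap characters is an assumption, "must agree" is read as "at least the required count", and mapping a consensus kind directly onto one of the three SYMBCONS slots is a simplification made for this example. The .sim check of the remaining residues is not covered.

```python
import math
from collections import Counter

GAP_CHARS = set("-. ")  # assumption: gap symbols are not counted as residues


def required_count(threshold, n_sequences):
    """Sequences that must agree; a fractional result rounds up (3.2 -> 4)."""
    # round() guards against floating-point noise just above a whole number.
    return math.ceil(round(threshold * n_sequences, 9))


def column_consensus(column, threshold, groups):
    """Return ('identical', residue), ('similar', group) or (None, None)."""
    need = required_count(threshold, len(column))
    counts = Counter(c.upper() for c in column if c not in GAP_CHARS)
    winners = [res for res, n in counts.items() if n >= need]
    if len(winners) == 1:
        return "identical", winners[0]
    if len(winners) > 1:
        return None, None  # competing candidates count as no consensus
    # No single residue qualifies: look for one group (.grp line) that does.
    hits = [g for g in groups if sum(counts[r] for r in g) >= need]
    if len(hits) == 1:
        return "similar", hits[0]
    return None, None


def consensus_symbol(kind, residue, symbcons=" .*"):
    """Pick the SYMBCONS symbol for one position of the consensus line."""
    slot = {None: 0, "similar": 1, "identical": 2}[kind]
    symbol = symbcons[slot]
    if symbol == "B":
        return " "              # 'B' stands in for a blank
    if symbol == "L":
        return residue.lower()  # lowercase form of the most abundant residue
    if symbol == "U":
        return residue.upper()  # uppercase form of the most abundant residue
    return symbol


groups = ["STA", "RK"]  # example .grp-style lines, chosen for illustration
print(required_count(0.8, 4))
print(column_consensus("SSSSTA", 0.5, groups))
print(column_consensus("RKRKRK", 0.8, groups))
```

With a threshold of 0.8 and four sequences, required_count returns 4, matching the "3.2 becomes 4" rule in the text. The column RKRKRK has no single residue reaching the threshold, so only the RK group yields a consensus, and it is of the 'similar' kind.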

--------------------------------------------------------------------------

SOME TIPS:

--------------------------------------------------------------------------

- availability of BOXSHADE -

The most recent official version of BOXSHADE (3.2) is available by FTP from

ftp.ebi.ac.uk

ftp.embl-heidelberg.de

ftp.isrec.sib.swiss

--------------------------------------------------------------------------

- shareware/PD programs useful in conjunction with BOXSHADE -

Multiple alignment files that are to be used by BOXSHADE can be created, amongst others, by the following PD/freeware programs:

+ PHYLIP by Joe Felsenstein, available by ftp from anthro.utah.edu
+ ESEE by Eric Cabot, available from the same sources as BOXSHADE (see above)
+ CLUSTAL by Des Higgins, ditto

For preview/conversion of POSTSCRIPT files, the program GHOSTSCRIPT from GNU software foundation is highly recommended. It is available from all major MSDOS ftp sites (e.g. SIMTEL or ftp.uni-koeln.de). There is also a version tested for use with BOXSHADE available at vax0.biomed.uni-koeln.de, although this might not be the most recent release.

For Mac users, there is MacGhostscript, also available from the main archives (info-mac, umich and their mirrors). A *very* good tool for putting a preview image into an EPSF file, often a prerequisite for incorporating it into a drawing package, is PS2EPS, by Peter Lerup. This can be found on info-mac.

For preview/conversion of HPGL files, the shareware program PRINTGL 1.18 by Cary Ravitz is highly recommended. It is available from many MSDOS ftp sites and from netserv@embl-heidelberg.de

- output on dot printers -

Since PRINTGL offers a broad choice of printer types and is a nice program, I recommend its use for printing BOXSHADE output on non-POSTSCRIPT printers. Use HPGL output with the options

    0F1N for normal residues
    2F1N for identical residues
    3F1N for similar residues
    2F4N for conserved residues
    8    for character size
    not rotated

(these are the standard parameters in BOX_PEP.PAR) to create an HPGL file (let's call it TEST.PLT). Now use PRINTGL either interactively by calling PMI or with a command line like:

    PRINTGL /Fx/S0340/Waaac/Ptest.plt

where test.plt is to be replaced by the name of the file to convert and the x in the expression /Fx is to be replaced by the letter of the printer you use. (See the PRINTGL documentation for further details.)

--------------------------------------------------------------------------

RESTRICTIONS:

--------------------------------------------------------------------------

+ The RTF output and PHYLIP input implementations are still experimental. Please tell me of your experiences with the program.
+ The current DOS version supports only 13 sequences with 2000 residues each. These parameters can easily be changed in the source code. If you cannot compile the sources because you lack a PASCAL compiler, contact the author for precompiled versions.

---------------------------------------------------------------------------

DISTRIBUTION POLICY

---------------------------------------------------------------------------

BOXSHADE is completely public-domain and may be passed around and modified without any notice to the author. If you have problems, suggestions or remarks, please contact either of us:

Kay Hofmann, PhD              Tel: +49 (221) 950 4814
Bioinformatics Group          FAX: +49 (221) 950 4848
MEMOREC Stoffel GmbH
Stoeckheimer Weg 1
D50829 Koeln/Germany
E-mail: Kay.Hofmann@memorec.com

Michael D. Baron
Institute for Animal Health
Ash Road
Pirbright
Surrey GU24 0NF
U.K.

E-mail: michael.baron@bbsrc.ac.uk