DBWISE
(Author: Elena Kochetkova)

PROGRAM DESCRIPTION

This Python-based program is a database "aid" for choosing a group of chimeras according to characteristics they have in common, in terms of both amino acid sequence and phenotype. It performs two types of searches: a search by amino acid position and ID (e.g. from position 20 to position 150) and a search by phenotype (e.g. presence of interaction with AvrPto at 22 °C). The sequences or sequence fragments that satisfy the search criteria are written to an output file with the .ID extension in the first case and the .range extension in the second.

The program can be used to automatically generate the input alignment file for statistical analysis. To do this, supply the alignment as the input file and, in a second file, the list of clones to be included in the statistical analysis. DbWise is also useful for finding sequences that do not follow a prediction made by statistical analysis. For example, it can detect sequences that have a lysine at amino acid 233 but still bind AvrPto.

The alignment file used as input can be in either plain-text or tab-delimited format. The phenotype file should be in tab-delimited format, usually with “1” if the phenotype is positive and “0” if the phenotype is negative. An optional third file listing the clones to select from the input file can also be supplied; this is especially useful for generating input files for statistical analysis.
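As an illustration of the tab-delimited phenotype format, the following is a minimal sketch of how such a file could be read in Python. It is not taken from dbwise.py: the function name read_phenotypes is hypothetical, and the layout (clone name in the first column, phenotype values in the remaining columns) is an assumption.

```python
import csv
import io


def read_phenotypes(handle):
    """Map each clone name to its list of phenotype values (kept as strings).

    Assumed layout: column 1 is the clone name, later columns are phenotypes.
    """
    table = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        table[row[0].strip()] = [cell.strip() for cell in row[1:]]
    return table


# Made-up example data standing in for a real phenotype file.
sample = "cloneA\t1\t0\ncloneB\t0\t1\n"
phenotypes = read_phenotypes(io.StringIO(sample))
```

With a real file, the same function could be called on an open file handle instead of the in-memory sample.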

USING DBWISE

Type "python dbwise.py" at the bash prompt to run DbWise. The program prompts the user to enter the name of the alignment file. The first step of selection is by amino acid position. The “Search by amino acid ID” option selects chimeras based on specific residues in their amino acid sequence; for example, typing “76Q and 233K” at this option generates an output file containing only the sequences that have a glutamine at position 76 and a lysine at position 233. The resulting group of sequences is written to the output file with the .ID extension.
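The logic of a query such as “76Q and 233K” can be sketched as follows. This is an illustration only, not the code in dbwise.py: the function names are hypothetical, positions are assumed to be counted from 1 along the aligned sequence, and the alignment is assumed to be held as a dictionary of clone name to sequence.

```python
import re


def parse_id_query(query):
    """Turn a query like '76Q and 233K' into [(76, 'Q'), (233, 'K')]."""
    terms = []
    for token in re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE):
        match = re.fullmatch(r"(\d+)([A-Za-z\-])", token.strip())
        if match is None:
            raise ValueError(f"unrecognised term: {token!r}")
        terms.append((int(match.group(1)), match.group(2).upper()))
    return terms


def select_by_id(alignment, query):
    """Keep only sequences that carry every requested residue (1-based positions)."""
    terms = parse_id_query(query)
    return {
        name: seq
        for name, seq in alignment.items()
        if all(1 <= pos <= len(seq) and seq[pos - 1].upper() == aa
               for pos, aa in terms)
    }


# Short made-up sequences so the example stays readable.
alignment = {"cloneA": "MKQ", "cloneB": "MKL", "cloneC": "AKQ"}
hits = select_by_id(alignment, "1M and 3Q")
```

Here only cloneA has both an M at position 1 and a Q at position 3, so it is the only entry kept.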

The second step of selection is by chimera name. The desired clones can be specified either by typing their names manually or by typing the name of a file that lists all the clones to be selected. In both cases, the names must be written exactly as they appear in the alignment and phenotype files.
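Because the names must match those in the alignment, a selection by name amounts to a lookup that can also report names it could not find. The sketch below assumes the alignment is a dictionary of clone name to sequence; select_by_name is a hypothetical helper, not a function of dbwise.py.

```python
def select_by_name(alignment, names):
    """Return (selected sequences, names not found in the alignment)."""
    wanted = [name.strip() for name in names if name.strip()]
    selected = {name: alignment[name] for name in wanted if name in alignment}
    missing = [name for name in wanted if name not in alignment]
    return selected, missing


# Made-up data; in practice the names could come from a list file, one per line.
alignment = {"cloneA": "MKQ", "cloneB": "MKL"}
selected, missing = select_by_name(alignment, ["cloneA", "clonea", ""])
```

The lookup is case-sensitive in this sketch, so "clonea" is reported as missing rather than matched to "cloneA".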

The third step of selection is by amino acid range. This selects a subset of positions in the alignment, so the output sequences can be shorter than full length. This is useful for viewing polymorphic amino acids next to each other to detect linkage drag effects.
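Selecting a range corresponds to slicing every aligned sequence at the same positions. A minimal sketch, assuming 1-based inclusive positions and a dictionary of clone name to sequence (select_range is a hypothetical name, not part of dbwise.py):

```python
def select_range(alignment, start, end):
    """Cut every sequence to positions start..end (1-based, inclusive)."""
    if not 1 <= start <= end:
        raise ValueError("expected 1 <= start <= end")
    return {name: seq[start - 1:end] for name, seq in alignment.items()}


# Made-up sequences for illustration.
alignment = {"cloneA": "MKQLT", "cloneB": "MRQLS"}
fragments = select_range(alignment, 2, 4)
```

Because all sequences come from one alignment, the fragments stay aligned with each other after slicing.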

The fourth step of selection is by phenotype. Here, the program prompts the user to enter the name of the file with the phenotypes corresponding to the chimeras in the alignment file. This option is useful for selecting clones that show a specific phenotype. In the query, each term gives a phenotype value followed by a column number. For example, typing “0 2 and 1 3” generates an output file that includes only sequences that do not interact with AvrPto but do interact with AvrPtoB (in this example, column 2 of the phenotype file is interaction with AvrPto and column 3 is interaction with AvrPtoB). The resulting sequences are written to the output file with the .range extension.
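The phenotype query can be read as a list of (value, column) pairs that must all hold for a clone to be kept. The sketch below illustrates this reading; it is not the code in dbwise.py. The function names are hypothetical, and columns are assumed to be numbered from 1 with the clone name in column 1.

```python
import re


def parse_phenotype_query(query):
    """Turn '0 2 and 1 3' into [('0', 2), ('1', 3)] as (value, column) pairs."""
    terms = []
    for token in re.split(r"\s+and\s+", query.strip(), flags=re.IGNORECASE):
        value, column = token.split()
        terms.append((value, int(column)))
    return terms


def select_by_phenotype(rows, query):
    """Return names (column 1) of rows that satisfy every term of the query."""
    terms = parse_phenotype_query(query)
    return [
        row[0]
        for row in rows
        if all(1 <= col <= len(row) and row[col - 1].strip() == value
               for value, col in terms)
    ]


# Made-up rows: clone name, AvrPto interaction, AvrPtoB interaction.
rows = [
    ["cloneA", "0", "1"],
    ["cloneB", "1", "1"],
    ["cloneC", "0", "0"],
]
matches = select_by_phenotype(rows, "0 2 and 1 3")
```

Only cloneA has 0 in column 2 and 1 in column 3, so it is the only name returned.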

DOWNLOAD

Program: dbwise.py

Manual:  dbwise_manual.txt

Sample input files:

    tabproteinall72403 - contains tab-delimited protein sequence for each clone

    list.txt - contains clone names

    Phenotypesclean.txt - contains phenotypes