Software Name
|
Description
|
align_learn.pl
|
align_learn.pl
converts a multiple sequence alignment into a format
that can be readily analyzed using common machine learning algorithms. |
annotator.pl
|
Reads multiple
sequence records in FASTA format from a file and submits each to local BLAST.
The complete BLAST results are written to a file, and the best match is
sent as an Entrez query to NCBI. |
Batch PSORT
|
This program
sends protein sequences to a PSORT
server, parses the response, and writes the results to a text file. |
batch_bind_blast.pl
|
This script
reads multiple FASTA
sequences from a file and submits each to BIND BLAST. |
BLAST Hit Table Extender
|
This script uses
the identification number to retrieve a more detailed description of
the hit sequence from NCBI. |
blast_client3_2.pl
|
This script performs BLAST searches against NCBI's nr database. It prompts the user for a BLAST search type and an input file of FASTA-formatted sequences. An optional 'limit by Entrez query' value can be supplied to restrict the search. The script then submits each sequence to BLAST and retrieves the results. For each hit, the script retrieves a detailed title by performing a separate query of NCBI's databases. Each BLAST hit and its descriptive title are written to a single tab-delimited output file. |
blastn_client3_1.pl
|
This script reads one or more DNA sequences in FASTA format from a file and submits each to NCBI BLAST using the blastn program. |
blastx_client3_1.pl
|
This script reads one or more DNA sequences in FASTA format from a file and submits each to NCBI BLAST using the blastx program. |
Clickable Sequence Features
|
Clickable
Sequence Features is an object-oriented program that converts
GenBank, EMBL, FASTA,
or RAW sequence files into an HTML figure showing
the DNA sequence and translations described in the sequence record. |
Codon Usage
|
Codon Usage
accepts a DNA sequence and returns the number and frequency of each
codon type. |
compare_library.pl
|
This script
accepts two files (i and j) containing multiple DNA
sequences in FASTA format. Each sequence in file i is compared using
local BLAST (bl2seq)
to each sequence in file j, and an HTML table is
generated to display a summary of the findings. |
DNA Stats
|
DNA Stats
returns the number of occurrences of each residue in the sequence you
enter. |
EMBOSS - User Interface
|
This software
package generates interfaces for the EMBOSS
suite of programs. |
Extract FASTA Headers
|
Given a file containing multiple FASTA-formatted entries, this script outputs a file containing only the FASTA headers. |
evolving_peptide_search.pl
|
This script
reads multiple protein sequences (in FASTA format) from
a file and then searches each for a peptide sequence. The search is
repeated using increasingly
degenerate versions of the peptide until the maximum allowed number of
matches is
obtained. This script can be used to find peptides with a primary
sequence close
to a peptide of interest. |
feature_parse.pl
|
This script reads a genomic sequence in FASTA or RAW format from a file and writes out the features that are described in a feature position file. The extracted features are written in FASTA format to the specified output file. |
fetch_protein_v_2.pl
|
This script accepts a list of Swiss-Prot IDs or Swiss-Prot names. The sequence record corresponding to each ID is retrieved from ExPASy and written to a separate file in the output directory you specify. Records can be written in FASTA format or in Swiss-Prot format. |
fetch_swissprot_using_id.pl
|
This script
accepts a list of Swiss-Prot
IDs. The sequence and title
corresponding to each ID are retrieved from ExPASy and written to a
file in FASTA format. |
Filter DNA
|
Filter DNA
removes non-DNA characters from text. Use this program when you wish to
remove digits and blank spaces from a sequence to make it suitable for
other applications. |
Filter Protein
|
Filter Protein
removes non-protein characters from text. Use this program when you
wish to remove digits and blank spaces from a sequence to make it
suitable for other applications. |
FlexArray
|
FlexArray is a Microsoft Windows software package for statistical
analysis of microarray data. FlexArray combines ease of use
with a comprehensive set of statistical utilities. It offers
a wide variety of useful visualization options, a rich interactive
environment, full analysis history, a plug-in interface for
algorithms and plots, analysis protocol support, and more.
FlexArray currently supports Affymetrix expression GeneChip©
microarrays, and Illumina expression BeadChip© arrays.
FlexArray is free to academic researchers, and it was created
with funding from Genome Quebec. |
GenBank Feature Extractor
|
GenBank Feature
Extractor accepts a GenBank
file as input and reads the sequence feature information described in
the feature
table, according to the rules outlined in the GenBank release notes.
The program concatenates or highlights the relevant sequence segments
and returns each sequence feature in FASTA format. |
GenBank Trans Extractor
|
GenBank Trans
Extractor accepts a GenBank file as input and returns each of the
protein translations described in the file in FASTA format. |
genbank_to_cgview.pl
|
genbank_to_cgview.pl converts a GenBank or EMBL sequence record into an XML document for the CGView genome visualization software (http://wishart.biology.ualberta.ca/cgview/index.html). |
generic_ncbi_data_fetcher.pl
|
This script uses NCBI's Entrez Programming Utilities to perform searches of NCBI databases. This script can return either the complete database records, or the IDs of the records (recommended). It is up to you to know how to handle the IDs and records. The results are written to a single output file. For additional information on NCBI's Entrez Programming Utilities see: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/esearch_help.html. |
genome_search.pl
|
Genome Search
reads a genomic sequence in FASTA format from a file and
searches for the patterns you specify using regular expressions. |
genome_search_parse_results.pl
|
Reads the
results from genome_search.pl (see the description for
genome_search.pl)
and generates a summary for each match. |
go_fish_source.pl
|
This Perl script
assigns Gene Ontology
(GO) numbers and descriptions to BLAST results generated by annotator.pl. |
Hydrophobicity Profiler
|
This Perl script
reads a FASTA-formatted protein sequence file and
returns the hydrophobicity profile for the input sequence according
to the user-specified window size and hydrophobicity scale. |
local_blast_client.pl
|
This script performs BLAST searches against a local BLAST database. It prompts the user for a BLAST search type and an input file of FASTA-formatted sequences. The script then submits each sequence to BLAST and retrieves the results. For each hit, the script retrieves a detailed title by performing a separate query of NCBI's databases. Each BLAST hit and its descriptive title are written to a single tab-delimited output file. |
microarray_randomizer.pl
|
This script
accepts a file consisting of tab-delimited microarray data.
Numerical values, except for those in the first column, are replaced
with pseudo-random values greater than or equal to the lower limit you
specify, and less than the upper limit you specify. |
Multiple Align Show
|
Multiple Align
Show accepts a group of aligned sequences (in FASTA or GDE format)
and formats the alignment to your specifications. |
Multi Rev Trans
|
Multi Rev Trans
accepts a protein alignment and uses a codon usage table to generate a
graph that can be used to find regions of minimal degeneracy at the
nucleotide level. |
new_psort.pl
|
new_psort.pl
sends sequences to a PSORT server
and parses and saves the results. |
ORF Finder
|
ORF Finder
searches for open reading frames (ORFs) in the DNA sequence you enter.
The program returns the range of each ORF, along with its protein
translation. |
Pearson Correlation Coefficient Parser
|
This Perl script takes a single Excel file listing multiple genes and their intensities and calculates the Pearson correlation coefficient. If the coefficient is above 0.6 or below -0.6, the results are written to two Excel files, Detail_Over.xls and Detail_Under.xls. |
Perl BLAST Client
|
Reads a text
file containing multiple sequences in FASTA format and
submits each sequence to NCBI's
BLAST server using QBLAST's URL
API. |
pI/MW batch analysis tool
|
This Perl program creates a .txt file
containing the sequence name,
length, predicted molecular weight, and predicted isoelectric point of
the protein sequences it receives. |
Programming in Perl - Part 1
|
This collection
of simple programs is intended to introduce the Perl
programming language to students with little or no programming
experience (part one of two).
|
Programming in Perl - Part 2
|
This collection
of simple programs is intended to introduce the Perl
programming language to students with little or no programming
experience (part two of two). |
Protein Molecular Weight
|
Protein
Molecular Weight accepts a protein sequence and calculates the
molecular weight. You can append copies of commonly used epitopes and
fusion proteins using the supplied list. |
Protein Stats
|
Protein Stats
returns the number of occurrences of each residue in the sequence you
enter. Percentage totals are also given for each residue, and for
certain groups of residues. |
Random DNA Sequence
|
Random DNA
Sequence generates a random sequence of the length you specify. Random
sequences can be used to evaluate the significance of sequence analysis
results. |
Random Protein Sequence
|
Random Protein
Sequence generates a random sequence of the length you specify. Random
sequences can be used to evaluate the significance of sequence analysis
results. |
random_seq_sample.pl
|
This script
accepts a file consisting of multiple FASTA formatted sequence records.
It then randomly selects sequences from the file, without replacement. |
range_extract.pl
|
Reads a genomic
sequence in FASTA or RAW format from a file and writes
out the range of bases between the supplied start and stop positions to
a file. |
Reformat PDB
|
A script to reformat unusual PDB files into a more standard PDB format. This script (1) re-orders the atoms within each residue into a 'standard' order, (2) renames atoms to a 'standard' format, e.g. HD23 becomes 3HD2, (3) renames certain residues, e.g. 'HSD' or 'HID' become 'HIS', (4) preserves only one location for each atom, for atoms that have alternate location codes. |
remote_blast_client.pl
|
This script performs BLAST searches against NCBI's sequence databases. It prompts the user for a BLAST search type and an input file of FASTA-formatted sequences. An optional 'limit by Entrez query' value can be supplied to restrict the search. The script then submits each sequence to BLAST and retrieves the results. For each hit, the script retrieves a detailed title by performing a separate query of NCBI's databases. Each BLAST hit and its descriptive title are written to a single tab-delimited output file. |
remove_duplicate_seqs.pl
|
Reads multiple sequence records in FASTA format from a file. If two or more sequences match, only the first record in the matching group is written to the output file. |
remove_duplicates.pl
|
Reads multiple sequence records in FASTA format from a file and removes duplicate sequence records (based on sequence title). |
remove_near_duplicates.pl
|
This script reads multiple sequence records in FASTA format from a file. If two or more sequences match, only the first record in the matching group is written to the output file. The names of the removed records are written to a log file. |
remove_x.pl
|
Reads multiple sequence records in FASTA format from a file and removes X's and x's from the sequences. |
Restriction Summary
|
Restriction
Summary accepts a DNA sequence and returns the number and positions of
restriction endonuclease cut sites. |
Retrieve_Entrez_Gene_Info.pl
|
This script uses NCBI's Entrez Programming Utilities URL API to submit batch requests to NCBI Entrez. It retrieves gene information for an organism from NCBI's Entrez Gene database, including Gene ID, gene name, gene description, gene synonyms, location, HGNC ID, HPRD ID, MIM ID, phenotype [MIM ID], KEGG pathways, conserved domains, and UniGene ID.
|
retrieve_seq.pl
|
This script uses
NCBI's Entrez Programming Utilities URL API to
submit batch requests to NCBI Entrez.
It can be used, for example, to
download all the sequences in an NCBI database that were obtained from
a particular species. |
retrieve_seq_v2.pl
|
This script uses NCBI's Entrez Programming Utilities to perform batch requests to NCBI Entrez. It can be used, for example, to download all the sequences in an NCBI database that were obtained from a particular species. This version has been customized for retrieval of 16S rRNA sequences. |
Reverse Complement
|
Reverse
Complement converts a DNA sequence into its reverse, complement, or
reverse-complement counterpart. |
seqsee
|
SEQSEE
is a comprehensive protein sequence analysis package commercialized by BioTools
Inc. |
Sequence Extractor
|
Sequence
Extractor accepts a DNA sequence along with a set of primer
sequences and returns a textual map showing the annealing positions of
the primers, restriction cut sites, and protein translations. |
Sequence Manipulation Suite
|
The Sequence Manipulation Suite
is a collection of web-based programs for analyzing and formatting DNA
and protein sequences (version 1).
|
Sequence Manipulation Suite 2
|
The Sequence Manipulation Suite
version 2 is much faster than the
previous version and contains several new programs and enhancements. It
can be used to perform much of the simple sequence formatting and
analysis done in molecular biology labs, and as a teaching aid when
introducing students to DNA and protein sequences. |
Shuffle DNA
|
Shuffle DNA
randomly shuffles a DNA sequence. Shuffled sequences can be used to
evaluate the significance of sequence analysis results, particularly
when sequence composition is an important consideration. |
Shuffle Protein
|
Shuffle Protein
randomly shuffles a protein sequence. Shuffled sequences can be used to
evaluate the significance of sequence analysis results, particularly
when sequence composition is an important consideration. |
split_fasta.pl
|
This script
accepts a file consisting of multiple FASTA formatted
sequence records. It splits the file into multiple new files, each
consisting of a subset of the original records. |
summary_adder_2.pl
|
This script obtains summary information from NCBI and adds it to the output of earlier versions of the blast_client.pl scripts (versions 1.2 and earlier). |
three_frames.pl
|
This script converts a FASTA-formatted DNA sequence file into a new file containing all six protein translations of each supplied DNA sequence. |
Translate
|
Translate
accepts a DNA sequence and converts it into a protein using the reading
frame you specify. |
XALIGN (version 5)
|
XALIGN
is a graphical X-windows program for multiple sequence alignment based
on sequence homology and secondary structure (version 5, Linux binary).
|
XALIGN (version 6)
|
XALIGN is a
graphical X-windows program for multiple sequence alignment based on
sequence homology and secondary structure (version 6, source code).
|
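
Several of the simpler tools listed above (for example Reverse Complement and Codon Usage) perform operations that are easy to sketch in a few lines. The following Python snippet is only an illustration of those two operations, not the code used by the listed programs. The function names `reverse_complement` and `codon_usage` are hypothetical, and the sketch assumes unambiguous DNA input (A, C, G, T) read in the first reading frame.

```python
from collections import Counter

# Complement table for unambiguous DNA bases (assumption: no IUPAC
# ambiguity codes; any other character is passed through unchanged).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (hypothetical helper)."""
    return dna.translate(_COMPLEMENT)[::-1]


def codon_usage(dna):
    """Return {codon: (count, frequency)} for reading frame 1.

    Trailing bases that do not fill a complete codon are ignored.
    """
    seq = dna.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = len(codons)
    return {codon: (n, n / total) for codon, n in counts.items()}


if __name__ == "__main__":
    print(reverse_complement("ATGC"))   # GCAT
    print(codon_usage("ATGATGTAA"))
```

A real tool would also need to handle FASTA headers, ambiguity codes, and alternative reading frames, which this sketch deliberately leaves out.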