AffyDB - An interface for inserting Affymetrix and LICR annotation tables and UCSC exon maps into an entity-relationship MySQL database, and for querying that data.
# INSERT TABLES
# AffyDB script that creates a new set of tables from an Affymetrix annot_csv file,
# creates LICR information tables from a LICR file, builds indexes to speed up queries,
# generates a report, and disconnects from MySQL.
use AffyDB;
my $affytable = AffyDB -> new (
mysql => 'affy:localhost',
user => 'foo',
password => 'bar',
affyfile => 'Mouse430_2_annot.csv',
licrfile => 'Mouse430_2.RefSeq',
chip => 'Mouse430_2',
index => "1",
summary => "1"
);
$affytable -> affy_disconnect();
# ADD EXON MAP
use AffyDB;
my $affytable = AffyDB -> new (
mysql => 'affy:localhost',
user => 'foo',
password => 'bar',
refseqfile => 'RefSeqExons.txt',
refseqcode => 'HumanRefSeq1',
chip => 'test'
);
$affytable -> affy_disconnect();
# RETRIEVE INFORMATION
# Get methods used to retrieve information from MySQL tables
# Retrieve probeset information:
use AffyDB;
my $affydb = AffyDB -> new (
mysql => 'affy:localhost',
user => 'foo',
password => 'bar',
);
# get single probeset INFO
my $probe = '1415670_at';
my @probe = $affydb -> get_probeset($probe);
print "Probe is @probe\n";
# get all probesets of a chip
my $chip = 'Mouse Genome 430 2.0 Array';
my @probes = $affydb -> get_all_probeset($chip);
print join ("\n" , @probes);
$affydb -> affy_disconnect();
This Perl library uses Perl 5 objects to make it easy to create and query an improved MySQL version of the Affymetrix annotation tables. The package defines AffyDB objects, attributes and arguments. Using an AffyDB object's methods, you can insert new tables and retrieve data. Used together with the Probeset.pm analysis module, it can generate original data about the position of single probes on the genome.
AffyDB.pm provides a simple object-oriented interface to MySQL tables.
The current version of AffyDB.pm is available at
http://bio.ifom-ieo-campus.it/splicy/src/AffyDB.pm
my $affytable = AffyDB -> new (
mysql => 'affy:localhost',
user => 'foo',
password => 'bar',
affyfile => 'Mouse430_2_annot.csv',
chip => 'Mouse430_2',
summary => "1",
index => "1"
);
If you pass an affyfile to the module (CSV format: comma-separated values), the file is inserted into the database. The summary option requests an extensive summary of the insertion and of the duplicates avoided. The index option tells AffyDB.pm to build indexes on the tables, which speeds up queries.
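To illustrate the comma-separated input format, here is a minimal sketch of how one quoted annotation line splits into fields. It does not show how AffyDB.pm reads the file internally; the sample line is a hypothetical, shortened row (real annotation files have many more columns), and the core module Text::ParseWords is used here only for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical, shortened sample row (not taken from a real annotation file).
my $line = '"1415670_at","Mouse Genome 430 2.0 Array","Mus musculus",'
         . '"coatomer protein complex, subunit gamma 1"';

# Split on commas that are outside quotes; the 0 asks parse_line to
# strip the surrounding quotes from each field.
my @fields = parse_line(',', 0, $line);

print scalar(@fields), " fields\n";
print "$_\n" for @fields;
```

The comma inside the last quoted field is kept as part of that field, which is why a plain split on ',' would not be enough for this kind of file.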
my $licrtable = AffyDB -> new (
mysql => 'affy:localhost',
user => 'foo',
password => 'bar',
licrfile => 'HG-U133A.RefSeq',
chip => 'HG-U133A',
summary => "1",
index => "1"
);
As with the Affymetrix annotation tables, if you pass a LICR file as an argument (SSV format: semicolon-separated values), the file is inserted into the database.
my $affydb = AffyDB -> new (
mysql => 'affy:localhost',
user => 'foo',
password => 'bar',
chip => 'HG_U133A'
);
This creates a new connection without inserting a file (for retrieval queries on chip code HG_U133A). Note: MySQL does not accept a minus sign (-) in table names, so the getChipName method converts (-) to (_).
my $affydb = AffyDB -> new (
mysql => 'affy:localhost',
chip => 'test',
index => '1'
);
In this mode, MySQL indexes are created to speed up queries:
EXAMPLE: table => 'MOE430A_annot.csv', chip => 'MOE430A'
Because MySQL does not accept '-' in table names, the module substitutes '_' (underscore) for '-', so chip names like HG-U133A become HG_U133A.
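The substitution described above can be sketched as follows. The helper name chip_to_table_name is hypothetical and is not part of AffyDB.pm; it only illustrates the '-' to '_' conversion.

```perl
use strict;
use warnings;

# Hypothetical helper (not an AffyDB.pm method): make a chip name safe
# to use in a MySQL table name by replacing every '-' with '_'.
sub chip_to_table_name {
    my ($chip) = @_;
    (my $table = $chip) =~ tr/-/_/;   # copy first, then translate the copy
    return $table;
}

print chip_to_table_name('HG-U133A'), "\n";    # prints HG_U133A
print chip_to_table_name('Mouse430_2'), "\n";  # no '-' present, so unchanged
```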
Example:
my $query = <STDIN>;
chomp $query;
my @result = $affydb -> freequery($query);
print join ("\n", @result);
Example:
my $probe = '1415670_at';
my @probe = $affydb -> get_probeset($probe);
print "Probe is @probe\n";
Example:
my $public_id = 'NM_013477';
my @probes = $affydb -> get_matching_probeset ($public_id);
print join ("\n", @probes);
Example:
my $chip = 'Mouse Genome 430 2.0 Array';
my @probes = $affydb -> get_all_probeset($chip);
print join ("\n" , @probes);
$info[0] = public_id
$info[1] = seq_type
$info[2] = seq_source
$info[3] = target_des
$info[4] = arch_unigene
$info[5] = trans_id
$info[6] = description
$info[7] = cluster
$info[8] = assignments
$info[9] = notes
Example:
my @info = $affydb -> get_probeset_design ($probe);
print join ("\n",@info);
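Because the returned list is positional, it can be convenient to label the values by field name. The sketch below assumes the ten-element layout listed above; the helper label_design and the placeholder values are hypothetical and stand in for a real get_probeset_design result.

```perl
use strict;
use warnings;

# Field names in the order documented for the design information list.
my @design_fields = qw(public_id seq_type seq_source target_des arch_unigene
                       trans_id description cluster assignments notes);

# Hypothetical helper: pair each positional value with its field name.
sub label_design {
    my @info = @_;
    my %design;
    @design{@design_fields} = @info;   # hash slice assignment
    return %design;
}

# Placeholder values standing in for a real query result.
my @info   = ('NM_013477', map { "value_$_" } 1 .. 9);
my %design = label_design(@info);

print "public_id: $design{public_id}\n";
print "notes: $design{notes}\n";
```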
$info[0] = chipcode
$info[1] = genechip_name
$info[2] = organism
$info[3] = annotation_date
Example:
my @info = $affydb -> get_chip ($chip);
print "GENECHIP info are:\n";
print join ("\n",@info);
Format String:
PROBE_ID: SET_ID | X:N | Y:M | ACGTGCGTGTGTGTACGCGCGAA
You will retrieve an array containing a list of rows in this format.
Format HTML:
<tr><td>(X,Y) </td><td>ACGCGCGTGCAGCAGCGCAGCATGACGA</td></tr>
Format String:
PROBE_ID: SET_ID | X:N | Y:M | ACGTGCGTGTGTGTACGCGCGAA
You will retrieve an array containing a list of rows in this format.
'_locations' => [
[
'1007_s_at',
'1',
'NM_001954',
'[3678..3702]3840',
'(+)'
],
]
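As a rough sketch of working with this structure, the loop below walks an array of location rows and splits the position string into its numeric parts. The variable names and the interpretation of each column are assumptions made for illustration; in particular, the meaning of the number after the closing bracket is not stated here, so it is simply kept as-is.

```perl
use strict;
use warnings;

# Hypothetical helper: split a position such as '[3678..3702]3840' into
# start, end, and the trailing number (whose meaning is not documented here).
sub parse_position {
    my ($position) = @_;
    return unless $position =~ /^\[(\d+)\.\.(\d+)\](\d+)$/;
    return ($1, $2, $3);
}

# Sample row copied from the structure shown above.
my @locations = (
    [ '1007_s_at', '1', 'NM_001954', '[3678..3702]3840', '(+)' ],
);

for my $loc (@locations) {
    # Column names are assumed for illustration only.
    my ($probeset, $number, $refseq, $position, $strand) = @$loc;
    my ($start, $end, $trailing) = parse_position($position) or next;
    my $length = $end - $start + 1;    # inclusive range
    print "${probeset} on ${refseq} ${strand}: ${start}-${end} (${length} bases)\n";
}
```

For the sample row, the inclusive range 3678..3702 spans 25 bases.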
$info[0] = probe_id
$info[1] = licr_id
$info[2] = x
$info[3] = y
$info[4] = oligo
$info[5] = set_id
$info[6] = position
$info[7] = strand
CSS:
table class: licr
td classes: head and value
FORMAT STRING:
GENOME VERSION: @$row[1] ALIGNMENTS: @$row[2]
FORMAT STRING:
GENOME VERSION: @$row[1] ALIGNMENT: @$row[2]
FORMAT STRING:
PUBLIC ID: @$row[0] GENE_SYMBOL: @$row[1] GENE_TITLE: @$row[2] CHR_LOCATION: @$row[3] UNIGENE: @$row[4] ENSEMBL: @$row[5] LOCUSLINK: @$row[6] SWISSPROT: @$row[7] EC: @$row[8] OMIM: @$row[9] REFSEQ_PROT: @$row[10] REFSEQ_TRAN: @$row[11] FLYBASE: @$row[12] AGI: @$row[13] WORMBASE: @$row[14] MGI: @$row[15] RGD: @$row[16] SGD: @$row[17]
FORMAT STRING:
PUBLIC ID: @$row[0] GENE_SYMBOL: @$row[1] GENE_TITLE @$row[2] CHR_LOCATION: @$row[3] UNIGENE: @$row[4] UNIGENE_TYPE: @$row[5] ENSEMBL: @$row[6] LOCUSLINK: @$row[7] SWISSPROT: @$row[8] EC: @$row[9] OMIM: @$row[10] REFSEQ_PROT: @$row[11] REFSEQ_TRAN: @$row[12] FLYBASE: @$row[13] AGI: @$row[14] WORMBASE: @$row[15] MGI: @$row[16] RGD: @$row[17] SGD: @$row[18]
FORMAT STRING:
PUBLIC ID: @$row[0]
GO BIO: @$row[1]
GO CELL: @$row[2]
GO MOL: @$row[3]
PATHWAY: @$row[4]
PROT FAM: @$row[5]
PROT DOM: @$row[6]
INTERPRO: @$row[7]
MEMBRANE: @$row[8]
QTL: @$row[9]
FORMAT STRING:
PUBLIC ID: @$row[0]
GO BIO: @$row[1]
GO CELL: @$row[2]
GO MOL: @$row[3]
PATHWAY: @$row[4]
PROT FAM: @$row[5]
PROT DOM: @$row[6]
INTERPRO: @$row[7]
MEMBRANE: @$row[8]
QTL: @$row[9]
my @headers = $affydb -> get_headers();
print join ("\n", @headers);
Please report any bugs!
AffyDB.pm
Probeset.pm
Davide Rambaldi, IFOM-FIRC www.ifom-firc.it e-mail: filter-drambald@ifom-ieo-campus.it
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.