Kim Henrick
MSDsite: behind the scene: The technology used in database searching and retrieval for the analysis and viewing of bound ligands and active sites
Echeminfo

Adel Golovin, Dimitris Dimitropoulos, Tom Oldfield, and Kim Henrick, EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Contact: Kim Henrick

The three-dimensional environments of ligand binding sites have been derived by parsing PDB entries and loading them into a relational database. We will introduce the web-based query system MSDsite (http://www.ebi.ac.uk/msd-srv/msdsite) and demonstrate the technologies behind it. A graphical query interface facilitates non-trivial textual queries: search attributes are specified in dialog boxes and combined into complex queries. The interface was built with the biological context in mind, and ligand searching requires specific tasks that cannot be resolved using simple SQL relational operators such as join or merge. To generate queries that execute quickly in Oracle, a web application server has been developed containing Java classes that compose complex SQL queries and perform calculations on the dataset. In addition, these classes design the SQL to enforce the correct use of indexes and to apply query hints. The SQL queries are also designed to use the relational algebra operation 'INTERSECT', which can be executed in parallel within the Oracle RDBMS without nesting and is faster than self-joins in those cases where the result set is many times smaller than the table queried. The approach is based on a star-architecture query in which the ligand is central and the interactions with the environment residues fan out from it. This design is used because it yields an algorithm of order N in the number of environment residues and is therefore scalable to complex active sites. The levels of optimisation developed will be described, in which the table hierarchy is reflected in the query design. We will also describe how MSDsite performs pattern searching and short sequence alignment using SQL queries, where the Oracle hint mechanism ('LEADING' and 'USE_NL') is used to implicitly force access to nested tables by the primary key.
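The INTERSECT-based star query described above can be illustrated with a minimal sketch. It is not the MSDsite implementation: the table `ligand_contact`, its columns, the sample rows, and the function names are all hypothetical, and SQLite is used in place of Oracle purely so that the example runs on its own. The sketch composes one SELECT branch per required environment residue and joins the branches with INTERSECT, so the query text grows linearly with the number of residues.

```python
import sqlite3


def build_site_query(residues):
    """Compose one INTERSECT branch per required environment residue.

    Hypothetical schema: ligand_contact(ligand_id, residue_name).
    Returns the SQL text and its bind parameters.
    """
    if not residues:
        raise ValueError("at least one environment residue is required")
    # Each branch selects the ligands that contact one residue type.
    branch = "SELECT ligand_id FROM ligand_contact WHERE residue_name = ?"
    sql = "\nINTERSECT\n".join([branch] * len(residues))
    return sql, list(residues)


def find_ligands(conn, residues):
    """Return the sorted ids of ligands that contact every listed residue."""
    sql, params = build_site_query(residues)
    return sorted(row[0] for row in conn.execute(sql, params))


def demo_connection():
    """Build a small in-memory database with invented contact data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ligand_contact ("
        " ligand_id INTEGER,"
        " residue_name TEXT,"
        " PRIMARY KEY (ligand_id, residue_name))"
    )
    rows = [
        (1, "HIS"), (1, "ASP"), (1, "SER"),
        (2, "HIS"), (2, "GLY"),
        (3, "HIS"), (3, "ASP"),
    ]
    conn.executemany("INSERT INTO ligand_contact VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    # Ligands whose environment contains both HIS and ASP.
    print(find_ligands(conn, ["HIS", "ASP"]))
```

Each additional residue adds one branch rather than one more self-join on the contact table, which is the property the star design relies on for scaling with the size of the active site. In an Oracle setting the same composed text could also carry optimizer hints; that part is omitted here because SQLite has no equivalent.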

ACKNOWLEDGEMENTS The project is funded by the European Commission as TEMBLOR, contract no. QLRI-CT-2001-00015, under the RTD programme "Quality of Life and Management of Living Resources".