This list is by no means exhaustive. It contains primarily resources that I have found useful.
However, if there is a useful resource you think I have overlooked, please feel free to
e-mail me (melanie_renee_nelson at yahoo dot com). If the resource is an article, a
link is provided to the article's PubMed entry, or to the appropriate website if the article is not indexed by PubMed.
If the resource is a book, a link is provided to Amazon.com (this is for convenience only: I am not
signed up for their kickback program).
[ Biological DB Design | General DB Design |
Data Integration | Specific DBMS |
Design Process | SQL Tuning | Miscellaneous
Resources ]
- Designing databases to store biological information. M.R. Nelson, S.J. Reisinger, and
S.G. Henry. (2003) BioSilico, 1: 134-142.
This paper provides guidelines to help rookie designers of biological databases avoid some
common mistakes. BioSilico, a journal from Drug Discovery Today, is not yet available online.
If you e-mail me (melanie_renee_nelson at yahoo dot com), I can send you a reprint.
- Conceptual data modelling for bioinformatics. E. Bornberg-Bauer and N.W. Paton. (2002)
Briefings in Bioinformatics, 3: 166-180.
[ PubMed ]
An introduction to entity-relationship and UML modeling for bioinformatics.
- There are also many papers describing the development and/or use of specific biological databases.
Some of the most useful in terms of providing ideas for how to handle certain types of biological
information are:
- Representing and analysing molecular and cellular function using the computer.
J. van Helden, A. Naim, R. Mancuso, M. Eldridge, L. Wernisch, D. Gilbert, S.J. Wodak.
(2000) Biol. Chem., 381: 921-935.
[ PubMed ]
A good example of using generalization to build a flexible database that won't break as
biological knowledge grows. Specifically, the authors have good ideas about how to
represent interactions between biological molecules.
- A systematic approach to modeling, capturing, and disseminating proteomics experimental
data. C.F. Taylor, et al. (2003) Nature Biotech., 21: 247-254.
[ PubMed ]
A schema for handling proteomics data.
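The generalization idea mentioned above for the van Helden et al. paper can be sketched roughly as follows. This is only an illustration, not the schema from that paper: the table and column names (entity_type, bio_entity, interaction, interaction_participant) are hypothetical, and SQLite via Python's sqlite3 module is used purely for convenience. The point is that molecule types and participant roles are stored as rows rather than as separate tables, so a new kind of molecule does not force a schema change.

```python
import sqlite3

# Hypothetical generalized schema: entity types and roles are data, not tables.
SCHEMA = """
CREATE TABLE entity_type (
    type_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE bio_entity (
    entity_id INTEGER PRIMARY KEY,
    type_id   INTEGER NOT NULL REFERENCES entity_type(type_id),
    name      TEXT NOT NULL
);
CREATE TABLE interaction (
    interaction_id INTEGER PRIMARY KEY,
    kind           TEXT NOT NULL
);
CREATE TABLE interaction_participant (
    interaction_id INTEGER NOT NULL REFERENCES interaction(interaction_id),
    entity_id      INTEGER NOT NULL REFERENCES bio_entity(entity_id),
    role           TEXT NOT NULL,
    PRIMARY KEY (interaction_id, entity_id, role)
);
"""


def build_demo():
    """Create an in-memory database holding one example interaction."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO entity_type (type_id, name) VALUES (?, ?)",
        [(1, "protein"), (2, "small_molecule")],
    )
    conn.executemany(
        "INSERT INTO bio_entity (entity_id, type_id, name) VALUES (?, ?, ?)",
        [(1, 1, "hexokinase"), (2, 2, "glucose")],
    )
    conn.execute(
        "INSERT INTO interaction (interaction_id, kind) VALUES (1, 'catalysis')"
    )
    conn.executemany(
        "INSERT INTO interaction_participant (interaction_id, entity_id, role) "
        "VALUES (?, ?, ?)",
        [(1, 1, "enzyme"), (1, 2, "substrate")],
    )
    return conn


def participants(conn, interaction_id):
    """Return (entity name, entity type, role) for one interaction, by name."""
    rows = conn.execute(
        "SELECT e.name, t.name, p.role "
        "FROM interaction_participant p "
        "JOIN bio_entity e ON e.entity_id = p.entity_id "
        "JOIN entity_type t ON t.type_id = e.type_id "
        "WHERE p.interaction_id = ? "
        "ORDER BY e.name",
        (interaction_id,),
    )
    return rows.fetchall()


conn = build_demo()
# A new kind of molecule is a new row, not a new table.
conn.execute("INSERT INTO entity_type (type_id, name) VALUES (3, 'rna')")
print(participants(conn, 1))
```

The trade-off in a design like this is that some rules (for example, which roles make sense for which kind of interaction) are no longer enforced by the table structure and have to be checked some other way.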
- Database Modeling and Design. T.J. Teorey.
[ Amazon ]
An excellent reference for data modeling.
- An Introduction to Database Systems. C.J. Date.
[ Amazon ]
The classic. This is a textbook, and not necessarily the most entertaining way to learn about databases,
but it is a thorough book, and one you will find yourself referring to any time you want to know the "right"
answer to a database question.
- Handbook of Relational Database Design. C.C. Fleming and B. von Halle.
[ Amazon ]
This book is a bit dated in places, but is still a good introduction to the process of database design.
- SQL:1999 - Understanding Relational Language Components. J. Melton, A.R. Simon, J. Gray.
[ Amazon ]
Understanding the New SQL: A Complete Guide. J. Melton.
[ Amazon ]
These books provide good "vendor-neutral" references for SQL.
- UML Distilled: A Brief Guide to the Standard Object Modeling Language. M. Fowler.
[ Amazon ]
A good introduction to UML, for those who would like to do their data modeling using UML class diagrams.
- Oracle8 Design Using UML Object Modeling. P. Dorsey and J.R. Hudicka.
[ Amazon ]
This book is a great source of ideas about how to handle some common data modeling problems, as well as
for how to incorporate some object-oriented thinking into relational database design.
- Database Design for Smarties. R.J. Muller.
[ Amazon ]
Another good resource for people wanting to use UML to design their databases.
- Practical Issues in Database Management: A Reference for the Thinking Practitioner. F. Pascal.
[ Amazon ]
Database Debunkings website
Fabian Pascal is a bit of a lightning rod for controversy. His writings espouse the strictest adherence to
the correct theoretical principles of relational database design; some people argue that in the
"real world" there may come a time to compromise these principles. No matter what opinion you eventually
form about this debate, you should at least understand the theoretical principles and the problems you
may experience when you stray from them. This website and book are good places to start building that
understanding.
- Evolutionary Database Design, M. Fowler and P. Sadalage
Martin Fowler is a well-respected object-oriented programmer. He and his team have experimented with
a technique they call "evolutionary database design" for developing a database without intensive
upfront design. In general, I do not think this method will be successful in biological applications,
due to the complexity of the data and the "rules" that govern them. However, it's a method that
is worth reading about, if only to be prepared to rebut any team member who wants to skip the design
step and points to agile programming techniques for justification. Martin Fowler's website also
provides an excellent general overview of agile programming methods, which are growing in popularity
among programmers.
- Integrating Biological Databases. L.D. Stein. (2003) Nature Rev. Genetics, 4: 337-345.
[ PubMed ]
A discussion of the challenges in integrating data from the various public biological databases.
There is a definite slant towards the needs and perspectives of academic researchers.
- Practical Data Integration in Biopharmaceutical Research and Development. R. Carel.
PharmaGenomics, June, 2003: 22-35.
[ PDF ]
A good overview of some of the challenges in data integration, and also some of the different
approaches to data integration. This article focuses almost exclusively on the needs and perspectives
of researchers in the pharmaceutical and biotechnology industry.
- Writing Quality Requirements, K.E. Wiegers [ PDF ]
Describes what makes a good requirement. This is for software requirements, but the
same concepts are useful when thinking about database requirements.
Many people use performance issues as a justification for denormalizing their database. Denormalization
should never be the first thing you do to address performance issues. The first thing you should do is
tune your SQL statements.
- An Introduction to Oracle SQL Statement Tuning, B. Magee
This is an excellent resource for tuning statements running on an Oracle database.
- SQL Performance Tuning, P. Gulutzan and T. Pelzer.
[ Amazon ]
This book provides both theoretical justification and empirical observations about performance
tuning. It covers several popular database management systems.
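As a rough illustration of the advice at the top of this section (tune the statement before you denormalize), here is a minimal sketch. It uses SQLite through Python's sqlite3 module rather than Oracle, simply because it runs anywhere; the measurement table, its columns, and the index name are all made up for the example. It inspects the query plan for a slow lookup, adds an index, and inspects the plan again, without touching the table design.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a statement as one string.

    The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Hypothetical normalized table: one row per measurement of a gene.
conn.execute(
    "CREATE TABLE measurement ("
    " measurement_id INTEGER PRIMARY KEY,"
    " gene_id INTEGER NOT NULL,"
    " value REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO measurement (gene_id, value) VALUES (?, ?)",
    [(i % 100, i * 0.5) for i in range(1000)],
)

SQL = "SELECT value FROM measurement WHERE gene_id = ?"

# Step 1: look at the plan first. Without an index this is a full table scan.
before = query_plan(conn, SQL, (42,))

# Step 2: fix the access path. The schema stays normalized.
conn.execute("CREATE INDEX idx_measurement_gene ON measurement (gene_id)")
after = query_plan(conn, SQL, (42,))

print("before:", before)
print("after: ", after)
```

The exact wording of the plan text varies between SQLite versions, and other database management systems have their own tools for viewing plans, so treat the output only as a demonstration of the workflow.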
- Database Answers website
A good collection of information about relational databases, with an emphasis on their uses in
business applications.
- XML and Databases, R. Bourret
This document provides an excellent summary of how XML relates to databases, and provides
practical guidance on whether to store entire XML documents or parse the document into
database fields.
- Normalizing XML, W. Provost [ Part One | Part Two ]
An interesting article looking at whether relational normal forms can be applied to XML schema.
- Issues in Data Migration, G. Boyd. [ PDF ]
A white paper looking at issues to consider when migrating data.
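To make the choice raised in the XML and Databases entry above more concrete (store the entire XML document, or parse it into database fields), here is a small sketch that does both for the same input. It is my own illustration, not taken from that document: the XML layout, the identifiers P1 and P2, and the table names protein_doc and protein are all hypothetical, and SQLite via Python's sqlite3 module is used only for convenience.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical input documents.
DOCS = [
    '<protein id="P1"><name>hexokinase</name>'
    "<organism>Homo sapiens</organism></protein>",
    '<protein id="P2"><name>lysozyme</name>'
    "<organism>Gallus gallus</organism></protein>",
]

conn = sqlite3.connect(":memory:")
# Option 1: keep each document whole, as text.
conn.execute(
    "CREATE TABLE protein_doc (doc_id TEXT PRIMARY KEY, xml TEXT NOT NULL)"
)
# Option 2: parse each document into ordinary columns.
conn.execute(
    "CREATE TABLE protein ("
    " protein_id TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " organism TEXT NOT NULL)"
)

for doc in DOCS:
    root = ET.fromstring(doc)
    pid = root.get("id")
    conn.execute("INSERT INTO protein_doc VALUES (?, ?)", (pid, doc))
    conn.execute(
        "INSERT INTO protein VALUES (?, ?, ?)",
        (pid, root.findtext("name"), root.findtext("organism")),
    )


def names_from_fields(conn, organism):
    """Parsed storage: the database does the filtering."""
    rows = conn.execute(
        "SELECT name FROM protein WHERE organism = ? ORDER BY name",
        (organism,),
    )
    return [row[0] for row in rows]


def names_from_documents(conn, organism):
    """Whole-document storage: every document is re-parsed in the application."""
    names = []
    for (xml_text,) in conn.execute("SELECT xml FROM protein_doc"):
        root = ET.fromstring(xml_text)
        if root.findtext("organism") == organism:
            names.append(root.findtext("name"))
    return sorted(names)


print(names_from_fields(conn, "Homo sapiens"))
print(names_from_documents(conn, "Homo sapiens"))
```

Both functions return the same answer here. The difference is where the work happens: the parsed table can be filtered and indexed with plain SQL, while the whole-document table hands back exactly the text that was stored.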
Primary author: Melanie Nelson
Last edited: 1 February 2004