news / tech talk

Database Needs

by Lee LeClair
04/20/2007
As seen in Inside Tucson Business

Businesses that operate with data of any complexity use databases to store their data. Databases form the heart of information processing for virtually all applications, web-based or otherwise. Given that this is the case, one might think that application programmers are very familiar with the concept of databases and their proper design. That is typically not the case. Part of the reason is that programming and database concepts are taught as separate subjects at most universities and often students concentrate on programming and view databases simply as crude data repositories.

This is unfortunate as is the fact that in many information businesses, there is a clear distinction between “programmers” and “database guys”. Since programming deals with data manipulation and databases store data, these two disciplines are fundamentally linked. Yet without properly combining the two, it is easy to create a data nightmare that is desperately difficult to fix.

Data integrity is paramount. To that end, concentrate on good data design for your purposes. You may have heard terms like On-line Transaction Processing (OLTP), star schemas, data cubes, and data warehouses. These are all different aspects of data that are dependent on what you need. OLTP data is what most people use in their businesses. It maintains data with the greatest emphasis on data integrity so that it can be reliably used to power online applications, shopping carts, inventory systems, financial systems, etc.

Data warehousing is a huge topic but is generally related to storing all your data and than “mining” it for useful nuggets of veins of information (to coin mining terms). Mining is a special form of decision support and basically means that you are manipulating your data store for reports of particular types. Given that your purposes are different for reporting than for transactions, your database design is different to support fast reporting rather than absolutely consistent data transactions. So where an OLTP database ensures that you only store one copy of data to ensure integrity, reporting databases (star or cube designs) typically store multiple copies of data in special sets to support very fast data reads and report generation.

Relational database management systems are complex applications in their own right. They employ the concept of ACID (Atomicity, Consistency, Isolation, and Durability) to maintain integrity and have many features for security roles and permissions. Real databases use Structured Query Language (SQL) to communicate with programs and direct user queries. By “real” databases, I mean databases like Oracle RDBMS, IBM’s DB2, Microsoft’s SQL Server, PostgreSQL, and MySQL. There are many other databases included embedded systems used behind the scenes in accounting and other types of applications. If possible, I recommend using a separate database rather than an embedded one if there will be any possible need to access the data directly. This is because separate databases provide an exposed interface that you can use if you need to customize or integrate the data with other systems.

In developing an application, first work through what the application needs to do and thereby what data needs to be collected and stored. Think ahead to the reports to be gathered; be as detailed as possible about all the types of data to be gathered as well as how the data is inter-related. From this, a good data design will begin with an Entity-Relationship (ER) diagram that captures information about the data (e.g., what type of data, how many characters, etc.) and how the various types of data relate to one another (e.g., one-to-one, one-to-many, and so on). The ER diagram forms the basis of the DB tables and programmers can then begin implementation according to this central roadmap. Use integrity constraints to ensure the data design is solid. Once your application begins collecting data and populating the database, making changes is not trivial. Therefore, it definitely pays to think through the data design up-front as much as possible. Finally, use the database features to ensure its integrity rather than relying on programs to ensure integrity. That way, the database can be accessed independently or integrated with other systems without compromising data integrity.

Lee Le Clair is the CTO at Ephibian. His Tech Talk column appears the third week of each month in Inside Tucson Business