Chum 240

Relational Databases

History

In the 1870's mathematical notion of relations was proposed as
a way to solve problems in Boolean logic and set theory. The
idea used tables of values that satisfy some logical
statement, and a set of rules, relational algebra, that, when
applied to the tables, revealed underlying patterns and
answered questions about the relationships expressed in the
tables.

For example, a table showing some solutions for the statement:
Team X beat team Y by Z points
Would look like this:
X (winner) Y (loser) Z (spread)
BYU Utah 15
USU Utah 8
BYU USU 6
By applying the relational algebra to this table, we can find out which team is better, where we choose to define better as winning over Utah with the largest point spread while not losing to any team that also beat Utah. One hundred years later, in the 1970's, Edgar Frank "Ted" Codd at IBM proposed that relations and relational algebra could be the basis for a computer database system. The idea was slightly ahead of its time, as computer hardware and software was not quite ready. For example: * Storage was still too limited and expensive for large databases * Execution speed was too slow for the complicated operations * Algorithms for efficient storage and searching had not been developed Nevertheless, several attempts were made to produce educational and commercial products based on Codd's work. [quote from Harold's book here] As was shown earlier, hardware technology progresses quickly, doubling about every two years, which solved the first limitations. The early, poor-performance implementations revealed the need for better algorithms, which prodded researchers to investigate and develop them. By 1980 everything came together and IBM released DB2, the first commercially successful relational database management system. The ninth version is in use today, as are many, many major commercial and free competitors such as Oracle, Microsoft, Sybase, PostgreSQL, MySQL, and on, and on. Flat-files Compared Compared to flat-files, relational databases have the following advantages and disadvantages: Advantages: * Regular structure * They are a ready-made solution * Scale to large data sets * Easy to find documentation * Easy to find trained programmers * Useful for most applications * SQL (Structured Query Language) standardizes data manipulation * Client-server architecture separates programming concerns Disadvantages * Complex to install and maintain * Large storage requirements * Multiple competing vendors corrupted standard SQL (slightly) by adding features Comparing the structure of INI-files to relational tables: * A section corresponds to a table * A label/value pair correspond (roughly) to a column * No delimiters between fields are necessary * All rows in a table have the same layout For example, an INI file keeping track of books we have read might look like this:
; books-i-have-read.ini [Books I have read] titles=3 detail=author,theme,subject [titles] t1=Art of Computer Programming, The t2=Treasure Island t3=Computer Survival Handbook, The [t1 detail] author=Knuth, D. theme=computers subject=programming [t2 detail] author=Stevenson, R. theme=adventure subject=pirates [t3 detail] author=Wooldridge, S. and London, K. theme=computers subject=management
In a relational database, the same information might be kept in the same table in this form:
Books
Title Author Theme Subject
Art of Computer Programming, The Knuth, D. computers programming
Treasure Island Stevenson, R. adventure pirates
Art of Computer Programming, The Knuth, D. computers programming
Terminology Tables are described as having rows and columns, and those are the words used most often when talking about relational databases as well. But there are other terms in common use. A table is also called a relation. A row in a table may be called a record or a tuple. A column may be called a field or an attribute. The intersection of a row and column is a value. A collection of tables is a database. A collection of databases is a database management system or DBMS. The power of the relational model is that all information is stored in the same table structure, and this form has been mathematically proven to be sufficient for all the operations needed for set theory and logic. Its possible that something better may come along, but, for now, this is adequate for all our needs. Another giant advantage with modern DBMS is that how the data is stored and retrieved is completely irrelevant to the users. Some DBMSs, such as MySQL and MS SQL Server, allow the creator of a database to determine the underlying storage method used for tables, and a flat-file (but not an INI file) is ususally one of the forms available. Of course, this decision is only important to someone who understands the advantages and disadvantages of the various formats. SQL In 1976 Donald D. Chamberlin and Raymond F. Boyce, also at IBM, proposed a common computer language to manipulate and retrieve data from tables: Structured Query Language. Called either SQL or SEQUEL. Several companies produced commercial versions of relational databases using SQL as the 'system language', but each extending it in different ways in order to differentiate themselves from the competition. In 1986, ten years after the introduction of SQL, the language was formally standardized by ANSI. Current commercial products using SQL are said to be 'compliant' with some version of these standards, normally including the year of the standard. For example, SQL-86 is the original ANSI standard, SQL-92 is now considered the minimum subset of the language for practical use. Additional standards up to SQL:2006 incorporated changes accommodating new technologies, such as Object Oriented programming and XML. "The best thing about standards is that there are so many to choose from." Andrew Tanenbaum, "Computer Networks" The core function of SQL is to allow database users to: * CREATE databases and tables * INSERT, UPDATE or DELETE data * SELECT data of interest The most-oft used statement is the SELECT statement. It is also the most complex and easily misused. - - - - Copyright ©2007 Brigham Young University