Chum 240

Relational Databases

History

In the 1870's mathematical notion of relations was proposed as
a way to solve problems in Boolean logic and set theory. The
idea used tables of values that satisfy some logical
statement, and a set of rules, relational algebra, that, when
applied to the tables, revealed underlying patterns and
answered questions about the relationships expressed in the
tables.

For example, a table showing some solutions for the statement:

Team X beat team Y by Z points

Would look like this:


  
    X (winner)
    Y (loser)
    Z (spread)
  

    BYU
    Utah
    15
  

    USU
    Utah
    8
  

    BYU
    USU
    6
  




By applying the relational algebra to this table, we can find
out which team is better, where we choose to define better as
winning over Utah with the largest point spread while not
losing to any team that also beat Utah.

One hundred years later, in the 1970's, Edgar Frank "Ted" Codd
at IBM proposed that relations and relational algebra could be
the basis for a computer database system.

The idea was slightly ahead of its time, as computer hardware
and software was not quite ready. For example:

* Storage was still too limited and expensive for large
  databases
* Execution speed was too slow for the complicated operations
* Algorithms for efficient storage and searching had not 
  been developed

Nevertheless, several attempts were made to produce
educational and commercial products based on Codd's work.

[quote from Harold's book here]

As was shown earlier, hardware technology progresses quickly,
doubling about every two years, which solved the first
limitations.

The early, poor-performance implementations revealed the need
for better algorithms, which prodded researchers to
investigate and develop them. By 1980 everything came together
and IBM released DB2, the first commercially successful
relational database management system. The ninth version is in
use today, as are many, many major commercial and free
competitors such as Oracle, Microsoft, Sybase, PostgreSQL,
MySQL, and on, and on.


Flat-files Compared

Compared to flat-files, relational databases have the
following advantages and disadvantages:

Advantages:
* Regular structure
* They are a ready-made solution
* Scale to large data sets
* Easy to find documentation
* Easy to find trained programmers
* Useful for most applications
* SQL (Structured Query Language)
  standardizes data manipulation
* Client-server architecture separates
  programming concerns
  
Disadvantages
* Complex to install and maintain
* Large storage requirements
* Multiple competing vendors corrupted standard SQL
  (slightly) by adding features

Comparing the structure of INI-files to relational tables:
* A section corresponds to a table
* A label/value pair correspond (roughly) to a column
* No delimiters between fields are necessary
* All rows in a table have the same layout

For example, an INI file keeping track of books we have
read might look like this:


; books-i-have-read.ini
[Books I have read]
titles=3
detail=author,theme,subject

[titles]
t1=Art of Computer Programming, The
t2=Treasure Island
t3=Computer Survival Handbook, The

[t1 detail]
author=Knuth, D.
theme=computers
subject=programming

[t2 detail]
author=Stevenson, R.
theme=adventure
subject=pirates

[t3 detail]
author=Wooldridge, S. and London, K.
theme=computers
subject=management


In a relational database, the same information might be
kept in the same table in this form:



  
    Books
  
  
    Title
    Author
    Theme
    Subject
  
  
    Art of Computer Programming, The
    Knuth, D.
    computers
    programming
  
  
    Treasure Island
    Stevenson, R.
    adventure
    pirates
  
  
    Art of Computer Programming, The
    Knuth, D.
    computers
    programming
  




Terminology

Tables are described as having rows and columns, and those
are the words used most often when talking about relational
databases as well. But there are other terms in common use.

A table is also called a relation.

A row in a table may be called a record or a tuple.

A column may be called a field or an attribute.

The intersection of a row and column is a value.

A collection of tables is a database.

A collection of databases is a database management system or 
DBMS.

The power of the relational model is that all information
is stored in the same table structure, and this form has been
mathematically proven to be sufficient for all the operations
needed for set theory and logic. Its possible that something
better may come along, but, for now, this is adequate for all
our needs.

Another giant advantage with modern DBMS is that how the data
is stored and retrieved is completely irrelevant to the users.

Some DBMSs, such as MySQL and MS SQL Server, allow the creator 
of a database to determine the underlying storage method used
for tables, and a flat-file (but not an INI file) is ususally
one of the forms available. Of course, this decision is only
important to someone who understands the advantages and
disadvantages of the various formats.

SQL

In 1976 Donald D. Chamberlin and Raymond F. Boyce, also at
IBM, proposed a common computer language to manipulate and
retrieve data from tables: Structured Query Language.
Called either SQL or SEQUEL.

Several companies produced commercial versions of relational
databases using SQL as the 'system language', but each
extending it in different ways in order to differentiate
themselves from the competition.

In 1986, ten years after the introduction of SQL, the language
was formally standardized by ANSI. Current commercial products
using SQL are said to be 'compliant' with some version of
these standards, normally including the year of the standard.
For example, SQL-86 is the original ANSI standard, SQL-92 is
now considered the minimum subset of the language for
practical use. Additional standards up to SQL:2006
incorporated changes accommodating new technologies, such as
Object Oriented programming and XML.

"The best thing about standards is that there are so
many to choose from." 
  Andrew Tanenbaum, "Computer Networks"

The core function of SQL is to allow database users to:
* CREATE databases and tables
* INSERT, UPDATE or DELETE data
* SELECT data of interest

The most-oft used statement is the SELECT statement. It is
also the most complex and easily misused.



- - - -
Copyright Š2007 Brigham Young University
Books
Title	Author	Theme	Subject
Art of Computer Programming, The	Knuth, D.	computers	programming
Treasure Island	Stevenson, R.	adventure	pirates
Art of Computer Programming, The	Knuth, D.	computers	programming