Chum 240

Databases In History

The word data is the plural of datum. A datum is a single, 
atomic piece of information. Therefore, data is a collection 
of these pieces of information.

One definition of the word base is a conceptual structure 
on which other things depend. Another is a place where a
particular activity is carried out.

Put these together and we have a structured collection of 
information, or a place where information is processed.

Today, when we hear the word database we think almost 
immediately of computers and vast collections of information
that can be mined and manipulated willy-nilly for fun and 
profit, but it wasn't always so.

  


From the time of clay tablets to the time of paper and
microfilm, data storage and retrieval was pretty much library
science. If you have ever searched for something in a library,
you know how it works.

The originals are stored in some kind of natural order, such
as date, or author, or topic. These stacks of stuff can be
searched directly if this key information is known.

Large collections with diverse criteria for each item require
some kind of additional searching scheme. Usually this
involves an index or catalog which can have multiple entries
for the same item, each filed in different places in the
catalog depending on the distinguishing criterion. (E.g.
subject index, author index, keyword index, and so on.) All
the index entries for an item include its storage location or
natural key to make finding it relatively simple.
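
The catalog scheme described above can be sketched in a few lines
of Python. This is only an illustration: the item records, the
shelf locations, and the build_index helper are hypothetical and do
not come from any real library system.

```python
from collections import defaultdict

# Hypothetical holdings: each item is stored at a shelf location (its natural key).
ITEMS = [
    {"location": "A-001", "author": "Austen", "subject": "fiction", "title": "Emma"},
    {"location": "A-002", "author": "Darwin", "subject": "biology", "title": "On the Origin of Species"},
    {"location": "A-003", "author": "Austen", "subject": "fiction", "title": "Persuasion"},
]

def build_index(items, criterion):
    """Map each value of one criterion (e.g. author) to the locations filed under it."""
    index = defaultdict(list)
    for item in items:
        index[item[criterion]].append(item["location"])
    return dict(index)

# The same items appear in several indexes, each filed by a different criterion.
author_index = build_index(ITEMS, "author")
subject_index = build_index(ITEMS, "subject")

print(author_index["Austen"])    # ['A-001', 'A-003']
print(subject_index["biology"])  # ['A-002']
```

Every index entry holds only the storage location, so the originals
stay in one place no matter how many indexes point to them.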

Cataloging is at the intersection of Library Science and
Database Theory.

Imagine you have to project next year's population in order to
plant enough grain to feed everyone in the kingdom, and your
database, consisting of population and harvest figures from
past years, is kept on clay tablets stored in the Imperial
library. What are some of the problems you are going to
encounter? (Remember: you write on clay tablets!)

Electronic Computers



The people in this image from the early 1900s are the
computers. They used pencil and paper and arithmetic skills
(sometimes with the help of mechanical calculators) to carry
out a small part of a much larger computational project. By
distributing the job among many people working simultaneously,
a large project could be finished much more quickly. This was
state of the art until near the end of WWII.



The very earliest electronic computers were conceived of as
little more than programmable calculators because that is what
they were intended to replace. Also, memory for storing
programs and values was very expensive, and therefore limited.
As the demand for more complex programs increased, so did the
requirement for more storage, and technology had to be
developed to meet that demand.



The storage capacity of computers did increase rapidly and
dramatically. In 1945 computers were overgrown calculators,
capable of storing about a dozen numbers. By 1950 computers
stored thousands of words in memory, and could access hundreds
of thousands on secondary storage such as tape drives.

That may not sound like much today, but since that time
computing power and storage capacity have, on average, doubled
every two years or less, and growth continues at that rate
today despite worries about the approach of hard, physical
limitations.
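
To get a feel for what doubling every two years adds up to, here is
a rough arithmetic sketch. The growth_factor function is a
hypothetical helper, and the two-year doubling period is simply the
figure quoted above, not a measurement.

```python
def growth_factor(years, doubling_period=2.0):
    """Factor by which capacity grows if it doubles every `doubling_period` years."""
    return 2 ** (years / doubling_period)

print(growth_factor(10))  # 32.0
print(growth_factor(20))  # 1024.0

# From 1950 to 2007 is 57 years, which works out to a factor
# in the hundreds of millions.
print(growth_factor(2007 - 1950))
```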



With the increasing storage came the demand to organize it for
quick and easy retrieval, and so the first databases were
created.

Data Organization

As more and more data needed to be stored and retrieved,
better organization and access methods were needed.

Ad-Hoc

It could be said that every early program that stored and
retrieved data included its own data management system. Why?
Because there were no ready-made commercial products or
standards upon which to build data manipulation into programs.

Data was stored in a format designed for the purpose at hand.
Since computers were slow and storage space was limited, great
effort was put into making access quick for the particular
combination of computer and application, and also to reduce
the amount of storage needed.

The usual way to increase speed was by laying out the data in
a form that closely mapped to the particular computer it was
to be used by. The result was that a database that worked well
for one brand of computer might be very slow, or even
unusable, on another.
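
One way to see how a machine-specific layout can fail elsewhere is
byte order, the order in which a computer stores the bytes of a
number. Byte order is an example chosen here for illustration; the
text above does not name a specific cause. The sketch uses Python's
standard struct module to write the same value in two layouts.

```python
import struct

value = 1000

big = struct.pack(">H", value)     # most significant byte first
little = struct.pack("<H", value)  # least significant byte first

print(big.hex())     # 03e8
print(little.hex())  # e803

# Reading bytes written in one order as if they were in the other
# gives a different number.
print(struct.unpack("<H", big)[0])  # 59395
```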

To reduce the size of the data stored in the database,
designers used short codes to stand for much longer values,
such as representing a date as the number of days since some
recent epoch. In this way the year, month and day could be
represented as three or four digits instead of eight or
ten--at the cost of having to compute the actual month and day
when the date needed to be displayed. Even later, when full
dates were stored, only the last two digits of the year were
used to conserve the two characters of space used to represent
the century.
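
The days-since-epoch encoding can be sketched as follows. The epoch
of January 1, 1960 and the encode and decode helpers are assumptions
made for this example; a real system would have picked its own
epoch.

```python
from datetime import date, timedelta

# Hypothetical epoch; a real system would choose one near its earliest record.
EPOCH = date(1960, 1, 1)

def encode(d):
    """Store a date as the number of days since EPOCH."""
    return (d - EPOCH).days

def decode(days):
    """Recompute the calendar date when it has to be displayed."""
    return EPOCH + timedelta(days=days)

stored = encode(date(1965, 7, 4))
print(stored)          # 2011
print(decode(stored))  # 1965-07-04
```

The stored value takes four digits where a written date such as
07/04/1965 takes ten characters, and decode is the extra
computation paid each time the date is displayed.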

The effort to keep data storage small continued to infect
computer programmers and database designers for years after
cheap, plentiful storage made it unnecessary. At issue was a
desire to make new programs and data consistent with the old,
familiar style.

Of course the price for all this extravagant economy had to be
paid when the year 2000 arrived, because two-digit years could
no longer be assumed to be in the 20th century.
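
A small sketch shows how the shortcut breaks. The years_between
function is hypothetical and is not taken from any real program; it
only mimics arithmetic done directly on two-digit years.

```python
def years_between(start_yy, end_yy):
    """Elapsed years computed from two-digit years alone."""
    return end_yy - start_yy

# An account opened in 1995 and checked in 1999: the shortcut works.
print(years_between(95, 99))  # 4

# The same account checked in 2000, stored as 00: the result is wrong.
print(years_between(95, 0))   # -95
```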

Database Applications

The next step in databases was the stand-alone program
designed to store and retrieve information in a generic way
and not be connected to any particular application.

For example, a company might use one of these programs to
manage employee information, or to collect sales information
and generate sales-related reports.

Today, this might seem very limiting, as we have many, many
ways that data can be displayed and used. But at the time
almost all computer-related data came out of a printer that
was only capable of producing fixed-pitch characters on a single
paper size.
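
A report of that kind can be imitated with fixed-width columns. The
sales figures and the report_lines helper below are made up for
illustration and do not reproduce any particular product's output.

```python
# Hypothetical sales records: (region, amount).
SALES = [("North", 1200), ("South", 845), ("West", 3010)]

def report_lines(rows):
    """Format rows as fixed-width columns, the way a fixed-pitch printer shows them."""
    lines = [f"{'REGION':<10}{'SALES':>8}", "-" * 18]
    for region, amount in rows:
        lines.append(f"{region:<10}{amount:>8}")
    total = sum(amount for _, amount in rows)
    lines.append(f"{'TOTAL':<10}{total:>8}")
    return lines

print("\n".join(report_lines(SALES)))
```

Because every character has the same width, padding each field to a
fixed number of characters is all it takes to line the columns up.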





- - - -
Copyright ©2007 Brigham Young University