What is a Database?
A database is an organised collection of (one or more) related data file(s). It is a collection of information related to a particular subject or purpose or a collection of related data or information grouped together under one logical structure. It can also be a logical collection of related files grouped together by a series of tables as one entity.
Examples of Databases
- Customers’ details. – Library records.
- Personal records. – Flight schedules.
- Employees’ records. – A music collection.
- An Address book (or Telephone directory), where each person has the Name, Address, City & Telephone no.
A Database is a common data pool, maintained to support the various activities taking place within an organization. The manipulation of database contents to yield information is by the user programs. The database is an organized set of data items that reduces duplication of the stored files.
Common terms used in databases include:
Data Files
A data file is a structured collection of data (information). The data are related in some manner. It is organised so that relationships within the data are revealable. A data file stores several (many) pieces of information about many data objects. The simplest and most efficient way of how data is organised in a data file is in form of a table of rows and column.
A data file is made up of a number of records; each row in a table is a separate record. Each record is made up of all the data about a particular entity in the file.
A record includes many data items, each of which is a separate cell in the table. Each column in the table is a field; it is a set of values for a particular variable, and is made up of all the data items for that variable. Examples include phone book, library catalogue, hospital patient records and species information.
Integrated File Systems
These refer to the traditional methods of storing files, i.e., the use of paper files. E.g., Manual & Flat files. In Integrated file systems, several inter-independent files are maintained for the different users’ requirements. The Integrated file systems have the problems of data duplication. In order to carry out any file processing task(s), all the related files have to be processed.
Problems of Traditional File- Based Approach
Each function or department in an organisation develops specific applications in isolation from other divisions with each application using its own data files. This leads to the following problems:
Data Redundancy– duplicate data in multiple data files and redundancy leads to inconsistency in data representation.
Inconsistencies in data representation e.g. refer to the same person as client or customer and in values of data items across multiple files
Data Isolation — multiple files and formats
Programme-data Dependence- tight relationship between data files and specific programs used to maintain files
Lack of Flexibility- Need to write a new programme to carry out each new task.
Lack of Data Sharing and Availability
Integrity Problems –Integrity constraints (e.g. account balance > 0) become part of programme code making it hard to add new constraints or change existing ones
Concurrent Access by multiple users is difficult. Concurrent access is needed for performance, uncontrolled concurrent access can lead to inconsistencies E.g. two people reading a balance and updating it at the same time
Security Problems – the data is insecure.
CLASSIFICATION OF DATABASE SOFTWARE
Database software is generally classified into 2:
- PC-based database software (or Personal Information Managers – PIMs).
- Corporate-based database software.
PC-based database software
The PC-based database programs are usually designed for individual users or small businesses. They provide many general features for organizing & analyzing data. For example, they allow users to create database files, enter data, organize that data in various ways, and also create reports. They do not have strict security features or complicated backup and recovery procedures.
Examples of PC-based systems;
* Microsoft Access. * FoxPro.
* Dbase III Plus * Paradox.
Corporate database software
They are designed for big corporations that handle large amounts of data. Issues such as security, data integrity (reliability), backup and recovery are taken seriously to prevent loss of information.
Examples of Corporate-based systems;
* Oracle. * Informix * Ingress.
* Progress. * Sybase. * SQL Server.
COMMON FEATURES OF
DATABASE PACKAGES
- Have facilities for creating
- Have facilities for Updating records or databases.
- Have facilities for generating Reports.
- Have a Find or Search facility that enables the user to scan through the records in the database so as to find information he/she needs.
- Allow Sorting that enables the user to organize & arrange the records within the database.
- Contain Query & Filter facilities that specify the information the database is to search or sort.
- Have a data Validating
Database Maintenance
A Database cannot be created fully at once. Its creation and maintenance is a gradual and continuous procedure. The creation & the maintenance of databases is under the influence of a set of user programs known as the Database Management Systems (DBMS). Through the DBMS, users communicate their requirements to the database using Data Description Languages (DDL’s) and Data Manipulation Languages (DML’s). In fact, the DBMS provide an interface between the user’s programs and the contents of the database. During the creation and subsequent maintenance of the database, the DDL’s and DML’s are used to:
- Add new files to the database.
- Incorporate fields onto the existing records in the database.
- Delete the obsolete (outdated) records.
- Carry out adjustments on (or amend) the existing records.
- Expand the database capacity, for it to cater for the growth in the volume for enhanced application requirements.
- Link up all the data items in the database logically.
Data Dictionary.
All definitions of elements in the system are described in detail in a Data dictionary. The elements of the system that are defined are: Dataflow, Processes, and Data stores. If a database administrator wants to know the definition of a data item name or the content of a particular dataflow, the information should be available in the dictionary.
Data Bank
A Data Bank can be defined as a collection of data, usually for several users, and available to several organizations. A Data Bank is therefore, a collection of databases. The Database is organizational, while a Data Bank is multi-organizational in use. The Database and the Data Bank have similar construction and purpose. The only difference is that, the term Data Bank is used to describe a larger capacity base, whose contents are mostly of historical references (i.e., the Data Bank forms the basis for data or information that is usually generated periodically). On the other hand, the contents of the Database are used frequently to generate information that influences the decisions of the concerned organization.
CHARACTERISTICS OF DATABASES AND DATABASE MANAGEMENT SYSTERMS
Traditionally data was organized in file formats. DBMS was all new concepts then and all the research was done to make it to overcome all the deficiencies in traditional style of data management. Modern DBMS have the following characteristics:
Real-world entity
Databases are more realistic and use real world entities to design their architecture by using the behavior and attributes too. For example, a school database may use student as entity and their age as their attribute.
Relation-based tables
Databases allow entities and relations among them to form as tables. This eases the concept of data saving. A user can understand the architecture of database just by looking at table names etc.
Isolation of data and application
Databases are entirely different than ther data. Where as database is said to be an active entity, data is said to be passive one on which the database works and organizes. DBMS also stores metadata which is data about data, to ease its own process.
Less redundancy
Databases follow rules of normalization, which splits a relation when any of its attributes is having redundancy in values. Following normalization, which itself is a mathematically rich and scientific process, make the entire database to contain as less redundancy as possible.
Consistency
Databases always enjoy the state of consistency. Consistency is a state where every relation in database remains consistent. There exist methods and techniques, which can detect attempt of leaving database in inconsistent state.
Query Language
Databases are equipped with query language, which makes it more efficient to retrieve and manipulate data.
ACID Properties
Databases follow the concepts for ACID properties, which stands for Atomicity, Consistency, Isolation and Durability. These concepts are applied on transactions, which manipulate data in database. ACID properties maintains database in healthy state in multi-transactional environment and in case of failure.
Multiuser and Concurrent Access
Databases support multi-user environment and allows them to access and manipulate data in parallel. Though there are restrictions on transactions when they attempt to handle same data item, but users are always unaware of them.
Multiple views
Databases offer multiple views for different users. A user who is in sales department will have a different view of database than a person working in production department. This enables user to have a view of database according to their requirements.
Security
Features like multiple views offers security at some extent where users are unable to access data of other users and departments. Databases offer methods to impose constraints while entering data into database and retrieving data at later stage. DBMS offers many different levels of security features, which enables multiple users to have different view with different features.
DATA FILES AND DATABASES
A data file is a structured collection of data (information). The data are related in some manner. It is organised so that relationships within the data are revealed (revealable). A data file stores several (many) pieces of information about many data objects.
The simplest and most efficient way of how data is organised in a data file is as a table of rows and columns. A data file is made up of a number of records; each row in a table is a separate record. Each record is made up of all the data about a particular entity in the file.
A record includes many data items, each of which is a separate cell in the table. Each column in the table is a field; it is a set of values for a particular variable, and is made up of all the data items for that variable. Examples include phone book, library catalogue, hospital patient records and students records.
A database is an organised collection of (one or more) related data file(s). The way the database organises data depends on the type of database, called its data model, which, may be hierarchical, network and relational models.
Benefits of the Database Approach
Provide Data Independence– separating the physical (how) and logical (what) aspects of the system Physical data independence protects the application programmes from changes in the physical placement, of the files and the ability to modify the physical schema without changing the logical schema
Logical data independence –Modify logical schema without changing application programmes
Reduce redundancy reduce duplication of data items although some redundancy may be necessary for business or technical reasons – DBA must ensure updates are propagated (a change to one is automatically applied to the other).
Avoid inconsistency (by reducing redundancy)- if it is necessary – propagate updates
Maintain integrity – i.e. ensure the data is accurate by:- reducing redundancy and implementing integrity rules, e.g. through foreign keys
Share data- among existing applications, can be used in new applications
Allow implementation of security restrictions – establish rules for different types of users for different types of update to database
Enforce standards for- data representation – useful for migrating data between systems,- data naming & documentation – aids data sharing and understandability
Balance conflicting requirements- structure the corporate data in a way that is best for the organization
DBMS Architecture
Data elements required to define a database are called metadata. There are three types of metadata: conceptual schema metadata, external schema metadata and internal schema metadata. If any one of these elements is missing from the data definition maintained within the DBMS, the DBMS may not be adequate to meet users’ needs. A Data Definition Language (DDL) is a component used for creating the schema representation necessary for interpreting and responding to the users’ requests.
Data Dictionary and Directory Systems (DD/DS) have been developed to define and store in source and object forms of all data definitions for external schemas, conceptual schemas, the internal schema and all associated mappings. The data dictionary contains an index and description of all the items stored in the database. The directory describes the location of the data and access method. Some of the benefits of using DD/DS include:
- Enhancing documentation.
- Providing common validation criteria.
- Facilitating programming by reducing the needs for data definition.
- Standardising programming methods.
IMPORTANCES OF DATABASES
- Database systems can be used to store data, retrieve and generate reports.
- It is easy to maintain the data stored within a database.
- A database is able to handle large amounts of data.
- Data is stored in an organized format, i.e. under different fieldnames.
- With modern equipment, data can easily be recorded.
- Data is quickly & easily accessed or retrieved, as it is properly organized.
- It helps in linking many database tables and sourcing of data from these tables.
- It is quite easy to update the data stored within a database.
- A database is a collection of files grouped together by a series of tables as one entity. These tables serve as an index for defining relationships between records and files maintained in the database. This makes updating of the data in the related tables very easy.
- Use of a database tool reduces duplication of the stored files, and the reprocessing of the same data items. In addition, several independent files are maintained for the different user requirements.
- It is used to query & display records satisfying a given condition.
- It is easy to analyze information stored in a database & to prepare summary reports & charts.
- It cost saving. These results from the sharing of records reduced processing times, reduced use of software and hardware, more efficient use of data processing personnel, and an overall improvement in the flow of data.
- Use of Integrated systems is greatly facilitated.
- A lot of programming time is saved because the DBMS can be used to construct & process files as well as retrieve data.
- Information supplied to managers is more valuable, because it is based on a widespread collection of data (instead of files, which contain only the data needed for one application).
- The database also maintains an extensive Inventory Control file. This file gives an account of all the parts & equipment throughout the maintenance system. It also defines the status of each part and its location.
- It enables timely & accurate reporting of data to all the maintenance centers. The same data is available and distributed to everyone.
- The database maintains files related to any work assigned to outside service centres.
DISADVANTAGES OF DATABASES
- A Database system requires a big size, very high cost & a lot of time to implement.
- A Database requires the use of a large-scale computer system.
- The time involved. A project of this type requires a minimum of 1 – 2 years.
- A large full-time staff is also required to design, program, & support the implementation of a database.
- The cost of the database project is a limiting factor for many organizations. Database-oriented computer systems are not luxuries, and are undertaken when proven economically reasonable.
- Most DBMS have internal security features that interface with the operating system access control mechanism/package, unless it was implemented in a raw device. A combination of the DBMS security features and security package functions is often used to cover all required security functions. This dual security approach, however, introduces complexity and opportunity for security lapses.
TYPES OF DATABASE STRUCTURES
The common database models are:
- Hierarchical database model
- Network database model
- Relational database model
- Object–oriented model
Hierarchical Database model
This model allows the data to be structured in a parent/child relationship (each parent may have many children, but each child would be restricted to having only one parent). Under this model, it is difficult to express relationships when children need to relate to more than one parent. When the data relationships are hierarchical, the database is easy to implement, modify and search.
A hierarchical structure has only one root. Each parent can have numerous children, but a child can have only one parent. Subordinate segments are retrieved through the parent segment. Reverse pointers are not allowed. Pointers can be set only for nodes on a lower level; they cannot be set to a node on a predetermined access path.
It is a data structure where the data is organized like a family tree or an organization chart. In a Hierarchical database, the records are stored in multiple levels. Units further down the system are subordinate to the ones above. In other words, the database has branches made up of parent and child records. Each parent record can have multiple child records, but each child can have only one parent.
Example 1
Network Database Model
This model allows children to relate to more than one parent. A disadvantage to the network model is that such structure can be extremely complex and difficult to comprehend, modify or reconstruct in case of failure. The network structure is effective in stable environments where the complex interdependencies of the requirements have been clearly defined.
The network structure is more flexible, yet more complex, than the hierarchical structure. Data records are related through logical entities called sets. Within a network, any data element can be connected to any item. Because networks allow reverse pointers, an item can be an owner and a member of the same set of data. Members are grouped together to form records, and records are linked together to form a set. A set can have only one owner record but several member records.
Relational Database Model
A Relational database is a set of data where all the items are related. The data elements in a Relational database are stored or organized in tables. A Table consists of rows & columns. Each column represents a Field, while a row represents a Record. The records are grouped under fields. A Relational database is flexible and easy to understand. A Relational database system, has the ability to quickly find & bring information stored in separate tables together using queries, forms and reports. This means that, a data element in any one table can be related to any piece of data in another table as long as both tables share common data elements.
Examples of Relational database systems; Microsoft Access, FileMaker Pro, Approach
This model is independent from the physical implementation of the data structure.
Advantages of Relational Database Structure
- Easier for users to understand and implement in a physical database system
- Easier to convert from other database structures
- Projection and joint operations (referencing groups of related data elements not stored together) are easier to implement and creation of new relations for applications is easier to do.
- Access control over sensitive data is easy to implement
- Faster in data search
- Easier to modify than hierarchical or network structures
Relational database technology separates data from the application and uses a simplified data model. Based on set theory and relational calculations, relational database models information in a table structure with columns and rows. Columns, called domains or attributes, correspond to fields. Rows or tuples are equal to records in a conventional file structure. Relational databases use normalization rules to minimise the amount of information needed in tables to satisfyusers’ structured and unstructured queries to the database.
Functions of Database Administrator
- Coordinates the activities of the database system. Duties include:
- Schema definition
- Storage structure and access method definition
- Schema and physical organisation modification
- Granting user authority to access the database
- Specifying integrity constraints
- Acting as liaison with users
- Monitoring performance and responding to changes in requirements
- Security definitions
Database Security, Integrity and Control
Security is the protection of data from accidental or deliberate threats, which might cause unauthorised modification, disclosure or destruction of data and the protection of the information system from the degradation or non-availability of service. Data integrity in the context of security is when data are the same as in source documents and have not been accidentally or intentionally altered, destroyed or disclosed. Security in database systems is important because:
- Large volumes of data are concentrated into files that are physically very small.
- The processing capabilities of a computer are extensive, and enormous quantities of data are processed without human intervention.
- Easy to lose data in a database from equipment malfunction, corrupt files, loss during copying of files and data files are susceptible to theft, floods etc.
- Unauthorised people can gain access to data files and read classified data on files.
- Information on a computer file can be changed without leaving any physical trace of change
- Database systems are critical in competitive advantage to an organisation
Some of the controls that can be put in place include:
1) Administrative controls – controls by non-computer based measures. They include:
Personnel controls e.g. selection of personnel and division of responsibilities, Secure positioning of equipment, Physical access controls, Building controls and Contingency plans.
2) PC controls eg Keyboard lock. Password Locking, disks Training, Virus scanning, Policies and procedures on software copying.
3) Database controls – a number of controls have been embedded into DBMS, these include:
- Authorisation – granting of privileges and ownership, authentication
- Provision of different views for different categories of users
- Backup and recovery procedures
- Checkpoints – the point of synchronisation between database and transaction log files. All buffers are force written to storage.
- Integrity checks e.g. relationships, lookup tables, validations
- Encryption – coding of data by special algorithm that renders them unreadable without decryption
- Journaling – maintaining log files of all changes made
- Database repair
4) Development controls – when a database is being developed, there should be controls over the design, development and testing e.g. Testing, Formal technical review, Control over changes, Controls over file conversion
5) Document standards – standards are required for documentation such as: Requirement specification, Programme specification, Operations manual, User manual
6) Legal issues
Escrow agreements – legal contracts concerning software, Maintenance agreements, Copyrights, Licenses, Privacy
7) Other controls include
Hardware controls such as device interlocks which prevent input or output of data from being interrupted or terminated, once begun
Data communication controls e.g. error detection and correction.
Database recovery is the process of restoring the database to a correct state in the event of failure