COMPUTER FILE ORGANIZATION METHODS
File organization refers to the way data is stored in a file. File organization is very important
because it determines the method of access, efficiency, flexibility and storage devices to be used.
There are four methods of organizing files on a storage medium, namely:
➢ serial
➢ sequential
➢ indexed-sequential
➢ random
Serial file Organization
• Serial file organization is the simplest file organization method. It is a type of file design in which records are stored on the storage medium in the order in which they are created, i.e. chronologically. The file is therefore not ordered on any key, and is at best in chronological order.
• Serial files are primarily used as transaction files in which the transactions are recorded in the order that they occur.
• The records are accessed from the storage medium serially, from head to tail.
• This type of access is normally used by magnetic tapes.
• The hit rate for serial files is high. Hit rate refers to the number of records accessed in a given period of time.
• Serial files are suited to high-activity processing, e.g. batch processing, where a group of records is collected and processed at the same time.
• A serial file can only be accessed serially, that is, by searching through the file from the head of the file to the tail.
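The head-to-tail access described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `append_record` and `serial_search`, the comma-separated record layout and the sample transactions are all assumptions made for the example, and an in-memory `io.StringIO` stands in for a file on tape.

```python
import io

def append_record(f, record):
    """Add a record at the tail of the file, i.e. in order of arrival."""
    f.seek(0, io.SEEK_END)
    f.write(record + "\n")

def serial_search(f, key):
    """Read from head to tail until the key is found.

    Returns (fields, records_read); fields is None if the key is absent.
    """
    f.seek(0)
    count = 0
    for line in f:
        count += 1
        fields = line.rstrip("\n").split(",")
        if fields[0] == key:
            return fields, count
    return None, count

# Hypothetical transactions, stored in the order they occurred (not by key).
f = io.StringIO()
for rec in ["T003,deposit,500", "T001,withdrawal,200", "T002,deposit,50"]:
    append_record(f, rec)

record, reads = serial_search(f, "T002")
print(record, reads)
```

Because T002 was the last transaction to arrive, every preceding record has to be read before it is reached, which is the access cost listed among the disadvantages of serial files.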
Advantages of Serial file Organization
i. It is a simple method of file design
ii. It is a cheap method because it uses magnetic tapes
iii. It makes optimum use of storage media because no space is spared for record insertion as the records are stored in the order of occurrence.
Disadvantages of Serial file Organization
i. It is cumbersome to access because you have to access all preceding records before retrieving the one being searched for
ii. Wastage of space on the medium in the form of inter-record gaps.
iii. It cannot support modern high-speed requirements for quick record access
iv. It takes a longer time to retrieve a record of interest.
Sequential file Organization
• A sequentially organized file consists of records arranged in the sequence in which they are written to the file (the first record written is the first record in the file, the second record written is the second record in the file, and so on). Where the records are sorted before they are written, this sequence follows a key, e.g. alphabetical or numerical order. As a result, records can be added only at the end of the file. Attempting to add a record anywhere other than the end of the file will result in the file being truncated at the end of the record just written.
• Sequential files are usually read sequentially, starting with the first record in the file. Sequential files with a fixed-length record type that are stored on disk can also be accessed by relative record number (direct access).
• Records in sequential files can be read or written only sequentially.
• After you have placed a record into a sequential file, you cannot shorten, lengthen, or delete the record. However, you can update (REWRITE) a record if the length does not change. New records are added at the end of the file.
• If the order in which you keep records in a file is not important, sequential organization is a good choice whether there are many records or only a few. Sequential output is also useful for printing reports.
• The most suitable storage medium for sequential files is magnetic tape.
• Sequential files are ideal for high activity processing e.g. batch processing.
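The points above about fixed-length records, access by relative record number and rewriting a record without changing its length can be illustrated with a short Python sketch. The 24-byte record layout (a 4-byte key followed by a 20-byte name), the function names and the sample data are assumptions chosen for the example; an in-memory `io.BytesIO` stands in for a disk file.

```python
import io
import struct

# Assumed layout: 4-byte unsigned key + 20-byte name = 24 bytes per record.
RECORD = struct.Struct("<I20s")

def write_record(f, key, name):
    """New records can only be added at the end of the file."""
    f.seek(0, io.SEEK_END)
    f.write(RECORD.pack(key, name.encode("ascii")))

def read_record(f, n):
    """Direct access by relative record number n (counting from 0)."""
    f.seek(n * RECORD.size)
    key, raw = RECORD.unpack(f.read(RECORD.size))
    return key, raw.rstrip(b"\x00").decode("ascii")

def rewrite_record(f, n, key, name):
    """Update a record in place; its length stays exactly the same."""
    f.seek(n * RECORD.size)
    f.write(RECORD.pack(key, name.encode("ascii")))

f = io.BytesIO()
for key, name in [(101, "Achieng"), (102, "Baraka"), (103, "Chebet")]:
    write_record(f, key, name)

before = read_record(f, 1)           # second record, located by arithmetic
rewrite_record(f, 1, 102, "Bahati")  # same length, so the rewrite is safe
after = read_record(f, 1)
print(before, after, len(f.getvalue()))
```

The position of record n is simply n multiplied by the record size, which is why this kind of direct access works only when every record has the same length.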
Advantages of Sequential file Organization
i. Simple method of file design (simple to understand).
ii. Cheap because it uses a magnetic tape.
iii. Easy to organize, maintain and understand.
iv. Loading a record requires only a record key.
v. Sorting makes it easier to access records.
vi. Errors in the file remain localized.
Disadvantages of Sequential file Organization
i. It has a high access time i.e. it takes longer to retrieve a record of interest.
ii. The entire file must be processed even when the activity rate is very low.
iii. Data redundancy is very high since the same data may be stored in several files sequenced on different keys.
iv. It does not make optimum use of the storage media because some spare space between records is left for record insertion
v. Random enquiries are virtually impossible to handle.
vi. Sorting does not remove the need to access other records as the search looks for a particular record
vii. Sequential files cannot support modern technologies that require fast access to stored records
viii. The requirement that all records be of the same size is sometimes difficult to enforce
Indexed-Sequential file Organization
• An indexed file contains records ordered by a record key. Each record contains a field that holds the record key. The record key uniquely identifies the record and determines the sequence in which it is accessed with respect to other records. A record key might be, for example, an employee number or an invoice number.
• An indexed file can also use alternate indexes, that is, record keys that let you access the file using a different logical arrangement of the records. For example, you could access the file through employee department rather than through employee number.
• The records are arranged sequentially as in sequential files, but the difference is that there is an index that allows for selective access. The index entries point to the particular portion of the storage where a group of records is held; this allows groups of records that are not required in a particular processing run to be bypassed.
• The best storage medium for index sequential files is a magnetic disc (hard disk). The records can be accessed using the following methods:
i. Sequential access – this is where the user follows a specific sequence, e.g. alphabetic or numeric, to retrieve a record of interest.
ii. Use of indices – this is where the user uses the unique index number to retrieve a record of interest.
iii. Random access – this is where the user moves up and down the file in a non-sequential manner in order to retrieve a record of interest.
The hit rate is both high and low – hit rate refers to the number of records that can be accessed at a given period of time. It is high because of the sequential access and low because of index access. Index sequential files are ideal for online processing and batch processing.
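The way an index lets a processing run bypass groups of records can be sketched as follows. This is a simplified illustration under stated assumptions: records are held in key order in blocks of three, the index stores the highest key in each block, and the names `lookup` and `scan` and the sample employee data are hypothetical.

```python
from bisect import bisect_left

BLOCK_SIZE = 3  # assumed number of records per group

# Records kept in key order, as in a sequential file.
records = [
    (1001, "Amina"), (1004, "Brian"), (1007, "Cate"),
    (1010, "Dan"), (1013, "Esther"), (1016, "Faith"),
    (1019, "George"),
]

# Split the file into groups and index each group by its highest key.
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]
index = [block[-1][0] for block in blocks]

def lookup(key):
    """Indexed access: pick the one block that can hold the key, skip the rest."""
    b = bisect_left(index, key)  # first block whose highest key >= key
    if b == len(blocks):
        return None              # key is larger than every key in the file
    for k, name in blocks[b]:
        if k == key:
            return name
    return None

def scan():
    """Sequential access: read every record in key order."""
    for block in blocks:
        yield from block

print(index)
print(lookup(1013))
print([key for key, _ in scan()])
```

A lookup touches the small index and one block only, while a batch run can still read the whole file in key order through `scan`, which is why the same file can serve both online and batch processing.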
Benefits of index sequential Files
i. They give users many different options for access
ii. The index number provides a very fast method of access as they retrieve one record at a time
iii. The records cannot be duplicated as the indices ensure that each record is unique.
Limitations of index sequential Files
i. Index-sequential files do not make optimum use of the available storage because some spare space is needed for record insertion.
ii. They increase storage overhead
iii. It is an expensive method because it uses the magnetic disc.
Random file Organization
• This is the type of file design where the records are stored on the storage medium with no regard to any sequence.
• In random file organization, records are stored in random order within the file. Though there is no sequencing to the placement of the records, there is, however, a pre-defined relationship between the key of the record and its location within the file. In other words, the value of the record key is mapped by an established function to the address within the file where it resides. Therefore, any record within the file can be directly accessed through the mapping function in roughly the same amount of time. The location of the record within the file therefore is not a factor in the access time of the record. As such, random files are also known in some literature as direct access files.
• This method is normally used with direct-access media such as magnetic discs and optical disks like compact disks. The hit rate is very low.
• Random files are suitable for real-time applications such as airline seat reservation, hotel booking, ATMs, inventory control and theatre ticketing.
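The mapping from record key to address described above can be sketched with a simple hashing function. Everything specific here is an assumption for illustration: the file is modelled as 11 slots in memory, the mapping function is the key modulo the number of slots, and collisions (two keys mapping to the same address) are resolved by trying the next slot, which is only one of several possible strategies.

```python
NUM_SLOTS = 11               # assumed number of record positions in the file
slots = [None] * NUM_SLOTS   # each slot holds (key, data) or None

def address(key):
    """Mapping function: record key -> home address within the file."""
    return key % NUM_SLOTS

def store(key, data):
    """Place the record at its home address, or the next free slot."""
    start = address(key)
    for step in range(NUM_SLOTS):
        pos = (start + step) % NUM_SLOTS
        if slots[pos] is None or slots[pos][0] == key:
            slots[pos] = (key, data)
            return pos
    raise RuntimeError("file is full")

def fetch(key):
    """Go straight to the home address; probe onward only after a collision."""
    start = address(key)
    for step in range(NUM_SLOTS):
        pos = (start + step) % NUM_SLOTS
        if slots[pos] is None:
            return None
        if slots[pos][0] == key:
            return slots[pos][1]
    return None

p1 = store(1234, "seat 12A")  # 1234 % 11 = 2
p2 = store(57, "seat 3C")     # 57 % 11 = 2 as well, so it moves to slot 3
p3 = store(40, "seat 7F")     # 40 % 11 = 7
print(p1, p2, p3, fetch(57), fetch(99))
```

No index is consulted and no other records are read on the way to the target, but the collision between keys 1234 and 57 shows why distributing records uniformly over the storage is listed as a difficulty of this method.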
Advantages / benefits of random files
i. It has lower storage overheads since it does not use index numbers.
ii. File updating and maintenance is easily achieved – update refers to adding, deleting or amending.
iii. Quick retrieval of records
iv. The records can be of different sizes
Limitations of Random Files
i. It is expensive as it uses the magnetic disc.
ii. It is difficult to find a way of uniformly distributing the records within the storage media.
iii. Data may be accidentally erased or over-written unless special precaution is taken.
iv. It may be less efficient in the use of storage space than sequentially organized files.
v. System design around it is complex and costly.
Factors influencing file design (factors to consider when selecting a file organization method)
There are several methods of file organization and each one is suited to a particular task or purpose. Here are the factors to consider before choosing a file organization method:
i. Frequency of update / volatility: This refers to the frequency of adding or deleting records from a file. A file that needs to be updated every now and then needs an organization method that allows easy retrieval of information and ease of updating; an example of such a file is the transaction file. Highly volatile files will require random organization while files of low volatility will require serial or sequential organization.
ii. File activity: This refers to the frequency of using a file. High activity files will require serial or sequential organization while low activity files will require random organization. Different files have different activities; for example, a sort file is used to sort data in sequential order and therefore the sequential method would be appropriate for such a file.
iii. Cost: It is essential that a cost-benefit analysis be conducted because different file designs carry different costs.
iv. Storage media: Different file designs use different storage media, e.g. serial and sequential files use magnetic tapes while index-sequential and random files use magnetic disc.
v. File access method: Different files are accessed by different methods; for example, a reference file is accessed using the random method for easy retrieval of data.
vi. Nature of the system: Files that are used in a particular system will depend on the nature of the system i.e. the suitable organization method for that particular system.
vii. Area of application: Different file designs are applicable in different areas, e.g. serial and sequential files are applicable in batch processing, index-sequential files in both batch and online processing, and random files in real-time processing.
viii. Master file medium: The master file is the main file for keeping permanent updates of records from transaction files and other sources. The medium on which it is updated will determine the organization method to be used.
ix. Response time: This refers to the speed of access. When a fast response is required, random and index-sequential files are ideal.
x. Expected file size and anticipated growth pattern: If a file is large and the anticipated growth rate is high, then random organization is preferred; otherwise serial and sequential files are ideal.