Text File Schema


Some computer files exist for the sole purpose of holding related data for computer consumption (called hereafter a "Data File"). A Data File can contain multiple lines of human-readable text, with each line made up of multiple fields and each field containing one piece of data (called hereafter a "Text File"). These lines of data are referred to as records. A sales order is an example of the kind of data such a file might hold.

One type of Text File is called a Flat File. Two sub-types of Flat File are CSV (Comma Separated Values) files and Tab Delimited files, which use a comma or a tab, respectively, to separate the fields in each record. A third sub-type is the Fixed-width file, where no separator is used; instead, each field's starting point and width are defined. For example, the "Name" field might start at position 45 and be 32 characters wide. Any of these three file types may contain a special record at the very top, called a Header, that holds a name for each field.
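
As a minimal sketch of the three Flat File sub-types described above, the Python below reads the same two records from CSV, Tab Delimited, and Fixed-width text. The field names, sample values, and the fixed-width layout (zero-based start positions) are assumptions made up for illustration, as are the helper names read_delimited and read_fixed_width. The Fixed-width sample has no Header, so its field names come from the layout.

```python
import csv
import io

# Hypothetical sample data; field names and values are invented for illustration.
CSV_TEXT = "OrderId,Name,Quantity\n1001,Widget,3\n1002,Gadget,12\n"
TAB_TEXT = "OrderId\tName\tQuantity\n1001\tWidget\t3\n1002\tGadget\t12\n"

# Assumed fixed-width layout: (field name, zero-based start, width).
FIXED_LAYOUT = [("OrderId", 0, 6), ("Name", 6, 10), ("Quantity", 16, 4)]
FIXED_TEXT = (
    "1001  " "Widget    " "   3\n"
    "1002  " "Gadget    " "  12\n"
)


def read_delimited(text, delimiter):
    """Read CSV or Tab Delimited text, taking field names from the Header record."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text), delimiter=delimiter)]


def read_fixed_width(text, layout):
    """Read Fixed-width text by slicing each line at the positions in the layout."""
    records = []
    for line in text.splitlines():
        records.append(
            {name: line[start:start + width].strip() for name, start, width in layout}
        )
    return records


if __name__ == "__main__":
    for record in read_fixed_width(FIXED_TEXT, FIXED_LAYOUT):
        print(record)
```

All three readers return the same list of records here, and every value comes back as a text string, including the quantities.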

The term Flat File refers to a Text File in which all records have the same purpose. The source data has been "flattened" so that the file looks as if all the fields came from the same source data table. Another type of Text File is a Tagged File. It differs from the previous examples in that it has different records for different purposes. An example would be one set of records that represents the shipping address and another set of records that represents the items to be shipped. Each record is tagged in some way to identify its purpose.
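
To illustrate the Tagged File idea, here is a small Python sketch that groups records by their tag. The tag names (ADDR, ITEM), the pipe delimiter, and the convention that the tag is the first field are all assumptions chosen for this example; real Tagged Files may tag their records in other ways.

```python
from collections import defaultdict

# Hypothetical Tagged File: the first field of each record is its tag.
TAGGED_TEXT = (
    "ADDR|Jane Doe|12 Main St|Springfield\n"
    "ITEM|Widget|3\n"
    "ITEM|Gadget|12\n"
)


def group_by_tag(text, delimiter="|"):
    """Return a dict mapping each tag to the list of records carrying that tag."""
    groups = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        tag, *fields = line.split(delimiter)
        groups[tag].append(fields)
    return dict(groups)


if __name__ == "__main__":
    for tag, records in group_by_tag(TAGGED_TEXT).items():
        print(tag, records)
```

Note that the ADDR record and the ITEM records have different numbers of fields, which is exactly what distinguishes this file from a Flat File.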

In all of these cases, the Text File holds its data in a fairly raw form as plain text, unlike data stored in a relational database or a Binary File. This makes Text Files easy to transport and easy to edit, but very hard to understand if you aren't the author. The file itself carries no information about what it is for or what the field data types are, and very little information about what the data means. There is just the raw data represented as text strings (plus maybe a Header if you're lucky).

What is needed, then, is a way to accurately describe the data in one of these Text Files without having to modify the Text File itself. This is done with an additional Text File that defines the data in any of the file types mentioned above. This secondary Text File exists alongside the primary Text File and can be shared along with it when needed. It is called a Text File Schema Definition.
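
The general idea of describing data from outside the file can be sketched in Python as follows. This is not the Text File Schema Definition format itself, which is described elsewhere; a plain Python list stands in for it here, and the field names, types, sample data, and the helper apply_descriptions are all hypothetical.

```python
import csv
import io
from datetime import date

# The primary Text File, left unmodified (hypothetical sample data, no Header).
DATA_TEXT = "1001,Widget,3,2021-03-04\n1002,Gadget,12,2021-03-05\n"

# A stand-in for a separate description of the fields. This is NOT the
# Text File Schema Definition format; the names and types are assumptions.
FIELD_DESCRIPTIONS = [
    ("OrderId", int),
    ("Name", str),
    ("Quantity", int),
    ("OrderDate", date.fromisoformat),
]


def apply_descriptions(text, descriptions):
    """Name and convert each field; raise ValueError if a record does not fit."""
    records = []
    for line_number, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != len(descriptions):
            raise ValueError(
                f"line {line_number}: expected {len(descriptions)} fields, got {len(row)}"
            )
        records.append(
            {name: convert(value) for (name, convert), value in zip(descriptions, row)}
        )
    return records


if __name__ == "__main__":
    for record in apply_descriptions(DATA_TEXT, FIELD_DESCRIPTIONS):
        print(record)
```

With the description held separately, the same data file can be both understood (names and types) and validated (a quantity such as "three" fails conversion) without changing a single line of it.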


The core objective of this website is to describe the Text File Schema Definition so that it can be used to understand existing Schema Definition files, validate the data in a Text File, or create new Schema Definition files.