Files, folders and directories

Introduction

On computers, information is stored on magnetic disks, CD-ROMs, flash memory, etc. in the form of files. Each file contains information pertaining to a particular task. Some files contain instructions for a computer to perform certain actions (your favorite word processor is actually scattered over many different files, each called upon depending on the button you click). Other files contain information about a document you wrote. Such files often contain not only the textual information but also indications as to which fonts to use, which words are in bold, etc.

Files are identified by their names. What constitutes a legal filename depends on the operating system your computer is running, such as Windows ME, Windows XP, or one of the various UNIX flavors (Macintosh, Linux). Matters are made more complicated under some operating systems (the Microsoft Windows flavors in particular), which often hide part of the filename. I will come back to that later when discussing some of the different operating systems in detail.

Modern computers have tens or even hundreds of thousands of different files on their storage media. In order to avoid the trouble with duplicate filenames and to bring some order into the chaos, there are some special files called directory files (or directories for short). Under Microsoft operating systems these files are called folders, giving you a totally wrong impression as to what is going on. In reality, these folders are files containing the names of other files (including other directory files) and information about them (like size and dates) in a fashion similar to a phone book. As with a phone book, you can add, delete or rename entries. Of course, inside a normal phone book you don't find names referring to another phone book, but inside a directory file you do. We refer to these then as sub-directories and to the directory they are listed in as the parent directory. This gives rise to what is called a filestructure as shown in the figure below. It starts with a root-directory, indicated by a single forward slash at the top, which lists a number of file names and the names of sub-directories of the 1st level. Each such sub-directory in turn may list the names of files and sub-directories of the 2nd level, and so on.

[Figure: Directory Tree]

In the figure above, red denotes the name of a directory and black is a filename. An individual file is now not only characterized by its name but also by the sequence of directories it is contained in, which is referred to as the path of the file. In the above figure there is a file :

/edg100/sketch3/sk.exe

and a file :

/windows/sk.exe

Both files may contain totally different things although their names (sk.exe) are identical.
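To make the phone book picture a bit more concrete, here is a minimal sketch in Python that models each directory as a dictionary listing names. The file contents and the helper name lookup are made up for illustration; a real operating system stores directories quite differently.

```python
# A toy model of a directory tree: each directory is a dict that maps
# names to either another dict (a sub-directory) or a string (file contents).
# The contents strings are invented for illustration only.
root = {
    "edg100": {
        "sketch3": {
            "sk.exe": "sketch program",
        },
    },
    "windows": {
        "sk.exe": "something else entirely",
    },
}


def lookup(tree, path):
    """Follow a path such as '/windows/sk.exe' starting at the root directory."""
    node = tree
    # The leading slash names the root; the remaining slashes separate names.
    for name in [part for part in path.split("/") if part]:
        node = node[name]  # raises KeyError if the name is not listed
    return node


print(lookup(root, "/edg100/sketch3/sk.exe"))
print(lookup(root, "/windows/sk.exe"))
```

The two lookups end at different entries even though the last name in both paths is sk.exe, which is exactly why the path, and not just the name, identifies a file.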

Note how the first slash is used to indicate the root-directory and subsequent slashes separate the names of directories. Also, in the Microsoft world the backward slash (\) is used instead of the forward slash (/) that is used under UNIX and on Macintoshes (which are UNIX-based).
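If you have Python at hand, its standard pathlib module can take such paths apart without touching any real disk. The following is only a small sketch; the variable names are mine.

```python
from pathlib import PurePosixPath, PureWindowsPath

# UNIX style: forward slashes, the first one standing for the root-directory.
unix_path = PurePosixPath("/edg100/sketch3/sk.exe")
print(unix_path.parts)   # ('/', 'edg100', 'sketch3', 'sk.exe')
print(unix_path.name)    # sk.exe
print(unix_path.parent)  # /edg100/sketch3

# Microsoft style: the same idea, written with backward slashes.
win_path = PureWindowsPath(r"\windows\sk.exe")
print(win_path.name)        # sk.exe
print(win_path.as_posix())  # /windows/sk.exe
```

The "Pure" classes only manipulate the text of a path, so the example runs the same on any operating system.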

Files etc. in the UNIX world

Currently available UNIX operating systems run on a variety of different computers, from personal computers to mainframes. Many UNIX flavors are commercial in nature while others (like Linux, the operating system on Mont Alto's webserver) are free and open source and are improved and worked upon by thousands of computer enthusiasts from all over the world. Despite a huge variety of flavors, the fundamentals, including file and directory names and directory trees, have remained unchanged over the last 35 years and are identical on all UNIX flavors.

One hallmark of all UNIX systems is that a single computer is host to only a single directory tree as shown below.

[Figure: UNIX directory tree]

Usually large sections of this tree will reside on different hard disks or on different partitions of the same hard disk. It is also possible to extend a directory tree with directories and files residing on a second (or third etc.) computer, provided the owner of the other computer(s) gives you permission to do so. The nice thing about this arrangement is that the casual user never needs to know which files reside on which disk or computer.

As far as file and directory names in the UNIX world are concerned any string of characters consisting of :

letters a-z , letters A-Z , digits 0-9, period , minus sign, underscore

of any length (well, there is a limit somewhere, but I have yet to run into it) is totally legal. Each of these legal characters can appear multiple times or not at all, and a lower case letter is distinct from its upper case counterpart. You therefore have 2*26 + 10 + 3 = 65 different characters at your disposal. Blanks (or spaces) are legal but somewhat awkward to work with and are best avoided. Actually, the above rules are conservatively stated: some of the characters not mentioned (like the ~) and the blank can be used from the UNIX point of view, but may cause problems if your files are to be accessed by particular programs, for example web browsers.
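The conservative rule above is easy to check mechanically. Here is a minimal sketch in Python; the names SAFE_CHARS and is_conservative_name are my own, and rejecting the two special names "." and ".." (which UNIX reserves for the current and the parent directory) is an extra check I added, not part of the rule as stated.

```python
import re
import string

# The 65 characters of the conservative rule: a-z, A-Z, 0-9, period,
# underscore and minus sign.
SAFE_CHARS = string.ascii_lowercase + string.ascii_uppercase + string.digits + "._-"
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_conservative_name(name):
    """Return True if name uses only the conservative character set."""
    # Extra check (my assumption): "." and ".." are reserved directory names.
    if name in (".", ".."):
        return False
    return SAFE_NAME.fullmatch(name) is not None


print(len(SAFE_CHARS))                        # 65
print(is_conservative_name("my_old_resume"))  # True
print(is_conservative_name("my old resume"))  # False
```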

If you like to utilize filenames consisting of multiple words I strongly urge you to use something like :

MyOldResume , my_old_resume , my.old.resume
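If you have a whole batch of multi-word names to convert, a few lines of Python can do it. This is just a sketch; the function name join_words and its style keywords are hypothetical.

```python
def join_words(phrase, style="underscore"):
    """Turn a phrase with blanks into a single-word filename."""
    words = phrase.split()
    if style == "camel":
        # Capitalize the first letter of every word and glue them together.
        return "".join(w[:1].upper() + w[1:] for w in words)
    if style == "underscore":
        return "_".join(w.lower() for w in words)
    if style == "period":
        return ".".join(w.lower() for w in words)
    raise ValueError(f"unknown style: {style}")


print(join_words("my old resume", "camel"))       # MyOldResume
print(join_words("my old resume", "underscore"))  # my_old_resume
print(join_words("my old resume", "period"))      # my.old.resume
```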

Files etc. in the DOS world

Although there is probably not a single computer left which uses the DOS operating system per se, most Microsoft operating systems (like Windows 98, Windows NT, and Windows XP) allow you to step back into this world. One does this occasionally when serious problems occur with the computer. There are also still some programs in wide use (sftp for example) which adhere to the DOS filename convention. In that sense the DOS conventions are the lowest common denominator of all Windows-based operating systems and programs. In the DOS world no distinction is made between lower and upper case letters (individual programs and passwords may be an exception).

There are some restrictions placed on filenames and extensions.

Use only the characters :  a-z , 0-9 , underscore , minus-sign
There are some more legal characters but I discourage their use.
Each filename consists of at most 8 of these characters followed by a period followed by at most 3 more characters. These last three characters are often referred to as the extension, and particular programs produce files with particular extensions. Files containing text-based web material have, for example, the extension .htm (the four-character .html does not fit within the DOS limit), and files containing pictures for the web have extensions of either .gif or .jpg .
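The restrictions above can be written down as a pattern. The following Python sketch checks a name against them; the function name is_dos_name is made up, and I am assuming that the period and extension may be left off altogether.

```python
import re

# At most 8 characters from a-z, 0-9, underscore and minus sign,
# optionally followed by a period and at most 3 more such characters.
# IGNORECASE reflects that DOS does not distinguish upper and lower case.
DOS_NAME = re.compile(r"[a-z0-9_-]{1,8}(\.[a-z0-9_-]{1,3})?", re.IGNORECASE)


def is_dos_name(name):
    """Return True if name fits the conservative DOS rules described above."""
    return DOS_NAME.fullmatch(name) is not None


print(is_dos_name("index.htm"))          # True
print(is_dos_name("index.html"))         # False: extension has 4 characters
print(is_dos_name("my_old_resume.txt"))  # False: more than 8 characters
```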

Files are organized under DOS in the same fashion as under UNIX, with the notable exception that a separate filestructure exists on each storage medium. A file is addressed by the name of the disk (a single letter followed by a colon, like A: or B: (both for floppies) and C: etc. for hard disks or CD-ROMs), followed by the path, and then followed by the filename including its extension.
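Python's pathlib module understands this way of addressing files, so it can serve as a quick illustration. The path C:\windows\sk.exe below is only an example and the variable names are mine.

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r"C:\windows\sk.exe")
print(p.drive)   # C:
print(p.parent)  # C:\windows
print(p.name)    # sk.exe
print(p.suffix)  # .exe

# The same path on another disk is a different file ...
print(p == PureWindowsPath(r"A:\windows\sk.exe"))  # False
# ... while upper and lower case make no difference.
print(p == PureWindowsPath(r"C:\WINDOWS\SK.EXE"))  # True
```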

Filenames etc. in other worlds

The term other worlds refers here to various Microsoft Windows versions, in particular Windows 95 and up. The filestructure these operating systems employ is very similar to that of the DOS world, with the exception that no restrictions are placed (at least the casual user does not run into them) on the number of characters making up a filename and its extension. Also, upper and lower case letters are preserved as you typed them (although, as under DOS, two names differing only in case still refer to the same file), and the blank (or space) is a legal character, which allows you to have a filename that looks like a whole sentence describing the content of the file. Note that in many cases the extension of a filename (the stuff behind the last period) is not displayed unless you know how to make that happen.
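Since the extension is "the stuff behind the last period", a long name with several periods still has only one extension. A small Python sketch, using a filename I made up:

```python
from pathlib import PureWindowsPath

name = PureWindowsPath("My old resume.final.doc")  # hypothetical filename
print(name.suffix)  # .doc  (the extension, which Windows often hides)
print(name.stem)    # My old resume.final  (what you would see instead)
```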

Filenames etc. for the purpose of the WWW

Here are a few rules I recommend following when it comes to choosing filenames and names for directories (folders) for the WWW. They take browser and operating system compatibility into consideration and might at times be more restrictive than necessary, but they are safe.
Names for files and directories :
  1. are case-sensitive : "second.html" and "SeconD.html" are two different names under this rule, referring to two different files.
  2. should not contain blanks (spaces) ; although most operating systems do not mind, browsers usually do.
  3. should not contain special characters, except maybe the underscore _ , the minus sign - , and isolated periods . ; stick with alphabetical and numerical characters.
    Hence, "second.third_item-5.html" is a legal filename.
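The three rules can be combined into a single check. Below is a minimal Python sketch; the function name is_web_safe is my own, and I am reading "isolated periods" as meaning no two periods in a row and no period at the very beginning or end, which is an assumption on my part.

```python
import re

# Letters, digits, underscore and minus sign, with single periods allowed
# only between such groups (assumption: no leading, trailing or doubled periods).
WEB_NAME = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9_-]+)*")


def is_web_safe(name):
    """Return True if name follows the conservative rules for the WWW."""
    return WEB_NAME.fullmatch(name) is not None


print(is_web_safe("second.third_item-5.html"))  # True
print(is_web_safe("my page.html"))              # False: contains a blank
print(is_web_safe("second..html"))              # False: two periods in a row
print("second.html" == "SeconD.html")           # False: case matters
```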



Zig Herzog © 2014
hgnherzog@yahoo.com
Last revised: 08/23/13