File "type"

Joe English Muffin jeenglis at girtab.usc.edu
Sun Sep 23 19:24:04 AEST 1990


robertb at cs.washington.edu (Robert Bedichek) writes:
>In article <171 at alchemy.UUCP> bbs at alchemy.UUCP (BBS Administration) writes:
>>
>>	Could someone explain how the command "file" works? Specifically, I am
>>writing a program that allows users to navigate their $HOME directory and
><text deleted>

>I suggest that you read the man page for 'file'.  Also, read the file
>that the man page specifies as the database that 'file' uses.

Not all versions of 'file' use a separate database; I
believe the 4.2BSD 'file' has it hardcoded. (Not to
mention the fact that not all Unices have on-line
man pages, and not all sites make the hard-copy versions 
easy to get to, but that's another gripe :-)

To answer the original question, 'file' first does a
stat() (lstat(), where symbolic links exist) to
determine if the file is an executable, setuid,
symbolic link, etc.  Then it reads in the first N
characters of the file and checks them against a
predefined set of patterns.  Many of the patterns are
just ``magic numbers''; for example, under SunOS the
file types "mc68020 demand paged dynamically linked
executable" and "shell script" are determined from the
first two bytes of the file.
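
Roughly, those two steps could be sketched as below.  This
is only an illustration, not the source of any real 'file':
the function names are made up, the only magic number
checked is "#!", and the mode check stops at the first thing
it recognizes, where a real 'file' would keep going.
Spotting symbolic links would need lstat() and is left out.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Step 1 (hypothetical helper): look at the mode bits from stat(). */
const char *describe_mode(mode_t mode)
{
    if (S_ISDIR(mode))
        return "directory";
    if (S_ISREG(mode) && (mode & S_ISUID))
        return "setuid";        /* simplification: we stop here */
    return NULL;                /* nothing special; look at the contents */
}

/* Step 2 (hypothetical helper): compare the first bytes against
   known magic numbers.  Only "#!" is checked in this sketch. */
const char *guess_by_magic(const unsigned char *buf, size_t len)
{
    if (len >= 2 && buf[0] == '#' && buf[1] == '!')
        return "shell script";
    return NULL;                /* no magic number matched */
}

/* Run both steps on a path.  Returns NULL if neither step
   recognizes the file. */
const char *classify(const char *path)
{
    struct stat st;
    unsigned char buf[2];
    size_t n;
    const char *type;
    FILE *fp;

    if (stat(path, &st) != 0)
        return "cannot stat";
    if ((type = describe_mode(st.st_mode)) != NULL)
        return type;
    if ((fp = fopen(path, "rb")) == NULL)
        return "cannot open";
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return guess_by_magic(buf, n);
}
```

A caller would treat a NULL return from classify() as "no
quick answer" and move on to the slower pattern checks.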

Some of the other patterns it looks for are a little
more complicated; for example, a period at the
beginning of a line indicates "[nt]roff, tbl, or eqn
input" (which is why it so often thinks makefiles are
troff input).  Certain patterns of punctuation
and capitalization (I'm not sure exactly which)
distinguish "English text" from "ascii text."
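
To see why makefiles get caught, here is a guess at what
the period test might look like.  I am assuming it fires on
any line in the sampled block that starts with a period;
the real test may be stricter, and looks_like_roff is a
made-up name.

```c
#include <stddef.h>

/* Return 1 if any line in buf[0..len) starts with a period.
   Assumption: this is a stand-in for the real heuristic. */
int looks_like_roff(const char *buf, size_t len)
{
    size_t i;
    int at_line_start = 1;      /* the buffer begins at a line start */

    for (i = 0; i < len; i++) {
        if (at_line_start && buf[i] == '.')
            return 1;
        at_line_start = (buf[i] == '\n');
    }
    return 0;
}
```

A makefile with a ".SUFFIXES:" line or a ".c.o:" rule
passes this test just as well as a troff document that
begins with ".TH" or ".PP".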

If none of the patterns match, it looks for
non-printable characters; if there are any it will
report "data", otherwise "ascii text."

>There are many file types that editors will like besides files reported
>by 'file' as text.  For example shell scripts are usually reported as
>such and not as text.  So the result of 'file' isn't what I think that
>you want.  Also, some text editors can edit any file, including
>executable files.

This is true.  Your best bet is to write a simple C
program that reads in the first block of the file and
checks for non-printing characters and possibly for
lines that are too long as well. 
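
Something along these lines might do as a starting point.
The block size, the line-length limit, and the function
names are all my own assumptions; adjust them to whatever
your editor can actually cope with.

```c
#include <ctype.h>
#include <stdio.h>

#define BLOCK_SIZE 512   /* assumed size of "the first block" */
#define MAX_LINE   256   /* assumed limit for a "too long" line */

/* Return 1 if buf[0..len) holds only printable characters,
   tabs and newlines, with no line longer than MAX_LINE. */
int looks_like_text(const unsigned char *buf, size_t len)
{
    size_t i, line_len = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c == '\n') {
            line_len = 0;
            continue;
        }
        if (!isprint(c) && c != '\t')
            return 0;           /* non-printing character */
        if (++line_len > MAX_LINE)
            return 0;           /* line too long */
    }
    return 1;
}

/* Check the first block of a file.
   Returns 1 for text, 0 for non-text, -1 if it cannot be opened. */
int file_looks_like_text(const char *path)
{
    unsigned char buf[BLOCK_SIZE];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf, fp);
    fclose(fp);
    return looks_like_text(buf, n);
}
```

Note that an empty file counts as text here, and that a
shell script passes, which is what you want if the goal is
"can the user edit this?" rather than "what is this?".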

--Joe English

  jeenglis at alcor.usc.edu


