File "type"

Larry Wall lwall at jpl-devvax.JPL.NASA.GOV
Thu Sep 27 04:58:48 AEST 1990


In article <12141 at chaph.usc.edu> jeenglis at girtab.usc.edu (Joe English Muffin) writes:
: Not all versions of 'file' use a separate database; I
: believe the 4.2BSD 'file' has it hardcoded. (Not to
: mention the fact that not all Unices have on-line
: man pages, and not all sites make the hard-copy versions 
: easy to get to, but that's another gripe :-)
: 
: To answer the original question, 'file' first does a
: stat() to determine if the file is an executable,
: setuid, symbolic link, etc.  Then it reads in the
: first N characters of the file and checks it against a
: predefined set of patterns.  Many of the patterns are
: just ``magic numbers''; for example, under SunOS the
: file types "mc68020 demand paged dynamically linked
: executable" and "shell script" are determined from the
: first two bytes of the file.
: 
: Some of the other patterns it looks for are a little
: more complicated; for example, a period at the
: beginning of the line indicates "[nt]roff, tbl, or eqn
: input" (which is why it tends to think makefiles are
: for troff so often.)  Certain patterns of punctuation
: and capitalization (not too sure what they are)
: distinguish "English text" from "ascii text."
: 
: If none of the patterns match, it looks for
: non-printable characters; if there are any it will
: report "data", otherwise "ascii text."

Nice summary.
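
For readers who want the sequence in the quoted summary spelled out, here is
a rough sketch in Python. It is not the source of any real "file" program.
The names classify and MAGIC, the 512-byte read, the one-entry magic table,
and the returned labels are all assumptions made for illustration. Only the
order of the checks (stat first, then patterns, then the non-printable test)
comes from the description above.

```python
import os
import stat

# Hypothetical one-entry magic table; a real 'file' knows many more patterns.
MAGIC = [
    (b"#!", "script"),
]

# Bytes treated as printable: visible ASCII plus common whitespace controls.
PRINTABLE = set(range(0x20, 0x7F)) | {0x08, 0x09, 0x0A, 0x0C, 0x0D}


def classify(path, nbytes=512):
    """Guess a file type using the order of checks described in the post."""
    # Step 1: stat the file without following symbolic links.
    st = os.lstat(path)
    if stat.S_ISLNK(st.st_mode):
        return "symbolic link"
    if stat.S_ISDIR(st.st_mode):
        return "directory"
    if not stat.S_ISREG(st.st_mode):
        return "special file"

    # Step 2: read the first N bytes and compare against known prefixes.
    with open(path, "rb") as f:
        head = f.read(nbytes)
    if not head:
        return "empty"
    for prefix, name in MAGIC:
        if head.startswith(prefix):
            return name

    # Step 3: a looser pattern, a period at the beginning of any line.
    if any(line.startswith(b".") for line in head.splitlines()):
        return "troff input"

    # Step 4: fall back on the non-printable character test.
    if any(byte not in PRINTABLE for byte in head):
        return "data"
    return "ascii text"
```

Because the period test runs before the fallback, any file with a line
starting with "." is labeled troff input, which is the same kind of
misfire the summary mentions for makefiles.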

The main problem with using "file" is that it might induce bitrot when "file"
mutates out from under you.  Just because "file" reports "ascii text"
today is no guarantee that it won't report "D-News history file" sometime
next year.  :-)

: >There are many file types that editors will like besides files reported
: >by 'file' as text.  For example shell scripts are usually reported as
: >such and not as text.  So the result of 'file' isn't what I think that
: >you want.  Also, some text editors can edit any file, including
: >executable files.
: 
: This is true.  Your best bet is to write a simple C
: program that reads in the first block of the file and
: checks for non-printing characters and possibly for
: lines that are too long as well. 

Why write another one?  I've already got one you can use.  :-)

	perl -e 'print "text" if -T shift' filename

If you really do want a "simple" C program, rip out the routine that Perl
uses, do_fttext().  (But be advised that "simple" programs are just about
as hard to maintain across multiple architectures as complicated ones.
You get a lot of leverage by installing something like Perl across all
your architectures.  End of sermon.)
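
As an illustration of the kind of test discussed here (read the first block,
count the odd characters), a minimal sketch in Python follows. It is not
a port of do_fttext(). The name looks_like_text, the 512-byte block, the set
of tolerated control characters, and the 30% cutoff are assumptions chosen
for the example, as is treating an empty file as text.

```python
def looks_like_text(path, block_size=512, max_odd_fraction=0.30):
    """Heuristically decide whether the first block of a file is text."""
    with open(path, "rb") as f:
        block = f.read(block_size)

    # Assumption: an empty file is treated as text.
    if not block:
        return True

    # A NUL byte in the examined block is taken as a sign of binary data.
    if b"\x00" in block:
        return False

    # Control characters that commonly appear in ordinary text files.
    ok_controls = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}

    # "Odd" bytes: DEL, anything with the high bit set, and other controls.
    odd = sum(
        1
        for byte in block
        if byte > 0x7E or (byte < 0x20 and byte not in ok_controls)
    )
    return odd / len(block) <= max_odd_fraction
```

A caller would use it much like the one-liner above, for example
print("text") if looks_like_text(filename). Making the cutoff a parameter
keeps the guess adjustable, since any fixed threshold will misjudge some
files.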

Larry Wall
lwall at jpl-devvax.jpl.nasa.gov


