magic(4)
NAME
magic - magic file interface
SYNOPSIS
#include <magic.h>
Magic_t
{
unsigned long flags;
};
Magic_t* magicopen(unsigned long flags);
void magicclose(Magic_t* magic);
int magicload(Magic_t* magic, const char* path, unsigned long flags);
int magiclist(Magic_t* magic, Sfio_t* sp);
char* magictype(Magic_t* magic, const char* path, struct stat* st);
DESCRIPTION
These routines provide an interface to the file(1) command magic file.
magicopen returns a magic session handle that is passed to all of the
other routines. flags may be
MAGIC_MIME
Return the MIME type string rather than the magic file descrip-
tion.
MAGIC_PHYSICAL
Don't follow symbolic links.
MAGIC_STAT
The stat structure st passed to magictype will contain valid
stat(2) information. See magictype below.
MAGIC_VERBOSE
Enable verbose error messages.
magicclose closes the magic session.
magicload loads the magic file named by path into the magic session.
flags are the same as with magicopen. More than one magic file can be
loaded into a session; the files are searched in load order. If path
is 0 then the default magic file is loaded.
magiclist lists the magic file contents on the sfio(3) stream sp. This
is used for debugging magic entries.
magictype returns the type string for path with optional stat(2) infor-
mation st. If st == 0 then magictype calls stat on a private stat
buffer, else if magicopen was called with the MAGIC_STAT flag then st
is assumed to contain valid stat information, otherwise magictype calls
stat on st. magictype always returns a non-null string. If errors are
encountered on path then the return value will contain information on
those errors, e.g., cannot stat.
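The stat buffer selection described above can be sketched in plain C. This is only an illustration of the three cases, not the library source: sketch_pick_stat and SKETCH_STAT are hypothetical names standing in for the internal logic and for MAGIC_STAT, and the sketch returns NULL on failure where magictype would instead return a string describing the error. MAGIC_PHYSICAL handling is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical stand-in for MAGIC_STAT; the real value lives in <magic.h>. */
#define SKETCH_STAT 0x1UL

/* Return the stat buffer to use for path, or NULL if stat(2) fails. */
struct stat *sketch_pick_stat(const char *path, struct stat *st,
                              unsigned long session_flags)
{
    static struct stat private_st;

    assert(path != NULL);
    if (st == NULL)
        st = &private_st;        /* no buffer supplied: stat a private one */
    else if (session_flags & SKETCH_STAT)
        return st;               /* caller promises st is already valid */
    /* Either the private buffer or the caller's buffer is filled in here. */
    return stat(path, st) == 0 ? st : NULL;
}
```

Note that a null st takes precedence over the flag, matching the order in which the cases are listed above.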
FORMAT
The magic file format is a backwards compatible extension of an ancient
System V file implementation. However, with the extended format it is
possible to write a single magic file that works on all platforms.
Most of the net magic files floating around work with magic, but they
usually double up on le and be entries that are automatically handled
by magic.
A magic file entry describes a procedure for determining a single file
type based on the file pathname, stat(2) information, and the file
data. An entry is a sequence of lines, each line being a record of
space separated fields. The general record format is:
[op]offset type [mask]expression description [mimetype]
# in the first column introduces a comment. The first record in an
entry contains no op; the remaining records for an entry contain an op.
Integer constants are as in C: 0x* or 0X* for hexadecimal, 0* for octal
and decimal otherwise.
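The integer constant rule above is the same one strtoul(3) applies when given base 0, so a reader can check how a constant will be interpreted with a few lines of C. parse_constant is a hypothetical helper written for this illustration; it is not part of magic.h and may differ from the actual parser (for example, strtoul also accepts leading white space and a sign).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse an integer constant using C rules.
 * Returns 0 and stores the value in *out on success, -1 otherwise. */
int parse_constant(const char *text, unsigned long *out)
{
    char *end;
    unsigned long value;

    assert(text != NULL && out != NULL);
    errno = 0;
    /* Base 0 selects hexadecimal for 0x/0X, octal for a leading 0,
     * and decimal otherwise. */
    value = strtoul(text, &end, 0);
    if (end == text || *end != '\0' || errno == ERANGE)
        return -1;
    *out = value;
    return 0;
}
```

With this helper the constant 0407 from the EXAMPLES section parses as octal (decimal 263), while 36 parses as decimal.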
The op field may be one of:
+ The previous records must match but the current record is
optional. > is an old-style synonym for +.
& The previous and current records must match.
{ Starts a nesting block that is terminated by }. A nesting block
pushes a new context for the + and & ops. The { and } records
have no other fields.
id{ A function declaration and call for the single character
identifier id. The function returns at the nesting block end
record }. Functions may be redefined. Functions have no
arguments or return value.
id() A call to the function id.
The offset field is either the offset into the data upon which the cur-
rent entry operates or a file metadata identifier. Offsets are either
integer constants or offset expressions. An offset expression is con-
tained in (...) and is a combination of integral arithmetic operators
and the @ indirection operator. Indirections take the form @integer
where integer is the data offset for the indirection value. The size of
the indirection value is taken either from one of the suffixes
B(byte,1char), H(short,2chars), L(long,4chars), or Q(quad,8chars), or
from the type field. Valid file metadata identifiers are:
atime The string representation of stat.st_atime.
blocks stat.st_blocks.
ctime The string representation of stat.st_ctime.
fstype The string representation of stat.st_fstype.
gid The string representation of stat.st_gid.
mode The stat.st_mode file mode bits in modecanon(3) canonical
representation (i.e., the good old octal values).
mtime The string representation of stat.st_mtime.
nlink stat.st_nlink.
size stat.st_size.
name The file path name sans directory.
uid The string representation of stat.st_uid.
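The indirection described before the identifier list amounts to reading a 1, 2, 4 or 8 byte integer out of the file data and using it in the offset computation. A minimal sketch of that read follows. sketch_fetch is a hypothetical helper, not a magic.h routine, and the explicit big_endian argument is an assumption made to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fetch a width-byte unsigned integer (1, 2, 4 or 8,
 * matching the B, H, L and Q suffixes) stored at data[offset].
 * Returns 0 on success, -1 on a bad width or a read past the buffer. */
int sketch_fetch(const unsigned char *data, size_t len, size_t offset,
                 size_t width, int big_endian, unsigned long long *out)
{
    unsigned long long value = 0;
    size_t i;

    assert(data != NULL && out != NULL);
    if (width != 1 && width != 2 && width != 4 && width != 8)
        return -1;
    if (offset > len || width > len - offset)
        return -1;
    for (i = 0; i < width; i++) {
        /* Visit bytes from most significant to least significant. */
        size_t pos = big_endian ? i : width - 1 - i;
        value = (value << 8) | data[offset + pos];
    }
    *out = value;
    return 0;
}
```

An indirect lookup is then two calls: fetch the value stored at the given data offset, and use the result as the offset for the next fetch.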
The type field specifies the type of the data at offset. Integral
types may be prefixed by le or be for specifying exact little-endian or
big-endian representation, but the internal algorithm automatically
loops through the standard representations to find integral matches, so
representation prefixes are rarely used. However, this looping may
cause some magic entry conflicts; use the le or be prefix in these
cases. Only one representation is used for all the records in an
entry. Valid types are:
byte A 1 byte integer.
short A 2 byte integer.
long A 4 byte integer.
quad An 8 byte integer. Tests on this type may fail if the local
compiler does not support an 8 byte integral type and the
corresponding value overflows 4 bytes.
date The data at offset is interpreted as a 4 byte seconds-since-the-
epoch date and converted to a string.
edit The expression field is an ed(1) style substitution expression
del old del new del [ flags ] where the substituted value is
made available to the description field %s format. In addition
to the flags supported by ed(1), l converts the substituted
value to lower case and u converts the substituted value to
upper case. If old does not match the string data at
offset then the entry record fails.
match The expression field is a strmatch(3) pattern that is matched
against the string data at offset.
string The expression field is a string that is compared with the
string data at offset.
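The automatic looping over integer representations mentioned before the type list can be pictured as follows. This is a sketch under stated assumptions, not the library algorithm: the names are hypothetical, only the 4 byte long type is handled, and the order in which representations are tried (little-endian first here) is a guess.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_order { SKETCH_LE, SKETCH_BE, SKETCH_NONE };

/* Decode 4 bytes at p in the given byte order. */
static unsigned long sketch_read4(const unsigned char *p, enum sketch_order order)
{
    if (order == SKETCH_BE)
        return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
               ((unsigned long)p[2] << 8) | (unsigned long)p[3];
    return ((unsigned long)p[3] << 24) | ((unsigned long)p[2] << 16) |
           ((unsigned long)p[1] << 8) | (unsigned long)p[0];
}

/* Return the first byte order under which the 4 bytes at data equal want,
 * or SKETCH_NONE if neither matches. */
enum sketch_order sketch_find_order(const unsigned char *data, unsigned long want)
{
    assert(data != NULL);
    if (sketch_read4(data, SKETCH_LE) == want)
        return SKETCH_LE;
    if (sketch_read4(data, SKETCH_BE) == want)
        return SKETCH_BE;
    return SKETCH_NONE;
}
```

Since only one representation is used for all the records in an entry, the order found for the first record would be reused for the remaining records. A value that happens to match in the wrong byte order is the kind of conflict the le and be prefixes are meant to resolve.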
The optional mask field takes the form &number where number is ANDed
with the integral value at offset before the expression is applied.
The contents of the expression field depend on the type. String type
expressions are described in the type field entries above. * means any
value and applies to all types. Integral type expressions take the form
[operator] operand where operand is compared with the data value at
offset using operator. operator may be one of <, <=, ==, >= or >.
operator defaults to == if omitted. operand may be an integral
constant or one of the following builtin function calls:
magic()
A recursive call to the magic algorithm starting with the data
at offset.
loop(function,offset,increment)
Call function starting at offset and increment offset by incre-
ment after each iteration. Iteration continues until the
description text does not change.
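For integral types with a constant operand, the mask and expression fields described above reduce to an AND followed by one comparison. The sketch below shows that evaluation. sketch_compare is a hypothetical helper, unsigned comparison is an assumption, and only the operators listed above are accepted.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: AND data with mask, then compare with operand.
 * op is one of "<", "<=", "==", ">=", ">" or "" (defaults to ==).
 * Pass ~0UL as mask when the record has no mask field.
 * Returns 1 on match, 0 on no match, -1 for an unrecognized operator. */
int sketch_compare(unsigned long data, unsigned long mask,
                   const char *op, unsigned long operand)
{
    unsigned long value = data & mask;

    assert(op != NULL);
    if (op[0] == '\0' || strcmp(op, "==") == 0)
        return value == operand;
    if (strcmp(op, "<") == 0)
        return value < operand;
    if (strcmp(op, "<=") == 0)
        return value <= operand;
    if (strcmp(op, ">=") == 0)
        return value >= operand;
    if (strcmp(op, ">") == 0)
        return value > operand;
    return -1;
}
```

For example, a mode value of 0100755 masked with 0111 yields 0111, while 0100644 masked with 0111 yields 0.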
The description field is the most important because it is this field
that is presented to the outside world. When constructing description
fields one must be very careful to follow the style laid out in the
magic file, lest yet another layer of inconsistency creep into the
system. The descriptions for each matching record in an entry are
concatenated to form the complete magic type. If the previous matching
description in the current entry does not end with space and the
current description is not empty and does not start with comma, dot or
backspace then a space is placed between the descriptions (most
optional descriptions start with comma). The data value at offset can
be referenced in the description using %s for the string types and %ld
or %lu for the integral types.
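The spacing rule for concatenated descriptions can be written out as a short C function. This is a sketch of the rule as stated above only: sketch_append is a hypothetical name, %s and %ld expansion is not handled, and a leading backspace is copied through unchanged because its further treatment is not described here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: append desc to the NUL-terminated string in buf
 * (capacity size), inserting a space only when the rule calls for one.
 * Returns 0 on success, -1 if the result would not fit. */
int sketch_append(char *buf, size_t size, const char *desc)
{
    size_t used, add, need_space = 0;

    assert(buf != NULL && desc != NULL);
    used = strlen(buf);
    add = strlen(desc);
    /* A space is needed when the previous text does not end with space and
     * the new text is non-empty and does not start with , . or backspace. */
    if (used > 0 && buf[used - 1] != ' ' && add > 0 &&
        desc[0] != ',' && desc[0] != '.' && desc[0] != '\b')
        need_space = 1;
    if (used + need_space + add + 1 > size)
        return -1;
    if (need_space)
        buf[used++] = ' ';
    memcpy(buf + used, desc, add + 1);  /* includes the terminating NUL */
    return 0;
}
```

Appending ", not stripped" to "hp s200 executable, pure" therefore adds no space, which is why most optional descriptions start with comma.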
The mimetype field specifies the MIME type, usually in the form a/b.
FILES
../lib/file/magic located on $PATH
EXAMPLES
0 long 0x020c0108 hp s200 executable, pure
o{
+36 long >0 , not stripped
+4 short >0 , version %ld
}
0 long 0x020c0107 hp s200 executable
o()
0 long 0x020c010b hp s200 executable, demand-load
o()
The function o(), shared by 3 entries, determines if the executable is
stripped and also extracts the version number.
0 long 0407 bsd 386 executable
&mode long &0111!=0
+16 long >0 , not stripped
This entry requires that the file also has execute permission.
SEE ALSO
file(1), mime(4), tw(1), modecanon(3)
MAGIC(3)