sort(C)
sort -- sort and merge files
Syntax
sort [-m] [-bdfiMnru] [-o output] [-k keydef] ... [-t x] [-T tmpdir] [-y [kmem]] [-z recsz] [file ...]
sort -c [-bdfiMnru] [-k keydef] ... [-t x] [-T tmpdir] [-y [kmem]] [-z recsz] [file]
sort [-mu] [-bdfiMnr] [-o output] [-t x] [-T tmpdir] [-y [kmem]] [-z recsz] [+pos1 [-pos2]] ... [file ...]
sort -c [-u] [-bdfiMnr] [-t x] [-T tmpdir] [-y [kmem]] [-z recsz] [+pos1 [-pos2]] ... [file]
Description
sort sorts the lines of all the named files together
and writes the result to the standard output. The standard input is
read if "-" is used as a filename or if no input files are
named.
Comparisons are based on one or more sort keys extracted from each
line of input. A sort key defines a minimal sequence of characters
which are to be used in sorting. By default, there is one sort key,
the entire input line, and ordering is
determined by the collating sequence defined by the locale (see
locale(M)).
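As an illustration, here is a minimal sketch assuming a POSIX-conforming sort (for example, GNU coreutils) run with LC_ALL=C, which makes the collating sequence plain byte order. The input and the variable name default_c are illustrative; under another locale the result may differ.

```sh
# In the C locale, uppercase letters (0x41-0x5A) sort before
# lowercase letters (0x61-0x7A).
default_c=$(printf 'b\nB\na\n' | LC_ALL=C sort)
printf '%s\n' "$default_c"    # B, a, b (one per line)
```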
The following options alter the default behavior:
-c
Check that the input file is sorted according to the ordering
rules. This option produces no output; it only affects the exit
value.
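The following sketch checks the exit value of -c, assuming a POSIX-conforming sort and the C locale. Some implementations print a diagnostic for unsorted input, so standard error is discarded here; the variable names are illustrative.

```sh
# -c writes no sorted output; only the exit status is of interest.
sorted_status=0
printf 'a\nb\n' | LC_ALL=C sort -c 2>/dev/null || sorted_status=$?
unsorted_status=0
printf 'b\na\n' | LC_ALL=C sort -c 2>/dev/null || unsorted_status=$?
echo "sorted: $sorted_status, unsorted: $unsorted_status"   # sorted: 0, unsorted: 1
```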
-m
Merge only; the input files should already be sorted.
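A minimal sketch of a merge, assuming a POSIX-conforming sort and the mktemp utility. One already-sorted input is a temporary file; the other is the standard input, named with "-". File contents and variable names are illustrative.

```sh
merge_file=$(mktemp)
printf 'a\nc\ne\n' > "$merge_file"
# Both inputs are already sorted, so -m only interleaves them.
merged=$(printf 'b\nd\n' | LC_ALL=C sort -m "$merge_file" -)
rm -f "$merge_file"
printf '%s\n' "$merged"   # a, b, c, d, e (one per line)
```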
-o output
The argument output is the name of a file to use instead
of the standard output. This file may be the same as one of the
input files. There may be optional blanks between
-o and output.
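Because output may name one of the input files, -o can sort a file in place, whereas a shell redirection such as sort f > f truncates f before sort reads it. A sketch, assuming a POSIX-conforming sort and mktemp; the file contents and variable names are illustrative.

```sh
sort_file=$(mktemp)
printf 'c\na\nb\n' > "$sort_file"
# Output file and input file are the same file.
LC_ALL=C sort -o "$sort_file" "$sort_file"
in_place=$(cat "$sort_file")
rm -f "$sort_file"
printf '%s\n' "$in_place"   # a, b, c (one per line)
```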
-T tmpdir
tmpdir is the pathname of a directory to be used for
temporary files. By default, sort tries /usr/tmp and
/tmp. If -T is specified, tmpdir
and /tmp are tried. There must be a space between
-T and tmpdir.
-u
Unique: suppress all but one in each set of lines having equal keys.
This option can result in unwanted characters being placed at the end of
the sorted file.
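A minimal sketch, assuming a POSIX-conforming sort and the C locale; the input and variable name are illustrative. With no key options the whole line is the key, so the two identical b lines collapse into one.

```sh
uniq_out=$(printf 'b\na\nb\n' | LC_ALL=C sort -u)
printf '%s\n' "$uniq_out"   # a, b (one per line)
```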
-y [kmem]
The amount of memory used by sort has a large impact on
its performance; for example, sorting a small file in a large amount
of memory is inefficient. If the -y option is omitted,
sort starts with the default memory size
(32KB) and allocates more memory as needed. If
kmem is specified, sort starts with that number
of kilobytes of memory, unless this violates the administrative minimum
(32KB) or maximum (1MB); in that
case, sort uses the corresponding minimum or maximum
value.
If kmem is 0, sort uses the minimum memory
requirement of 16KB.
By convention, specifying -y with no argument uses the
maximum memory requirement of 1MB.
-z recsz
Causes sort to use a buffer size of recsz bytes
for the merge phase. Input lines longer than the buffer size will
cause sort to terminate abnormally. Normally, the size of
the longest line read during the sort phase is recorded and this
maximum is used as the record size during the merge phase,
eliminating the need for the -z option. However, when the
sort phase is omitted (-c or -m options) a
system default buffer size is used, and if this is not large enough,
the -z option should be used to prevent abnormal
termination.
The following options override the default ordering rules.
-d
"Dictionary" order: only letters, digits and blanks (spaces and
tabs) are significant in comparisons. Dictionary order is defined by
the current setting of LC_CTYPE (see
locale(M)).
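A sketch comparing dictionary order with the default order, assuming a POSIX-conforming sort and the C locale (LC_ALL=C); the input lines and variable names are illustrative. With -d the punctuation is ignored, so b,a and ba have equal keys, and their relative order is settled by comparing all bytes.

```sh
dict=$(printf 'b-c\nb,a\nba\n' | LC_ALL=C sort -d)
plain=$(printf 'b-c\nb,a\nba\n' | LC_ALL=C sort)
printf '%s\n' "$dict"    # b,a  ba  b-c   (keys "ba", "ba", "bc")
printf '%s\n' "$plain"   # b,a  b-c  ba   (byte order: ',' < '-' < 'a')
```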
-f
Fold lowercase letters into uppercase. Conversion between lowercase
and uppercase letters is governed by the current setting of
LC_CTYPE (see
locale(M)).
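A sketch of case folding, assuming a POSIX-conforming sort run with LC_ALL=C; the input and variable name are illustrative. Folding gives a and A equal keys; the tie is then broken with all bytes significant, and A (0x41) precedes a (0x61).

```sh
fold=$(printf 'b\nB\na\nA\n' | LC_ALL=C sort -f)
printf '%s\n' "$fold"   # A, a, B, b (one per line)
```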
-i
Ignore non-printable characters in non-numeric
comparisons. Non-printable characters are defined by the current
setting of LC_CTYPE (see
locale(M)).
-M
Compare as months according to the current setting of
LC_TIME (see
locale(M)).
The first month of the year compares lower than the second month, and so
on; for example, in the POSIX locale, "JAN" <
"FEB" < ... < "DEC", and invalid fields
compare lower than "JAN". The -M option implies
the -b option.
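A sketch of month comparison in the POSIX locale, assuming a POSIX-conforming sort run with LC_ALL=C; the input and variable name are illustrative. The line foo is not a month name, so it compares lower than JAN.

```sh
months=$(printf 'MAR\nfoo\nJAN\nDEC\n' | LC_ALL=C sort -M)
printf '%s\n' "$months"   # foo, JAN, MAR, DEC (one per line)
```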
-n
An initial numeric string, consisting of optional blanks, an
optional minus sign, and zero or more digits with optional decimal
point, is sorted by arithmetic value. The -n option
implies the -b option. Note that the -b option
is only effective when restricted sort key specifications are in
effect.
-r
Reverse the sense of comparisons.
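A sketch of numeric and reversed numeric order, assuming a POSIX-conforming sort and the C locale; the input values and variable names are illustrative. Note that " 3.5" carries a leading blank and -2 a minus sign, both of which -n accepts.

```sh
asc=$(printf '%s\n' 10 9 -2 ' 3.5' | LC_ALL=C sort -n)
desc=$(printf '%s\n' 10 9 -2 ' 3.5' | LC_ALL=C sort -rn)
printf '%s\n' "$asc"    # -2, " 3.5", 9, 10
printf '%s\n' "$desc"   # 10, 9, " 3.5", -2
# Without -n, byte order would give " 3.5", -2, 10, 9 instead.
```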
The treatment of field separators can be altered using the options:
-b
Ignore leading blanks when determining the starting and ending
positions of a restricted sort key. If the -b option is
specified before the first sort key argument, it will be applied to
all sort keys.
-t x
Use x as the field separator character; x is not
considered to be part of a field (although it may be included in a
sort key). If x is a space, specified as -t
" ", all spaces (including those at the beginning of a line)
are treated as field separators. Each occurrence of x is
significant (for example, xx delimits an empty field).
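A sketch of an empty field, assuming a POSIX-conforming sort and the C locale; the input and variable name are illustrative. In a::3 the two adjacent colons delimit an empty second field, and an empty key sorts before the key x.

```sh
by_field2=$(printf 'b:x:1\na::3\n' | LC_ALL=C sort -t: -k 2,2)
printf '%s\n' "$by_field2"   # a::3, then b:x:1
```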
When ordering options appear before restricted sort key
specifications, the requested ordering rules are applied globally to
all sort keys. When one or more of the flags b,
d, f, i, n, or r
is attached to a specific sort key (see
"Sort key field definition"),
the specified ordering options override all global ordering options
for that key.
When there are multiple sort keys, later keys are compared only
after all earlier keys compare equal. Lines that otherwise compare
equal are ordered with all bytes significant.
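A sketch of a primary and a secondary key, assuming a POSIX-conforming sort and the C locale; the input and variable name are illustrative. Field 2 is compared numerically first, and field 1 is consulted only for the two lines whose second fields are both 2.

```sh
multi=$(printf 'b 2\na 2\nc 1\n' | LC_ALL=C sort -k 2,2n -k 1,1)
printf '%s\n' "$multi"   # c 1, a 2, b 2 (one per line)
```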
Input files are treated as sequences of records (lines), each of
which contains one or more fields. By default, the first blank
character (space or tab) of a sequence of blank characters acts as
the field separator. Remaining blank characters in the sequence are
treated as part of the field unless the -b option (ignore
leading blanks) is specified. If the -t option is used to
specify a field separating character, all occurrences of that
character are interpreted as separating fields.
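A sketch of the effect of -b on the default field separator, assuming a POSIX-conforming sort and the C locale; the input and variable names are illustrative. The line x  b has two blanks before its second field. Without -b the extra blank is part of the key and compares lower than any letter; with -b it is skipped.

```sh
no_b=$(printf 'x  b\ny a\n' | LC_ALL=C sort -k 2,2)
with_b=$(printf 'x  b\ny a\n' | LC_ALL=C sort -b -k 2,2)
printf '%s\n' "$no_b"     # x  b, then y a
printf '%s\n' "$with_b"   # y a, then x  b
```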
The option -t " " specifies that a space character is to
be used as the field separator. In this case, any tab characters are
interpreted as being part of a field; any leading tab characters are
ignored if the -b option is specified. All space
characters are interpreted as field separators and are unaffected by
the -b option.
Sort key field definition
Sort key fields may be defined in two ways:
-k keydef
keydef is a key field definition for a restricted
sort. There may be more than one key field defined. Each takes the
form:
start[flag][,end[flag]]
start and end restrict a key field to part of a
line. flag is one of the modifiers b,
d, f, i, n, or
r. These modifiers act like the options -b,
-d, -f, -i, -n, and
-r, respectively, but apply only to that key field;
the b modifier acts only on the start or
end to which it is attached.
A key field start is specified in the form
field[.first] where the field numbers
start at 1 for the first field on a line. first defines
the number of the character that starts the key field. Characters in
fields are also numbered from 1; if first is missing, 1 is
assumed. Similarly, a key field end has the form
field[.last] where last specifies the
last character of a key field; default is the last character in
field. If end is missing, the key field is
assumed to extend to the end of the line.
The -b option and the b modifier cause
characters in a field to be counted from the first non-blank
character.
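A sketch of a character-level key, assuming a POSIX-conforming sort and the C locale; the input and variable names are illustrative. The keydef 2.2b,2.2b selects only the second character of field 2, counted from its first non-blank character, which reverses the order produced by keying on the whole field.

```sh
by_char=$(printf 'a zx\nb ay\n' | LC_ALL=C sort -k 2.2b,2.2b)
by_field=$(printf 'a zx\nb ay\n' | LC_ALL=C sort -k 2b,2)
printf '%s\n' "$by_char"    # a zx, then b ay  (keys "x" < "y")
printf '%s\n' "$by_field"   # b ay, then a zx  (keys "ay" < "zx")
```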
+pos1 [-pos2]
This notation restricts a sort key to one beginning at
pos1 and ending at pos2. The characters at
positions pos1 and pos2 are included in the sort
key (provided that pos2 does not precede
pos1). A missing -pos2 means the end
of the line.
In this form of key field specification, fields are numbered in
ascending order, starting from 0. The character position in a field
can also be referenced, starting from 0 (for the first
character). All blanks in a sequence of blanks are considered to be
part of the next field. For example, all blanks at the beginning of
a line are considered to be part of the first field.
pos1 and pos2 each have the form:
m[.n][flag]
A starting position specified by +m.n
is interpreted to mean the (n+1)th character in the
(m+1)th field. A missing .n means .0,
indicating the first character of the (m+1)th
field. flag is one of the modifiers b,
d, f, i, n, or
r. If the b flag is in effect, n is
counted from the first non-blank in the (m+1)th field;
+m.0b refers to the first non-blank
character in the (m+1)th field.
A last position specified by -m.n is
interpreted to mean the nth character (including
separators) after the last character of the mth field. A
missing .n means .0, indicating the last character of
the mth field. If the b flag is in effect,
n is counted from after the final leading blank in the
(m+1)th field, so -m.1b
refers to the first non-blank in the (m+1)th field.
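The two notations can be translated mechanically. The following sketch is a hypothetical helper, pos_to_keydef, that is not part of sort; its mapping (+m.n becomes start m+1.n+1; -m.0 becomes end m; -m.n with n > 0 becomes end m+1.n) is inferred from the paired examples in this page, such as +1.0 -1.1 and -k 2,2.1. It assumes well-formed arguments given without their leading "+" or "-".

```sh
# Hypothetical helper: print the -k keydef equivalent to "+pos1 [-pos2]".
# $1 is pos1 and $2 is pos2 (may be empty), each as m[.n][flags].
pos_to_keydef() {
    # Start: +m.n is character n+1 of field m+1.
    p1_flags=${1##*[0-9]}           # trailing modifier letters, e.g. "b"
    p1_num=${1%"$p1_flags"}         # numeric part, e.g. "1.0"
    p1_m=${p1_num%%.*}
    case $p1_num in *.*) p1_n=${p1_num#*.} ;; *) p1_n=0 ;; esac
    if [ "$p1_n" -eq 0 ]; then
        key=$((p1_m + 1))           # character 1 is the default
    else
        key=$((p1_m + 1)).$((p1_n + 1))
    fi
    key=$key$p1_flags

    # End: -m.0 maps to field m; -m.n (n > 0) maps to field m+1, character n.
    if [ -n "$2" ]; then
        p2_flags=${2##*[0-9]}
        p2_num=${2%"$p2_flags"}
        p2_m=${p2_num%%.*}
        case $p2_num in *.*) p2_n=${p2_num#*.} ;; *) p2_n=0 ;; esac
        if [ "$p2_n" -eq 0 ]; then
            key=$key,$p2_m$p2_flags
        else
            key=$key,$((p2_m + 1)).$p2_n$p2_flags
        fi
    fi
    printf '%s\n' "$key"
}

pos_to_keydef 1.0b 1.2b    # prints 2b,2.2b (same key as -k 2.1b,2.2b)
```

For example, sort -t: -k "$(pos_to_keydef 2n 3)" /etc/passwd expands to the -k 3n,3 form of the password file example.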
It is not possible to use a sort key field to extend the span of a
field outside the separator characters that delimit the field. Use
the -t option if you need to specify a key field based on
column position alone; see
"Sorting a file by columns"
for an example that uses this method.
Exit values
sort returns the following exit values:
0
sort processed all input successfully; with the
-c option, the input file was correctly sorted.
1
Using the -c option, sort found that the file
was not ordered as specified. Using options -c and
-u, sort found two input lines with identical
keys.
>1
An error occurred in sort, such as input lines being too
long.
Diagnostics
When the last line of an input file is missing a newline character,
sort appends one, prints a warning message, and
continues.
Examples
Each example is given in both forms of sort key syntax: first with -k, then with +pos1 and -pos2.
Sort the contents of infile with the second field as the
sort key:
sort -k 2,2 infile
sort +1 -2 infile
Sort, in reverse order, the contents of infile1 and
infile2, placing the output in outfile and using
the first character of the second field as the sort key:
sort -r -o outfile -k 2,2.1 infile1 infile2
sort -r -o outfile +1.0 -1.1 infile1 infile2
Sort, in reverse order, the contents of infile1 and
infile2 using the first two non-blank characters of the
second field as the sort key:
sort -r -k 2.1b,2.2b infile1 infile2
sort -r +1.0b -1.2b infile1 infile2
Print the password file
(passwd(F)) sorted by the numeric user ID (the
third colon-separated field):
sort -t: -k 3n,3 /etc/passwd
sort -t: +2n -3 /etc/passwd
Print the lines of the already sorted file infile,
suppressing all but the first occurrence of lines having the same
third field (the options -um with just one input file make
the choice of a unique representative from a set of equal lines
predictable):
sort -um -k 3,3 infile
sort -um +2 -3 infile
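The -um example can be tried without a file by piping already-sorted lines to the standard input in place of infile. This is a sketch assuming a POSIX-conforming sort and the C locale; the input and variable name are illustrative.

```sh
# Input is already sorted on field 3; -um keeps the first line of each
# run of lines whose third fields are equal.
firsts=$(printf 'a 1 x\nb 2 x\nc 3 y\n' | LC_ALL=C sort -um -k 3,3)
printf '%s\n' "$firsts"   # a 1 x, then c 3 y
```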
Sorting a file by columns
To sort a file based on columns, use the -t option to
specify a field separator character which does not appear in the
input. This will cause each line to be treated as a single
field. The -k option or the +pos1 and
-pos2 specifiers can then be used to sort on particular
ranges of columns. The -b option and b modifier
flag can also be used to ignore leading blanks (spaces or tabs).
For example, if the character ":" does not appear in the
file infile, sort this file on the contents of columns 9
through 72 using:
sort -t: -k 1.9,1.72 infile
sort -t: +0.8 -0.72 infile
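A small sketch of the same technique, assuming a POSIX-conforming sort and the C locale; the input and variable name are illustrative. No colon appears in the input, so with -t: each line is one field, and 1.5,1.5 selects column 5 alone, even though a space precedes it.

```sh
by_column=$(printf 'abc 9\nxyz 1\n' | LC_ALL=C sort -t: -k 1.5,1.5)
printf '%s\n' "$by_column"   # xyz 1, then abc 9
```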
Files
/usr/tmp/stm???
Open UNIX 8 compatibility notes
When running ACP on Open UNIX 8 and UnixWare 7 systems,
set OSRCMDS=on to use
the SCO OpenServer version of the sort command.
This provides the behavior expected
by SCO OpenServer applications.
The SCO OpenServer version of this command
is also provided on Open UNIX 8 systems under the OSP feature.
See the
Running SCO OpenServer Applications
topic in the Open UNIX 8 documentation set.
See also
coltbl(M),
comm(C),
join(C),
locale(M),
uniq(C)
Standards conformance
sort is conformant with:
ISO/IEC DIS 9945-2:1992, Information technology - Portable Operating System Interface (POSIX) - Part 2: Shell and Utilities (IEEE Std 1003.2-1992);
AT&T SVID Issue 2;
X/Open CAE Specification, Commands and Utilities, Issue 4, 1992.
© 2003 Caldera International, Inc. All rights reserved.
SCO OpenServer Release 5.0.7 -- 11 February 2003