Where Did You Get That Tape?
R. King Ables
Sending tapes to and receiving tapes from other UNIX machines is a
fairly simple task. If a tape is created using tar(1) or cpio(1),
virtually any other UNIX machine will be able to read it because almost
all versions of UNIX have one (if not both) of these commands.
But what do you do if you get a tape from a non-UNIX site? The tar command
on your UNIX box cannot understand an EBCDIC tape from an IBM system.
What if you need to send data to a site that has no
UNIX hosts? An
IBM mainframe cannot make sense of a tar tape.
Commands like tar write files onto a tape in a specific format
and include operating-system-dependent information about each file
(like date of creation, owner, protection, etc.). The format
keeps the user from having to know exactly what the data on the tape
looks like and makes moving data between UNIX machines easy,
since other sites can read the tape with their own version of tar.
But a computer that doesn't have tar cannot use the data on
the tape as easily.
Since no single tape utility runs on every known machine and operating
system (because most are so dependent on operating system specifics),
there is no single format that can be used to make a tape easily readable
to any computer. The most generic way to prepare a tape for another
computer system is to write only the data in the file onto the tape
(that is, to exclude information about the file such as its name,
creation date, or protection). But to do this, the user must be concerned
with things like blocking factors, record lengths, and character codes.
If you get a tape from a non-UNIX site, it may very
well contain only
data (that is, it may not be in tar or cpio format or
any other format that would allow you just to pull files
off the tape).
You need to know (or be able to figure out) what the
layout of the
data is in order to construct a set of commands to read the data off
the tape.
There are several attributes that make a tape unique. Understanding
the different possible formats and layouts of a magnetic tape will
help you figure out how best to read data from a particular tape.
The most obvious tape attribute is its physical type.
Most tapes that
come from non-UNIX sites will be 9-track tapes. These are magnetic
tapes rolled onto large reels (like the ones you see spinning on tape
drives in bad 1960s movies). More modern UNIX sites now use cartridge
tapes (which look much like audio cassettes but about twice as large),
8mm, or 4mm tapes. While everything I discuss here would apply
to cartridge, 8mm, or 4mm tapes, you will most likely receive data
from non-UNIX sites on 9-track tapes.
Density refers to the number of frames written per inch
of tape and
determines how much information will fit on the tape.
The more densely
the information is recorded on the tape, the more you
can fit on the
tape. But (just as with audio and video recordings) the more dense
your recording, the lower the quality (and with computer tapes, the
higher the chance of having an error occur when reading the tape).
The three most common density settings for 9-track tapes are 800,
1600, and 6250 bpi (originally meaning bits per inch, though that
isn't exactly true). 1600 bpi used to be the most common density,
but as tape technology improved, 6250 bpi became more and more popular
(since more information can be put on the tape at 6250 bpi), and now
most tapes are certified to work at the higher density.
Density is usually set on the tape drive itself or by specifying the
density on the command line when writing a tape. Many UNIX systems
use different device names for the same tape drive with different
densities. When reading a tape, most tape drives can sense the density
and adjust to read the data at the appropriate setting.
A further attribute of a tape is its character coding.
The two most
commonly used codes are ASCII and EBCDIC. EBCDIC is
mainly used by
IBM mainframes; ASCII is the standard character code
used by most
PCs and UNIX machines, among others. Knowing which code
was used will
also help you figure out what is on a tape.
A tape should never be written in the internal display code of a machine
(if different from EBCDIC or ASCII), and internally represented data
(binary values of floating point numbers or even integers) should
never be written directly to tape. The internal representation of
this information is different for different processors and operating
systems. If numeric data is to be sent to another machine, it
should be written out in a human-readable form (in a character
representation of the values, the way you would print them) and
then read back in and converted to numeric values.
Data is written onto a tape in groups of bytes called blocks. Sometimes
tape blocks are referred to as "records." This was an unfortunate
choice since the text data in a tape block is sometimes said to be
made up of records (see LRECL below), which are really lines of text
in the file. In reading documentation from different sources, you
may find the word "record" used to describe either a tape
record (which I will call a block) or a data record (which is a single
line of text inside a file on a tape). You must determine which is meant
from the context.
Separating the blocks of data on a tape are empty spaces called
interblock gaps (sometimes called interrecord gaps). These gaps
allow the tape drive to stay synchronized, read the next block
of data, and recover if it encounters an error in a block. Blocks
can be just about any size, but each block should be the same size
as the others in a file (variable-length blocks greatly complicate
the situation). The larger the blocksize, the more efficiently
data can be written on the tape (since there will be fewer
gaps and, thus, less tape used). However, if a block contains an error,
all of the data contained in that block will be lost. So the risk
of larger blocks is that if an error occurs, more data is lost.
Note that some sites bill for resources and may charge per block read
or written, so judicious use of blocksize can make a difference
in the charges for processing a tape.
Logical Record Length (LRECL)
Typically, each block contains some number of data records. In the
IBM mainframe world, a "record" refers to what the rest of
the world calls a "line" of a text file. This is where we
get the quasi-acronym "LRECL," meaning "logical record
length," which is the number of characters in each line of text
contained in a block.
No end-of-line characters are included in the data, since different
operating systems use different characters or combinations of characters
to represent end-of-line. Records (lines of text) are padded with
spaces so that they are all the same length. If records are all the
same length, no end-of-line notations are necessary. Usually a block
is made up of some whole number of padded records. Records should
not be split across block boundaries.
A typical blocksize for a text file is 5120 bytes consisting of 64
records with a record length of 80 characters (80x64=5120).
This is known as a fixed-length record because every record is the
same length (short lines are padded, long lines are truncated). If
every block in the file is 5120 bytes (except, perhaps, the last one),
then the file is said to be a fixed-record, fixed-block file.
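The arithmetic and the padding rule are easy to check from the shell.
The following is only a minimal sketch, not part of any tape utility:
the helper name fix_record is my own invention, and it assumes a POSIX
shell whose printf accepts the %-N.Ns format to pad and truncate.

```shell
# Hypothetical helper: pad (or truncate) one line of text to exactly
# $2 characters, the way a fixed-length record is laid out.
fix_record() {
    printf "%-${2}.${2}s" "$1"
}

lrecl=80                 # characters per record
records_per_block=64     # records packed into each block
blocksize=$(( lrecl * records_per_block ))

# A short line becomes a full-length record padded with spaces.
record=$(fix_record 'SHORT LINE' "$lrecl")
record_len=${#record}

echo "blocksize=$blocksize record_len=$record_len"
```

With the values from the text, this reports a blocksize of 5120 and a
record length of 80.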
A tape may or may not have a standard label on it. The
label is used
by the machine that wrote the tape to keep information
about the tape
such as a volume serial number or volume name. Many UNIX systems
do not use tape labels, so the labels simply appear as another file
on the tape. In most cases you will simply skip over the label when
you're reading such a tape. Rarely does the label contain information
that will be useful on a different kind of computer.
UNIX Utilities to Read These Tapes
A number of UNIX commands can be used to examine tapes and read
the information contained therein. First, though, there are a few
common-sense steps that can be helpful.
Ask the person sending the tape to include a copy of the run of the
job or the commands used to write the tape (even if you don't know
the operating system, command-line arguments like LRECL=80 will give
you some clues). Ideally, the sender should also provide a description
of the data and maybe a printout of the first and last few lines
so that you have something against which to validate what you read.
If you write a tape to send to someone else, the recipient will
appreciate getting this kind of information from you.
In the section that follows, I describe the UNIX commands for
examining foreign tapes and discuss what these commands can do and
how you will most often use them. You should also read the manual
pages on your local system for a full description of the commands and
how they will work on your specific version of UNIX.
UNIX Tape Devices
UNIX often uses separate names in the /dev directory for
devices with different characteristics. There are several ways to
write to a tape drive and each method is represented by a different
device name.
To do buffered writes to a tape (where you want the system
to do your blocking for you), use the regular tape device. This is
something like /dev/mt0 on a Berkeley (BSD) UNIX machine or
/dev/rmt/c0s0 under System V Release 4 (SVR4). "mt"
generally refers to a 9-track tape drive, though if there is no 9-track
drive, it may simply refer to the default tape drive. The unit number
may be used to refer to one of several tape drives or it may be used
to specify a density for the tape. You must consult the documentation
for your version of UNIX to determine the proper device name.
To do unbuffered writes to a tape, where you will be doing your own
blocking (which is generally preferred), use the "raw"
device. This is something like /dev/rmt0 for BSD or
When you access a tape device and close it (that is, the command you
are using terminates), the tape device driver rewinds the tape. If
you have a single file on the tape that you are reading in one
pass, this is the behavior you want. But if you have a tape containing
multiple files, you want to be positioned at the end of the file you
just read (and at the beginning of the next one) when the command
finishes, so that you can then process the next file. In this case,
you should use the "no rewind" device when accessing the tape.
This is generally denoted by adding an n to the device name,
like /dev/nrmt0 (BSD) or /dev/rmt/c0s0nr (SVR4). Again, consult
your local documentation for the exact name.
Moving around on the Tape
A tape may contain zero or more files. A file is made up of zero or
more blocks of data. All blocks should be the same size except the
last block of the file, which may be a short block (it is not necessary
to pad the last block).
Multiple files are separated by end-of-file (EOF) marks. The end of
the tape is generally represented by two sequential EOF marks (that
is, an empty file).
The Berkeley UNIX mt(1) command is used to move the tape forward
or backward on file marks. Solaris and AIX also have the mt command.
SVR4 uses the tapecntl(1) command to perform this task.
If you want to read the third file of a tape, for example, you would
use one of these commands to space forward twice (skipping the first
two files and setting the pointer to the beginning of the third file).
On a Berkeley UNIX system, the command
$ mt -f /dev/nrmt0 fsf 2
skips two files (fsf stands for "forward skip
file"), leaving the tape at the beginning of the
third file. In
SVR4, the command is:
$ tapecntl -p 2 /dev/rmt/c0s0nr
Note that if you are moving around on the tape, you must specify the
"no rewind" device to the mt and tapecntl commands
to prevent the tape from rewinding after the command completes (rewinding
is the default).
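To make the counting explicit: starting from a rewound tape, reaching
file N means skipping N-1 file marks. The sketch below only prints the
mt command it would run rather than touching a drive. The helper name
skip_command is hypothetical, and the device name is the BSD example
used above.

```shell
# Hypothetical helper: print the mt command that positions a rewound
# tape at the start of file number $2 on no-rewind device $1.
skip_command() {
    device=$1
    file_number=$2
    skips=$(( file_number - 1 ))   # files to pass over first
    if [ "$skips" -gt 0 ]; then
        echo "mt -f $device fsf $skips"
    else
        # File 1 needs no skipping, only a tape at its beginning.
        echo "mt -f $device rewind"
    fi
}

skip_command /dev/nrmt0 3
```

For the third file this prints the same fsf 2 command shown earlier.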
Scan of Files and Blocksizes
Once you have the tape set up and you can access it, you need to figure
out the data format (or verify the data format if you're lucky enough
to have been sent information about the tape along with the tape).
The tcopy(1) command prints out information about the files and blocks
found on a tape.
$ tcopy /dev/rmt0
file 1: records 1 to 25: size 51200
file 1: record 26: size 5120
file 1: eof after 26 records: 1285120 bytes
file 2: records 1 to 77: size 51200
file 2: record 78: size 10240
file 2: eof after 78 records: 3952640 bytes
file 3: records 1 to 31: size 51200
file 3: record 32: size 25600
file 3: eof after 32 records: 1612800 bytes
total length: 6850560 bytes
Note that tcopy uses "records" to mean
tape records, which I call "blocks."
The tcopy output says that there are three files on the tape
(end-of-tape was encountered after the third file) and that all files
have the same blocksize (51200). Assuming for the moment that the
files have an LRECL of 80, you can calculate 640 lines per
block. Thus, the first file has 16064 lines (640 per block for the
first 25 and 64 lines in the last short block), the second file contains
49408 lines, and the third file contains 20160 lines. This can be
verified by dividing 80 into the total byte count for each file. At
this point, however, that LRECL value is only a guess. You
can be fairly sure, though, that the LRECL is some even divisor
of 51200, since it is bad practice to break records across blocks.
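The division can be done mechanically. The following is a rough sketch
that assumes the tcopy summary lines have the layout shown in the
listing above (the three "eof after" lines are copied from it) and that
awk is available. The variable names are mine.

```shell
# The three per-file summary lines from the tcopy listing.
tcopy_output='file 1: eof after 26 records: 1285120 bytes
file 2: eof after 78 records: 3952640 bytes
file 3: eof after 32 records: 1612800 bytes'

lrecl=80   # still only a guess at this point

# For each summary line, print the file number and bytes / LRECL.
line_counts=$(printf '%s\n' "$tcopy_output" |
    awk -v lrecl="$lrecl" '/eof after/ {
        sub(":", "", $2)
        printf "%s %d\n", $2, $(NF-1) / lrecl
    }')

# A zero remainder means the guessed LRECL divides the blocksize evenly.
remainder=$(( 51200 % lrecl ))

printf '%s\n' "$line_counts"
echo "remainder=$remainder"
```

The counts agree with the figures worked out by hand (16064, 49408,
and 20160 lines), and the remainder is zero.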
Examining Data on Tape
Given some information about the tape, the next step is to read off
the first few blocks of each file and see if you can make sense out
of the data. Two different commands will help with this.
The dd(1) command is used to read raw data from a device. The
od(1) command is used to display the contents in various formats.
So to read the first block and display it, use:
$ dd if=/dev/rmt0 ibs=51200 count=1 | od -c
0000000 @ @ @ @ @ @ 342 344 302 331 326 344 343 311 325 305
0000020 @ 311 325 304 305 347 M 311 325 361 k 311 325 362 k 311
0000040 304 311 324 305 325 k 311 346 326 331 304 k 311 327 326 342
(for the sake of brevity, I won't list the entire block here).
Clearly this makes no sense whatsoever. However, the repeated
at-signs (@) at the beginning are interesting. EBCDIC
spaces (hexadecimal 40) look like at-signs when displayed as ASCII
(which is how the od command interprets the data). So we can have dd
convert the data from EBCDIC to ASCII by adding an argument:
$ dd if=/dev/rmt0 ibs=51200 conv=ascii count=1 | od -c
0000000 S U B R O U T I N E
0000020 I N D E X ( I N 1 , I N 2 , I
0000040 D I M E N , I W O R D , I P O S
This does indeed look like the first line of some old
FORTRAN source code. If at this point the output still looked like
nonsense, you could use other flags to the od command to print
out a hexadecimal or octal dump of the data on the tape so you could
check for other formats. If you request an octal or hexadecimal dump,
be sure not to use the "conv" argument on dd;
otherwise, dd will convert the data and you will not see the
values that are really on the tape.
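You can try the conversion without a tape drive. The sketch below
feeds dd the first 16 bytes of the block, typed in as the octal values
od -c displayed (octal 100 is the at-sign). It assumes a dd that
supports conv=ascii, as GNU and BSD versions do. The variable names
are mine.

```shell
# First 16 bytes of the block, as octal escapes for printf.
first_bytes='\100\100\100\100\100\100\342\344\302\331\326\344\343\311\325\305'

# Translate EBCDIC to ASCII; dd's statistics on stderr are discarded.
decoded=$(printf "$first_bytes" | dd conv=ascii 2>/dev/null)

echo "[$decoded]"
```

The brackets make the six leading spaces visible in front of the word
SUBROUTINE.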
Looking at more of the dump, you can also verify that the lines are
padded out to column 80 with spaces. This being the case (and expected
with FORTRAN source code), the original hypothesis of an LRECL of 80
is true and you now know enough to read the tape. It is a fixed-record
length, fixed-block length, EBCDIC tape.
You could use mt (or tapecntl) to skip to the second
and third file and verify their format, but since tcopy showed
that the blocksizes were the same, it is probably safe to assume the
other files are in the same format. Had the blocksizes been different,
it would have been necessary to go through this same procedure
to figure out the LRECL for those files.
To read off the complete file index.for (the name could have
been sent with the tape or could be assigned at the time
based on that first line), we would use the following command:
$ dd if=/dev/rmt0 of=index.for ibs=51200 cbs=80 conv=ascii,unblock
This would produce a "normal" UNIX file that
can be edited or compiled or used in any way you wish.
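To see what cbs=80 and conv=unblock do on their own, here is a small
sketch that needs no tape. It assumes a dd that supports conv=unblock
(GNU and BSD versions do). The two sample records and the variable
names are made up for illustration.

```shell
# Two 80-character, space-padded records with no newlines, standing
# in for data that has already been read from a tape.
fixed=$(printf '%-80s%-80s' 'LINE ONE' 'LINE TWO')

# Each 80-byte record loses its trailing spaces and gains a newline.
text=$(printf '%s' "$fixed" | dd cbs=80 conv=unblock 2>/dev/null)

printf '%s\n' "$text"
```

The 160 bytes of padded data come back as two ordinary lines of text.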
Writing Data to a Tape for Someone Else
As I mentioned earlier, if you are writing a tape for someone else, the
most important thing you can do is give them a detailed description
of what is on the tape along with a snapshot of at least the first
block or so of data. To write a very generic format tape of a file,
use a command that is somewhat the opposite of the command used to
read the tape:
$ dd if=test.c of=/dev/rmt0 obs=51200 cbs=80 conv=block
This writes the file test.c out to the tape in
ASCII in a fixed-record length of 80 characters per record and 640
records per block (just like the example tape). You may adjust the
arguments to the dd command to write out the specific format
as required by the data you are writing on the tape.
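Before committing anything to tape, you can rehearse the write and the
read against an ordinary file. This is only a sketch under several
assumptions: a regular file stands in for the tape device, dd supports
conv=block and conv=unblock, mktemp is available, and the file
contents are made up. Lines longer than 80 characters or ending in
spaces would not survive the round trip.

```shell
workdir=$(mktemp -d)
printf 'int main(void)\n{\n    return 0;\n}\n' > "$workdir/test.c"

# Write side: pad every line to 80 characters and drop the newlines.
dd if="$workdir/test.c" of="$workdir/tape.img" \
   obs=51200 cbs=80 conv=block 2>/dev/null
image_bytes=$(( $(wc -c < "$workdir/tape.img") ))

# Read side: turn the fixed-length records back into lines.
dd if="$workdir/tape.img" of="$workdir/readback.c" \
   ibs=51200 cbs=80 conv=unblock 2>/dev/null

if cmp -s "$workdir/test.c" "$workdir/readback.c"; then
    round_trip=ok
else
    round_trip=differs
fi

echo "image_bytes=$image_bytes round_trip=$round_trip"
rm -rf "$workdir"
```

The four-line sample becomes four 80-byte records (320 bytes), and the
file read back should be identical to the original.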
About the Author
R. King Ables has been a UNIX user since 1980 and has
been managing systems
or developing system management and networking tools
since 1983. He is
currently doing system and network management development
for HaL Computer
Systems in Austin, TX.