Cover V09, I08


How to Read a SCSI Bus Trace

Timothy Swenson

When hooking up SCSI devices, like CD-ROMs and tape drives, things don’t always work. Granted, SCSI is a standard, but with the variances in SCSI controllers, device firmware, and SCSI drivers, a SCSI device that works well on one system may totally fail on another system.

The typical response for systems administrators is to contact support for both the computer vendor and device vendor. If the systems administrator is lucky, one of the vendors will have seen the problem before and know the answer. If the systems administrator is unlucky, the two vendors will start pointing fingers at each other, and the problem will not be resolved. If a SCSI bus analyzer is available, then the sys admin can get a few SCSI bus traces and figure out the problem.

SCSI Bus Analyzer

A SCSI bus analyzer listens on the SCSI cable and logs a variety of information about what it sees on the cable. The analyzer consists of a hardware component that plugs into the SCSI chain, and a software component that takes the data from the hardware box and displays it in a human-readable format.

A quick browse on the Web shows that there are a nmber of different types of analyzers from several different vendors. Some are add-ons to computers, some are add-ons to logic analzers, and some are stand-alone. Of those that need a computer to work, some are PCI cards and others link via a PCMCIA card slot. For further information see the following manufacturers:

A Verisys SCSI-View SV-3000 analyzer was used to create the examples in this article.

SCSI Bus Phase Report

A SCSI bus analyzer can generate a number of different reports. The primary report that is of the most value is the SCSI bus phase report. This report shows the different phases that take place on the bus, including the commands with input and output.

In example trace #1 below, the first column is the sequential number of the phase, with the numbers starting at 1 and going up. The second column is the time, in nanoseconds, between each phase. A timing of 1_281_815 is .1281815 seconds. Timing is usually only important when looking at high wait times between phases, which could result in a SCSI bus timeout and reset. The third column is the phase and shows the individual phase and any data passed. Phases can be broken down into three types:

Waiting Phase — Bus Free

Bus Control Phases — Arbitration, Selection, Reselection

Information Transfer Phases — Message In/Out, Command, Data In/Out, Status

Reading Through the Phases

Again, looking at example trace #1 below, here is a simple walk-through of the phases.

Example trace #1:

 1           117_180 Bus Free
 2            10_560  Arb
 3            32_800  Arbwin 0
 4           104_800 +Select 0,5
 5             0_615 +Sel/Resel End
 6            22_030 +MsgOut C0 Identify
 7         1_538_600  CMD - Req Sense   
                     03 00 00 00 40 00
 8           373_765  DataIn (N)
                     F0 00 03 00 ....
                     02 00 00 0A ....
                     00 00 00 00 ....
                     31 00 00 46 1..F
                    (16 Bytes)
 9         1_281_815  DEnd  18 Bytes
10           164_250  Status 00 Good
11             9_925  MsgIn  00 Cmd Complete
12        19_729_780 Bus Free
Line 1 — Bus Free. This means that the bus is free and is available for use.

Line 2 — Arb. Arbitration is taking place. A SCSI device is asking for control of the bus so it can use it.

Line 3 — Arbwin 0. This shows that SCSI device 0 has won the arbitration and has control of the bus.

Line 4 — Select 0,5. SCSI Device 0 is selecting to talk with SCSI Device 5.

Line 5 — Sel/Resel End. Just shows the end of this phase.

Line 6 — MsgOut C0 Identify

Line 7 — CMD — Req Sense. CMD is used to mark each SCSI command. In this case it is the Request Sense command. The numbers below this line is the actual hex data sent with the Request Sense command.

Line 8 — DataIn. Since the previous command/phase requested data, the DataIn phase shows that data being passed back to the requesting device. The following lines are the hex data being passed back. See a SCSI book for exactly what each value means.

Line 9 — DEnd 18 Bytes. This shows that the DataIn phase had ended and that 18 bytes are returned.

Line 10 — Status 00 Good. This means that everything checks out so far.

Line 11 — MsgIn 00 Cmd Complete. This means that the command was successfully sent to the device and it successfully responded back.

Line 12 — Bus Free. Back to Bus Free to show that the bus is now available for other devices to use.

Disconnect and Reselection

Individual SCSI conversations between SCSI devices (say a controller and a tape drive) are very phase oriented. Not getting back a reply from a command can cause a SCSI Controller to reset the bus, bringing it and all attached devices back to a known state. But, the SCSI protocol does allow for more than one conversation to take place at about the same time. Since tape devices are relatively slow compared to disk drives, there are times when a SCSI conversation is “put on hold” until a tape device is finished doing what it needs to do. A good example is the rewind command.

Given the size of some tapes, rewinding from the end can easily take quite a few minutes. The Disconnect message is a way for one device to tell another device that it needs some time to complete a command. It is up to the device sending the Disconnect message to re-establish the conversation, via a Reselection.

Example #2:

1        295_215 Bus Free
2          2_440  Arb
3          1_360  Arbwin 0
4          0_930 +Select 0,3
5          0_555 +Sel/Resel End
6          1_950 +MsgOut C0 Identify
7        279_140  CMD - Wrt Filemrks
                  10 00 00 00 01 00
8          1_920  MsgIn  04 Disconnect
9    817_346_105 Bus Free
10          2_440  Arb
11          1_350  Arbwin 0
12          0_940 +Select 0,2
13          0_565 +Sel/Resel End
14          1_950 +MsgOut C0 Identify
15        425_900  CMD - Mode Sns(6)
                  1A 00 00 00 94 00
16          0_505  DataIn (W)
                  0B 85 10 08 ....
                  85 00 00 00 ....
                  00 00 00 00 ....
                 (12 Bytes)
17         76_545  DEnd  12 Bytes
18          1_350  Status 00 Good
19          0_910  MsgIn  00 Cmd Complete
20  1.576_622_755 Bus Free
21          2_330  Arb
22          1_260  Arbwin 3
23          0_950  Resel  0,3
24          0_535  Sel/Resel End
25         53_860  MsgIn  80 Identify
26         11_450  Status 00 Good
27          0_910  MsgIn  00 Cmd Complete
28 278_935 Bus Free
In this example, the controller starts a conversation with the tape device (SCSI ID 3) and sends the command “Wrt Filemrks”. Once the tape device receives the command, it replies with the message “Disconnect”.

The bus then goes free. Now the controller starts a conversation with another device (SCSI ID 2). It send the command “Mode Sns” (Mode Sense). Device ID 2 then responds to the command, and that conversation is complete.

After the bus is free, the tape device wins the arbitration and starts a conversation with the Controller (ID 0). It then does a “Resel 0,3”, which means a reselection of the previous conversation. The tape device replies with the message it needs to (Line 25) and now the whole conversation is complete (Line 27 Cmd Complete), and the bus goes free again.

If a device sends a Disconnect message and does not respond within certain period of time (6-20 min.), a timeout value can be reached, and the bus can be reset. A Bus Reset will cause all of the other devices on the same bus to reset, interrupting any data transfer. This is especially bad if tape devices share the same bus with disk drives. Keeping track of the start and finish of commands can show a command being dropped or show that all commands are finished properly.

Bus Reset and Disconnect

The previous example showed a Disconnect when all went well. In this example, not all goes well.

Example Trace #3:

1            2_580  Arb
2            1_230  Arbwin 0
3            1_230 +Select 0,3
4            1_230 +Sel/Resel End
5        3_272_100 +MsgOut C0 Identify
                      01 02 03 01
                    Wide Negotiation Width:16 bits
6           22_500  MsgIn  01 02 03 01
                    Wide Negotiation Width:16 bits
7        2_499_600 +MsgOut 01 03 01 0C 08
                    Sync Negotiation Pw:50ns,off:8
8        2_552_910  MsgIn  01 03 01 19 08
                    Sync Negotiation Pw:100ns,off:8
9          457_410  CMD - Stop/Start  
                      1B 00 00 00 00 00
10            2_970  MsgIn  04 Disconnect
11  382.331_911_740 Bus Free
12 24_750 Bus Reset
In this example, the device SCSI ID 3 received a command “Stop/Start” then sent the message Disconnect and did not reply to the command in 382 seconds, or 6.5 minutes. It looks like either the Controller or an application has a timeout of 6.5 minutes and reset the SCSI bus. This is fine, unless the operation that the device was asked to perform would take longer than 6.5 minutes. A tape rewind would be an operation that could take a long time to finish. In the case of a system resetting the SCSI bus before a rewind command can finish, the proper thing to do is to change the timeout on the system.

Bus Reset

Sometimes a SCSI bus trace can show that the problem is not really based on the commands going across the bus, but lies elsewhere.

Example Trace #4

1   27.763_462_465 Bus Free
2           2_330  Arb
3           1_260  Arbwin 3
4           0_960  Resel  0,3
5           0_525  Sel/Resel End
6          53_850  MsgIn  80 Identify
7          11_450  Status 00 Good
8           0_920  MsgIn  00 Cmd Complete
9  479.373_184_515 Bus Free
10          24_990 Bus Reset
11 0_000 Bus Free
In this trace, everything seems to look fine. We can see device 3 winning the arbitration, then selecting to talk with device 0 (the SCSI Controller). A message is then sent to the controller, the data is received just fine (Status 00 Good), the command is completed, and the bus goes free.

Now it looks like the bus is reset. A bus reset can occur for a variety of reasons, but usually because some timeout value has been reached. In this example trace, there is a corresponding message in the system messages file (/var/adm/SYSLOG or /var/adm/messages):

Dec 2 17:26:46 6A:<host> unix: ql2d6:  \
   <6>SCSI command timeout: 1 commands:
<6>0x1a <6> Dec 2 17:26:46 6A:<host>  \
   unix: ql2: <6>Resetting SCSI chip.
To find what is going on, one approach is to look over the whole SCSI bus trace and see whether there are any outstanding commands. In other words, look for any CMD lines with no corresponding “Cmd Complete” lines. In this example, there were no outstanding commands in the trace, so this provided evidence that the root cause was not at the SCSI bus level, but was higher up on the host system. In this case, it turned out to be a firmware bug with the SCSI controller card.

Check Condition

Some SCSI drivers may not pass all errors that they get to the higher operating system level and act on them silently. A SCSI bus trace can show exactly what the errors are and which command they were in response to.

Example Trace #5:

1        684_120 Bus Free
2          2_400  Arb
3          1_410  Arbwin 0
4          0_960 +Select 0,5
5          0_570 +Sel/Resel End
6          1_950 +MsgOut C0 Identify
7        330_720  CMD - Tst Unit Rdy
                 00 00 00 00 00 00
8          1_560  Status 02 Check Condition
9          1_020  MsgIn  00 Cmd Complete
10         33_060 Bus Free
11          2_370  Arb
12          1_410  Arbwin 0
13         59_070 +Select 0,5
14          0_570 +Sel/Resel End
15        430_740 +MsgOut 80 Identify
                  01 03 01 32 08
16        105_210  MsgIn  01 03 01 32 08
17        600_630  CMD - Req Sense   
                  03 00 00 00 20 00
18         43_000  DataIn
                  0h: 70 00 00 00 02 00 00 00 p.......
                  8h: 00 00 00 00 00 00 15 00 ........
                  10h: 00 00 00 00 00 00 00 00 ........
                  18h: 04 00 01 00 00 00 00 00 ........
                  20h: 00 00 00 00 86 00 02 00 ........
                  28h: CE 00 00 00 00 00 27 00 ......’.
                  30h: 30 00 00 00 00 00 00 00 0.......
                      (58 Bytes)
19         31_700  DEnd  58 Bytes
20          1_410  Status 00 Good
21          1_020  MsgIn  00 Cmd Complete
22  2.018_814_360 Bus Free
23          2_370  Arb
24          1_410  Arbwin 0
25          0_960 +Select 0,5
26          0_600 +Sel/Resel End
27          1_950 +MsgOut C0 Identify
28        304_200  CMD - Tst Unit Rdy
                  00 00 00 00 00 00
29          1_620  Status 02 Check Condition
30          0_990  MsgIn  00 Cmd Complete
31         33_030 Bus Free
In this example, the command Test Unit Ready (Tst Unit Rdy) was sent to the tape drive. Notice that the “Status” line says “Check Condition” instead of the usual “Good”. This means that there was some error in either the sent command or in the execution of the command. Check Condition is the response to the Test Unit Ready command, but it does not specify exactly what the error is. A Request Sense (Req Sense) command must be sent to the tape drive and ask it what the error was.

The Next Level

The next level of analysis, which is beyond the scope of this article, is to analyze the data sent in commands and the data sent in the response. Each command and its response has a defined byte sequence, and each bit or byte has a meaning. By referencing the data returned from a command, especially the return from a Request Sense command, to its definition, the exact meaning of the reply can be determined. At the level mentioned in this article, we can only tell that a Check Condition happened and that a Request Sense command what sent. By digging deeper, the exact error can be derived from the reply bytes of the Request Sense command.

Sources of Additional Information

Besides the sources below, tape drive vendors have some fairly detailed books that cover their tape drives and the specific SCSI commands that they support. Some of these documents cover the SCSI commands in great detail.

Web Pages:


The Book of SCSI: A Guide for Adventurers by Peter M. Ridge and David Deming. 1995. No Starch Press.

The SCSI Bus and IDE Interface by Friedhelm Schmidt. 1997. Addison-Wesley.

Programmer’s Guide to SCSI by Brian Sawert. 1998. Addison-Wesley.

About the Author

Timothy Swenson is a UNIX contractor with Taos Mountain, in Silicon Valley. After receiving a B.S. in Computer Science from Cal State Hayward, Tim spent 8 years as an Air Force Officer, doing sys admin and acquisition work. Tim is also the editor of the QL Hacker’s Journal and writes for other Sinclair publications. He can be reached at