IN-62

Hardware Problems of the DR11-W device

Vicky White
Peter Heinicke
Fermilab, P.O. Box 500, Batavia, IL 60510

In order to use the DR11-W as a link device, a protocol has been developed. This protocol consists of a series of transactions between software on either end of a link. Each end of the link can interrupt the other machine by 'toggling' the FNCT2 bit of its CSR, which causes the ATTN bit in the CSR on the other machine to come on. The setting of the ATTN bit causes the ERROR bit to be set and, if the IE bit is also set at this time, the DR11-W will interrupt. This protocol is used to transfer data blocks and one-byte signals across the link. Transactions between the ends of the link take place by this means: one end puts a code word into its output data register and then interrupts the other machine, which reads the code word.

The protocol is implemented in the form of a driver under both RSX-11M and RT-11. During testing of this software we came across a strange problem. From time to time, under some timing conditions, the DR11-W became interrupt disabled. Although the IE bit in the hardware was set, the raising of the ATTN bit failed to cause the ERROR bit to be set and so failed to cause the device to interrupt. Once this situation occurred, we found that the only way to regain the interrupt capability was either to do a bus reset or to reset the device by setting the maintenance mode bit and clearing it again. This latter type of reset itself led us to discover a further unpleasant feature of the device, which we will describe below.

1.0 LOSING INTERRUPT CAPABILITY

We investigated the problem in some depth. We looked at such operations as reading out the error register (by writing the error bit of the CSR) and then resetting the CSR, and reading back the last data word read in on a DMA. In fact none of these seemed to be related to the problem.
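The symptom described above (IE set, ATTN raised, ERROR never following) can be expressed as a simple check on a CSR value. The following is a minimal sketch, not the driver's actual code. The function name is hypothetical, and the ATTN and ERROR bit positions are assumptions that should be checked against the DR11-W manual; only the IE value, octal 100, is taken from this note.

```c
#include <stdbool.h>
#include <stdint.h>

/* CSR bit masks. IE = octal 100 is the value this note uses to re-enable
 * interrupts. The ATTN and ERROR positions are ASSUMPTIONS for illustration. */
#define DR_IE    0000100u
#define DR_ATTN  0020000u   /* assumed: bit 13 */
#define DR_ERROR 0100000u   /* assumed: bit 15 */

/* Hypothetical helper: true when a CSR value shows the "interrupt disabled"
 * symptom, i.e. IE and ATTN are both set but ERROR is not. */
static bool dr11w_interrupts_lost(uint16_t csr)
{
    return (csr & DR_IE) && (csr & DR_ATTN) && !(csr & DR_ERROR);
}
```

A single reading may catch a transient state, so a real check would presumably want to see the same result on several successive reads of the CSR before treating the condition as persistent.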
We believe that the problem can be demonstrated by the following test, run either as a privileged task under RSX-11M or as a standalone test.

A task connects to the DR11-W interrupt vector and supplies an interrupt handler routine. In the interrupt routine it immediately resets the CSR of the device, re-enabling the interrupt, by writing octal 100 to it. The interrupt routine then delays for a few hundred instructions before exiting from the interrupt. The main loop of the program can loop reading the CSR of the device and perhaps writing it out to a display register. If desired it may check for the error condition, which shows itself as a persistent condition where the ATTN bit is set but the ERROR bit is not. This can also be seen from the display register, if used.

A program on the partner machine should repeatedly toggle the FNCT2 bit in its CSR in order to attempt to interrupt the other machine repeatedly. Some variable time delay between attempted interrupts can be introduced.

By varying both the delays between repeated attempts to interrupt on one machine, and the amount of time spent in the interrupt routine on the other machine (at device priority), the symptoms of the problem can be produced. We therefore believe that this problem is not connected with the process of using the device to do DMAs, but rather with the timing of the interrupt handling versus the interrupt production.

We have modified our software to avoid, whenever possible, the situation of having a machine interrupt its partner machine very soon after the partner machine's DR11-W is already "interrupting".

2.0 RESET PROBLEM

Since we could not be sure of avoiding the above problem, we decided also to use the maintenance mode bit reset mechanism whenever we had to restart our protocol on the link. In doing this we discovered the following feature.
If the IE bit on the device is set immediately after, or at the same time as, clearing the MAINT bit, then the device does not remain interrupt enabled: the IE bit 'fades' away. A delay of at least 5 instructions must occur after clearing the MAINT bit before attempting to set the IE bit.

This feature can easily be demonstrated on a single machine with a test program which sets the MAINT mode bit, clears the CSR, waits for some number of instructions before setting the IE bit, and then reads back the CSR both immediately and again after some long time delay.
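The test program just described might be sketched in C roughly as follows. This is an illustration under stated assumptions rather than the program we actually ran: the names probe_reset, spin, reset_probe and ie_survived are hypothetical, the MAINT bit position is an assumption to be checked against the DR11-W manual, and the delay is counted in loop iterations, which only approximates a count of instructions. The caller supplies the CSR address.

```c
#include <stdint.h>

#define DR_IE    0000100u   /* octal 100, as used in this note */
#define DR_MAINT 0010000u   /* ASSUMED position of the MAINT bit (bit 12) */

/* The two CSR readings taken after IE is set. */
struct reset_probe {
    uint16_t csr_immediate;  /* read back straight after setting IE */
    uint16_t csr_later;      /* read back after a long delay */
};

/* Crude busy-wait. The count is loop iterations, not machine instructions. */
static void spin(unsigned long n)
{
    for (volatile unsigned long i = 0; i < n; i++) {
        /* nothing */
    }
}

/* Set MAINT, clear the CSR, wait `gap` iterations, set IE, then read the
 * CSR immediately and again after `settle` iterations. */
static struct reset_probe probe_reset(volatile uint16_t *csr,
                                      unsigned long gap,
                                      unsigned long settle)
{
    struct reset_probe r;

    *csr = DR_MAINT;          /* set the maintenance mode bit */
    *csr = 0;                 /* clear the CSR, which also clears MAINT */
    spin(gap);                /* the delay under test */
    *csr = DR_IE;             /* attempt to enable interrupts */
    r.csr_immediate = *csr;
    spin(settle);
    r.csr_later = *csr;
    return r;
}

/* Non-zero if IE is still set in the delayed reading. */
static int ie_survived(struct reset_probe r)
{
    return (r.csr_later & DR_IE) != 0;
}
```

Running probe_reset with gap stepped upward from zero, and noting the smallest value for which ie_survived returns non-zero, would give a rough measure of the delay needed on a particular machine. Pointed at an ordinary memory word instead of the device, the routine always reports that IE survived, which is a convenient check that the test logic itself is sound.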