Jumping back on board the digital train - part II
When I began writing real-time code for the 68k, the price of entry into the development game was rather high. My first project was a tachometer processor which my boss was applying for a patent on: US patent 4924420. In order to develop this we purchased a hardware emulator which became my pride and joy - it cost a sum equivalent to around two months of my salary at that time. This was a considerably more expensive solution than the other commonly used development technique in those days, EPROM emulators. It did, though, provide a much faster development path by giving a window into the interior of the CPU, as well as a history in its trace buffer of everything it had done. During the course of this and subsequent projects which also used the 68k (later we added a 68020 too) I became a confirmed devotee of the architecture. One of my nicknames in the company was 'the cycle stealer' - if someone had some 68k code that wasn't running fast enough, I'd find some way to get it going faster.
In subsequent jobs, I never quite got 'down to the metal' again with any architecture in the same manner, though I did program some PICs. I couldn't say I loved the instruction set of any of those, though, with the passion I found for the 68k.
All this preamble is designed to set the context for taking up embedded system development again, this time as a hobbyist after over ten years without a single assembler byte being written. I've been seeking ways to regain those heady 68k days once again with a processor family that has a future. Given the huge overhead of learning a new architecture and tools environment I wanted to be totally sure I was making a considered decision, so I've put a lot of time and research into reaching my conclusion.
The first thing I noticed is that assembler code is really out of fashion nowadays. Almost everyone wants to program in C, and CPUs are being designed to be 'C compiler friendly' more than ever before. If you ask around you'll hear all the usual arguments against using assembler - lack of code readability, ease of making mistakes, no portability. 'Modern C compilers are so efficient that nobody codes in assembler any more' is what I've heard, time and time again. I'm not going to get into that debate here - I just have a preference for assembler myself, as a hardware engineer. I might even use some C for the bits that'll be painful to write in assembler, such as anything user-interface related. Those aren't the fun parts for me though; I guess I can't shake off my 'cycle stealing' heritage. To my mind C code is ultimately more wasteful of resources than assembler, and it fails to grab me in the elegance stakes. In the end, anyone concerned with wringing the most out of a system is going to be fighting the compiler as part of that process.
The more I've thought about it, the more the arguments for using 'C' sound like arguments from 'suits' who take the fun out of things by trying to 'professionalize' methodologies. If respectable programmers use C then I for one will be disrespectable. I take some solace from the story that the original Mac was by and large hand-coded in assembler, even though much of the code was lifted from the Pascal used for the earlier (and unsuccessful) Lisa. Apparently they gained a 50% reduction in code size, and I'm sure there was also a commensurate speed-up.

Comments
Great exposition of your past adventures in CPU programming - can't wait to hear where you're going with this & your journey getting there. Keep it coming.
Posted 30th April 2011 at 11:57 AM by jkeny -
Hehehe, it takes me back to my Amiga programming days, when I was slowly rewriting all of the standard OS commands in assembler. Typically programs that were, say, 12KB would come down to 1KB or less, which really made a difference when you were running off 800K floppy disks.
I'll attach one of my pieces of code for nostalgia purposes.
Note that this is a rather obscure bit of code. I had an Amiga 1000 that was upgraded with a Phoenix board (a replacement motherboard for the original A1000). One of the things it had was a coprocessor socket for a 68881. The 68000 did not have the necessary control logic built in to be able to write "normal 68881 assembler". The Amiga's math library, however, could use a 68881 co-processor if one was present. It did work, and it did make my application twice as fast, but I wasn't happy with that, so I disassembled the math library to see what it was doing.
Basically it simply stuffed the registers, put something in a place that started the coprocessor, checked whether it had finished and then cleared the registers... and it did this for EVERY calculation. This was insane, as the coprocessor had a number of registers of its own, so I wrote the following code to control the coprocessor efficiently. As a result my program went from twice as fast with the co-processor to 7 times as fast with the co-processor.
Anyway, it's been too long since I did 68K, so I'm having some trouble comprehending what this code did (even though I wrote it AND commented it). I'll leave it to you to see if you can figure out what it was calculating.
                SECTION Iterate_Section,code
                XREF    _MathIeeeDoubBasBase
                XDEF    @resetfpu
                XDEF    @iterate
K equ 199
response equ $00ee0000
command equ $00ee000a
control equ $00ee0002
operand equ $00ee0010
restore equ $00ee0006
fadd0t0 equ $0022 ;multiply fp0 by two
fadd5t0 equ $1422 ;add fp5 to fp0
fadd6t0 equ $1822 ;add fp6 to fp0
fadd7t0 equ $1c22 ;add fp7 to fp0
fsub1f0 equ $0428 ;sub fp1 from fp0
fsub5f0 equ $1428 ;sub fp5 from fp0
fmul0b3 equ $0c23 ;multiply fp0 by fp3
fmul4b4 equ $1223 ;square value in fp4
fmul5b5 equ $16a3 ;square value in fp5
fcmp1t0 equ $0438 ;compare fp1 to fp0;
fmsrtdn equ $c800 ;move the fpu sr to a data reg
fm0t2 equ $0100
fm0t3 equ $0180
fm0t4 equ $0200
fm0t5 equ $0280
fm2t0 equ $0800
fm2t4 equ $0a00
fm3t5 equ $0e80
fm4t0 equ $1000
fmdt6 equ $5700
fmdt7 equ $5780
fmzt2 equ $5d0f ;move rom zero from rom to fpn
fmzt3 equ $5d8f
fmzt4 equ $5e0f
fmzt5 equ $5e8f
fm100t1 equ $5cb4 ;move rom 100 to fp1
fmlf0 equ $6000
;**********************************
jsrlib          MACRO
                XREF    _LVO\1
                jsr     _LVO\1(A6)
                ENDM
;**********************************
testresp        MACRO
testresp\@      tst.w   response
                bmi.s   testresp\@
                ENDM
;**********************************
fpn             MACRO
                move.w  #\1,command
                testresp
                ENDM
;**********************************
fpo             MACRO
                move.w  #\1,command
                jsr     chkfpu
                ENDM
;**********************************
;_iterate       move.l  4(sp),a0        ;get pointer to first arg;
;               move.l  8(sp),a1        ;get pointer to second arg;
;               movem.l d6,-(sp)        ;save all our registers
@resetfpu       move.w  #0,restore      ;initialise the coprocessor
                move.w  #0,restore      ;by restoring a null state frame
                rts
@iterate        fpo     fmdt6
                move.l  (a0)+,operand   ;move first part of X coord
                move.l  (a0),operand    ;move second part of X coord
;               testresp
                fpo     fmdt7
                move.l  (a1)+,operand   ;move first part of Y coord
                move.l  (a1),operand    ;move second part of Y coord
                testresp
                fpn     fmzt2           ;put a zero in fp2 for a
                fpn     fmzt3           ;put a zero in fp3 for b
                fpn     fmzt4           ;put a zero in fp4 for a1
                fpn     fmzt5           ;put a zero in fp5 for b1
                move.w  #K,d1           ;want to iterate k + 1 times
ILoop1          fpn     fm2t0           ;put a val in fp0
                fpn     fadd0t0         ;multiply it by two
                fpn     fmul0b3         ;multiply it by b
                fpn     fadd7t0         ;add the y value
                fpn     fm0t3           ;save the new b value
                fpn     fm4t0           ;put a1 value in fp0
                fpn     fsub5f0         ;subtract b1 from it
                fpn     fadd6t0         ;add the x value
                fpn     fm0t2           ;save the new a value
                fpn     fm2t4           ;a1 = a * a so move a to a1
                fpn     fmul4b4         ;and square it
                fpn     fm3t5           ;b1 = b * b so move b to b1
                fpn     fmul5b5         ;and square it
                fpn     fm4t0           ;want a1 in fp0
                fpn     fadd5t0         ;add b1 to it
                fpn     fm100t1         ;put the limit (rom constant 100) in fp1
                fpn     fsub1f0         ;see if the limit is still bigger
                fpo     fmlf0           ;move the status register to
                move.l  operand,d0      ;somewhere we can read it
;               testresp
                tst.l   d0
                dbpl    d1,ILoop1
end_iterate     move.w  #K,d0
                sub.w   d1,d0
;               movem.l (sp)+,d6
                rts
chkfpu          move.w  response,d0
                btst    #12,d0
                beq.s   chkfpu1
                btst    #11,d0
                beq.s   chkfpu_exit
chkfpu1         tst.w   d0
                bmi.s   chkfpu2
                cmp.w   #4900,d0
                beq.s   chkfpu
                cmp.w   #0900,d0
                ble.s   chkfpu
                move.w  #2,control
;               moveq   #1,d0
                bra.s   chkfpu_exit
chkfpu2         cmp.w   #$8900,d0
                beq.s   chkfpu
                cmp.w   #$c900,d0
                beq.s   chkfpu
;chkfpu3        moveq   #0,d0
chkfpu_exit     rts
                END
Tony.
Posted 6th May 2011 at 12:24 PM by wintermute -
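For anyone who doesn't read 68k macros, the fpn/testresp pair in the listing above boils down to "write a command word to a memory-mapped register, then spin until the top bit of the response register clears". A minimal C sketch of that pattern follows. The register offsets are taken from the equ table in the listing; the struct, the placeholder field names (unused_04 and so on) and the function name fpu_issue are hypothetical and only there for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Register block laid out to match the equ table in the listing
   (base $00ee0000 there). The unused_* fields are placeholders for
   offsets the listing never touches; they are not names from the post. */
typedef struct {
    volatile uint16_t response;   /* +$00 */
    volatile uint16_t control;    /* +$02 */
    volatile uint16_t unused_04;  /* +$04 */
    volatile uint16_t restore;    /* +$06 */
    volatile uint16_t unused_08;  /* +$08 */
    volatile uint16_t command;    /* +$0a */
    volatile uint16_t unused_0c;  /* +$0c */
    volatile uint16_t unused_0e;  /* +$0e */
    volatile uint32_t operand;    /* +$10 */
} fpu_regs;

/* Rough equivalent of the fpn macro: write the command word, then
   busy-wait while bit 15 of the response register is set, which is
   what the tst.w / bmi.s loop in testresp does. */
static void fpu_issue(fpu_regs *fpu, uint16_t cmd)
{
    fpu->command = cmd;
    while (fpu->response & 0x8000u) {
        /* spin until the coprocessor stops reporting "busy" */
    }
}
```

On the real hardware the pointer would presumably be the fixed address from the listing, cast to fpu_regs *. On a desktop machine the function can only be exercised against an ordinary struct standing in for the device.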
And I think this was the same code from when I got an Amiga 3000 with a 68882 co-processor that I could actually code for in assembler! Somewhat simpler:
                section text,code
                XREF    _MathIeeeDoubBasBase
                XDEF    @iterate
                XDEF    @resetfpu
K               equ     255
M               equ     10
;@iterate:      move.l  4(sp),a0        ;get pointer to first arg;
;               move.l  8(sp),a1        ;get pointer to second arg;
;               movem.l d2-d6/a3-a6,-(sp) ;save all our registers
@iterate:       fmovem.x fp0-fp7,-(sp)
                fmove.d (a0),fp6
                fmove.d (a1),fp7
                fmovecr.x #15,fp2       ;put a zero in fp2 for a
                fmovecr.x #15,fp3       ;put a zero in fp3 for b
                fmovecr.x #15,fp4       ;put a zero in fp4 for a1
                fmovecr.x #15,fp5       ;put a zero in fp5 for b1
                fmove.l #M,fp1
                move.w  #K,d1           ;want to iterate k + 1 times
ILoop1:         fsub.x  fp5,fp4         ;subtract b1 from a1
                fmul.x  fp2,fp3         ;multiply b by fp2 (doubled a, see below)
                fadd.x  fp6,fp4         ;add the x value: fp4 is the new a
                fadd.x  fp7,fp3         ;add the y value: fp3 is the new b
                fmove.x fp4,fp2         ;keep a copy of a in fp2
                fmul.x  fp4,fp4         ;a1 = a * a
                fmove.x fp3,fp5         ;b1 = b * b so move b to b1
                fmul.x  fp5,fp5         ;and square it
                fmove.x fp4,fp0         ;want a1 in fp0
                fadd.x  fp2,fp2         ;multiply a by 2 for the next pass
                fadd.x  fp5,fp0         ;add b1 to it
                fcmp.x  fp1,fp0         ;compare it with M
                fdboge  d1,ILoop1
end_iterate:    move.w  #K,d0
                tst.w   d1
                bpl     skip
                moveq   #0,d1
skip:           sub.w   d1,d0
;               movem.l (sp)+,d2-d6/a3-a6
                fmovem.x (sp)+,fp0-fp7
                rts
@resetfpu:      rts
                END
Tony.
Posted 6th May 2011 at 01:43 PM by wintermute -
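Read as higher-level code, both listings above appear to run the same loop: start a and b at zero, repeatedly replace b with 2ab + y and a with a*a - b*b + x, and stop once a*a + b*b reaches a limit or the iteration count runs out. That looks like the familiar escape-time iteration z -> z*z + c, but this is an inference from the instruction sequence rather than something stated in the post. A rough C sketch under that assumption follows. The function name and parameters are hypothetical, and the bookkeeping of the returned count is simplified compared with the dbpl/fdboge loops.

```c
/* Escape-time iteration as read off the two assembler listings.
   x, y      : the point being tested (fp6, fp7 in the listings)
   max_iter  : iteration cap (the role played by K)
   limit     : escape threshold for a*a + b*b (100 in the first
               listing, M = 10 in the second)
   Returns the number of passes made before the threshold was reached,
   or max_iter if it never was. */
static int iterate(double x, double y, int max_iter, double limit)
{
    double a = 0.0, b = 0.0;     /* fp2, fp3 */
    double a2 = 0.0, b2 = 0.0;   /* fp4, fp5: squares from the last pass */
    int n = 0;

    while (n < max_iter) {
        b = 2.0 * a * b + y;     /* uses the old a, as the listings do */
        a = a2 - b2 + x;
        a2 = a * a;
        b2 = b * b;
        n++;
        if (a2 + b2 >= limit) {
            break;               /* escaped */
        }
    }
    return n;
}
```

Keeping the squares a2 and b2 from the previous pass is the same trick the assembler uses: each pass needs only three multiplies, because a*a and b*b are computed once and reused both for the escape test and for the next value of a.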
So I wonder how they were doing the hardware interface from the 68k to the 68881/2? I remember those fmove instructions were called 'F-line', and if there was no co-pro present they generated an undefined instruction exception, which allowed for software emulation. I think the 68881 could manage somewhat less than half a megaflop... Yet the 68k took something like 4.2 µs to do a 16x16 multiply, and it wasn't a fixed time either; it depended on the bit patterns in the operands...
Posted 8th May 2011 at 02:30 PM by abraxalito -
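On the data-dependent multiply time: the figure usually quoted from Motorola's 68000 timing tables for a register-to-register MULU is 38 + 2n clocks, where n is the number of set bits in the 16-bit source operand (effective-address time comes on top for memory operands). The small C sketch below just evaluates that formula. The function names are made up for illustration, and the clock frequency is a parameter because the post doesn't say what the board ran at.

```c
#include <stdint.h>

/* Count the set bits in a 16-bit value. */
static int popcount16(uint16_t v)
{
    int n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Clock cycles for a 68000 MULU with a register source operand,
   using the commonly quoted 38 + 2n rule (n = set bits in source).
   Effective-address time for memory operands is not included. */
static int mulu_cycles(uint16_t src)
{
    return 38 + 2 * popcount16(src);
}

/* Convert a cycle count to microseconds at a given clock in MHz. */
static double cycles_to_us(int cycles, double clock_mhz)
{
    return cycles / clock_mhz;
}
```

By this rule the cost ranges from 38 clocks (source of zero) to 70 clocks (source of $FFFF), so the worst case is nearly twice the best case.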
Now it is coming back to me: the 68000 didn't have any commands to access the co-processor's registers, or to tell it to do anything. I think the fmove instruction was introduced in the 68020 and up. I could go and dig up my 68K assembler book.
Oh, and I know what the code did, I'm just a bit rusty on the actual instructions... It was used to create my avatar...
Tony.
Posted 8th May 2011 at 09:20 PM by wintermute
Updated 8th May 2011 at 09:22 PM by wintermute -
Posted 9th May 2011 at 12:05 AM by abraxalito