Dbug's blog - Mind Bender

2020

Sat 15th August 2020

Oric minigames Procedural image Changelog Day One (2020-05-28)Day Two (2020-05-29)Day Three (2020-05-30)Implementation notes

Size coding is an interesting challenge: After spending a day at work using a computer that counts memory in gigabytes and storage in terabytes, it's refreshing to go back to basics.

Back in the days, people HAD to write compact and efficient code, simply because the hardware would not allow for anything else.

These days it's more like an intellectual puzzle game, which I find deeply satisfying.

Mind Bender, my entry to Sommarhack 2020's 256 bytes intro competition is not my first attempt at size coding.

Oric minigames

If you've followed my productions over time you probably already know three of my Oric minigames:

4K Kong

4K Kong
Cyclotron
3K FreeCell

The three games all have in common the fact that they use quite a lot more memory than their claimed size:

They are all stored on disk as compressed executables, and decompressed in memory before they run.

Additionally, they are all written for the 6502 CPU, which may not be very advanced in term of supported operations, but has the benefit of using 8 bit opcodes, resulting in quite compact code compared to the 68000 cpu which supports a rich instruction set and quite advanced addressing modes, but requires 16 bit opcodes for each instruction.

Cyclotron

The main issue in using a compressor, is that the decompressor itself takes room, and unfortunately on most platform this is not supported by the operating system, so the decompressor has to be part of your executable - uncompressed -.

If your target size is quite large (all things considered), then you can definitely pay the cost of having the depacker, but when targeting very small sizes, the trade off become very hard to justify.

If you have to drop the idea of using compressed executables, then the set of tricks you can use changes drastically!

3K FreeCell

A common source of size executable is loop unrolling: To offset the cpu cost of testing "are we there yet", it is common to do more than one operation on each loop, so for example instead of testing the end of the buffer to recopy every byte, you can do it every two bytes, 4 bytes, 8 bytes, or even more if you really want to maximize the performance.

Obviously this "copy pasted"¹ code takes room, but if your executable is compressed, things like sequences of "move.l (a0)+,(a1)+" will be encoded in just a few bytes.

When using compressed executables, size coding become a game of "making patterns that can be repeated efficiently".

Procedural image

The "Mind Bender" idea started when Wietze contacted me on IRC to ask me if I would participated with a remote entry for Sommarhack 2020.

The main topic was "medium resolution", which was kind of inspiring to me since I've done trickery in both Low and High resolution, but not quite anything in Med rez, so I said yes.

Then some time later there started to be people playing with bootsectors, like in the good old days, and somebody suggested that if there was enough entries then they could add a 256 bytes intro category to Sommarhack... and since I was starting to get a bit short on time, I decided to try that first, and if I still had time, then eventually the Med Rez challenge.

Then at about the same time, Patapom shared this picture on Facebook, with the following comment

"TIEEEEEEENS ! CERVEAU DE MERDE !!!"
(Take that, shitty brain!!!)

Patapom inspiration

It's one of these typical pictures that seems to move despite being totally static, and I wondered if it was possible to redo in 256 bytes, because after all it's just a chessboard with an alternate set of colors and some small flowery patterns at the intersection between tiles.

How hard can that be?

Changelog

Looking at the change log, it actually took about three days.

I basically followed by usual process, which starts by trying to get something running at all, even if it's ugly, slow, too big, etc... It's pointless to spend time optimizing at the start if you don't even have a rough idea of what a basic implementation would look like!

Then, when you have something running, look at it in detail (in this case, by tracing the code line by line in the debugger) to get a better idea of what is actually taking room or could be differently.

So, let's see how the process day by day

Day One (2020-05-28)

1207 bytes - first working version of the effect with the visible distortion
1051 bytes - removed setup and debug infos
930 bytes - can't quit anymore, no restore, no keypress
482 bytes - not sure what I changed...
478 bytes - replaced supexec by super
463 bytes - not clearing IMRA and IMRB anymore
331 bytes - Replaced the nops by a dbra
325 bytes - Removed one useless color change
296 bytes - Replaced the color mapping table by a rotated bitfield

So, the first implementation took 1207 bytes, which is quite far from the 256 bytes target!

Here is what it looks like:

First prototype

It's not particularly awesome, the black bars are horrible, the dots are truncated and not centered, but the visual effect kind of work, so there's still some hope :D

Looking at the code history, the drop from 1207 bytes to 1051 bytes was caused by removing some of the uncessary system setup and restore code which I reused from a previous intro, as well as disabling debug information (symbols) from the final executable file format.

To drop from 1051 to 482 I introduced conditional assembly to switch between "development" and "release" modes: In the final version it was not necessary to be able to properly handle the keyboard and cleanly return to the operating system, so in "final release" mode I simply skipped all this part!

Simple gain really :)

The drop to 478 bytes was calling an alternative function of the operating system to switch to Supervisor mode²which did the same thing in less bytes.


 ifne development_mode
	 ; Development mode: We call the main routine in supervisor mode 
	 ; and when we come back, we return to the caller 
	 move.l #super_main,-(sp)
	 move.w #$26,-(sp)         ; XBIOS: SUPEXEC
	 trap #14
	 addq.w #6,sp

	 clr.w -(sp)               ; GEMDOS: PTERM(0)
	 trap #1
 else
	 ; Final Release: We switch to supervisor, but never come back 
	 ; so no need to correct the stack pointer
	 clr.l -(sp)
	 move.w #$20,-(sp)  ; GEMDOS: SUPER
	 trap #1
 endc

To achieve 463 bytes I stopped setting the values of the MFP IMRA and IMRB registers (since I was not relying on any timer).

To go down to 331 bytes, I replaced a sequences of nops by a timed loop of equivalent duration:


 rept 67
 nop
 endr


 moveq #22-1,d0
delay
 dbra d0,delay

And finally to go down to 296 bytes, I replaced a small table of bytes used to define the alternate color patterns of dots from


 SECTION DATA
	
colors
 dc.b 0,1,1,0,1,0,0,1,0,1,1,0,1,0,0,1
 dc.b 0,1,1,0,1,0,0,1,0,1,1,0,1,0,0,1
 (...)
 lea colors,a3
 add d1,a3
 (...)
 move.b (a3)+,d2


 move.l #%0110100101101001,d3

 move.l d3,d4
 lsr.l #1,d3
 (...)
 lsr.l #1,d4
 addx d2,d2

This basically achieve the same result of having a list of values that change³.

Day Two (2020-05-29)

284 bytes - Replaced the blocks 2 and 5 nops by single instructions taking the same time
282 bytes - Replaced a move.b #0 by a sf
265 bytes - Changed the generation of the bitmap pattern
260 bytes - Replaced a few move.w #0, by clr.w
258 bytes - Accessed $ff8240 through an address register
256 bytes - Played with the initial temporisation code

To go down from 296 to 284 bytes, I basically replaced the remaining blocks of nops by neutral instructions that took the same amount of time to execute but using less room in memory, such as


 lsl.l #6,d3

which takes 20 clock cycles so is the equivalent of


 nop
 nop
 nop
 nop
 nop

where each of the 5 nop take 4 cycle to execute.

Obviously you better have to be sure that d3 can be modified at this place in the code!

Regarding the move.b #0 and move.w #0, each of these takes 4 bytes: 2 bytes for the move instruction opcode, and 2 bytes for the "0" immediate payload, which neither SF (Set False) or CLR require - easy gains.

At this point, we are down to 256 bytes (from the original 1207) and it still look exactly the same.

256 bytes version

So the big question is: Now that we have something that technically fits the constraints and could be sent to the Sommarhack people, is it possible to reduce even more the size so we can make it prettier?

Let's try!

Day Three (2020-05-30)

244 bytes - Used the default screen instead of setting up mine
240 bytes - Changed the pattern generation to use bitplans offsets
238 bytes - Optimize the color setting for the bitmaps
254 bytes - Added a text and some fancy colors on the top
252 bytes - Reduced a constant value from .l to .w
250 bytes - Removed a sf $ffff8209.w which I did not need anymore
240 bytes - Reorganized the post video sync delay loop

The big win was achieve on day 3, when I decided to use the screen information setup in the system instead of defining my own screen.

A typical screen setup would be something either involving a call to a XBIOS function, or manual setup of screen registers:


 ; Set the screen on the buffer
 move.l #screen,d0
 sf.b d0
 move.l d0,a0
 lsr.l #8,d0
 move.b d0,$ffff8203.w
 lsr.l #8,d0
 move.b d0,$ffff8201.w

 (...)

 SECTION BSS

save_screen_addr_1 	ds.b 1
save_screen_addr_2 	ds.b 1
save_vbaselo 		ds.b 1

 even 

screen              ds.l 32000/4

while in practice we can just read the current screen address in page 4 system's variables:


 move.l $44e.w,a0            ; screenpt

The main issue, is that you inherit of the screen as it was, which means in our case, a big ugly green image with a title bar with the name of our executable!

So the question become: How can I efficiently erase the screen?
Obviously we can just use brute force and some code to loop around to erase the 32000 bytes we found by reading $44e, but I decided to go for something a little bit fancy, and instead use the print function of the Gemdos to display a text!


 ;
 ; Print the Dbug title
 ;
 pea Message(pc)
 move.w #9,-(sp)
 trap #1

 (...)

Message
 dc.b 27,"b",3      ; Set 1 as Ink color
 dc.b 27,"E"        ; Erase Screen
 dc.b "Dbug"
 dc.b 0

In case you did not know, the Atari ST system print function supports VT52 control codes, which means a single print out can do things like changing the color of the text, of the background, enable inverted video, move the cursor, etc... so I used that to select the color of my "Dbug" text while also deleting the entire image, just in a few additional bytes.

The rest of the optimisation was just some small things, like replacing the setting of two colors from two instructions to one, from


 move.w #$707,$ffff8242.w    ; The Purple crosses
 move.w #$777,$ffff8246.w    ; The White crosses


 move.l #$07770707,$ffff8242.w  ; The White and Purple crosses

And the rest is some small tweaks here and there, like playing with the color registers during the display to generate some gradient colors which are more satisfying to the eye.

Anyway, here is the final result:

The image generated by my 256 bytes program

Implementation notes

If you don't know the Atari ST, it may not be obvious what I did.

The image generated by my program appears to occupy the entire screen, including the borders, which normally cannot contain any graphics and only show the content of the first entry of the color palette (register 0 at address $ffff8240).

The normal display area is 320x200, which is much smaller than the actual viewable area - in this case 384x270 but it's more or less variable depending of the machines and screens -.

You can visualize the two areas on this annotated screenshot:

Image layout in detail

If you look at the inner part of the image, you will notice that the "Dbug @ Sommarhack 2020" text, as well as all the small white and purple patterns are inside the 320x200 rectangle: The reason is that I'm not doing any of the usual "fullscreen/overscan" trickery normally used to coerce and Atari ST in displaying actual bitmap in the borders.

The reason why you can see the green chessboard tiles over all the screen, is that it's not made of bitmap, it's only done by changing the background color register during the entire display:

change color 0 to dark green
wait
wait
wait
change color 0 to light green
wait
wait
wait
rince and repeat

if you do that often enough, and using the right timings, you can easily do vertical bands of alternative colors, and if every "n" lines, you add a few more additional delays, you can shift your entire pattern of column by one, resulting in the chessboard pattern you are seeing.

The nice thing about the Atari ST, is that the timings of the screen and instructions is 100% reliable and predictable, which makes such a code relatively easy to do, so here are some numbers for you:

The CPU runs at 8mhz (8 millions cycles per second)
At 50hz (screen refresh) we have 160256 cycles per frame (roughly 8mz/50hz)
These 160256 cycles are spread of 313 scanlines, with 512 cycles per scanline

When you know these numbers, and when you know how long each instruction take, it's quite easy to synchronize the color changes with the display and give the impression that things happen in a certain way :)

Anyway, here is the complete source code of what is actually assembled in the final release⁴.

I hope that was interesting for you, if you have questions don't hesitate to ask :)



 opt o+,w+

 SECTION TEXT
  
; ------------------
;   Program start
; ------------------
ProgStart 
 clr.l -(sp)
 move.w #$20,-(sp)  ; GEMDOS: SUPER
 trap #1
	
super_main
 lea $ffff8240.w,a6
 move.l #$07770727,$ffff8242.w  ; The White and Purple crosses

 ; Print the Dbug title
 pea Message(pc)
 move.w #9,-(sp)
 trap #1

 ; Draw the cross patterns
 move.l $44e.w,a0            ; screenpt
 lea 160*5(a0),a0 

 move.w #%0110100101101001,d3

 moveq #7-1,d1
loop_y 
 move.l a0,a1

 move.l d3,d4
 lsr.l #1,d3

 moveq #10-1,d0
loop_x 
 move.w d4,d2
 lsr.l #1,d4
 and #2,d2

 lea 0(a1,d2.w),a2

 lea Patterns(pc),a3
 moveq #10-1,d5
loop_pattern 
 move.w (a3)+,(a2)
 lea 160(a2),a2
 dbra d5,loop_pattern

 lea 16(a1),a1

 dbra d0,loop_x 
 lea 160*30(a0),a0
 dbra d1,loop_y

 ; VSync and disable interrupts
 move.w #37,-(sp)       ; VSYNC
 trap #14

 ; Main loop
 move.w #$2700,sr
loop
 move.w #$345,(a6)      ; Light blue-gray background
 
 move.w #$370,d6        ; Color of first tile
 move.w #$263,d7        ; Color of second tile
 move.w #$fff,8(a6)


 ; Line zero synchronisation
 moveq #14,d2
.wait_sync
 move.b $ffff8209.w,d0
 beq.s .wait_sync
 sub.b d0,d2
 lsl.b d2,d0

 ; To leave room for the "Dbug" line
 ; 64 nops*8 = 
 move.w #157-1,d0
delay
 subq.w #1,8(a6)
 dbra d0,delay

 ; The alternated color grid
 moveq #10-1,d1 
loop_lines

 move.w #242-1,d0
loop_squares
 move.w d6,(a6)                ; 8/2   background color change

 lsl.l #8,d3        		   ; 24/6 (DELAY)

 move.w d7,(a6)                ; 8/2    background color change

 move.w (a6),(a6)              ; (DELAY)

 dbra d0,loop_squares          ; 12/3 if branches / 16/4 if not taken

 addq #1,d6
 nop

 dbra d1,loop_lines 
 bra.s loop

Patterns
 dc.w %0000000001100000
 dc.w %0000000001100000
 dc.w %0000000011110000
 dc.w %0000001111111100
 dc.w %0000111111111111
 dc.w %0000111111111111
 dc.w %0000001111111100
 dc.w %0000000011110000
 dc.w %0000000001100000
 dc.w %0000000001100000

Message
 dc.b 27,"b",4      ; Set 4 as Ink color
 dc.b 27,"E"        ; Erase Screen
 dc.b "Dbug @ Sommarhack 2020"
 dc.b 0
 
 end

1. In practice we tend to use macro functionalities in our assemblers, like REPT-ENDR in Devpac↩
2. On the Atari ST, you are not allowed to access hardware registers and system variables as a simple user, you need to escalate to supervisor mode↩
3. I originally choose to have a table because I was not certain I only have two values there↩
4. I removed the system saving/restoring code but you can see it on the SVN repository ↩