Thanks to a correct alignment of planets I ended up with a four-day-long weekend.
I had plenty of things to do, but I decided instead to spend the time looking at how one of my favorite Atari demo screens was done.
The Carebears
If people had to pick a single demo to represent the early Atari ST demos, many would go with the Cuddly Demos by The Carebears (TCB) from The Union.
It has a game-inspired main menu where you can select individual demo screens, an animated loader and decruncher, and finally a reset demo.
Technically, this demo also had many people scratching their heads wondering how some of the effects were made; not everybody knew about fullscreen and sync scrolling, and this demo has quite a lot of both.
Now of course this demo is far from perfect, if only because of the large amount of unoriginal content, such as music and character sets ripped out of games or font collections.
Also of note: The Carebears members were responsible for some of the most technically accomplished games on the Atari, such as Enchanted Lands[1].
The 3D Doc demo
Interestingly, my favorite demo screen in the Cuddly Demos was not the most technical one; for some reason I can't really explain, I found the 3D Doc demo screen mesmerizing.
If you look at it with your 2016 goggles you will notice that there have been many more demos since then with better raster effects, more advanced scrollers, more sprites, etc... but as a whole it's a simple, satisfying experience.
I was chatting on #atari.fr and we ended up talking about this demo screen. Somebody linked to the video above, and the thing that came to my mind was that it would be cool to fix the few issues this demo has: mostly the fact that the rasters are not stable, but also that the graphics are clipped at the border, which looks weird because the rasters continue on both sides.
One thing leading to another I ended up looking at the problem!
Atari ST demos 101
The first thing you notice when you put a floppy containing an Atari ST demo in your drive is that there are no files on the disk, and no folders either. Sometimes all you see is garbage that makes the operating system crash or corrupts the screen!
The main reason is that demos try to use the hardware to the maximum: if you rely on the operating system, you are spending memory and CPU time you could use for yourself, so most demos just reimplement all the low-level functions they need, including disk reading routines[2].
This looks like a large amount of work, but it also makes it possible to use more efficient storage on the disk, decompress data on the fly, and protect the content against disassembly.
The main consequence is that extracting just the one demo screen you want to look at is not an easy feat: you need to find the location of the file on disk, which means locating the loader, which often requires you to crack a protection whose complexity is routinely equivalent to that of the most hardened games.
Fortunately, several groups[3] had in the past released versions of the demo converted to normal executable files[4].
I finally managed to get the DOC_DEMO.PRG I was looking for.
Actual size: 24576 bytes
A quick look in a hexadecimal editor and my suspicions are confirmed: the file is compressed.
Fortunately, we have all the tools we need.
A few seconds later I have the uncompressed executable.
Actual size: 49934 bytes
Still tiny, but now the size makes sense considering the content of the demo.
Time to look inside!
Tools of the trade
There are quite a few ways to look inside a program:
- Use some ripping software or a hardware cartridge to stop the program while it runs
- Trace the code with a debugger such as MonST, Adebug or Bugaboo
- Use a disassembler, ideally interactive, such as Easy Rider
Tracing the code is a good way to understand interactively how it works, though it comes with its own set of challenges: the code of a demo will most probably disable interrupts, play with palette and resolution changes, and so on, which may end up interfering with the debugger itself.
Disassemblers come in two kinds: interactive and non-interactive. The latter just generate an output from an input and there's not much you can do about it. The former allow you to add information to improve the results.
My favorite method is to use a combination of real-time debugging and interactive disassembly, because it's a pretty foolproof method:
- Disassemble the program (using Easy Rider 4)
- Save the disassembled result
- Reassemble to another executable
- Compare the results
If you get the same binary, you are good to go: it means there are no surprises in store.
If you get differences, you need to find out where they come from (a mix of relocatable and fixed-address areas in the code, or addressing modes and optimizations that the original assembler and yours handled differently) and fix them before you can continue.
When you finally get a working executable, you can start the proper investigation.
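The comparison step of that round trip is easy to automate. Here is a minimal sketch in Python (this is not something Easy Rider provides; the function names and the sample bytes are made up for the illustration) that reports where a reassembled image differs from the original:

```python
def first_differences(original: bytes, rebuilt: bytes, limit: int = 8):
    """Return up to `limit` (offset, original_byte, rebuilt_byte) mismatches."""
    diffs = []
    for offset, (a, b) in enumerate(zip(original, rebuilt)):
        if a != b:
            diffs.append((offset, a, b))
            if len(diffs) == limit:
                break
    return diffs


def same_binary(original: bytes, rebuilt: bytes) -> bool:
    """True only when both images have the same length and content."""
    return original == rebuilt


# Tiny in-memory demonstration instead of real files (hypothetical bytes).
original = bytes([0x60, 0x1A, 0x00, 0x00, 0x4E, 0x75])
rebuilt = bytes([0x60, 0x1A, 0x00, 0x02, 0x4E, 0x75])
print(same_binary(original, rebuilt))
for offset, a, b in first_differences(original, rebuilt):
    print(f"${offset:06X}: original ${a:02X}, rebuilt ${b:02X}")
```

In practice you would read both executables with `open(path, "rb").read()` and look up the reported offsets in the disassemble view.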
Easy Rider
I've been using Easy Rider for quite a while; it's not a perfect tool, but it's definitely usable for this purpose.
It's quite easy to use: load a binary file and let it analyze it, which takes from a few seconds to a few minutes depending on the size.
Afterwards you can look at the result, add information, edit parameters, save the result as assembler source code, or save the session so you can continue from where you stopped.
Quite practical really.
The main thing to get the hang of with Easy Rider is that the display has two modes: Disassemble and Reassemble (which you can select in the Option menu).
Both views are similar, but they show things differently:
- The disassemble view is equivalent to what you would see in a debugger, complete with the hexadecimal values on the left-hand side.
- The reassemble view is what you would get if you saved the source code so it can be reassembled, with some additional annotations for things such as system calls.
I do most of the work in reassemble mode, but I switch to disassemble mode when I'm in doubt about sections where what looks like code may be data, or the other way around.
Now that the program is loaded you can scroll to see it in its entirety, or export it directly to assembler format to check how it compiles.
The way I do things, I first look for things that are easy to identify such as text strings and hardware registers and I give these locations some easy to recognize names.
This will make some other places in the code suddenly display "ScrollTextMessage" instead of "L12345", indicating that the place in question is probably related to displaying the text message.
Use this new knowledge to give names to other anonymous labels, and before you know it you will have a much better view of the program without even having looked much.
Even better, if you manage to reassemble the program, you now have an executable with labels, which will make tracing the code much easier.
I don't know about you, but for me it's easier to read "ClearPalette" than "L0002".
Managing expectations
From just watching the demo, we can learn the following:
- The demo barely runs on a half-megabyte machine
- There is some free CPU time
- It uses a tune by Mad Max called "Bangkok (K?)nights"
- The main font is the well-known "Knight Hawks" font
- There is an 8-color mountain range in the background
- The sprites also use 8 colors
- There is a moving carpet style ground effect at the bottom
- There are at least two colors changed using rasters
The vertical placement and the movement of sprites over the elements tell us that there is probably more than one color palette, and the vibration on the left side is a sure indication that the rasters are drawn using some MFP timers.
The fact that it was memory constrained probably means that there was some ugly memory management with reuse of buffers, stuff copied here and there, etc... but also that it will probably not run cleanly from the desktop[5], so running it will probably crash our assembler and text editor.
Beware the dragons!
Code: Startup
So, let's start looking at the code.
The code starts with this routine, which may (or may not) have been added by the group that ripped the demo screen out of the main demo:
MOVE.W #$20,-(A7) ;SUPER
CLR.W $B0.W ; Trap #12 is used for attract mode
What this code does is simpler than it looks:
After switching to supervisor mode it copies the RelocateLoop code low in memory (here at $140.w, saved in A4), then it calls the code it just copied.
This will then copy the rest of the program from wherever it was loaded in memory to the hardcoded address $AC00.
Small note: The Cuddly Demos has an "attract mode" that can be enabled on startup by pressing some combination of keys; the demo will then automatically go through every single demo effect and run it for a few seconds. In this particular screen this is handled later in the main loop:
; Apparently for the attract mode:
; If enabled, instead of reading the keyboard the intro
; will automatically quit after 3600 frames (about 72 seconds)
TST.W $B0.L ; Trap #12 ???
Code: Initialization
After the relocation is done, control is handed over to the program, which proceeds with the initialization.
The sequence starts with the relocation of the music; this has to be done first because the area is reused later to store the generated data.
Note that the demo starts immediately after InitializeScreenAndIrq has been called: this starts the first scroll text message, and during that time the rest of the demo elements are generated.
By the time the end of the scroller is reached, the sprites have been scaled and preshifted, and the ground floor is ready.
Code: Music
Back in the day, most music in Atari intros was ripped out of games, which caused some problems: the music player was assembled with the game and generally did not use PC-relative code, which means that to run it you have to make sure it sits in memory at exactly the same location as it did originally.
Modern sceners who wanted to preserve this musical heritage have heavily hacked these music drivers to make them run at arbitrary locations, but that fancy option was not available when AN-COOL wrote his demo, so he had to do it the hard way by relocating the music to its original location, in this case the address $6E000.
Considering he was already struggling with memory, he had to make sure that all the memory buffers would fit before and after these 10 kilobytes of musical data :)
MOVEA.L #StartMusicDataToRelocate,A0 ; $14980
MOVEA.L #RelocatedMusicAdress,A2 ; $6E000
MOVEA.L #EndMusicData,A1 ; $16E60 (unused)
MOVE.W #MusicDataSize/2-1,D1 ; 4720 words (9440 bytes) to copy
FlagMusicEnabled DC.B $00 ; EC20
The CMPI check just looks at the content of memory: the binary value is actually the hexadecimal code for 'BRA 82(PC)', which is the first instruction of the music player.
This is used to detect whether the music was already relocated. If not, it copies it and also clears the original location so it can be used to store other stuff required by the intro (which is why this routine is the first thing called during the initialization sequence).
When the relocation is done, the music is initialized by just calling the start of the copied buffer.
Note that the A1 register is never used; that's probably a remnant of some code variant that ended up staying in the final binary (I found some other stuff like that lying around). Also, for some reason he tests twice whether the buffer contains the correct data.
If everything is fine, the flag is set to true (it is used by the IRQ routine to know whether it should play the music).
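To make the sequence easier to follow, here is a rough Python model of that logic: check for a signature at the destination, copy if needed, clear the source, then derive the flag. It is a sketch only; the toy addresses, the size and the signature bytes are invented for the example and are not the values used by the demo.

```python
def relocate_music(memory: bytearray, src: int, dst: int, size: int,
                   signature: bytes) -> bool:
    """Copy the player to `dst` unless its signature is already there.

    Returns True when the signature is present at `dst` afterwards,
    which plays the role of the music-enabled flag.
    """
    already_there = memory[dst:dst + len(signature)] == signature
    if not already_there:
        memory[dst:dst + size] = memory[src:src + size]
        # Clear the source area so it can be reused as a work buffer.
        memory[src:src + size] = bytes(size)
    return memory[dst:dst + len(signature)] == signature


# Toy memory map: addresses and signature are made up for the example.
SIGNATURE = bytes([0xDE, 0xAD, 0xBE, 0xEF])
memory = bytearray(64)
memory[8:16] = SIGNATURE + bytes([1, 2, 3, 4])
music_enabled = relocate_music(memory, src=8, dst=40, size=8,
                               signature=SIGNATURE)
```

Calling the function a second time leaves the destination untouched, which mirrors the "already relocated" case.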
Code: Sprites
I was wondering how the sprites were generated.
Were they present in all the different sizes in the executable?
As it happens, there is a single 32x32 sprite in memory, which is used to generate all the intermediate sizes (8 of these); each size is then duplicated in variants shifted from 1 to 15 pixels to allow fine placement all over the screen.
The code is interesting but certainly not the most optimal way to do it, which explains the duration of the black sequence on startup :)
So basically there is this table:
RescaleTables: ; 00C176
What it contains is information on which pixel to take from the source sprite to rebuild the new rescaled sprite.
The $20 values (32) are ignored and replaced by the background color.
If you look at the last entry, you get all the values from 0 to 31, which means that all the pixels are copied "as is" (which also means that the rescaling routine is used for this non-rescaled sprite as well!)
The first line has a bunch of 32 values on the left and right sides, and values like 0,2,4,5,7,9,11... which are more or less the pixels you need for a 50% resized version of the sprite.
The actual rescaling routine is very simple, and actually uses a GetPixel and a PutPixel routine!
; Using the rescaling tables, the sprite is analyzed 8 times,
; using a combination of GetPixel and PutPixel to rebuild the sprite
; at a different size in a temporary buffer
MOVE.W (A2)+,D1 ; Rescale Y Value
MOVE.W (A3)+,D0 ; Rescale X value
BSR GetSpritePixelColor ; Returns the color of the pixel(x,y) in d5
That's definitely simple, but performance-wise it would be far from optimal if you wanted to generate more sprite sizes.
Now that the sprites have been resized, they have to be preshifted, else they would be slow to display at non-aligned positions.
Since the sprites use 8 colors (3 bitplanes), they have to be masked, else they would erase parts of the scroller and of the background mountain graphics.
This is solved by the preshifting routine: it merges the three bitplanes and saves the merged result as a fourth value to be used as a mask.
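The table-driven idea can be modelled in a few lines of Python. This is only a sketch of the principle, not a translation of the demo's routine: I assume the same table is used for both axes, that the background is color 0, and I use a 4x4 sprite instead of 32x32 to keep the example readable.

```python
TRANSPARENT = 32  # sentinel index, as described for the $20 entries
BACKGROUND = 0    # assumed background color index


def rescale_sprite(sprite, table, background=BACKGROUND):
    """Rebuild a square sprite by looking up source coordinates in `table`.

    `sprite` is a list of rows of color indices; `table[i]` gives the source
    row/column for destination row/column `i`, or TRANSPARENT.
    """
    size = len(table)
    out = []
    for dest_y in range(size):
        src_y = table[dest_y]
        row = []
        for dest_x in range(size):
            src_x = table[dest_x]
            if src_y == TRANSPARENT or src_x == TRANSPARENT:
                row.append(background)            # "PutPixel" of background
            else:
                row.append(sprite[src_y][src_x])  # "GetPixel" then "PutPixel"
        out.append(row)
    return out


# 4x4 toy sprite with distinct values 1..16.
sprite = [[y * 4 + x + 1 for x in range(4)] for y in range(4)]
half = rescale_sprite(sprite, [TRANSPARENT, 0, 2, TRANSPARENT])
full = rescale_sprite(sprite, [0, 1, 2, 3])
```

With the identity table the output is the source sprite, which matches the observation that the full-size sprite goes through the same routine.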
MOVE.L A1,(A5)+ ; Store the address of the beginning of this sprite
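As an illustration of preshifting with a merged mask, here is a small Python sketch working on a single 16-pixel row. The data layout (one 16-bit word per bitplane, widened to 32 bits after the shift) and the way the mask is applied when drawing are my assumptions; the demo may store and use its mask differently.

```python
def preshift_row(planes, shift):
    """Shift one 16-pixel row of a sprite right by `shift` pixels.

    `planes` holds one 16-bit word per bitplane. The result is 32 bits wide
    per plane so that pixels pushed out of the first word are not lost.
    Returns (shifted_planes, mask), mask bits set where the sprite is opaque.
    """
    if not 0 <= shift <= 15:
        raise ValueError("shift must be in 0..15")
    shifted = [(word << 16) >> shift for word in planes]
    mask = 0
    for word in shifted:
        mask |= word  # merge all planes: any set bit means a visible pixel
    return shifted, mask


def draw_row(background, shifted_planes, mask):
    """Combine a sprite row with background planes: punch a hole, then OR."""
    return [(bg & ~mask & 0xFFFFFFFF) | spr
            for bg, spr in zip(background, shifted_planes)]


planes = [0xC000, 0x6000, 0x0000]          # three bitplanes, 3 visible pixels
shifted, mask = preshift_row(planes, 4)
result = draw_row([0xFFFFFFFF] * 3, shifted, mask)
```

Doing this once per shift value at startup trades memory for speed: at display time only the AND and OR remain.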
Code: Scrolling
The scrolling is relatively straightforward, but it looks complicated due to the sheer number of tables used to make it run.
It all starts with the actual text content.
A second table is used to remap the character values from the text so they point to the right character[6].
DC.B ' YO, THE MEGAMIGHTY CAREBEARS ARE PROUD TO PRESENT THE 3D-DOC D'
DC.B 'EMO! IF YOU',$27,'VE ALREADY FORGOTTEN WHO CODED WHAT, WE',$27,'D '
DC.B 'BETTER TELL YOU AGAIN. '
DC.B 'ALL CODING BY THE CAREBEARS. MUZEXX BY -MAD, THE SLEEPER, MAX- '
DC.B '(OF THE EXCEPTIONS, SCHMUCK!). F'
DC.B 'ONT BY THE NIGHTHAWKS (WHACK OF STARLIGHT SENT ME A INTRO-COLLECTION '
DC.B 'BY THE BLACK CATS YESTERDAY'
DC.B '. THE AMIGA-FONT-RIPPING-SYNDROME SEEMS TO HAVE HIT THE ST TOO.'
DC.B 'MORE THAN HALF OF THOSE INTROS USED THIS FONT).'
DC.B ' MOUNTAINS BY A.D. I '
DC.B 'WRITE THIS TEXT THE DAY AFTER THE -WHATTAHECK- DEMO WAS RELEASED'
DC.B '. EVERYBODY IS COMPLAINING, TELLING US THAT IF WE DON',$27,'T '
DC.B 'RELEASE THIS DEMO SOON, THEY WILL START FUCKING GREET U'
DC.B 'S. I ACTUALLY HAVE PROCESSOR TIME LEFT IN THIS DEMO, BUT I '
DC.B 'DON',$27,'T HAVE NO MEMORY LEFT.'
DC.B ' IT',$27,'S AGAINST OUR PRINCIPLES NOT TO USE EVERY DARNED ONE OF'
DC.B 'THOSE 160256 VBL-CYC'
DC.B 'LES. BUT IF THIS DEMO IS GOING TO BE RELEASED THIS YEAR, WE SIMPLY'
DC.B 'HAVE TO BE SATISFIED WITH ALMOST PERFECT SCREENS... '
DC.B ' AS WE TOLD YOU BEFORE, WE',$27,'RE OUT OF MEMORY, AND WHEN I '
DC.B 'SAY OUT OF MEMORY, I MEAN SO FUCKING OUT OF ME'
DC.B 'MORY THAT I WILL HAVE TO STOP WRITING THIS SCROLLTEXT, OR IT '
DC.B 'WON',$27,'T RUN ON A 520ST... '
DC.B ' ',$00,$FF,$FF
DC.B ' YO, LAMERS!!! TCB HAVE DONE IT ONCE AGAIN... '
DC.B ' ',$00,$FF,$FF
The distorter is implemented using the mirrored sliding window method.
Yeah, sounds complicated, and I totally did not invent the name right now. Promised!
Basically the idea is that the machine is too slow to distort the text in real time, but precomputing all the positions for the entire text would use too much memory.
Instead you use a buffer marginally larger than the screen (in this case, 368 pixels wide) which is fast to blit to the screen; you insert the new bits of text on one side, and you dynamically change the view of the buffer to simulate the scroll... until you reach the end of the buffer, where you see a horrible glitch because there's nothing more to show.
Actually that does not happen, because you double the size of the buffer and copy everything you write to each half, so by the time you reach the end of the first half you have a perfect copy that you can continue to display, and then by the magic of the modulo you are back at the start of the buffer again without any glitch.
Except that this works only for speeds of 16 pixels.
In order to get this nice pixel-perfect distorter you need to preshift, so ultimately the scroller buffer is multiplied by 16, and then again by 25 because you need to do that for the entire height of the character set.
That sounds like it uses a lot of memory, and it does, but that works :)
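Since the actual code is too long to show, here is a tiny Python model of the doubled buffer idea only. The class and method names are mine, a "column" stands for whatever unit gets inserted at each step (a 16-pixel block in the situation described above), and the preshifted copies are left out.

```python
class MirroredScrollBuffer:
    """Column buffer where every write is duplicated `width` columns later."""

    def __init__(self, width):
        self.width = width
        self.columns = [None] * (2 * width)
        self.write_pos = 0  # next column to write, always in the first half

    def push(self, column):
        """Insert one new column and mirror it into the second half."""
        self.columns[self.write_pos] = column
        self.columns[self.write_pos + self.width] = column
        self.write_pos = (self.write_pos + 1) % self.width

    def window(self):
        """Return the visible columns, oldest first, as one contiguous slice."""
        start = self.write_pos
        return self.columns[start:start + self.width]


buf = MirroredScrollBuffer(4)
for column in range(6):
    buf.push(column)
visible = buf.window()
```

The point of the duplication is that `window()` never has to wrap around: the visible area is always one contiguous run, which is what keeps the copy to the screen fast.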
(and yes, that's way too much code for this already too long blog post)
Code: Rasters
The rasters use daisy-chained Timer B handlers.
The rasters are initialized as usual from the VBL interrupt routine.
This routine is quite simple: all it does is play the music if it has been started, toggle a flag that indicates whether we are displaying an odd or an even frame, and start the Timer B interrupt.
(see music section)
CLR.B $FFFFFA1B.W ; TBCR - Disable timers
MOVE.B #4,$FFFFFA21.W ; TBDR - Trigger every 4 scanlines
MOVE.B #8,$FFFFFA1B.W ; TBCR - Event Count Mode
BSET #0,$FFFFFA07.W ; IERA - Enable Timer B (HBL)
BSET #0,$FFFFFA13.W ; IMRA - Enable Timer B
MOVE.W #0,$FFFF8240.W ; Start with a black background
The scroller uses color changes every 4 scanlines, and changes both the background color and the inside color of the text.
Before switching to the mountain IRQ, the palette is manually changed to replace the colors of the text with the colors used by the mountain graphics.
HblRoutineScrollerGradient: ; E628
MOVE.W (A6)+,$FFFF8240.W ; Background color gradient
MOVE.W (A6)+,$FFFF8244.W ; Gradient color inside the scroll text
DualGradientScrollText: ; E72C - 15 entries * 4 scanlines = 60 lines
The odd flag is used to alternate between two tables of gradient colors to generate more shades of grey[7].
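The principle behind alternating two tables can be sketched as follows. This is a simplified greyscale model of my own, assuming the ST's 3 bits per channel: each "half-step" level is split into two real palette values shown on alternate frames, so the eye averages them. The demo's real tables differ in hue rather than being pure greys, so take this as an illustration of the idea only.

```python
def split_shade(level):
    """Split a half-step level (0..14) into two 3-bit values (0..7).

    Shown on alternate frames, the pair averages to level / 2.
    """
    if not 0 <= level <= 14:
        raise ValueError("level must be in 0..14")
    low = level // 2
    high = level - low
    return low, high


def st_color(r, g, b):
    """Pack three 3-bit channels into an ST palette word ($0RGB)."""
    return (r << 8) | (g << 4) | b


def dual_gradient(levels):
    """Build the even-frame and odd-frame grey tables for a list of levels."""
    even, odd = [], []
    for level in levels:
        low, high = split_shade(level)
        even.append(st_color(low, low, low))
        odd.append(st_color(high, high, high))
    return even, odd


even_table, odd_table = dual_gradient([0, 1, 2, 3, 14])
```

With 8 real levels per channel this gives 15 apparent levels, at the cost of some flicker.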
This code looks like it's actually doing pointless stuff such as changing an unmodified color and setting twice the timer frequency.
HblRoutineMountainGradient: ; E668
MOVE.W (A6)+,$FFFF8240.W ; Background color gradient
MOVE.W (A6)+,$FFFF8244.W ; (Seems unused)
MOVE.B #0,$FFFFFA1B.W ; TBCR - Disable timers
MOVE.B (A6)+,$FFFFFA21.W ; TBDR - Trigger every 'n' scanlines...
MOVE.B #1,$FFFFFA21.W ; ...and then overwritten?
MOVE.B #8,$FFFFFA1B.W ; TBCR - Event Count Mode
DualGradientMountain: ; E768 - 7 entries * 4 scanlines = 28 lines
Interestingly, the multiple IRQ routines have different ways of changing the colors, and use different spacing[8].
HblRoutineGroundGradient: ; E6AA
MOVE.L (A6)+,$FFFF8240.W ; Background color
MOVE.W (A6)+,$FFFFFA20.W ; TBDR - Trigger every 'n' scanlines...
DualGradientGroundFlip: ; E784
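To visualize how the chained handlers split the screen, here is a small Python model that lists the scanlines at which Timer B would fire. The entry counts for the scroller and the mountain come from the label comments above (15 and 7 entries, 4 scanlines each); the ground values and all the names are made up, and the model ignores the per-entry timer values that the real tables can carry.

```python
def raster_schedule(sections, first_line=0):
    """List (scanline, section_name, entry_index) for every Timer B event.

    `sections` is a list of (name, entry_count, lines_per_event) tuples.
    Each section stands for one interrupt handler; when its table is
    exhausted, the next handler takes over, which models the daisy chain.
    """
    events = []
    line = first_line
    for name, entry_count, lines_per_event in sections:
        for index in range(entry_count):
            line += lines_per_event  # event-count mode: fire after N lines
            events.append((line, name, index))
    return events


# Ground values are hypothetical, only there to show a third handler.
SECTIONS = [("scroller", 15, 4), ("mountain", 7, 4), ("ground", 3, 2)]
schedule = raster_schedule(SECTIONS)
for line, name, index in schedule[:3]:
    print(f"line {line:3d}: {name} entry {index}")
```

Each handler only has to know its own table and which handler to install next, which is what keeps the individual routines so short.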
The bottom gradient is actually interesting: only one set of colors is stored in the program; the second set is generated from the first one by some dithering code.
Simple but effective: Take the original color, flip some bits, get a new color.
MOVE.L (A0),D0 ; Read the two source color from the first table
BSR DitherTint ; Generate two colors from it
MOVE.L D0,(A1)+ ; Write back the first one in the array
MOVE.L D1,(A2)+ ; Write the second color in the second table
MOVE.W 14(A0),(A1)+ ; Happens to always be 0
MOVE.W 14(A0),(A2)+ ; Happens to always be 0
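The body of DitherTint is not shown here, so the following Python snippet is purely a hypothetical illustration of "flip some bits, get a new color", not the demo's algorithm. I arbitrarily flip the lowest bit of each 3-bit channel of an ST palette word; the mask and the function name are my own choices.

```python
LOW_BITS = 0x111  # lowest bit of each 3-bit channel in an ST $0RGB word


def dither_tint(color, flip_mask=LOW_BITS):
    """Return (color, variant) where the variant has the masked bits flipped."""
    color &= 0x777  # keep only the bits an ST palette entry uses
    return color, color ^ (flip_mask & 0x777)


first, second = dither_tint(0x246)
print(f"${first:03X} -> ${second:03X}")
```

Generating the second table this way costs a few instructions instead of a second table in the executable, which fits the memory constraints mentioned earlier.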
Code: Rasters
To be honest, I did not look much at the way the ground was generated.
All I can say is that the various combinations are generated in a series of one-bitplane buffers. All the magic is in the code generation; the display is pretty basic:
The only important thing here is the value of register D6, which is initialized to either 60000 or -60000.
Basically the table that A2 reads contains the number of segments to display for each section of the board, and after each section the D6 value is flipped, making the code read either normal segments or inverted segments to show the alternation of tiles.
I guess that's all for today.
The Result
So after a few days of hacking, by just replacing two bitmaps (the mountains and the balls) and using a different gradient table for the background, here is the result after reassembling the demo:
I did not cheat, it's the actual code, unmodified, with just some conditional assembly to load alternate graphics :)
You can find the semi-commented source code here.
Have fun reading, and don't hesitate to point out my mistakes.
1. Unfortunately the gameplay was not always there...
2. It's basically what I did on the Oric with my FloppyBuilder system: by not using Sedoric I get almost 16 kilobytes of additional free memory for my demos or games.
3. Ripped Off, Blue Software, Persistence Of Vision, The Source, D-Bug
4. Practical for owners of hard drives
5. Because it probably overwrites some system variables
6. Generally used when a font does not have many characters, to avoid crashing the demo when adding text that uses invalid characters
7. You can notice it if you watch the video frame by frame: it alternates between greenish and purplish
8. It also seems that there is some temporary code that ended up not being used