runer112/aa317e6b8325e9292e9d9ed6f0e8a0f4, last active November 8, 2020 09:01
*Really* fast GB sample playback through $FF1C writes
; Response to https://gist.github.com/nitro2k01/e45e47f8469009b8a43ad5e48d951442,
; which is nitro2k01's response to RGME about Pokemon Yellow sample playback.
; For this to make any sense, you'll definitely want to watch the video and
; then read nitro2k01's response.

; With the restriction that the sound data must end at the end of a page (which
; seems fine, you're probably not fitting multiple of these onto a page), 24
; cycles per sample is achievable:

; hl = data
	ld c,$1C
	ld a,[hl+]
.loop:	ld [$FF00+c],a
	rlca
	rlca
	ld b,[hl]
	ld [$FF00+c],a
	rlca
	rlca
	inc hl
	ld [$FF00+c],a
	rlca
	rlca
	bit 6,h
	ld [$FF00+c],a
	ld a,b
	jr nz,.loop
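
; The loops in this file write each byte to $FF1C four times, rotating it left
; by two bits between writes, and only bits 6-5 of $FF1C (the wave channel's
; output level) matter. So each data byte carries four 2-bit level codes in
; rotation order. A minimal sketch of a packing macro, assuming a recent RGBDS
; (`MACRO name` syntax); the name `pack4` is made up and is not part of the
; original gist:
MACRO pack4
	; \1 is output first (bits 6-5), then \2 (bits 4-3), then \3 (bits 2-1).
	; \4 wraps around: its high bit sits in bit 0 and its low bit in bit 7.
	db ((\1) << 5) | ((\2) << 3) | ((\3) << 1) | (((\4) & 1) << 7) | ((\4) >> 1)
ENDM
; Usage, inside the sample data: `pack4 1,2,3,0` emits $36. The arguments are
; raw level codes (0 = mute, 1 = 100%, 2 = 50%, 3 = 25%), not amplitudes.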

; To push it to the limit, since our need for precise timing certainly requires
; that interrupts be disabled, we can pull out the legendary speed hack that is
; pointing `sp` to data. `pop r16` is like a super `ld a,[hl+]`, reading 1 extra
; byte for only 4 extra cycles. Critically, reading two bytes at once grants us
; a brief respite between the last output of the first byte and the first output
; of the second byte where we don't have to read another byte from memory, and
; that's where we can slip in a jump instruction for looping. The looping does
; get a little "loopy," requiring the main 8-output loop to start in the middle,
; and requiring a nearly identical extra copy, with each handling a different 8
; bits of the 16-bit counter. That's right, the counter is back, which means the
; restriction on sound data alignment is gone. In exchange, though, the data
; length must be odd and stored (or calculated) as `-(length-1)/2`. That formula
; is exact for up to 513 bytes (`b` = $FF); each additional pass through the
; second copy performs one `pop` the formula does not count, so longer data
; needs the counter adjusted to match. But with all of this, what seems like the
; absolute limit of 20 cycles per sample is achievable:

; bc = -(length-1)/2
; hl = data
	ld [sp_save],sp
	ld sp,hl
	ld hl,$FF1C
	jr .start
.loop:	ld [hl],d
	ld a,d
	rlca
	rlca
	ld [hl],a
	rlca
	rlca
	nop
	ld [hl],a
	rlca
	rlca
	nop
	ld [hl],a
.start:	pop de
	ld [hl],e
	ld a,e
	rlca
	rlca
	ld [hl],a
	rlca
	rlca
	nop
	ld [hl],a
	rlca
	rlca
	inc c
	ld [hl],a
	jr nz,.loop
	nop
	ld [hl],d
	ld a,d
	rlca
	rlca
	ld [hl],a
	rlca
	rlca
	nop
	ld [hl],a
	rlca
	rlca
	nop
	ld [hl],a
	pop de
	ld [hl],e
	ld a,e
	rlca
	rlca
	ld [hl],a
	rlca
	rlca
	nop
	ld [hl],a
	rlca
	rlca
	inc b
	ld [hl],a
	jr nz,.loop
	; Restore sp. There is no direct `ld sp,[nn]`, so go through hl.
	ld hl,sp_save
	ld a,[hl+]
	ld h,[hl]
	ld l,a
	ld sp,hl
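
; `sp_save` is never defined above. A minimal sketch of one way to reserve it,
; assuming RGBDS; the section name is made up, and any two free bytes of RAM
; should work equally well:
SECTION "sp_save (sketch)", WRAM0
sp_save:
	ds 2	; low byte first, as written by `ld [sp_save],sp`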