This was posted by Simon Goodwin in the BASIC on the ZX Spectrum Next Facebook group:
One of the greatest performance bottlenecks of ZX BASIC is the lack of any efficient facility to search within one string for another. Where other BASIC interpreters have functions like INSTR or POS, we Sinclairphiles are reduced to stepping through, comparing at each character's position. Even at 28 MHz NextBASIC 2.07 can't manage more than about 600 bytes per second searching t$ for p$ as follows:
k=0: REPEAT :k=k+1:REPEAT UNTIL p$=t$(k TO k+7)
This uses a REPEAT loop rather than integer FOR as integer variables cannot be used in string expressions, e.g. to slice t$. Here the pattern in p$ is eight bytes long, hence the slice t$(k to k+7) to match the length, and for the sake of simplicity I've assumed that p$ does exist somewhere in the target string t$; otherwise an additional check that k did not exceed LEN t$ would need to be added, e.g. WHILE k+7<=LEN t$, slowing the loop down even more.
Wouldn't it be great if we could call a subroutine and get the matching index - or zero for no match - hundreds of times faster? It'd also be cool not to care about the pattern length but to have that picked up automatically, along with checks that neither was empty and the pattern was not longer than the target.
Well, now you can. I find the speedup factor is about 750 times, in my checks, though less if the target is small or the first character of the pattern occurs often within the target. This voodoo uses a mixture of NextBASIC and standard Z80 code.
At the weekend I wrote a 74-byte relocatable memory-searching Z80 routine for Matt Langley's Horizon project. This can find any sequence of bytes in any other, given only the 16-bit start address and length of the pattern to be found and the target to find it inside.
To make this more relevant and useful to NextBASIC programmers I have added a 'wrapper' which uses some hairy hacks discovered a decade ago by a person called BattleBunny to find string addresses and lengths (for a memory-copy subroutine which POKEd a ZX BASIC assignment statement to use the ROM string copy function to move arbitrary bytes around).
My 2023 remix uses NextBASIC goodies like DPOKE and %DPEEK to simplify that code and make it more general. It's still a hack, with necessary setup (a GO SUB 250 needed after each edit to the program, to set an integer variable %F to the address of a DEF PROC line for later POKing) and a slightly weird calling sequence:
GO SUB FN q(t$, p$)
which looks for p$ in t$ and sets %m to 0 if there's no match, and the index of the first matching character in t$ otherwise. So if t$ and p$ are identical, it'll return 1 in %M. Alter line 290 if you want the result in a different variable. Don't use %S or %F as those are preset to the address of the machine-code helper and the DEF FN line used to pass the strings to it, respectively. No other global variables are used or affected.
One nice thing about this approach is that you can use any string for p$, or a literal like "this" or even an expression like CHR$ 0, and the same goes for the target. If you're searching memory rather than a string it makes more sense to call the machine code directly. I'll explain how to do that a bit later, but will focus on the BASIC side here.
The following example is an extended version of the above screen-grab. It includes the same lines (so you can cut and paste them for .txt2bas or Remy Sharp's tools) plus extra fluff to compare the speed of BASIC and machine code when searching a much larger target (in this case, the first 8K of the ROM that implements PEEK$, as I happen to know that contains the string "PLUS3DOS", at least on my current 2.07 setup).
Here's the full example, with commentary afterwards. Don't try to re-order or renumber this without reading the commentary first.
100 CLEAR % DPEEK 23730-74:%s=%1+ DPEEK 23730: REM Set RAMTOP
110 DPOKE %s,%33,%4352,%0,%1,%31744,%51381,%45691,%60872,%55378,%54555,
%19780,%8451,%0,%17,%6656,%45549,%5920,%24797,%27101,%31169,%10416,%50459,
%5093,%60698,%8353,%30728,%8369,%49654,%53515,%57801,%17629,%19933,%45432,
%54816,%62232,%19755,%51524
120 GO SUB 250: REM Point %f to DEF FN q; search code is at %s
130 b$="Good":a$="Simon N Goodwin": GO SUB FN q(a$,b$)
140 IF %m THEN PRINT "Match at ";%m: ELSE PRINT "No match"
150 t$= PEEK$ (0,8192):z$="PLUS3DOS": DPOKE 23672,0
160 FOR %i=1 TO 100: GO SUB FN q(t$,z$): NEXT %i
170 t=% DPEEK 23672:m=%m: PRINT t;" frames for ";m*100;" bytes"
180 PRINT "Searched at "; INT (m*48.2*100/t);" bytes/s"
190 IF %m THEN PRINT "Match at ";%m: ELSE PRINT "No match"
200 i=0: DPOKE 23672,0
210 REPEAT :i=i+1:%m=z$=t$(i TO i+7): REPEAT UNTIL %m:t=% DPEEK 23672
220 t=% DPEEK 23672: PRINT "Match at ";i;" after ";t;" frames"
230 PRINT "Searched at "; INT (i*48.2/t);" bytes/s": STOP
240
250 DEF FN q(a$,b$)=270: RESTORE % DPEEK 23621:%f=%8+ DPEEK 23639: RETURN : REM Find FN q from DATADD system variable
260
270 DPOKE %s+1,% DPEEK (f+6): DPOKE %s+4,% DPEEK (f+15)
280 DPOKE %s+24,% DPEEK (f+4): DPOKE %s+27,% DPEEK (f+13)
290 IF % NOT s THEN %m=0: ELSE %m=% USR s: IF %m THEN %m=%m+1- DPEEK (f+4)
300 RETURN
Lines 100 and 110 put the Z80 code just below the current RAMTOP, moving it down 74 bytes, and setting %s to its start address. You only need to do this once, as long as you've not lost track of the location of the code. If you've not moved RAMTOP since, you can recover that address with %s=%1+ DPEEK 23730. You may want to check that the first byte is 33, just in case, and if really cautious also look for 201 (the Z80 RET instruction) at %PEEK(s+73).
Line 120 calls the subroutine at line 250. This needs to be done once each time the program runs, as it finds the address of the DEF FN line at its start, using RESTORE % DPEEK 23621 (PPC) to temporarily set the DATA pointer (DPEEK 23639) to the start of the current line (PPC) and the offset +8 to skip over the start of the line to the hidden space used for parameters. After you've called this you can RESTORE wherever you like, but should bear in mind that this setup will rewind the READ pointer to the first DATA after line 250.
With that necessary setup out of the way we can compare as many strings as we like while the unedited program continues to RUN. Line 130 is a trivial example, looking for "Good" in my name and hopefully finding it at the ninth character (unless you've scandalously omitted my middle N 🙂.
Line 140 just checks %m and indicates if the match was found and, if so, where.
Lines 150 to 230 are a more lengthy test, with timing using the system variable FRAMES, to find the string "PLUS3DOS" in the ROM revealed by PEEK$. There's a BASIC loop for comparison, which reveals a speed-up factor of around 750 times. For accurate times in seconds the factor 48.2 is used to convert HDMI frames to seconds; use 50 there for VGA0 or PAL SCART, and higher values for VGA1 to 6 corresponding to the extent they speed up the Z80n clock and video display to suit PC left-overs (see page 267 of the manual, REG 17).
The slightly deeper voodoo is in the subroutine from line 270 onwards, which is what the GO SUB FN actually runs - the result of the DEF FN is the line number, 270, so you must alter the reference in line 250 if you renumber the program, or all bets are off. The earlier the DEF FN is in your program, the sooner NextBASIC will find it, just as for PROCs, GO SUBs etc.
The function call pokes the DEF FN line (this is how ZX BASIC has always done it!) with the parameter details, and line 270 reads the lengths of the target and pattern strings from offsets on %F hidden inside the program source (!) and plonks them where the Z80 code needs them, at offsets 1 and 4 respectively (for LD HL and LD DE instructions, if you wonder). Line 280 does the same for the start addresses, at offsets 24 and 27 in the code. So this is all you need to call the code directly, to search arbitrary memory areas, rather than strings.
The USR function checks all these DPOKEs (as long as you've not misplaced them and clobbered the code rather than its data, in which case expect trouble) and searches for the first pattern byte in the target, at a peak speed well over a megabyte a second, using the Z80s distinctly CISC CPIR instruction. The rest of the code checks for the remainder of the pattern. If it's all there, USR returns the memory address of the first match in the target. If not, it returns 0. Line 290 converts the address into a string index in the target, assuming you've remembered to set %S and the pattern is found.
There are other ways to pass strings to code but this one, due mostly to BattleBunny, suits the Z80 already written for Horizon and shows some rather esoteric NextBASIC DPEEKs and DPOKEs along the way.
The calling overhead is substantial for small strings. You can speed it up by omitting pairs of DPEEKs and DPOKEs if either string is unchanged since the last search (or to search further on in the pattern, not forgetting to change the length as well as the start) or if you're quite sure that %S and %F are valid. But that's on your own head - start with the belt-and-braces version here, as if you break it in custom ways I probably won't be inclined to help.
I shall work on ways to search arrays, and to step by preset strides between mismatches (e.g. by five to match floats, or the structure or string array length) but that'll require fresh Z80 code and I feel I've earned some easy NextBASIC time now.