trying to 4× upscale a 2016 jrpg's textures, part 2

picking up from where the last article left off. the plan was frida — hook the engine functions producing the wrong HUD positions, watch the values flow through at runtime, patch only what needed patching. that was the plan.

frida didn't work

steam runs this game inside a pressure-vessel container. the container isolates the process namespace and blocks ptrace from outside, even with kernel.yama.ptrace_scope=0. attaching frida to the wine process failed with i/o errors. spawning via frida-spawn worked just long enough for the bootstrapper to crash with signal 11 once the game actually started loading.

i tried for an evening. tried WINE_NO_PRESELOADER, tried running the exe directly outside steam (the game refuses to launch — it checks for a steam app context), tried bind-mounting /proc into the container. none of it gave frida enough access to inject.

so the runtime instrumentation had to come from inside the rendering path itself. the obvious place: the apitrace wrapper i'd built last session. it's already running inside the wine process, it sees every gl call, and it's my code — i can add whatever logging i want.

the wrapper grew teeth

wgltrace.cpp is huge — about 110k lines of generated wrapper code, one function per gl entry point. i added a static _sosc_log_quad helper at the top of the file and called it from the glBufferSubData wrapper whenever size == 112. 112 = 4 vertices × 28 bytes (pos.xyzw + uv + rgba8 color), the engine's standard quad submission.

the helper used RtlCaptureStackBackTrace to grab the return addresses. windows x64 doesn't maintain a frame-pointer chain, so the typical __builtin_return_address(N>0) walk crashes — RtlCaptureStackBackTrace uses the .pdata unwind info instead and gets it right.

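static void _sosc_log_quad(const void* data) {
    if (!_sosc_drawlog) _sosc_open_log();
    if (!_sosc_drawlog) return;              // log could not be opened
    // 28-byte vertices = 7 floats each, so vertex i starts at fv[7*i]
    const float* fv = (const float*)data;
    float x0=fv[0], y0=fv[1];
    float x1=fv[7], y1=fv[8];
    float x2=fv[14], y2=fv[15];
    float x3=fv[21], y3=fv[22];
    void* frames[4] = {0};
    // skip this helper's own frame, capture up to 4 return addresses
    USHORT n = RtlCaptureStackBackTrace(1, 4, frames, NULL);
    fprintf(_sosc_drawlog,
            "q v0=(%.3f,%.3f) v1=(%.3f,%.3f) v2=(%.3f,%.3f) v3=(%.3f,%.3f) s=",
            x0, y0, x1, y1, x2, y2, x3, y3);
    for (USHORT i = 0; i < n; i++)
        fprintf(_sosc_drawlog, "%p,", frames[i]);
    fprintf(_sosc_drawlog, "\n");
    fflush(_sosc_drawlog);
}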
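
for reading those uploads back offline, here is a minimal python sketch of the same layout. it assumes the vertex format described above (pos.xyzw as 4 floats, uv as 2 floats, rgba8 as 4 bytes, little-endian, no padding). VERTEX and decode_quad are names made up for this sketch, not anything from the wrapper.

```python
import struct

# Assumed layout from the description above: pos.xyzw (4 floats),
# uv (2 floats), rgba8 (4 bytes) = 28 bytes per vertex, little-endian.
VERTEX = struct.Struct("<4f2f4B")
QUAD_SIZE = 4 * VERTEX.size  # 4 vertices x 28 bytes = 112


def decode_quad(buf):
    """Split one 112-byte quad upload into four vertex dicts."""
    if len(buf) != QUAD_SIZE:
        raise ValueError(f"expected {QUAD_SIZE} bytes, got {len(buf)}")
    verts = []
    for x, y, z, w, u, v, r, g, b, a in VERTEX.iter_unpack(buf):
        verts.append({"pos": (x, y, z, w), "uv": (u, v), "rgba": (r, g, b, a)})
    return verts
```

the 28-byte stride is why the C helper reads x/y at fv[0], fv[7], fv[14], fv[21]: each vertex is 7 floats wide.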

i wrote to Z:\tmp\sosc_drawlog.txt — Z: in wine maps to / on linux, so the file lands in /tmp/sosc_drawlog.txt and i can read it without mounting anything. after one cockpit-HUD session i had ~40k quads logged, each with a four-frame stack.

two render paths, neither was the right one

bucketing the stacks by return address, almost everything pointed at two functions:

FUN_140193fe0 — ~95% of the quads. matched the function we already had a bg-clip-coord cave on.
FUN_140194a70 — ~5%. the sprite quad builder from last article.
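
the bucketing itself is a few lines of python. this is a sketch, not the script i actually ran: it assumes each log line ends in "s=" followed by comma-separated hex addresses (the format the helper writes), and that the game exe sits in the 0x140000000 range so the first address in that range is the interesting one. GAME_LO, GAME_HI, first_game_frame and bucket are names invented here.

```python
from collections import Counter

# Assumption: the game exe is mapped at the default 0x140000000 base and is
# smaller than 256 MiB, so anything in this range is a game return address.
GAME_LO, GAME_HI = 0x140000000, 0x150000000


def first_game_frame(line):
    """Return the first return address inside the game module, or None."""
    _, sep, stack = line.rpartition("s=")
    if not sep:
        return None  # no stack on this line
    for tok in stack.strip().split(","):
        if not tok:
            continue  # trailing comma leaves an empty token
        addr = int(tok, 16)
        if GAME_LO <= addr < GAME_HI:
            return addr
    return None


def bucket(lines):
    """Count log lines per first-in-game return address."""
    counts = Counter()
    for line in lines:
        addr = first_game_frame(line)
        if addr is not None:
            counts[addr] += 1
    return counts
```

feeding it the open log file and calling most_common() on the result gives the per-address counts. these are return addresses, so each one still has to be mapped to its containing function in the disassembler.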

the LV labels and HP bars in the cockpit HUD were rendering way smaller than they should. the obvious hypothesis was that the digits go through one of these two paths and i should add a conditional scale. i tried both. they were both wrong.

first attempt: patch FUN_140194a70's sprite_w read site to multiply by 4 when sprite_w < 128 (small UI glyphs only, leaving big portraits untouched). result: main menu text got stretched 4× and LV digits were unchanged. the menu used that path; LV digits didn't.

second attempt: extend the bg cave to do a centered ×4 scale on in-NDC small quads in the lower portion of the screen (cockpit HUD region). the math:

delta_w = xmm6 - xmm7      ; current width
xmm6 += delta_w * 1.5      ; new right = right + delta*1.5
xmm7 -= delta_w * 1.5      ; new left  = left  - delta*1.5
                           ; ⟹ width becomes 4× while center stays put

with a guard avg_y < -0.4 AND width < 0.5 to only fire on small HUD-region quads. result: scaled UI elements that were already correctly sized, and didn't touch LV/HP at all. wrong path again.
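
the widening arithmetic is easy to sanity-check outside the cave. a small sketch of the same math, with scale_centered and should_scale as made-up names and the guard thresholds copied from above:

```python
def scale_centered(left, right, factor=4.0):
    """Widen [left, right] by `factor` about its midpoint."""
    delta = right - left
    grow = delta * (factor - 1.0) / 2.0  # 1.5 * delta when factor is 4
    return left - grow, right + grow


def should_scale(avg_y, width):
    """Guard from the cave: small quads in the lower part of the screen."""
    return avg_y < -0.4 and width < 0.5


# Example: a quad spanning x = 0.25 .. 0.5 (width 0.25, center 0.375)
new_left, new_right = scale_centered(0.25, 0.5)
print(new_left, new_right, new_right - new_left)
```

adding 1.5 widths on each side gives 1 + 1.5 + 1.5 = 4 widths total, and because both sides grow equally the midpoint doesn't move.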

REX prefix bugs

while building that cave i made encoding mistakes that are worth documenting because they're hard to catch without disassembly. xmm10 needed REX.R=1 (to extend the reg field), but for movaps xmm10, xmm6 i also set REX.B=1 (which extends the rm field). that re-encoded the instruction as movaps xmm10, xmm14 — silently. the game crashed mid-cockpit-render and i thought my logic was broken.

correct:    44 0f 28 d6    movaps xmm10, xmm6     (R=1 B=0)
i wrote:    45 0f 28 d6    movaps xmm10, xmm14    (R=1 B=1)

REX bits: 0100 WRXB. you set R when the register operand needs extending; you set B when the rm/memory operand needs extending. i'd been thinking "this instruction has any high register, set REX fully" and that's wrong. ended up writing a small post-patch verification step:

import os  # CAVE_VA: the cave's virtual address, set elsewhere in the patch script

os.system(f'objdump -D -b binary -m i386:x86-64 -M intel '
          f'--adjust-vma={CAVE_VA:#x} /tmp/cave.bin | head -60')

just dump the patched cave back out, eyeball the disassembly. caught six REX bugs in one go. should have been the first step, not the last.
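
the R/B rule can also be checked on the raw bytes without objdump. below is a minimal sketch that only handles the case from this section: a REX byte, a two-byte 0F opcode, and a register-direct ModRM. decode_rex_modrm is a name invented for the sketch, and it is nowhere near a general x86 decoder.

```python
def decode_rex_modrm(code):
    """Decode REX + 0F xx + ModRM for a register-to-register SSE op.

    Returns (reg, rm) register numbers. Only handles a leading REX byte
    and mod == 3; anything else raises ValueError.
    """
    rex, modrm = code[0], code[-1]
    if rex & 0xF0 != 0x40:
        raise ValueError("first byte is not a REX prefix")
    if modrm >> 6 != 3:
        raise ValueError("not a register-direct ModRM")
    r = (rex >> 2) & 1  # REX.R extends ModRM.reg
    b = rex & 1         # REX.B extends ModRM.rm
    reg = (r << 3) | ((modrm >> 3) & 7)
    rm = (b << 3) | (modrm & 7)
    return reg, rm


for hexbytes in ("440f28d6", "450f28d6"):
    reg, rm = decode_rex_modrm(bytes.fromhex(hexbytes))
    print(hexbytes, f"movaps xmm{reg}, xmm{rm}")
```

for 0F 28 (movaps xmm, xmm/m128) the reg field is the destination and rm is the source, which is why flipping B alone changes the source from xmm6 to xmm14 and leaves the destination untouched.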

persistent buffers and a third path

after another wrong cave i went back to the trace and counted: ~40k small quads per session, distributed across ~5 seconds of gameplay. at 60 fps that's ~130 quads per frame. way too few for a typical UI with dozens of widgets on screen.

the explanation: the engine uploads most UI vertex data once to a persistent vbo and draws from it every frame with glDrawArrays. glBufferSubData only fires for things that change per-frame (the bg, fullscreen post-process passes). my wrapper, instrumented only on buffer uploads, was blind to the LV/HP draws entirely.

so i added a _sosc_log_draw hook into the glDrawArrays and glDrawElements wrappers. filter to small count (≤ 8 vertices = single quad or triangle pair), to keep the log size sane:

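static void _sosc_log_draw(const char* fn, unsigned mode,
                            int first, int count) {
    if (count > 8) return;                   // keep only single-quad-sized draws
    if (!_sosc_drawlog) _sosc_open_log();
    if (!_sosc_drawlog) return;              // log could not be opened
    void* frames[4] = {0};
    USHORT n = RtlCaptureStackBackTrace(1, 4, frames, NULL);
    fprintf(_sosc_drawlog, "D %s mode=%u first=%d count=%d s=",
            fn, mode, first, count);
    // same stack dump as _sosc_log_quad
    for (USHORT i = 0; i < n; i++)
        fprintf(_sosc_drawlog, "%p,", frames[i]);
    fprintf(_sosc_drawlog, "\n");
    fflush(_sosc_drawlog);
}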

re-ran. 23,452 small-quad glDrawArrays(GL_TRIANGLE_STRIP, 0, 4) calls, every single one with the same stack:

00006FFFFB92942E,00000001401EF3F7,00000001401C5AFD,0000000140181233

frame[1] = 0x1401EF3F7, inside FUN_1401ee780. that turned out to be a command-buffer interpreter: a giant function with a 24-case jump table at +0xc77. each case implements a different gl primitive (POINTS, LINES, TRIANGLE_STRIP, ...). every UI quad in the cockpit, including LV/HP/EXP, goes through this one site. the interpreter is driven from further up the stack: the function containing frame[3] = 0x140181233 (a return address, not a function start) loops over an in-memory command queue at [r13+0x10] with 0x68-byte entries, and reaches the interpreter via frame[2] = 0x1401C5AFD.
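
mapping a return address to its function is a nearest-lower-start lookup. a sketch with bisect, using only the three function starts named in this write-up. with a list this sparse the answer is only trustworthy for addresses that really fall in one of those three, so in practice FUNCTION_STARTS would be the full function list exported from the disassembler. containing_function is a made-up name.

```python
import bisect

# Function entry points mentioned in this write-up, sorted ascending.
# Assumption: a real run would load the complete list from the disassembler.
FUNCTION_STARTS = [0x140193FE0, 0x140194A70, 0x1401EE780]


def containing_function(ret_addr, starts=FUNCTION_STARTS):
    """Map a return address to (function start, offset), or None."""
    i = bisect.bisect_right(starts, ret_addr) - 1
    if i < 0:
        return None  # address is below every known function
    return starts[i], ret_addr - starts[i]


start, offset = containing_function(0x1401EF3F7)
print(f"FUN_{start:x} +{offset:#x}")
```

for frame[1] this gives FUN_1401ee780 at offset 0xc77.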

the LV digits do go through this path. but so does every other UI element. no obvious discriminator. another dead end for a conditional patch.

the texparts.lod hypothesis (wrong, but instructive)

since binary patches kept missing, i tried a file-format approach again. texparts.lod has 1398 entries with LEFT/TOP/WIDTH/HEIGHT/RIGTH/BOTTOM fields. the cockpit-related entries i could read:

  85 CockpitGaugeBaseBox      tex=1  L=392 T=640 W=86 H=14
  86 CockpitGaugeHP_Poison    tex=1  L=392 T=596 W=80 H=8
 113 CockpitLevelPlate        tex=3  L=511 T=751 W=27 H=17
 948 Number_2432_1            tex=1  L=378 T=536 W=21 H=27
 ... (10 digits × multiple color variants)

hypothesis: WIDTH/HEIGHT control screen-pixel size, LEFT/TOP/RIGTH/BOTTOM control atlas uv. so multiplying W/H by 4 should scale the digit on-screen without breaking uv sampling.

wrote patch_texparts.py, scaled 26 cockpit-related entries by 4, launched the game. the LV digits came out as garbled mirror text ("lv1234" instead of "Lv03"), the HP gauges showed four stacked overlapping bars where one should be, and the EXP labels were garbage.

apitrace showed why. the engine also uses WIDTH for uv sampling width. scaling WIDTH 4× meant each digit's UV extent stretched 4× beyond its actual atlas region, pulling in adjacent characters (numbers stored side by side in the atlas got included in the "digit" sample). the dual-purpose-field problem from the last article, again. reverted.
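
the overreach is easy to put numbers on. a sketch using the Number_2432_1 entry from the table (L=378, W=21), assuming the engine samples texel columns LEFT .. LEFT+WIDTH and keeps the origin at LEFT when WIDTH grows. that is my reading of the trace, not something confirmed from the engine code, and sampled_columns / glyph_widths_covered are names made up here.

```python
def sampled_columns(left, width, width_scale=1):
    """Texel columns [start, end) read if WIDTH doubles as the UV extent."""
    return left, left + width * width_scale


def glyph_widths_covered(left, width, width_scale):
    """How many original glyph widths the scaled sample spans."""
    start, end = sampled_columns(left, width, width_scale)
    return (end - start) // width


print(sampled_columns(378, 21))       # vanilla entry
print(sampled_columns(378, 21, 4))    # WIDTH multiplied by 4
print(glyph_widths_covered(378, 21, 4))
```

with WIDTH at 84 the sample runs from column 378 to 462, four glyph widths instead of one, which matches digits picking up their neighbours in the atlas.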

the question i should have asked first

after several failed instrumentation experiments and a broken texparts attempt, i went back to basics and re-read my own notes. the relevant details i'd lost track of:

nothing about the output resolution had been changed.
the LV/HP bars had started rendering small after the bg cave was applied, not before.

i'd been treating "LV labels are small" as an independent problem that needed its own patch. but the symptom appeared as a consequence of the bg cave, which meant the cave was the thing to look at, not the LV path. the bg cave's ×0.25 when |xmm9|>1 was never meant to fire on elements whose vanilla layout positions stay within NDC range (small UI like LV labels), and at vanilla layout it didn't fire on them. it only fired on the bg and the portrait quads. so:

LV/HP/EXP at vanilla layout → xmm9 stays in [-1, +1] → cave passes through → renders correctly at vanilla size
bg + portraits with 4× upscaled textures → xmm9 goes out of NDC → cave fires → ×0.25 brings them back to expected size

the previous session had 4×-scaled layout positions for "Level" and "EXP" entries, intending to compensate for an imagined upscaled UI quad size. but the engine wasn't producing upscaled UI quads; only the bg and portrait paths were. those 4× layout edits pushed LV labels' xmm9 out of NDC range, which made the cave fire on them, and the cave shrank them.
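
a toy model of the cave's gate makes the interaction concrete. it treats xmm6..xmm9 as four plain floats and doesn't model which register holds which edge, since that isn't needed for the point. bg_cave is a name for this sketch only, and the sample values are invented.

```python
def bg_cave(xmm6, xmm7, xmm8, xmm9):
    """Model of the bg cave: scale all four coords by 0.25 when |xmm9| > 1."""
    if abs(xmm9) > 1.0:
        return tuple(v * 0.25 for v in (xmm6, xmm7, xmm8, xmm9))
    return (xmm6, xmm7, xmm8, xmm9)


# Invented sample values: one quad inside NDC range, one pushed outside it.
in_range = (0.5, 0.25, -0.5, -0.75)
pushed_out = (2.0, 1.5, -2.0, -3.0)

print(bg_cave(*in_range))    # untouched: |xmm9| <= 1
print(bg_cave(*pushed_out))  # every coord quartered, so the quad shrinks too
```

anything that keeps xmm9 inside [-1, +1] passes through untouched. anything pushed outside gets every coordinate quartered, which shrinks the quad as well as moving it.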

so i'd been chasing a problem i'd created. the fix was to delete the fix.

what worked

cleaner state:

kept the 4× upscaled texture files (the actual upscale work).
kept the bg cave at FUN_140193fe0 — mulss xmm6/7/8/9, [0.25] when |xmm9| > 1. now it's the only binary patch.
reverted layout.lod to vanilla.
reverted every other code cave.

result with this state: bg correct, LV/HP/EXP correct, but all six portraits collapsed into the screen center. the cave scales toward NDC origin (0,0), and portraits are positioned off-center in the cockpit layout. uniform ×0.25 on a quad at NDC x=+0.5 puts it at x=+0.125 — closer to center than vanilla intends.

so the targeted fix: patch only the 6 ShowCockpitPortraitRank* entries in layout.lod, multiplying their X coords by 4 so they sit further out and the cave's pull-toward-origin lands them back at vanilla spots. everything else stays at vanilla layout.

PORTRAIT_POSITIONS = {
    'ShowCockpitPortraitRank1': ('-1728', '554'),
    'ShowCockpitPortraitRank2': ('-1240', '554'),
    'ShowCockpitPortraitRank3':  ('-752', '554'),
    'ShowCockpitPortraitRank4':   ('608', '554'),
    'ShowCockpitPortraitRank5':  ('1096', '554'),
    'ShowCockpitPortraitRank6':  ('1584', '554'),
}
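
a quick check on those numbers, assuming the layout-to-NDC mapping is linear about the screen center, so a ×0.25 in NDC is equivalent to a ×0.25 on the layout X. that linearity is my assumption. post_cave_x is a made-up helper, and it only models X.

```python
# X/Y strings copied from the patch table above.
PORTRAIT_POSITIONS = {
    'ShowCockpitPortraitRank1': ('-1728', '554'),
    'ShowCockpitPortraitRank2': ('-1240', '554'),
    'ShowCockpitPortraitRank3': ('-752', '554'),
    'ShowCockpitPortraitRank4': ('608', '554'),
    'ShowCockpitPortraitRank5': ('1096', '554'),
    'ShowCockpitPortraitRank6': ('1584', '554'),
}
CAVE_SCALE = 0.25  # the cave's multiplier


def post_cave_x(layout_x):
    """Where a layout X ends up after the cave's uniform scale (assumed linear)."""
    return float(layout_x) * CAVE_SCALE


for name, (x, _y) in PORTRAIT_POSITIONS.items():
    print(name, x, "->", post_cave_x(x))
```

the printed values are the patched X divided by 4, which under that assumption is where each portrait lands after the cave pulls it toward the origin.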

(layout.lod is a 1037-record file. each record is 5 length-prefixed strings — Name, X, Y, Z, Parm — with null terminators between fields. my parser had to be tolerant of header bytes and scan forward on parse failures because the header isn't documented anywhere and i didn't want to misalign by one byte.)
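
the scan-forward idea looks roughly like this. big caveat: the byte-level details here are assumptions for illustration (a 1-byte length prefix, ascii text, one NUL after each string, a non-empty Name), not the verified layout.lod format, and parse_record / parse_layout are names invented for the sketch.

```python
FIELDS = ("Name", "X", "Y", "Z", "Parm")


def parse_record(buf, pos):
    """Try to read five length-prefixed, NUL-terminated strings at pos.

    Assumed encoding: 1-byte length, ascii bytes, one NUL terminator.
    Returns (fields, next_pos), or None if the bytes don't fit.
    """
    fields = []
    for _ in FIELDS:
        if pos >= len(buf):
            return None
        n = buf[pos]
        end = pos + 1 + n
        if end >= len(buf) or buf[end] != 0:
            return None  # terminator missing where the length says it should be
        raw = buf[pos + 1:end]
        if 0 in raw or not raw.isascii():
            return None
        fields.append(raw.decode("ascii"))
        pos = end + 1
    if not fields[0]:
        return None  # treat an empty Name as a misparse
    return fields, pos


def parse_layout(buf):
    """Parse every record, sliding forward one byte whenever a parse fails."""
    records, pos = [], 0
    while pos < len(buf):
        hit = parse_record(buf, pos)
        if hit is None:
            pos += 1  # header or junk byte: skip it and retry
            continue
        fields, pos = hit
        records.append(dict(zip(FIELDS, fields)))
    return records
```

rejecting anything that doesn't end in a NUL exactly where the length prefix says it should is what keeps the scan from locking onto a misaligned record.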

with that targeted edit: bg correct, portraits spread out across the cockpit at vanilla positions, LV/HP/EXP matching the vanilla reference shot.

where i am right now

at 4× source resolution: every background, every NPC, every character portrait, every effect, every encounter sprite, every single-frame comtex element, the picturegate scenes. cockpit HUD elements render at correct vanilla proportions, sampling from the upscaled atlases.

still at 1× source resolution: the title screen, trophy icons, the menu text glyph atlases. same issues as last article — the dual-purpose WIDTH field in texparts means you can't decouple "UV extent" from "cursor advance" without engine-level patches.

binary patches alive: still just the one bg cave. 5-byte JMP at 0x1401941f5, 21-byte body in the .text padding, 4 floats of constants. no conditional logic beyond the |xmm9| > 1 check, no per-quad heuristics, no scratch registers.

file patches alive: 6 records of layout.lod (portrait positions).

i kept the apitrace wrapper instrumented because there are still a few small HUD quirks to chase down before this is fully buttoned up. will revert it when we're done.

what i learned this time

the meta-lesson: when something looks wrong, don't immediately reach for a new patch. first check whether a previous patch is causing the wrongness. i added three layers of compensating fixes for a problem my own earlier layout edits had created by tripping the bg cave, and each layer made the diagnosis harder.

the smaller lesson: always disassemble your patched bytes before testing. REX prefix bugs are invisible at the source level but loud at the cpu level. one objdump -D -b binary after every cave write would have saved a couple of crashes.

still not done. but the HUD wall i was stuck on last article is mostly past. more to come.