David Rice
2026-05-26 17:33:37 +02:00
parent 0f7b0e1ac5
commit 2c4400914f

@@ -252,16 +252,53 @@ transient that completes within one poll interval. Specifically:</p>
hardware-level events that do not change any register I am polling
(e.g. a brief PHY FIFO timing slip with no associated interrupt)
are also outside what this instrumentation can detect.</li>
<li><strong>Why I did not poll faster.</strong> The DSIM register reads
go through the <code>memtool</code> userspace tool via subprocess
calls, which has non-trivial overhead. Pushing the wide poll much
below ~100 ms started loading the device CPU and impacting the I²C
poll loop. I did not try to optimise this further.</li>
<li><strong>Why I did not push the wide poll faster.</strong> The
<code>memtool</code> subprocess calls have non-trivial overhead and
pushing the wide poll much below ~100 ms started loading the device CPU
and impacting the I²C poll loop.</li>
</ul>
<p>So the correct way to read section 4 is: at the resolutions I could poll,
nothing showed up. That is consistent with — but not proof of — the fault
being below register visibility (e.g. a sub-frame transient timing event).</p>
<h3>Follow-up trial: high-rate /dev/mem mmap poll</h3>
<p>To try to see below the 500 ms ceiling, I added a second poller that
bypasses <code>memtool</code> by mmap-ing <code>/dev/mem</code> directly in
Python and reading the DSIM registers as plain memory loads, aiming for
~1 ms resolution. A short captured window showed:</p>
<ul class="tight">
<li><code>DSIM_CLKCTRL</code> (offset 0x008) <em>is</em> active at
sub-millisecond timescales: it flips from its steady-state value of
<code>0x02</code> to <code>0x10</code>, <code>0x100</code> or
<code>0x01</code> in 12 ms wide pulses, occurring roughly every
200 ms during normal operation.</li>
<li>The pulses appear at a very regular cadence and are not visibly
aligned with the video switches I was driving. The pattern looks
consistent with routine per-frame clock-enable management by the
kernel driver rather than fault-correlated events — but I am
<em>not</em> claiming to have shown that, only that the activity I
saw was regular.</li>
<li>The mmap poll at that rate destabilised the device: the full unit
(not just the userspace service) crashed or rebooted within seconds of
starting, on multiple attempts at the 1 ms target rate. I therefore
could not run it long enough to mark visible flicker events and compare
their timestamps against the CLKCTRL transitions.</li>
</ul>
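<p>For reference, a minimal sketch of the kind of poller described above.
This is not the script I ran: <code>open_register_window</code>,
<code>log_transitions</code>, <code>WINDOW_BYTES</code> and the
<code>base_addr</code> parameter are illustrative names, and the DSIM base
address has to come from the SoC reference manual. Whether each read
becomes a single 32-bit load is an interpreter implementation detail that
I have not verified.</p>
<pre><code class="language-python">import mmap
import os
import time

DSIM_CLKCTRL_OFFSET = 0x008  # register offset quoted in the text above
WINDOW_BYTES = 4096          # assumption: one page covers the DSIM block


def open_register_window(base_addr):
    """Map one read-only page of /dev/mem as a view of 32-bit words.

    base_addr is a placeholder: take the DSIM base from the SoC
    reference manual. It must be page-aligned. Needs root.
    """
    fd = os.open("/dev/mem", os.O_RDONLY | os.O_SYNC)
    try:
        window = mmap.mmap(fd, WINDOW_BYTES, mmap.MAP_SHARED,
                           mmap.PROT_READ, offset=base_addr)
    finally:
        os.close(fd)  # the mapping stays valid after the fd is closed
    # "I" is the native unsigned int, 4 bytes on the usual Linux targets.
    return memoryview(window).cast("I")


def log_transitions(regs, word_index, samples, interval_s,
                    clock=time.monotonic, sleep=time.sleep):
    """Sample one register and record every change of value.

    Returns a list of (timestamp, old_value, new_value) tuples.
    The sample count is bounded on purpose, so a run cannot poll forever.
    """
    transitions = []
    previous = regs[word_index]
    for _ in range(samples):
        sleep(interval_s)
        current = regs[word_index]
        if current != previous:
            transitions.append((clock(), previous, current))
            previous = current
    return transitions
</code></pre>
<p>On the device this would be called as
<code>log_transitions(open_register_window(base_addr), DSIM_CLKCTRL_OFFSET // 4, samples, 0.001)</code>.
Because <code>regs</code> only needs to support indexing, the same loop can
be exercised off-target against a <code>bytearray</code>-backed
<code>memoryview</code>. Given the crashes described above, I would start
with a small <code>samples</code> value and a much longer interval before
going anywhere near 1 ms.</p>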
<p>So the most accurate position on register-level diagnostics for this
investigation is:</p>
<ul class="tight">
<li>At the resolutions I could safely sustain (500 ms wide poll, 10 ms PLL
poll), nothing showed up during a confirmed flicker burst.</li>
<li>At a higher rate (1 ms target) the registers are demonstrably busy,
but the activity I captured looks like normal driver behaviour, and I
could not keep the device alive long enough to test correlation with
flicker timing.</li>
<li>The combined evidence is consistent with — but not proof of — the
fault being below register-level visibility on this rig, e.g. a
sub-frame transient timing event that does not change any register
I am able to read safely.</li>
</ul>
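<p>The correlation test I could not run can be sketched as follows,
assuming two lists of timestamps on the same clock: manually marked
flicker events and logged CLKCTRL transitions. The function name
<code>nearest_offsets</code> and both inputs are hypothetical. No such
capture exists in this investigation.</p>
<pre><code class="language-python">import bisect


def nearest_offsets(flicker_times, transition_times):
    """For each flicker mark, return the signed gap to the nearest transition.

    A negative value means the nearest transition came before the flicker.
    """
    ordered = sorted(transition_times)
    if not ordered:
        raise ValueError("no register transitions were captured")
    offsets = []
    for t in flicker_times:
        i = bisect.bisect_left(ordered, t)
        # Only the neighbours on either side of t can be the nearest one.
        candidates = ordered[max(i - 1, 0):i + 1]
        nearest = min(candidates, key=lambda c: abs(c - t))
        offsets.append(nearest - t)
    return offsets
</code></pre>
<p>One caveat on reading the result: with transitions arriving roughly
every 200 ms, every flicker mark lies within about 100 ms of some
transition by chance alone. The offsets would have to cluster far more
tightly than that, and more tightly than my reaction time when marking
flicker, before they could count as evidence of correlation.</p>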
<h2>5. Reproducer without GStreamer or video</h2>