Flicker Investigation Continued

Author: David Rice (Electronics)  ·  Hawk unit at 10.32.33.157  ·  Kernel 6.6.44-pknbsp-svn9520

1. Visual evidence — the artifact is "screen shift"

Captured slow-motion phone video of the screen during a flicker event and pulled individual frames out. Frame 362 is a clean baseline. Frames 363, 370, 376, 381, 382 are flickers. The consistent signature:

To my eye this is the signature of a frame-tearing / VSYNC-alignment slip. Not a data integrity problem — a timing-alignment problem.

Reference frames (slow-motion phone capture, resized for inclusion; I have the original full-size PNGs if you want them):

2. Test-pattern bisection rules out the LVDS output stage and panel

While a burst of flicker was actively occurring on the MIPI input path, I enabled the SN65DSI83's internal LVDS test pattern (write 0x10 to CSR 0x3C, i.e. set bit 4, CHA_TEST_PATTERN; confirmed via the known i2cset sequence). With no other change:

SN65 source                           Observed
MIPI input (normal)                   Flicker happening as before
Internal LVDS test pattern enabled    Clean colour-bar pattern, no flicker, for as long as enabled
Back to MIPI input                    Flicker resumes

The internal test pattern is generated inside the SN65 at the LVDS output stage, downstream of the MIPI input and the MIPI-to-LVDS conversion. The fact that it is clean while the MIPI-fed image flickers means everything from the SN65 LVDS PLL through the panel is physically healthy. The fault must therefore lie upstream of that point — somewhere in the i.MX DSIM controller output, the MIPI bus, the SN65's MIPI receiver, or the SN65's internal MIPI-to-LVDS conversion logic.
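For repeatability, the toggle used in this bisection can be wrapped in a few lines of Python. This is a minimal sketch, not the code in device_server.py. The bus number 4 and address 0x2c are assumptions read off the "sn65dsi83 4-002c" dmesg line, the function names are hypothetical, and writing 0x00 to go back to the MIPI path assumes no other bit in CSR 0x3C is in use. The -f flag is there because the kernel driver already owns the address.

```python
import subprocess

# Assumptions, not confirmed by the report: bus and address come from the
# "sn65dsi83 4-002c" dmesg line; 0x00 is assumed to be the normal value of
# CSR 0x3C when the test pattern is off.
I2C_BUS = 4
SN65_ADDR = 0x2C
CSR_TEST_PATTERN = 0x3C
TEST_PATTERN_BIT = 0x10  # bit 4


def build_test_pattern_cmd(enable):
    """Return the i2cset argv that switches the SN65 test pattern on or off."""
    value = TEST_PATTERN_BIT if enable else 0x00
    return [
        "i2cset", "-f", "-y", str(I2C_BUS),
        f"0x{SN65_ADDR:02x}",
        f"0x{CSR_TEST_PATTERN:02x}",
        f"0x{value:02x}",
    ]


def set_test_pattern(enable):
    """Run the command on the target (needs root and i2c-tools installed)."""
    subprocess.run(build_test_pattern_cmd(enable), check=True)
```

Keeping the command builder separate from the function that runs it means the exact bytes can be checked on a workstation before anything touches the bridge.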

3. On-target instrumentation I built today

To complement the visual capture I wrote a small Flask HTTP service that runs on the Hawk and polls register state at high rate, plus a host-side Python tool that drives it and records events. This is not production-quality software — it's a measurement harness, and Claude helped me write the Python.

File                     Role
device_server.py         Runs on Hawk as a systemd service. Polls SN65 registers (csr_0a, csr_e5) every 10 ms via I²C, and a block of DSIM registers (STATUS, CLKCTRL, CONFIG, ESCMODE, MDRESOL, MVPORCH, MHPORCH, MSYNC, SDRESOL, INTSRC, INTMSK, PHYTIMING0/1/2) every 500 ms via memtool. Logs transitions. Exposes endpoints to start/stop monitoring and toggle the SN65 test pattern.
trigger.py               Host-side. Talks to the on-target service over HTTP, prints alerts as events arrive, lets the operator press f to mark a visible flicker observation, correlates marks against switch/unlock timing, and writes a full session log to file.
display_test_nexio.py    The existing GStreamer kiosk, modified to support a UDP "switch video" trigger and a --start CLI flag so the test rig can drive switches deterministically.
cycle.sh                 Shell-side reproducer that loops echo 4 / echo 0 > /sys/class/graphics/fb0/blank. Drives many pipeline restart cycles in sequence so flicker rate can be eyeballed.

4. Captured a burst of 93 flickers with no register-level signal

In one session, with the monitor running and an operator (me) pressing f on each visible flicker, I recorded a single burst of 93 visible flickers in ~29 s (~3 Hz, randomly spaced from roughly 100 ms to 700 ms apart). All 93 marks fell within a single ~30-second window between two scheduled video switches.
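The burst figures above (count, duration, rate, spacing) can be recomputed from the operator marks in the session log. The sketch below assumes the marks are available as a list of timestamps in seconds. The function name and the input format are hypothetical, not what trigger.py actually writes.

```python
def burst_stats(marks):
    """Summarise operator marks given as timestamps in seconds (any order)."""
    ts = sorted(marks)
    if len(ts) < 2:
        raise ValueError("need at least two marks")
    duration = ts[-1] - ts[0]
    if duration <= 0:
        raise ValueError("marks must span a non-zero interval")
    # Gap between each mark and the next one
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    return {
        "count": len(ts),
        "duration_s": duration,
        "rate_hz": len(ts) / duration,
        "min_gap_s": min(gaps),
        "max_gap_s": max(gaps),
    }
```

With the numbers quoted above, 93 marks over about 29 s gives 93 / 29 ≈ 3.2 per second, consistent with the ~3 Hz estimate.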

Throughout the burst, the on-target poller reported:

So from the register-polling perspective the system looks healthy at the sampling rates I used. I have the full session log if useful.

How confident am I that nothing happened?

Less confident than the above bullet list might suggest, and I want to be honest about that. 500 ms is a long time compared to a single video frame (40 ms at 25 Hz), so the wide DSIM-register snapshot could easily miss a transient that completes within one poll interval. Specifically:
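As a rough way to put numbers on this, the sketch below estimates how likely a periodic poller is to see a transient at all. It rests on strong assumptions that the measurements do not confirm: the transient is visible in the register for its whole duration (not latched, not cleared on read), it starts at a random phase relative to the poll, and successive events are independent. The function names are hypothetical.

```python
def catch_probability(transient_s, poll_period_s):
    """Chance that at least one poll lands inside a single transient."""
    if transient_s < 0 or poll_period_s <= 0:
        raise ValueError("durations must be positive")
    return min(1.0, transient_s / poll_period_s)


def miss_all_probability(transient_s, poll_period_s, events):
    """Chance that every one of `events` independent transients is missed."""
    p = catch_probability(transient_s, poll_period_s)
    return (1.0 - p) ** events
```

Under those assumptions, a transient lasting a full 40 ms frame has an 8% chance per event of being caught by the 500 ms poll, so missing all 93 would be very unlikely (roughly 0.04%). A 1 ms transient would be missed on all 93 occasions about 83% of the time. The clean 500 ms snapshots therefore argue against long-lived register changes but say little about short ones.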

Follow-up trial: high-rate /dev/mem mmap poll

To try to see below the 500 ms ceiling I added a second poller that bypasses memtool by mmap-ing /dev/mem directly in Python and reading the DSIM registers via load instructions, aiming for ~1 ms resolution. A short captured window showed:

So the most accurate position on register-level diagnostics for this investigation is:

5. Reproducer without GStreamer or video

To check whether GStreamer / userspace was contributing, I stopped both the kiosk and the device-server service, leaving just the Linux text console on screen. Then ran the cycle script:

echo 4 > /sys/class/graphics/fb0/blank
echo 0 > /sys/class/graphics/fb0/blank
sleep ...
(repeat)

A random subset of the cycles produced the same vertical-shift artifact as during video playback, at roughly the same hit rate. The flicker is also visible on the U-Boot splash screen during boot, before Linux is up.

As I read this, the fault is reachable from both the U-Boot DSI bring-up path and the Linux DRM/KMS pipeline-restart path, which every blank/unblank cycle exercises. GStreamer is not required and not the cause.

6. dmesg signature on every pipeline restart

Every blank/unblank cycle produces this exact 3-line sequence in dmesg, deterministically across many cycles:

mxsfb 32e00000.lcdif: [drm] magic pixel crtc=0 offset 0x3e7ffc
sn65dsi83 4-002c: Using DSI clock for LVDS
@MF@ sn65dsi83_get_dsi_range dsi_rate=216000000 mode_clock=72000000 min_rate=216000000 => range=0x2b

So the SN65 bridge driver is re-initialising on every cycle (recomputing its DSI range and re-applying LVDS configuration). The visible flicker variability appears to be in how this re-init lands relative to the panel's own scan cycle: sometimes the timing aligns and the panel locks cleanly; sometimes it does not. But that is a software-side interpretation and I am not confident in it.
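The numbers in the @MF@ line can be sanity-checked mechanically. The sketch below parses that line and compares the reported range with the DSI clock expressed in whole 5 MHz steps, which is how I understand the SN65DSI83 DSI clock range register to be encoded. That encoding is an assumption here; the BSP's sn65dsi83_get_dsi_range may compute it differently. The function names are hypothetical.

```python
import re

# Matches the "@MF@ sn65dsi83_get_dsi_range ..." line quoted above
LINE_RE = re.compile(
    r"sn65dsi83_get_dsi_range dsi_rate=(\d+) mode_clock=(\d+) "
    r"min_rate=(\d+) => range=(0x[0-9a-fA-F]+)"
)


def parse_dsi_range(line):
    """Extract the four numeric fields, or return None if the line differs."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    dsi_rate, mode_clock, min_rate = (int(g) for g in m.group(1, 2, 3))
    return {
        "dsi_rate": dsi_rate,
        "mode_clock": mode_clock,
        "min_rate": min_rate,
        "range": int(m.group(4), 16),
    }


def expected_range(dsi_rate_hz):
    """Assumed encoding: DSI clock in whole 5 MHz steps, rounded down."""
    return dsi_rate_hz // 5_000_000
```

For the line above this gives 216000000 // 5000000 = 43 = 0x2b, matching the logged range, and dsi_rate is exactly three times mode_clock. A mismatch on some cycles would be worth flagging; identical values on every cycle would support the view that the variability is in timing rather than in the computed configuration.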

At boot the dmesg also shows the line:

samsung-dsim 32e10000.dsi: @MF@ flb-flags=0x0

which led me to poke around in the source code (see section 7).

7. Something I noticed in the BSP source — may well be a red herring

This may turn out to be unrelated, but while poking around in the kernel patches looking for anything that might be connected to what I am seeing, I came across flb-hack-dsi-autoflush in parkeon_linux_bsp/trunk/kernel/patches-6.6/ (dated 2025-05-16, authored by Martin Fuzzey). The commit message reads:

"dsi: allow forcing of FIFO flush on vsync from DT — Hack for screen shift problem to let electronics team test while I'm on holiday..."

The phrase "screen shift" jumped out because it is exactly the words I would use to describe the artifact I am seeing. From my limited reading of the patch source, it looks like it adds two device-tree flags on the Samsung DSIM driver:

Bit    Flag                   What I think it does
0      FLB_FLAG_AUTO_FLUSH    Clears DSIM_MFLUSH_VS, switching from manual FIFO flush to auto flush on VSYNC
1      FLB_FLAG_NO_BURST      Clears DSIM_BURST_MODE

The patch also adds dev_info(dev, "@MF@ flb-flags=0x%x\n", dsi->flb_flags); in samsung_dsim_parse_dt(), which I believe is where the @MF@ flb-flags=0x0 line in my boot dmesg comes from. If that mapping is right, then the patch is in the running kernel, but flb_flags == 0 at runtime, and the behavioural changes the patch adds are wrapped in two if blocks that both evaluate to false:

if (dsi->flb_flags & FLB_FLAG_AUTO_FLUSH)
    reg &= ~DSIM_MFLUSH_VS;
if (dsi->flb_flags & FLB_FLAG_NO_BURST)
    reg &= ~DSIM_BURST_MODE;

So, again from my limited understanding: it looks like this patch may have been written as a fix associated with the same kind of artifact I am investigating, and on the unit I have it is compiled in but not actually doing anything because no flb-flags property has been set in the device tree. If that interpretation is correct, the fix would be to add flb-flags = <1>; to the DSIM device-tree node (which I think lives in the flb-dt-hawk patch, but I have not opened it to confirm).
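To make the bit assignment concrete, here is a small decoder for the value printed in the flb-flags dmesg line. The bit positions are taken from my reading of the patch as described above (bit 0 auto flush, bit 1 no burst) and may be wrong. The function names are hypothetical.

```python
import re

# Bit positions as I read them from the patch; unverified
FLB_FLAGS = {0: "FLB_FLAG_AUTO_FLUSH", 1: "FLB_FLAG_NO_BURST"}


def decode_flb_flags(value):
    """Return (names of set flags, leftover bits with no known meaning)."""
    names = [name for bit, name in sorted(FLB_FLAGS.items()) if value & (1 << bit)]
    known_mask = sum(1 << bit for bit in FLB_FLAGS)
    return names, value & ~known_mask


def flags_from_dmesg(line):
    """Pull the hex value out of an '@MF@ flb-flags=0x..' line, or None."""
    m = re.search(r"flb-flags=(0x[0-9a-fA-F]+)", line)
    return int(m.group(1), 16) if m else None
```

Decoding the boot line from this unit gives no flags set, while a value of 0x1 would decode to FLB_FLAG_AUTO_FLUSH only, which is the state the proposed device-tree change is meant to produce.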

I am writing this section mainly to check on its validity — or at the very least to try to understand it. I am not making a claim that this is the fix. Specifically I do not know:

If you can tell me whether this is a real lead or a red herring, that alone would be very useful. And if it is a real lead, a one-line device-tree change would let me test it against the blank/unblank reproducer immediately — I would expect dmesg to then show flb-flags=0x1 and the visible flicker rate to drop sharply if the hypothesis is right.
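Because the flicker hits only a random subset of cycles, "drops sharply" is easier to judge with an interval than with a raw count. The sketch below computes a Wilson score interval for the hit rate over a run of blank/unblank cycles. It is a generic statistical helper, not part of the existing tooling, and it assumes cycles are independent, which has not been checked.

```python
import math


def wilson_interval(hits, cycles, z=1.96):
    """Approximate 95% interval (z=1.96) for the true flicker rate per cycle."""
    if cycles <= 0 or not 0 <= hits <= cycles:
        raise ValueError("need 0 <= hits <= cycles and cycles > 0")
    p = hits / cycles
    denom = 1 + z * z / cycles
    centre = (p + z * z / (2 * cycles)) / denom
    half = z * math.sqrt(p * (1 - p) / cycles + z * z / (4 * cycles * cycles)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)
```

The intended use is to count flickers over the same number of cycles before and after the device-tree change and compare the two intervals. If they do not overlap, the change in rate is unlikely to be chance; if they do, more cycles are needed before drawing a conclusion.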

8. What I have on my side, if any of it is useful

All of this lives on my workstation; happy to share any of it on request:

Written 2026-05-26 to summarise the electronics-side investigation done in a single day. Hardware observations and on-target test results are solid; the patch interpretation in §7 is speculative and may be wrong.