Captured slow-motion phone video of the screen during a flicker event and pulled individual frames out. Frame 362 is a clean baseline. Frames 363, 370, 376, 381, 382 are flickers. The consistent signature:
To my eye this is the signature of a frame-tearing / VSYNC-alignment slip. Not a data integrity problem — a timing-alignment problem.
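The artifact shows up as the image shifting vertically, so one way to put a number on it is to compare per-row brightness profiles. Each flicker frame (e.g. 363) would be compared against the clean baseline (362). This is a minimal sketch. It assumes the frames have already been loaded and cropped to the panel area as 2D lists of grayscale values; loading and cropping are not shown. It also ignores perspective and rolling-shutter effects from the phone capture.

```python
from statistics import mean


def row_profile(frame):
    """Mean brightness of each row: collapses a frame to a 1D vertical profile."""
    return [mean(row) for row in frame]


def estimate_vertical_shift(baseline, frame, max_shift=20):
    """Return the row offset s that best aligns `frame` to `baseline`.

    A positive s means content in `frame` sits s rows lower than in `baseline`.
    Each candidate shift is scored by the mean absolute difference of the
    overlapping parts of the two row profiles; the lowest score wins.
    """
    base = row_profile(baseline)
    cur = row_profile(frame)
    best_shift, best_score = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        # Compare base[r] with cur[r + s] wherever both indices are valid.
        pairs = [(base[r], cur[r + s]) for r in range(len(base))
                 if 0 <= r + s < len(cur)]
        if not pairs:
            continue
        score = mean(abs(a - b) for a, b in pairs)
        if score < best_score:
            best_shift, best_score = s, score
    return best_shift
```

Running this on each flicker frame against frame 362 would show whether the shift is the same number of rows every time.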
Reference frames (slow-motion phone capture, resized for inclusion; I have the original full-size PNGs if you want them):
While a burst of flicker was actively occurring on the MIPI input path, I
enabled the SN65DSI83's internal LVDS test pattern (write 0x10
to CSR 0x3C, i.e. bit 4 of LVDS_FORMAT — confirmed via the known
i2cset sequence). With no other change:
| SN65 source | Observed |
|---|---|
| MIPI input (normal) | Flicker happening as before |
| Internal LVDS test pattern enabled | Clean colour-bar pattern, no flicker, for as long as enabled |
| Back to MIPI input | Flicker resumes |
The internal test pattern is generated inside the SN65 at the LVDS output stage, downstream of the MIPI input and the MIPI-to-LVDS conversion. The fact that it is clean while the MIPI-fed image flickers means everything from the SN65 LVDS PLL through the panel is physically healthy. The fault must therefore lie upstream of that point — somewhere in the i.MX DSIM controller output, the MIPI bus, the SN65's MIPI receiver, or the SN65's internal MIPI-to-LVDS conversion logic.
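The test-pattern toggle above is a single register write. To make it repeatable from a script, it can be wrapped as below. This is a sketch under several assumptions:

- i2c-tools (i2cget/i2cset) is installed on the target.
- The bridge sits at bus 4, address 0x2c, inferred from the sn65dsi83 4-002c prefix in dmesg.
- 0x3C bit 4 is the enable bit, as described above.

Unlike the plain write of 0x10, it does a read-modify-write, so any other bits in that register are preserved. The -f flag is needed because the kernel driver has claimed the address, and -y skips the confirmation prompt.

```python
import subprocess

I2C_BUS = 4                 # assumption: from the "sn65dsi83 4-002c" dmesg prefix
SN65_ADDR = 0x2C
TEST_PATTERN_REG = 0x3C
TEST_PATTERN_BIT = 1 << 4   # 0x10, the value written in the test above


def with_test_pattern(reg_value, enable):
    """Return reg_value with only the test-pattern bit set or cleared."""
    if enable:
        return reg_value | TEST_PATTERN_BIT
    return reg_value & ~TEST_PATTERN_BIT & 0xFF


def read_cmd():
    """i2cget argv for reading the test-pattern register."""
    return ["i2cget", "-f", "-y", str(I2C_BUS), hex(SN65_ADDR), hex(TEST_PATTERN_REG)]


def write_cmd(value):
    """i2cset argv for writing `value` to the test-pattern register."""
    return ["i2cset", "-f", "-y", str(I2C_BUS), hex(SN65_ADDR),
            hex(TEST_PATTERN_REG), hex(value)]


def set_test_pattern(enable):
    """Read-modify-write CSR 0x3C on the target (needs root and i2c-tools)."""
    out = subprocess.run(read_cmd(), capture_output=True, text=True, check=True)
    current = int(out.stdout.strip(), 16)   # i2cget prints e.g. "0x00"
    subprocess.run(write_cmd(with_test_pattern(current, enable)), check=True)
```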
To complement the visual capture I wrote a small Flask HTTP service that runs on the Hawk and polls register state at a high rate, plus a host-side Python tool that drives it and records events. This is not production-quality software: it's a measurement harness, and Claude helped me write the Python.
| File | Role |
|---|---|
| device_server.py | Runs on Hawk as a systemd service. Polls SN65 registers (csr_0a, csr_e5) every 10 ms via I²C, and a block of DSIM registers (STATUS, CLKCTRL, CONFIG, ESCMODE, MDRESOL, MVPORCH, MHPORCH, MSYNC, SDRESOL, INTSRC, INTMSK, PHYTIMING0/1/2) every 500 ms via memtool. Logs transitions. Exposes endpoints to start/stop monitoring and toggle the SN65 test pattern. |
| trigger.py | Host-side. Talks to the on-target service over HTTP, prints alerts as events arrive, lets the operator press f to mark a visible flicker observation, correlates marks against switch/unlock timing, and writes a full session log to file. |
| display_test_nexio.py | The existing GStreamer kiosk, modified to support a UDP "switch video" trigger and a --start CLI flag so the test rig can drive switches deterministically. |
| cycle.sh | Shell-side reproducer that loops echo 4 / echo 0 > /sys/class/graphics/fb0/blank. Drives many pipeline restart cycles in sequence so flicker rate can be eyeballed. |
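The "logs transitions" part of device_server.py comes down to remembering the last value of each register and reporting changes. Below is a minimal sketch of that logic with the I/O stripped out, so it can be checked off-target. The sample format, a timestamp plus a dict mapping register name to value, is my assumption and not necessarily what the real service uses.

```python
def transitions(samples):
    """Yield (t, name, old, new) whenever a polled register value changes.

    `samples` is an iterable of (timestamp_s, {register_name: value}).
    The first value seen for each register is recorded, not reported.
    """
    last = {}
    for t, regs in samples:
        for name, value in regs.items():
            if name in last and last[name] != value:
                yield (t, name, last[name], value)
            last[name] = value
```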
In one session, with the monitor running and an operator (me) pressing
f on each visible flicker, I recorded a single burst of
93 visible flickers in ~29 s (~3 Hz, randomly spaced from
roughly 100 ms to 700 ms apart). All 93 marks fell within a single
~30-second window between two scheduled video switches.
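The rate and spacing figures above can be recomputed from the recorded mark timestamps. The same data also answers whether all marks fall in one switch-to-switch window. This is a sketch. It assumes marks and switch times are available as lists of timestamps in seconds; the function names are hypothetical and not taken from trigger.py.

```python
from bisect import bisect_right
from collections import Counter


def mark_stats(marks):
    """Count, span, mean rate and min/max spacing of operator 'f' marks."""
    marks = sorted(marks)
    gaps = [b - a for a, b in zip(marks, marks[1:])]
    span = marks[-1] - marks[0] if marks else 0.0
    return {
        "count": len(marks),
        "span_s": span,
        # (n - 1) intervals over the span; 93 marks in ~29 s gives ~3.2 Hz
        "rate_hz": (len(marks) - 1) / span if span > 0 else 0.0,
        "min_gap_s": min(gaps) if gaps else None,
        "max_gap_s": max(gaps) if gaps else None,
    }


def marks_per_window(marks, switches):
    """Count marks per switch window, keyed by the index of the preceding switch.

    Key -1 means the mark came before the first switch.
    """
    switches = sorted(switches)
    return Counter(bisect_right(switches, m) - 1 for m in marks)
```

If the single-window observation holds, marks_per_window returns a single key holding all 93 marks.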
Throughout the burst, the on-target poller reported:
- csr_0a = 0x85 the entire time
- csr_e5 = 0x00
- INTSRC stayed at 0x02000018 (the PllStable bit plus some status bits the kernel does not clear; no transient error bits caught)

So from the register-polling perspective the system looks healthy at the sampling rates I used. I have the full session log if useful.
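For reference, 0x85 decodes to a locked and plausibly configured LVDS PLL. The small decoder below assumes the CSR 0x0A layout as I read it from the TI datasheet and the upstream ti-sn65dsi83 driver:

- bit 7: PLL_EN_STAT
- bits 3:1: LVDS_CLK_RANGE
- bit 0: HS_CLK_SRC

This layout is worth double-checking against the datasheet before relying on it.

```python
# Assumed LVDS_CLK_RANGE bands (MHz), per my reading of the datasheet.
LVDS_CLK_RANGES_MHZ = {
    0: (25.0, 37.5),
    1: (37.5, 62.5),
    2: (62.5, 87.5),
    3: (87.5, 112.5),
    4: (112.5, 137.5),
    5: (137.5, 154.0),
}


def decode_csr_0a(value):
    """Split SN65DSI83 CSR 0x0A into its (assumed) fields."""
    lvds_range = (value >> 1) & 0x7
    return {
        "pll_locked": bool(value & 0x80),        # bit 7 PLL_EN_STAT
        "lvds_clk_range": lvds_range,            # bits 3:1
        "lvds_clk_mhz": LVDS_CLK_RANGES_MHZ.get(lvds_range),
        "hs_clk_src_dphy": bool(value & 0x01),   # bit 0 HS_CLK_SRC
    }
```

decode_csr_0a(0x85) reports:

- the PLL as locked;
- LVDS_CLK_RANGE = 2 (62.5 to 87.5 MHz), which brackets the 72 MHz mode clock seen in dmesg;
- HS_CLK_SRC = 1, which I believe corresponds to the "Using DSI clock for LVDS" log line.

csr_e5 = 0x00 means no error flag was set at any sample, whatever the exact bit assignments.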
I am less confident than the bullet list above might suggest, and I want to be honest about that. 500 ms is a long time compared to a single video frame (40 ms at 25 Hz), so the wide DSIM-register snapshot could easily miss a transient that completes within one poll interval. Specifically:

- DSIM_INTSRC error bits are latched, but the kernel's interrupt handler clears them, typically much faster than 500 ms, so a transient error bit could be set and cleared between my polls and I would never see it.
- Pure hardware-level events that do not change any register I am polling (e.g. a brief PHY FIFO timing slip with no associated interrupt) are also outside what this instrumentation can detect.
- The wide DSIM poll runs the memtool userspace tool via subprocess calls, which has non-trivial overhead. Pushing the wide poll much below ~100 ms started loading the device CPU and impacting the I²C poll loop. I did not try to optimise this further.

So the correct way to read section 4 is: at the resolutions I could poll, nothing showed up. That is consistent with, but not proof of, the fault being below register visibility (e.g. a sub-frame transient timing event).
To check whether GStreamer / userspace was contributing, I stopped both the kiosk and the device-server service, leaving just the Linux text console on screen. I then ran the cycle script:

```sh
echo 4 > /sys/class/graphics/fb0/blank
echo 0 > /sys/class/graphics/fb0/blank
sleep ...
# (repeat)
```
A random subset of the cycles produced the same vertical-shift artifact as during video playback, at roughly the same hit rate. The flicker is also visible on the U-Boot splash screen during boot, before Linux is up.
As I read this, the fault is reachable from both the U-Boot DSI bring-up path and the Linux DRM/KMS pipeline-restart path, and any given blank/unblank cycle can hit it. GStreamer is not required and is not the cause.
Every blank/unblank cycle produces this exact 3-line sequence in dmesg, deterministically across many cycles:
```
mxsfb 32e00000.lcdif: [drm] magic pixel crtc=0 offset 0x3e7ffc
sn65dsi83 4-002c: Using DSI clock for LVDS
@MF@ sn65dsi83_get_dsi_range dsi_rate=216000000 mode_clock=72000000 min_rate=216000000 => range=0x2b
```
So the SN65 bridge driver is re-initialising on every cycle (recomputing its DSI range and re-applying LVDS configuration). The visible flicker variability appears to be in how this re-init lands relative to the panel's own scan cycle — sometimes the timing aligns and the panel locks cleanly; sometimes it does not. But that is software-side interpretation and I am not confident in it.
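As a sanity check on that log line, the numbers are internally consistent, so the computed range value itself looks sane. This sketch makes two assumptions:

- 24 bpp RGB888 over 4 DSI lanes. This is one combination that reproduces 216 MHz; I have not confirmed the lane count.
- A CHA_DSI_CLK_RANGE encoding in 5 MHz steps, where code n covers 5n to 5n+5 MHz.

```python
def dsi_clock_hz(pixel_clock_hz, bpp=24, lanes=4):
    """DSI HS clock: pixel clock * bpp spread over the lanes, halved for DDR."""
    return pixel_clock_hz * bpp // lanes // 2


def dsi_clk_range_code(dsi_clk_hz):
    """CHA_DSI_CLK_RANGE, assuming code n covers 5n <= DSI_CLK < 5n + 5 MHz."""
    return dsi_clk_hz // 5_000_000


def lvds_clk_range_code(pixel_clock_hz):
    """LVDS_CLK_RANGE, assuming band edges at 37.5, 62.5, 87.5, ... MHz,
    i.e. floor((clk - 12.5 MHz) / 25 MHz)."""
    return (pixel_clock_hz - 12_500_000) // 25_000_000
```

With the 72 MHz mode clock from dmesg, this gives 216 MHz and range 0x2b, matching the log. It also gives an LVDS range of 2, matching bits 3:1 of csr_0a = 0x85.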
At boot the dmesg also shows the line:
```
samsung-dsim 32e10000.dsi: @MF@ flb-flags=0x0
```
which led me to poke around in the source code, see section 7.
This may turn out to be unrelated, but while poking around in the kernel
patches looking for anything that might be connected to what I am seeing,
I came across flb-hack-dsi-autoflush in
parkeon_linux_bsp/trunk/kernel/patches-6.6/ (dated 2025-05-16,
authored by Martin Fuzzey). The commit message reads:
The phrase "screen shift" jumped out because it is exactly the words I would use to describe the artifact I am seeing. From my limited reading of the patch source, it looks like it adds two device-tree flags on the Samsung DSIM driver:
| Bit | Flag | What I think it does |
|---|---|---|
| 0 | FLB_FLAG_AUTO_FLUSH | Clears DSIM_MFLUSH_VS, switching from manual FIFO flush to auto flush on VSYNC |
| 1 | FLB_FLAG_NO_BURST | Clears DSIM_BURST_MODE |
The patch also adds dev_info(dev, "@MF@ flb-flags=0x%x\n",
dsi->flb_flags); in samsung_dsim_parse_dt(), which I
believe is where the @MF@ flb-flags=0x0 line in my boot dmesg
comes from. If that mapping is right, then the patch is in the running
kernel, but flb_flags == 0 at runtime, and the behavioural
changes the patch adds are wrapped in two if blocks that both
evaluate to false:
```c
if (dsi->flb_flags & FLB_FLAG_AUTO_FLUSH)
	reg &= ~DSIM_MFLUSH_VS;
if (dsi->flb_flags & FLB_FLAG_NO_BURST)
	reg &= ~DSIM_BURST_MODE;
```
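To make the before/after check on that boot line mechanical, the flag value can be pulled out of dmesg and decoded. In this sketch, the bit names come from my reading of the patch as summarised in the table above. The regular expression assumes the exact "@MF@ flb-flags=0x%x" format string added by the patch.

```python
import re

# Bit meanings as I read them from the flb-hack-dsi-autoflush patch.
FLB_FLAGS = {0: "FLB_FLAG_AUTO_FLUSH", 1: "FLB_FLAG_NO_BURST"}
_FLB_RE = re.compile(r"@MF@ flb-flags=0x([0-9a-fA-F]+)")


def parse_flb_flags(dmesg_text):
    """Return the flb-flags value from dmesg output, or None if absent."""
    m = _FLB_RE.search(dmesg_text)
    return int(m.group(1), 16) if m else None


def decode_flb_flags(value):
    """List the names of the set bits; unknown bits are reported as such."""
    return [FLB_FLAGS.get(bit, f"unknown bit {bit}")
            for bit in range(value.bit_length()) if value & (1 << bit)]
```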
So, again from my limited understanding: it looks like this patch may have
been written as a fix associated with the same kind of artifact I am
investigating, and on the unit I have it is compiled in but not actually doing
anything because no flb-flags property has been set in the device
tree. If that interpretation is correct, the fix would be to add
flb-flags = <1>; to the DSIM device-tree node (which I
think lives in the flb-dt-hawk patch, but I have not opened it
to confirm).
I am writing this section mainly to check on its validity — or at the very least to try to understand it. I am not making a claim that this is the fix. Specifically I do not know:
If you can tell me whether this is a real lead or a red herring, that
alone would be very useful. And if it is a real lead, a one-line device-tree
change would let me test it against the blank/unblank reproducer immediately
— I would expect dmesg to then show flb-flags=0x1 and the
visible flicker rate to drop sharply if the hypothesis is right.
All of this lives on my workstation; happy to share any of it on request:
device-server.service on the Hawk)