Hi Devs,
Before I file a bug on this I'd like to try to narrow it down a bit so I'm looking for some pointers on how TDM works and where I might start looking.
Since upgrading to Debian Stretch and TDE: R14.0.6, we're seeing an issue where TDM won't start a new session for a display and has zombie child process(es). This is with networked X-Terminals using XDMCP.
It's happening roughly every week or so (no fixed time of day) on a system with around 30 users/terminals logging in and out.
It might happen after a user logs out, after the remote terminal is reset or, according to user rumour, after a kdesktoplock has been unlocked; I'm failing to get any concrete info on that last aspect (I'm sure you know what users are like!).
Anyway, the terminal gets fired up/restarts X11 after a logout but doesn't then get a response to its XDMCP requests and just sits there showing a default X11 background (hence the grey-screen-of-death moniker).
When TDM is in this state, doing "ps waux | grep tdm" shows:
root     18481  0.0  0.0      0     0 ?        Z    Feb19   0:00 [tdm] <defunct>
root     20178  0.0  0.0      0     0 ?        Z    Feb19   0:00 [tdm] <defunct>
root     25478  0.0  0.0      0     0 ?        Z    Feb19   0:00 [edm] <defunct>
These processes in pstree look like:
|-tdm(12251)-+-tdm(1004)---tdm_greet(1005)-+-krootimage(1009)
|            |                             |-twin(1013)
|            |                             `-{tdm_greet}(1020)
|            |-tdm(1389)---tdm_greet(1390)-+-krootimage(1392)
|            |                             `-{tdm_greet}(2309)
<snip>
|            |-tdm(4546)---tdm_greet(4547)-+-krootimage(4549)
|            |                             |-twin(4552)
|            |                             `-{tdm_greet}(4556)
|            |-tdm(4738)---starttde(21417)-+-ssh-agent(21539)
|            |                             `-tdeinit_phase1(21613)---kwrapper(21614)
|            |-tdm(18481)
|            |-tdm(20178)
|            |-tdm(20178)
|            |-tdm(25478)
<snip>
|-tdmtsak(15721)
Running strace on the main TDM process shows it sitting in a select() on a bunch of fds and doing nothing else.
If I restart tdm (/etc/init.d/tdm restart), it starts working again and mostly preserves the live user sessions... but sometimes it doesn't and everyone gets booted out :-(
Once restarted like this, some tdm processes have systemd as their parent and some are children of the new(?) main tdm process:
systemd(1)-+-ModemManager(1591)-+-{gdbus}(1599)
           |                    `-{gmain}(1595)
<snip>
           |-tdm(16655)---starttde(6171)-+-ssh-agent(6293)
           |                             `-tdeinit_phase1(6369)---kwrapper(6370)
           |-tdm(29430)---starttde(26723)-+-ssh-agent(26845)
           |                              `-tdeinit_phase1(26933)---kwrapper(26934)
           |-tdm(32506)---starttde(21854)-+-ssh-agent(21976)
           |                              `-tdeinit_phase1(22057)---kwrapper(22058)
           |-tdm(32680)-+-tdm(424)---tdm_greet(429)-+-krootimage(1068)
           |            |                           |-twin(1071)
           |            |                           `-{tdm_greet}(1075)
           |            |-tdm(426)---starttde(7968)-+-ssh-agent(8092)
           |            |                           `-tdeinit_phase1(8229)---kwrapper(8231)
           |            |-tdm(428)---tdm_greet(431)-+-krootimage(1150)
           |            |                           |-twin(1155)
So... can anyone suggest where I should be looking to try and narrow down the cause of this?
How does TDM know when a session ends and what stops it providing a new one?
Should it have zombies hanging around? Should it reap its zombie children? I see there's a ReapChildren function in dm.c which looks like it should get called when TDM receives a SIGCHLD. How could that not be working?
Thanks in advance.
On 2019/02/20 09:06 PM, Russell Brown wrote:
> Since upgrading to Debian Stretch and TDE: R14.0.6, we're seeing an issue where TDM won't start a new session for a display and has zombie child process(es). This is with networked X-Terminals using XDMCP.
Hi Russell, this could be a bug, but a hard one to reproduce. Are you able to find a way to systematically create the problem? That would allow us to investigate this further. Otherwise I think it would be really hard to find out what is going on.
Cheers
Michele