Hi Devs,
Before I file a bug on this I'd like to try to narrow it down a bit so
I'm looking for some pointers on how TDM works and where I might start
looking.
Since upgrading to Debian Stretch and TDE: R14.0.6, we're seeing an
issue where TDM won't start a new session for a display and has zombie
child process(es). This is with networked X-Terminals using XDMCP.
It's happening roughly every week or so (no fixed time of day) on a
system with around 30 users/terminals logging in and out.
It might happen after a user logs out, after the remote terminal is
reset or, according to user rumour, after a kdesktoplock has been
unlocked; I'm failing to get any concrete info on that last point (I'm
sure you know what users are like!).
Anyway, the terminal gets fired up/restarts X11 after a logout but
doesn't then get a response to its XDMCP requests and just sits there
showing a default X11 background (hence the grey-screen-of-death
moniker).
When TDM is in this state, doing "ps waux | grep tdm" shows:
root     18481  0.0  0.0      0     0 ?        Z    Feb19   0:00 [tdm] <defunct>
root     20178  0.0  0.0      0     0 ?        Z    Feb19   0:00 [tdm] <defunct>
root     25478  0.0  0.0      0     0 ?        Z    Feb19   0:00 [tdm] <defunct>
These processes in pstree look like:
|-tdm(12251)-+-tdm(1004)---tdm_greet(1005)-+-krootimage(1009)
|            |                             |-twin(1013)
|            |                             `-{tdm_greet}(1020)
|            |-tdm(1389)---tdm_greet(1390)-+-krootimage(1392)
|            |                             `-{tdm_greet}(2309)
<snip>
|            |-tdm(4546)---tdm_greet(4547)-+-krootimage(4549)
|            |                             |-twin(4552)
|            |                             `-{tdm_greet}(4556)
|            |-tdm(4738)---starttde(21417)-+-ssh-agent(21539)
|            |                             `-tdeinit_phase1(21613)---kwrapper(21614)
|            |-tdm(18481)
|            |-tdm(20178)
|            |-tdm(25478)
<snip>
|-tdmtsak(15721)
Running strace on the main TDM process shows it sitting in a select()
on a bunch of fds and doing nothing else.
If I restart tdm (/etc/init.d/tdm restart), it starts working again
and mostly preserves the live users' sessions... but sometimes it
doesn't, and everyone gets booted out :-(
Once restarted like this, some tdm processes have systemd as their
parent and some are children of the (new?) main tdm process:
systemd(1)-+-ModemManager(1591)-+-{gdbus}(1599)
           |                    `-{gmain}(1595)
<snip>
           |-tdm(16655)---starttde(6171)-+-ssh-agent(6293)
           |                             `-tdeinit_phase1(6369)---kwrapper(6370)
           |-tdm(29430)---starttde(26723)-+-ssh-agent(26845)
           |                              `-tdeinit_phase1(26933)---kwrapper(26934)
           |-tdm(32506)---starttde(21854)-+-ssh-agent(21976)
           |                              `-tdeinit_phase1(22057)---kwrapper(22058)
           |-tdm(32680)-+-tdm(424)---tdm_greet(429)-+-krootimage(1068)
           |            |                           |-twin(1071)
           |            |                           `-{tdm_greet}(1075)
           |            |-tdm(426)---starttde(7968)-+-ssh-agent(8092)
           |            |                           `-tdeinit_phase1(8229)---kwrapper(8231)
           |            |-tdm(428)---tdm_greet(431)-+-krootimage(1150)
           |            |                           |-twin(1155)
So... can anyone suggest where I should be looking to try to narrow
down the cause of this?
How does TDM know when a session ends and what stops it providing a new
one?
Should it have zombies hanging around? Should it reap its zombie
children? I see there's a ReapChildren function in dm.c which looks
like it should get called when TDM receives a SIGCHLD. How could that
not be working?
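To make the question concrete, here is the generic pattern I would
expect a display manager to follow. This is only a minimal sketch of
standard POSIX child reaping, not TDM's actual code: the names
on_sigchld, install_sigchld_handler, reap_children and spawn_and_reap
are all made up for illustration, and I have not checked how
ReapChildren in dm.c really does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Set by the signal handler, cleared by reap_children(). */
static volatile sig_atomic_t child_exited = 0;

/* Handler does the bare minimum: just record that a child changed state. */
static void on_sigchld(int signo)
{
    (void)signo;
    child_exited = 1;
}

/* Install the SIGCHLD handler. Returns 0 on success, -1 on error. */
static int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Reap every child that has already exited, without blocking.
 * Several exits can collapse into a single SIGCHLD, so this loops
 * until waitpid() reports nothing left to collect.
 * Returns the number of children reaped. */
static int reap_children(void)
{
    int reaped = 0;
    int status;

    child_exited = 0;
    for (;;) {
        pid_t pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;          /* collected one zombie, look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;          /* interrupted, retry */
        break;                 /* 0: none exited yet; -1: no children */
    }
    return reaped;
}

/* Demo helper (hypothetical): fork n children that exit immediately,
 * then poll reap_children() until all n are collected or about five
 * seconds pass. Returns the number reaped, or -1 if fork() fails. */
static int spawn_and_reap(int n)
{
    int total = 0;
    struct timespec pause = { 0, 1000000 };   /* 1 ms */

    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            _exit(0);
    }
    for (int tries = 0; total < n && tries < 5000; tries++) {
        int got = reap_children();
        total += got;
        if (got == 0)
            nanosleep(&pause, NULL);
    }
    return total;
}
```

The reason I bring up the loop: if a handler or main loop only calls
wait() once per SIGCHLD, then two children exiting close together can
leave one of them unreaped, which would look a lot like the defunct
entries above. I don't know whether that is what is happening here;
it's just the first thing I would want to rule out.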
Thanks in advance.
--
Regards,
Russell
--------------------------------------------------------------------
| Russell Brown         | MAIL: russell(a)lls.com PHONE: 01780 471800 |
| Lady Lodge Systems    | WWW Work: http://www.lls.com               |
| Peterborough, England | WWW Play: http://www.ruffle.me.uk          |
--------------------------------------------------------------------