Slavek,
One more issue with the tdeio_slaves failing to close: if kmail is left open, the automatic checks for new mail, done every 10 minutes or so, will cause the mail server to begin rejecting connections once its anvil connection limit is reached (because the imap connections are never closed).
This was from my mail server after I began getting imap connection rejects: (all of those logins are from kmail that are stuck open)
  953 ?        Ss     0:29 /usr/sbin/dovecot
 1087 ?        S      0:05  \_ dovecot/anvil
 1088 ?        S      0:05  \_ dovecot/log
15680 ?        S      0:09  \_ dovecot/config
31108 ?        S      0:05  \_ dovecot/imap-login
31112 ?        S      0:11  \_ dovecot/imap
 1022 ?        S      0:04  \_ dovecot/imap-login
 1026 ?        S      0:06  \_ dovecot/imap
20731 ?        S      0:01  \_ dovecot/imap-login
20737 ?        S      0:02  \_ dovecot/imap
21919 ?        S      0:00  \_ dovecot/imap-login
21921 ?        S      0:01  \_ dovecot/imap
23254 ?        S      0:00  \_ dovecot/imap-login
23255 ?        S      0:01  \_ dovecot/imap
25991 ?        S      0:00  \_ dovecot/imap-login
25993 ?        S      0:01  \_ dovecot/imap
32553 ?        S      0:00  \_ dovecot/imap-login
32557 ?        S      0:01  \_ dovecot/imap
15881 ?        S      0:00  \_ dovecot/imap-login
15883 ?        S      0:00  \_ dovecot/imap
16677 ?        S      0:00  \_ dovecot/imap-login
16678 ?        S      0:00  \_ dovecot/imap
19776 ?        S      0:00  \_ dovecot/imap-login
19780 ?        S      0:00  \_ dovecot/imap
23531 ?        S      0:00  \_ dovecot/imap-login
23535 ?        S      0:00  \_ dovecot/imap
23537 ?        S      0:00  \_ dovecot/imap-login
23538 ?        S      0:00  \_ dovecot/imap
 1039 ?        Ss     0:00 /usr/sbin/faxq
On 02/03/2014 05:43, David C. Rankin wrote:
[...]
I'm making some progress on the investigation, but I'm not done yet. Unlike what I thought before, the problem is not in the TDElauncher. It's more likely around the TDEIO scheduler (tdelibs/tdeio/tdeio/scheduler.cpp).
I've found out that the bug affects ioslaves having the "remote files listing" feature (e.g. fish, ftp ...) but not the others (http ...). I believe the problem is inside the "kdirlister" class (tdelibs/tdeio/tdeio/kdirlister.cpp). When browsing a remote folder using fish://remote_host/remote_directory/ , the scheduler spawns a new tdeioslave to get the remote content, but this job never informs the scheduler that it has finished working. So the scheduler still believes the ioslave is busy, and does not reuse it. The ioslave stays stale forever. This is certainly a signal/slot issue.
Probably one of the last commits here is the culprit: http://git.trinitydesktop.org/cgit/tdelibs/log/tdeio/tdeio/kdirlister.cpp
When using the same protocol to get an exact file (no directory browsing), the problem never occurs. E.g.: konqueror fish://remote_host/image1.jpg
Francois
On 03/02/2014 05:10 AM, François Andriot wrote:
[...]
> I'm making some progress on the investigation, but I'm not done yet. Unlike what I thought before, the problem is not in the TDElauncher. It's more likely around the TDEIO scheduler (tdelibs/tdeio/tdeio/scheduler.cpp).
Interesting. Any progress is good progress. One concern I had was that the '#define WITH_CONSOLE_KIT' in tdm/backend/dm.h (line 40), which has no preprocessor conditionals around it, was causing an incorrect declaration of StartClient() at dm.h line 487:
    #ifdef WITH_CONSOLE_KIT
    int StartClient( const char *ck_session_cookie );
    #else
    int StartClient( void );
    #endif
I added fprintf(stderr,....) statements inside each of the WITH_CONSOLE_KIT statements to try and catch whether the WITH_CONSOLE_KIT statements were being utilized. So far, I haven't seen any pop up in the logs. (I've attached the patch with the fprintfs if you are curious).
I have also built with an explicit registration of the greeter with pam, as suggested by freedesktop.org:
    #ifdef WITH_SYSTEMD
        if ((pretc = pam_misc_setenv( pamh, XDG_SESSION_CLASS, "greeter", 0 )) != PAM_SUCCESS) {
            ReInitErrorLog();
            LogError( "pam_misc_setenv() for %s failed: %s\n", curuser, pam_strerror( pamh, pretc ) );
            return 0;
        }
        if ((pretc = pam_misc_setenv( pamh, XDG_SEAT, "seat0", 0 )) != PAM_SUCCESS) {
            ReInitErrorLog();
            LogError( "pam_misc_setenv() for %s failed: %s\n", curuser, pam_strerror( pamh, pretc ) );
            return 0;
        }
        if ((pretc = pam_misc_setenv( pamh, XDG_VTNR, "7", 0 )) != PAM_SUCCESS) {
            ReInitErrorLog();
            LogError( "pam_misc_setenv() for %s failed: %s\n", curuser, pam_strerror( pamh, pretc ) );
            return 0;
        }
The build is complete, but I have not yet had time to test. I'll report back if that makes any difference with logind.
> I've found out that the bug affects ioslaves having the "remote files listing" feature (e.g. fish, ftp ...) but not the others (http ...).
I don't think this is right. I have many, many abandoned tdeio_http and tdeio_file processes left running on my system. I think it affects ALL tdeio.
> I believe the problem is inside the "kdirlister" class (tdelibs/tdeio/tdeio/kdirlister.cpp). When browsing a remote folder using fish://remote_host/remote_directory/ , the scheduler spawns a new tdeioslave to get the remote content, but this job never informs the scheduler that it has finished working. So the scheduler still believes the ioslave is busy, and does not reuse it. The ioslave stays stale forever. This is certainly a signal/slot issue.
> Probably one of the last commits here is the culprit: http://git.trinitydesktop.org/cgit/tdelibs/log/tdeio/tdeio/kdirlister.cpp
Was this another renaming patch to blame? I'll look.
> When using the same protocol to get an exact file (no directory browsing), the problem never occurs. E.g.: konqueror fish://remote_host/image1.jpg
> Francois
Thank you for your help, Francois. I look forward to fixing the tdeio issue and then fixing the logind/loginctl problem so I can actually have sound and use usb drives again.
On 03/02/2014 03:59 PM, David C. Rankin wrote:
[...]
Francois,
Just logging in leaves 2 open tdeio_file connections and 4 tdeio_http connections from KOrganizer checking the groupware calendar, etc. That is 6 open/abandoned tdeio processes before I have even launched any application.
  578 ?        S      0:00  \_ tdelauncher [tdeinit] --new-startup
  589 ?        S      0:01  \_ twin [tdeinit] -session 10d7cdd8c9000139108311200000009720000_1393
  605 ?        S      0:00  \_ tdeio_file [tdeinit] file /tmp/tdesocket-david/tdelauncherbcVCMs.s
  606 ?        S      0:00  \_ tdeio_file [tdeinit] file /tmp/tdesocket-david/tdelauncherbcVCMs.s
  612 ?        S      0:00  \_ notification-daemon-tde
  631 ?        S      0:00  \_ konqueror [tdeinit] -session 10d7cdd8c9000139235256900000063930014
  632 ?        S      0:00  \_ konqueror [tdeinit] -session 10d7cdd8c9000139241737300000063930020
  633 ?        S      0:00  \_ konqueror [tdeinit] -session 10d7cdd8c9000139269956500000004590015
  634 ?        S      0:00  \_ konqueror [tdeinit] -session 10d7cdd8c9000139271054100000004590028
  635 ?        S      0:00  \_ konqueror [tdeinit] -session 10d7cdd8c9000139340009600000013660015
  636 ?        S      0:00  \_ konqueror [tdeinit] -session 10d7cdd8c9000139346014900000005280015
  675 ?        S      0:00  \_ tdeio_http [tdeinit] https /tmp/tdesocket-david/tdelauncherbcVCMs.
  679 ?        S      0:00  \_ tdeio_http [tdeinit] https /tmp/tdesocket-david/tdelauncherbcVCMs.
  682 ?        S      0:00  \_ tdeio_http [tdeinit] https /tmp/tdesocket-david/tdelauncherbcVCMs.
  683 ?        S      0:00  \_ tdeio_http [tdeinit] https /tmp/tdesocket-david/tdelauncherbcVCMs.
I think the tdeio_http processes ARE included as part of this problem, BUT I think the idle_timeout for http is working and will eventually kill the tdeio_http procs. They are definitely left open after the http query from KOrganizer, but something kills them some time later (~60 seconds later).
Then opening konqueror and selecting 1 remote file to edit in kate gives the following .xsession-error output:
konqueror:
tdeio_file: ========= LIST file:/// =========
tdeio_file: ========= LIST file:///home/david =========
tdeio_file: ============= COMPLETED LIST ============
tdeio_file: ============= COMPLETED LIST ============
tdeio: KSambaShare: Could not found smb.conf!
tdeio: KNFSShare: Could not found exports file!
tdeio (TDEIOJob): stat sftp://phoinix.rlfpllc.com/dat_e/
[tdeinit] Got EXEC_NEW 'tdeio_sftp' from launcher.
tdeio (TDELauncher): tdeio_sftp (pid 725) up and running.
tdeio_sftp: ERROR: KSshProcess::version(): pclose failed.
tdeio (TDEIOJob): LocalURLJob::slotLocalURL(sftp://phoinix.rlfpllc.com/dat_e/)
[tdeinit] Got EXEC_NEW 'tdeio_sftp' from launcher.
tdeio (TDELauncher): tdeio_sftp (pid 729) up and running.
tdeio_sftp: ERROR: KSshProcess::version(): pclose failed.
tdeio (TDEIOJob): LocalURLJob::slotLocalURL(sftp://phoinix.rlfpllc.com/dat_e/tde)
[tdeinit] Got EXEC_NEW 'tdeio_sftp' from launcher.
tdeio (TDELauncher): tdeio_sftp (pid 733) up and running.
tdeio (TDEIOJob): LocalURLJob::slotLocalURL(sftp://phoinix.rlfpllc.com/dat_e/tde/tmp)
[tdeinit] Got EXEC_NEW 'tdeio_sftp' from launcher.
tdeio (TDELauncher): tdeio_sftp (pid 737) up and running.
tdeio_sftp: ERROR: KSshProcess::version(): pclose failed.
open remote file in kate, edit and close:
tdeio (TDELauncher): TDELauncher: Got start_service_by_desktop_path('/opt/trinity/share/applications/tde/kate.desktop', ...)
[tdeinit] Got EXT_EXEC 'kate' from launcher.
tdeio (TDELauncher): kate (pid 749) up and running.
tdecore (TDEAction): WARNING: TDEActionCollection::TDEActionCollection( TQObject *parent, const char *name, TDEInstance *instance )
tdeio (TDEIOJob): Starting tdeio_uiserver
tdeio (TDELauncher): TDELauncher: Got start_service_by_desktop_path('tdeio_uiserver.desktop', ...)
[tdeinit] Got EXT_EXEC 'tdeio_uiserver' from launcher.
tdeio (TDELauncher): tdeio_uiserver (pid 750) up and running.
tdeio (TDEIOJob): startServiceByDesktopPath returned 0
tdeio (TDEIOJob): tdeio_uiserver registered
[tdeinit] Got EXEC_NEW 'tdeio_file' from launcher.
tdeio (TDELauncher): tdeio_file (pid 752) up and running.
[tdeinit] Got EXEC_NEW 'tdeio_sftp' from launcher.
tdeio (TDELauncher): tdeio_sftp (pid 753) up and running.
tdeio_file: Starting 752
tdeio_file: ========= LIST file:///home/david =========
tdeio_file: ============= COMPLETED LIST ============
tdeio: KSambaShare: Could not found smb.conf!
tdeio: KNFSShare: Could not found exports file!
tdeio_sftp: ERROR: KSshProcess::version(): pclose failed.
tdeio_sftp: ERROR: sftpRead: read failed with code 1
tdeio_file: Done
tdeio_file: Done
After opening, editing and closing the remote file I have 3 additional hung tdeio_sftp processes and 1 more tdeio_file process:
  725 ?        S      0:00  \_ tdeio_sftp [tdeinit] sftp /tmp/tdesocket-david/tdelauncherbcVCMs.s
  729 ?        S      0:00  \_ tdeio_sftp [tdeinit] sftp /tmp/tdesocket-david/tdelauncherbcVCMs.s
  733 ?        S      0:00  \_ tdeio_sftp [tdeinit] sftp /tmp/tdesocket-david/tdelauncherbcVCMs.s
  752 ?        S      0:00  \_ tdeio_file [tdeinit] file /tmp/tdesocket-david/tdelauncherbcVCMs.s
These don't ever go away.
On 02/03/2014 23:36, David C. Rankin wrote:
[...]
Hello, from what I've seen in the code, the "long" timeout (about 1 minute) is nominal behaviour. The tdeio_file slave is designed to always keep at least one process alive.
I have the same behaviour in TDE 3.5.13.2 and TDE 14.0.0: when opening konqueror on the "home page" (blue background with TDE main shortcuts such as "file explorer" etc.), it instantly opens 4 tdeio_file processes, all of which end up vanishing except one. I think this is normal.
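As a rough illustration of the nominal idle-timeout behaviour described here, the sketch below models a pool of idle slaves that are reaped after a timeout while a minimum number is kept alive. It is only a toy model: `IdleSlave`, `reapIdleSlaves`, and the `timeoutSeconds` and `minKeep` parameters are assumptions made for illustration, not names or values taken from the tdelibs sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an idle slave; not the real tdeio class.
struct IdleSlave {
    std::string protocol;  // e.g. "file" or "http"
    int idleSeconds;       // how long the slave has been idle
};

// Remove slaves that have been idle for at least timeoutSeconds,
// but never shrink the pool below minKeep entries.
// Returns the number of slaves reaped.
int reapIdleSlaves(std::vector<IdleSlave>& idle, int timeoutSeconds, std::size_t minKeep)
{
    assert(timeoutSeconds >= 0);

    // Longest-idle first, so the oldest slaves are reaped first.
    std::sort(idle.begin(), idle.end(),
              [](const IdleSlave& a, const IdleSlave& b) {
                  return a.idleSeconds > b.idleSeconds;
              });

    int reaped = 0;
    while (idle.size() > minKeep && idle.front().idleSeconds >= timeoutSeconds) {
        idle.erase(idle.begin());
        ++reaped;
    }
    return reaped;
}
```

With four idle "file" entries that have all passed a 60-second timeout and `minKeep` set to 1, three are reaped and one remains, which is the pattern observed with the tdeio_file processes above.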
The difference between 3.5.13.2 and 14.0.0 occurs in remote tdeioslaves that are listing directories, such as fish and ftp (not tried sftp yet). In 14.0.0, every time you list a directory's content, you get a new stale tdeioslave. So if you navigate in your folders, you can get lots of stale processes! This problem does not exist in 3.5.13.2.
Try to download a file with a direct URL: sftp://remote_host/remote_file and see how many tdeio_sftp processes appear.
Francois
On 03/03/2014 12:01 AM, François Andriot wrote:
[...]
Yes, I saw where you wrote that opening a 'single' remote file with a unique URL does not generate additional tdeio_x slaves. I have confirmed that if I use konqueror and type the complete URL:
sftp://somehost.tld/path/to/a/filename.ext
no stale tdeio_sftp processes are created. I have also confirmed that the tdeio_http processes are ultimately killed by something (presumably the failsafe idle_timeout), but I don't think that behavior is correct.
The dirlist on remote hosts does look like it is part of the problem. It's like TDE loses track of all the tdeio_x processes created to build the remote '/path/to/some/' before getting to 'filename.ext'.
Likewise, I was not able to resolve the 'loginctl show-session $XDG_SESSION_ID' problem by explicitly adding 'greeter' to the pam environment. I need to know what 'magic' is being done on your pure-systemd boxes when the display manager is launched that allows you to get the needed session tracking established. Take a look at your setup and see if you can isolate the code.
On Arch, the only thing I am doing to launch tdm.service is to enable that service in systemd. My service file contains nothing but:
[Unit]
Description=TDE Display Manager
After=systemd-user-sessions.service

[Service]
ExecStart=/opt/trinity/bin/tdm

[Install]
Alias=display-manager.service
Does yours contain anything else?
On 03/03/2014 08:17 AM, David C. Rankin wrote:
[...]
The minimal kde:session tracking is working on my system (since my xsession fix on 1/30), but none of the internal user session tools (polkit, udev, etc.) work to provide access to sound, user mounts, etc. The logs are clear that the kde:session is seen, opened and closed:
Feb 28 17:51:59 valhalla [393]: pam_unix(kde:session): session opened for user david by (uid=0)
Feb 28 17:56:40 valhalla [393]: pam_unix(kde:session): session closed for user david
Feb 28 18:06:41 valhalla [407]: pam_unix(kde:session): session opened for user david by (uid=0)
Mar 01 18:23:42 valhalla [407]: pam_unix(kde:session): session closed for user david
Mar 01 21:39:48 valhalla [1435]: pam_unix(kde:session): session opened for user david by (uid=0)
Mar 01 22:32:40 valhalla [1435]: pam_unix(kde:session): session closed for user david
Mar 02 16:14:06 valhalla [485]: pam_unix(kde:session): session opened for user david by (uid=0)
On 03/03/2014 15:17, David C. Rankin wrote:
> Yes, I saw where you wrote that opening a 'single' remote file with a unique URL does not generate additional tdeio_x slaves. I have confirmed that if I use konqueror and type the complete URL:
> sftp://somehost.tld/path/to/a/filename.ext
> no stale tdeio_sftp processes are created. I have also confirmed that the tdeio_http processes are ultimately killed by something (presumably the failsafe idle_timeout), but I don't think that behavior is correct.
It looks like tdeio_http is a special case with hardcoded longer timeouts than the default ones.
> The dirlist on remote hosts does look like it is part of the problem. It's like TDE loses track of all the tdeio_x processes created to build the remote '/path/to/some/' before getting to 'filename.ext'
To be more precise, TDE does not lose track of the tdeio processes. The tdeio scheduler is always aware of its slave processes. The actual problem is that the tdeio scheduler never receives the "job is finished" notification from some slaves. So it considers such a slave to be eternally busy and keeps spawning new ones ...
The nominal scenario looks like:
1) An application requests a URL from the tdeio scheduler (e.g. konqueror asks for "directory listing for sftp://remotehost/").
2) The tdeio scheduler instantiates a "job".
3) The job looks for an idle "slave" that can do the job (e.g. correct protocol) and uses one if it exists, or else asks the scheduler to instantiate a new slave.
4) The slave spawns a 3rd party process (ssh in my case) and waits for text output. (Note: some slaves do the job directly without spawning a 3rd party process.)
5) The 3rd party process does its job (remote directory listing for example) and writes output to the slave.
6) After the command is complete, the slave ceases receiving data because nothing is written anymore by the 3rd party process.
7) The slave sends "finished" to the job.
8) The job sends "finished" to the scheduler.
9) The scheduler deletes the job and puts the slave in the "idle slave list" so that it can be reused by another job, or killed after some minutes of idleness.
What happens with my "kdirlist" problem (and probably your imap problem too) is that step 7 never occurs. For an unknown reason, the slave, after having received the correct data, never notifies the job that it has finished. The job then never notifies the scheduler, so the scheduler thinks the slave is still active and does not mark it as "idle" ... and here is our stale tdeioslave ... (Note: the slave eventually gets killed if the remote host closes the network connection for idleness ... but it looks like this does not happen with the ssh protocol.)
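The effect of the missing step 7 can be pictured with a toy model of the scheduler's bookkeeping. This is a minimal sketch under assumptions: `ToyScheduler`, `assignSlave`, `jobFinished` and `slavesAfterListings` are hypothetical names, and the real logic in tdelibs/tdeio/tdeio/scheduler.cpp is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of the scheduler's slave list; all names are hypothetical
// and do not mirror the real tdeio classes.
class ToyScheduler {
    struct Slave {
        int id;
        std::string protocol;
        bool busy;
    };
    std::vector<Slave> slaves_;

public:
    // Step 3: reuse an idle slave for this protocol, or spawn a new one.
    // Returns the id of the slave that is now marked busy.
    int assignSlave(const std::string& protocol) {
        for (Slave& s : slaves_) {
            if (!s.busy && s.protocol == protocol) {
                s.busy = true;
                return s.id;
            }
        }
        slaves_.push_back(Slave{static_cast<int>(slaves_.size()), protocol, true});
        return slaves_.back().id;
    }

    // Steps 7 to 9: the "finished" notification arrives and the slave
    // goes back to the idle state, ready for reuse.
    void jobFinished(int slaveId) {
        assert(slaveId >= 0 && static_cast<std::size_t>(slaveId) < slaves_.size());
        slaves_[static_cast<std::size_t>(slaveId)].busy = false;
    }

    std::size_t slaveCount() const { return slaves_.size(); }
};

// Run `listings` directory listings in a row. When notifyFinished is
// false, step 7 is skipped, which models the behaviour described above.
std::size_t slavesAfterListings(int listings, bool notifyFinished) {
    ToyScheduler scheduler;
    for (int i = 0; i < listings; ++i) {
        int id = scheduler.assignSlave("sftp");
        if (notifyFinished) {
            scheduler.jobFinished(id);
        }
    }
    return scheduler.slaveCount();
}
```

In this model, five listings with the notification delivered reuse a single slave, while five listings without it leave five slaves marked busy, one per listing.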
I'm currently looking into the cache mechanism of the "kdirlist" job class. I believe (to be confirmed) that when kdirlist uses its internal cache, it still spawns a slave but does NOT use it at all, since it already has the data it is looking for in its cache. Then it returns the cached data and ignores the spawned slave, which sits there forever, waiting for a query from the job that never comes ...
Francois
On 03/03/2014 02:06 PM, François Andriot wrote:
[...]
Francois,
I'll say it again, "you're good!"
Is the slave in #7 actually sending the "finished" and it never makes it to the job? Or is it not sending "finished" at all?
Is there any possibility that #7 does not occur because the slave does not know where to send "finished"? What links/connects the slave to the job? Is it a signal/slot, or some memory address that the job originally passed to the slave in #3? Or does the slave just generate the "finished" and pass some type of job number along with it after #6?
Could the slave/job connection created in #3 be broken somehow such that the reverse path in #7 no longer exists after the delay in #6?
On 06/03/2014 06:45, David C. Rankin wrote:
[...]
Please check bug report #1902: I've posted 2 patches there. If they are confirmed working, I'll explain what I believe is the root cause of this problem.
Francois
On 03/06/2014 12:14 AM, François Andriot wrote:
[...]
OK, will report back!
On 03/06/2014 12:14 AM, François Andriot wrote:
[...]
Francois,
I have patches 1981 and 1982 for tdebase, but is there another patch for tdelibs?
On 03/06/2014 12:58 AM, David C. Rankin wrote:
[...]
Never mind, I see the one for tdelibs/tdebase.
On 03/06/2014 12:14 AM, François Andriot wrote:
[...]
Francois,
I tested kioslave behavior in kde3. There are NO stale kioslave processes produced even with multiple sftp connections open in konqueror - none at all. So all of this stale tdeio mess is broken TDE behavior. Here is a screenshot and output of ps axf:
http://www.3111skyline.com/dl/dt/trinity/ss/kde-konqueror-no-kioslaves.jpg
10:53 alchemy:~/cnf/mediawiki/img/bn2/128> ps axf | grep kio
 8840 ?        S      0:00  \_ kio_file [kdeinit] file /tmp/ksocket-david/klauncher0uEG8g.slav
24358 ?        S      0:04 kio_uiserver [kdeinit]
This has got to be the result of some stray k->tde rename that has messed up tdeio job/slave/scheduler communications.
On 03/06/2014 12:14 AM, François Andriot wrote:
[...]
Whoop!
After patching and rebuilding tdebase and tdelibs, I no longer have any stale tdeio_sftp processes hanging around (details posted to bug 1902). Will continue to test, but it is looking good.
XDG_SESSION_ID still does not show proper user session tracking :(
On 03/06/2014 03:13 PM, David C. Rankin wrote:
[...]
It looks like the fix was:
tdelibs/tdeio/tdeio/kdirlister.cpp - line 1974:
job->slaveDone();
That would explain why #7 never got done.
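To make the role of a `slaveDone()`-style call concrete, here is a small sketch of a lister that returns early from its cache. It assumes, following the hypothesis earlier in the thread, that the leak sits on a path where the job returns without handing its slave back; the actual code around line 1974 of kdirlister.cpp is not shown in this thread, and `ToySlave`, `ToyListJob`, `ToyDirLister` and the `releaseOnCacheHit` switch are hypothetical names invented for this model.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a tdeio slave: only tracks busy/idle.
struct ToySlave {
    bool busy = false;
};

// Hypothetical list job: grabs the slave on creation and hands it
// back only when slaveDone() is called.
class ToyListJob {
    ToySlave* slave_;

public:
    explicit ToyListJob(ToySlave* slave) : slave_(slave) {
        assert(slave_ != nullptr);
        slave_->busy = true;
    }

    // Detach from the slave and mark it idle again.
    void slaveDone() {
        if (!slave_) return;
        slave_->busy = false;
        slave_ = nullptr;
    }
};

// Hypothetical directory lister with an internal cache.
class ToyDirLister {
    bool releaseOnCacheHit_;
    std::map<std::string, std::vector<std::string>> cache_;

public:
    explicit ToyDirLister(bool releaseOnCacheHit)
        : releaseOnCacheHit_(releaseOnCacheHit) {}

    // Pre-fill the cache for a URL.
    void prime(const std::string& url, std::vector<std::string> entries) {
        cache_[url] = std::move(entries);
    }

    std::vector<std::string> list(const std::string& url, ToySlave& slave) {
        ToyListJob job(&slave);  // a slave is attached even on a cache hit
        auto hit = cache_.find(url);
        if (hit != cache_.end()) {
            // Without this release the slave stays busy forever.
            if (releaseOnCacheHit_) job.slaveDone();
            return hit->second;
        }
        // Cache miss: the real listing is out of scope for this sketch,
        // so return nothing and release the slave as usual.
        job.slaveDone();
        return {};
    }
};
```

With `releaseOnCacheHit` set to false, a cached listing still returns the right entries but leaves the slave marked busy; with it set to true, the slave goes back to idle, which is the outcome reported after the patch.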