On 03/03/2014 02:06 PM, François Andriot wrote:
To be more precise, TDE does not lose track of the tdeio processes. The tdeio scheduler is always aware of its slave threads. The actual problem is that the tdeio scheduler never receives the "job is finished" notification from some slaves. So it considers those slaves as being eternally busy and keeps spawning new ones ...
The nominal scenario looks like:
1) An application requests a URL from the tdeio scheduler (e.g. konqueror asks for "directory listing for sftp://remotehost/").
2) The tdeio scheduler instantiates a "job".
3) The job looks for an idle "slave" that can do the job (e.g. correct protocol), uses one if it exists, or else asks the scheduler to instantiate a new slave.
4) The slave spawns a 3rd party process (ssh in my case) and waits for text output. (Note: some slaves do the job directly without spawning a 3rd party process.)
5) The 3rd party process does its job (remote directory listing for example) and writes output to the slave.
6) After the command is complete, the slave stops receiving data because the 3rd party process no longer writes anything.
7) The slave sends "finished" to the job.
8) The job sends "finished" to the scheduler.
9) The scheduler deletes the job and puts the slave in the "idle slave list" so that it can be reused by another job, or be killed after some minutes of idleness.
What happens with my "kdirlist" problem (and probably your imap problem too) is that step 7 never occurs. For an unknown reason, the slave, after having received the correct data, never notifies the job that it has finished. The job then never notifies the scheduler, so the scheduler thinks the slave is still active and does not mark it as "idle" ... and here is our stale tdeioslave ... (Note: the slave eventually gets killed if the remote host closes the network connection for idleness ... but it looks like this does not happen with the ssh protocol.)
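A minimal sketch of why a missing step 7 leads to an ever-growing number of slaves. This is a simplified simulation with made-up names, not tdeio code; it only models the scheduler's bookkeeping of idle and busy slaves.

```python
def simulate(requests, slave_sends_finished):
    """Return (slaves spawned, slaves still marked busy) after `requests` jobs."""
    idle, busy, spawned = [], [], 0
    for _ in range(requests):
        # Step 3: reuse an idle slave if there is one, otherwise spawn a new one.
        if idle:
            slave = idle.pop()
        else:
            slave = f"slave-{spawned}"
            spawned += 1
        busy.append(slave)
        # Steps 4-6 happen here; the data arrives correctly in both cases.
        if slave_sends_finished:
            # Steps 7-9: the slave is moved back to the idle list.
            busy.remove(slave)
            idle.append(slave)
    return spawned, len(busy)


print(simulate(5, True))   # (1, 0): one slave, reused five times
print(simulate(5, False))  # (5, 5): a new slave per request, all stuck as "busy"
```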
I'm currently looking into the cache mechanism of the "kdirlist" job class. I believe (to be confirmed) that when kdirlist uses its internal cache, it still spawns a slave but does NOT use it at all, since it already has the data it is looking for in its cache. Then it returns the cached data and ignores the spawned slave, which sits there forever, waiting for a query from the job that never comes ...
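To illustrate the hypothesis (still to be confirmed), here is a toy sketch of a lister that acquires a slave before checking its cache. The `Lister` class and its fields are invented for the example and do not reflect the actual kdirlist code.

```python
class Lister:
    def __init__(self):
        self.cache = {}
        self.stale_slaves = []  # slaves acquired but never used or released

    def list_dir(self, url):
        slave = object()  # stand-in for asking the scheduler for a slave
        if url in self.cache:
            # Cache hit: the data is returned directly, the slave is never
            # given any work and never reports "finished".
            self.stale_slaves.append(slave)
            return self.cache[url]
        # Cache miss: stand-in for the listing the slave would really produce.
        entries = [f"{url}file.txt"]
        self.cache[url] = entries
        return entries  # on this path the slave is assumed to be released normally


lister = Lister()
lister.list_dir("sftp://remotehost/")  # first call: cache miss, no stale slave
lister.list_dir("sftp://remotehost/")  # second call: cache hit, one stale slave
print(len(lister.stale_slaves))        # 1
```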
Francois
Francois,
I'll say it again, "you're good!"
Is the slave in #7 actually sending the "finished" and it never makes it to the job? Or is it not sending "finished" at all?
Is there any possibility that #7 does not occur because the slave does not know where to send "finished"? What links/connects the slave to the job? Is it a signal/slot, or some memory address that the job originally passed to the slave in #3? Or does the slave just generate the "finished" and pass some type of job number along with it after #6??
Could the slave/job connection created in #3 be broken somehow such that the reverse path in #7 no longer exists after the delay in #6?