On 03/03/2014 02:06 PM, François Andriot wrote:
To be more precise, TDE does not lose track of the tdeio process. The tdeio
scheduler is always aware of its slave threads.
The actual problem is that the tdeio scheduler never receives the "job is
finished" notification from some slaves.
So it considers such a slave to be eternally busy and keeps spawning new ones ...
The nominal scenario looks like:
1) an application requests a URL from the tdeio scheduler (e.g. konqueror asks
for "directory listing for sftp://remotehost/")
2) the tdeio scheduler instantiates a "job"
3) the job looks for an idle "slave" that can do the job (e.g. correct
protocol), uses one if it exists, or else asks the scheduler to instantiate a
new slave.
4) the slave spawns a 3rd party process (ssh in my case) and waits for text
output. (note: some slaves do the job directly without spawning a 3rd party
process)
5) The 3rd party process does its job (remote directory listing for example) and
writes output to the slave.
6) After the command is complete, the slave ceases receiving data because
nothing is written anymore by the 3rd party process.
7) The slave sends "finished" to the job.
8) the job sends "finished" to the scheduler.
9) the scheduler deletes the job and puts the slave in the "idle slave list" so
that it can be reused by another job, or will be killed after some minutes of
idleness.
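
As a rough illustration of steps 1 to 9, here is a toy model in Python. It is
only a sketch: the class and function names (Slave, Scheduler, acquire,
job_finished, list_directory) are hypothetical and do not mirror the real tdeio
API, and the actual work of the slave (steps 4 to 7) is elided.

```python
# Toy model of the nominal scenario; all names are hypothetical and
# do not correspond to the real tdeio classes.

class Slave:
    def __init__(self, protocol):
        self.protocol = protocol


class Scheduler:
    def __init__(self):
        self.idle = []      # slaves available for reuse (step 9)
        self.busy = []      # slaves currently assigned to a job
        self.spawned = 0    # total number of slaves ever created

    def acquire(self, protocol):
        # Step 3: reuse an idle slave for this protocol, else spawn one.
        for slave in self.idle:
            if slave.protocol == protocol:
                self.idle.remove(slave)
                break
        else:
            slave = Slave(protocol)
            self.spawned += 1
        self.busy.append(slave)
        return slave

    def job_finished(self, slave):
        # Steps 8-9: the job reported "finished"; the slave becomes idle.
        self.busy.remove(slave)
        self.idle.append(slave)


def list_directory(scheduler, url):
    # Steps 1-2: a request arrives and a job is created for it.
    protocol = url.split("://", 1)[0]
    slave = scheduler.acquire(protocol)
    # Steps 4-7: the slave does the work and reports "finished" (elided).
    scheduler.job_finished(slave)


scheduler = Scheduler()
list_directory(scheduler, "sftp://remotehost/")
list_directory(scheduler, "sftp://remotehost/")
print(scheduler.spawned, len(scheduler.idle), len(scheduler.busy))
```

In this model the second request reuses the slave left idle by the first one,
so only one slave is ever spawned.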
What happens with my "kdirlist" problem (and probably your imap problem too), is
that step 7 never occurs.
For an unknown reason, the slave, after having received the correct data, never
notifies the job that it has finished, then the job never notifies the
scheduler, then the scheduler thinks the slave is still active and does not mark
it as "idle" ... and here is our stale tdeioslave ... (note: the slave
eventually gets killed if the remote host closes the network connection due to
idleness ... but it looks like this does not happen with the ssh protocol)
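
To make the consequence concrete, here is a minimal counting sketch in Python.
It assumes a scheduler that reuses an idle slave when one exists and spawns a
new one otherwise; count_spawned and its parameters are hypothetical names used
only for this illustration.

```python
def count_spawned(requests, finished_delivered):
    """Count slaves spawned for a series of sequential requests."""
    idle = 0       # number of slaves in the idle list
    spawned = 0    # number of slaves created so far
    for _ in range(requests):
        if idle:
            idle -= 1          # reuse an idle slave
        else:
            spawned += 1       # no idle slave: spawn a new one
        if finished_delivered:
            idle += 1          # "finished" arrived: slave goes back to idle
        # otherwise the slave stays marked busy forever (stale slave)
    return spawned


print(count_spawned(5, True), count_spawned(5, False))
```

When "finished" is always delivered, one slave serves all five requests; when
it is never delivered, every request leaves one more stale slave behind.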
I'm currently looking into the cache mechanism of the "kdirlist" job class.
I believe (to be confirmed) that when kdirlist uses its internal cache, it still
spawns a slave but does NOT use it at all, since it already has the data it is
looking for in its cache.
Then it returns the cached data and ignores the spawned slave, which sits there
forever, waiting for a query from the job that never comes ...
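
The suspected pattern can be sketched in Python as follows. This only
illustrates the hypothesis (still to be confirmed) and is not the actual
kdirlist code: list_dir, the cache dict and the busy_slaves list are
hypothetical stand-ins, and the listing itself is faked.

```python
def list_dir(url, cache, busy_slaves):
    # Hypothesis: a slave is requested up front, before the cache is checked.
    busy_slaves.append(url)
    if url in cache:
        # Cache hit: return early; the slave is never used nor released.
        return cache[url]
    entries = ["file-a", "file-b"]   # stand-in for the real remote listing
    cache[url] = entries
    # Normal path: the slave reports "finished" and is released.
    busy_slaves.remove(url)
    return entries


cache, busy = {}, []
list_dir("sftp://remotehost/", cache, busy)   # cache miss: slave is released
list_dir("sftp://remotehost/", cache, busy)   # cache hit: slave is leaked
print(len(busy))
```

After the second call, one slave is still counted as busy even though no job
will ever talk to it again.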
Francois
Francois,
I'll say it again, "you're good!"
Is the slave in #7 actually sending the "finished" and it never makes it to
the job? Or is it not sending "finished" at all?
Is there any possibility that #7 does not occur because the slave does not
know where to send "finished"? What links/connects the slave to the job? Is it a
signal/slot, or some memory address that the job originally passed to the slave
in #3? Or does the slave just generate the "finished" and pass some type of job
number along with it after #6??
Could the slave/job connection created in #3 be broken somehow such that the
reverse path in #7 no longer exists after the delay in #6?
--
David C. Rankin, J.D.,P.E.