On Sun, Oct 1, 2023 at 10:02 AM, David Armour wrote:
Thanks for your comment Buddy.
You're welcome. I think you've found an interesting oversight.
I agree that the logic would seem that setting m_isAlive to false during the
~EReader() call should prevent the code trying to read from the socket.
Yeah, superficially and at first blush it would seem that way. But it'd be a mistake to think so. I can understand how that initial thought may have fooled the original author into thinking the code was correct though. All this is post-mortem speculation on my part however.
Nevertheless, once considered more thoughtfully, it becomes apparent that nothing prevents the destructor from being called while the processing thread's readToQueue is in the while block.
The std::atomic will only insure that reading and writing m_isAlive from multiple threads is well-defined; but guarantees nothing w.r.t. the destructor and what's happening in the while loop.
I tried playing around with that. If I pause after m_isAlive is set to false,
then continue, the code works successfully without 509 error. It is as if at
"Pause", with a sleep or something? No... that's surely a game of whack-a-mole. It might work sometimes, depending on unpredictable factors, and might not.
It can be fun to experiment with the timing while you investigate though. You can also try playing w/ ordering by running the code on a single core/cpu and see how that affects things.
full speed the atomic operation is not being completed before the call to
eDisconnect() happens. On Visual Studio this even happens when I turn off all
Optimisations. As eDisconnect() does not use anything related to that atomic,
I wonder if the compiler is parallelising the operation as all atomic
operations are very slow.
Compiler optimizations fall somewhat into the category of "unpredictable factors" I alluded to (although they'd be deterministic given the compiler source and enough work). It's a bit surprising that a critical section wasn't implemented via mutex, but maybe the author wanted to avoid a performance penalty.
I also noticed that eDisconnect() sets m_fd = -1.? This should prevent the
call to processNonBlockingSelect() from trying to call onReceive() which
ultimately triggers receive() and recv() causing the error. Again the setting
of the atomic variable is not being done in time.
I am still scratching my head over this. I suspect some synchronization issue
between the various threads.
Anyway... I think you found a legitimate problem and I wouldn't worry about it too much. I like your proposed solution as well. Joining the thread and then disconnecting makes clear sense to me. And, as long as m_pClientSocket isn't being shared all over the place I don't see a reason to use more complicated locking via mutex.
If I were the reviewer I could see signing off after some confirmation tests... open a ticket and see what they have to say. If you submit a patch remember to change the IB_POSIX ifdef block too :-)