On restarting the TWS Application manually from TWS Application drop down menu while TWS API process is up & running? TWS Api replies with Connection Closed callback via Updated source code
void Connect() { ? if (m_pClient && m_pClient->isConnected())? ?m_pClient->eDisconnect();
?
? bool bRes = m_pClient->eConnect((const char*)host_ip.c_str(), host_port, ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? clientId, m_extraAuth);
? bool connection_result = m_pClient->isConnected();
? if (bRes && connection_result) { ? ? printf("Connected to %s:%d clientId:%d\n", m_pClient->host().c_str(), ? ? ? ? ? ?m_pClient->port(), clientId); ? ??
? ? if (m_pClient && m_pClient->isConnected()) { ? ? ? m_pReader = std::unique_ptr<EReader>(new EReader(m_pClient, &m_osSignal)); ? ? ? if (m_pReader)? m_pReader->start(); ? ? ? ?else { ? ? ? ? std::cout << "Failed to create EReader instance\n"; ? ? ? ? exit(1); ? ? ? } ? ? } else { ? ? ? std::cout << "Client is not connected. Cannot create EReader\n"; ? ? ? exit(1); ? ? }
? } else { ? ? printf("Cannot connect to %s:%d clientId:%d\n", m_pClient->host().c_str(), ? ? ? ? ? ?m_pClient->port(), clientId);
? ? exit(1); ? }
? return; }
void connectionClosed() { ?std::this_thread::sleep_for(std::chrono::seconds(30)); ?Connect();??
} Response received on simulating above mentioned activity 
|
It is hard to follow what you are doing from the above code, however, you should not be calling "stop()" yourself. This is a function that is supposed to be internal to the API.
?
I see you are resetting your EReader object pointer in your code. Why? If the object is disconnected properly then this should not be required and the object can be directly reused.
?
Anyway, based on this error:
EReader::processMsgs (this=0x0) at IBUtilsCode/EReader.cpp (Segmentation Fault)
?
?
it looks like you are trying to de-reference the EReader object pointer when it has already been set to nullptr somewhere in your code.
?
?
?
?
|
Hello David Armour,
I have been encountering a similar issue regarding the TWS application.
My goal is to simulate scenarios where the TWS application experiences a freeze or requires a restart due to memory-related issues, and to ensure a smooth failover to minimize disruptions.
To test this, I manually restart the TWS application after it has been successfully launched and logged in. Once both the TWS application and TWS API are up and running, I initiate a manual restart directly from within the TWS application itself to observe how the reconnection process behaves.
For handling reconnections after such restarts, I have been using the following pieces of code:
printf("Connection Closed\n");
void CBOEEngine::Connect() {
std::cout << "Connect to TWS called\n";
dbglogger_ << "Connect to TWS called\n";
dbglogger_ << "TRYING CONNECTING TO HOST_IP " << host_ip << " HOST_PORT " << host_port << " WITH CLIENT ID "
<< clientId << "\n";
//Disconnect is already connected
if (m_pClient && m_pClient->isConnected()) {
DBGLOG_CLASS_FUNC_LINE << "CLIENT ALREADY CONNECTED DISCONNECTING CLIENT & TRYING\n" << DBGLOG_ENDL_FLUSH;
m_pClient->eDisconnect();
}
//std::this_thread::sleep_for(std::chrono::seconds(1));
if (m_pReader) {
DBGLOG_CLASS_FUNC_LINE << "Deleting previous reader instance\n" << DBGLOG_ENDL_FLUSH;
m_pReader->stop();
m_pReader.reset();
}
//std::this_thread::sleep_for(std::chrono::seconds(1));
DBGLOG_CLASS_FUNC_LINE << "After disconnection request connection_result: " << m_pClient->isConnected() << DBGLOG_ENDL_FLUSH;
//! [connect]
bool bRes = m_pClient->eConnect((const char*)host_ip.c_str(), host_port, clientId, m_extraAuth);
//std::this_thread::sleep_for(std::chrono::seconds(1));
bool connection_result = m_pClient->isConnected();
DBGLOG_CLASS_FUNC_LINE << "bRes: " << bRes << " connection_result: " << connection_result << DBGLOG_ENDL_FLUSH;
if (bRes && connection_result) {
printf("Connected to %s:%d clientId:%d\n", m_pClient->host().c_str(), m_pClient->port(), clientId);
//! [ereader]
if(!m_pReader)
dbglogger_ << "Reader class instance is destroyed\n";
if (m_pClient && m_pClient->isConnected()) {
m_pReader = std::unique_ptr<EReader>(new EReader(m_pClient, &m_osSignal));
if (m_pReader) {
m_pReader->start();
} else {
dbglogger_ << "Failed to create EReader instance\n";
exit(1);
}
} else {
dbglogger_ << "Client is not connected. Cannot create EReader\n";
exit(1);
}
//! [ereader]
p_engine_listener_->OnConnect(true);
is_connected = true;
} else {
dbglogger_ << "CONNECTION FAILED " << host_ip << " HOST_PORT " << host_port << " @ CLIENT ID " << clientId << "\n";
dbglogger_ << "RETRY "
<< "\n";
printf("Cannot connect to %s:%d clientId:%d\n", m_pClient->host().c_str(), m_pClient->port(), clientId);
exit(1);
}
return;
// return bRes;
}
m_pClient->eDisconnect();
void EClientSocket::eDisconnect(bool resetState)
{
if ( m_fd >= 0 )
// close socket
SocketClose( m_fd);
m_fd = -1;
if (resetState) {
eDisconnectBase();
}
}
?
?
m_pReader->stop();
void EReader::stop() {
if (m_hReadThread) {
m_isAlive = false;
#if defined(IB_POSIX)
if (!pthread_equal(pthread_self(), m_hReadThread))
pthread_join(m_hReadThread, NULL);
#elif defined(IB_WIN32)
if (!(GetCurrentThreadId() == GetThreadId(m_hReadThread)))
WaitForSingleObject(m_hReadThread, INFINITE);
#endif
}
}
However, during the reconnection process, I frequently encounter the following errors
EReader::processMsgs (this=0x0) at IBUtilsCode/EReader.cpp (Segmentation Fault)
Connection Errors? Msg Not connected Socket is closed.
Interestingly, I do not experience these issues when I manually turn off the internet connection and then re-enable it after some time. In such cases, the reconnection process appears to handle the disconnection more gracefully without triggering segmentation faults or socket closure errors.
Current Setup:
- Operating System: Linux
- TWS Version: [Build 10.30.1k, Aug 5, 2024 4:19:58 PM]
- TWS API Version: [API_Version=10.30.01]
Looking forward to your guidance and suggestions.
?
|
Just tested API version 10.33.02. It works as expected. Thanks!
|
You are running an old version. There was a bug in the Windows version which was corrected on 5 Sep 2024 and release in the "latest" version.
?
Version 10.30.1t was released back in July 2024.
?
If you do not want to install the full latest version 10.33 you can simply modify the stop() function as follows:
?
void EReader::stop() { ? if (m_hReadThread) { ? ? m_isAlive = false; #if defined(IB_POSIX) ? ? if (!pthread_equal(pthread_self(), m_hReadThread)) ? ? ? pthread_join(m_hReadThread, NULL); #elif defined(IB_WIN32) ? ? if (!(GetCurrentThreadId() == GetThreadId(m_hReadThread))) ? ? ? WaitForSingleObject(m_hReadThread, INFINITE); #endif ? } }
?
?
There is a detailed explanation for the hang in the bug fix documentation:
|
My IB Gateway version is latest stable 10.30.1t, API version is latest stable 10.30.01 running on Windows. I replicated this on the sample TestCppClientStatic (debug, both win32 and x64). 1) start IB gateway, 2) start TestCppClientStatic, 3) shutdown IB gateway, 4) the program hangs. After putting a breakpoint at m_pEReader->stop() in void EClientSocket::eDisconnect(bool resetState), it seems the hang happened at WaitForSingleObject in EReader::stop(). See call stack below.
?
--hang here --> WaitForSingleObject(m_hReadThread, INFINITE);
> ? ?TestCppClientStatic.exe!EReader::stop() Line 70 ? ?C++ ? ? ?TestCppClientStatic.exe!EClientSocket::eDisconnect(bool resetState) Line 241 ? ?C++ ? ? ?TestCppClientStatic.exe!EClientSocket::onClose() Line 370 ? ?C++ ? ? ?TestCppClientStatic.exe!EClientSocket::receive(char * buf, unsigned int sz) Line 274 ? ?C++ ? ? ?TestCppClientStatic.exe!EReader::onReceive() Line 191 ? ?C++ ? ? ?TestCppClientStatic.exe!EReader::processNonBlockingSelect() Line 173 ? ?C++ ? ? ?TestCppClientStatic.exe!EReader::readToQueue() Line 93 ? ?C++ ? ? ?TestCppClientStatic.exe!EReader::readToQueueThread(void * lpParam) Line 85 ? ?C++
|
I cannot replicate your problem. When I shutdown TWS manually, the EReader stops properly and calls EWrapperImpl::connectionClosed() which is what is supposed to happen.
?
A 509 error is not what is expected. A 509 error means your code is trying to read on a closed socket which is a bug. By commenting out "stop()" your EReader thread has hung in an infinite wait and will never end until you force the program to stop.
?
What version TWS and API are you using? Are you on Windows or Linux?My test just now is using the latest version of TWS and latest API running on Windows.
|
?Not sure if this is the right topic but I got a problem with these newly added code in EClientSocket::eDisconnect.
void EClientSocket::eDisconnect(bool resetState) {
// Stop EReader thread to avoid 509 error caused by reading on a closed socket.
? if (m_pEReader)
? ? m_pEReader->stop();
...
When TWS is shutdown manually, the code will hang at m_pEReader->stop(). The call stack is:
?
> ? ?APITWS.dll!EClientSocket::eDisconnect(bool resetState) Line 241 ? ?C++ ? ? ?APITWS.dll!EClientSocket::onClose() Line 370 ? ?C++ ? ? ?APITWS.dll!EClientSocket::receive(char * buf, unsigned int sz) Line 274 ? ?C++ ? ? ?APITWS.dll!EReader::onReceive() Line 199 ? ?C++ ? ? ?APITWS.dll!EReader::processNonBlockingSelect() Line 181 ? ?C++ ? ? ?APITWS.dll!EReader::readToQueue() Line 101 ? ?C++ ? ? ?APITWS.dll!EReader::readToQueueThread(void * lpParam) Line 93 ? ?C++
?
Commenting out m_pEReader->stop() the code runs happily and will generate a 509 error as expected.
?
|
A fix has been submitted a while ago and is now in the "latest" version of the API.
|
Hey, if you feel that strongly about it and have the time then go ahead and make the pull request. In the worst case scenario it'll be good practice, a learning experience and software development lesson.
?袄冲(ツ)冲/?
|
@Gordon, @Buddy,
"mountain"?? : hardly. There is no impact to existing code with this proposed change (tested). I am simply adding functionality that allows users to manually stop the thread that they started manually.? In fact I can leave the destructor of EReader the same as it was without swapping anything and just create a new function "stop()" which will allow users to disconnect cleanly if they desire to,
(Note to Gordon on your concern about the initial use of EReader during the connection, this is what the "if (m_hReadThread)" statement takes care of in the original TWS API code which is still there in my proposed change.)
Let me remind you that the current situation results in a 509 error, which is basically the API's way of saying something bad happened and I do not know what it is. Your code must be bad, which is not the case here. It is the API code that is bad. Now that we know what causes it, we can probably live with it and safely ignore it, but personally I don't like that sort of situation. If the developers intended to use this way of disconnecting the socket, then they should have properly taken care of the error message it creates. The fact that they didn't means that what they coded was unintentional and therefore is clearly a bug. A bug needs to be fixed.
However, I realise changing (adding to) the interface will make the C++ interface different than the other languages which is probably not desirable.
The only way to correct this and make the existing interface the same is to create an EMutex variable which is shared by both EReader and EClientSocket (suggest using a composition pattern here). Changes to the m_isAlive variable need to be locked by the mutex as well as inside the "while(m_isAlive)" loop. This is a larger code change and needs more care to be done right but it is not that difficult and I have implemented it successfully as well. In fact using EMutex here does not slow the loop down (checking the mutex is as simple as reading a flag) since the only time it is ever locked is when we are disconnecting which means speed is not a "critical trading" issue.
I agree it is good to have this thread capture this discussion for the record but I will definitely raise this problem to the coders as I still firmly believe this to be a bug. I do not wish to have to fix this myself every time there is a new API update.
|
Yeah, I see most of this as making a mountain out of a molehill. It strikes me, largely, as a misunderstanding; much like issuing a SIGKILL but expecting SIGTERM.
If you are familiar with unix signal processing you know that SIGKILL is immediately forced upon the process by the OS and the process my be left in an untidy state. SIGTERM, on the other hand, gives the process an opportunity to clean up and exit gracefully.
Since the API code is readily available, folks can implement their own version of EReader which suits their preference in this regard. Therefore the distinction becomes moot.
There may be some chance of acceptance for a minor pull request which, e.g., swaps the calls to the thread join and socket disconnect. But, even this could be seen as a dog chasing it's own tail since the argument would merely come down to the dislike of a default.
Moreover though, when you consider that the code for EReader hasn't changed in years... even a small change becomes unlikely. And, the suggestion of anything "major", like an intermediate "stop" method, becomes far fetched IMHO.
It's certainly an interesting nuance but probably better addressed by documentation alone. At least we have this thread of conversation now for reference :-)
|
I like evolution, so I may be wrong but here I am not sure I would flag this as a bug. Just out of curiosity what undesirable side effect the current implementation generate? ?
Some food for thoughts: Theses are tricky parts as it require tracking all threads together to do proper analysis. again I won't pretend I know all the implications. I use it in C++/Win32 and it work really fine and fast (async) without any issues that worth a change from my perspective (IMHO).
Because: 1- If I ask a "disconnect" it's to be acted upon immediately I don't want any new data to be processed. Exiting gracefully and fast a mutithread application can become a tricky process; I am ready to live with recv receiving remnant of previous telco. It's a philosophical/aesthetic issue I agree, in an ideal world, we are 100% sure that the last transaction was completed. yes. but ... we are not (at least I am not ) in this ideal world.
Also the "error" system of IB have to be seen as a client messaging service Same for WIN32 where GetLastError() is sometime used as a signal (Example when looking for enumeration of files, this is the only way to get a "no more file" info.) The OS handle gracefully zombie packet on close socket, so no harm. (surely not the only apps that do that). Codes are used to that, just look at handleSocketError you see a hard errno = 0; typical of way to reset it without concerns for unhandled error.
2- Related to #1 above, Putting the WaitForSingleObject(m_hReadThread, INFINITE); before eDisconnect may execute a last putMessageToQueue. While the basic supplied code will do a hard beak, the while() loop with m_buf.size==0 , exit because m_isAlive==false, without executing anything else.
3- Notice that a EReader object is used during connection as a limited life (scoped) heap object to fetch server protocol revision in a synchronous mode, and to call startApi() (I am not aware of what this proc? does but by the name of it I would avoid missing this call) this looks like a "trick" to overcome an un-anticipated evolution of the protocol, See EClientSocket::eConnectImpl(....) Actions happening in the dctor are then to be taken carefully.
I would agree that the inversion of code lines you suggested might not affect the behavior at it's synchronous, (m_hReadThread=0) , that you ought to get the protocol revision and I don't even see how you can abort this single message operation anyway. (aside of a socket shutdown) never see a need.
As buddy point it out: - it show that playing with the dctor of EReader may have side effect. In particular there are legacy code without a call to a ->stop(), ?? it's implicit in ~dctor and somewhere does the job. - Just to mention, the m_signal used for this Connect EReader is shared by this ephemeral EReader, you most probably will use the same for your main EReader.? It need care, there are people who use more than one EReader on same m_signal
|
... I should add with the proposed code change, you can now stop the reader and disconnect the socket without actually destroying the EReader object by doing the following:  ' The last line here is where I wait for my? own message processing thread to end. This is now very clean and you can connect and disconnect from TWS without error and without having to destroy the EReader.
|
What makes me feel uneasy is that according to the documentation:  It makes users think that by calling eDisconnect() you can safely disconnect from TWS without causing errors but we have shown that this is not the case. You must "stop" the thread, then disconnect from the socket, in that order. It is not possible for eDisconnect() to do that without creating a mutex locking system that shares mutexes between both EClientSocket and EReader. I agree with you that adding this complexity is not the right approach, for one as it will slow down the message receiving loop. What I am opting for and will propose in the GitHub when I get the approval to create a branch, is creating an EReader::stop() function like this:  then modifying the destructor to the following:  Then I will also propose that the C++ documentation for the eDisconnect() function be changed to say that to disconnect safely you must call EReader::stop() before calling EClientSocket::eDisconnect().
|
On Sun, Oct 1, 2023 at 03:15 PM, David Armour wrote:
I do not feel that destroying the EReader object is the correct way to
disconnect.
IDK if this is what made you uneasy or what you meant, but maybe you'd prefer to see the call to eDisconnect happen in readToQueue as the penultimate action (i.e. before the final signal is issued)? In retrospect this makes a bit more sense to me.
|
On Sun, Oct 1, 2023 at 03:15 PM, David Armour wrote:
I agree with everything you said although I am not entirely happy with my
final solution. I do not feel that destroying the EReader object is the
correct way to disconnect. It is certainly not intuitive which means I cannot
be the only person finding this problem. Is everybody using C++ just ignoring
this error?
You may see this as a wart but I, personally, won't comment with regard to how intuitive things are or any design aesthetic. I will say that consideration should be given to backward compatibility and just how much existing code would break (and therefore need to change) if the disconnect where removed from the destructor.
Just some food for thought... since there's usually ripple effects, unintended consequences, and API stability is actually very important. Also, I don't think it's wise to hold off fixing some simple, obvious, and existing issue merely because there's an intention of addressing it in a more significant re-design. As they say, there is no time like the present and I appreciate small and tractable, incremental changes.
However, this may come down to a judgment call. Since, I guess, people have indeed been ignoring it or working around it (or it's simply gone unnoticed). After all, it only happens late in the game when there usually isn't much network activity expected or necessarily happening.
Anyway, I will sleep on it one more night before submitting a bug report.
Is there a special place to report API bugs or do we just raise ticket as
usual?
A pull request is probably the best way to go about this:
|
I agree with everything you said although I am not entirely happy with my final solution. I do not feel that destroying the EReader object is the correct way to disconnect. It is certainly not intuitive which means I cannot be the only person finding this problem. Is everybody using C++ just ignoring this error?
Anyway, I will sleep on it one more night before submitting a bug report.
Is there a special place to report API bugs or do we just raise ticket as usual?
|
On Sun, Oct 1, 2023 at 10:02 AM, David Armour wrote:
Thanks for your comment Buddy.
You're welcome. I think you've found an interesting oversight.
I agree that the logic would seem that setting m_isAlive to false during the
~EReader() call should prevent the code trying to read from the socket.
Yeah, superficially and at first blush it would seem that way. But it'd be a mistake to think so. I can understand how that initial thought may have fooled the original author into thinking the code was correct though. All this is post-mortem speculation on my part however.
Nevertheless, once considered more thoughtfully, it becomes apparent that nothing prevents the destructor from being called while the processing thread's readToQueue is in the while block.
The std::atomic will only insure that reading and writing m_isAlive from multiple threads is well-defined; but guarantees nothing w.r.t. the destructor and what's happening in the while loop.
I tried playing around with that. If I pause after m_isAlive is set to false,
then continue, the code works successfully without 509 error. It is as if at
"Pause", with a sleep or something? No... that's surely a game of whack-a-mole. It might work sometimes, depending on unpredictable factors, and might not.
It can be fun to experiment with the timing while you investigate though. You can also try playing w/ ordering by running the code on a single core/cpu and see how that affects things.
full speed the atomic operation is not being completed before the call to
eDisconnect() happens. On Visual Studio this even happens when I turn off all
Optimisations. As eDisconnect() does not use anything related to that atomic,
I wonder if the compiler is parallelising the operation as all atomic
operations are very slow.
Compiler optimizations fall somewhat into the category of "unpredictable factors" I alluded to (although they'd be deterministic given the compiler source and enough work). It's a bit surprising that a critical section wasn't implemented via mutex, but maybe the author wanted to avoid a performance penalty.
I also noticed that eDisconnect() sets m_fd = -1.? This should prevent the
call to processNonBlockingSelect() from trying to call onReceive() which
ultimately triggers receive() and recv() causing the error. Again the setting
of the atomic variable is not being done in time.
I am still scratching my head over this. I suspect some synchronization issue
between the various threads.
Anyway... I think you found a legitimate problem and I wouldn't worry about it too much. I like your proposed solution as well. Joining the thread and then disconnecting makes clear sense to me. And, as long as m_pClientSocket isn't being shared all over the place I don't see a reason to use more complicated locking via mutex.
If I were the reviewer I could see signing off after some confirmation tests... open a ticket and see what they have to say. If you submit a patch remember to change the IB_POSIX ifdef block too :-)
|
Thanks for your comment Buddy.
I agree that the logic would seem that setting m_isAlive to false during the ~EReader() call should prevent the code trying to read from the socket.
I tried playing around with that. If I pause after m_isAlive is set to false, then continue, the code works successfully without 509 error. It is as if at full speed the atomic operation is not being completed before the call to eDisconnect() happens. On Visual Studio this even happens when I turn off all Optimisations. As eDisconnect() does not use anything related to that atomic, I wonder if the compiler is parallelising the operation as all atomic operations are very slow.
I also noticed that eDisconnect() sets m_fd = -1.? This should prevent the call to processNonBlockingSelect() from trying to call onReceive() which ultimately triggers receive() and recv() causing the error. Again the setting of the atomic variable is not being done in time.
I am still scratching my head over this. I suspect some synchronization issue between the various threads.
|