<div dir="ltr"><div>Hello Michael,</div><div><br></div><div>Thanks for all that information. <br></div><div><br></div><div>I applied the fix you suggested (closing the conversation sockets in the parent process). I also enabled keepalive, with values adapted to my application. <br></div><div><br></div><div>I hope this will solve my issue, but as the problem can take several weeks to occur, I will not know immediately whether this was the cause :-)</div><div><br></div><div>Many thanks for your help.</div><div><br></div><div>Regards,</div><div>Brice</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, 13 Nov 2020 at 18:52, Michael Wojcik <<a href="mailto:Michael.Wojcik@microfocus.com">Michael.Wojcik@microfocus.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">> From: Brice André <<a href="mailto:brice@famille-andre.be" target="_blank">brice@famille-andre.be</a>><br>

> Sent: Friday, 13 November, 2020 09:13<br>

<br>

> "Does the server parent process close its copy of the conversation socket?"<br>

> I checked in my code, but it seems that no. Is it needed?<br>

<br>

You'll want to do it, for a few reasons:<br>

<br>

- You'll be leaking descriptors in the server, and eventually it will hit its limit.<br>

- If the child process dies without cleanly closing its end of the conversation,<br>

the parent will still have an open descriptor for the socket, so the network stack<br>

won't terminate the TCP connection.<br>

- A related problem: If the child just closes its socket without calling shutdown,<br>

no FIN will be sent to the client system (because the parent still has its copy of<br>

the socket open). If the client has already closed its end, its side of the connection<br>

will sit in FIN_WAIT_2 until it times out; otherwise it simply stays in ESTABLISHED.<br>

- A bug in the parent process might cause it to operate on the connected socket,<br>

causing unexpected traffic on the connection.<br>

- All such sockets will be inherited by future child processes, and one of them might<br>

erroneously perform some operation on one of them. Obviously there could also be a<br>

security issue with this, depending on what your application does.<br>
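
<br>

To illustrate the close-versus-shutdown point in the list above, here is a small sketch.<br>
It is an assumption-laden demo, not your server: a Unix-domain socketpair stands in for<br>
the TCP connection, dup() stands in for the copy the parent inherits, and the function<br>
names are made up for the example. With TCP, the end-of-stream seen here corresponds to<br>
the FIN being sent.<br>

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the peer sees end-of-stream, 0 if nothing has arrived yet,
 * -1 on any other outcome. */
static int peer_sees_eof(int peer_fd)
{
    char c;
    ssize_t n = recv(peer_fd, &c, 1, MSG_DONTWAIT);

    if (n == 0)
        return 1;
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;
    return -1;
}

/* Hypothetical demo: sv[1] plays the child's socket, parent_copy plays the
 * descriptor the parent forgot to close. */
int demo_close_vs_shutdown(int use_shutdown)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    int parent_copy = dup(sv[1]);
    assert(parent_copy >= 0);

    if (use_shutdown)
        shutdown(sv[1], SHUT_WR); /* acts on the connection, not the descriptor */
    close(sv[1]);                 /* only drops this one descriptor */

    int result = peer_sees_eof(sv[0]);

    close(parent_copy);
    close(sv[0]);
    return result;
}
```

Calling demo_close_vs_shutdown(0) shows the peer still waiting after a plain close,<br>
while demo_close_vs_shutdown(1) shows it seeing end-of-stream even though the other<br>
copy of the descriptor is still open.<br>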

<br>

Basically, when a descriptor is "handed off" to a child process by forking, you<br>

generally want to close it in the parent, unless it's used for parent-child<br>

communication. (There are some cases where the parent wants to keep it open for<br>

some reason, but they're rare.)<br>
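
<br>

A minimal sketch of that hand-off, assuming a fork-per-connection server. The names<br>
hand_off_to_child, serve_echo_once and demo_hand_off are hypothetical, and a socketpair<br>
stands in for the socket returned by accept so the example is self-contained.<br>

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-connection handler (runs in the child): echo one message. */
void serve_echo_once(int fd)
{
    char buf[64];
    ssize_t n = read(fd, buf, sizeof buf);

    if (n > 0 && write(fd, buf, (size_t)n) != n)
        _exit(1);
}

/* Fork a child to serve conn_fd, then drop the parent's copy.
 * Pass listen_fd = -1 if there is no listening socket to close in the child.
 * Returns the child's pid, or -1 if fork() failed. */
pid_t hand_off_to_child(int listen_fd, int conn_fd, void (*serve)(int))
{
    assert(conn_fd >= 0);

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: it does not accept connections, so it drops the listener. */
        if (listen_fd >= 0)
            close(listen_fd);
        serve(conn_fd);
        close(conn_fd);
        _exit(0);
    }

    /* Parent (also on fork failure): close its copy, so the child's close
     * is the last one and really ends the conversation. */
    close(conn_fd);
    return pid;
}

/* Returns 0 if the peer gets the echo and then end-of-file. */
int demo_hand_off(void)
{
    int sv[2];
    char buf[8];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid_t pid = hand_off_to_child(-1, sv[1], serve_echo_once);
    if (pid < 0 || write(sv[0], "hi", 2) != 2) {
        close(sv[0]);
        return -1;
    }

    ssize_t n = read(sv[0], buf, sizeof buf);   /* the echo from the child */
    ssize_t eof = read(sv[0], buf, sizeof buf); /* 0 once every copy is closed */

    close(sv[0]);
    waitpid(pid, NULL, 0);
    return (n == 2 && eof == 0) ? 0 : 1;
}
```

If the parent skipped its close(conn_fd), the second read in the demo would block,<br>
which is the same kind of hang a remote client would see. A real server would also<br>
need to reap its children, for example from a SIGCHLD handler.<br>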

<br>

On a similar note, if you exec a different program in the child process (I wasn't<br>

sure from your description), it's a good idea for the parent to set the FD_CLOEXEC<br>

flag (with fcntl) on its listening socket and any other descriptors that shouldn't<br>

be passed along to child processes. You could close these manually in the child<br>

process between the fork and exec, but FD_CLOEXEC is often easier to maintain.<br>
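
<br>

Setting the flag looks roughly like this. It is a sketch only: set_cloexec and<br>
demo_cloexec are names invented for the example, and a pipe stands in for the listening<br>
socket so the snippet needs no network setup.<br>

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Mark fd close-on-exec while preserving its other descriptor flags.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}

/* Returns 0 if the flag is observed on the descriptor after set_cloexec(). */
int demo_cloexec(void)
{
    int p[2];

    if (pipe(p) != 0)
        return -1;

    /* Descriptors from pipe() start out without the flag. */
    assert((fcntl(p[0], F_GETFD) & FD_CLOEXEC) == 0);

    int rc = set_cloexec(p[0]);
    int flags = fcntl(p[0], F_GETFD);

    close(p[0]);
    close(p[1]);
    return (rc == 0 && flags != -1 && (flags & FD_CLOEXEC)) ? 0 : 1;
}
```

The flag only takes effect at exec; a forked child that never execs still inherits the<br>
descriptor and has to close it itself.<br>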

<br>

For some applications, you might just dup2 the socket over descriptor 0 or<br>

descriptor 3, depending on whether the child needs access to stdio, and then close<br>

everything higher.<br>

<br>

Closing descriptors not needed by the child process is a good idea even if you<br>

don't exec, since it can prevent various problems and vulnerabilities that result<br>

from certain classes of bugs. It's a defensive measure.<br>

<br>

The best source for this sort of recommendation, in my opinion, remains W. Richard<br>

Stevens' /Advanced Programming in the UNIX Environment/. The book is old, and Linux<br>

isn't UNIX, but I don't know of any better explanation of how and why to do things<br>

in a UNIX-like OS.<br>

<br>

And my favorite source of TCP/IP information is Stevens' /TCP/IP Illustrated/.<br>

<br>

> May it explain my problem?<br>

<br>

In this case, I don't offhand see how it does, but I may be overlooking something.<br>

<br>

> I suppose that, if for some reason, the communication with the client is lost<br>

> (crash of client, loss of network, etc.) and keepalive is not enabled, this may<br>

> fully explain my problem ?<br>

<br>

It would give you those symptoms, yes.<br>

<br>

> If yes, do you have an idea of why keepalive is not enabled?<br>

<br>

The Host Requirements RFC (RFC 1122) mandates that it be disabled by default. I think the<br>

primary reasoning for that was to avoid re-establishing virtual circuits (e.g.<br>

dial-up connections) for long-running connections that had long idle periods.<br>

<br>

Linux may well have a kernel tunable or similar to enable TCP keepalive by<br>

default, but it seems to be switched off on your system. You'd have to consult<br>

the documentation for your distribution, I think.<br>

<br>

By default (again per the Host Requirements RFC), it takes quite a long time for<br>

TCP keepalive to detect a broken connection. It doesn't start probing until the<br>

connection has been idle for 2 hours, and then you have to wait for the probe<br>

interval multiplied by the probe count to elapse, typically over 10<br>

minutes. Again, some OSes let you change these defaults, and some let you change<br>

them on an individual connection.<br>
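
<br>

On Linux, the per-connection change can be sketched as below. SO_KEEPALIVE is the<br>
portable part; TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are assumed to be available<br>
(they are on Linux, other systems may differ), and the function names and the 60/10/5<br>
values are only illustrative.<br>

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Enable keepalive on one TCP socket and, where the options exist, set
 * idle time (seconds), probe interval (seconds) and probe count.
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    assert(idle_s > 0 && interval_s > 0 && count > 0);

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;

#if defined(TCP_KEEPIDLE) && defined(TCP_KEEPINTVL) && defined(TCP_KEEPCNT)
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) != 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) != 0)
        return -1;
#endif
    return 0;
}

/* Returns 0 if SO_KEEPALIVE reads back as enabled on a fresh TCP socket. */
int demo_keepalive(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    int rc = enable_keepalive(fd, 60, 10, 5);
    int on = 0;
    socklen_t len = sizeof on;

    if (rc == 0 && getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, &len) != 0)
        rc = -1;

    close(fd);
    return (rc == 0 && on != 0) ? 0 : 1;
}
```

With those example values a dead peer would be noticed after roughly 60 + 10 * 5 = 110<br>
seconds of silence, instead of the two hours plus probing time you get with the defaults.<br>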

<br>

--<br>

Michael Wojcik<br>

<br>

</blockquote></div>