Today, I'm going to figure out how execve() manages to keep track of open fds while replacing the executable image inside a task.

Something needs to keep track of which fds are open and what Mach ports they map to. Something needs to close CLOEXEC fds. And this all should happen without racing too much.

It might be simple or it might be complicated. Let's see.

I'm looking at _hurd_exec_paths(), which execs the given executable file in the given task.

Yes, in a true capabilities manner it's possible to do an exec() in *any* task you have a port to, not just your own task! This is useful for spawning children (and this is how glibc implements posix_spawn) without actually copying your memory into the child task (fork). As a nice bonus, in this scheme it's the parent who gets the detailed error info if spawning goes wrong.
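To make the posix_spawn point concrete, here's a minimal sketch using only the standard POSIX calls (posix_spawn, waitpid). The helper name spawn_and_wait is mine, not something from glibc. Note that plain POSIX lets an implementation report a failed exec either through posix_spawn's return value or as a child that exits with status 127, so the sketch handles both.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: spawn `path` with `argv` and wait for it to finish.
   Returns the child's exit status (0-255), or -1 if spawning or waiting
   failed or the child did not exit normally. *spawn_errno receives the
   error number that was reported to the parent (0 on success). */
static int spawn_and_wait(const char *path, char *const argv[], int *spawn_errno)
{
    assert(spawn_errno != NULL);

    pid_t pid;
    int err = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    *spawn_errno = err;
    if (err != 0)
        return -1;              /* the parent sees the error directly */

    int status;
    if (waitpid(pid, &status, 0) < 0) {
        *spawn_errno = errno;
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with "/bin/sh" and the arguments {"sh", "-c", "exit 7", NULL} should hand back 7 with *spawn_errno left at 0.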

Whether the same task or not, the new program inherits the state from the program calling _hurd_exec_paths(), such as open file descriptors, cwd/root, umask, which signals are blocked. And in addition to that Unix state, it inherits Hurd-specific state such as which auth server to use.

All this info is packed into arrays of ints and ports, and sent to the exec server in exec_exec_paths() along with the file to execute and the port names to deallocate/destroy. This is how CLOEXEC happens.
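Here's a toy model of that packing step, to show the idea rather than the real glibc code. Everything prefixed toy_ or TOY_ is made up for illustration, ports are plain integers, and I'm assuming a CLOEXEC descriptor simply gets a null slot in the array being sent while its port lands on the destroy list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PORT_NULL  0u   /* stand-in for MACH_PORT_NULL */
#define TOY_FD_CLOEXEC 1    /* stand-in for the per-fd close-on-exec flag */

typedef unsigned int toy_port_t;   /* stand-in for mach_port_t */

struct toy_fd {
    toy_port_t port;   /* port backing this fd, or TOY_PORT_NULL if unused */
    int flags;         /* TOY_FD_CLOEXEC or 0 */
};

/* Walk the fd table. Descriptors that survive the exec keep their port at
   the same index in `dtable`; close-on-exec descriptors get a null slot
   there and their port is appended to `destroy`. Both output arrays must
   have room for `nfds` entries. Returns the number of ports in `destroy`. */
static size_t toy_pack_fds(const struct toy_fd *fds, size_t nfds,
                           toy_port_t *dtable, toy_port_t *destroy)
{
    assert(fds != NULL && dtable != NULL && destroy != NULL);

    size_t ndestroy = 0;
    for (size_t i = 0; i < nfds; i++) {
        if (fds[i].port != TOY_PORT_NULL && (fds[i].flags & TOY_FD_CLOEXEC)) {
            dtable[i] = TOY_PORT_NULL;
            destroy[ndestroy++] = fds[i].port;
        } else {
            dtable[i] = fds[i].port;
        }
    }
    return ndestroy;
}
```

Keeping the index stable matters: fd 2 in the old program has to be fd 2 in the new one, so the array position is the fd number.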

Now we're in the exec server. The exec server, despite implementing a seemingly simple feature (that should actually be implementable in-task without a separate server), is in fact one of the most essential Hurd servers.

As I've previously mentioned, it's one of the two servers (along with the root filesystem) that are started directly by the GNU Mach kernel on bootup. It has to be, because without an exec server nothing else could be launched, and without the root filesystem there would be nothing to exec.

The exec server will actually replace the given task with a fresh one if requested explicitly or if the EXEC_SECURE flag is set. This makes sure anyone who has a port to the old task cannot control the new program.

Additionally, EXEC_SECURE will cause the exec server to replace some of the provided ports (namely, ports to auth, proc, and root filesystem servers) with pristine versions.


How EXEC_SECURE gets set deserves its own digression:

In order to make setuid execs possible, _hurd_exec_paths() doesn't directly call into the exec server. Instead it asks the filesystem implementing the file-to-be-exec'ed to do that. The filesystem forwards the arguments to the exec server, but it can alter the provided auth and add EXEC_SECURE if it believes the executable is setuid.

After the exec server is done replacing the task's virtual memory by loading the new executable image into it, it replaces the task's bootstrap port with a fresh port to itself.

If you need a refresher, the bootstrap port is one of the "special" ports that Mach stores for each task. It's generally used to provide the new task with some way to bootstrap other connections. On Darwin, it's used to connect tasks to launchd, aka the bootstrap server, which gives out ports to other servers.

On the Hurd, the bootstrap port, as seen inside main(), is used when starting translators (filesystems). ("Must be started as a translator" is the error message they typically print if they find that their bootstrap port is null.)

But it turns out each task *actually* starts up with the bootstrap port provided by the exec server. glibc initialization code calls exec_startup_get_info() on it, to which the exec server replies with all that data sent by whoever's started this exec in the first place.

This data includes the "real" bootstrap port — the one the task had before getting exec'ed, and the one main() expects to see — which glibc sets back as this task's bootstrap port.

This is also where glibc unpacks fds, essential server ports, and other info. So this is how fds and other state are preserved across exec: manually, by packing all the relevant info, sending it to the filesystem, then to the exec server, then back to the task, then unpacking it back into place.
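The bootstrap-port dance can be modelled in a few lines. This is a toy with no real Mach ports or RPCs: all toy_ names are invented, ports are integers, and a plain struct copy stands in for the exec_startup_get_info() call.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FDS 4

typedef unsigned int toy_port_t;   /* stand-in for mach_port_t */

/* State the exec server holds on to until the new program asks for it. */
struct toy_startup_info {
    toy_port_t real_bootstrap;        /* bootstrap port the task had before exec */
    toy_port_t dtable[TOY_MAX_FDS];   /* ports for the inherited fds */
    size_t ndtable;
};

struct toy_task {
    toy_port_t bootstrap;
    toy_port_t fds[TOY_MAX_FDS];
    size_t nfds;
};

static struct toy_startup_info toy_pending;   /* one pending exec, for simplicity */

/* Exec server side: stash the info and point the task's bootstrap port
   at the server itself. */
static void toy_exec_server_finish(struct toy_task *task,
                                   const struct toy_startup_info *info,
                                   toy_port_t server_port)
{
    assert(info->ndtable <= TOY_MAX_FDS);
    toy_pending = *info;
    task->bootstrap = server_port;
}

/* New program's startup code: fetch the info (this copy stands in for the
   exec_startup_get_info() call on task->bootstrap), unpack the fds, then
   restore the "real" bootstrap port that main() expects. */
static void toy_startup(struct toy_task *task)
{
    struct toy_startup_info info = toy_pending;
    for (size_t i = 0; i < info.ndtable; i++)
        task->fds[i] = info.dtable[i];
    task->nfds = info.ndtable;
    task->bootstrap = info.real_bootstrap;
}
```

Between the two calls the task's bootstrap port points at the exec server; after toy_startup() it's back to whatever it was before the exec, so main() never notices the swap.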

P.S. But what about the exec server itself? What bootstrap port does it get?

It turns out that it gets a port to the root filesystem, the other task started on bootup, as its bootstrap port. (The root filesystem gets the exec server port, which is normal.) So when the exec server itself starts up, it expects to have been just exec'ed by the root filesystem, and as any task it calls exec_startup_get_info() on that bootstrap port.

The root filesystem knows how to handle that by sending back a special flag (EXEC_STACK_ARGS) that tells the exec server to look for args on its stack, which is where the kernel loader places them. The exec server, by contrast, sends them over in reply to exec_startup_get_info().

