
Long shot: A/UX and spurious ENOTSOCK

cheesestraws

Well-known member
While I've been away from home I've been playing with writing a hybrid A/UX application or two.

I'm hitting a very strange thing, and I'm posting in case anyone has any ideas. I'm using ordinary UNIX sockets for TCP, using the normal send/recv affair.

Every so often, after somewhere between a minute and an hour of runtime, send() returns an error with errno=58, which according to sys/errno.h is ENOTSOCK.

Which is odd, because it's very definitely a socket. And recv() can carry on receiving on that socket, and netstat -a shows the connection as ESTABLISHED.
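For anyone wanting to reproduce this kind of report, here is a minimal sketch of a send() wrapper that records errno the moment the call fails. The name `logged_send` is made up for illustration and the ANSI-style prototype is an assumption; an older A/UX compiler may want K&R declarations instead.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper (not from the original code): call send() and,
 * on failure, log errno before anything else can overwrite it.
 * Returns whatever send() returned; errno is restored for the caller.
 */
long logged_send(int fd, const void *buf, size_t len)
{
    long n = (long)send(fd, buf, len, 0);

    if (n < 0) {
        int saved = errno;  /* fprintf() may change errno, so copy it first */

        fprintf(stderr, "send(fd=%d) failed: errno=%d (%s)\n",
                fd, saved, strerror(saved));
        errno = saved;      /* hand the original error back to the caller */
    }
    return n;
}
```

Calling this on a descriptor that is not a socket (a pipe, say) should log ENOTSOCK, whose numeric value differs between systems; 58 is the value reported above for A/UX.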

Digging into the kernel with Ghidra suggests that the thing that returns ENOTSOCK is getsock(), which maps from a file descriptor to an internal file struct (following the cognate code in the OpenBSD kernel for types).

[Attached image: Screenshot 2022-11-08 at 23.04.31.png]
I can't find anything else raising that error.
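As a rough userland model of what a BSD-style getsock() does, assuming the A/UX one follows the same shape: look the descriptor up in the calling process's open-file table, then refuse anything whose type is not "socket". Every name here (`model_file`, `model_proc`, `model_getsock`, the `MODEL_` constants) is invented for the sketch, and the type values are assumptions rather than anything read out of the A/UX headers.

```c
#include <errno.h>
#include <stddef.h>

/* Assumed type codes, in the style of BSD's DTYPE_* constants. */
#define MODEL_DTYPE_VNODE  1
#define MODEL_DTYPE_SOCKET 2

#define MODEL_NOFILE 8          /* tiny per-process table for the model */

struct model_file {
    short f_type;               /* what kind of object this open file is */
    void *f_data;               /* socket or vnode pointer in a real kernel */
};

struct model_proc {
    struct model_file *ofile[MODEL_NOFILE];  /* fd number -> open file */
};

/*
 * Map a descriptor to its file struct, but only if it is a socket.
 * On failure returns NULL and stores EBADF or ENOTSOCK in *error.
 */
struct model_file *model_getsock(struct model_proc *p, int fd, int *error)
{
    struct model_file *fp;

    if (fd < 0 || fd >= MODEL_NOFILE || (fp = p->ofile[fd]) == NULL) {
        *error = EBADF;         /* no such open descriptor */
        return NULL;
    }
    if (fp->f_type != MODEL_DTYPE_SOCKET) {
        *error = ENOTSOCK;      /* open, but not a socket */
        return NULL;
    }
    *error = 0;
    return fp;
}
```

The point of the model is that the answer depends on two things: the descriptor number and whose table it is looked up in.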

So, how come send() is deciding my socket isn't a socket? How come recv, despite also calling getsock, isn't seeing this? Has anyone else seen this before? How long is a piece of string? Whither must I wander? What is the airspeed velocity of an unladen swallow?

Urgh
 

atax1a

Member
haven't seen that _particular_ error, if we had to look, we would probably break out the kernel debugger and set a breakpoint on that return code.
 

Corgi

Well-known member
Is your send code doing anything to the file pointer? Honestly, I'm wondering if you have an invalid write into the file pointer struct somewhere, changing the type code. Can you somehow find out what `fp->f_type` is when ENOTSOCK is returned? (is `pstat` available?) Do you ever `fdopen` the socket?

I only have 4.2 and 4.3 here, not A/UX itself, but assuming the networking code is as close to 4.3 as I think it is, `f_type` is only ever set in `socket`.
 

cheesestraws

Well-known member
haven't seen that _particular_ error, if we had to look, we would probably break out the kernel debugger and set a breakpoint on that return code.

That's what I've been resisting using, partly because I'm still away and neither of the emulators I have to run stuff under seem to have functioning NMI emulation, and partly because I have no clue what I'm doing. Do you know of any documentation for the kernel debugger that I had missed?

Is your send code doing anything to the file pointer?

Nope, I don't use anything except the file descriptor given back to me by socket().

My current only guess is that this is a reentrancy problem of some kind because I've got a timer in play (which is really a signal handler, if I'm reading the Patches right).
 

atax1a

Member
I dunno of any documentation for the kernel debugger itself, other than the fact that it exists, unfortunately. We would figure the A/UX kernel debugger to be broadly similar to other SysV kernel debuggers of the era, though.

With qemu-system-ppc we know that typing nmi at the "qemu monitor console" doesn't work to interrupt the mac, and we have to use sendkey Super_R-Power or something like that.

Given that the error return you're describing only seems to come into play when (checks notes) the value, at the address, of the FD, plus 0x10, is not 2, it definitely sounds like intermittent corruption. Doing something funky from a signal handler is a likely cause. The general recommendation with signal handlers is to set a flag that the main loop will see when the interrupt returns, but that also doesn't necessarily work for time-critical stuff.
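A minimal sketch of that flag pattern, for reference. The function names are made up, and plain signal() is used for brevity; whether a handler stays installed after it fires varies between UNIX flavours, so treat that as an assumption to check on the target system.

```c
#include <signal.h>

/* Set by the handler, read and cleared by the main loop. */
static volatile sig_atomic_t tick_pending = 0;

/* The handler does nothing except record that the signal arrived. */
static void on_tick(int signo)
{
    (void)signo;
    tick_pending = 1;
}

/* Hypothetical helper: install the handler; returns 0 on success, -1 on error. */
int install_tick_handler(int signo)
{
    return signal(signo, on_tick) == SIG_ERR ? -1 : 0;
}

/*
 * Hypothetical helper for the main loop: returns 1 if a tick arrived
 * since the last call, else 0. Ticks arriving close together coalesce.
 */
int consume_tick(void)
{
    if (tick_pending) {
        tick_pending = 0;
        return 1;
    }
    return 0;
}
```

The main loop would call consume_tick() once per iteration and do the real work (including any socket calls) there, outside signal context.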
 

cheesestraws

Well-known member
With qemu-system-ppc we know that typing nmi at the "qemu monitor console" doesn't work to interrupt the mac, and we have to use sendkey Super_R-Power or something like that.

a-ha, that was the information I was missing there. Thanks!

We would figure the A/UX kernel debugger to be broadly similar to other SysV kernel debuggers of the era, though.

Yeah, I suppose I'm just being lazy and hoping someone has a manual :D. I'll do my own research.

Given that the error return you're describing only seems to come into play when (checks notes) the value, at the address, of the FD, plus 0x10, is not 2, it definitely sounds like intermittent corruption.

Well, as I said, what I really don't understand is that recv() calls, which do exactly the same check at the beginning (per Ghidra, at least), do not fail. So some deeper order is at play here, and the fact it's systematically reproducible without damaging stability suggests very tightly targeted corruption.

Unfortunately setting a flag and letting the mainloop handle it isn't going to work because of the way the Mac toolbox deals with things like menus and modal dialog boxes. My current Great Hope is either patching WaitNextEvent/GetNextEvent (ew) or perhaps using the deferred task manager, which is available in A/UX 3+. But we'll have to see.
 

cheesestraws

Well-known member
NB: we forget the exact key that maps into cmd-pwr. Might be Meta_R and not Super_R.

This is a good starting point, anyway. I was nonplussed by typing 'nmi' and the emulator segfaulting... Thanks for the pointers! (Pointers, see what I did there? I need more sleep...)
 

atax1a

Member
qemu-system-ppc just doesn't do anything when you type nmi. we'd be upset too if it crashed the thing.
 

atax1a

Member
This is a good starting point, anyway. I was nonplussed by typing 'nmi' and the emulator segfaulting... Thanks for the pointers! (Pointers, see what I did there? I need more sleep...)
sorry to double-post, but, some other random tidbits floated to the top of our mind just before we fell asleep - on a lot of UNIX systems, the kernel debugger is fairly likely to communicate via serial port. in emulation this is less of a problem because you can tell qemu to connect the serial ports to standard in/out. (for future searchers trying to figure out how to do this on hardware, we'd start with the modem port, at 9600 8-N-1). If you're lucky, Apple's kernel debugger will be like MacsBug and take over the screen.

Some of the docs we're seeing say that you have to make sure to compile the debugger into the kernel when you build it.

Also, we found the AT&T SysV 4 For Motorola manual, PDF page 613, which describes kdb. With any luck, Apple's kernel debugger should be based on this.
 

cheesestraws

Well-known member
For anyone interested here, the kernel debugger is... rudimentary. And won't load the symbols from the kernel. Which makes it less than entirely useful.

So instead, I built a little kernel module thingy, which will leak f_type to me through a controlled interface for any given FD. The source for this is here: https://github.com/cheesestraws/auxleak. I want to check whether, when the socket falls over, its underlying f_type actually changes.

So now I'm sitting here like a lemon waiting for the VNC server to break. It doesn't seem inclined to do so now that I'm watching it.
 

mcayland

Member
qemu-system-ppc just doesn't do anything when you type nmi. we'd be upset too if it crashed the thing.

The programmer's switch is currently only wired up for the PMU, so you'll need to run qemu-system-ppc with -M mac99,via=pmu currently for it to work. At the very least it fired up MacsBug as expected last time I tried.

If anyone knows how to wire up the programmer's switch for the g3beige and standard mac99 machines then please get in touch :)
 

cheesestraws

Well-known member
The thick plottens! According to the module above, when I get ENOTSOCK, the file struct that the FD refers to does indeed think it's a vnode not a socket...
 

cheesestraws

Well-known member
Worked this out.

Calling getpid() with each error message and printing the results demonstrated that when the fd isn't a socket, getpid() isn't returning the pid of startmac but of CommandShell (!!). So my Mac code isn't actually running in startmac at all, it's been lent to CommandShell for a bit. So it was instead looking at the corresponding fd for CommandShell.
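One way to at least detect this, sketched under the assumption that getpid() reliably reports whichever process the code is currently running in (as the observation above suggests): remember the pid alongside the descriptor when the socket is created, and compare before each use. The `owned_fd` names are hypothetical, and this only notices the problem; it does not fix it.

```c
#include <sys/types.h>
#include <unistd.h>

/* A descriptor plus the pid of the process whose table it belongs to. */
struct owned_fd {
    int   fd;
    pid_t owner;
};

/* Record the current process as the owner of fd. */
struct owned_fd owned_fd_make(int fd)
{
    struct owned_fd o;

    o.fd = fd;
    o.owner = getpid();
    return o;
}

/* Returns 1 if we are running in the owning process, 0 otherwise. */
int owned_fd_usable(const struct owned_fd *o)
{
    return o->owner == getpid();
}
```

A caller would skip (or defer) the send() whenever owned_fd_usable() returns 0, since the same descriptor number would then be resolved against a different process's table.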

Wat.

Note that this isn't taken into account by A/UX's MacTCP either. It doesn't check the pid, which explains why I was also getting ENOTSOCK from MacTCP.

Good grief.
 

CC_333

Well-known member
Good grief.
No kidding!

I understand little of what you're doing here, but I think I can figure out enough of it to know that things don't seem quite right. Certainly not Mac-like, and frankly rather confusing.

It perhaps explains in part why hardly anyone used A/UX, and why there are so relatively few third party programs, drivers and tools for it (particularly of the networking kind), eh? o_O

c
 