The problems I had with Euclid last night reinforced my opinion of Solaris's boot process. It depends on too many things that can be damaged or corrupted far too easily, and any one of them can stop a sysadmin from booting the machine far enough to recover it.
The problem with Euclid was that a software install had broken /etc/inittab. This sent init into a loop, so the system never presented a shell. The only way to recover was to boot from either the network or an installation CD, get to the point where the filesystems could be mounted, and then make the repairs.
Now, in the days of SunOS 4.1.x, the only things needed to reach a shell prompt, from which the sysadmin could start rebuilding the machine into a usable state, were the boot block, /vmunix and the /sbin directory. Even if only /sbin/init and /sbin/sh worked, that was enough. (Yes, I've recovered a couple of systems with nothing more than this sort of environment after the root disk developed errors. Once, while we waited for an engineer to arrive with a replacement part, I had to use ifconfig, mount, echo, cat, ftp and ln to get a server running well enough to serve NFS disks.)
Solaris, however, needs loads of configuration files in /etc to be not only present but also syntactically correct. Much of the early boot process depends on shell scripts running before you can get anywhere near a shell prompt. Add to this the frailty of the whole device driver system, and you can see that recovering a Solaris system into a minimal rescue mode is all but impossible unless a large number of things are in working order. This, I feel, is Solaris's reliability Achilles' heel.
Now, I agree it's not only Solaris that has this problem. Linux is increasingly showing the same weakness, and Windows has always been flaky at booting.
Oh well, I’ve got this off my chest now. It seems systems are becoming more and more vulnerable to this sort of thing.