So what exactly is the system element that "restricted" the graphics to 1 bit? The graphics chips? Is there actually no other limitation on the SE/30? Even the CRT monitor is able to display grayscale.
Two system limitations and the lack of some components that would be needed for greater color depth.
The screen image is stored in memory. This video memory is called a frame buffer. It is a buffer of one image frame. In the case of one bit color (monochrome), each pixel on the screen has just one corresponding bit in memory. When the CRT's electron beam is pointed at that pixel on the screen, the pixel's corresponding memory bit controls a switching transistor (on or off states, not much in between), and that transistor feeds the electron gun. If the transistor is on, the gun is on and a white spot appears. Transistor off, electron gun off, just blackness at that pixel.
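To make the one-bit-per-pixel idea concrete, here is a minimal sketch of a monochrome frame buffer. The 512x342 screen size, the most-significant-bit-first packing, and the "1 means gun on" polarity are assumptions chosen to match the description above; real hardware may pack or invert the bits differently, and the function names are made up for illustration.

```python
WIDTH, HEIGHT = 512, 342  # assumed compact Mac screen size


def make_framebuffer(width=WIDTH, height=HEIGHT):
    """Allocate one bit per pixel, packed eight pixels to a byte."""
    return bytearray(width * height // 8)


def set_pixel(fb, x, y, on, width=WIDTH):
    """Set or clear the single bit that belongs to pixel (x, y)."""
    byte, bit = divmod(y * width + x, 8)
    mask = 0x80 >> bit  # assumed: leftmost pixel is the high bit
    if on:
        fb[byte] |= mask
    else:
        fb[byte] &= ~mask & 0xFF


def get_pixel(fb, x, y, width=WIDTH):
    """Return 1 (gun on) or 0 (gun off) for pixel (x, y)."""
    byte, bit = divmod(y * width + x, 8)
    return (fb[byte] >> (7 - bit)) & 1
```

With these assumptions the whole screen fits in 21,888 bytes, and the video circuitry only ever has to hand one bit at a time to the switching transistor.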
For greater color depth, more bits of memory are used to represent the state of each pixel. This means that the system needs more video memory, and that, for the same display frame rate, the bits must be shifted out of memory at two to several times the monochrome rate.
Suppose the system is built to support 4 bit color. Now, the frame buffer must be shifted to the display circuitry at four times the previous rate. Each pixel needs four bits of data from memory.
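The scaling is simple arithmetic, and a rough sketch shows it. The 512x342 resolution and the 60 Hz refresh rate below are assumptions for illustration, and blanking intervals are ignored, so the figures are ballpark only. The function names are hypothetical.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Memory needed to hold one frame at the given depth."""
    return width * height * bits_per_pixel // 8


def bits_per_second(width, height, bits_per_pixel, refresh_hz):
    """Bits that must leave video memory each second (blanking ignored)."""
    return width * height * bits_per_pixel * refresh_hz


if __name__ == "__main__":
    for depth in (1, 2, 4, 8):
        size = framebuffer_bytes(512, 342, depth)
        rate = bits_per_second(512, 342, depth, 60)
        print(f"{depth} bit: {size} bytes per frame, {rate} bits/s")
```

Both numbers grow in direct proportion to the depth: going from 1 bit to 4 bits takes the assumed frame buffer from 21,888 bytes to 87,552 bytes and quadruples the rate at which bits have to be shifted out.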
This is where the extra components come in, and you can see this on the Xceed CRT yoke board schematic. Those four bits can't just be sent to a transistor. The four bit pixel data is fed to a digital to analog converter. A four bit number goes in one side and at the other end a voltage appears at one of sixteen different possible levels. Of course a resistor ladder can be used to make a crude D to A converter (DAC), but in any case, this adds a component here.
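As a rough model of what the DAC does, the sketch below maps a four bit pixel value to one of sixteen evenly spaced output levels. This is an idealized converter, not the Xceed circuit: the 1.0 V full-scale value and the function name are assumptions, and a real resistor ladder would have tolerance errors and a different output scaling.

```python
def ideal_dac(code, bits=4, v_full_scale=1.0):
    """Map an integer pixel code to an evenly spaced output voltage.

    Idealized model: code 0 gives 0 V, the largest code gives
    v_full_scale (an assumed value), with equal steps in between.
    """
    max_code = (1 << bits) - 1
    if not 0 <= code <= max_code:
        raise ValueError(f"code must be in 0..{max_code}")
    return v_full_scale * code / max_code


if __name__ == "__main__":
    # Sixteen codes in, sixteen distinct voltage levels out.
    for code in range(16):
        print(f"{code:04b} -> {ideal_dac(code):.3f} V")
```

Changing `bits` changes the number of levels (two to the power of `bits`), which is why the DAC has to match the depth stored in the frame buffer.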
Then the analog voltage level from the DAC is fed to the transistor that feeds the electron gun. Except a simple switching transistor is no use in this position now. Instead a transistor that operates in an analog mode as an amplifier is used. A higher voltage at the gate causes a corresponding increase in current through the transistor to drive the electron gun. As the electron gun is driven more strongly or weakly, the corresponding pixel can appear at sixteen levels of brightness (sixteen in our example, more or less if more or fewer bits are used) from white, through levels of gray down to black.
So grayscale support requires:
1) More video RAM (larger frame buffer)
2) Circuitry that can shift more bits out of video memory per pixel displayed
3) A digital to analog converter (DAC) that supports however many bits of grayscale are wanted (no use having 8 bits per pixel in memory if your DAC only has inputs for 4 bit numbers).
4) A linear amplifying transistor instead of a switching transistor to drive the electron gun.
All of this ignores the upstream requirement of telling the operating system to store more bits per pixel when it draws the screen image in the frame buffer.