Performance
Clocking
Through a hardware-accelerated pattern generator known as the "Clocking Engine", STARGRASP is able to generate IR and CCD clocking patterns with transitions (clock edges) on almost any 10 nanosecond timing boundary. Other than periodically feeding the op-codes that drive the Clocking Engine into a FIFO, STARGRASP's PowerPC 405 embedded CPU is free to manage other tasks during clocking.
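To illustrate what a 10 nanosecond timing grid means for a clocking pattern, here is a minimal Python sketch that converts edge times into tick counts. It is not the Clocking Engine's op-code format, which is not described here; the function name and the strict rejection of off-grid edges are assumptions made for illustration. The real hardware allows "almost any" boundary, so it has further restrictions this sketch does not model.

```python
TICK_NS = 10  # timing resolution stated in the text


def edges_to_ticks(edge_times_ns):
    """Convert clock-edge times (integer nanoseconds) to 10 ns tick counts.

    Hypothetical helper: raises ValueError for any edge that does not
    fall on a 10 ns boundary.
    """
    ticks = []
    for t in edge_times_ns:
        if t % TICK_NS != 0:
            raise ValueError(f"edge at {t} ns is not on a {TICK_NS} ns boundary")
        ticks.append(t // TICK_NS)
    return ticks
```

For example, edges at 0, 40, and 130 ns map to ticks 0, 4, and 13, while an edge at 45 ns is rejected.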
Each boardset contains its own master clock. A software-programmable hardware cross-trigger line can synchronize multiple boardsets' Clocking Engines to within 15 nanoseconds of each other. A Pan-STARRS Giga Pixel Camera (GPC) uses 32 cross-triggered boardsets, resulting in a 512-channel system (eight 4-slot chassis of STARGRASP boardsets in total).
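The GPC figures can be checked with simple arithmetic. The sketch below assumes 16 channels per boardset, which is the value implied by 512 channels spread over 32 boardsets; the variable names are illustrative only.

```python
CHASSIS = 8
SLOTS_PER_CHASSIS = 4
CHANNELS_PER_BOARDSET = 16  # assumption: implied by 512 channels / 32 boardsets

boardsets = CHASSIS * SLOTS_PER_CHASSIS          # one boardset per slot
channels = boardsets * CHANNELS_PER_BOARDSET

print(boardsets, channels)
```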
Sampling
A second custom hardware accelerator, known as the "Math Engine", allows STARGRASP to perform simple math operations on raw 16-bit samples from its A/D input channels. The math operations are intended to compress a serial sequence of data and are not suitable for "frame math", because only a small number of samples from the current frame are available to the Math Engine at a time. Its input samples may include a pedestal value for each pixel (depending on the sampling method; we use this for our CCD systems), or multiple samples of the same video level to reduce certain sources of noise. The output of the Math Engine is a single subtracted or averaged value for each such set of samples, representing the pixel value for the imaging device, which is eventually transferred to the controller's on-board DDR memory. The throughput limit of this data path depends heavily on the complexity of the clocking pattern (the number of clocking instructions the controller CPU must feed to the Clocking Engine) and on the number of A/D channels used per boardset.
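The reduction described above can be modelled in a few lines of Python. This is a software sketch only, not the Math Engine's implementation: the function name is hypothetical, and the hardware's actual arithmetic, rounding, and output width are not specified here.

```python
def reduce_pixel(signal_samples, pedestal_samples=()):
    """Collapse the raw samples for one pixel into a single value.

    Averages the signal samples; if pedestal samples are given,
    averages those too and subtracts them from the signal.
    """
    signal = sum(signal_samples) / len(signal_samples)
    if pedestal_samples:
        signal -= sum(pedestal_samples) / len(pedestal_samples)
    return signal


# Pedestal-subtracted pixel: two pedestal samples, two signal samples.
ccd_pixel = reduce_pixel([1200, 1204], [1000, 1002])

# Averaging only: three samples of the same video level.
averaged_pixel = reduce_pixel([100, 102, 104])
```

Because each output depends only on the handful of samples belonging to one pixel, the operation works on a serial stream and never needs a full frame in view.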
When results are available, we intend to post performance benchmarks for sampling rates here.
For many applications, maximum sampling rates are only interesting if the controller can also off-load the data over the network at the same time. For others, a burst mode, in which readouts are buffered in the controller's on-board memory and transferred later, is useful. See the final section, "Data Transfer + Sampling", for the sustained video streaming capabilities of the controller.
Data Transfer
Each boardset has its own Gigabit Ethernet link for transferring pixel data from the on-board DDR memory to a host computer. In a mode where data is transferred serially (after readout), STARGRASP can sustain a rate of approximately 600 Mbits/second of pixel data per boardset. This is with the default Ethernet MTU (maximum transmission unit) size of 1500 bytes per packet. Recent experiments with 9000 MTU or "Jumbo" packets have achieved rates around 900 Mbits/second of pixel data.(*) In both cases, overhead for frame headers, IP headers, padding, and protocol acknowledgment is not included in the figure; with Jumbo packets, for example, we transfer an actual 900 Mbits/second of useful pixel data. In this latter case the Ethernet link is therefore likely operating at full line speed. Note, however, that these rates are reached with the image already held in DDR memory when the transfer begins.
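For context, the sketch below estimates an upper bound on payload throughput over Gigabit Ethernet for a given MTU. It assumes TCP over IPv4 with no header options, which is an assumption; the transport protocol used by STARGRASP is not stated here. It also ignores acknowledgment traffic and padding, so real figures will be lower.

```python
LINE_RATE_MBPS = 1000  # Gigabit Ethernet

# Per-frame bytes on the wire outside the MTU:
# 14 (Ethernet header) + 4 (FCS) + 8 (preamble and SFD) + 12 (inter-frame gap)
ETH_OVERHEAD_BYTES = 14 + 4 + 8 + 12

# Assumption: IPv4 (20 bytes) + TCP (20 bytes), no options.
IP_TCP_HEADER_BYTES = 20 + 20


def max_goodput_mbps(mtu):
    """Upper bound on payload Mbits/second for one fully used link."""
    payload = mtu - IP_TCP_HEADER_BYTES
    wire = mtu + ETH_OVERHEAD_BYTES
    return LINE_RATE_MBPS * payload / wire


for mtu in (1500, 9000):
    print(mtu, round(max_goodput_mbps(mtu)))
```

Under these assumptions the bound is about 949 Mbits/second at 1500 MTU and about 991 Mbits/second at 9000 MTU, so the framing overhead alone does not explain the gap between the 600 and 900 Mbits/second measurements; per-packet processing cost on the controller is a plausible contributor.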
Data Transfer + Sampling (Streaming Mode)
The controller is capable of managing clocking, sampling, and data transfer simultaneously and continuously. We call this "streaming mode" or video mode. Such a mode is necessary when the controller's on-board memory cannot hold an entire observation data set, or when the data needs to be processed externally and latency is important (for example, in telescope guiding applications or realtime sensing). When there is no integration time (i.e., pixels flow continuously from one readout to the next), this mode also uses multi-buffering. We are currently investigating the limits of the streaming video mode of operation. Due to embedded processor speed, cache sizes, bus architecture, and/or our firmware design, the current limit is approximately 250 Mbits/second of sustained throughput with normal Ethernet packets. Experiments with Jumbo 9000 MTU packets have yielded close to 400 Mbits/second. The PowerPC 405 architecture is a low-power embedded design with much more severe limits than a typical PC host, not just in CPU clock speed but also in CPU cache size and sophistication, interrupt handling, instruction set, and bus speed. It is possible that we are up against a hard limit in one of these areas at 400 Mbits/sec. Investigations are underway to determine this.
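To relate these throughput figures to pixel rates, the sketch below converts sustained Mbits/second into megapixels per second. It assumes each pixel is transferred as a 16-bit value, matching the raw sample width; the actual on-wire pixel format is not stated here, and the function name is illustrative.

```python
def pixel_rate_mpix(throughput_mbps, bits_per_pixel=16):
    """Sustained pixel rate in megapixels/second for a given throughput.

    bits_per_pixel=16 is an assumption based on the 16-bit A/D samples.
    """
    return throughput_mbps / bits_per_pixel


standard_mtu_rate = pixel_rate_mpix(250)   # normal Ethernet packets
jumbo_mtu_rate = pixel_rate_mpix(400)      # Jumbo 9000 MTU packets

print(standard_mtu_rate, jumbo_mtu_rate)
```

Under that assumption, a boardset streams roughly 15.6 megapixels/second with normal packets and 25 megapixels/second with Jumbo packets, shared across however many channels are active.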
Aggregate Speed
Systems utilizing a single slot, while limited to 400 Mbits/sec in video mode and to 32 channels, are the simplest because they do not require cross-triggering between boardsets, and they are the most likely to give the lowest noise. STARGRASP v1 controllers come in 2-slot and 4-slot chassis designs. Depending on the version of the preamplifier, each "slot" is capable of managing 16 or 32 input channels. Regardless of this, on the output side each slot provides one Ethernet link. Some applications may choose to use fewer channels per boardset and aggregate multiple Ethernet links with a standard low-cost switch. In this manner, a STARGRASP controller with parallel data links can achieve any desired data rate, up to a limit of 400 Mbits/sec per channel when only a single channel from each boardset is used.
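As a rough sizing aid, the sketch below estimates how many boardsets (and therefore Ethernet links) are needed for a target aggregate streaming rate. It assumes each link sustains the 400 Mbits/sec Jumbo-packet figure and that the switch and host can absorb the combined traffic; both are assumptions, and the function name is illustrative.

```python
import math


def boardsets_needed(target_mbps, per_link_mbps=400):
    """Minimum number of boardsets (one Ethernet link each) for a target rate.

    per_link_mbps=400 assumes the Jumbo-packet streaming figure;
    use 250 for normal 1500 MTU packets.
    """
    return math.ceil(target_mbps / per_link_mbps)


print(boardsets_needed(1000))        # Jumbo packets
print(boardsets_needed(1000, 250))   # normal packets
```

A 1000 Mbits/sec aggregate stream would need three boardsets with Jumbo packets or four with normal packets, with the active channels divided among them.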
(*) Jumbo (9000 MTU) packets are not standardized and are not supported by all Ethernet switches and NICs.