The default RGB color space for both device and user is sRGB, so code from elsewhere that specifies sRGB colors works as expected. When an RGB matrix display profile is placed in /tmp/ctx.icc, the SDL, DRM and fbdev backends use the display space instead of sRGB for compositing.
TODO: color manage conversions between CMYK and RGB
The ctx vector rasterizer is an active edge table scanline rasterizer with adaptive vertical oversampling. If there are edges starting or ending within the scanline, or edges that are very nearly horizontal, 15 levels of vertical AA are used; 5 and 3 levels are used to progressively approach the non-oversampled but horizontally accurate single scanline. For a description of how a traditional scanline rasterizer with vertical oversampling works, see the introduction of How the stb_truetype Anti-Aliased Software Rasterizer v2 Works.
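The adaptive choice described above can be sketched as a small decision function. This is only an illustration, not the ctx implementation: the `ScanlineInfo` struct, the `choose_vertical_aa` name and the slope thresholds are all assumptions made for this example; only the levels 15, 5 and 3 (and the single-sample fallback) come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-scanline summary of the active edges; not a ctx type. */
typedef struct {
  bool  edge_starts_or_ends; /* some edge begins or ends inside this scanline */
  float max_abs_dx_per_row;  /* largest horizontal travel of any active edge,
                                in pixels per scanline (large = near horizontal) */
} ScanlineInfo;

/* Returns how many vertical sub-scanlines to sample: 15, 5, 3 or 1.
 * The thresholds 16, 4 and 1 are illustrative assumptions. */
static int choose_vertical_aa (const ScanlineInfo *info)
{
  assert (info != NULL);
  assert (info->max_abs_dx_per_row >= 0.0f);

  /* Edges starting/ending here, or nearly horizontal edges: full oversampling. */
  if (info->edge_starts_or_ends || info->max_abs_dx_per_row > 16.0f)
    return 15;
  /* Progressively cheaper levels as the edges get steeper. */
  if (info->max_abs_dx_per_row > 4.0f)
    return 5;
  if (info->max_abs_dx_per_row > 1.0f)
    return 3;
  /* Steep edges only: one scanline with exact horizontal coverage suffices. */
  return 1;
}
```

With such a scheme most scanlines of a typical shape take the cheap single-sample path, and the expensive 15-level path is paid only near the top and bottom of shapes and along shallow edges.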
Render targets handled natively are 8bit sRGBA (RGBA8), floating point scRGB (RGBAF), 8bit and floating point grayscale with alpha (GRAYA8 and GRAYAF), and floating point CMYK (CMYKAF). Integration points for color management, which will be done with babl, are catered for in the API and protocol. The formats RGB332, RGB565, RGB565_BYTESWAPPED, CMYKA8, RGB8, BGRA8, GRAY1, GRAY2, GRAY4, GRAY8 and GRAYF are handled by converting processed scanlines back and forth to one of the natively supported targets.
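As an illustration of the back-and-forth conversion for a non-native target, the following sketch unpacks a run of RGB565 pixels to RGBA8 and packs them again. The function names are hypothetical and this is not ctx's own converter; it assumes the common layout with red in the top five bits, and it ignores alpha when packing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expand `count` RGB565 pixels to RGBA8 (alpha forced to 255).
 * High bits are replicated into the low bits so 0x1f maps to 255. */
static void rgb565_to_rgba8 (const uint16_t *src, uint8_t *dst, int count)
{
  assert (src != NULL && dst != NULL && count >= 0);
  for (int i = 0; i < count; i++)
    {
      uint16_t p  = src[i];
      unsigned r5 = (p >> 11) & 0x1f;
      unsigned g6 = (p >> 5)  & 0x3f;
      unsigned b5 =  p        & 0x1f;
      dst[4 * i + 0] = (uint8_t) ((r5 << 3) | (r5 >> 2));
      dst[4 * i + 1] = (uint8_t) ((g6 << 2) | (g6 >> 4));
      dst[4 * i + 2] = (uint8_t) ((b5 << 3) | (b5 >> 2));
      dst[4 * i + 3] = 255;
    }
}

/* Pack `count` RGBA8 pixels back to RGB565 by truncating the low bits. */
static void rgba8_to_rgb565 (const uint8_t *src, uint16_t *dst, int count)
{
  assert (src != NULL && dst != NULL && count >= 0);
  for (int i = 0; i < count; i++)
    {
      unsigned r = src[4 * i + 0] >> 3;
      unsigned g = src[4 * i + 1] >> 2;
      unsigned b = src[4 * i + 2] >> 3;
      dst[i] = (uint16_t) ((r << 11) | (g << 5) | b);
    }
}
```

Because the expansion replicates bits and the packing truncates, a pixel that is unpacked and repacked without being composited onto keeps its exact RGB565 value.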
static void __ctx_u8_porter_duff (CtxRasterizer *rasterizer, int components, uint8_t * dst, uint8_t * src, int x0, uint8_t * coverage, int count, CtxCompositingMode compositing_mode, CtxFragment fragment, CtxBlend blend);
Overrides for RGBA8 with sourceOver compositing and normal blending are provided separately, with compile-time optional AVX2 acceleration. The prototype of all the inner-loop functions is:
void ctx_composite_pixels (CtxRasterizer *rasterizer, uint8_t *dst, uint8_t *src, int x0, uint8_t *coverage, int count);
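To make the role of the `dst`, `coverage` and `count` arguments concrete, here is a minimal sketch of a sourceOver inner loop for RGBA8 with a solid source color. It is not the ctx code: the function names are hypothetical, the rasterizer and `x0` arguments are left out, and it assumes premultiplied (associated) alpha for both source and destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded a*b/255 for a, b in 0..255, without a division. */
static inline uint8_t mul_div255 (unsigned a, unsigned b)
{
  unsigned t = a * b + 128;
  return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Hypothetical sourceOver loop: composite one premultiplied RGBA8 color
 * over `count` premultiplied RGBA8 destination pixels, each weighted by
 * its 8bit coverage value. */
static void sketch_rgba8_source_over (uint8_t       *dst,
                                      const uint8_t *src_color,
                                      const uint8_t *coverage,
                                      int            count)
{
  assert (dst != NULL && src_color != NULL && coverage != NULL && count >= 0);
  for (int i = 0; i < count; i++, dst += 4)
    {
      unsigned cov = coverage[i];
      if (cov == 0)
        continue; /* pixel not touched by the shape */

      /* Effective source alpha after coverage, and what remains of dst. */
      unsigned sa  = mul_div255 (src_color[3], cov);
      unsigned inv = 255 - sa;

      for (int c = 0; c < 4; c++)
        dst[c] = (uint8_t) (mul_div255 (src_color[c], cov) +
                            mul_div255 (dst[c], inv));
    }
}
```

Calling it with an opaque red source and a coverage of 128 over opaque black leaves `{128, 0, 0, 255}` in the destination pixel. With premultiplied inputs the sum of the two terms cannot exceed 255, so no clamping is needed.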
The following are two images from the test suite used for evaluating and verifying the possible antialiasing modes.
ctx is designed from the beginning to act as a software GPU for modern microcontrollers, some of which are more powerful than the PCs of the mid 90s. Different compiler optimization settings give a wide range for the performance/binary size tradeoff. The small RAM footprint and the ability to use a shared read-only display list also make the ctx rasterizer core useful in multi-threaded rendering.
font data size: 12186 bytes (a sans font subsetted to only ASCII, latin1 ~= 23kb)
RGBA8 rasterizer: 33054 bytes (-Os ~38kb, with many features disabled ~30kb, -O0 90-631kb, -O2 57-114kb, -Ofast/-O3 76-181kb; the size (and compile-time) bump is due to the SIMD tree-vectorizer)
ctx parser: 24608 bytes (not needed for direct use from C, but on embedded targets it can also be useful for ease of integration with other languages, or for directly using ctx+microcontroller+display as a serial display)
The RAM requirements are small. By tuning the engine to have only a couple of save/restore states, and paths with fewer than 256 edges, the total RAM footprint of the rasterizer can be as low as ~5kb on 32bit platforms; the parser for the ctx protocol needs an additional 1kb. Where the framebuffer is too large to fit in RAM, the allocation needed for scanline(s) must be weighed against the RAM needed for the renderstream. Commands take a multiple of 9 bytes; there is code/provisions for runtime compacting of the renderstream in prior git revisions.
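The tradeoff between scanline buffers and renderstream can be estimated with a little arithmetic. The sketch below assumes a 9-byte entry made of a one-byte command code and eight bytes of payload; that layout, the `SketchEntry` type and the helper names are assumptions for illustration, and only the 9-byte size comes from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 9-byte renderstream entry: 1 byte code + 8 bytes payload.
 * Packing is required, otherwise the union's alignment pads the struct. */
#pragma pack(push, 1)
typedef struct {
  uint8_t code;
  union {
    float    f[2];
    uint8_t  u8[8];
    int16_t  s16[4];
    uint32_t u32[2];
  } data;
} SketchEntry;
#pragma pack(pop)

/* Bytes needed to hold `entries` renderstream entries. */
static size_t renderstream_bytes (size_t entries)
{
  return entries * sizeof (SketchEntry);
}

/* Bytes needed for a work buffer of `rows` scanlines. */
static size_t scanline_bytes (int width, int bytes_per_pixel, int rows)
{
  assert (width >= 0 && bytes_per_pixel > 0 && rows >= 0);
  return (size_t) width * (size_t) bytes_per_pixel * (size_t) rows;
}
```

Under these assumptions, a single RGBA8 scanline for a 240 pixel wide display costs 960 bytes, roughly the same as 100 single-entry commands (900 bytes), which gives a feel for how the two allocations compete within a budget of a few kilobytes.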
This table contains the time it takes to fill a 512x512px buffer with an inscribed circle, using various fill sources, for all of ctx's supported pixel format target encodings. These measurements were done with the CPU frequency locked to 2.5 GHz on an i7-8650U.
| format | color a=1.0 | color a=0.75 | lgrad | rgrad | texture | sCopy a=1.0 | sCopy a=0.75 | sCopy lgrad | sCopy rgrad | sCopy texture |
Smaller numbers are better; the numbers are the time to render one frame in µs.
All of these tests are single threaded. In normal interactive use ctx distributes both the rasterizing and the compositing workload over multiple threads, further improving performance.
When targeting a microcontroller whose framebuffer is small enough, it is possible to keep the work buffer in RGBA8 instead of RGB565, making for a smaller but faster work-chunk that is converted once to the scanout encoding. The 8bit versions trade off a little bit of precision for performance, while the floating point versions aim for both precision and performance.
More benchmarks can be seen in blend2d bench, which can be visualized on blend2d's performance comparison page with the custom dataset button. The non-SIMD code of ctx holds up well against the SSE/AVX2 competition.