The default RGB color space for both device and user is sRGB, so code from elsewhere that specifies sRGB colors works as expected. When an RGB matrix display profile is placed in /tmp/ctx.icc, the SDL, DRM and fbdev backends use the display space instead of sRGB for compositing.
TODO: color manage conversions between CMYK and RGB
The ctx vector rasterizer is an active edge table scanline rasterizer with adaptive vertical oversampling. The oversampling only occurs for scanlines where edges are closer to horizontal than a threshold, or where edges start or end. For a description of how a traditional scanline rasterizer with vertical oversampling works, see the introduction of How the stb_truetype Anti-Aliased Software Rasterizer v2 Works.
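The per-scanline decision described above can be sketched roughly as follows. This is a minimal illustration and not ctx's actual code: the `Edge` record, the function name `samples_for_scanline` and the `slope_threshold` parameter are all hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical edge record, not ctx's internal data structure. */
typedef struct {
  float y0, y1;  /* vertical extent of the edge in pixels, y0 <= y1 */
  float dx_dy;   /* horizontal movement in pixels per scanline      */
} Edge;

/* Returns the number of sub-scanlines to rasterize for scanline y:
 * full_aa when any edge touching the scanline starts or ends inside it,
 * or is closer to horizontal than slope_threshold; otherwise 1. */
static int samples_for_scanline (const Edge *edges, size_t n_edges,
                                 int y, int full_aa, float slope_threshold)
{
  assert (full_aa >= 1);
  for (size_t i = 0; i < n_edges; i++)
  {
    const Edge *e = &edges[i];
    float slope = e->dx_dy < 0.0f ? -e->dx_dy : e->dx_dy;

    if (e->y1 <= (float) y || e->y0 >= (float) (y + 1))
      continue;                       /* edge does not touch this scanline */
    if (e->y0 > (float) y || e->y1 < (float) (y + 1))
      return full_aa;                 /* edge starts or ends in the scanline */
    if (slope > slope_threshold)
      return full_aa;                 /* edge is near horizontal */
  }
  return 1;                           /* a single sample row is enough */
}
```

With a threshold of one pixel per scanline, a steep edge crossing the whole scanline is sampled once, while a shallow edge or an edge endpoint triggers the full oversampling level.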
As can be seen in the test renders at the bottom of this page, the adaptive renders produce results similar to the non-adaptive ones, but they are faster. It makes sense to render interactive animations with lower AA settings and rerender at full quality when the UI settles. Due to bitrot, the adaptive part of rendering has been turned off in ctx in favor of 3 or 5 levels of vertical supersampling; the renders on this page were made when it was working well.
At the moment the rasterizer generates high quality 8bit masks. These also work well for floating point targets, but full floating point coverage is desirable and would be best achieved with a renderer similar to v2 of the stb_truetype rasterizer. The current approach would be much slower if pushed to satisfactory quality for u16.
Render targets handled natively are 8bit sRGB with alpha (RGBA8), floating point scRGB (RGBAF), 8bit and floating point grayscale with alpha (GRAYA8 and GRAYAF), and floating point CMYK (CMYKAF). Integration points for color management, which will be done with babl, are catered for in the API and protocol. The formats RGB332, RGB565, RGB565_BYTESWAPPED, CMYKA8, RGB8, BGRA8, GRAY1, GRAY2, GRAY4, GRAY8 and GRAYF are handled by converting processed scanlines back and forth to one of the supported targets.
The compositing is written generically for N components in u8 or float. Pre-processor acrobatics are used to let the compiler do inlining and code elimination. Work is in progress on an AVX2 version of the generic u8 code; eventually SIMD through intrinsics will also be attempted for floating point, but it is less pressing there since autovectorization works better on float than on u8.
static void __ctx_u8_porter_duff (CtxRasterizer *rasterizer, int components, uint8_t * dst, uint8_t * src, int x0, uint8_t * coverage, int count, CtxCompositingMode compositing_mode, CtxFragment fragment, CtxBlend blend);
Overrides for RGBA8 with sourceOver compositing and normal blending are provided separately, with compile-time optional AVX2 acceleration. The prototype shared by all the inner-loop functions is:
void ctx_composite_pixels (CtxRasterizer *rasterizer, uint8_t *dst, uint8_t *src, int x0, uint8_t *coverage, int count);
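To illustrate what a generic N-component u8 inner loop of this kind does, here is a minimal source-over sketch. It is not the ctx implementation: it assumes premultiplied alpha stored as the last component, a single solid source color, and one coverage byte per pixel. The names `mul_u8` and `source_over_u8` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rounded (a * b) / 255 without a division. */
static inline uint8_t mul_u8 (uint8_t a, uint8_t b)
{
  uint32_t t = (uint32_t) a * b + 128;
  return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Composites one premultiplied solid color `src` over `count` pixels in
 * `dst`, modulated by per-pixel coverage. Alpha is the last component,
 * so the same loop serves GRAYA8 (2 components) and RGBA8 (4). */
static void source_over_u8 (int components, uint8_t *dst, const uint8_t *src,
                            const uint8_t *coverage, int count)
{
  assert (components >= 1 && count >= 0);
  for (int i = 0; i < count; i++)
  {
    uint8_t cov       = coverage[i];
    uint8_t src_alpha = mul_u8 (src[components - 1], cov);
    uint8_t inv_alpha = (uint8_t) (255 - src_alpha);

    for (int c = 0; c < components; c++)
      dst[c] = (uint8_t) (mul_u8 (src[c], cov) + mul_u8 (dst[c], inv_alpha));
    dst += components;
  }
}
```

When `components` is a compile-time constant at the call site and the function is inlined, the compiler can unroll the inner loop, which is the kind of specialization the pre-processor approach described above aims for.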
The following are two images from the test suite used for evaluating and verifying the possible antialiasing modes.
85x3 adaptive CTX_ANTIALIAS_FAST
In the following graphic, we want symmetric artifacts, and can observe the differences in vertical fidelity by the quantization in the near horizontal spokes. The horizontal fidelity can be examined in the vertical spokes - by counting how many steps the gradient gets.
ctx is designed from the beginning to act as a software GPU for modern microcontrollers, some of which are more powerful than the PCs in the mid 90s. Different optimization settings of the compiler give a wide range for the performance/binary size tradeoff. But the small footprint in RAM and ability to use a shared read-only display list makes the ctx rasterizer core also useful in multi-threaded rendering.
font data size: 12186 bytes (a sans font subsetted to only ASCII)
RGBA8 rasterizer: 38869 bytes (-Os ~38kb, with many features disabled ~30kb; -O0 90-631kb; -O2 57-114kb; -Ofast/-O3 76-181kb, with the size and compile time bump due to the SIMD tree-vectorizer)
ctx parser: 16384 bytes (not needed for direct use from C, but also on embedded targets this can be useful for ease of integration with other languages, or for directly using ctx+microcontroller+display as a serial display)
Even more aggressive optimization, with exponential compile time, has been tested by forcing inlining and separate compilation of all combinations of blend, composite and image source modes. Building ctx then takes 5-6 minutes and the resulting binary is close to a megabyte; performance gains exist, but in practice they are near negligible.
The RAM requirements are small. By tuning the engine to have only a couple of save/restore states, and paths with fewer than 256 edges, the total RAM footprint of the rasterizer can be as low as ~5kb on 32bit platforms; the parser for the ctx protocol needs an additional 1kb. Where the framebuffer is too large to fit in RAM, the allocation needed for scanline(s) must be weighed against the RAM needed for the renderstream. Commands take a multiple of 9 bytes; there is code and provisions for runtime compacting of the renderstream in prior git revisions.
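As a rough way to reason about the renderstream budget, the sketch below models a 9-byte entry as one opcode byte followed by eight payload bytes holding two floats. This layout is an assumption made for illustration only; the `Entry` type and the helper names are hypothetical and do not reproduce ctx's actual structures.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ENTRY_SIZE = 9 };   /* one opcode byte + eight payload bytes */

/* Hypothetical entry: a byte array keeps the size at exactly 9 bytes
 * without relying on compiler-specific packing attributes. */
typedef struct { uint8_t bytes[ENTRY_SIZE]; } Entry;

/* Stores an opcode and two float arguments in one entry. */
static void entry_set (Entry *e, uint8_t code, float a, float b)
{
  assert (sizeof (float) == 4);
  e->bytes[0] = code;
  memcpy (&e->bytes[1], &a, sizeof (float));
  memcpy (&e->bytes[5], &b, sizeof (float));
}

/* Reads float argument 0 or 1 back out of an entry. */
static float entry_arg (const Entry *e, int index)
{
  float value;
  assert (index == 0 || index == 1);
  memcpy (&value, &e->bytes[1 + index * 4], sizeof (float));
  return value;
}

/* Number of whole entries that fit in a RAM budget given in bytes. */
static size_t entries_that_fit (size_t budget_bytes)
{
  return budget_bytes / ENTRY_SIZE;
}
```

Under this assumed layout a 1kb buffer holds 113 entries, which gives a feel for how renderstream length trades against scanline buffers.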
This table contains the time it takes to fill a 512x512px buffer with an inscribed circle, using various fill sources, for all of ctx' supported pixel format target encodings. The tests were done on battery - the race between ctx and cairo is close enough that for solid RGBA8 fills ctx is faster on battery and cairo/pixman is faster on AC (probably related to available CPU frequency boosting).
|format||color a=1.0||color a=0.75||lgrad||rgrad||sAtop a=1.0||sAtop a=0.75||sAtop lgrad||sAtop rgrad|
Smaller numbers are better; numbers are the time to render one frame in microseconds.
All of these tests are single threaded. In normal interactive use ctx contexts would use parallel rendering: the rasterization happens twice, but the compositing work is fully parallelized.
As seen in the variation between AA=5 and AA=3, tuning the cost of the rasterizer down makes ctx compare OK with cairo. The visual fidelity of the default, high quality, AA in cairo is equivalent to ctx' AA=15, which is the default value used if not specified. There is still room for refactoring to improve the performance and memory access patterns of the rasterizer.
When targeting a microcontroller whose framebuffer is small enough to be kept in RGBA8 instead of RGB565, one should use RGBA8 and convert on-the-fly when copying out. This keeps the number of intermediate conversions between RGB565 and RGBA8 down and performs better; some of this might later be done automatically by ctx. It also has the advantage of higher quality compositing, similar to how 16bit RGB could be used to get linear compositing alternates for RGBA8 formats.
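The copy-out step can be sketched as below: rendering happens in an RGBA8 buffer and each pixel is packed to RGB565 only when it is sent to the display. The function names are hypothetical, and the sketch assumes an opaque framebuffer (alpha is ignored) and native byte order for the 16bit words.

```c
#include <assert.h>
#include <stdint.h>

/* Packs one RGBA8 pixel to RGB565 by dropping the low bits of each
 * channel. Alpha is ignored; the framebuffer is assumed opaque. */
static inline uint16_t rgba8_to_rgb565 (const uint8_t *rgba)
{
  return (uint16_t) (((rgba[0] >> 3) << 11) |
                     ((rgba[1] >> 2) << 5)  |
                      (rgba[2] >> 3));
}

/* Converts `count` pixels from an RGBA8 buffer into an RGB565 buffer,
 * for example one scanline just before a display transfer. */
static void copy_out_rgb565 (const uint8_t *src_rgba8, uint16_t *dst_rgb565,
                             int count)
{
  assert (count >= 0);
  for (int i = 0; i < count; i++)
    dst_rgb565[i] = rgba8_to_rgb565 (src_rgba8 + i * 4);
}
```

Converting one scanline at a time keeps the extra buffer small, since only the RGBA8 framebuffer and a single RGB565 line need to live in RAM.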