ctx rasterizer

Small footprint
Can be tuned for microcontrollers down to ~7kb of RAM + 40kb of code + 12kb of font data. Combined with an immediate mode UI that can be re-run, a framebuffer covering one or a few scanlines is sufficient. More RAM permits more flexible arrangement of more components using the CTX protocol.
Portable
The core rasterizer is C99 code that also compiles as C++, with a makefile build system and only optional dependencies, and thus should run on most 32bit CPU architectures.
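The scanline-sized framebuffer idea can be sketched as follows. This is a minimal illustration, not ctx API: the callback types, `render_in_bands`, `DemoDisplay`, `demo_draw` and `demo_flush` are all hypothetical names, and a 1 byte per pixel buffer is assumed. The drawing code is re-run once per band, which is what an immediate mode UI makes cheap.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callbacks, not part of ctx: draw re-runs the UI code for the
 * rows [y0, y0+rows), flush pushes the finished rows to the display. */
typedef void (*draw_band_fn)  (uint8_t *band, int width, int y0, int rows, void *user);
typedef void (*flush_band_fn) (const uint8_t *band, int width, int y0, int rows, void *user);

/* Render a frame of `height` rows through a buffer that only holds
 * `band_rows` rows (1 byte per pixel assumed). Returns the band count. */
static int render_in_bands (uint8_t *band, int width, int height, int band_rows,
                            draw_band_fn draw, flush_band_fn flush, void *user)
{
  int bands = 0;
  for (int y0 = 0; y0 < height; y0 += band_rows)
  {
    int rows = (height - y0 < band_rows) ? height - y0 : band_rows;
    memset (band, 0, (size_t) width * (size_t) rows); /* clear the band */
    draw  (band, width, y0, rows, user);              /* re-run drawing code */
    flush (band, width, y0, rows, user);              /* send rows to display */
    bands++;
  }
  return bands;
}

/* Demo callbacks: fill each row with its y coordinate, then count and sum
 * what reaches the "display". */
typedef struct { int rows_flushed; uint32_t sum; } DemoDisplay;

static void demo_draw (uint8_t *band, int width, int y0, int rows, void *user)
{
  (void) user;
  for (int r = 0; r < rows; r++)
    for (int x = 0; x < width; x++)
      band[r * width + x] = (uint8_t) ((y0 + r) & 0xff);
}

static void demo_flush (const uint8_t *band, int width, int y0, int rows, void *user)
{
  DemoDisplay *display = (DemoDisplay *) user;
  (void) y0;
  display->rows_flushed += rows;
  for (int i = 0; i < width * rows; i++)
    display->sum += band[i];
}
```

With a 16 pixel wide, 10 row frame and a 4 row band, the loop runs three times and the last band only covers the remaining 2 rows.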

ctx supports grayscale, RGB and CMYK color models, all of which can be used and freely mixed while drawing. Conversion to the device/compositing representation is done during rasterization/rendering. Conversion between ICC matrix profiles for RGB spaces is currently supported only when babl support is built in. A hard-coded set of primaries known to match the specific display used, without babl, would be nice for microcontroller use.
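As an illustration of what a hard-coded conversion could look like, here is a minimal sketch of the matrix step of a matrix-profile conversion. It is not taken from ctx: `RgbMatrix`, `SRGB_TO_XYZ` and `matrix_apply` are hypothetical names, the values are the commonly published linear sRGB to CIE XYZ (D65) matrix, and the input is assumed to already be linear light (the transfer curve step is left out).

```c
/* A 3x3 matrix as found in an RGB matrix profile (hypothetical type). */
typedef struct { float m[3][3]; } RgbMatrix;

/* Commonly published linear sRGB -> CIE XYZ matrix for a D65 white point.
 * A display-specific matrix would be derived from the measured primaries
 * of the display instead. */
static const RgbMatrix SRGB_TO_XYZ = {{
  {0.4124564f, 0.3575761f, 0.1804375f},
  {0.2126729f, 0.7151522f, 0.0721750f},
  {0.0193339f, 0.1191920f, 0.9503041f},
}};

/* out = mat * in, for one linear-light color triplet. */
static void matrix_apply (const RgbMatrix *mat, const float in[3], float out[3])
{
  for (int row = 0; row < 3; row++)
    out[row] = mat->m[row][0] * in[0]
             + mat->m[row][1] * in[1]
             + mat->m[row][2] * in[2];
}
```

A full conversion between two RGB spaces would apply the source matrix, then the inverse of the destination matrix; both can be multiplied together ahead of time into a single hard-coded matrix.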

The default RGB color space for both device and user is sRGB, so code from elsewhere specifying sRGB colors will work as expected. When an RGB matrix display profile is placed in /tmp/ctx.icc, the SDL, DRM and fbdev backends use the display space instead of sRGB for compositing.

TODO: color manage conversions between CMYK and RGB

The ctx vector rasterizer is an active edge table scanline rasterizer with adaptive vertical oversampling. If there are edges starting or ending within the scanline, or very nearly horizontal edges, 15 levels of vertical AA are used; 5 and 3 levels are used to progressively approach the non-oversampled, but horizontally accurate, single scanline. For a description of how a traditional scanline rasterizer with vertical oversampling works, see the introduction of How the stb_truetype Anti-Aliased Software Rasterizer v2 Works.
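The adaptive choice of oversampling level can be sketched as a small decision function. Only the levels 15, 5, 3 and 1 and the "edges start or end" condition come from the text above; the `ScanlineInfo` struct, the `choose_vertical_aa` name and the slope thresholds are assumptions made up for this illustration and are not the values ctx uses.

```c
/* Hypothetical per-scanline summary gathered from the active edge table. */
typedef struct {
  int   edges_start_or_end; /* non-zero if an edge starts or ends in this scanline */
  float max_abs_dx;         /* largest horizontal travel, in pixels, of any
                               active edge across this one scanline */
} ScanlineInfo;

/* Pick the number of vertical sub-scanlines for one scanline.
 * The thresholds 16, 4 and 1 are illustrative assumptions. */
static int choose_vertical_aa (const ScanlineInfo *info)
{
  if (info->edges_start_or_end || info->max_abs_dx > 16.0f)
    return 15;  /* edge endpoints or nearly horizontal edges */
  if (info->max_abs_dx > 4.0f)
    return 5;
  if (info->max_abs_dx > 1.0f)
    return 3;
  return 1;     /* steep edges only: a single scanline is enough */
}
```

The point of the design is that most scanlines of typical shapes only cross steep edges, so the expensive 15 level path is paid for on few scanlines.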

Render targets handled natively are 8bit sRGBA (RGBA8), floating point scRGB (RGBAF), 8bit and floating point grayscale with alpha (GRAYA8 and GRAYAF), and floating point CMYK (CMYKAF). Integration points are catered for in the API and protocol for color management, which will be done with babl. The formats RGB332, RGB565, RGB565_BYTESWAPPED, CMYKA8, RGB8, BGRA8, GRAY1, GRAY2, GRAY4, GRAY8 and GRAYF are handled by converting processed scanlines back and forth to one of the natively supported targets.
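The back and forth conversion for one of the non-native formats can be sketched like this for RGB565 and RGBA8. The function names are hypothetical and this is not the ctx implementation; it uses the usual bit replication to expand 5 and 6 bit channels to 8 bits and assumes the destination is opaque.

```c
#include <stdint.h>

/* Expand a run of RGB565 pixels to RGBA8 so that an RGBA8 compositor can
 * work on them. The top bits are replicated into the low bits so that
 * 0x1f maps to 255 rather than 248. */
static void rgb565_to_rgba8 (const uint16_t *src, uint8_t *dst, int count)
{
  for (int i = 0; i < count; i++)
  {
    uint16_t pixel = src[i];
    uint8_t r5 = (pixel >> 11) & 0x1f;
    uint8_t g6 = (pixel >> 5)  & 0x3f;
    uint8_t b5 =  pixel        & 0x1f;
    dst[4 * i + 0] = (uint8_t) ((r5 << 3) | (r5 >> 2));
    dst[4 * i + 1] = (uint8_t) ((g6 << 2) | (g6 >> 4));
    dst[4 * i + 2] = (uint8_t) ((b5 << 3) | (b5 >> 2));
    dst[4 * i + 3] = 255; /* RGB565 has no alpha, treat as opaque */
  }
}

/* Pack composited RGBA8 pixels back to RGB565, dropping alpha and the
 * low bits of each channel. */
static void rgba8_to_rgb565 (const uint8_t *src, uint16_t *dst, int count)
{
  for (int i = 0; i < count; i++)
  {
    dst[i] = (uint16_t) (((src[4 * i + 0] >> 3) << 11) |
                         ((src[4 * i + 1] >> 2) << 5)  |
                          (src[4 * i + 2] >> 3));
  }
}
```

Unpacking and repacking without compositing in between returns the original RGB565 value, so untouched pixels of a scanline are not degraded by the round trip.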

static void
__ctx_u8_porter_duff (CtxRasterizer         *rasterizer,
                     int                    components,
                     uint8_t *              dst,
                     uint8_t *              src,
                     int                    x0,
                     uint8_t *              coverage,
                     int                    count,
                     CtxCompositingMode     compositing_mode,
                     CtxFragment            fragment,
                     CtxBlend               blend);
       

Overrides for RGBA8 for sourceOver and normal are provided separately, with compile-time optional AVX2 acceleration. The prototype shared by all the inner loop functions is:

void ctx_composite_pixels (CtxRasterizer *rasterizer,
                           uint8_t       *dst,
                           uint8_t       *src,
                           int            x0,
                           uint8_t       *coverage,
                           int            count);
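To make the role of the `dst`, `src`, `coverage` and `count` arguments concrete, here is a minimal sketch of a source-over inner loop for RGBA8. It is not the ctx implementation: `mul_u8` and `source_over_rgba8` are hypothetical names, `src` is assumed to be a single premultiplied color used for the whole run, and `dst` is assumed to hold premultiplied pixels.

```c
#include <stdint.h>

/* Multiply two values in the 0..255 range as if they were 0.0..1.0,
 * with rounding: (a * b) / 255. */
static inline uint8_t mul_u8 (uint8_t a, uint8_t b)
{
  unsigned t = (unsigned) a * b + 128;
  return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Composite one premultiplied RGBA8 color over `count` destination pixels,
 * modulated per pixel by the 8bit coverage the rasterizer produced. */
static void source_over_rgba8 (uint8_t *dst, const uint8_t *src,
                               const uint8_t *coverage, int count)
{
  for (int i = 0; i < count; i++)
  {
    uint8_t cov         = coverage[i];
    uint8_t src_alpha   = mul_u8 (src[3], cov);
    uint8_t one_minus_a = (uint8_t) (255 - src_alpha);
    for (int c = 0; c < 4; c++)
      dst[c] = (uint8_t) (mul_u8 (src[c], cov) + mul_u8 (dst[c], one_minus_a));
    dst += 4;
  }
}
```

Because the color is premultiplied, each channel is at most the alpha value, so the sum of the two terms stays within 8 bits.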
        

The following are two images from the test suite used for evaluating and verifying the possible antialiasing modes.

optimization vs binary size

ctx is designed from the beginning to act as a software GPU for modern microcontrollers, some of which are more powerful than the PCs in the mid 90s. Different optimization settings of the compiler give a wide range for the performance/binary size tradeoff. But the small footprint in RAM and ability to use a shared read-only display list makes the ctx rasterizer core also useful in multi-threaded rendering.

font data size:    12186 bytes (A sans font subsetted to only ASCII,
                                latin1 ~= 23kb )
RGBA8 rasterizer:  33054 bytes (-Os ~38kb - with many features disabled ~30kb
                                -O0  90-631kb
                                -O2  57-114kb
                                -Ofast/-O3  76-181kb, size (and compiletime)
                                bump due to SIMD tree-vectorizer)

ctx parser:        24608 bytes (not needed for direct use from C, but also
                                on embedded this can be useful for ease of
                                integration with other languages or directly
                                using ctx+microcontroller+display as a serial
                                display.)

The RAM requirements are small. By tuning the engine to have only a couple of save/restore states, and paths with fewer than 256 edges, the total RAM footprint of the rasterizer can be as low as ~5kb on 32bit platforms; the parser for the ctx protocol needs an additional 1kb. Where the framebuffer is too large to fit in RAM, the allocation needed for scanline(s) must be weighed against the RAM needed for the renderstream. Commands take a multiple of 9 bytes; there is code and provisions for runtime compacting of the renderstream in prior git revisions.
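The trade-off between scanline buffer and renderstream can be put into numbers with a small sketch. The 9 byte figure is from the text; the `Entry9` layout (one command byte plus 8 bytes of payload), the helper names and the example budget are assumptions for illustration, not the actual ctx data structures.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 9 byte renderstream entry: a command code followed by
 * 8 bytes of arguments. Packing is needed to avoid padding to 12 bytes. */
#pragma pack(push, 1)
typedef struct {
  uint8_t code;
  union {
    float    f[2];
    uint8_t  u8[8];
    int16_t  s16[4];
    uint32_t u32[2];
  } data;
} Entry9;
#pragma pack(pop)

/* Bytes needed to buffer `rows` scanlines of the target format. */
static size_t scanline_bytes (int width, int bytes_per_pixel, int rows)
{
  return (size_t) width * (size_t) bytes_per_pixel * (size_t) rows;
}

/* How many renderstream entries fit in what is left of `ram_total` after
 * `reserved` bytes (rasterizer state plus scanline buffer) are set aside. */
static size_t max_entries (size_t ram_total, size_t reserved)
{
  return ram_total > reserved ? (ram_total - reserved) / sizeof (Entry9) : 0;
}
```

For example, with a hypothetical 16kb budget, 5kb for the rasterizer and two 240 pixel RGB565 scanlines (960 bytes), `max_entries (16 * 1024, 5 * 1024 + 960)` leaves room for 1144 entries; doubling the scanline buffer reduces that count accordingly.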

is ctx fast yet?

This table contains the time it takes to fill a 512x512px buffer with an inscribed circle, using various fill sources, for all of ctx's supported pixel format target encodings. These measurements were done with a locked CPU frequency of 2.5GHz on an i7-8650U.

format     color a=1.0  color a=0.75  lgrad  rgrad  texture  sCopy a=1.0  sCopy a=0.75  sCopy lgrad  sCopy rgrad  sCopy texture
cairo              231           478   4534   6556      460          231           231         4571         6590            227
RGBA8              281           317    681   1070      883          279           280         1656         2503            999
BGRA8              374           412    775   1165      987          374           374         1750         2597           1093
RGB565             498           538    899   1289     1098          499           498         1879         2723           1228
RGB565_BS          516           556    916   1306     1118          516           515         1898         2743           1252
RGBAF              627           626   3530   4208     6018         1659          1660         5477         5547           4686
GRAYA8             248           249   6474   7724     7200          495           496         4432         2921           4719
GRAY1             1113          1115   7668   8600     8107         1387          1387         5005         3118           5387
GRAY8              276           280   6834   7779     7231          527           527         4160         2277           4522
RGB332             546           586    948   1339     1149          546           547         1949         2771           1273
GRAYAF             455           455   5982   6077     5179         1555          1556         5678         5918           4728
CMYKAF             722           724  10674   8356     8086         2107          2101         9424         6686           6498
CMYKA8            3379          3989  13701  11569    10702         4763          4494        12468         9888           9230
GRAY2             1498          1502   8049   8984     8466         1774          1774         5398         3504           5745
GRAY4             1338          1342   7871   8819     8293         1616          1615         5219         3341           5582
GRAYF              575           575   6097   6195     5322         1677          1679         5294         5589           4436
RGB8               473           512    872   1263     1076          473           474         1850         2697           1204
CMYK8             2629          2629  12494  10235     9915         3999          3999        11266         8554           8439

Smaller numbers are better; the numbers are the time to render one frame in microseconds (us).
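To relate a per-frame time in microseconds to more familiar units, the arithmetic can be sketched as below. The helper names are hypothetical and the example time is a round number chosen for illustration, not a measurement from the table.

```c
/* Frames per second for a frame that takes `frame_us` microseconds. */
static double frames_per_second (double frame_us)
{
  return 1e6 / frame_us;
}

/* Fill rate: pixels per microsecond is the same as megapixels per second. */
static double megapixels_per_second (int width, int height, double frame_us)
{
  return (double) width * (double) height / frame_us;
}
```

A 512x512 frame rendered in a hypothetical 256 us corresponds to 1024 megapixels per second, and a frame time of 250 us to 4000 frames per second.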

some observations on the above

All of these tests are single threaded. In normal interactive use, ctx distributes both the rasterizing and compositing workload over multiple threads, further improving performance.

When targeting a microcontroller whose framebuffer is small enough, it can be worthwhile to keep the working buffer in RGBA8 instead of RGB565. This makes for a smaller but faster work chunk that is converted once to the scanout encoding. The 8bit versions trade off a little bit of precision for performance, while the floating point versions aim for both precision and performance.

More benchmarks can be seen in blend2d bench, which can be visualized in blend2d's performance comparison page with the custom dataset button. The non-SIMD code of ctx holds up well against the SSE/AVX2 competition.