ctx rasterizer

The ctx vector rasterizer is an active edge table scanline rasterizer, with per-scanline choice between related rasterization strategies.

Scanlines where the number of active edges changes within the scanline are rasterized with 15-level vertical oversampling. This is the fallback when the other strategies fail, and it is expensive: it is the algorithm's worst-case scanline.

If all edges crossing the scanline are steeper than 45 degrees, it is cheap to estimate coverage; this is the best case.

If the goal is to compute accurate 8bit coverage, which is currently the high quality output of ctx's rasterizer, we use 3, 5 or 15 levels of oversampling, which is a moderately expensive rasterization strategy. By default a different, faster rasterizer is used for the same cases. It starts a bit lower than 45 degrees: around the point where horizontal aliasing results in two grayscale pixels, we use an adapted scanline handler, similar to the best case, that can compute the gray levels of coverage interspersed with opaque and transparent spans.
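The idea of vertical oversampling can be sketched in a few lines of C. This is a simplified illustration and not ctx's actual code: the `SketchEdge` struct and `sketch_scanline_coverage` function are hypothetical names, the span is bounded by exactly one left and one right edge, and pixels are point-sampled at their centers horizontally. One arithmetic fact worth noting is that 3, 5 and 15 all divide 255 evenly, so each sub-scanline can contribute a whole number of 8bit coverage steps (85, 51 or 17).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical edge for illustration: x position at the top of the scanline
 * and change in x over one full scanline. Not ctx's internal representation. */
typedef struct { float x_top; float dx_per_scanline; } SketchEdge;

/* Accumulate 8bit coverage for one scanline bounded by a left and a right
 * edge, using `levels` vertical sub-scanlines (3, 5 or 15). Each sub-scanline
 * adds 255/levels to every pixel whose center lies inside the span. */
static void sketch_scanline_coverage(SketchEdge left, SketchEdge right,
                                     int levels, uint8_t *coverage, int width)
{
  int step = 255 / levels;               /* 85, 51 or 17 */
  memset(coverage, 0, (size_t)width);
  for (int s = 0; s < levels; s++) {
    float t  = (s + 0.5f) / levels;      /* center of sub-scanline, 0..1 */
    float x0 = left.x_top  + left.dx_per_scanline  * t;
    float x1 = right.x_top + right.dx_per_scanline * t;
    for (int x = 0; x < width; x++) {
      float cx = x + 0.5f;               /* pixel center */
      if (cx >= x0 && cx < x1)
        coverage[x] = (uint8_t)(coverage[x] + step);
    }
  }
}
```

With two vertical edges every sub-scanline produces the same span, so pixels are either fully covered (255) or empty (0). With a shallow edge the sub-scanlines disagree, and the intermediate sums become the gray levels along that edge.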

For an introduction to how scanline rasterization and vertical oversampling work, which might prove useful in understanding the above explanation, see How the stb_truetype Anti-Aliased Software Rasterizer v2 Works.

Render targets handled natively are 8bit sRGBA (RGBA8), floating point scRGB (RGBAF), 8bit and floating point grayscale with alpha (GRAYA8 and GRAYAF), and floating point CMYK (CMYKAF). Integration points are catered for in API and protocol for color management, which will be done with babl. The formats RGB332, RGB565, RGB565_BYTESWAPPED, CMYKA8, RGB8, GRAY1, GRAY2, GRAY4, GRAY8 and GRAYF are handled by converting processed scanlines back and forth to one of the supported targets. BGRA8 is handled by swapping components in the compositing source.
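As an illustration of what converting a processed scanline back and forth can look like, here is a minimal sketch for the RGB565 case. The function names are hypothetical and this is not ctx's implementation; it uses the common approach of truncating low bits when packing and replicating high bits when expanding.

```c
#include <stdint.h>

/* Pack one 8bit-per-channel pixel into RGB565 by dropping the low bits. */
static uint16_t sketch_rgb565_pack(uint8_t r, uint8_t g, uint8_t b)
{
  return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert a processed RGBA8 scanline to RGB565; alpha is ignored here. */
static void sketch_rgba8_to_rgb565(const uint8_t *rgba, uint16_t *out, int count)
{
  for (int i = 0; i < count; i++)
    out[i] = sketch_rgb565_pack(rgba[i * 4 + 0], rgba[i * 4 + 1], rgba[i * 4 + 2]);
}

/* Expand an RGB565 scanline to opaque RGBA8, replicating the high bits into
 * the low bits so that full-scale values map back to 255. */
static void sketch_rgb565_to_rgba8(const uint16_t *in, uint8_t *rgba, int count)
{
  for (int i = 0; i < count; i++) {
    uint16_t p  = in[i];
    uint8_t  r5 = (uint8_t)(p >> 11);
    uint8_t  g6 = (uint8_t)((p >> 5) & 0x3f);
    uint8_t  b5 = (uint8_t)(p & 0x1f);
    rgba[i * 4 + 0] = (uint8_t)((r5 << 3) | (r5 >> 2));
    rgba[i * 4 + 1] = (uint8_t)((g6 << 2) | (g6 >> 4));
    rgba[i * 4 + 2] = (uint8_t)((b5 << 3) | (b5 >> 2));
    rgba[i * 4 + 3] = 255;
  }
}
```

Because compositing happens in the wider format, only the final store per scanline pays the precision loss of the narrower target.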

ctx supports grayscale, RGB and CMYK color models, all of which can be used and freely mixed while drawing. Conversion to the device/compositing representation is done during rasterization/rendering. At this point, conversion between ICC matrix profiles for RGB spaces is currently supported when babl support is built in. For microcontroller use without babl, it would be nice to have a hard-coded set of primaries known to match the specific display used.

The default RGB color space for both device and user is sRGB. Thus code from elsewhere specifying sRGB colors will work as expected. By adding an RGB matrix display profile in /tmp/ctx.icc, the SDL, KMS and fbdev backends use the display space instead of sRGB for compositing.

optimization vs binary size

ctx was designed from the beginning to act as a software GPU for modern microcontrollers, some of which are more powerful than the PCs of the mid 90s. ctx can be tuned for microcontrollers down to ~7kb of RAM + 42kb of code + 12kb of fontdata. Combined with an immediate mode UI that can be re-run, it is sufficient to have a framebuffer covering one or a few scanlines. More RAM permits a more flexible arrangement of more components, like the parser for the text version of the CTX protocol. The resource constrained programming suitable for a microcontroller is also suitable for rendering cores as represented by threads; in this scenario all the rendering threads render from a shared read-only drawlist, while sharing textures.
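The combination of a re-runnable immediate mode UI and a framebuffer of only a few scanlines can be sketched as a band loop. Everything below is hypothetical and for illustration only (the `sketch_` names, the 8x8 display, the 2-row band, the callback signatures); it does not use the ctx API. The UI callback is re-run once per band and only writes the rows that fall inside that band.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_WIDTH   8
#define SKETCH_HEIGHT  8
#define SKETCH_BAND    2   /* scanlines held in RAM at once */

/* Hypothetical callbacks: draw re-runs the whole UI clipped to rows
 * [y0, y0+rows); flush sends the finished rows to the display. */
typedef void (*SketchDrawFn)(uint8_t *band, int y0, int rows);
typedef void (*SketchFlushFn)(const uint8_t *band, int y0, int rows);

static void sketch_render_frame(SketchDrawFn draw, SketchFlushFn flush)
{
  uint8_t band[SKETCH_BAND * SKETCH_WIDTH];   /* the only pixel buffer */
  for (int y0 = 0; y0 < SKETCH_HEIGHT; y0 += SKETCH_BAND) {
    int rows = SKETCH_HEIGHT - y0;
    if (rows > SKETCH_BAND) rows = SKETCH_BAND;
    memset(band, 0, sizeof band);
    draw(band, y0, rows);    /* UI code runs again for every band */
    flush(band, y0, rows);
  }
}

/* Example UI: a filled box covering x in [2,6) and y in [3,5). */
static void sketch_draw_box(uint8_t *band, int y0, int rows)
{
  for (int y = y0; y < y0 + rows; y++) {
    if (y < 3 || y >= 5) continue;            /* outside the box vertically */
    for (int x = 2; x < 6; x++)
      band[(y - y0) * SKETCH_WIDTH + x] = 255;
  }
}

/* Stand-in for a real display, so the result can be inspected. */
static uint8_t sketch_display[SKETCH_WIDTH * SKETCH_HEIGHT];

static void sketch_flush_to_display(const uint8_t *band, int y0, int rows)
{
  memcpy(&sketch_display[y0 * SKETCH_WIDTH], band, (size_t)rows * SKETCH_WIDTH);
}
```

Calling `sketch_render_frame(sketch_draw_box, sketch_flush_to_display)` draws the box even though it straddles two bands. The trade-off is that drawing code runs once per band, in exchange for a pixel buffer of `SKETCH_BAND * SKETCH_WIDTH` bytes instead of a full frame.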

font data size:    18027 bytes (A sans font subsetted to only ASCII,
                                latin1 ~= 33kb )
RGBA8 rasterizer:  43597 bytes (compiled with -Os, can triple in size with -O3)
ctx parser:        24608 bytes (not needed for direct use from C, but also
                                on embedded this can be useful for ease of
                                integration with other languages or directly
                                using ctx+microcontroller+display as a serial
                                display.)

The RAM requirements are small. By tuning the engine to have only a couple of save/restore states, and paths with fewer than 256 edges, the total RAM footprint of the rasterizer can be as low as ~5kb on 32bit platforms; the parser for the ctx protocol needs an additional 1kb. Where the framebuffer is too large to fit in RAM, the allocation needed for scanline(s) must be weighed against the RAM needed for the renderstream. Commands take a multiple of 9 bytes; there is code/provisions for runtime compacting of the renderstream in prior git revisions.
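The trade-off between scanline buffer and renderstream can be made concrete with a little arithmetic. The sketch below only uses the 9 byte figure from the text; the function names, the RAM budget and the display width in the usage note are hypothetical values chosen for illustration.

```c
#include <stddef.h>

/* From the text: commands take a multiple of 9 bytes. */
#define SKETCH_ENTRY_BYTES 9

/* Bytes needed for a buffer of `lines` scanlines. */
static size_t sketch_scanline_bytes(int width, int bytes_per_pixel, int lines)
{
  return (size_t)width * (size_t)bytes_per_pixel * (size_t)lines;
}

/* Number of 9 byte renderstream entries that fit in what is left of the
 * budget after the scanline buffer has been allocated. */
static size_t sketch_entries_that_fit(size_t ram_budget, size_t scanline_bytes)
{
  if (scanline_bytes >= ram_budget)
    return 0;
  return (ram_budget - scanline_bytes) / SKETCH_ENTRY_BYTES;
}
```

For an assumed 4096 byte budget and a 240 pixel wide display, one RGB565 scanline (480 bytes) leaves room for 401 entries, while one RGBA8 scanline (960 bytes) leaves room for 348.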

is ctx fast yet?

This table contains the time it takes to fill a 512x512px buffer with an inscribed circle, using various fill sources, for all of ctx's supported pixel format target encodings. The measurements were done with a locked CPU frequency of 2.5 GHz on an i7-8650U.

format color a=1.0 color a=0.75 lgrad rgrad texture sCopy a=1.0 sCopy a=0.75 sCopy lgrad sCopy rgrad sCopy texture
cairo2936025724840328829028957498302288
RGBA816523676517903671661646131582165
BGRA816323676718136231651646141650166
RGB56519326480117834041941936501681196
RGB565_BS19626480517814031951966561686197
RGBAF98698541324664791415411541154215441539
GRAYA824624246986785229920091999379258172005
GRAY11049105355517618315328382838456466232799
GRAY828128447926975236720822074388659012073
RGB33219926781018584132012006591694201
GRAYAF397398726274127820700699702703699
CMYKAF80080113101104171083920512026209320852040
CMYKA83736373216725140721390249215596560856444950
GRAY21381138158738002349831613160495170073168
GRAY41223122858317869329530093010481668643018
GRAYF544543745876187994855853858859854
RGB818926279818334041901886481621189
CMYK82865287515161123871276540964053406940794106

Numbers are execution times; smaller numbers are better. ctx can render all its sources to all pixel formats. RGBA8 is the format that has received the most optimization, since it is also the basis for RGB565 and other UI formats. GEGL uses RGBAF and CMYKAF, which should also get optimizations.

When targeting a microcontroller, it can pay off to keep the framebuffer small enough that it can be held as RGBA8 instead of RGB565; this makes for a smaller but faster work-chunk that is converted once to the scanout encoding. The default antialiasing of ctx sacrifices some accuracy for performance. By using ctx_antialias to set a higher quality, accurate antialiasing similar to cairo's is achieved, at a level of performance for solid color that is between default ctx and cairo.

More detailed benchmarking and comparison with blend2d, AGG, Qt and cairo can be seen in a blend2d-bench dataset, which can be visualized in blend2d's performance comparison page by using the custom dataset functionality. The non-SIMD code of ctx holds up well against the SSE/AVX2 powered competition.