Code Project

I'm a bit overwhelmed

The Lounge
  • H Offline
    honey the codewitch
    wrote on last edited by
    #1

    I'm printing "hello\tworld\nfoo bar!" (a C string) as a test in 3 different classes of font:

    1. 16-bit Windows raster - 31 frames per second
    2. VLW anti-aliased raster - 15 frames per second
    3. TrueType vector - 6 frames per second (varies widely depending on the font, but always slow)

    This leads me to believe I need an LRU caching system for two things:

    • pairs of 32-bit codepoints to glyph metrics, for caching kerning tables
    • a single 32-bit codepoint to a glyph, for caching the actual renders

    The issue is this: every LRU implementation I've seen seems to whip the heap with a lot of little allocations and frees. I don't necessarily want to implement a pool for this, because it causes a lot of complication due to the frees. So I may have to roll my own LRU cache engine, or at least understand the algorithm beyond the broad concept so I can modify something existing. This is the kind of thing that makes me wish I'd gone to school. My understanding of algorithms is broad, but somewhat shallow, especially in the dark corners. I've never needed to roll my own LRU before.
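
    Roughly the shape I have in mind - just a sketch with made-up names (nothing from gfx), assuming the whole cache is preallocated up front so nothing touches the heap after construction:

    #include <cstddef>

    // Fixed-capacity LRU map: every node lives in one preallocated array and
    // recency order is an intrusive doubly-linked list of array indices, so
    // find/put/evict never allocate or free.
    template <typename Key, typename Value, size_t Capacity>
    class fixed_lru {
        static_assert(Capacity > 0, "cache needs at least one slot");
        struct node {
            Key key;
            Value value;
            int prev = -1, next = -1;   // indices into nodes_, -1 = none
            bool used = false;
        };
        node nodes_[Capacity];
        int head_ = -1, tail_ = -1;     // head_ = most recent, tail_ = least recent
        size_t count_ = 0;

        void unlink(int i) {
            node& n = nodes_[i];
            if (n.prev >= 0) nodes_[n.prev].next = n.next; else head_ = n.next;
            if (n.next >= 0) nodes_[n.next].prev = n.prev; else tail_ = n.prev;
        }
        void push_front(int i) {
            nodes_[i].prev = -1;
            nodes_[i].next = head_;
            if (head_ >= 0) nodes_[head_].prev = i; else tail_ = i;
            head_ = i;
        }
    public:
        // Returns the cached value and marks it most recently used; nullptr on a miss.
        Value* find(const Key& key) {
            for (size_t i = 0; i < Capacity; ++i) {
                if (nodes_[i].used && nodes_[i].key == key) {
                    unlink((int)i);
                    push_front((int)i);
                    return &nodes_[i].value;
                }
            }
            return nullptr;
        }
        // Inserts or overwrites a value, reusing the least recently used slot when full.
        Value* put(const Key& key, const Value& value) {
            if (Value* existing = find(key)) { *existing = value; return existing; }
            int slot;
            if (count_ < Capacity) slot = (int)count_++;
            else { slot = tail_; unlink(slot); }    // evict: reuse the LRU slot in place
            nodes_[slot].key = key;
            nodes_[slot].value = value;
            nodes_[slot].used = true;
            push_front(slot);
            return &nodes_[slot].value;
        }
    };

    Lookup is a linear scan, which should be fine at the few dozen entries a glyph or kerning cache needs; for the kerning case the Key could just be the two 32-bit codepoints packed into a uint64_t.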

    Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

    J M 2 Replies Last reply
    0
    • H honey the codewitch


      J Offline
      jschell
      wrote on last edited by
      #2

      honey the codewitch wrote:

      Every LRU implementation I've seen seems to whip the heap with a lot of little allocations and frees.

      That seems a bit backwards. You need a custom heap. LRU is just one way to do that.

      honey the codewitch wrote:

      Truetype vector

      Why don't you do a fixed render of the entire code set before you even use it? Presumably you don't need full Unicode. I also presume you can't just load a pre-rendered set from persistent store. If you did that, your memory size would be fixed.

      honey the codewitch wrote:

      This is the kind of thing that makes me wish I went to school

      I went to school and did not learn anything that would have helped solve this. But you can learn some basics by reading "Algorithms" by Sedgewick. Different editions exist for different programming languages; it seems readable in all of the ones I looked at. It covers 'algorithms' but also 'data structures'.

      H E 2 Replies Last reply
      0
      • J jschell


        H Offline
        honey the codewitch
        wrote on last edited by
        #3

        jschell wrote:

        Why don't you do a fixed render of the entire code set before you even use it. Presumably you don't need UNICODE.

        I already have that. You can use .vlw fonts, which are pre-rendered TrueType fonts at a particular scale, with just the subset of characters you want. Sometimes it's not practical, though: if you have one bit of code that's meant to run on devices with different size screens (my esp mon project), or a project with a lot of different sized fonts (my core2 speedometer project, and my clock project), it can get prohibitive in size, even compared to a single TrueType file.

        jschell wrote:

        You need a custom heap.

        Yeah, I'm realizing that, but I don't have it all thought out in terms of freeing. I'm kind of stuck on that.

        jschell wrote:

        LRU is just one way to do that.

        I wouldn't use a least-recently-used algorithm for a custom heap: you can't arbitrarily free blocks when memory is under pressure, because you'd be invalidating active pointers. I'm using it for a cache, but now that I've implemented it, heap fragmentation is killing me.

        Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

        J 1 Reply Last reply
        0
        • J jschell


          E Offline
          englebart
          wrote on last edited by
          #4

          I was thinking along similar lines, but I know you work on various hardware. I also know that you often don't need very much help when you are venting! My guess: you are working on an e-paper reader where you will need to render whatever font comes with the e-book.

          1 Reply Last reply
          0
          • H honey the codewitch


            M Offline
            megaadam
            wrote on last edited by
            #5

            Prolly I do not grok the problem here, but... What if you only save one single set of text + glyphs? lastText, lastWinGlyphs... I wouldn't even call that a cache. I assume we are not talking C++. If we were, the top Goog hit, using std::list + std::unordered_map, seems fine to me... And being the Devil's advocate, mewonders if you might be optimising before you have noticed any performance/resource issue?
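
            For reference, the "top Goog hit" shape is roughly this (a sketch, not any particular library's code):

            #include <cstddef>
            #include <list>
            #include <unordered_map>
            #include <utility>

            // Classic C++ LRU: a list in recency order plus a map from key to list node.
            // Note that every insert allocates a list node and a map node - which is
            // exactly the heap churn objected to in the next post.
            template <typename K, typename V>
            class std_lru {
                size_t capacity_;
                std::list<std::pair<K, V>> items_;   // front = most recently used
                std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
            public:
                explicit std_lru(size_t capacity) : capacity_(capacity) {}

                V* find(const K& key) {
                    auto it = index_.find(key);
                    if (it == index_.end()) return nullptr;
                    items_.splice(items_.begin(), items_, it->second);   // touch
                    return &it->second->second;
                }
                void put(const K& key, V value) {
                    if (V* v = find(key)) { *v = std::move(value); return; }
                    items_.emplace_front(key, std::move(value));
                    index_[key] = items_.begin();
                    if (items_.size() > capacity_) {                     // evict LRU
                        index_.erase(items_.back().first);
                        items_.pop_back();
                    }
                }
            };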

            "If we don't change direction, we'll end up where we're going"

            H 1 Reply Last reply
            0
            • M megaadam


              H Offline
              honey the codewitch
              wrote on last edited by
              #6

              std::list and std::unordered_map together would whip the heap harder than my current implementation, which already crashes the system due to heap fragmentation. As a rule, it's a terrible idea to use most of the STL on embedded devices. So no "std" containers. And particularly not a linked list. Jesus.

              megaadam wrote:

              if you might be optimising before you have noticed any performance/resource issue?

              7 frames per second without caching, 16 with. If it weren't for the heap frag forcing me to run the cache in extended memory, everything would be fine.

              Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

              M G 2 Replies Last reply
              0
              • H honey the codewitch


                M Offline
                megaadam
                wrote on last edited by
                #7

                Noted. THX!

                "If we don't change direction, we'll end up where we're going"

                H 1 Reply Last reply
                0
                • M megaadam


                  H Offline
                  honey the codewitch
                  wrote on last edited by
                  #8

                  I think I may have come up with an idea. My cache has a max_size. I think if I make a private heap of max_size() whenever it is set, I may be able to reduce fragmentation. The issue with that is managing removal. Generally when I make a simple memory pool, I allow you to clear it all at once, but not to invalidate individual regions. That won't work in this case, because all I'd be doing is moving the fragmentation to the private heap. What if I could make a compacting garbage collector on a private heap, though, and just indirect my pointers into it by one more level? (Pointer to a pointer, so I can change the 2nd pointer after I collect. I got the idea from Boehm's collector - an old GC for C that I would use, but it's a bit heavy-handed for just a cache.)
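
                  Roughly what I'm picturing - a sketch only, with made-up names and sizes, not working library code. The cache would hold handles instead of raw pointers, so a compaction pass can slide live blocks together and only the handle table needs fixing up:

                  #include <cstddef>
                  #include <cstdint>
                  #include <cstring>

                  // Private arena with one level of indirection (the "pointer to a pointer").
                  // Meant to live in static storage and be sized to the cache's max_size.
                  class compacting_arena {
                      struct slot { uint32_t offset; uint32_t size; bool live; };
                      static constexpr size_t kArenaSize = 16 * 1024;   // e.g. the cache's max_size
                      static constexpr size_t kMaxSlots  = 128;
                      uint8_t buffer_[kArenaSize];
                      slot slots_[kMaxSlots] = {};
                      uint32_t top_ = 0;                                // bump pointer into buffer_
                  public:
                      static constexpr int invalid = -1;

                      // Allocate 'size' bytes; returns a handle, or invalid if it won't fit
                      // even after compaction.
                      int alloc(uint32_t size) {
                          if (size == 0) return invalid;
                          if (top_ + size > kArenaSize) compact();
                          if (top_ + size > kArenaSize) return invalid;
                          for (size_t h = 0; h < kMaxSlots; ++h) {
                              if (!slots_[h].live) {
                                  slots_[h] = { top_, size, true };
                                  top_ += size;
                                  return (int)h;
                              }
                          }
                          return invalid;                               // out of handles
                      }
                      void free(int h) { slots_[h].live = false; }

                      // Resolve a handle to its *current* address. Callers must re-resolve
                      // after any call that might compact (i.e. after alloc()).
                      uint8_t* get(int h) { return slots_[h].live ? buffer_ + slots_[h].offset : nullptr; }

                  private:
                      // Slide live blocks toward the start of the arena, fixing up offsets.
                      // Quadratic and byte-at-a-time over gaps - fine for a sketch.
                      void compact() {
                          uint32_t write = 0;
                          for (uint32_t read = 0; read < top_; ) {
                              int found = invalid;
                              for (size_t h = 0; h < kMaxSlots; ++h)
                                  if (slots_[h].live && slots_[h].offset == read) { found = (int)h; break; }
                              if (found == invalid) { ++read; continue; }   // dead byte, skip it
                              slot& s = slots_[found];
                              std::memmove(buffer_ + write, buffer_ + s.offset, s.size);
                              s.offset = write;
                              write += s.size;
                              read += s.size;
                          }
                          top_ = write;
                      }
                  };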

                  Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                  M 2 Replies Last reply
                  0
                  • H honey the codewitch


                    M Offline
                    megaadam
                    wrote on last edited by
                    #9

                    I once saw a linked list implementation where the pointers were inside the data objects. I hated it at first, but I later realised their point. In your case - only if the number of objects allows for a sequential search - you could preallocate a bunch of something like

                    struct TextGlyph {
                        char text[LEN];
                        Glyph g;          // or ofc a * if size varies
                        TextGlyph* next;
                    };

                    You would never erase anything, just overwrite data and pointers. But I assume you are already well beyond such naïve stuff, even if I tend to love naïve. :)

                    "If we don't change direction, we'll end up where we're going"

                    H 1 Reply Last reply
                    0
                    • H honey the codewitch


                      M Offline
                      megaadam
                      wrote on last edited by
                      #10

                      Alternatively, even more naïve, if your application knows which texts are the most frequent, you could create a permanent store for the most frequently used Text/Glyphs at startup.

                      "If we don't change direction, we'll end up where we're going"

                      1 Reply Last reply
                      0
                      • M megaadam


                        H Offline
                        honey the codewitch
                        wrote on last edited by
                        #11

                        I actually am doing something similar. Here's my structure:

                        typedef struct {
                            int accessed;            // supports the LRU ordering
                            gfx::size16 dimensions;  // size of the rendered glyph
                            uint8_t* data;           // malloc'd glyph render (alpha map)
                        } cache_entry_t;

                        Accessed just supports the LRU. The data is malloc'd, and I thought that was causing my problem, so I was going to create a simple "vector" to hold all the bytes for every entry, kind of like a private heap - but then I discovered a memory leak. I fixed the leak, and my problems went away. I did not expect a leak; I'm usually fastidious about free'ing what I allocate. In fact, I type the free() at the same time as the malloc() so I don't forget. Well, I forgot, apparently.
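
                        For what it's worth, the accessed field can drive eviction with no list structure at all - again just a sketch (CACHE_SIZE and the helper names are made up; cache_entry_t is the struct above):

                        #include <cstdlib>

                        constexpr int CACHE_SIZE = 32;
                        static cache_entry_t cache[CACHE_SIZE];
                        static int tick = 0;

                        // On a hit: stamp the entry so it becomes the most recently used.
                        static void cache_touch(cache_entry_t* e) { e->accessed = ++tick; }

                        // On a miss with a full cache: reuse the entry touched longest ago,
                        // releasing its old render first (forgetting this free is exactly the
                        // kind of leak that's easy to hit).
                        static cache_entry_t* cache_evict() {
                            cache_entry_t* victim = &cache[0];
                            for (int i = 1; i < CACHE_SIZE; ++i)
                                if (cache[i].accessed < victim->accessed) victim = &cache[i];
                            free(victim->data);
                            victim->data = nullptr;
                            return victim;
                        }

                        Scanning for the minimum is O(n) per eviction, which is cheap at glyph-cache sizes and keeps the bookkeeping to a single int per entry.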

                        Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                        1 Reply Last reply
                        0
                        • H honey the codewitch


                          G Offline
                          Gary Wheeler
                          wrote on last edited by
                          #12

                          honey the codewitch wrote:

                          run the cache in extended memory

                          Oh ye gods. That just triggered a flashback to my MS-DOS programming days. I had an app that ran under the DOS4GW DOS extender (the same as the original DOOM). I had extended memory, expanded memory, real-mode memory, protected memory, protected memory mapping of the real-mode address space... :omg:

                          Software Zen: delete this;

                          H 1 Reply Last reply
                          0
                          • G Gary Wheeler


                            H Offline
                            honey the codewitch
                            wrote on last edited by
                            #13

                            That's basically the situation. I have 512KB of primary memory and something like 8MB of extended memory. However, if you want to use more than 4MB of it, you need to page. Remember Himem.sys? Yeah. That.

                            Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                            G 1 Reply Last reply
                            0
                            • H honey the codewitch


                              G Offline
                              Gary Wheeler
                              wrote on last edited by
                              #14

                              honey the codewitch wrote:

                              Himem.sys

                              Flashback2...

                              Software Zen: delete this;

                              1 Reply Last reply
                              0
                              • H honey the codewitch


                                J Offline
                                jschell
                                wrote on last edited by
                                #15

                                honey the codewitch wrote:

                                I wouldn't use a least-recently-used algorithm for a custom heap

                                Algorithm:

                                1. Render the entire font, specifically as needed. This takes a fixed amount of space when done.
                                2. Then allocate only that space - fixed size.
                                3. Copy the rendered data into that space.
                                4. Discard the original space.

                                If 2 is not true, then I can only suppose that you are not actually rendering what you need. Make a decision: either be flexible and then mess with memory problems, or go with an actual fixed size, which is probably going to be good enough. If you set up the memory map correctly then the font comes from the fixed space and thinks it is coming from the heap. If you do it very carefully you can even eliminate some unneeded overhead that a real heap would have.
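
                                A sketch of that, with the font hooks passed in as callbacks since the real measuring/rendering API will differ (every name here is made up). Measuring first lets the render/copy/discard steps collapse into a single fixed allocation:

                                #include <cstdint>
                                #include <cstdlib>

                                struct glyph_slot { uint32_t offset; uint16_t w, h; };

                                // Render every codepoint you will ever draw into one fixed block.
                                // 'measure' reports glyph dimensions; 'render' fills an 8-bit alpha map.
                                uint8_t* prerender_charset(const uint32_t* codepoints, size_t count,
                                                           glyph_slot* slots, // one entry per codepoint
                                                           void (*measure)(uint32_t cp, uint16_t* w, uint16_t* h),
                                                           void (*render)(uint32_t cp, uint8_t* dst)) {
                                    // Pass 1: total space needed, recorded per glyph.
                                    size_t total = 0;
                                    for (size_t i = 0; i < count; ++i) {
                                        measure(codepoints[i], &slots[i].w, &slots[i].h);
                                        total += (size_t)slots[i].w * slots[i].h;
                                    }
                                    // Pass 2: one allocation, filled front to back, never resized or freed.
                                    uint8_t* block = (uint8_t*)malloc(total);
                                    if (!block) return nullptr;
                                    size_t offset = 0;
                                    for (size_t i = 0; i < count; ++i) {
                                        slots[i].offset = (uint32_t)offset;
                                        render(codepoints[i], block + offset);
                                        offset += (size_t)slots[i].w * slots[i].h;
                                    }
                                    return block;
                                }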

                                H 1 Reply Last reply
                                0
                                • J jschell


                                  H Offline
                                  honey the codewitch
                                  wrote on last edited by
                                  #16

                                  The issue with fixed size is non-monospaced fonts requiring varying glyph sizes, and while I could use the "largest" size for each, rendering the transparency map to the screen requires alpha-blending - non-trivial. So the smaller the glyphs, the better.

                                  I found in my tests that I had a memory leak. Haven't seen many of those in recent years, but here we are. I fixed that, sorted my cache problem, and wrote a screen full of text with about 17KB of cache, TrueType. Heap frag wasn't as big an issue as I thought - the leak was.

                                  What I'm currently doing: I allocate a buffer for the first glyph, render the glyph as an alpha transparency map, and then send that to the draw callback, which puts it on the display. If the next glyph is bigger, I realloc() the buffer and go for it again. None of this happens if I find it in the cache first - the callback never gets called, nor does the glyph copy memory get allocated (if everything is already cached).

                                  This takes a small amount of memory for the (uncached) rendering on a glyph-by-glyph basis and seems to work well. Caching improves the results significantly (a factor of 2 at least) when it doesn't have to expire, and moderately when it does. So I'm ready to move on to the next thing for now.
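
                                  The uncached path looks roughly like the sketch below - placeholder names and signatures, not the library's actual API:

                                  #include <cstdint>
                                  #include <cstdlib>

                                  struct glyph_info { uint16_t w, h; };   // stand-in for the real glyph metrics

                                  struct glyph_scratch {                  // grow-only buffer reused for every glyph
                                      uint8_t* buf = nullptr;
                                      size_t cap = 0;
                                  };

                                  // Render one *uncached* glyph into the shared scratch buffer and hand the
                                  // alpha transparency map to the draw callback. Cached glyphs skip all of this.
                                  bool draw_uncached_glyph(glyph_scratch& scratch, const glyph_info& gi,
                                                           void (*render)(uint8_t* dst, const glyph_info& gi),
                                                           void (*draw)(const uint8_t* alpha, const glyph_info& gi)) {
                                      size_t need = (size_t)gi.w * gi.h;  // one alpha byte per pixel
                                      if (need > scratch.cap) {
                                          uint8_t* p = (uint8_t*)realloc(scratch.buf, need);   // grow only
                                          if (!p) return false;
                                          scratch.buf = p;
                                          scratch.cap = need;
                                      }
                                      render(scratch.buf, gi);            // rasterize the alpha map
                                      draw(scratch.buf, gi);              // alpha-blend it onto the display
                                      return true;
                                  }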

                                  Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                                  1 Reply Last reply
                                  0