Code Project

I'm a bit overwhelmed

The Lounge
  • H Offline
    honey the codewitch
    wrote on last edited by
    #1

    I'm printing "hello\tworld\nfoo bar!" (a C string) as a test in 3 different classes of font:

    1. 16-bit Windows raster - 31 frames per second
    2. VLW anti-aliased raster - 15 frames per second
    3. TrueType vector - 6 frames per second (varies widely depending on the font, but always slow)

    This leads me to believe I need an LRU caching system for two things:

    • pairs of 32-bit codepoints to glyph metrics, for caching kerning tables
    • a single 32-bit codepoint to a glyph, for caching the actual renders

    The issue is this: every LRU implementation I've seen seems to whip the heap with a lot of little allocations and frees. I don't necessarily want to implement a pool for this, because it causes a lot of complication due to the frees. So I may have to roll my own LRU cache engine, or at least understand the algorithm beyond the broad concept so I can modify something existing. This is the kind of thing that makes me wish I'd gone to school. My understanding of algorithms is broad, but somewhat shallow, especially in the dark corners. I've never needed to roll my own LRU before.
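
    Roughly the shape I have in mind - just a sketch with made-up names (nothing from gfx), assuming the whole cache is preallocated up front so nothing touches the heap after construction:

    #include <cstddef>

    // Fixed-capacity LRU map: every node lives in one preallocated array and
    // recency order is an intrusive doubly-linked list of array indices, so
    // find/put/evict never allocate or free.
    template <typename Key, typename Value, size_t Capacity>
    class fixed_lru {
        static_assert(Capacity > 0, "cache needs at least one slot");
        struct node {
            Key key;
            Value value;
            int prev = -1, next = -1;   // indices into nodes_, -1 = none
            bool used = false;
        };
        node nodes_[Capacity];
        int head_ = -1, tail_ = -1;     // head_ = most recent, tail_ = least recent
        size_t count_ = 0;

        void unlink(int i) {
            node& n = nodes_[i];
            if (n.prev >= 0) nodes_[n.prev].next = n.next; else head_ = n.next;
            if (n.next >= 0) nodes_[n.next].prev = n.prev; else tail_ = n.prev;
        }
        void push_front(int i) {
            nodes_[i].prev = -1;
            nodes_[i].next = head_;
            if (head_ >= 0) nodes_[head_].prev = i; else tail_ = i;
            head_ = i;
        }
    public:
        // Returns the cached value and marks it most recently used; nullptr on a miss.
        Value* find(const Key& key) {
            for (size_t i = 0; i < Capacity; ++i) {
                if (nodes_[i].used && nodes_[i].key == key) {
                    unlink((int)i);
                    push_front((int)i);
                    return &nodes_[i].value;
                }
            }
            return nullptr;
        }
        // Inserts or overwrites a value, reusing the least recently used slot when full.
        Value* put(const Key& key, const Value& value) {
            if (Value* existing = find(key)) { *existing = value; return existing; }
            int slot;
            if (count_ < Capacity) slot = (int)count_++;
            else { slot = tail_; unlink(slot); }    // evict: reuse the LRU slot in place
            nodes_[slot].key = key;
            nodes_[slot].value = value;
            nodes_[slot].used = true;
            push_front(slot);
            return &nodes_[slot].value;
        }
    };

    Lookup is a linear scan, which should be fine at the few dozen entries a glyph or kerning cache needs; for the kerning case the Key could just be the two 32-bit codepoints packed into a uint64_t.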

    Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

    J M 2 Replies Last reply
    0
    • H honey the codewitch


      J Offline
      jschell
      wrote on last edited by
      #2

      honey the codewitch wrote:

      Every LRU implementation I've seen seems to whip the heap with a lot of little allocations and frees.

      That seems a bit backwards. You need a custom heap. LRU is just one way to do that.

      honey the codewitch wrote:

      Truetype vector

      Why don't you do a fixed render of the entire code set before you even use it? Presumably you don't need full Unicode. I also presume you can't just load a pre-rendered set from persistent store. If you did that, your memory size would be fixed.

      honey the codewitch wrote:

      This is the kind of thing that makes me wish I went to school

      I went to school and did not learn anything that would have helped solve this. But you can learn some basics by reading "Algorithms" by Sedgewick. Different editions exist for different programming languages; it seems readable in all of the ones I looked at. It covers 'algorithms' but also 'data structures'.

      H E 2 Replies Last reply
      0
      • J jschell


        H Offline
        honey the codewitch
        wrote on last edited by
        #3

        jschell wrote:

        Why don't you do a fixed render of the entire code set before you even use it. Presumably you don't need UNICODE.

        I already have that. You can use .vlw fonts, which are pre-rendered TrueType fonts at a particular scale, with just the subset of characters you want. Sometimes it's not practical, though: if you have one bit of code that's meant to run on devices with different size screens (my esp mon project), or a project with a lot of different sized fonts (my core2 speedometer project, and my clock project), it can get prohibitive in size, even compared to a single TrueType file.

        jschell wrote:

        You need a custom heap.

        Yeah, I'm realizing that, but I don't have it all thought out in terms of freeing. I'm kind of stuck on that.

        jschell wrote:

        LRU is just one way to do that.

        I wouldn't use a least-recently-used algorithm for a custom heap: you can't arbitrarily free blocks when memory is under pressure, because you'd be invalidating active pointers. I'm using it for a cache, but now that I've implemented it, heap fragmentation is killing me.

        Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

        J 1 Reply Last reply
        0
        • J jschell


          E Offline
          englebart
          wrote on last edited by
          #4

          I was thinking along similar lines, but I know you work on various hardware. I also know that you often don't need very much help when you are venting! My guess: you are working on an e-paper reader where you will need to render whatever font comes with the e-book.

          1 Reply Last reply
          0
          • H honey the codewitch


            M Offline
            megaadam
            wrote on last edited by
            #5

            Prolly I do not grok the problem here, but... What if you only save one single set of text + glyphs? lastText, lastWinGlyphs... I wouldn't even call that a cache. I assume we are not talking C++. If we were, the top Goog hit, using std::list + std::unordered_map, seems fine to me... And being the Devil's advocate, mewonders if you might be optimising before you have noticed any performance/resource issue?
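
            For reference, the "top Goog hit" shape is roughly this (a sketch, not any particular library's code):

            #include <cstddef>
            #include <list>
            #include <unordered_map>
            #include <utility>

            // Classic C++ LRU: a list in recency order plus a map from key to list node.
            // Note that every insert allocates a list node and a map node - which is
            // exactly the heap churn objected to in the next post.
            template <typename K, typename V>
            class std_lru {
                size_t capacity_;
                std::list<std::pair<K, V>> items_;   // front = most recently used
                std::unordered_map<K, typename std::list<std::pair<K, V>>::iterator> index_;
            public:
                explicit std_lru(size_t capacity) : capacity_(capacity) {}

                V* find(const K& key) {
                    auto it = index_.find(key);
                    if (it == index_.end()) return nullptr;
                    items_.splice(items_.begin(), items_, it->second);   // touch
                    return &it->second->second;
                }
                void put(const K& key, V value) {
                    if (V* v = find(key)) { *v = std::move(value); return; }
                    items_.emplace_front(key, std::move(value));
                    index_[key] = items_.begin();
                    if (items_.size() > capacity_) {                     // evict LRU
                        index_.erase(items_.back().first);
                        items_.pop_back();
                    }
                }
            };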

            "If we don't change direction, we'll end up where we're going"

            H 1 Reply Last reply
            0
            • M megaadam


              H Offline
              honey the codewitch
              wrote on last edited by
              #6

              std::list and std::unordered_map together would whip the heap harder than my current implementation, which already crashes the system due to heap fragmentation. As a rule, it's a terrible idea to use most of the STL on embedded devices. So no "std" containers. And particularly not a linked list. Jesus.

              megaadam wrote:

              if you might be optimising before you have noticed any performance/resource issue?

              7 frames per second without caching, 16 with. If it weren't for the heap frag forcing me to run the cache in extended memory, everything would be fine.

              Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

              M G 2 Replies Last reply
              0
              • H honey the codewitch


                M Offline
                megaadam
                wrote on last edited by
                #7

                Noted. THX!

                "If we don't change direction, we'll end up where we're going"

                H 1 Reply Last reply
                0
                • M megaadam


                  H Offline
                  honey the codewitch
                  wrote on last edited by
                  #8

                  I think I may have come up with an idea. My cache has a max_size. I think if I make a private heap of max_size() whenever it is set, I may be able to reduce fragmentation. The issue with that is managing removal. Generally when I make a simple memory pool, I allow you to clear it all at once, but not to invalidate individual regions. That won't work in this case, because all I'd be doing is moving the fragmentation to the private heap. What if I could make a compacting garbage collector on a private heap, though, and just indirect my pointers into it by one more level? (Pointer to a pointer, so I can change the 2nd pointer after I collect. I got the idea from Boehm's collector - an old GC for C that I would use, but it's a bit heavy-handed for just a cache.)
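
                  Roughly what I'm picturing - a sketch only, with made-up names and sizes, not working library code. The cache would hold handles instead of raw pointers, so a compaction pass can slide live blocks together and only the handle table needs fixing up:

                  #include <cstddef>
                  #include <cstdint>
                  #include <cstring>

                  // Private arena with one level of indirection (the "pointer to a pointer").
                  // Meant to live in static storage and be sized to the cache's max_size.
                  class compacting_arena {
                      struct slot { uint32_t offset; uint32_t size; bool live; };
                      static constexpr size_t kArenaSize = 16 * 1024;   // e.g. the cache's max_size
                      static constexpr size_t kMaxSlots  = 128;
                      uint8_t buffer_[kArenaSize];
                      slot slots_[kMaxSlots] = {};
                      uint32_t top_ = 0;                                // bump pointer into buffer_
                  public:
                      static constexpr int invalid = -1;

                      // Allocate 'size' bytes; returns a handle, or invalid if it won't fit
                      // even after compaction.
                      int alloc(uint32_t size) {
                          if (size == 0) return invalid;
                          if (top_ + size > kArenaSize) compact();
                          if (top_ + size > kArenaSize) return invalid;
                          for (size_t h = 0; h < kMaxSlots; ++h) {
                              if (!slots_[h].live) {
                                  slots_[h] = { top_, size, true };
                                  top_ += size;
                                  return (int)h;
                              }
                          }
                          return invalid;                               // out of handles
                      }
                      void free(int h) { slots_[h].live = false; }

                      // Resolve a handle to its *current* address. Callers must re-resolve
                      // after any call that might compact (i.e. after alloc()).
                      uint8_t* get(int h) { return slots_[h].live ? buffer_ + slots_[h].offset : nullptr; }

                  private:
                      // Slide live blocks toward the start of the arena, fixing up offsets.
                      // Quadratic and byte-at-a-time over gaps - fine for a sketch.
                      void compact() {
                          uint32_t write = 0;
                          for (uint32_t read = 0; read < top_; ) {
                              int found = invalid;
                              for (size_t h = 0; h < kMaxSlots; ++h)
                                  if (slots_[h].live && slots_[h].offset == read) { found = (int)h; break; }
                              if (found == invalid) { ++read; continue; }   // dead byte, skip it
                              slot& s = slots_[found];
                              std::memmove(buffer_ + write, buffer_ + s.offset, s.size);
                              s.offset = write;
                              write += s.size;
                              read += s.size;
                          }
                          top_ = write;
                      }
                  };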

                  Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                  M 2 Replies Last reply
                  0
                  • H honey the codewitch


                    M Offline
                    megaadam
                    wrote on last edited by
                    #9

                    I once saw a linked list implementation where the pointers were inside the data objects. I hated it at first, but I later realised their point. In your case - only if the number of objects allows for a sequential search - you could preallocate a bunch of something like

                    struct TextGlyph {
                        char text[LEN];
                        Glyph g;          // or ofc a * if size varies
                        TextGlyph* next;
                    };

                    You would never erase anything, just overwrite data and pointers. But I assume you are already well beyond such naïve stuff, even if I tend to love naïve. :)

                    "If we don't change direction, we'll end up where we're going"

                    H 1 Reply Last reply
                    0
                    • H honey the codewitch


                      M Offline
                      megaadam
                      wrote on last edited by
                      #10

                      Alternatively, even more naïve, if your application knows which texts are the most frequent, you could create a permanent store for the most frequently used Text/Glyphs at startup.

                      "If we don't change direction, we'll end up where we're going"

                      1 Reply Last reply
                      0
                      • M megaadam


                        H Offline
                        honey the codewitch
                        wrote on last edited by
                        #11

                        I actually am doing something similar. Here's my structure:

                        typedef struct {
                            int accessed;            // supports the LRU ordering
                            gfx::size16 dimensions;  // size of the rendered glyph
                            uint8_t* data;           // malloc'd glyph render (alpha map)
                        } cache_entry_t;

                        Accessed just supports the LRU. The data is malloc'd, and I thought that was causing my problem, so I was going to create a simple "vector" to hold all the bytes for every entry, kind of like a private heap - but then I discovered a memory leak. I fixed the leak, and my problems went away. I did not expect a leak; I'm usually fastidious about free'ing what I allocate. In fact, I type the free() at the same time as the malloc() so I don't forget. Well, I forgot, apparently.
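
                        For what it's worth, the accessed field can drive eviction with no list structure at all - again just a sketch (CACHE_SIZE and the helper names are made up; cache_entry_t is the struct above):

                        #include <cstdlib>

                        constexpr int CACHE_SIZE = 32;
                        static cache_entry_t cache[CACHE_SIZE];
                        static int tick = 0;

                        // On a hit: stamp the entry so it becomes the most recently used.
                        static void cache_touch(cache_entry_t* e) { e->accessed = ++tick; }

                        // On a miss with a full cache: reuse the entry touched longest ago,
                        // releasing its old render first (forgetting this free is exactly the
                        // kind of leak that's easy to hit).
                        static cache_entry_t* cache_evict() {
                            cache_entry_t* victim = &cache[0];
                            for (int i = 1; i < CACHE_SIZE; ++i)
                                if (cache[i].accessed < victim->accessed) victim = &cache[i];
                            free(victim->data);
                            victim->data = nullptr;
                            return victim;
                        }

                        Scanning for the minimum is O(n) per eviction, which is cheap at glyph-cache sizes and keeps the bookkeeping to a single int per entry.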

                        Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                        1 Reply Last reply
                        0
                        • H honey the codewitch


                          G Offline
                          Gary Wheeler
                          wrote on last edited by
                          #12

                          honey the codewitch wrote:

                          run the cache in extended memory

                          Oh ye gods. That just triggered a flashback to my MS-DOS programming days. I had an app that ran under the DOS4GW DOS extender (the same as the original DOOM). I had extended memory, expanded memory, real-mode memory, protected memory, protected memory mapping of the real-mode address space... :omg:

                          Software Zen: delete this;

                          H 1 Reply Last reply
                          0
                          • G Gary Wheeler


                            H Offline
                            honey the codewitch
                            wrote on last edited by
                            #13

                            That's basically the situation. I have 512KB of primary memory and something like 8MB of extended memory. However, if you want to use more than 4MB of it, you need to page. Remember Himem.sys? Yeah. That.

                            Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                            G 1 Reply Last reply
                            0
                            • H honey the codewitch


                              G Offline
                              Gary Wheeler
                              wrote on last edited by
                              #14

                              honey the codewitch wrote:

                              Himem.sys

                              Flashback2...

                              Software Zen: delete this;

                              1 Reply Last reply
                              0
                              • H honey the codewitch


                                J Offline
                                jschell
                                wrote on last edited by
                                #15

                                honey the codewitch wrote:

                                I wouldn't use a least-recently-used algorithm for a custom heap

                                Algorithm:

                                1. Render the entire font, specifically as needed. This takes a fixed amount of space when done.
                                2. Then allocate only that space - fixed size.
                                3. Copy the rendered data into that space.
                                4. Discard the original space.

                                If 2 is not true, then I can only suppose that you are not actually rendering what you need. Make a decision: either be flexible and then mess with memory problems, or go with an actual fixed size, which is probably going to be good enough. If you set up the memory map correctly then the font comes from the fixed space and thinks it is coming from the heap. If you do it very carefully you can even eliminate some unneeded overhead that a real heap would have.
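
                                A sketch of that, with the font hooks passed in as callbacks since the real measuring/rendering API will differ (every name here is made up). Measuring first lets the render/copy/discard steps collapse into a single fixed allocation:

                                #include <cstdint>
                                #include <cstdlib>

                                struct glyph_slot { uint32_t offset; uint16_t w, h; };

                                // Render every codepoint you will ever draw into one fixed block.
                                // 'measure' reports glyph dimensions; 'render' fills an 8-bit alpha map.
                                uint8_t* prerender_charset(const uint32_t* codepoints, size_t count,
                                                           glyph_slot* slots, // one entry per codepoint
                                                           void (*measure)(uint32_t cp, uint16_t* w, uint16_t* h),
                                                           void (*render)(uint32_t cp, uint8_t* dst)) {
                                    // Pass 1: total space needed, recorded per glyph.
                                    size_t total = 0;
                                    for (size_t i = 0; i < count; ++i) {
                                        measure(codepoints[i], &slots[i].w, &slots[i].h);
                                        total += (size_t)slots[i].w * slots[i].h;
                                    }
                                    // Pass 2: one allocation, filled front to back, never resized or freed.
                                    uint8_t* block = (uint8_t*)malloc(total);
                                    if (!block) return nullptr;
                                    size_t offset = 0;
                                    for (size_t i = 0; i < count; ++i) {
                                        slots[i].offset = (uint32_t)offset;
                                        render(codepoints[i], block + offset);
                                        offset += (size_t)slots[i].w * slots[i].h;
                                    }
                                    return block;
                                }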

                                H 1 Reply Last reply
                                0
                                • J jschell


                                  H Offline
                                  honey the codewitch
                                  wrote on last edited by
                                  #16

                                  The issue with fixed size is non-monospaced fonts requiring varying glyph sizes, and while I could use the "largest" size for each, rendering the transparency map to the screen requires alpha-blending - non-trivial. So the smaller the glyphs, the better.

                                  I found in my tests that I had a memory leak. Haven't seen many of those in recent years, but here we are. I fixed that, sorted my cache problem, and wrote a screen full of text with about 17KB of cache, TrueType. Heap frag wasn't as big an issue as I thought - the leak was.

                                  What I'm currently doing: I allocate a buffer for the first glyph, render the glyph as an alpha transparency map, and then send that to the draw callback, which puts it on the display. If the next glyph is bigger, I realloc() the buffer and go for it again. None of this happens if I find it in the cache first - the callback never gets called, nor does the glyph copy memory get allocated (if everything is already cached).

                                  This takes a small amount of memory for the (uncached) rendering on a glyph-by-glyph basis and seems to work well. Caching improves the results significantly (a factor of 2 at least) when it doesn't have to expire, and moderately when it does. So I'm ready to move on to the next thing for now.
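
                                  The uncached path looks roughly like the sketch below - placeholder names and signatures, not the library's actual API:

                                  #include <cstdint>
                                  #include <cstdlib>

                                  struct glyph_info { uint16_t w, h; };   // stand-in for the real glyph metrics

                                  struct glyph_scratch {                  // grow-only buffer reused for every glyph
                                      uint8_t* buf = nullptr;
                                      size_t cap = 0;
                                  };

                                  // Render one *uncached* glyph into the shared scratch buffer and hand the
                                  // alpha transparency map to the draw callback. Cached glyphs skip all of this.
                                  bool draw_uncached_glyph(glyph_scratch& scratch, const glyph_info& gi,
                                                           void (*render)(uint8_t* dst, const glyph_info& gi),
                                                           void (*draw)(const uint8_t* alpha, const glyph_info& gi)) {
                                      size_t need = (size_t)gi.w * gi.h;  // one alpha byte per pixel
                                      if (need > scratch.cap) {
                                          uint8_t* p = (uint8_t*)realloc(scratch.buf, need);   // grow only
                                          if (!p) return false;
                                          scratch.buf = p;
                                          scratch.cap = need;
                                      }
                                      render(scratch.buf, gi);            // rasterize the alpha map
                                      draw(scratch.buf, gi);              // alpha-blend it onto the display
                                      return true;
                                  }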

                                  Check out my IoT graphics library here: https://honeythecodewitch.com/gfx And my IoT UI/User Experience library here: https://honeythecodewitch.com/uix

                                  1 Reply Last reply
                                  0