Posts made by jkohler

jkohler

Thanks for the thoughts - definitely on the mark. I've already decided to do the C/C++ approach (maybe MMX if time allows) - it's been a few years since I've done any bit bangin' like this. It's actually fun.

Subvert The Dominant Paradigm

jkohler

To all: thank you for your responses. Of all the suggestions, the one with the largest impact on speed was switching to the 32bpp format. This reduced the conversion time from ~50ms to ~42ms, but the extra 1.9MB required for each image is not (in my particular case) a good trade-off. The other suggestions gave improvements of 1ms or maybe 2ms, with no single technique showing a clear gain.

This project involves inspecting components in trays - there may be up to 4 images per component with (so far) a max of 52 components per tray, i.e. up to 208 images per tray. All these images need to be available to the operator at "a touch of the screen". With this many images (each is 1600x1200) I really just need to dust off the ol' C/ASM skills and convert to a 16bpp format - gaining about 2MB per image (one byte per pixel less than 24bpp) in the process. Again, thanks for the suggestions.
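For anyone following along, here is a rough sketch of the 16bpp idea and the memory arithmetic behind it. It assumes an RGB565 layout (5 bits red, 6 green, 5 blue, as in `PixelFormat.Format16bppRgb565`) and tightly packed rows; the class and method names are made up for illustration, and it uses managed arrays rather than the pointers a tuned version would want.

```csharp
using System;

public static class Rgb565Sketch
{
    // Bytes needed for one image with tightly packed rows (no stride padding).
    public static long BytesPerImage( int width, int height, int bytesPerPixel )
    {
        return (long)width * height * bytesPerPixel;
    }

    // Pack one 8-bit-per-channel pixel into RGB565 by dropping the low bits
    // of each channel (3 from red, 2 from green, 3 from blue).
    public static ushort Pack565( byte r, byte g, byte b )
    {
        return (ushort)( ( ( r >> 3 ) << 11 ) | ( ( g >> 2 ) << 5 ) | ( b >> 3 ) );
    }

    // Combine three separate color planes into one array of 16-bit pixels.
    public static ushort[] PackPlanes( byte[] r, byte[] g, byte[] b )
    {
        if ( r.Length != g.Length || g.Length != b.Length )
            throw new ArgumentException( "planes must have the same length" );

        var dst = new ushort[r.Length];
        for ( int i = 0; i < dst.Length; i++ )
            dst[i] = Pack565( r[i], g[i], b[i] );
        return dst;
    }
}
```

At 1600x1200, `BytesPerImage` gives 5,760,000 bytes for 24bpp and 3,840,000 bytes for 16bpp. With 4 x 52 = 208 images that is roughly 1.2GB versus 0.8GB, at the cost of some color precision.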

jkohler

A union.... Apparently I've forgotten the basics. Thanks for the reminder - even if it doesn't shave any cycles it'll definitely have a higher cool factor. As to reversing the loop - your point is taken and appreciated.
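For reference, C# has no `union` keyword, but an explicit struct layout gets the same effect. This is only a guess at the kind of union being suggested: the `PixelUnion` and `UnionDemo` names are hypothetical, and the byte order shown assumes a little-endian machine (true of x86).

```csharp
using System;
using System.Runtime.InteropServices;

// All five fields overlap the same four bytes, like a C union.
[StructLayout( LayoutKind.Explicit )]
public struct PixelUnion
{
    [FieldOffset( 0 )] public uint Value;   // the whole pixel as one 32-bit value
    [FieldOffset( 0 )] public byte B;       // lowest byte on little-endian
    [FieldOffset( 1 )] public byte G;
    [FieldOffset( 2 )] public byte R;
    [FieldOffset( 3 )] public byte A;       // highest byte on little-endian
}

public static class UnionDemo
{
    // Fill the byte fields, then read all four bytes back as one uint.
    public static uint Pack( byte r, byte g, byte b )
    {
        var p = new PixelUnion();   // zero-initialized
        p.B = b;
        p.G = g;
        p.R = r;
        p.A = 0xFF;                 // opaque
        return p.Value;
    }
}
```

Whether this beats plain shifts and ORs is something to measure on the target; the point is just that four byte writes and one 32-bit read can share storage.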

jkohler

I just tried the 32bpp image format, which on my development system gives a ~5% improvement. Unfortunately, on the target system (VIA C7) it makes little measurable difference. I expect any gain in execution speed is consumed by the increase in bitmap size (adding an additional byte per pixel to a 1600x1200 image is a significant increase in terms of CPU cache, etc.). Moving the loop invariant outside the loop does indeed make a small difference when running in the debugger, but release code? No difference at all. And interestingly, the 32bpp format does not require that little calculation to be done at all. In case you're interested, the 32bpp image version:

    private static unsafe void CopyColorPlanes32( BitmapData bmp, IntPtr _b, IntPtr _g, IntPtr _r )
    {
        int* imgPtr = (int*)bmp.Scan0;

        // Height and Width are read once into locals (they are not compile-time constants)
        int h = bmp.Height;
        int w = bmp.Width;
        const int alphaValue = 0xff << 24; // opaque

        byte* b = (byte*)_b;
        byte* g = (byte*)_g;
        byte* r = (byte*)_r;

        // a 32bpp row is always a multiple of 4 bytes, so there is no row padding to skip
        for ( int row = 0; row < h; row++ )
            for ( int col = 0; col < w; col++ )
                *imgPtr++ = alphaValue | *b++ | ( *g++ << 8 ) | ( *r++ << 16 );
    }

jkohler

VIA C7 @ 1.8 GHz. It does pretty well until images get large and no longer fit in the onboard caches....

jkohler

Ennis Ray Lynch, Jr. wrote:

About as fast as C# will get. C++ is much faster.

There's nothing like horsepower to cover for lazy programmers. For giggles I tried this:

        Parallel.For( 0, bmp.Height, row => {
            //
            // establish pointers into the three source color planes, each based on the row this
            // task is to process
            byte* b = (byte*)_b + ( row * bmp.Width );
            byte* g = (byte*)_g + ( row * bmp.Width );
            byte* r = (byte*)_r + ( row * bmp.Width );
            //
            // calc the starting position in the destination bitmap for the first byte of this row;
            // Stride is the full row pitch in bytes, padding included
            byte* dst = (byte*)bmp.Scan0 + ( row * bmp.Stride );
            //
            // copy the bytes from the three sources into the bitmap row
            for ( int col = 0; col < bmp.Width; col++ ) {
                *dst++ = *b++;
                *dst++ = *g++;
                *dst++ = *r++;
            }
        } );

On my dual I9 it's pretty fast (well, reallllyyyy fast). Unfortunately that's not the production target.

jkohler

Ahhhh... assemble into a single uint and then store. I'll try that. Thanks.

jkohler

Hi all, I'm working with a machine vision package which uses a proprietary internal image format. Their package will convert to many common formats (jpg/bmp/png/etc.) but this conversion is only done to disk, and I want to do an in-memory conversion because the to-disk conversion is just plain too slow (>200ms). Their format for 8-bit color images is 3 separate planes (R, G & B) which easily combine into a 24bpp Bitmap like this:

    private static unsafe void CopyColorPlanes( BitmapData bmp, IntPtr _b, IntPtr _g, IntPtr _r )
    {
        byte* imgPtr = (byte*)bmp.Scan0;

        //
        // single tasked version - one row at a time.....

        int h = bmp.Height;
        int w = bmp.Width;
        int s = bmp.Stride;
        byte* b = (byte*)_b;
        byte* g = (byte*)_g;
        byte* r = (byte*)_r;

        for ( int row = 0; row < h; row++ ) {
            for ( int col = 0; col < w; col++ ) {
                *imgPtr++ = *b++;
                *imgPtr++ = *g++;
                *imgPtr++ = *r++;
            }
            imgPtr += s - ( w * 3 );  // skip the row padding so the next row starts properly aligned
        }
    }

This works well and is "reasonably" fast - a 1600x1200 color image conversion takes roughly 42ms on a 1.8GHz VIA C7 (the target system). Two questions:

1) Does anyone see anything in the above method that could be tweaked (staying within "pure" C#) to make it faster? (I've already tried partitioning the source planes into halves and quarters to make them fit better in the CPU cache, and while this has a small impact it's not significant. Interestingly, doing columns in the outer loop runs about 10% faster on an AMD DualCore 4200 - go figger.)

2) Is there some native Windows API that will do this job?

I know I'll probably end up crafting this in assembly but I view that as a last resort... and deadlines loom...
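Since the row padding is the fiddly part of the method above, here is a small managed-array sketch of the same interleave. It assumes each row is padded up to a multiple of 4 bytes, which is how GDI+ lays out bitmaps it allocates itself; `StrideSketch`, `StrideFor` and `Interleave24` are hypothetical names, and this is meant for checking the arithmetic, not as a faster replacement.

```csharp
using System;

public static class StrideSketch
{
    // Row pitch in bytes, assuming rows are padded up to a multiple of 4 bytes.
    public static int StrideFor( int width, int bitsPerPixel )
    {
        return ( ( width * bitsPerPixel + 31 ) / 32 ) * 4;
    }

    // Interleave three color planes into a 24bpp buffer in B, G, R byte order.
    public static byte[] Interleave24( byte[] b, byte[] g, byte[] r, int width, int height )
    {
        int stride = StrideFor( width, 24 );
        var dst = new byte[stride * height];   // padding bytes stay zero
        int src = 0;

        for ( int row = 0; row < height; row++ ) {
            int d = row * stride;              // start of this row, padding included
            for ( int col = 0; col < width; col++, src++ ) {
                dst[d++] = b[src];
                dst[d++] = g[src];
                dst[d++] = r[src];
            }
        }
        return dst;
    }
}
```

The padding per row is `stride - width * 3`. A 1600 pixel row is 4800 bytes with no padding, while a 1601 pixel row would have a 4804 byte stride and one byte of padding, so odd widths are worth testing.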

jkohler

Ahhh... reference to empty versus null reference. Got it. Now I have to rework a whole bunch of code... I foresee a generic solution.

Subvert The Dominant Paradigm -- bumper sticker, circa 1971

jkohler

Isn't that just making a copy of the reference? And therefore no safer than testing/invoking directly on the original reference?

jkohler

In hardware land, GPIO usually means "General Purpose I/O", i.e. "we added some I/O points 'cuz we had some extra space and thought it'd be cool". In all my years I've never seen a standard way of attacking a GPIO interface - they're all different. G'Luck
