This is actually cleaned up!
-
:laugh: The original code was a bunch of preprocessor macros. This is compile time if statements and lots of const folding. It's deliberately unrolled so it's as inline as possible - this is very timing sensitive. If you think this is bad, you should have seen the original code.
if(has_data_low_pins && has_data_high_pins) {
// 3x for bus access stabilization
uint32_t pins_l = gpio_input_get();
pins_l = gpio_input_get();
pins_l = gpio_input_get();
uint32_t pins_h = gpio_input_get_high();
if(pin_d0>31) {
b = (((pins_h>>((pin_d0-32)&31))&1)<<0);
} else if(pin_d0>-1) {
b = (((pins_l>>(pin_d0))&1)<<0);
} else {
b=0;
}
if(pin_d1>31) {
b |= (((pins_h>>((pin_d1-32)&31))&1)<<1);
} else if(pin_d1>-1) {
b |= (((pins_l>>(pin_d1))&1)<<1);
}
if(pin_d2>31) {
b |= (((pins_h>>((pin_d2-32)&31))&1)<<2);
} else if(pin_d2>-1) {
b |= (((pins_l>>(pin_d2))&1)<<2);
}
if(pin_d3>31) {
b |= (((pins_h>>((pin_d3-32)&31))&1)<<3);
} else if(pin_d3>-1) {
b |= (((pins_l>>(pin_d3))&1)<<3);
}
if(pin_d4>31) {
b |= (((pins_h>>((pin_d4-32)&31))&1)<<4);
} else if(pin_d4>-1) {
b |= (((pins_l>>((pin_d4)&31))&1)<<4);
}
if(pin_d5>31) {
b |= (((pins_h>>((pin_d5-32)&31))&1)<<5);
} else if(pin_d5>-1) {
b |= (((pins_l>>(pin_d5))&1)<<5);
}
if(pin_d6>31) {
b |= (((pins_h>>((pin_d6-32)&31))&1)<<6);
} else if(pin_d6>-1) {
b |= (((pins_l>>(pin_d6))&1)<<6);
}
if(pin_d7>31) {
b |= (((pins_h>>((pin_d7-32)&31))&1)<<7);
} else if(pin_d7>-1) {
b |= (((pins_l>>(pin_d7))&1)<<7);
}
} else if(has_data_low_pins) {
uint32_t pins_l = gpio_input_get();
pins_l = gpio_input_get();
pins_l = gpio_input_get();
if(pin_d0>-1) {
b = (((pins_l>>(pin_d0))&1)<<0);
} else {
b=0;
}
if(pin_d1>-1) {
b |= (((pins_l>>(pin_d1))&1)<<1);
}
if(pin_d2>-1) {
b |= (((pins_l>>(pin_d2))&1)<<2);
}
if(pin_d3>-1) {
b |= (((pins_l>>(pin_d3))&1)<<3);
}
if(pin_d4>-1) {
b |= (((pins_l>>(pin_d4))&1)<<4);
}
if(pin_d5>-1) {
b |= (((pins_l>>(pin_d5))&1)<<5);
}
if(pin_d6>-1) {
b |= (((pins_l>>(pin_d6))&1)<<6);
}
if(pin_d7>-1) {
b |= (((pins_l>>(pin_d7))&1)<<7);
}
} else {
uint32_t pins
-
honey the codewitch wrote:
:laugh: The original code was a bunch of preprocessor macros. This is compile time if statements and lots of const folding. It's deliberately unrolled so it's as inline as possible - this is very timing sensitive. If you think this is bad, you should have seen the original code. [unrolled bus-read code quoted in full above]
Seems OK. Any problems? Looks like a code reflecting some hardware documentation table.
-
Seems OK. Any problems? Looks like a code reflecting some hardware documentation table.
Maintenance, testing, readability. It's kind of messy in that regard. It drives an 8-bit parallel bus using software. This is part of the code anyway.
Real programmers use butterflies
-
honey the codewitch wrote:
:laugh: The original code was a bunch of preprocessor macros. This is compile time if statements and lots of const folding. It's deliberately unrolled so it's as inline as possible - this is very timing sensitive. If you think this is bad, you should have seen the original code. [unrolled bus-read code quoted in full above]
I would stick with the preprocessor macros myself. The results will be the same and it will be MUCH cleaner. One thing that is puzzling is the gpio_input functions are called three times. Is that because of timing reasons?
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
-
honey the codewitch wrote:
:laugh: The original code was a bunch of preprocessor macros. This is compile time if statements and lots of const folding. It's deliberately unrolled so it's as inline as possible - this is very timing sensitive. If you think this is bad, you should have seen the original code. [unrolled bus-read code quoted in full above]
Yep, you are right, I would not want to see those preprocessor macros, that would fry my brain. Could you put the pins_l and pins_h into a single uint64_t or is that not supported? That would make the processing easier. Failing that you could simplify it a little.
if(pin_d0>31) { b = (pins_h>>(pin_d0&31))&1; } else if(pin_d0>-1) { b = (pins_l>>pin_d0)&1; } else { b=0; }
I am sure there are other optimisations that could be done, but I am done for the day. Good luck!
-
Yep, you are right, I would not want to see those preprocessor macros, that would fry my brain. Could you put the pins_l and pins_h into a single uint64_t or is that not supported? That would make the processing easier. Failing that you could simplify it a little.
if(pin_d0>31) { b = (pins_h>>(pin_d0&31))&1; } else if(pin_d0>-1) { b = (pins_l>>pin_d0)&1; } else { b=0; }
I am sure there are other optimisations that could be done, but I am done for the day. Good luck!
It would require more shifts because those routines that fill it return uint32_t. Also, it supports it, but the proc is native 32-bit and I don't want to have any hidden overhead for uint64_t. Finally, in some of the code paths I am only using pins_l or pins_h; only in one fork am I using both.
Real programmers use butterflies
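For comparison only, a rough sketch of the uint64_t combination being discussed (illustrative, not something the driver does): since gpio_input_get() and gpio_input_get_high() each return uint32_t, merging them forces 64-bit shifts that a 32-bit core has to emulate with extra instructions.
#include <stdint.h>
extern "C" uint32_t gpio_input_get(void);      // ROM helpers as used throughout the thread
extern "C" uint32_t gpio_input_get_high(void);
// Hypothetical 64-bit variant; pin_d0/pin_d1 stand in for the real pin constants.
static uint8_t read_bus_64(int pin_d0, int pin_d1 /* ...d2..d7 */) {
    uint64_t pins = ((uint64_t)gpio_input_get_high() << 32) | gpio_input_get();
    uint8_t b = 0;
    if (pin_d0 > -1) { b |= (uint8_t)(((pins >> pin_d0) & 1) << 0); } // 64-bit shift: several ops on a 32-bit core
    if (pin_d1 > -1) { b |= (uint8_t)(((pins >> pin_d1) & 1) << 1); }
    // ...d2 through d7 follow the same pattern
    return b;
}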
-
I would stick with the preprocessor macros myself. The results will be the same and it will be MUCH cleaner. One thing that is puzzling is the gpio_input functions are called three times. Is that because of timing reasons?
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
Yes, it is. There's a comment to that effect the *first* time I do it in the code. I don't believe the results are cleaner. For starters, you should have seen the nested #ifdefs it took to do this. Second, the preprocessor method of doing this suffered from a serious design deficiency: you couldn't use multiple static "instances" of it to drive multiple buses, which is a problem when you have a device that runs more than one display, or even more than one SPI device (that doesn't apply to the parallel code, but in principle it could). Everything defined in this code is inside a template, meaning the statics are one-per-instantiation and the arguments to the template are the pin assignments. Ergo, for each different collection of pins tied to a bus, you have a different set of statics to work with, meaning you can drive multiple displays. I'd also argue this is cleaner because it's all typed, whereas the preprocessor macros are not. That matters for more than safety. These days it also means better intellisense/autocomplete, which means more productive mucking about with the source.
Real programmers use butterflies
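To make the template idea concrete, a minimal sketch under the assumption that the pin numbers are non-type template parameters (the names and the C++17 if constexpr here are illustrative, not the library's actual code):
#include <cstdint>
extern "C" uint32_t gpio_input_get(void);      // ESP32 ROM helpers used elsewhere in the thread
extern "C" uint32_t gpio_input_get_high(void);
// Hypothetical: each distinct set of pin assignments instantiates its own bus type, so its
// statics are per-instantiation and the pin checks fold away at compile time.
template<int PinD0, int PinD1 /* ...PinD2..PinD7 */>
struct parallel_bus {
    static uint8_t read() {
        uint32_t pins_l = gpio_input_get();
        uint32_t pins_h = gpio_input_get_high();
        uint8_t b = 0;
        if constexpr (PinD0 > 31)      { b |= ((pins_h >> ((PinD0 - 32) & 31)) & 1) << 0; }
        else if constexpr (PinD0 > -1) { b |= ((pins_l >> PinD0) & 1) << 0; }
        if constexpr (PinD1 > 31)      { b |= ((pins_h >> ((PinD1 - 32) & 31)) & 1) << 1; }
        else if constexpr (PinD1 > -1) { b |= ((pins_l >> PinD1) & 1) << 1; }
        return b;
    }
};
// parallel_bus<4, 5>::read() and parallel_bus<36, 37>::read() are separate instantiations with
// separate code and statics, so two buses (two displays) can coexist in one firmware image.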
-
Yes, it is. There's a comment to that effect the *first* time I do it in the code. I don't believe the results are cleaner. For starters, you should have seen the nested #ifdefs it took to do this. Second, the preprocessor method of doing this suffered from a serious design deficiency: you couldn't use multiple static "instances" of it to drive multiple buses, which is a problem when you have a device that runs more than one display, or even more than one SPI device (that doesn't apply to the parallel code, but in principle it could). Everything defined in this code is inside a template, meaning the statics are one-per-instantiation and the arguments to the template are the pin assignments. Ergo, for each different collection of pins tied to a bus, you have a different set of statics to work with, meaning you can drive multiple displays. I'd also argue this is cleaner because it's all typed, whereas the preprocessor macros are not. That matters for more than safety. These days it also means better intellisense/autocomplete, which means more productive mucking about with the source.
Real programmers use butterflies
I guess I disagree. With this macro:
#define SetPinA( pin, shift ) \
( ( pin > 31 ) ? ( ( ( pins_h >> ( ( pin - 32 ) & 31 ) ) & 1 ) << shift ) : \
( ( ( pins_l >> pin ) & 1 ) << shift ) )
I think the following code is much, much cleaner.
if( has_data_low_pins && has_data_high_pins )
{
b |= SetPinA( pin_d1, 1 );
b |= SetPinA( pin_d2, 2 );
b |= SetPinA( pin_d3, 3 );
b |= SetPinA( pin_d4, 4 );
b |= SetPinA( pin_d5, 5 );
b |= SetPinA( pin_d6, 6 );
b |= SetPinA( pin_d7, 7 );
}
but that's just me. Your mileage may vary.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
-
I guess I disagree. With this macro I think the following code is much, much cleaner: [SetPinA macro and example quoted in full above] ...but that's just me. Your mileage may vary.
"They have a consciousness, they have a life, they have a soul! Damn you! Let the rabbits wear glasses! Save our brothers! Can I get an amen?"
We're talking about two different things. You're talking about combining macros with this approach. That's not what I mean at all. I mean ALL of this code. All of it. Was preprocessor macros. The if()s were #ifdefs. The reason that I wouldn't do what you're doing is I am a stickler about namespace pollution. As such I use preprocessor macros sparingly, and usually if they end up in my code 9 times out of 10 it's because of "Other People's Code" and the other 1 is usually because a #define/#ifdef is easy to -D when you go to compile the code.
Real programmers use butterflies
-
Maintenance, testing, readability. It's kind of messy in that regard. It drives an 8-bit parallel bus using software. This is part of the code anyway.
Real programmers use butterflies
How would you refactor that code so that it is mockery proof? Is it even possible? Just curious.
-
How would you refactor that code so that it is mockery proof? Is it even possible? Just curious.
Well, I could make more inline functions to wrap that pin shifting, and probably reduce the number of if blocks, but that doesn't mean I'm going to.
Real programmers use butterflies
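A minimal sketch of the kind of inline wrapper being described (the helper name is invented for illustration; when the pin arguments are compile-time constants the branches fold away just as in the unrolled version):
#include <stdint.h>
// Hypothetical helper: extract one data-bus bit from whichever GPIO word holds the pin,
// or contribute nothing for an unused pin (-1).
static inline uint8_t read_pin_bit(uint32_t pins_l, uint32_t pins_h, int pin, int shift) {
    if (pin > 31) { return (uint8_t)(((pins_h >> ((pin - 32) & 31)) & 1) << shift); }
    if (pin > -1) { return (uint8_t)(((pins_l >> pin) & 1) << shift); }
    return 0;
}
// Usage: b = 0; b |= read_pin_bit(pins_l, pins_h, pin_d0, 0); b |= read_pin_bit(pins_l, pins_h, pin_d1, 1); /* ... */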
-
Yes, it is. There's a comment to that effect the *first* time I do it in the code. I don't believe the results are cleaner. For starters, you should have seen the nested #ifdefs it took to do this. Second, the preprocessor method of doing this suffered from a serious design deficiency: you couldn't use multiple static "instances" of it to drive multiple buses, which is a problem when you have a device that runs more than one display, or even more than one SPI device (that doesn't apply to the parallel code, but in principle it could). Everything defined in this code is inside a template, meaning the statics are one-per-instantiation and the arguments to the template are the pin assignments. Ergo, for each different collection of pins tied to a bus, you have a different set of statics to work with, meaning you can drive multiple displays. I'd also argue this is cleaner because it's all typed, whereas the preprocessor macros are not. That matters for more than safety. These days it also means better intellisense/autocomplete, which means more productive mucking about with the source.
Real programmers use butterflies
honey the codewitch wrote:
the *first* time I do it in the code.
I had heard of weird places to do it, but in the code.... :rolleyes: :rolleyes: :laugh: :laugh:
M.D.V. ;) If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about? Help me to understand what I'm saying, and I'll explain it better to you Rating helpful answers is nice, but saying thanks can be even nicer.
-
honey the codewitch wrote:
:laugh: The original code was a bunch of preprocessor macros. This is compile time if statements and lots of const folding. It's deliberately unrolled so it's as inline as possible - this is very timing sensitive. If you think this is bad, you should have seen the original code. [unrolled bus-read code quoted in full above]
I realise you're not asking for suggestions but I would certainly be minded to simplify this, particularly if this is for general purpose use. If you're in a situation where this is for an embedded platform where the hardware configuration is fixed you should gain both better readability and performance if you can spare a little RAM. Something like this:
const int NUM_BUS_BITS = 8; // Dealing with an eight bit bus
const int NUM_GPIO_PINS = 40; // ESP32 has GPIOs 0 to 39
// Last read state of the GPIO bus
static uint32_t sGPIOLow, sGPIOHigh;
// Map data bus to GPIO bits
static struct
{
uint32_t* pSource; // Which GPIO variable to use
unsigned shift; // Shift to apply for this bit
} sDataMap[NUM_BUS_BITS];
// Initialise mapping between input data bit and GPIO pin - call for each data bit prior to readInput()
static void mapInputPin(unsigned dataBit, unsigned gpioPin)
{
if (dataBit < NUM_BUS_BITS && gpioPin < NUM_GPIO_PINS)
{
sDataMap[dataBit].pSource = gpioPin >= 32 ? &sGPIOHigh : &sGPIOLow;
sDataMap[dataBit].shift = gpioPin & 31;
}
}
// Read our eight bit bus value from the appropriate GPIO pins
static uint8_t readInput(void)
{
// Read the hardware bus (the need for 3x reads should be investigated)
sGPIOLow = gpio_input_get();
sGPIOLow = gpio_input_get();
sGPIOLow = gpio_input_get();
sGPIOHigh = gpio_input_get_high();
// Decode from GPIO to eight bit bus
uint8_t dataRead = 0;
dataRead |= (((*sDataMap[0].pSource) >> sDataMap[0].shift) & 1) << 0;
dataRead |= (((*sDataMap[1].pSource) >> sDataMap[1].shift) & 1) << 1;
dataRead |= (((*sDataMap[2].pSource) >> sDataMap[2].shift) & 1) << 2;
dataRead |= (((*sDataMap[3].pSource) >> sDataMap[3].shift) & 1) << 3;
dataRead |= (((*sDataMap[4].pSource) >> sDataMap[4].shift) & 1) << 4;
dataRead |= (((*sDataMap[5].pSource) >> sDataMap[5].shift) & 1) << 5;
dataRead |= (((*sDataMap[6].pSource) >> sDataMap[6].shift) & 1) << 6;
dataRead |= (((*sDataMap[7].pSource) >> sDataMap[7].shift) & 1) << 7;
return dataRead;
}
int main(void)
{
// Initialise mapping for GPIO allocation
// (Use defines or named constants for a real implementation)
mapInputPin(0, 4);
mapInputPin(1, 5);
mapInputPin(2, 6);
mapInputPin(3, 7);
mapInputPin(4, 36);
mapInputPin(5, 37);
mapInputPin(6, 38);
mapInputPin(7, 39);
// Read input
uint8_t readData = readInput
-
I realise you're not asking for suggestions but I would certainly be minded to simplify this, particularly if this is for general purpose use. [mapInputPin/readInput refactoring sketch quoted in full above]
Those conditionals aren't executed at runtime. They are constexpr, as are most of the shifts, and the pin #s themselves. The reason for the multiple reads is timing. There is no hardware clock involved other than the CPU clock itself as the ESP32 does not have a hardware parallel 8-bit interface. It's all bit banging. The other ways I could have done it would have required me to drop to asm to insert the appropriate amount of delay because the arduino and idf framework delays are not granular enough. I need to cycle count. I'm not putting asm in my code, because ESP32s don't even use the same CPU across all iterations of them.
Real programmers use butterflies
-
Those conditionals aren't executed at runtime. They are constexpr, as are most of the shifts, and the pin #s themselves. The reason for the multiple reads is timing. There is no hardware clock involved other than the CPU clock itself as the ESP32 does not have a hardware parallel 8-bit interface. It's all bit banging. The other ways I could have done it would have required me to drop to asm to insert the appropriate amount of delay because the arduino and idf framework delays are not granular enough. I need to cycle count. I'm not putting asm in my code, because ESP32s don't even use the same CPU across all iterations of them.
Real programmers use butterflies
OK - it wasn't immediately clear that the bit mapping was constant at compile time. Personally, I'd still be inclined to do something like this but would want to confirm it was optimising as expected (which I'm sure you've done with the existing code):
// Data pin GPIO allocation - purely an example
const unsigned pin_d0 = 4;
const unsigned pin_d1 = 5;
const unsigned pin_d2 = 6;
const unsigned pin_d3 = 7;
const unsigned pin_d4 = 36;
const unsigned pin_d5 = 37;
const unsigned pin_d6 = 38;
const unsigned pin_d7 = 39;
// Read our eight bit bus value from the appropriate GPIO pins
static uint8_t readInput(void)
{
// Add delay here
// Read the hardware bus
uint32_t GPIOLow = gpio_input_get();
uint32_t GPIOHigh = gpio_input_get_high();
// Decode from GPIO to eight bit bus
uint8_t dataRead = 0;
dataRead |= (((pin_d0 > 31 ? GPIOHigh : GPIOLow) >> (pin_d0 & 31)) & 1) << 0;
dataRead |= (((pin_d1 > 31 ? GPIOHigh : GPIOLow) >> (pin_d1 & 31)) & 1) << 1;
dataRead |= (((pin_d2 > 31 ? GPIOHigh : GPIOLow) >> (pin_d2 & 31)) & 1) << 2;
dataRead |= (((pin_d3 > 31 ? GPIOHigh : GPIOLow) >> (pin_d3 & 31)) & 1) << 3;
dataRead |= (((pin_d4 > 31 ? GPIOHigh : GPIOLow) >> (pin_d4 & 31)) & 1) << 4;
dataRead |= (((pin_d5 > 31 ? GPIOHigh : GPIOLow) >> (pin_d5 & 31)) & 1) << 5;
dataRead |= (((pin_d6 > 31 ? GPIOHigh : GPIOLow) >> (pin_d6 & 31)) & 1) << 6;
dataRead |= (((pin_d7 > 31 ? GPIOHigh : GPIOLow) >> (pin_d7 & 31)) & 1) << 7;
return dataRead;
}
In practice, I would almost certainly encapsulate the individual lines as macros or as an inline function but it's probably clear enough what it's doing. Reads from both ports are kept in there - if a single hardware register read is really killing you then it could be eliminated. The delay aspect would still concern me. This is presumably reading from an external device (LCD?) and there will be a defined timing between initiating a read and the data being available. You will get away with your approach on a specific hardware platform but in future it could easily fall down if the clock speed changes or the external delays change. Dropping to assembler shouldn't be necessary - but your code shouldn't depend on how long it takes to do a dummy read. As an aside, the original code has instances of this:
if(pin_d1>31) {
b |= (((pins_h>>((pin_d1-32)&31))&1)<<1);
} else if(pin_d1>-1) {
b |= (((pins_l>>(pin_d1))&1)<<1);
}
Sub
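As a sketch only (SETTLE_READS is a made-up knob, not part of the actual driver): one way to keep the repeated-read trick while making the settle-count assumption explicit and retunable per target.
#include <stdint.h>
extern "C" uint32_t gpio_input_get(void); // the ROM read used throughout the thread
#ifndef SETTLE_READS
#define SETTLE_READS 3 // hypothetical: how many back-to-back reads the bus needs to settle
#endif
static inline uint32_t gpio_read_low_settled(void) {
    volatile uint32_t v = 0;
    for (int i = 0; i < SETTLE_READS; ++i) {
        v = gpio_input_get(); // same repeated read as the original code, just counted in one place
    }
    return v;
}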
-
We're talking about two different things. You're talking about combining macros with this approach. That's not what I mean at all. I mean ALL of this code. All of it. Was preprocessor macros. The if()s were #ifdefs. The reason that I wouldn't do what you're doing is I am a stickler about namespace pollution. As such I use preprocessor macros sparingly, and usually if they end up in my code 9 times out of 10 it's because of "Other People's Code" and the other 1 is usually because a #define/#ifdef is easy to -D when you go to compile the code.
Real programmers use butterflies
There is a difference between overusing a language feature and refusing to use it ever.
honey the codewitch wrote:
I am a stickler about namespace pollution
Myself I like to consider the maintenance cost of the code that I write. Maintenance costs will always be at least two times the initial cost and often will go to 10 times. And 100 times is not unheard of. So how I write the code now leads to my consideration of what form the code should take. The form suggested by the other response seems much more maintainable. Hard to say what your original code looked like, but presuming that you mean "namespace" in the general sense rather than the C++ keyword, limiting the scope of the macro to the file itself means there would be no namespace pollution. It is not necessary to put everything in headers - it is just a convention. But if you put the macro in a header you could even unit test it.
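Purely to illustrate that last point, a minimal sketch under assumed file names (set_pin_a.h and the test values are invented for the example): the macro sits alone in a tiny header, and a plain host build exercises it with no GPIO hardware or cross toolchain involved.
// set_pin_a.h (hypothetical file) would contain nothing but the macro:
#define SetPinA( pin, shift ) \
( ( (pin) > 31 ) ? ( ( ( (pins_h) >> ( ( (pin) - 32 ) & 31 ) ) & 1 ) << (shift) ) : \
( ( ( (pins_l) >> (pin) ) & 1 ) << (shift) ) )
// A host-side test can then exercise the bit mapping directly:
#include <assert.h>
#include <stdint.h>
int main(void) {
    uint32_t pins_l = 1u << 5;        // pretend GPIO5 reads high
    uint32_t pins_h = 1u << 4;        // pretend GPIO36 reads high
    assert(SetPinA(5, 0) == 1);       // low-bank pin lands in bit 0
    assert(SetPinA(36, 7) == 0x80);   // high-bank pin lands in bit 7
    assert(SetPinA(6, 1) == 0);       // a pin that reads low contributes nothing
    return 0;
}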
-
There is a difference between overusing a language feature and refusing to use it ever.
honey the codewitch wrote:
I am a stickler about namespace pollution
Myself I like to consider the maintenance cost of the code that I write. Maintenance costs will always be at least two times the initial cost and often will go to 10 times. And 100 times is not unheard of. So how I write the code now leads to my consideration of what form the code should take. The form suggested by the other response seems much more maintainable. Hard to say what your original code looked like, but presuming that you mean "namespace" in the general sense rather than the C++ keyword, limiting the scope of the macro to the file itself means there would be no namespace pollution. It is not necessary to put everything in headers - it is just a convention. But if you put the macro in a header you could even unit test it.
jschell wrote:
The form suggested by the other response seems much more maintainable.
After perusing the thread, because it has been a minute, the only proposal that I am not already doing, or wouldn't make sense to do since it moves compile time actions to run time, was to leave the initial preprocessor macros, but those original preprocessor macros were never presented, so I'm confused as to how you could say it was more maintainable having never seen it. Furthermore, it is not feasible to limit said preprocessor macros to a single file as doing so would increase maintenance since several processors with their own instructions are presented, each in its own header. To combine them would create a monolithic header that included every single build target. No. Finally, unit testing this is not feasible, as I don't have the money to set up the hardware array necessary to unit test this across all build targets, nor the toolchain infrastructure necessary to make that doable, and would probably have to move away from using PlatformIO to do so anyway, which would make the effort required to do what I am doing snowball once I'm stuck with CMake builds using LLVM with gcc in funky ways. And even if I had the money, the time, and the software infrastructure to make that work, I don't have the physical space to set it all up.
Real programmers use butterflies
-
jschell wrote:
The form suggested by the other response seems much more maintainable.
After perusing the thread, because it has been a minute, the only proposal that I am not already doing, or wouldn't make sense to do since it moves compile time actions to run time, was to leave the initial preprocessor macros, but those original preprocessor macros were never presented, so I'm confused as to how you could say it was more maintainable having never seen it. Furthermore, it is not feasible to limit said preprocessor macros to a single file as doing so would increase maintenance since several processors with their own instructions are presented, each in its own header. To combine them would create a monolithic header that included every single build target. No. Finally, unit testing this is not feasible, as I don't have the money to set up the hardware array necessary to unit test this across all build targets, nor the toolchain infrastructure necessary to make that doable, and would probably have to move away from using PlatformIO to do so anyway, which would make the effort required to do what I am doing snowball once I'm stuck with CMake builds using LLVM with gcc in funky ways. And even if I had the money, the time, and the software infrastructure to make that work, I don't have the physical space to set it all up.
Real programmers use butterflies
honey the codewitch wrote:
so I'm confused as to how you could say it was more maintainable having never seen it.
A specific code sample was presented in a comment. Then you said: "The reason that I wouldn't do what you're doing is I am a stickler about namespace pollution." Your statement strongly suggested that you were responding to the code sample. I did of course see the code sample in the posted response, and then I responded to your comment about that code sample.
honey the codewitch wrote:
Furthermore, it is not feasible to limit said preprocessor macros to a single file as doing so would increase maintenance since several processors with their own instructions are presented, each in its own header. To combine them would create a monolithic header that included every single build target. No.
No idea what you are talking about. If you have one and exactly one C or C++ file (a .c or .cpp file) then you can define macros at the top of that file. The scope of those macros, in both languages, is limited exclusively to that file. That is how those languages work. If you have several files that use the same code then you can put the macro(s) and only those macros in an h file and then use that h file only in those code files. That means specifically that you do not put it in some other h file for convenience. Now if you have the code snippet (not the macros) in an h file, then I suggest moving it out of the h file because it should not be in there in the first place.
honey the codewitch wrote:
Finally, unit testing this is not feasible, as I don't have the money to set up the hardware array necessary to unit test this across all build targets
Huh? A "unit test" in the context my statement would never be tested in that fashion. You are testing the macro not the system.
-
honey the codewitch wrote:
so I'm confused as to how you could say it was more maintainable having never seen it.
A specific code sample was presented in a comment. Then you said: "The reason that I wouldn't do what you're doing is I am a stickler about namespace pollution." Your statement strongly suggested that you were responding to the code sample. I did of course see the code sample in the posted response, and then I responded to your comment about that code sample.
honey the codewitch wrote:
Furthermore, it is not feasible to limit said preprocessor macros to a single file as doing so would increase maintenance since several processors with their own instructions are presented, each in its own header. To combine them would create a monolithic header that included every single build target. No.
No idea what you are talking about. If you have one and exactly one C or C++ file (a .c or .cpp file) then you can define macros at the top of that file. The scope of those macros, in both languages, is limited exclusively to that file. That is how those languages work. If you have several files that use the same code then you can put the macro(s) and only those macros in an h file and then use that h file only in those code files. That means specifically that you do not put it in some other h file for convenience. Now if you have the code snippet (not the macros) in an h file, then I suggest moving it out of the h file because it should not be in there in the first place.
honey the codewitch wrote:
Finally, unit testing this is not feasible, as I don't have the money to set up the hardware array necessary to unit test this across all build targets
Huh? A "unit test" in the context my statement would never be tested in that fashion. You are testing the macro not the system.
Presently the macros are in about 6 h files. I did not write it that way, and I'm not rewriting six files of macros. I am not unit testing all of this garbage because it's impossible. It's all timing sensitive and each one is different for each piece of hardware. No.
Real programmers use butterflies
-
Presently the macros are in about 6 h files. I did not write it that way, and I'm not rewriting six files of macros. I am not unit testing all of this garbage because it's impossible. It's all timing sensitive and each one is different for each piece of hardware. No.
Real programmers use butterflies
honey the codewitch wrote:
I did not write it that way, and I'm not rewriting six files of macros.
As I already said in my previous post...Your statement strongly suggested that you were responding to the code sample. To be very clear - that would be the code sample that was posted in this thread and not your code. That code sample does not require "six files of macros".