Order of element processing

Posted in The Lounge, Code Project
    kalberts wrote (#1):

    The story listed in Daily News yesterday, "Researchers find bug in Python script may have affected hundreds of studies", raised discussions in the coffee corner: how could numerical results depend on the order in which files were processed? It is not immediately obvious.

    My guess: the file names reflected some significant ordering of, say, observations that gradually focused on some target, similar to a mathematical series expansion. When summing a long series, you start from the "small" end, not the "big" end, or you might lose a large number of small values that are insignificant one by one, but whose sum over thousands of terms can be quite significant. Adding elements in random order can lose small values.

    When traversing an array with a foreach, you expect to get the elements in order of increasing index. Now assume a new implementation comes along that processes all array elements simultaneously on a highly parallel machine (assume that the handling of each element is independent of the others, so there are no locking issues). Partial results are returned in arbitrary order. This would be similar to processing files in arbitrary order.

    A few (5-10?) years ago, I read a description of a new language that makes it explicit that with a foreach, or another set/array operation, the runtime system may process all elements in parallel if several processing units are available. (The compiler has to verify that there are no access conflicts.) You can NOT rely on a foreach being sequential, or on the same modification applied to all elements of an array being done row-wise or column-wise.

    But which language was this? All I remember is that it came from some large player, such as Google. In today's description of Go on Wikipedia, I do not see this mentioned. Did I read about a different language? Or did I read a paper proposing what became Go, with this part later dropped from the language definition? I found no programming language description in Wikipedia that matched my memory.
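The summation-order effect guessed at above fits in a few lines of Python. This is a minimal sketch, not the script from the story: the values (one 1e16 and a thousand 1.0s) are arbitrary illustrative choices, and `naive_sum` is a hypothetical helper. An explicit loop is used because since Python 3.12 the built-in `sum()` uses compensated summation for floats, which would hide the effect.

```python
import math

def naive_sum(values):
    """Left-to-right accumulation with plain float additions (no compensation)."""
    total = 0.0
    for v in values:
        total += v
    return total

big = 1e16            # near 1e16, adjacent doubles are 2.0 apart
small = [1.0] * 1000  # each 1.0 is at most half that spacing

big_first = naive_sum([big] + small)    # each 1.0 is rounded away on its own
small_first = naive_sum(small + [big])  # the 1.0s first add up to 1000.0

print(big_first)                 # 1e+16
print(small_first)               # 1.0000000000001e+16
print(math.fsum([big] + small))  # correctly rounded sum, whatever the order
```

`math.fsum` is one way to make the result independent of ordering; sorting by magnitude before a naive sum, as described above, is another.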

      PIEBALDconsult wrote (#2, in reply to #1):

      You're better off making that a comment on the article in the news forum itself.

        Marc Clifton wrote (#3, in reply to #1):

        Parallel.ForEach ;) Python doesn't do parallel unless you explicitly make it, and Parallel.ForEach is a C# example, not a Python one. I guess it just demonstrates yet again that we make thousands of assumptions about how things should work, and the glob function is no different. Some assumptions we realize and take into consideration; others slip through the cracks to be discovered years later. Hopefully this one didn't kill anyone. :~
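To illustrate that in Python parallelism is opt-in: a minimal sketch using the standard-library `concurrent.futures` module. `process` and its random delay are hypothetical stand-ins for real work. Threads are used for brevity; for CPU-bound work `ProcessPoolExecutor` is the usual choice because of the GIL.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def process(item):
    # Hypothetical per-item work; the random sleep makes completion order vary.
    time.sleep(random.uniform(0, 0.01))
    return item * item

items = list(range(10))

# Parallelism must be requested explicitly. Executor.map still yields results
# in input order, whatever order the individual tasks finish in.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, items))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```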

        Latest Articles:
        Client-Side TypeScript without ASP.NET, Angular, etc.

          kalberts wrote (#4, in reply to #2):

          I considered my question to have a wider scope, and I expected to reach a broader audience than those who read (and reply to) comments on a referenced article. Another detail is, of course, that my personal privacy policy says I should be very restrictive about creating new login accounts where my individual statements may be tracked and correlated with statements on other web sites (be it through cross-site cookies or otherwise). I chose not to create an account on Ars Technica just to post my request there.

            kalberts wrote (#5, in reply to #1):

            Back in an old file archive on my home computer, I found the answer: "The Fortress Language Specification".

            2.8 For Loops Are Parallel by Default

            Here is an example of a simple for loop in Fortress:

            for i ← 1 : 10
            print(i “ ”)
            end

            This for loop iterates over all elements i between 1 and 10 and prints the value of i. Expressions such as 1 : 10 are referred to as range expressions. They can be used in any context where we wish to denote all the integers between a given pair of integers. A significant difference between Fortress and most other programming languages is that for loops are parallel by default. Thus, printing in the various iterations of this loop can occur in an arbitrary order, such as: 5 4 6 3 7 2 9 10 1 8
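The Fortress behaviour in the quoted spec can be roughly mimicked in Python, where parallelism is opt-in rather than the default. A minimal sketch, assuming a thread pool stands in for Fortress's implicit parallel loop; `body` is a hypothetical loop body. `as_completed` hands back results in completion order, so anything order-sensitive (printing, floating-point sums) has to restore an order explicitly.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def body(i):
    # Hypothetical loop body; the sleep makes completion order unpredictable.
    time.sleep(random.uniform(0, 0.01))
    return i

with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [pool.submit(body, i) for i in range(1, 11)]
    # Like the Fortress loop, iterations report back in whatever order they finish.
    finish_order = [f.result() for f in as_completed(futures)]

print(finish_order)          # order varies from run to run
print(sorted(finish_order))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```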

            According to Wikipedia: "In July 2012, Steele announced that active development on Fortress would cease after a brief winding-down period". The Wikipedia articles on Haskell's competitors for DARPA funding, IBM's X10 and Cray's Chapel, are so brief that it takes more searching to learn whether they have any similar implicit parallelism of for/foreach and array operations. The reasons for terminating Haskell development may have been sound. Yet, when flipping through specifications of now-dead languages, I frequently say to myself "Hey, that is a good idea! Why isn't that provided in our modern languages?" I am not sure that parallel for loops fall into that category, but I see, e.g.:

            atomic do
            x += 1
            y += 1
            end

            - of course we can do similar things in many other languages, but often with a lot more fuss and syntactic molasses, when all we need is the simplicity of this.

            Haskell also tried to revive dimension arithmetic, which I haven't seen since Algol 68: if you multiply a value of dimension km/h by a value of dimension h, the result is a value of dimension km. Assigning it to a variable of dimension kg would lead to a compile-time error.

            Generally speaking: Software guys could learn a whole lot, and broaden their professional scope, from spending some time reading specifications and standards that never made it into the mainstream.
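For comparison, the closest everyday Python counterpart to the `atomic do ... end` block is an explicit lock, which shows some of the extra ceremony. A sketch under the assumption that every access to `x` and `y` goes through the same lock (`xy_lock` is a hypothetical name); Fortress's atomic blocks are a language-level construct, which a lock only approximates.

```python
import threading

x = 0
y = 0
xy_lock = threading.Lock()  # hypothetical lock guarding both x and y

def increment_both():
    global x, y
    # Roughly the effect of Fortress's `atomic do x += 1; y += 1 end`:
    # readers that also take xy_lock never see x updated without y.
    with xy_lock:
        x += 1
        y += 1

threads = [threading.Thread(target=increment_both) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(x, y)  # 100 100
```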

              ZurdoDev wrote (#6, in reply to #2):

              Finally an interesting topic in the Lounge and you tell them to take a hike. :sigh: :thumbsdown:

              Social Media - A platform that makes it easier for the crazies to find each other. Everyone is born right handed. Only the strongest overcome it. Fight for left-handed rights and hand equality.

                Dar Brett 0 wrote (#7, in reply to #5):

                Member 7989122 wrote:

                Generally speaking: Software guys could learn a whole lot, and broaden their professional scope, from spending some time reading specifications and standards ~~that never made it into the mainstream~~ applicable to the technologies they work with day to day.

                FTFY

                  kalberts wrote (#8, in reply to #7):

                  Strongly disagree. Or rather: the essential thing for broadening your scope and learning something new (or maybe old) is exactly to lift your eyes from what you work with day to day. You can of course learn all the nitty-gritty details of the new interpreter version, or how to use every single option in the compiler, but that certainly isn't broadening your scope. You should spend some time searching for something new and different! That is broadening your scope. There is nothing wrong with studying the specifications and standards applicable to what you work with today, but that is something different.

                    giulicard wrote (#9, in reply to #1):

                    C++17 has the std::execution::par execution policy for many of the standard algorithms. For example, the std::for_each algorithm can be executed in parallel:

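                    #include <algorithm>
                    #include <array>
                    #include <execution>

                    int main() {
                        std::array jobs { 1000, 900, 1030, 800, 100 };
                        std::for_each(
                            std::execution::par,
                            std::begin( jobs ), std::end( jobs ),
                            []( auto Val ) {
                                // do something with Val; with the par policy, invocations
                                // may run on several threads, in unspecified order
                            }
                        );
                    }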

                    Cheers

                      Stuart Dootson wrote (#10, in reply to #5):

                      Member 7989122 wrote:

                      Haskell

                      Presuming you meant Fortress rather than Haskell there & onwards - Haskell's still going strong!

                      Member 7989122 wrote:

                      dimension arithmetic

                      [F# Units of Measure](https://fsharpforfunandprofit.com/posts/units-of-measure/) and various C++ libraries ([one here](https://github.com/nholthaus/units), and [the Boost one](https://www.boost.org/doc/libs/1_71_0/libs/mpl/doc/tutorial/dimensional-analysis.html)) provide dimensional analysis (which I can definitely see the benefit of, having done a fair amount of mission-critical PID controller implementation), although strong numeric typing/newtypes (as provided by Ada and Haskell (and Go and Rust and many others), respectively) can help with keeping units straight.
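The idea behind these libraries can be sketched in Python, though only as a run-time check, since plain Python has no compile-time unit checking (F# and the C++ libraries above do it at compile time). A minimal sketch: `Quantity`, `base_unit` and the (unit, exponent) tuple representation are hypothetical design choices, not any library's API.

```python
from dataclasses import dataclass

def _combine(a, b, sign):
    # Units are sorted tuples of (base_unit, exponent), e.g. km/h -> (("h", -1), ("km", 1)).
    exps = dict(a)
    for name, e in b:
        exps[name] = exps.get(name, 0) + sign * e
    return tuple(sorted((n, e) for n, e in exps.items() if e != 0))

@dataclass(frozen=True)
class Quantity:
    value: float
    unit: tuple

    def __mul__(self, other):
        return Quantity(self.value * other.value, _combine(self.unit, other.unit, +1))

    def __truediv__(self, other):
        return Quantity(self.value / other.value, _combine(self.unit, other.unit, -1))

    def __add__(self, other):
        # Mixing dimensions is rejected, but only when this line actually runs.
        if self.unit != other.unit:
            raise TypeError(f"cannot add {self.unit} to {other.unit}")
        return Quantity(self.value + other.value, self.unit)

def base_unit(name):
    return ((name, 1),)

speed = Quantity(90.0, base_unit("km")) / Quantity(1.0, base_unit("h"))  # 90 km/h
distance = speed * Quantity(2.0, base_unit("h"))                          # km/h * h
print(distance)  # Quantity(value=180.0, unit=(('km', 1),))

try:
    distance + Quantity(5.0, base_unit("kg"))
except TypeError as err:
    print("rejected:", err)
```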

                      Member 7989122 wrote:

                      Generally speaking: Software guys could learn a whole lot, and broaden their professional scope, from spending some time reading specifications and standards that never made it into the mainstream.

                      I agree with you - although a good number of the things I like from ML, Haskell and the like (none of which are *dead*) are starting to emerge in C# (for example - [pattern matching](https://docs.microsoft.com/en-us/dotnet/csharp/pattern-matching), [non-nullable references](https://docs.microsoft.com/en-us/dotnet/csharp/nullable-references), [first class tuples](https://docs.microsoft.com/en-us/dotnet/csharp/tuples)). Still no [sum types](https://fsharpforfunandprofit.com/posts/discriminated-unions/)... Similarly, languages like [Occam](https://en.wikipedia.org/wiki/Occam_(programming_language)) and [Erlang](https://en.wikipedia.org/wiki/Erlang_(programming_language)) have a lot to say about concurrency, distributed systems & the like.

                      Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p

                        Lost User wrote (#11, in reply to #1):

                        It's about testing. After writing a few "enumerators" there is no mystery. (I found an issue in .NET Framework 4.7.2, identified and acknowledged by MSFT, that negated the whole "is 100% compatible" claim with .NET Core 3.0, after all the "forum experts" (not here) said blah, blah because they assumed it must be blah, blah.)

                        It was only in wine that he laid down no limit for himself, but he did not allow himself to be confused by it. ― Confucian Analects: Rules of Confucius about his food

                          obermd wrote (#12, in reply to #1):

                          The problem reported by the Daily News applies to all computer languages. While most computer languages honor simple algebraic operator precedence (1+2*3+4 becomes 1 + (2*3) + 4), no computer language can honor order of operations at a high level when it is fed those operations discretely and asked for a result before receiving the next set of data and operations. I've seen this type of issue occur in Excel, SQL Server, Scheme (Lisp), Prolog, Java, C++ and C#. In the case of this Python script, it was asked to process the contents of each file as a unit and generate an intermediate result, which was then passed to the processing of the next file. The flaw in the script was assuming that all operating systems sort file enumerations in the same manner. Lisp, by the way, is the only language I can think of that doesn't honor algebraic ordering, but it doesn't need to, because its syntax forces the programmer to specify the desired order of operations.
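A minimal Python sketch of the defensive fix for that assumption: sort the file listing explicitly, because `os.listdir` (like `glob.glob`) makes no ordering promise. `process_in_stable_order` and the string-concatenation step are hypothetical; the point is only that the processing order no longer depends on the operating system or file system.

```python
import os
import tempfile

def process_in_stable_order(directory, step, initial):
    """Fold `step` over the files in `directory`, in sorted-name order."""
    result = initial
    # os.listdir returns names in arbitrary order, so sort before any
    # order-sensitive processing.
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name)) as f:
            result = step(result, f.read())
    return result

# Usage with throwaway files; the step function is deliberately order-sensitive.
with tempfile.TemporaryDirectory() as d:
    for name, text in [("b.txt", "2"), ("a.txt", "1"), ("c.txt", "3")]:
        with open(os.path.join(d, name), "w") as f:
            f.write(text)
    combined = process_in_stable_order(d, lambda acc, s: acc + s, "")

print(combined)  # 123
```

Plain `sorted()` compares names character by character ("file10" sorts before "file2"), so files whose intended order is numeric may need a custom sort key.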

                            kalberts wrote (#13, in reply to #12):

                            obermd wrote:

                            Lisp, by the way, is the only language I can think of that doesn't honor algebraic ordering,

                            Count APL in as well: strict right-to-left evaluation, no operator precedence.
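To make the APL rule concrete, here is a toy evaluator (hypothetical, and far from real APL) that applies strict right-to-left evaluation with no precedence to obermd's `1+2*3+4` example. It uses ASCII `*` for multiplication; in APL itself multiplication is `×` and `*` means power.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_right_to_left(tokens):
    """Evaluate [number, op, number, ...] strictly right to left, all operators
    at equal precedence, no parentheses handled."""
    result = tokens[-1]
    # Walk back over (number, operator) pairs: left op (everything to its right).
    for i in range(len(tokens) - 2, 0, -2):
        op, left = tokens[i], tokens[i - 1]
        result = OPS[op](left, result)
    return result

expr = [1, "+", 2, "*", 3, "+", 4]
print(eval_right_to_left(expr))  # 15, i.e. 1 + (2 * (3 + 4))
print(1 + 2 * 3 + 4)             # 11 with conventional precedence
```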

                              Mark Smeltzer wrote (#14, in reply to #11):

                              ... Which was? Links are always appreciated 😉
