VB haters, look away

The Lounge · tags: csharp, c++, ruby, learning · 138 Posts · 40 Posters
  • K Kenworth71

    All correct, but are you disagreeing with 'C# is compiled to a type of bytecode'?

kalberts · #119

    My immediate reaction: Yes, I would disagree. Bytecodes are ready for execution, while the .NET output from a C# compiler is not. You could say that I am using a "narrow" definition of the term, and the term could have a more general meaning. It could mean any code representation that is built up of bytes - like source code :-). We could even generalize the "byte" concept: the old Univac mainframes could work with 9-bit bytes (4 to the word) or 6-bit bytes (6 to the word), while the DEC-10 and DEC-20 had 7-bit bytes (5 to the word and one spare bit). But that is not the common "compiler guy" interpretation of "bytecode".

    The linearized DAG is not directly executable the way a bytecode is. Obviously, you could, at run time, do a just-in-time compilation into a bytecode for an interpreter, rather than compiling into native binary code. But as far as I know, there are no virtual machines directly interpreting .NET assemblies with no pre-execution processing step.

    In my student days, a group of us attempted to build a direct interpreter for the intermediate language of another front-end compiler (for the CHILL programming language) with a similar architecture. We soon realized that the data structures required to maintain the current execution state would be immensely large and complex; building the interpreter would far exceed the work of making a complete backend compiler. You couldn't do without a unified symbol table. You couldn't do without a label-to-location mapping. You couldn't do without a lot of state information for various objects. You couldn't do without ... So we never completed the project. (It was a hobby project, not a course assignment.)

  • In reply to kalberts (#119)

      Kenworth71 · #120

      Nah, op-codes are ready for execution, bytecodes are not. I like this definition: Bytecode is a form of hardware-independent machine language that is executed by an interpreter. It can also be compiled into machine code for the target platform for better performance.

    • In reply to Kenworth71 (#120)

        kalberts · #121

        If you are right, then the terminology is changing. In my book, the op-code is the field in the binary instruction code that indicates what is to be done: add, shift, jump, ... Usually, the rest of the binary instruction code is operand specifications, such as memory addresses or constants. In more high-level contexts I have seen "op-code" used for a field in a structure, e.g. a protocol. Again, the op-code tells what is to be done (at the level of "withdraw from bank account", "turn on", etc.); the other fields tell what it is to be done with. You suggest a new interpretation, where an op-code is both the 'what to do' and the 'what to do it with'. Maybe that is an upcoming understanding, but it is certainly not the traditional one.

        JVM bytecodes are certainly ready for execution, once you find a machine for them. It is easier to build a virtual machine, a simulator, than to build a silicon one. So that is what we do.

        You can build a translator from MC68000 instructions to 386 instructions. Or from IBM 360 instructions to AMD64 instructions. Or from JVM instructions to VAX instructions. Suggesting that the intention of compiling to MC68K instructions was to serve as an intermediate step to 386 code would be crazy - that was never the intention of the MC68K instruction set. Similarly, the intention of Java bytecodes was not to be translated into another instruction set.

        If you first compile to one instruction set (including a bytecode, such as Java or Pascal P4 bytecode), and then translate to another instruction set, there is generally a loss of information, so the final code is of poorer quality than if it had been compiled directly from the DAG, which usually contains a lot of information that is used and then discarded in the backend. Some of it may be recovered by extensive code analysis, but expect to lose a significant part, in the sense that you will not utilize the target CPU fully - especially if the first/bytecode architecture has a different register philosophy (general? special?), interrupt system, or I/O mechanisms. So, if at all possible, generate the target code from the intermediate level, not from some fully compiled instruction set.
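The "field" view of an op-code described above can be sketched in C. The 32-bit layout, the 6-bit op-code width, and all the names here are invented for illustration, not taken from any real instruction set:

```c
#include <stdint.h>

/* Hypothetical 32-bit instruction word: the top 6 bits say WHAT to do
 * (the op-code), the remaining 26 bits say WITH WHAT (the operand). */
enum { OP_ADD = 0x01, OP_SHIFT = 0x02, OP_JUMP = 0x03 };

static uint32_t encode(uint32_t op, uint32_t operand)
{
    return (op << 26) | (operand & 0x03FFFFFFu);
}

static uint32_t opcode_of(uint32_t insn)  { return insn >> 26; }          /* bits 31..26 */
static uint32_t operand_of(uint32_t insn) { return insn & 0x03FFFFFFu; }  /* bits 25..0  */
```

Decoding is just splitting the word back into its two fields: `opcode_of()` answers "what to do", `operand_of()` answers "with what".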

        • B BillWoodruff

          One of the "beauties" of the stock C# switch statement was that using integer case qualifiers it compiled to a mean and lean jump-table in CIL. I assume it still does; have yet to see any performance comparisons of use of the new features and other techniques for switch-a-roo. I like the new features.

          «Differences between Big-Endians, who broke eggs at the larger end, and Little-Endians gave rise to six rebellions: one Emperor lost his life, another his crown. The Lilliputian religion says an egg should be broken on the convenient end, which is now interpreted by the Lilliputians as the smaller end. Big-Endians gained favor in Blefuscu.» J. Swift, 'Gulliver's Travels,' 1726CE

          kalberts · #122

          Note that this is not set by the language definition at all, but 100% decided by the compiler writer. I have seen compilers (not C#) that generate completely different code depending on the optimization switches: if you optimize for speed and the case alternatives are sparse, you may end up with a huge jump table. If you optimize for code size and the number of specific cases is small, it might compile like an if ... elseif ... elseif ... When you switch on strings, hash methods are sometimes used to reduce the number of string comparisons that must be done. Compilers may try out several different methods of compiling a switch statement and assign scores to each alternative based on the compiler options, such as optimization level and target architecture. The one with the highest score wins and is passed on to later compiler stages. This sure slows down compilation, but it is generally worth it.

          A small sidetrack, but closely related: contrary to common belief, modern standards for sound and image processing, such as MP3 and JPEG, do not specify how the compression is to be done, only how a compressed file is decompressed. A good compressor may try out a handful of different ways to compress the original (sometimes varying wildly in compressed encoding), decompress according to the standard, and do a diff with the source material. The alternative with the smallest diff result is selected. (Or the size of the diff result gives that alternative fewer or more points on the scoreboard, together with e.g. compressed size.) The compress-decompress-diff-evaluate cycle sure takes CPU power, but today we have plenty of that, at least for sound and still images.
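For illustration, this is the shape of switch that typically becomes a jump table (a C sketch, not C#, and not tied to any particular compiler): the case values 0..6 are contiguous, so a compiler can index an array of code addresses, whereas sparse or few cases may be lowered to comparison chains exactly as described above. The function and its strings are made up for the example:

```c
#include <string.h>

/* Dense, contiguous integer cases: the classic jump-table candidate. */
static const char *day_kind(int day)
{
    switch (day) {
    case 0:  return "sunday";
    case 1: case 2: case 3: case 4: case 5:
             return "weekday";
    case 6:  return "saturday";
    default: return "invalid";   /* out-of-range index */
    }
}
```

With optimization switches favoring size, or with case values like 1, 500, 90000, the same source may instead compile to an if/else chain or a binary search - the language definition does not care either way.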

          • J jschell

            Brent Jenkins wrote:

            It looks like it was more of a legal thing with Sun

             One of the points of Java was that you were supposed to be able to run it on any VM on any supported OS. Microsoft made specific changes to their Java that made it impossible to run on different VMs or even compile. CNN - Sun alleges new Java violations by Microsoft - July 13, 1998

             Lost User · #123

            Hey, but we've got .NET Core now :-D :thumbsup:

            Now is it bad enough that you let somebody else kick your butts without you trying to do it to each other? Now if we're all talking about the same man, and I think we are... it appears he's got a rather growing collection of our bikes.

            • Z ZevSpitz

              Doesn't C#'s `foreach` come from VB's `For Each`? AFAIK C++ doesn't have any equivalent.

               kalberts · #124

               We still miss some loop constructs that are offered by other languages. The one I miss the most is for ListElementPointer in ListHead:NextField ... to traverse a singly linked list, linked through NextField. Then I miss the value set: for CodeValue in 1, 3, 10..20, 32 do ... (a total of 14 iterations). And then, another favorite: for ever do ... In C, I sometimes "simulate" this by #define ever (;;).

               Then I miss the alternative loop exits, where you specify a different loop tail depending on whether the value set was exhausted or the loop was terminated prematurely because some condition was fulfilled: for ... do ... code ... while ... maybe more code ... exitwhile ...code handling premature loop termination... exitfor ...code handling value-set-exhausted termination... An important aspect of the exitfor/exitwhile is that the code is inside the scope of the for statement, so that e.g. variables declared within the for are available to the tail processing. If you simulate this by setting some termination flag, breaking, and testing after the loop - if TerminationCause = exhaustion ... else ... - then what went on in the loop is essentially lost, so this is certainly not a good replacement (and it takes a lot more typing).

               Finally, for all sorts of nested constructs: I strongly favor a label identifying a block, rather than a point. I want to be able to do an exit InnerLoop; or exit OuterLoop; or even exit MyProcedure; without the need for setting all sorts of flags that must be tested after InnerLoop and after OuterLoop, and from there take a new exit to an outer level. You could say that this is little more than syntactic sugar. Sure, but syntactic sugar makes programming sweet.
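For what it's worth, a couple of the wished-for constructs can be approximated in C with macros. The names `for_each_node`, `struct node`, and `demo_sum` below are made up for this sketch, not from any real library:

```c
#include <stddef.h>

struct node { int value; struct node *next; };

/* "for p in head:next" - traverse a singly linked list through 'next'. */
#define for_each_node(p, head) \
    for (struct node *p = (head); p != NULL; p = p->next)

/* The "for ever do ..." simulation mentioned above: usage is `for ever { ... }`. */
#define ever (;;)

/* Demo: sum the list 1 -> 2 -> 3 using the traversal macro. */
static int demo_sum(void)
{
    struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
    int sum = 0;
    for_each_node(p, &a)
        sum += p->value;
    return sum;
}
```

The value-set iteration and the labeled-block exits have no comparable macro trick in C, which is rather the post's point.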

              • J jschell

                CodeWraith wrote:

                When they did not want the licenses for J anymore

                Errr...after a long trial MS agreed not to do java anymore. If they didn't want it then they wouldn't have fought so long to keep it.

                CodeWraith wrote:

                that went further from the beginning (no primitive data types, a common CLR across

                Can't imagine that that wouldn't be required to avoid more legal trouble. For example, look at Java on Android and the Oracle suit about that. The technology itself just encapsulates the business need, but the business need required that it be different.

                Peter Wone · #125

                > If they didn't want it then they wouldn't have fought so long to keep it.

                Not a reasonable conclusion. On more than one occasion Microsoft has engaged in litigation to slow down the opposition, deplete their resources, and demoralise them. Microsoft did this to Sybase while Microsoft Access was being prepared, and was caught very much on the back foot when it unexpectedly _won_ the rights to the source code for SQL Server.

                PeterW - If you can spell and use correct grammar for your compiler, what makes you think I will tolerate less?

                • L Lost User

                  [Visual J++ - Wikipedia](https://en.wikipedia.org/wiki/Visual_J%2B%2B)

                  Quote:

                  Microsoft later developed the C# ("C Sharp") language as the primary language for the .NET platform, which was in many ways influenced by Java; subsequently the .NET Framework shares many ideas in common with Java. Much like Java, C# is compiled to a type of bytecode (called CIL), and runs on top of a virtual machine called the Common Language Runtime in .NET. Visual Studio 2005 was the last release to include J#.


                  Member 10815573 · #126

                  That reference is saying the similarity is between the .NET Framework and the Java platform. Don't confuse the C# language (or any language) with the .NET Framework, which facilitates "managed code" and JIT compilation, similar (and in response) to the Java platform - i.e. CIL (MSIL) is synonymous with Java bytecode, the CLR with the JRE/JVM, etc.

                  • C CodeWraith

                    DIM X(5)

                    n = 5, so our array now should have six elements, indexed 0 - 5. A strange way of saying that you want six eggs, but OK. At least we use the same value to dimension the array and as the highest valid index. In the end it is just a question of specifications and conventions. However, the original problem was at the beginning of the array. Of course we could access the array with 1 - 6, but that would be even more confusing. So, where do you think the habit of dimensioning arrays one element too large, accessing them with 1 to n and wasting element 0, came into play?


                    PIEBALDconsult · #127

                    Designers were ahead of their time maybe. And early practitioners weren't smart enough to grok the concept. It reminds me of strings in Pascal -- you get a maximum of 255 characters preceded by how many characters are currently in the string. 'Tis possible that BASIC's designers had expected developers to use the zeroth element to hold the count of how many elements are in use.
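The Pascal layout mentioned here is easy to sketch in C: byte 0 holds the current length and the characters follow - the same role the post imagines for BASIC's element 0. The names `pstring`, `ps_set`, and `ps_len` are illustrative, not any real API:

```c
#include <string.h>

/* Classic Pascal "short string": s[0] is the length, s[1..255] the chars. */
typedef unsigned char pstring[256];

static void ps_set(pstring s, const char *c)
{
    size_t n = strlen(c);
    if (n > 255) n = 255;          /* the 255-character ceiling the post mentions */
    s[0] = (unsigned char)n;
    memcpy(s + 1, c, n);
}

static size_t ps_len(const pstring s) { return s[0]; }

/* Demo helper: length of a C string after a round trip through a pstring. */
static size_t demo_len(const char *c)
{
    pstring p;
    ps_set(p, c);
    return ps_len(p);
}
```

The length-prefix buys O(1) length queries and no terminator scanning, at the price of the hard 255-character cap.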

                    • In reply to BillWoodruff

                      Clifford Nelson · #128

                      I think that is the reason it took so long to upgrade the switch statement. I would expect that you are right and it still does that. Of course you could also switch on a string, which would be a little more complex, and I am sure there is a different implementation for an int and a string. Switch statements are a bad smell, so I look to see if there is a better implementation than a switch. :-)

                      • K kalberts

                        Why didn't they make it multivariable as well, like in CHILL? switch (a, b, c) { (1, 'a', 4.3): ...code; (2, 'A', 0.0): ...code; (2, ELSE, pi): ...code; (3, 'C', *): ...code; ELSE: ...code; } A * for a value means "don't care". ELSE for a value means "any value not used in any other switch alternative for this variable". Yes, you CAN make it very messy if you exploit the flexibility for all it is worth. But disciplined use can be far more readable than a 20-way switch, each alternative with a 7-15-way (varying among the 20) switch, each of those again with a 2-15-way (varying) switch. Then I'd rather prefer the single, clean, 3-dimensional switch.

                        Clifford Nelson · #129

                        Did not have exposure to that. Interesting. Not sure how often it would be useful, but if it helps....

                        • In reply to Member 10815573 (#126)

                          Lost User · #130

                          Member 10815573 wrote:

                          Don't confuse the C# Language (or any language) with the .NET Framework which facilitates "managed code" and JIT compilation, similar (and in response) to the 'Java platform' i.e. CIL(MSIL) is synonymous with Java bytecode, CLR synonymous with JRE/JVM etc.

                          I wasn't confusing anything. C# and Java were very similar languages at the point when C# was launched, and with good reason - Anders Hejlsberg (who is the lead architect on the team developing the C# language) had previously developed Microsoft's J++ language (Microsoft's discontinued implementation of Java). C# really took many of the good bits of Java, while the .NET Framework mirrored the Java platform. [C Sharp (programming language) - Wikipedia](https://en.wikipedia.org/wiki/C_Sharp_(programming_language))

                          Quote:

                          James Gosling, who created the Java programming language in 1994, and Bill Joy, a co-founder of Sun Microsystems, the originator of Java, called C# an "imitation" of Java; Gosling further said that "[C# is] sort of Java with reliability, productivity and security deleted."[17][18] Klaus Kreft and Angelika Langer (authors of a C++ streams book) stated in a blog post that "Java and C# are almost identical programming languages. Boring repetition that lacks innovation,"[19] "Hardly anybody will claim that Java or C# are revolutionary programming languages that changed the way we write programs," and "C# borrowed a lot from Java - and vice versa. Now that C# supports boxing and unboxing, we'll have a very similar feature in Java."[20] In July 2000, Anders Hejlsberg said that C# is "not a Java clone" and is "much closer to C++" in its design.[21] Since the release of C# 2.0 in November 2005, the C# and Java languages have evolved on increasingly divergent trajectories, becoming somewhat less similar. One of the first major departures came with the addition of generics to both languages, with vastly different implementations. C# makes use of reification to provide "first-class" generic objects that can be used like any other class, with code generation performed at class-load time.[22] Furthermore, C# has added several major features to accommodate functional-style programming, culminating in the LINQ extensions released with C# 3.0 and its supporting framework of lambda expressions, extension methods, and anonymous types.[23] These features enable C# programmers to use functional programming techniques, such as closures,

                          • In reply to Clifford Nelson (#129)

                            kalberts · #131

                            The CHILL language, which provides this, was essentially developed to program network switches. If you have been much in contact with communication guys, you have certainly noted that they love designing protocols as state machines: any entity is at any time in one of a limited number of states. Whatever happens in the interface is modeled as an atomic event, the event ID being one of a limited set of values. The processing of the event depends on the current state and the event ID. Processing is (conceptually) atomic, and changes the entity state to a new value. So a very common usage would be like (in quasi-C syntax; CHILL was somewhat different): switch (state, eventID) { (disconnected, setupReq): ... code to handle a call setup request... (transferMode, dataPacket): ... ... more state/event combinations, with their specific handling ... (transferMode, ELSE): ... error handling, protocol error, not allowed in transferMode state... ELSE: ... general error handling for an illegal event in the current state, not treated as a specific case... }

                            Now, some protocols are so complex that a "pure" state machine would end up with thousands of states. So the protocol spec defines from one up to a small handful of state variables, which may be thought of as sub-states in most cases (not always, but "sort of like"). So you could add one or more of those state variables as the third switch field, if they should lead to distinctly different handling. (For many switch cases, the switch label would be "don't care" for that state/eventID combination.)

                            So, for protocols defined as state machines, the multidimensional switch statement is a perfect match. I love that kind of protocol specification! It is terse, lucid, unambiguous, and most of all: it pinpoints all protocol error situations and makes it very clear where you should define an error handler, and the reasons for raising this error condition.
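Plain C has no multi-variable switch, but the state/event dispatch above is often approximated by packing the pair into one integer and switching on that. The protocol, the state and event names, and the transitions below are invented for this sketch:

```c
/* Emulating a two-variable switch (state, event) by packing the pair. */
enum state { DISCONNECTED, CONNECTED };
enum event { SETUP_REQ, DATA_PACKET, TEARDOWN };

#define PAIR(s, e) (((s) << 8) | (e))   /* constant expression: usable as a case label */

static enum state handle(enum state st, enum event ev)
{
    switch (PAIR(st, ev)) {
    case PAIR(DISCONNECTED, SETUP_REQ):  return CONNECTED;     /* accept call setup  */
    case PAIR(CONNECTED, DATA_PACKET):   return CONNECTED;     /* deliver data       */
    case PAIR(CONNECTED, TEARDOWN):      return DISCONNECTED;  /* close connection   */
    default:                             return st;            /* protocol error: the
                                                                  catch-all ELSE case */
    }
}
```

What this loses, compared with CHILL, is the per-field ELSE and the "don't care" wildcard: every combination must be spelled out or fall into the single default.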

                            • In reply to kalberts (#121)

                              Kenworth71 · #132

                              You're agreeing with me: bytecodes are an abstraction; they're for the JVM, not the CPU.

                              • In reply to Kenworth71 (#132)

                                kalberts · #133

                                The JVM IS a CPU! Just like any microcoded processor is a CPU. There is no principal difference between microcode breaking down the instruction code into activation of the various circuits, and (compiled) C code doing the same thing.

                                Years ago, I was working on a machine which didn't provide BCD instructions in silicon. Cobol users could either buy a floppy disc with the microcode to give the CPU BCD instructions (microcode was kept in RAM), or use the software package that emulated BCD (triggered by the 'Illegal Instruction Code' interrupt). How would you describe the BCD instructions? As "abstractions", like the Java bytecodes? Or as integral to the CPU (even though they triggered an Illegal Instruction Code interrupt if the microcode was not installed)? Are all microcoded instructions "abstractions"? If so, then this CPU, as well as a lot of others, is an abstraction.

                                The Java bytecodes are just like those BCD instructions, except that they cover the complete instruction set. And I am quite sure it would be possible to write microcode (for this machine with the BCD option) to make the bytecodes the "native instruction set" of the machine - it did provide logarithmic/trigonometric functions and malloc/free as instructions, and microcode was developed so that it executed Lisp more or less directly (after a tokenization, of course).

                                • In reply to kalberts (#133)

                                  Kenworth71 · #134

                                  JVM bytecode is abstract - i.e. it's not your CPU's native opcodes. That's it.

                                  • In reply to Kenworth71 (#134)

                                    K Offline
                                    K Offline
                                    kalberts
                                    wrote on last edited by
                                    #135

                                     Not even if I have a machine microcoded to handle them? Years ago, someone wrote microcode for a PDP-11 to directly execute Pascal P4 bytecodes. With that microcode, the machine could execute P4 instructions and nothing else. Load a P4 file into RAM, set the instruction pointer to the starting point, and run: the program would execute. JVM bytecodes are quite similar to P4 bytecodes; there is no essential difference in principle that makes one abstract and the other machine instructions. ...Unless you say that "native opcodes" are those where each bit in the instruction corresponds directly to one physical control signal steering the transistor logic. If you do, then you reject every CPU that has any microcode at all; you accept only 100% hardwired logic as a true CPU. Even though less microcode is used in today's CPUs than in the golden days of microcoding (like the VAX era), almost all general CPUs today (as well as many specialized ones) are to some degree microcoded. By your logic, the binary instructions fed to those machines are not the CPU's native opcodes, but only an abstraction. You have the right to say so, but renaming the binary programs for almost all machines to "abstractions" does not contribute anything of significance.

                                    K 1 Reply Last reply
                                    0
                                    • K kalberts

                                       Not even if I have a machine microcoded to handle them? Years ago, someone wrote microcode for a PDP-11 to directly execute Pascal P4 bytecodes. With that microcode, the machine could execute P4 instructions and nothing else. Load a P4 file into RAM, set the instruction pointer to the starting point, and run: the program would execute. JVM bytecodes are quite similar to P4 bytecodes; there is no essential difference in principle that makes one abstract and the other machine instructions. ...Unless you say that "native opcodes" are those where each bit in the instruction corresponds directly to one physical control signal steering the transistor logic. If you do, then you reject every CPU that has any microcode at all; you accept only 100% hardwired logic as a true CPU. Even though less microcode is used in today's CPUs than in the golden days of microcoding (like the VAX era), almost all general CPUs today (as well as many specialized ones) are to some degree microcoded. By your logic, the binary instructions fed to those machines are not the CPU's native opcodes, but only an abstraction. You have the right to say so, but renaming the binary programs for almost all machines to "abstractions" does not contribute anything of significance.

                                      K Offline
                                      K Offline
                                      Kenworth71
                                      wrote on last edited by
                                      #136

                                       If the definition of bytecode is that it's abstract and non-natively-executable by the CPU, and therefore requires another layer to turn it into the opcodes your given CPU can understand, then surely both your Java bytecode and your CIL/MSIL meet that definition? You say 'bytecodes are ready for execution'; I don't agree. You need the JVM - right?

                                      K 1 Reply Last reply
                                      0
                                      • K Kenworth71

                                         If the definition of bytecode is that it's abstract and non-natively-executable by the CPU, and therefore requires another layer to turn it into the opcodes your given CPU can understand, then surely both your Java bytecode and your CIL/MSIL meet that definition? You say 'bytecodes are ready for execution'; I don't agree. You need the JVM - right?

                                        K Offline
                                        K Offline
                                        kalberts
                                        wrote on last edited by
                                        #137

                                         If I have a VAX executable file, containing VAX instructions, I need an interpreter for those codes. There never was a CPU that interpreted VAX instructions in pure silicon; every VAX on the market ran an interpreter, implemented in microcode. If I have a JVM executable file, containing Java bytecodes, I need an interpreter for those codes. There never was a CPU that interpreted Java bytecodes in pure silicon. But there was a CPU interpreting Pascal P4 bytecodes, running an interpreter implemented in microcode, and there is no reason why you couldn't do exactly the same for Java bytecodes. So I'll agree with you on the condition that we agree that VAX instructions, IAPX 386 instructions, IBM 360 instructions, Java bytecodes and Pascal P4 bytecodes all belong in the same group: none of them is natively executed by silicon; all require an interpreter implemented at a lower level.

                                         The distinction between instructions implemented directly in silicon and those interpreted by some code at a lower level may be essential with regard to speed and the physical size of the silicon die. For the user, for the programmer, and for the system architecture as seen at the program interface, the difference is marginal.

                                         A far more essential question is whether you need to do any preprocessing of a file before you submit it for execution. For a Java bytecode (or Pascal P4) file, you need no further processing: the interpreter, whether written in microcode or as, say, conventional PDP-11 machine code, can start churning through bytecodes right away, one by one. With CIL, you can NOT feed the tuples to an interpreter one by one and have it interpret them as you go. Actually, I have tried to do so - not with CIL, but with a very similar intermediate code coming out of the front end compiler for the CHILL language. At the outset, it looked doable - after all, each tuple indicates an operation, some operands and so on, sort of like an instruction. But the deeper you dig into it, the more you find that is yet to be determined: stuff that depends on the context, where that context must be built up from information in other parts of the DAG structure. There are lots of things you cannot do without traversing major parts of the graph for a single operation ... unless you do a preprocessing pass before you start any execution at all. THAT is an essential difference. You MUST do a preprocessing pass before you can start any execution, and that preprocessing will reshape the code into something else that can be executed.
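The "ready for execution" property can be seen in how small a bytecode interpreter's main loop is: it fetches one opcode at a time and needs no global pass over the program first. A toy stack-machine sketch in C (the opcodes and encoding here are invented for illustration; they are not the P4 or JVM formats):

```c
#include <stdio.h>

/* A toy stack machine: each element of 'code' is one bytecode
   (or an inline operand). The loop interprets codes one by one,
   with no prior analysis of the rest of the program - unlike a
   DAG-shaped IR, where operand context must be resolved first. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

int run(const int *code)
{
    int stack[64];
    int sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH: stack[sp++] = code[pc++];            break;
        case OP_ADD:  sp--; stack[sp - 1] += stack[sp];    break;
        case OP_MUL:  sp--; stack[sp - 1] *= stack[sp];    break;
        case OP_HALT: return stack[sp - 1];
        }
    }
}
```

For example, the program { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT } evaluates (2 + 3) * 4 and run() returns 20, the first instruction executing before the interpreter has even looked at the rest.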

                                        1 Reply Last reply
                                        0
                                        • K kalberts

                                           If Google had existed in the early 1980s, a search for "bytecode" would have returned thousands of references to Pascal and its P4 bytecode format. The Pascal compiler was distributed as open source, with a backend for a virtual machine (also available as open source for a couple of architectures). You could either adapt the VM to the architecture of your machine and keep the compiler unchanged, or replace the P4 code-generating parts of the compiler with binary code generation for your own machine. Actually, lots of interpreters for non-compiled languages today do some compilation into some sort of bytecode, which is cached internally so that e.g. a loop body needs to be symbolically analyzed only on the first iteration. But Java is the only language (after Pascal and its P4) to really focus on this, making "Java Virtual Machine" a marketing concept and pushing "compile once, run anywhere" as The Selling Point of the language (more so 20 years ago than today). So you are right: Java is very prominent in bytecode references.

                                          J Offline
                                          J Offline
                                          jschell
                                          wrote on last edited by
                                          #138

                                          Member 7989122 wrote:

                                          If Google had existed in the early 1980s, a search for "bytecode" would have returned thousands of references to Pascal and its P4 bytecode format

                                          Agreed. But that is not relevant at all.

                                          Member 7989122 wrote:

                                          Actually, lots of interpreters for non-compiled languages of today do some compilation into some sort of bytecode

                                           Since I have written two compilers and a number of interpreters, and taken the requisite academic courses on compilers, I am quite familiar with how they work. However, most programmers - certainly most of the many I have worked with - have done neither, so I have no expectation that the average programmer would be familiar with that.

                                          1 Reply Last reply
                                          0