Machine code and byte encoders

pra net

Machine code is the binary language that the processor executes directly. It differs between hardware (essentially, processor architectures) and operating systems, so machine code is machine specific.

Byte code, on the other hand, is a kind of intermediate language. It is not machine code, but something between machine language and the high-level language, and it is not machine specific. It is translated at run time, according to the machine it is running on, and the resulting machine code is executed by your processor. How the byte code is converted to machine language depends on the runtime. For example, a Java program is stored as byte code; when we run it, the Java virtual machine interprets that byte code or compiles it on the fly into machine code.

This makes a few differences obvious:

1. Machine code is hard to decompile because it is a low-level set of hardware instructions. Byte code, by contrast, can be decompiled fairly easily, since it is not machine specific and keeps more of the program's high-level structure.
2. Machine code is not portable, because each processor family or operating system may interpret the instructions differently. Byte code is portable: it will run irrespective of the environment or hardware, as long as an interpreter is available to convert it into machine-specific code.

Hope this helps,
Pradeep :)
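To make the idea of an intermediate form concrete, here is a minimal sketch in Python (chosen only for illustration; the thread itself is about Java, .NET and PHP). It compiles a one-line expression to byte code and shows that the result is a byte string for the interpreter's virtual machine rather than processor instructions. The expression and the variable names are made up for the example.

```python
import dis

# Compile a source string into a code object; the Python VM executes this, not the CPU.
source = "price * quantity + 1"
code = compile(source, "<example>", "eval")

# The byte code itself is just a sequence of bytes for the Python VM.
raw = code.co_code
print(type(raw).__name__, len(raw))

# dis renders the same bytes as readable VM instructions (exact output varies by version).
dis.dis(code)

# The byte code runs wherever a matching interpreter is available.
result = eval(code, {"price": 3, "quantity": 4})
print(result)  # 13
```

Note that this byte code is only meaningful to a matching Python interpreter; it is portable across hardware, not across virtual machines.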

cpkilekofp

martin_hughes wrote:

Sweet Holy Mother! How are you up and typing Mick Martin? You can only have been asleep 5 hours?

:laugh: That's a good night's sleep for some programmers, depending on their project.

cpkilekofp

John C wrote:

I see there are some Java bytecode obfuscators on the market which leads me to believe the experience is similar to .net.

C-Shroud from Gimpel Software was a popular obfuscator in the early '90s. It worked by obfuscating the C code, which then compiled to machine code that was completely incomprehensible. I mention this simply to demonstrate that the problem and solutions existed even before .NET and Java VMs came on the scene.

cpkilekofp

Pradeep, I have to respond a bit to your statement. The only difference between "byte code" and "machine code" is that byte code runs on a virtual machine, and machine code runs on a physical machine. Byte code for one virtual machine (e.g. the Java Runtime) is not portable to another virtual machine (e.g. the .NET CLR). Byte code is portable only because virtual machines are commonly written for multiple architectures. Machine code is just as "portable" to any number of software emulators, which is why embedded software is commonly developed first on a workstation using an emulator. The code for an instrument monitor that uses a Motorola 68020 as its CPU, say, might actually be developed on a Windows PC that natively runs on an Intel Core 2 Duo.
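A toy example may make this concrete. The sketch below, in Python, is a tiny stack-based virtual machine; the opcodes and their numbering are hypothetical and do not correspond to any real instruction set. The `program` list is "machine code" for this software machine, and it runs unchanged on any host where `run` exists, which is all that byte code portability amounts to. An emulator for a real CPU has the same basic shape, only with the real chip's instruction set.

```python
# Hypothetical opcodes for a toy stack machine (not any real instruction set).
PUSH, ADD, MUL, HALT = 0, 1, 2, 3

def run(program):
    """Execute a list of ints as byte code and return the top of the stack."""
    stack = []
    pc = 0  # program counter
    while True:
        op = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(program[pc])  # the operand follows the opcode
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == HALT:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")

# Byte code for (2 + 3) * 4
program = [PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]
print(run(program))  # 20
```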

martin_hughes

Yeah, but he'd been drinking heavily.

Ahoy! Martin Hughes

cpkilekofp

martin_hughes wrote:

Yeah, but he'd been drinking heavily.

In that case, I certainly hope he's not operating machinery heavier than his keyboard :omg:

alex barylski · modified on Monday, October 6, 2008 12:22 AM

l a u r e n wrote:

what vm would be "running" the app that you want to "compile" ?

ionCube and Zend Encoder are two options... PHPA is the open source version. ionCube and Zend both have encoders/compilers that take your source and compile it into byte code. The encoded files are then uploaded to your server, which must have the vendor's proprietary decoder extension installed in order to decode/decrypt and execute them. That basically skips the tokenizing/parsing stages and goes straight to execution.

PHP Accelerator and APC are basically just extensions that hook into the parsing/execution phase and cache the resulting byte code for a given request. The next time that request is made, the cached byte code is used and tokenization/parsing is skipped.

The former can protect your code and speed it up under most circumstances. The protection is limited, though: after reading more about byte code, I gather it's basically binary code that still carries all the high-level constructs such as classes. The latter will really only speed it up.

l a u r e n wrote:

on a side note... i write and teach php for a living and i'm absolutely fascinated as to the reasons why you would want to "compile" it into some kind of (presumably proprietary) byte code and what you hope to achieve by doing so?

I'm an architecture geek with a custom-developed framework. The framework is a compilation of good ideas borrowed from every single framework I could find on Google (about 9 months of research, study, and trial and error). I don't want my ideas ending up in Zend or Symfony or CakePHP, or even worse, in my competitors' products. My application will be a hosted SaaS application; however, I wouldn't mind giving it away for free as a marketing channel (as I have no idea how else I'm going to get people using it). Hence the compilation.

I'm finding the only constant in software development is change itself.

alex barylski

John C wrote:

There's absolutely no such thing as impossible,

Didn't I say virtually impossible? :P I realize that reverse engineering byte code or machine code isn't impossible, but if it were that easy, someone would have reverse engineered Windows or Adobe Photoshop and there would be Open Source versions available. :P The point was that machine code is significantly harder to reverse engineer. Byte code, as I now understand it, is quite high level.
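As a rough illustration of how much high-level information byte code can retain, the sketch below uses Python (an assumption made for illustration; PHP, Java and .NET byte code differ in detail). The function `apply_discount` is hypothetical. Its compiled code object still carries the names of its arguments, its local variables and the globals it references, which is a large part of why byte code is easier to decompile than stripped machine code.

```python
def apply_discount(price, rate):
    discounted = price - price * rate
    return round(discounted, 2)

# The compiled form of the function, as the interpreter stores it.
code = apply_discount.__code__

print(code.co_varnames)  # names of the arguments and local variables
print(code.co_names)     # names of the globals the byte code refers to
```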

alex barylski

I know it's not impossible... but removing copy protection is not the same as completely reverse engineering a program. The architecture is what I am interested in protecting, not the implementation.

pra net

Yes, this is exactly what I was trying to explain :) The virtual machine is itself a program in machine language, just like any other traditional C/C++ program. For example, the Java Runtime is the actual program in machine code. The Java runtime (virtual machine) bears the same relationship to a Java program (in byte code) as a word processor bears to its documents. Pradeep

Rob Grainger

Hockey wrote:

I realize that reverse engineering byte code or machine code isn't impossible, but if it were that easy someone would have reverse engineered Windows or Adobe Photoshop and there would be Open Source versions available.

Not sure about "Open Source"... It's illegal to reverse engineer these types of products; check the license agreements. Code released after reverse engineering will almost certainly include copyrighted or patented material. GNU and Linux have both been bitten by this: volunteers added code that was covered by a license, with either the code itself copyrighted or the algorithm (often) patented.

cpkilekofp

Mmmmm, I was also making the point that machine language is related to the computer hardware interface as the Java code is related to the Java runtime as the Word document is related to the Word program. Machine language, in other words, is just another script for a logical machine (where the logical machine, in this case, maps directly one-for-one to the physical machine). It's a point that's often missed except by C and assembly language programmers.

brianhood

Byte code is interpreted by an interpreter; machine code is processed by the CPU. BASIC is interpreted, and it runs slowly because the interpreter reads text and translates it as it goes. If you transform the text into byte code (function numbers), it runs faster: the interpreter calls Function[number] from a function table instead of looking up a text command in a command list. Java source is likewise compiled into byte code before it runs, so the runtime never has to translate text. I think they also encode the variables in a variable table, as well as the functions, so there's no text to translate, and it makes the programs smaller. Variable 100 (variabletable[100]) is quicker and smaller than looking up myvariablename. Each byte code tells whether it is a command or a variable, and how many arguments it takes.
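The function table and variable table can be sketched as follows. This is a toy in Python with a made-up two-command language (`SET` and `ADD`); all names here are hypothetical. The compile step turns command names into numbers and variable names into slot indices once, so the run loop only does list indexing and never looks at text.

```python
# Hypothetical command set for a toy BASIC-like language.
def cmd_set(slots, target, value):      # target is a slot index, value a constant
    slots[target] = value

def cmd_add(slots, target, other):      # both arguments are slot indices
    slots[target] += slots[other]

FUNCTION_TABLE = [cmd_set, cmd_add]     # command number -> function
COMMAND_NUMBERS = {"SET": 0, "ADD": 1}  # only needed at compile time

def compile_text(lines):
    """Translate text commands into (command number, slot, argument) tuples."""
    slot_of = {}   # variable name -> index in the variable table
    program = []
    for line in lines:
        name, target, arg = line.split()
        t = slot_of.setdefault(target, len(slot_of))
        a = int(arg) if name == "SET" else slot_of[arg]
        program.append((COMMAND_NUMBERS[name], t, a))
    return program, slot_of

def run_program(program, slot_count):
    slots = [0] * slot_count            # the variable table
    for number, target, arg in program:
        FUNCTION_TABLE[number](slots, target, arg)  # no text lookup at run time
    return slots

program, slot_of = compile_text(["SET total 10", "SET bonus 5", "ADD total bonus"])
slots = run_program(program, len(slot_of))
print(program)
print(slots[slot_of["total"]])  # 15
```

The compiled `program` contains only small integers, which is also why it ends up smaller than the original text.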