OSdata.com

learning assembly language

summary

This is a brief non-technical explanation of what I am building.

Machine code or object code is the ones and zeros that a computer actually uses.

Assembly language is a human readable form of machine code.

Assembly language was once the choice for anything critical. Now very few businesses even consider writing in assembly. Instead they simply buy more computers.

Even so, assembly language is still taught in colleges and universities because it is one of the best ways to learn how a computer actually works.

I am writing instructional material that teaches students how to write assembly language code. I am also building an emulator that will allow the students to actually test and run their programs.

Instead of building an emulator for an actual existing processor, I am building an emulator for an imaginary processor. This has been done before.

In the past, professors often insisted on the use of a real assembly language for a real processor or computer so that students would be able to do real work in the real world.

Because real assembly language is rarely used anymore, this reason no longer matters.

Real processors all have at least some strangeness to them. These quirks are the result of design compromises, usually related to the cost of making the processor.

These special cases of weirdness make it more difficult to learn assembly language, because the student has to learn both the concept and the weird special exceptions.

This is why teaching processors exist.

At this point I could use an existing famous teaching processor, but I have in mind additional uses that I will outline below.

One of the problems of teaching assembly language is that there are some things that are extremely important but also very complex. The classic example is input and output, which is obviously essential but also happens to be one of the most advanced problems in assembly language.

I have chosen to use something called a Threaded Interpretive Language (or TIL) as a teaching tool.

A TIL has an inner and an outer interpreter.

The outer interpreter happens to be the part that performs input and output. I am proposing that each student build a simple outer interpreter in any programming language. This will be an easy task and keeps the difficult parts out of the teaching.

The inner interpreter is the part that does the real work. Very few parts of the inner interpreter have to be built before the TIL can be used for simple things.

This is good, because the student can build each part (typically 80-120 individual parts called primitives) one at a time, in an order that makes sense for teaching assembly language.
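The split between the two interpreters can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the design described here: the names `PRIMITIVES`, `inner_interpreter`, and `outer_interpreter` are hypothetical, Python function references stand in for the machine addresses a real TIL would thread together, and only three primitives are shown.

```python
# Minimal sketch of a threaded interpretive language (TIL).
# All names are hypothetical and chosen only for illustration.

stack = []  # the data stack shared by all primitives


def prim_add():
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def prim_mul():
    b = stack.pop()
    a = stack.pop()
    stack.append(a * b)


def prim_dup():
    stack.append(stack[-1])


# The dictionary maps each word's name to the primitive that implements it.
PRIMITIVES = {"+": prim_add, "*": prim_mul, "dup": prim_dup}


def inner_interpreter(thread):
    """Do the real work: run each entry of an already-translated thread."""
    for entry in thread:
        if callable(entry):
            entry()              # a primitive: execute it
        else:
            stack.append(entry)  # a literal number: push it


def outer_interpreter(line):
    """Handle the text side: split the input into words and look each one up."""
    thread = []
    for word in line.split():
        if word in PRIMITIVES:
            thread.append(PRIMITIVES[word])
        else:
            thread.append(int(word))  # anything else is assumed to be an integer
    inner_interpreter(thread)
    return list(stack)
```

Starting from an empty stack, `outer_interpreter("3 dup * 4 +")` leaves `[13]` on the stack. Each new primitive is one small function added to the dictionary, which is why the parts can be built one at a time.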

At the end of the class in assembly language, the student will have both a knowledge of assembly language programming and a useful TIL for personal use.

The most famous TIL is the programming language Forth. Forth is still used today because it provides a very compact and efficient method for building low level computer projects. The first famous use of Forth was to build a program to control a large telescope.

I propose to continue past the things that would normally be taught in a college assembly language programming class and show interested students how to use their personal custom TIL to create a compiler for any high level language.

A compiler is a program that translates human readable source code into actual machine code.

A high level language is a normal programming language, such as Ada, C, C++, COBOL, FORTRAN, Java, LISP, .NET, Perl, PL/I, Python, Ruby, Smalltalk, etc.

A common form of compiler translates a high level language into an intermediate stack-based language. A stack is a kind of computer data structure in which the most recently added item is the first one removed.

The creator of Forth happened to have decided to build Forth as a stack-based language.
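As a rough illustration of what a stack-based intermediate language looks like, the sketch below translates a small arithmetic expression into stack code and then runs it. The instruction names (`push`, `add`, `mul`) and the tuple format are assumptions made up for this example, not the instruction set of Forth or of the processor described here.

```python
import operator

# Hypothetical operations of a tiny stack-based intermediate language.
OPS = {"add": operator.add, "mul": operator.mul}


def compile_expr(expr):
    """Translate a nested (operator, left, right) tuple into stack code."""
    if isinstance(expr, int):
        return [("push", expr)]
    op, left, right = expr
    # Operands first, operator last: this ordering is what makes it stack code.
    return compile_expr(left) + compile_expr(right) + [(op, None)]


def run(code):
    """Execute stack code and return the value left on top of the stack."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(arg)
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(OPS[op](a, b))
    return stack.pop()


# (2 + 3) * 4 written as a nested tuple
code = compile_expr(("mul", ("add", 2, 3), 4))
result = run(code)
```

Here `code` is `push 2, push 3, add, push 4, mul` and `result` is 20. The stack removes the need for named temporary values, which is one reason the form is convenient as a compiler target.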

A TIL (including Forth) is an extensible language. That is, it is easy to add new features and commands to the language.

It is therefore easy to extend a TIL into a stack-based intermediate language for any high level language.
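Extensibility can be sketched as follows, assuming a hypothetical dictionary in which a word is either a primitive or a list of other words. The helper names `define` and `execute` are invented for this example. Forth itself uses a colon definition for the same purpose.

```python
# Hypothetical dictionary of words. A word is either a Python function
# (a primitive) or a list of other word names (a definition added later).
words = {
    "dup": lambda s: s.append(s[-1]),
    "*": lambda s: s.append(s.pop() * s.pop()),
    "+": lambda s: s.append(s.pop() + s.pop()),
}


def execute(name, stack):
    """Run one word, expanding user-defined words into the words they name."""
    body = words[name]
    if callable(body):
        body(stack)
    else:
        for inner in body:
            execute(inner, stack)


def define(name, body):
    """Extend the language: from now on `name` behaves like a built-in word."""
    words[name] = body.split()


define("square", "dup *")
define("fourth-power", "square square")

stack = [3]
execute("fourth-power", stack)
```

After this runs, `stack` holds `[81]`. The new words are used exactly like the original ones, so a compiler writer can keep adding words until the TIL contains every operation a given high level language needs.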

This becomes especially cool if the underlying intermediate processor happens to be a general computing model (which is the case here).

It is easy to convert the intermediate code written for this TIL and artificial processor into actual real object or machine code for any real processor (that is part of the design).

It is more difficult, but still possible, to convert the intermediate code into a high level language (the opposite direction of a compiler, often called a decompiler).
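The reverse direction can be sketched for the simplest possible case: arithmetic only. The example assumes a hypothetical stack code made of `(operation, argument)` pairs and rebuilds a fully parenthesized expression from it. A real decompiler also has to recover control flow, names, and types, which is where the difficulty lies.

```python
# Hypothetical mapping from stack operations to infix symbols.
SYMBOLS = {"add": "+", "mul": "*"}


def decompile(code):
    """Rebuild a fully parenthesized infix expression from stack code."""
    stack = []
    for op, arg in code:
        if op == "push":
            stack.append(str(arg))
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {SYMBOLS[op]} {right})")
    return stack.pop()


code = [("push", 2), ("push", 3), ("add", None), ("push", 4), ("mul", None)]
text = decompile(code)
```

Here `text` is the string `((2 + 3) * 4)`. The same walk over the stack code could emit the expression syntax of any other high level language by changing only the formatting line.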

There are three big problems faced by any large company (such as Google):

1. The Babel of different programming languages.
2. Poor tools for efficient distributed and parallel programming.
3. Slow turnaround time caused by dependencies involving large programs maintained by large teams.

The proposed ability to convert between any two high level languages solves the first problem.

At one time there were major differences between the capabilities of major programming languages. While there are still differences, the differences between modern languages tend to be a question of style rather than a question of actual capability.

This means the choice of programming language is primarily psychological. A programmer is most efficient at the language he or she feels most comfortable with.

Now each programmer can write in whatever high level language he or she is best at. This is the language that the programmer will be most efficient and productive with.

Programmers no longer need to switch to someone else’s language choice. Each programmer on a large team can program in their favorite language and can view everyone else’s contributions to the large project in the language of their choice.

The second problem (inefficient parallel processing) is a serious problem for large businesses. This is one of the two problems that Google’s Go language was designed to solve.

Parallel processing is the idea of dividing a problem into many small parts, with each part performed by a different computer (or core within a processor). This dramatically speeds up operations compared to one computer doing all of the work by itself. This is how Google can answer search requests so quickly.
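The divide-and-combine idea can be sketched with Python's standard `concurrent.futures` module. The function names are hypothetical, and the sketch shows only the structure: in standard CPython, threads generally do not speed up CPU-bound work like this, so a real speedup would need a process pool or separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(chunk, target):
    """Work done by one worker: count occurrences of target in its chunk."""
    return sum(1 for item in chunk if item == target)


def parallel_count(items, target, workers=4):
    """Split items into roughly equal chunks and count in each chunk separately."""
    size = max(1, (len(items) + workers - 1) // workers)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is handed to a worker; the partial counts are then combined.
        partial = pool.map(count_matches, chunks, [target] * len(chunks))
        return sum(partial)
```

For example, `parallel_count([1, 2, 3, 2, 2, 5] * 100, 2)` returns 300. The important property is that no chunk needs anything from another chunk, so the workers never have to wait on each other.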

Unfortunately, very few of the common programming languages are built to take into account parallel processing. This means that the compiler can’t take full advantage of a large network of computers.

Also, many of the languages that do have parallel processing capabilities have only limited ones.

And just to add to the mess, the few programming languages that have really good parallel processing capabilities tend to be obscure and not widely known.

One of the advantages of a TIL is that it is easy to make it very good at parallel processing.

And it is possible for an individual programmer to use the TIL compiler to add parallel processing capabilities to a programming language that doesn’t already have them and to supplement the capabilities of languages that have limited parallel processing capabilities.

Note that Google’s Go language has only a limited version of coroutines (a kind of parallel processing capability).
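For readers who have not met coroutines, here is a minimal sketch using Python generators. It says nothing about how Go works. The names `worker` and `round_robin` are hypothetical. Note that these coroutines take turns on a single processor, so on their own they interleave work rather than run it simultaneously.

```python
def worker(name, steps, log):
    """A coroutine: does one step of work, then yields control voluntarily."""
    for i in range(steps):
        log.append(f"{name}{i}")
        yield


def round_robin(tasks):
    """Run each task one step at a time until all of them have finished."""
    queue = list(tasks)
    while queue:
        task = queue.pop(0)
        try:
            next(task)
            queue.append(task)  # not finished: go to the back of the line
        except StopIteration:
            pass                # finished: drop it


log = []
round_robin([worker("a", 2, log), worker("b", 3, log)])
```

After this runs, `log` is `["a0", "b0", "a1", "b1", "b2"]`: the two tasks alternate until the shorter one finishes.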

Another problem faced by large teams (such as Google) is the turnaround time to compile.

In most programming languages, the part one programmer is working on has to refer to the parts built by many other programmers.

These references are called dependencies (my program depends on your program).

In a large system, this can greatly lengthen the time it takes for a programmer to write a new piece of code (often a few lines), compile the new code, and then test the results.
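The cost of dependencies can be made concrete with a small sketch. The module names and the `depends_on` table below are invented for illustration. The function finds everything that has to be recompiled when one module changes.

```python
# Hypothetical dependency table: each module lists the modules it refers to.
depends_on = {
    "search": ["index", "network"],
    "index": ["storage"],
    "network": [],
    "storage": [],
    "ads": ["network"],
}


def must_recompile(changed):
    """Return every module that directly or indirectly depends on `changed`."""
    affected = {changed}
    grew = True
    while grew:  # repeat until a full pass adds nothing new
        grew = False
        for module, deps in depends_on.items():
            if module not in affected and any(d in affected for d in deps):
                affected.add(module)
                grew = True
    return affected
```

In this table a change to `storage` forces `index` and `search` to be recompiled as well, even though `search` never mentions `storage` directly. In a large system the affected set for a few changed lines can cover much of the program.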

Solving this problem was the other major reason for Google’s Go programming language.

Because of the nature of a TIL, it is possible to completely sidestep this slowdown. Note that a TIL could be built that keeps this problem, but we are going to design things to avoid it.

All of the high level language coding in this system is purely psychological. The real code is the TIL and the intermediate processor.

Because of this, the dependency problem either goes away or is dramatically reduced, no matter what programming language the programmer has chosen.

So, this becomes not just a teaching tool, but an advanced tool that solves some of the most serious problems facing very large programming teams.

contact

If you find this interesting and want to contact me, write to Milo, PO Box 5237, Balboa Island, California, 92662, USA.

If you want to make a tax-deductible donation to the StarTree107 Foundation to support this educational work, contact Dr. Barry at 949-675-5778.


