music

OSdata.com

UNCOL
Universal Computer Oriented Language

UNCOL was first proposed in the early 1950s and discussed in a paper published in 1958 by Melvin E. Conway. UNCOL stands for UNiversal Computer Oriented Language. The original intent was a universal intermediate language. The source code for all programming languages would be translated into the intermediate code and the intermediate code would then be translated into the machine code for all processors. This meant only one compiler back end per processor and only one compiler front end per programming language.

UNCOL was never successfully implemented, despite the fact that some of the most famous people in the history of computers have worked on the problem.

Several national governments (including the U.S. Department of Defense) have attempted a solution. Several large businesses (including IBM, Digital, and HP) have attempted a solution.

This webpage discusses the history of UNCOL, including previous failed attempts.

Note that substantial parts of this page are just notes, not completed work.

OSdata.com is used in more than 300 colleges and universities around the world

Find out how to get similar high web traffic and search engine placement.

The description of UNCOL is an introduction for those who are new to the concept. The beginning of the discussion of the history of UNCOL should be accessible to most readers, even novices.

The discussion of past attempts and alternatives to UNCOL should be accessible to most readers.

general
history
complexity of the problem
proven possible
suggested requirements
using existing languages
FORTRAN
C
MLRISC, C--, JVM, x86
Lisp
UNL as UNCOL
ANDF
jump to proposed solution
still useful concept
legacy code problem
meta-assemblers
applying this idea
human readability
basic syntax
overall approach to proposed solution
classical and quantum complete
comcept and implementation
universal message
outline of proposed solution
defintion of UNCOL/universal message to aliens
intelligent signal
start of message
counting
prime numbers
truth tables
Turing machines
basic signals
defining UNCOL
subroutines
flip-flops
simple circuits
state machines
Hardware Defintion Language (HDL)
Sr and derivatives
machine code
Knuth’s MMX
data
supplementary instructions
parse trees
tools
compiler/assembler
error correction
multi-dimensional plotting
compression schemes
astronomical data
building an actual UNCOL language
ADD example
AND example
data representation
storage
operations
lambda calculus
graphics processing
quantum computing
essential parts

general

UNCOL was originally proposed as an intermediate language to solve the problem of realistically making every programming language available on every processor.

If there are X number of programming languages and Y number of processors, then building a compiler for each combination of programming language and processor requires X·Y different programs.

If source code for each programming language can be translated into an UNCOL and UNCOL can be translated into object code for each processor, then the work is reduced to X+Y different programs.

This is an improvement that changes a multiplicative process into an additive process.

Unmentioned in the original example is that an UNCOL could potentially be used for many additional translation problems, possibly including device drivers, file formats, CODECs, APIs, etc.

Each new additional use of UNCOL raises the number of combinations factorially. When you look at additional combinations, you are potentially reducing factorial growth to linear growth. Factorial growth surpassed the number of particles in the observable universe (estimated at 10E+80) in just 59 combinations (1.386E+80). 59 combinations is a trivial number. Much smaller numbers of combinations exceed reasonable computing time.

Unmentioned in the original example is the idea of adding processors to the input language side of the problem. Transforming existing object code into UNCOL produces a partial solution to the legacy problem.

Unmentioned in the original example is the idea of adding programming languages to the output side of the problem. This more ambitious goal has been proposed under a variety of names. Transforming existing software from any arbitrary source language into any arbitrary target language also produces a partial solution to the legacy problem, as well as dramatically assisting multi-language projects.

history

history: UNCOL (UNiversal Computer Oriented Language) was proposed in 1954 or earlier.

UNCOL was first proposed in 1954 (or earlier) and first discussed in print in 1958 by Melvin E. Conway (“Proposal for an UNCOL”, Communications of the ACM 1:3:5). UNCOL stands for Universal Computer Oriented Language. The original intent was a universal intermediate language. The source code for all programming languages would be translated into the intermediate code and the intermediate code would then be translated into the machine code for all processors. This meant only one compiler back end per processor and only one compiler front end per programming language.

UNCOL was never successfully implemented, despite the fact that some of the most famous people in the history of computers have worked on the problem.

The good news is that although nobody succeeded at building UNCOL, they generally did create useful things, in some cases some of the most important contributions to software engineering. This is a field where failure at the main goal can result in great advances.

The most famous example would be Niklaus Emil Wirth’s invention of P-cocde and Pascal. For a while Pascal was the second most used programming language in the world (after C) and both Pascal and P-code have been highly influential even to this day.

complexity

UNCOL has repeatedly proven too complex for some of the best minds in computer science history.

From “UNCOL to ANDF”, Stavros Macrakis, June 29, 1993:

    A striking feature of the UNCOL papers is the number of innovations that would have been need to make it work.

    Since there was no standard character code, UNCOL would have had to define one. One proposal included over 500 characters thought useful for mathematical notation (including, e.g., black-letter subscripts). On the other hand, no thought was given to national language support.

    Bootstrapping was a novel technique. Many critics of UNCOL could not understand how it would be possible to bring an UNCOL compiler up on a new machine. Thus, much of the text of UNCOL papers describes bootstrapping and defends its feasibility.

    Inded, the very notion of an intermediate language for compilers (as opposed to ad hoc data formats to pass information between compiler passes) was rather novel.

    At the same time, programming language technology was in its infancy. Concepts that we now take for granted were novel or even non-existent, such as records, pointer types, and for that matter data types in general.

—“UNCOL to ANDF”, Stavros Macrakis, June 29, 1993

proven possible

The appendix to Alan Turing’s famous 1936 paper “On Computable Numbers, With an Application to the Entscheidungsproblem” proved that Turing Machines were mathematically equivalent to Church’s Lamba Calculus by showing that Turing’s computability and Church’s effective calculability were transformations of each other. Turing followed up with a more rigorous proof in his 1937 paper “Computability and λ-Definability”. Steven Cole Kleene proved the equivalence of lambda calculus and Kurt Gödel’s 1934 recursive functions.

The mathematical methods for describing effective calculability are Turing machines, recursive functions, and λ-definable functions. Any of these methods can be transformed into any other.

All three methods describe every possible software program.

Lambda calculus is the basis for procedural and functional programming languages.

Anything that can be expressed in any conceivable programming language can be expressed in Turing machines, recursive functions, and λ-definable functions.

Anything that can be expressed in any Turing complete programming language can be expressed in any other Turing complete programming language. And any Turing complete language can express anything that can be expressed in any programming language that is not Turing complete.

The expressions may be exceedingly clumsy, such as trying to do complex math in COBOL or general expressions in BASIC.

Note that while any Turing complete language can express any other Turing complete language, to be useful, a language should also have the ability to directly handle indirection (pointers) and the bit-by-bit representations of data.

Therefore, an UNCOL is mathematically possible (even if not practical), if for no other reason than any arbitrarily chosen programming language could be used as an UNCOL.

Some modern programming languages started with compilers that produced C or FORTRAN source code before moving on to natively generated code, effectively making the C and FORTRAN partial UNCOLs.

FORTRAN, C, and a few other languages have been proposed as an UNCOL. The problem is that every high level language has “baggage” that interferes with an effective UNCOL. The source and target languages often have severe mismatches in capabilities and even have incompatible implementations of the same or similar feature. Languages in the same family group are more likely to translate easily, but languages from entirely different groups become very difficult to synchronize.

The simplest true UNCOL would be either NAND or NOR. Both NAND and NOR are logically complete (all of Boolean algebra can be derived from either). The vast majority of modern digital computers are actually collections of NAND gates (some computers using alternative materials are collections of NOR gates). One can combine the set of NAND (or NOR) gates to create any arbitrary real computer with an array of flip-flops with pre-set values containing any arbitrary program or programs. Therefore, just NAND or just NOR is a true UNCOL capable of representing any program in any programming language.

The problem with either NAD or NOR as UNCOL is the incredible lack of efficiency. The transformations to convert a massive series of NAND gates describing a computer and a program are simply too time consuming to be of any practical use.

Therefore, the goal is not to find UNCOL (I already gave you two different valid UNCOLs), but to find an UNCOL that rests at a practical level for modern technology.

suggested requirements

T. B. Steel, Jr., of System Development Corporation, Santa Monica, California, presented a paper entitled “A first version of UNCOL” in the May 9-11, 1961, western joint IRE-AIEE-ACM computer conference in which he proposed a few standards for a true UNCOL.

The following is the abstract describing his paper. The full paper can be purchased at ACM Digital Library.

Summary

    UNCOL--UNiversal Computer Oriented Language--is being designed as an empirical, pragmatic aid to the solution of a fundamental problem of the digital data processing business: automated translation of programs from expressions in an ever increasing set of problem oriented languages into the machine languages of an expanding variety of digital processing devices. By application of a program called a generator, specific to a given problem language, program statements in this problem language are transformed into equivalent UNCOL statements, independent of any consideration of potential target computers. Subsequently, without regard to the identity of the original problem language, the UNCOL statement of the problem is processed by a program called a translator, which is specific to a given target computer, and the result is an expression of the original problem solution in the machine language of the desired processor. The advantage of this apparent complication over the current procedure of employing a program called a compiler for direct transformation from problem language to machine language is evident when one examines the number of languages and machines involved and the not inconsiderable expense of translation program construction. If there are M problem languages and N machines, then M+N compilers are required and only M+N generators and translators.

    In order to arrive at sensible specifications for UNCOL, certain limitations in its scope are essential. Accordingly, UNCOL is designed to cope with only those problem language and machine language characteristics that can reasonably be expected to enjoy general use in the next decade. Any broader approach shows promise of leading to elegant, impractical results.

    A glance at the preliminary specifications for UNCOL shows a language akin to a symbolic assembly language for a register-free, single address, indexed machine. The specific commands are few in number and simple in construction, depending on a special defining capability for the expression of more elaborate instructions. The data referencing scheme is complex, allowing the application of a signed index to the primary address, and permitting both the primary and index parts to be delegated to an indefinite level.

    Each item of data, either input or calculated, must be described as to type, range precision and the like by special data descriptive syntactical sentences in the language. These descriptions, additionally, provide information concerning ultimate storage allocation as well as indicators of contextual meaning for the explicit commands.

    Supplementary to the instructions and data descriptions are certain declarative sentences, inessential to the explicit statement of the problem solutions being translated, designed to provide information useful to the translator in the introduction of computational efficiency into the object program.

Introduction

    One of the harsher facts of life in the data processing world, against which none interested in the effective operation of a digital computing installation dare wear blinders, is the existence of pressure toward diversity. Basic hardware always appears in a variety of shapes, sizes, and configurations. New machines arrive on the scene with disconcerting regularity. Since 1952 the number of different types of computers built each year has oscillated about a mean of thirty, and subsequent to 1955 better than sixty per cent of these machines have been commercially built, general purpose devices available to any and all having the inclination and the money.¹, It follows that a problem of steadily increasing magnitude is found in the question of inter-machine translatability of programs.

    At the same time, an expanding frontier of applications as well as growing sophistication of machine language has led to the proliferation of the "easy-to-code" problem oriented languages--POLs. Considerable difficulty is attendant to the employment of these languages in view of the effort necessary to maintain an adequate supply of current tools. It is well known that the task of producing translations from problem oriented languages to real machine languages is far from trivial. Until quite recently this translating was done by people--we call this "machine language programming." Now, however, with the exception of a few special cases not relevant to this thesis, the responsibility for performing these translations is being assumed by automation. As a result, requirements appear for processing programs-- compilers--that are both expensive and time consuming to produce.

    This capital investment in a translation program would be well advised were the concern solely for one problem oriented language and one machine language. However, as indicated above, there is a large variety of machine and problem languages. Thus the required investment increases multiplicity--one compiler for each pairing of problem language and machine language. In addition, the first derivative of development in both machines and problem languages is sufficiently positive that there is a strong tendency toward obsolescence prior to use and the day is not yet with us when programmers will write compilers on lunch hour.

using existing languages

Several existing languages (especially C) have been proposed as an alternative to UNCOL.

It has been common for the first experimental attempts at a new language to use an existig language as an initial output target of a compiler. This works fine while the language is still trivial, but becomes a problem as the language matures. Eventually any serious language has to stop using C, FORTRAN, or other existing language as its output target.

From “UNCOL to ANDF”, Stavros Macrakis, June 29, 1993:

    A different approach to intermediate languages for distribution is the use of existing programming languages.

    The advantage is that they already have compilers on many machines, and if the source language is the same as the chosen distribution language, then the front end becomes trivial.

    However, there are disadvantages both on the producer (front-end or preprocessor) and installer (back-end) sides.

    The front end must translate source language semantics into the intermediate language. Sometimes, this is straightforward. But more often, there are hidden complications. For instance, languages’ treatment of loop termination conditions may be different. More subtly, languages’ treatment of variable values after error conditions may be different. Often, a language feature available in the source language is not available in the target, and must thus be emulated. Such an emulation may mean premature implementation decisions. Also, machine-dependent implementation techniques not used by the IL’s compiler are not available to the source language translator. These complications make producer design more difficult, reduce the quality of the installed code, and favor the particular language over others.

    The back end must translate the intermediate language into machine language. For the result to be predictable, the intermediate language must be fully specified in a portable way. This means not only specifying the semantics independently of the machine, but also providing a complete and portable set of environment inquiries, which can be used either at installation time or at runtime. Full specification and complete environment inquiries are found in few languages.

    Many compilers do not handle program-generated code well. Program- generated code often contains peculiar constructs and often exceeds compilers’ capacity limits. For that matter, compilers are tuned to provide good code for typical hand-written programs, and not necessarily for translations of programs written in other languages.

    Another important objection to using existing languages is the problem of reverse engineering. If protection of proprietary programming is required, the front end is no longer trivial even if the source language and the intermediate language are the same, since it must obscure or ‘shroud’ the source code.

—“UNCOL to ANDF”, Stavros Macrakis, June 29, 1993

FORTRAN

From “UNCOL to ANDF”, by Stavros Macrakis, June 29, 1993:

Fortran has often been used in the past as an intermediate language because of its simple semantics, relatively good standardization, and wide availability. Nowadays it would usually not be considered because of its lack of such fundamental features as records and pointers and weak support for character manipulation.

But writing portable Fortran is difficult. Indeed, such major producers of portable Fortran as IMSL and NAG have extensive software toolkits to support the process (cf. [Boyle77]). Depending on such elaborate preprocessing defeats the purpose of a standard intermediate language.

—“UNCOL to ANDF”, Stavros Macrakis, June 29, 1993

C

From “UNCOL to ANDF”, by Stavros Macrakis, June 29, 1993:

    Many language implementations generate C as an IL, both for C extensions (e.g. C++) and other languages (Ada [Meridian], Cedar, Eiffel, Modula-3,¹ Pascal [Bothe89], Sather [Lim91], Standard ML [Tarditi91]).

    1. cf. the discussion about the use of C as an intermediate language for Modula-3 compilers on the comp.compilers newsgroup. David Chase’s remarks <1990Aug14.163258.2094@esegue.segue.- boston.ma.us> are particularly pertinent.

    C is widely available and generally compiles into efficient code. Moreover, C’s low-level pointer constructs and weak typing allow easy emulation of many other languagesÕ constructs (e.g. passing parameters by reference).

    C is a particularly appropriate intermediate language for prototyping (in the absence of a standard intermediate language). In prototype implementations, portability and standardization are often not issues. But experimentation shows that good efficiency requires careful choice of idiom in C to translate source language features.²

    2. The Sather work shows this quite clearly.

disadvantages of C as IL

    For an intermediate language, many areas of C semantics are implementation-dependent, that is, underdefined. For instance, there is no way of specifying integer or floating-point precision, nor for that matter of querying them.

    Other areas are overdefined; for instance, parameter passage is always by value, although reference passage or copy-in/copy-out may be more appropriate on some architectures. It is possible to implement these other mechanisms explicitly, but that precludes the installer from choosing an implementation strategy as a function of the target architecture.

    C’s data definition mechanisms are weak. There is no way, for instance, to declare a variant record with discriminant nor a variable-size array. They can of course be emulated, but this means committing to a particular implementation which may not be appropriate for the target machine.

    Many constructs needed for other languages are missing from C. Lexical scope is missing. Exceptions are missing. Safe pointers (for garbage collection) are missing. All of these can be emulated in C, but only by committing to a particular runtime model.¹ Such overspecification reduces efficiency. Also, invariants preserved by the emulation are unknown to the C compiler and thus cannot benefit optimization.

    1. Consider the programming conventions required to support safe pointers in GNU Emacs Lisp.

    For all of the above reasons, although C has been useful for prototyping extensions to itself (C++) and for producing code rapidly for other languages, C is not an ideal intermediate language.

discussion: C as a distribution format

    As an intermediate language for languages other than itself, C is not very strong. C does have a specific advantages as a distribution format, however: if source code is distributed, the mechanisms for pre-processing and machine-dependent static quantities are handled with no additional effort.

    Still, many of C’s disadvantages as an IL carry over to C as a distribution format.

    In addition, the complete semantics of a C program are not specified by the C language; parameters to linkers and loaders may change symbol interpretations.

    In principle, many of these objections could be overcome by re-engineering of compilers, careful definition of standard special-purpose datatypes (int_12, int_13, ...), standardization of linkers, and more flexible optimizers. But if standard C compilers cannot be used, why use the standard C language for a purpose for which it was not intended?

—“UNCOL to ANDF”, Stavros Macrakis, June 29, 1993

MLRISC, C--, JVM, x86

From “Compiler Design”, by Wilhelm & Maurer, 2000:

UNCOL UNiversal Communication Oriented Language. In the 50’s, computer scientists tried to solve the prolem that for m programming languages, and n machins that you want to run these languages on, you had to build m * n compilers. Huge efforts were expended to find a universal intermediate language, UNCOL. Thus you would need only m compilers to compile the source language to UNCOL, plus n compilers to compile from UNCOL to the native machine code.

Since then, this has been a holy grail for computer science, and effortss to find the UNCOL, in one form or another have been continuing. MLRISC and C-- are two such efforts (though they don’t mention the infernal acronym). The Java Virtual Machine instructions may well be a candidate, having a lot of languages implemented to compile to it already, and others argue that the x86 instruction set is a de facto UNCOL.

LISP

From “UNCOL to ANDF”, by Stavros Macrakis, June 29, 1993:

    Sometimes Lisp is suggested as a universal intermediate language. Indeed, ANDF has some superficial resemblances to Lisp.

which one?

    There are apparently three different things meant when people speak of using Lisp as an intermediate language:

Using a fully parenthesized notation.

Using traditional dialects of Lisp.

Using Scheme as an object-oriented language.

fully parenthesized notation

    The major classes of intermediate languages are linear (triples, quadruples, tuples), tree-structured, and graph-structured. In all cases, a key decision is the actual operators used and their exact semantics.

    ANDF is in fact a tree-structured language. Its operators have been carefully chosen to be architecture- and language- neutral.

    Lisp is tree-structured as well, but its particular set of operations is specific to Lisp semantics. Replacing the operators with another set of operators would result in a different tree-structured language, not Lisp.

using traditional dialects of Lisp

    Lisp, as a programming language developed for human use, shares several of C’s shortcomings as an intermediate language, in particular underspecification (implementation-dependence) for some constructs, and overspecification for others.

    An example of underspecification is the lack of a machine-independent way of specifying the range of integers. As for overspecification, Lisp parameter passing is by value for certain primitive types and by sharing for composite types. Other languages may require different semantics. On a more practical level, no commercial Lisp compiler allocates composite objects (records, arrays) on the stack.

    So using an existing dialect of Lisp would require radical reworking both of the language and of its compilers.

Scheme-like object orientation

    Scheme used as an object-oriented language has several good properties: it is well-defined, it is abstract and thus uncommitted to a particular implementation style, it has very general control structure.

    However, it has two fatal flaws: lack of concrete data typing facilities and requirement for garbage collection. Also, none of its implementations are as efficient as implementations of traditional sequential programming languages.

—“UNCOL to ANDF”, Stavros Macrakis, June 29, 1993

UML as UNCOL

for

From “UML is UNCOL”, Sunday, 16 March 2003:

Upcoming release of UML 2.0 will generate interest in “model driven architecture” or MDA. MDA has to be one of the most ambitious ideas from software engineering. It envisions development process as creationn and refinement of models expressed in UML. Architects will provide UML models as input to new tools that support UML 2, and the tools will generate code for implementation technologies like J2EE, or .Net. In other words, UML 2 and its tools are programming languages and their compilers taken to higher lvel of abstraction to bridge the gap between business and IT. Right model means quality software. UML is UNCOL.

against

From the 2003-09-13 Update to the above:

Dave Thomas’ “UML - Unified or Universal Modeling Language?” helps us build resistance to seduction of top-down mindset. Ted Neward also points out a few pitfalls of “model-first” mentality. We welcome reader’s models on the relation between this kind of ambitious deductive reasoning and confusion of nomenclature and phenomena.

Martin Fowler (who should get credit for making UML useful and accessible through his books and articles) argues that MDAs’ claim of platform independence is baseless.

ANDF

ANDF has been suggested as an alternative to UNCOL. As recently as 2013, Wikipedia was quoting from a promotional paper on ANDF (also quoted on this web page) claiming that UNCOL was no longer needed and then strongly implying that ANDF was a true UNCOL solution.

UNCOL was an ambitious effort for the early 1960s. An attempt to solve the compiler-writing problem, it ultimately failed because language and compiler technology were not yet mature. In the 1970s, compiler-compilers ultimately contributed to solving the problem that UNCOL set itself: the economical production of compilers for new languages and new machines.

UNCOL is sometimes used as a generic term for the idea of a universal intermediate language. The Architecture Neutral Distribution Format is an example of an UNCOL in this sense.

The quoted paper (“UNCOL to ANDF”, by Stavros Macrakis) was intended as promotional material to convince the world that ANDF was going to be a reasonable substitute for UNCOL.

Contrary to Wikipedia’s claim, ANDF was a distribution format, not a language. Further, ANDF was not universal. ANDF was specificalyl limited to the standard flavors of UNIX available at the time. ANDF was only intended to allow for distribution of open source software across different underlying processors, but all running a POSIX-compatible version of UNIX.

From “UNCOL to ANDF”, by Stavros Macrakis, June 29, 1993:

    UNCOL was an ambitious effort for the early 1960’s. An attempt to solve the compiler-writing problem, it ultimately failed because language and compiler technology were not yet mature.

    In the 1970’s, compiler-compilers ultimately contributed to solving the problem that UNCOL set itself: the economical production of compilers for new languages and new machines. Although their intermediate languages were fairly invariant by language and machine, this was not their principal goal.

    With the growth of open systems, distribution of portable programs became more important. Neither UNCOL nor compiler-compiler technology had addressed this issue. But a standard intermediate language would permit true open systems, where programmers could choose their language independently of the implementation platform, and hardware vendors could choose their hardware implementation independently of the installed base of program binaries.

    Programming languages solved part of the problem: Fortran and C were often used as intermediate languages. But they did not solve it all: neither was really suitable as an intermediate language.

    A closer analysis of the specific needs of portable programs showed that particular mechanisms were essential. By designing an intermediate language from scratch which took account of these requirements, ANDF succeeds as a universal intermediate language.

    UNCOL was the first step in three decades’ work in software portability and compiler design whose culmination is ANDF.

—“UNCOL to ANDF”, Stavros Macrakis, June 29, 1993

The ANDF project broke up shortly after the promotional paper was written.

ANDF started having problems expressing a wider range of high level language features and approaches.

Several of the companies that were part of the ANDF effort did continue their own indpendent efforts for their own systems, but within a few years all of those efforts shut down.

still useful concept

It is my personal claim that an UNCOL would still be extremely useful.

One particularly interesting and useful application would be for a universal driver system. Hardware manufacturers currently face the decision of how many and which operating system/processor combiantions to support when releasing device drivers for their new (or existing) hardware.

A specialized driver compiler tool could be made available for as an open source project (a project I’d personally enjoy building). Hardware manufacturers could write their device drivers once for this universal driver software. Operating system creators (both proprietary and open source) could mount the tool on their system (and tweak the tool for the needs of their OS). End users could run the distributed device driver software through the tool for their oeprating system and then install the resulting object code into the appropriate location (thiese steps can be automated for the general user). The new device then works. Everybody is happy.

This would be one case where even the most secretive company would want the widest distribution of their device drivers and would be very happy to release the source code for their device drivers if it creates a low cost method for them to make their hardware available for any operating system/processor combination.

Also, a true working UNCOL is one of the obvious tools for building the more general purpose capability of automatically translating between high level languages, making life much easier for multi-language programming teams (whether the multiple languages are used simultaneously or serially because of changes of language through legacy code).

legacy code problem

A successful solution to UNCOL has potential uses in at least partial solutions to the legacy code problem, especially if extended into a universal language translator.

In 2009, Google studied the lehacy code problem and estimated that it would take a minimum of tens of trillions of dollars if every company in the world attempted to solve their legacy code problem simultaneously.

And because of the inherent bugs and problems in large scale programming projects, Google also estimated that the costs would put at least half of the world’s companies out of business before they could successfully complete the solution.

a different approach
meta-assemblers

I claim that a different, but related project from the early days of computing offers a valid solution.

Until the 1980s to 1990s, assembly language was commonly used. This was because processors were so limited in speed and power, that it was often necessary to resort to assembly language to get realistic performance.

Well written hand-coded assembly language typically runs at least four times as fast the equivalent compiled version (regardless of which high level language is compiled).

Although there are still specialized uses for assembly language, it is almost never used because processors have gotten so fast that the inefficiencies of compiling from high level languages are irrelevant.

The reason I mention the history of assembly language was that there was also the idea of the meta-assembler, one generic assembly language that could be used as a substitute for learning a different assembly language for each processor.

It turned out that the final object code generated by any of the meta-assembler attempts were approximately of the same efficiency as the output of compilers for high level languages.

Therefore, the metaassemblers had no advantages over high level languages, but lacked the many benefits of high level languages. Kind of a worst possible combination of the disadvantages of assembly and high level languages without the benefits of either.

But now that the problem of efficiency has been solved by brute force power and speed, a generic asesmbler does have advantages as the basis for a valid solution to UNCOL.

applying this idea

Because all compilers eventually have to map the original high level language source code to some kind of executable (actual processor object code or specialized byte codes), there is no significantly different amount of work in the choice of the target output code.

Compiling to a general purpose generic assembly language is approximately the same amount of work as compiling to any one real processor or any one byte code. This is where the UNCOL advantage shines. Write one front end per high level language to this proposed UNCOL and then write one backend per real processor.

If we have a small set of basic operations, we have enough generic assembly to build any software in a practical manner. This small set will include such things as the basic logical operations (AND, OR, XOR, NOT, etc.), shift/rotates, branching, subroutines, integer add, integer subtract, direct memory/register access, indirection, and definition of data types from bit string components.

The goal in the small set of basic operations is true universality, without locking in any baggage from any specific real world processor.

Each of these basic operations would be generic versions. Attributes would modify the generic operation to lock it into specifics. As an example, the basic add instruction can have attributes that identify things like the size of the data being added (byte, word, longword, specific bit width, etc.), the format of the data (signed integer, unsigned integer, BCD, even floating point formats), and specialized options (such as saturation vs. wraparound, special signals for overflow or underflow, etc.).

When compiling from a high level language to UNCOL, we would only specify information that is required by either the high level programming language or by the software the programmer has written and leave as many decisions as possible for flexibility on the backend of the compiler to take best advantage of each processor’s unique capabilities.

More advanced generic assembly language operations can be built by writing routines from just the basic building blocks. Each of those routines can then also be used as a building block (giving extensibility in a method similar to Forth, but without an interpreter).

Once each of the advanced generic assembly language instructions has source code in the lower level basic generic instructions, then the authors of the backend of compilers have the option of diretcly applying the advancced generic instruction (especially in the case where it maps well to their target processor) or using the source code inline or as a subroutine or even modifying the standard source code for a version that better matches their particular target processor.

A good set of higher level assembly language constructs is extremely useful for converting from UNCOL to real processors. As one example, multiply is a common instruction on modern processors. While we can create correct code using shifts and adds, it makes a lot more sense to be able to easily identify the multiply and replace it with a single machine instruction than to insert a shift and add routine.

A complete set of advanced assembly language constructs are therefore an important part of an UNCOL solution. To guarantee universality, for every advanced assembly language construct we must have a corresponding library routine made of basic assembly constructs. This backup defines the meaning of the advanced assembly language construct and provides a reliable way to perform the advanced assembly language construct on a processor that lacks that particular instruction.

Interestingly, we can have multiple versions of library routines for a particular advanced asembly language construct. The extra versions allow for choice in which of the possible available solutions best maps to a particular processor.

The same approach for defining and building advanced assembly constructs can be used for high level language features and concepts.

For each high level language, each concept and feature can be mapped to any combination of the basic and advanced generic assembly language. We end up with a library of language specific operations. This follows the example of C placing I/O and many other operations that vary by end processor and operating system into standard library routines.

These higher level language specific operations are written exactly once per language, preserving the goals of UNCOL.

It would be useful to identify higher level concepts that are shared by several languages, even if the sharing is limited only to a family of languages. Some of these concepts can be turned into optional generic operations. This would speed up the writing of compilers for languages that are similar to other languages.

human readability

Past attempts at UNCOL have tried to create a programming language that humans can and will use to write software.

I claim that there is no need for any human to directly write in UNCOL. The language only needs to be written and read by software (compilers and assemblers).

Nonetheless, I think it would be useful for the UNCOL to be designed so that it is easy for humans to read.

UNCOL can be very verbose specifically for the purpose of making it easy for human programmers to read and understand, even though verbose programming languages are typically rejected by programmers (who prefer the least amount of typing). If there is no intent for any human programmers to directly write in UNCOL, we become free to be as verbose as is necesary.

Our real goal is to make sure that it is easy for a machine to translate from any arbitrary source language to UNCOL and to make sure it is easy for a machine to translate from UNCOL to any arbitrary target processor.

Because our goal is to facilitate machines rather than humans, we can ignore anything that is designed for a human to read or write UNCOL. I’d like to preserve human readability, but that is a secondary goal.

That said, we do need to preserve the ability for a human creating either the front-end or back-end of a compiler to understand UNCOL well enough that he or she can create correct working translators.

classical and quantum complete

A modern version of UNCOL needs to be quantum complete.

The NAND gate is classically complete. The NOR gate is classically complete. The Toffoli (controlled-controlled-NOT) gate is classically complete.

The Toffoli gate along with π/2 rotations about the x and z axes make a quantum universal set. The Toffoli gate by itself is the smallest universal, reversible clasical operation in quantum computation. Also, the Toffoli and Hadamard gates constitute a universal set of quantum gates. Also the Hadamard, Phase, and C_NOT gates are a universal quantum gate set. The Identity function allows us to leave some qubits alone during each operation.

Note that any universal quantum gate set can efficiently be implemented with any other universal quantum gate set, so we can freely use any of our choice with confidence that we have an efficient transposition to any universal quantum computer.

It might be reasonable to make the quantum complete set the base operators for formally defining this proposed UNCOL solution, as that would make the UNCOL appropriate for both quantum and classical computing.

The quantum complete defintion of an UNCOL will tend to be cumbersome for the current widespread clasical computers, meaning that a pure classical implementation should exist alongside the quantum definition.

concept and implementation

The most basic concepts needed for a successful UNCOL are the equivalents of the basic logical operations (AND, OR, XOR, NOT, etc.), shift/rotates, branching, subroutines, integer add, integer subtract, direct memory/register access, indirection, and definition of data types from bit string components. All of this can be generalized intot he quantum realm with just the Toffoli gate, and the addition of the Pauli-X, Pauli-Y, Pauli-Z, and Haramard gates extend the set into quantum computing.

Everything else can be built up from these basic components.

Just as the C programming language simplified things by taking I/O and other common operations and moving them out of the language itself to a library subroutines, we can define useful higher level concepts in the set of minimal basic concepts.

In the worst case, we simply replace a concept with the library implementation. At one time this would be too inefficient, but with modern processors, this is no longer an impediment.

Using another example, Apple created the Standard Apple Numerics Engine (SANE) to provide a rich set of floating point operations to the 6502 and original 68000 processors, neither of which had hardware floating point. Later when the IEEE standard for floating point became common on processors, Apple abandoned SANE and made use of the hardware floating point.

In the worst case, we can create a concept-implementation pair for each feature of each high level language. Ideally we find the commonality between multiple languages, but this is not a requirement for success. The implementation in low level generic constructs can be used to create working object code for any target processor. This work need only be done once for each high level langauge.

By maintaining the concept-implementation sepeartion, we leave open the possibility for the back end of the compiler to use custom features of the target procesor for a better or more efficient implementation.

20 basic logical operations
truth tables

There are four binary truth tables (always true, logical negation, logical identity, always false) and 16 binary truth tables (tautalogy TRUE, contradicton FALSE, logical conjunction AND, nand, logical disjunction OR, nor, exclusive disjunction XOR, xnor, A inplies B, B implies A, A doesn’t imply B, B doesn’t imply A, A, B, not A, not B).

There are four possible logical operations on a single bit (unary connectives): always true (ignoring its initial value), stay the same, NOT (inverting the bit), and always false (ignoring the initial value).

There are 16 possible logical operations on two bits (binary connectives or dyadic connectives). two of these (always true and always false) are repeated from the set of single bit operations. The 16 basic dyadic connectives are:

Contradiction (always FALSE)
Tautology (always TRUE)
Proposition P
Negation of P
Proposition Q
Negation of Q
Conjunction (AND)
Alternative denial (NAND)
Disjunction (OR)
Joint denial (NOR)
Material nonimplication
Material implication (if...then)
Converse nonimplication
Converse implication
Exclusive disjunction (XOR)
Biconditional (NXOR; IFF, if and only if; equivalence)

state machine

State machines are fairly common. It would be useful to include compiler tools to identify when high language constructs are being used to create state machines and then translate those portions of high level code into special state machine descriptors in UNCOL. This will allow greater flexibility to the back end of the compiler.

Note that it was proven in the 1960s that a Turing machine with just two internal states and just two symbols (the simplest possible Turing machine) is not universal (“Turing complete” or able to perform any calculation), although it is sufficient for producing either one third or two thirds. Marvin Minsky of M.I.T. constructed a universal Turing machine (which is Turing complete) with seven states and four symbols. In the 1990s, mathematician Stephen Wolfram (developer of Mathematica) created a universal Turing machine wih two states and five symbols. In his 2002 book A New Kind of Science, Wolfram proposed a Turing machine with two states and three symbols, known as the Wolfram 2,3, but could not prove that it was Turing complete. Alex Smith, a student at the University of Birmingham, England, published a 50 page mathematical proof that the Wolfram 2,3 is universal. That proof is available at www.wolframscience.com/prizes/tm23/

Hardware Defintion Language (HDL)

While we are at this low level, I will just mention both VHDL and Verilog, as they are both robust existing languages for describing hardware.

Both VHDL and Verilog have some higher level constructs. We aren’t interested in those. And the primary target level of a proposed UNCOL is at a higher level than hardware description. Still, there are constructs in both languages worthy of consideration for an UNCOL.

Some of the topics to include (at the moment, these are more of laundry lists of things to consider rather than a real list):

Verilog: posedge, negedge, initial, always, forever, resgiters, nets (wire, wand, word, tri0, supply0, trireg, tri, triand, trior, tril, supplyl), blocking assignment, nonblocking assignment, gate types, MOS, bidirectional switches, gate delays, fork, join, bitwise operators, logical operators, reduction operators, arithmetic operators (two’s complement), relational operators, shift operators, concatentation operators, replication operators, conditional operators, four-valued logic (0, 1, Z, X). We are not concerned about system tasks other than $time.

VHDL: levels of abstraction (behavioral, structural, physical), mode (in, out, buffer, inout), type (bit, bit_vector, boolean, character, integer, natural, positive, real, time, std_logic, std_ulogic, std_logic_vector, std_ulogic_vector), nine level logic (U, X, 0, 1, Z, W, L, H -), generic, architecture, behavioral, structural, concurrency, component, signals, enumerated types, arrays (increasing and decreasing indexes), type conversion, attributes (signal, scalar, array), operators (logical, relational, shift, addition, unary, multiplying, misc), process (sensitivity list, if, case, loop, next, exit, when, while loop, for loop, wait, wait until, wait for, wait on), null, dataflow, concurrent signal assignment, conditional signal assignment, selected signal assignment, component declaration, component instantiation and interconnections.

SR and derivatives

The SR (Synchronizing Resources) language was a research language that had both a very concise syntax and support for the major methods of concurrency, features that will make it ripe for raiding.

Some features (again a laundry list for future work): resources, globals, operations, procs, procedures, processes, co statements, local procedure calls, remote procedure call, call.send.forward invocations, receive and input statements, rendevous, message passing, dynamic process creation, multicast, semaphores, shared memory, virtual machines.

The MPD language is a variation of SR that uses the same intermediate form and run-time system as SR.

The JR programming language brings SR’s concurrency model to Java.

The FT-SR programming language is a derivative designed for constructing fault-tolerant distributed systems.

machine code

Rather than using any actual processor’s machine code, a successful UNCOL might use the ideas of a pseudo assembler and overloading of concepts.

Decades ago I was one of the programmers ridiculing the idea of a universal assembler (a generic assembler that then gets translated into a real assembler). At the time there was the problem that converting from a universal assmebler to a specific real assembler introduced the same inefficiencies as the use of a high level language, but failed to provide the ease of use of a high level language. The universal assembler combined the disadvantages of the two approaches and cut out the advantages of each.

But we have long since passed the point where the inefficiencies of a high level language are so trivial as to not matter. The universal assembler has become a reasonable choice and works great as the basis for an UNCOL.

My proposal is to have a pseudo assembler. Like the more general case of pseudo code, there is no intent to fix the code to all of the specific details, but rather to record the logical steps. The details can then be filled in by the back end of the compiler.

Another problem with the attempts at a universal assembler were the question of overspecification or underspecification. If the universal assembler was limited to a very small set of basic instructions, then the translation to processors with more advanced features will often be blocked from use because it will not be easy to identify the higher level concepts. If the universal assembler used a large set of instructions, then the translation to less powerful processors will suffer.

I propose overloading concepts, by including both an implementation using a very small set of instructions and the marking of groups of instructions that are an implementation of higher level concepts. Where appropriate, the back end of the compiler can use the higher level concept markings as the source of translation rather than the more detailed version specified with the small set of low level instructions.

This overloading approach also avoids the problem of a large number of special case concepts becoming essentially just a different presentation of all of the high level languages. Simply moving the problem to a new location (from a bunch of high level languages to a bunch of concepts in a single language) is an illusion, not an actual solution. Having both the higher level concepts and low level implementations means that a back end translator can always use the low level implementation and only use those higher level concepts that are reasonably appropriate for the target processor.

A glance at the preliminary specifications for UNCOL shows a language akin to a symbolic assembly language for a register-free, single address, indexed machine. The specific commands are few in number and simple in construction, depending on a special defining capability for the expression of more elaborate instructions. The data referencing scheme is complex, allowing the application of a signed index to the primary address, and permitting both the primary and index parts to be delegated to an indefinite level.

—“A first version of UNCOL”, T. B. Steel, Jr., 1961

Just as we can describe the logic circuits necessary for any binary digital computer, we can identify (with numbers/names) any machine code instruction from any binary digital computer and then use those instructions as part of UNCOL.

Any existing or possible binary digital computer’s instruction set is a valid UNCOL, but has the well-documented problems of distortions and other impediments for universality.

Therefore we want a very generic set of machine language instructions as the basid for a practical UNCOL.

In practice, we have more flexibility than is obvious.

Some programs simply can’t be compiled to run on all processors. Many modern programs are way to big to work on the first few decades of computers. Only certain kinds of programs can be efficiently compiled to graphic processing units.

While I encourage the use of the simplest instructions/operations possible, we can include very advanced instructions/operations in UNCOL because those more advanced instructions/operations can be compiled to any more simple computer via table lookup, subroutine library, or other simple methods.

Knuth’s MMIX

Knuth’s MMIX is a good starting point. We would want to remove the connections to things such as a specific register architecture, specific data sizes, and data types.

Also, by Knuth’s own admission, some of his MMIX instructions are beyond the built-in capabilities of many existing modern computers. Some of these items might be avoided so as to allow for programs to be easily implemented on existing computers.

It should be obvious by now that the basic 20 operations and the basic signals can be combined to create an infinite number of valid and approximately equivalent versions of UNCOL.

If we are careful to make sure that we can easily map between the UNCOLs in a pracitical and efficient manner, then there may be real advantages to having multiple UNCOLS simultaenously in use.

data

In addition to portions of UNCOL describing operations (instructions), we also need portions of UNCOL that describe the data.

Each item of data, either input or calculated, must be described as to type, range precision and the like by special data descriptive syntactical sentences in the language. These descriptions, additionally, provide information concerning ultimate storage allocation as well as indicators of contextual meaning for the explicit commands.

—“A first version of UNCOL”, T. B. Steel, Jr., 1961

Any specifications about data that are not already made in either the original source code or the original source language will be deferred. It is important for UNCOL not to add any extra specifications about data. Anything unspecified can be determined at the time of binding to a specific object code.

In a normal assembly language data storage is defined with directives rather than instructions.

The normal datat storage directives are aimed at the assignment of memory addresses and the bit strings needed for various fundamental data types (primarily data types supported directly by hardware).

We need the additional flexibility to define high level composite data structures and to specify various storage methods (stack frame, register, etc.).

Because we will leave register assignment to the compiler back end, we can provide an “infinite” (unbounded) number of registers and can have any arbitrary register type. We need to include the ability to specify machine specific special purpose registers, whether located inside the main processor core or located in external devices.

supplementary

Supplementary instructions should be avoided.

Supplementary to the instructions and data descriptions are certain declarative sentences, inessential to the explicit statement of the problem solutions being translated, designed to provide information useful to the translator in the introduction of computational efficiency into the object program.

—“A first version of UNCOL”, T. B. Steel, Jr., 1961

parse trees

A useful version of UNCOL may include a text based representation of a parse tree. But instead of using a specific parse tree designed for one specific programming language, we need a more generalized version of a parse tree to be truely universal.

A UNCOL called CAT (Common Abstract Tree Language) was developed at the University of Kiel in West Germany. It was bought by the Norwegian Computer Company Norsk Data, and used to make "real" compilers that were used in "real" products. There were front-ends for pascal, c, and BASIC (Fortran was planned but never developed); back-ends for NS16000, M680xx, ND-100, ND-500, ND-5000 (Norsk Date CPUs( and a Siemens machine I’ve forgotten the name of. Reinhard Voeller is still with Norsk Data in Kiel, Uwe Schmidt is professor at the FH in Wedel, in Germany.

—“Summary of UNCOL references”, Comp.compilers newsgroup posting, Debora Weber-Wulff, 12 December 1991

TCOL was supposed to be a family of tree-based intermediate languages, each specialized, but the whole family meant to be universal.

—“Summary of UNCOL references”, Comp.compilers newsgroup posting, David Lamb, 12 December 1991

If we intend to use UNCOL for the more general purpose problem of automated conversion between high level languages, then we will certainly want to drag around a parse tree. This will allow us to restore comments, indentation (and other white space), identifier names, and other information that helps a human read and understand source code. Ideally, if a piece of source code is translated from the high level language into UNCOL and then translated back into the same (original) high level language, we really want the piece of source code to stay the same.

As Microsoft said, intermediate languages are efficient because “by delaying the commitment to a specific native platform as much as possible, the execution platform can make optimal use of the knowledge of the underlying machine, or even adapt to the dynamic behavior of the program.”

building an actual UNCOL langauge

A couple of quick examples on how an actual UNCOL language would be built using the principles outlined above.

ADD (example)

The vast majority of processors have a group of ADD instructions. Individual instructions in the group vary by things like integer or floating point or BCD, signed or unsigned (for integer ADDs), and data size (single or double floating point, IEEE or proprietary floating point, byte word, long word, etc.).

Additionally, there are differences in what happens on overflow or underflow, with general purpose processors tending to wrap-around and set a flag, while digital signal processors tending to saturate.

To add to the confusion, some high level languages automatically or at programmer option extend beyond the normal sizes of data. This is often called arbitrary precision arithmetic and is typically implemented with a arbitrary precision arithmetic package or library.

With the proposed UNCOL solution, we want to hold off making any implementation decisions as long as possible, preferably deferring these decisions to the compiler backend for best match with each target processor.

We will record specific details set by the programmer or the high level language with attributes tagged to either the operation or the data.

Often there will be a need for conversions of data types. These conversions need to be recorded. If the data type of the operation has not been specified, then we may be able to leave the decision of the data type of the operation (and any conversions needed) deferred to the backend of the compiler.

To summarize, we’ll have a single ADD command/instruction/operation in this UNCOL, and will specify the details as attributes to the data or the operation or some combiantion of the two.

AND (example)

The logical AND operation is common in computers. In some cases the AND will be specified byt he programmer, but in many cases the AND will be inserted as part of the implementation of a high level concept into low level machine code (and the proposed UNCOL is intended primarily to be a general purpose generic assembly lanbguage).

Many processors have two different ANDs.

One common kind of AND is a straight-forward bit-wise AND, matching each bit in a bit string to the corresponding bit in another bit string and resulting in a bit string.

Another common kind of AND operates on a byte, word, or other size data as a whole. In the latter case, typically TRUE or FALSE is defined as ZERO and the other option is defined as all non-zero values. But there are other possibilities, such as negative and non-negative numbers to represent the truth values.

To make things even messier, many programming languages have their own specific implied implementation of logical values. The implementation choice of the high level language may conflict with the hardware implementation of a specific target processor.

For this implementation of UNCOL, we will use a single generic AND, then specify the implementation details in attributes tagged to the operation or data.

When possible, we want the flexibility for the back end compiler to make the final decision based on the facilities of the target processor.

One noteworthy potential problem is a programmer explicitly taking advantage of his or her knowledge of the hgih level language’s implementation choices. An example, would be testing a result value arithmetically (is it negative, zero, positive, or some other arithmetic test) rather than using the high level language’s logical tests. This is extremely likely in a language like C, where there is no built-in logical or boolean data type.

data

Technically, the only data type we need is bit strings because all of the data in a binary digital computer is represented by bit strings (including bit strings of zero bit and one bit length. Realistically, this is impractical and UNCOl is only useful if it is practical (the reason we didn’t stop at an UNCOL of NAND).

We want the facility to designate the data type of data. Different data types can have a variety of special attributes.

Bit flags and bit strings.

Numbers. Integers, floating point, fixed point, fractions, complex, BCD, arbitrary precision, signed, unsigned, bit size (or possible ranges of bit size).

Characters. Collating sequence, character strings.

Pointers and references.

Various higher level concepts mapped to bit strings, such as colors, enumerations.

First-class fucntions, parametric polymorphism.

Composite types. array, record, union, variant record, set, object

Abstract. queue, set, stack, tree, graph, hash, map, smart pointer.

Possible additional data types from OpenCL.

storage

Because most storage methods and address modes can be expressed as a series of more simple instructions, we can safely implement a wide variety of address modes as possible attributes with the confidence that our concepts can esily be implemented on any processor with enough storage space and processing power.

The exception to the above is any of the register based address modes. We can leave the vast majority (possibly all) of register assignment to the back end of the compiler. We can keep track of storage without worrying about eventual use of registers.

Cache control, virtual memory control.

operations

Data movement.

Add

Subtract

Multiply

Divide, Modula, Remainder.

“Commercial machines are usually deicient in their support of integer arithmetic. For example, they almost never produce the true quotient [x/y] and true remainder x mod y when x is negative or y is negative; they often throw away the upper half of a product. They don’t treat left and right shifts as strict equivalents of multiplication and division by powers of 2. Sometimes they do not implement division in hardware at all; and when they doo handle division, they usually assume that the upper half of the 128-bit dividend is zero. Such restrictions make high-precision calculations more difficult.”, Fascile 1, MMIX, Donald E. Knuth

“Commercial machines do not perform FINT [floating integer] and FREM [floating remainder] efficiently.”, Fascile 1, MMIX, Donald E. Knuth

Square root.

Negation.

Shift.

Rotate.

Compare and Test.

Conditional Set, Conditional Zero, Conditional Branch, Probable branch conditional.

Subroutine/Function Call, Return, Save/Restore.

AND (minimum), OR (maximum), XOR (exclusive or, add mod 2), AND-NOT, OR-NOT, NAND (not and), NOR (not or), Not Exclusie Or, multiplex, sideways add

Saturating subtraction, multiple or, multiple exclusive or.

“Commercial machines do not (yet?) have the powerful MOR [mutliple or] and MXOR [multiple exclusive or] operations. They usually have half a dozen or so ad hoc instructions that handle only the most common special cases of MOR.”, Fascile 1, MMIX, Donald E. Knuth

Stack operations. Push, pop.

Pre-operations. Preload data, prestore data, prefecth instructions, synchronize instructions and data, synchronize data, synchronize, compare and stop, load virtual translation status.

Interrupts. Trip, Trap.

Actions regarding special registers.

Input and Output. Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell.

No operations.

lambda calculus

It may be useful to include lambda calculus and simply typed lambda calculus as part of an UNCOL. The biggest disadvantage is that lambda calaculus tends to not map very well to hardware. The biggest advantage is that lambda calaclus tends to map very well to software. Also there are existing high level languages that include at least parts of lambda calculus as part of the high level language (for example, Eiffel).

We need to support the concepts of the three basic lambda terms: free and bound variables, lambda abstraction, and application.

We need to support the three kinds of reductions, alpha (α-converson), beta (β-reduction), and eta (η-conversion).

It may be useful to include the some standard defintions and combinators, including S λx[λy[λz[xz(yz)]]] substitute-and-apply-operator, K λx[λy[x]] constant function, I λx[x] identity function, B λx[λy[λz[x(yz)]]], C λx[λy[λz[xzy]]] swap, T λx[λt[z]] truth value true, F λx[λy[y]] truth value false, ω λx[xx] self-application combinator, Ω ωω self-appllcation of the self-application combiantor, Y λƒ[(λx[ƒ(xx)])(λx[ƒ(xx)])] Curry’s paradoxical combinator, Θ (λx[λƒ[ƒ(xxƒ)]](λx[λƒ[ƒ(xxƒ)]]) Turings fixed-point combinator.

graphics processing

While a graphics processor can be forced to emulate a general purpose processor, there are many cases where that emulation would run orders of magnitude slower than a regular processor.

Still, there are a substantial number of non-graphics problems that lend themselves well to the use of graphics processors.

Having primitive graphics operations available is potentially very useful for UNCOL. Graphics processors can be used a a special co-processor or emulated on a regular processor.

Graphics processors usually support parallel addition, multiplciation, and transcendental fucntions. Graphics processors typically support a matrix of data as a stream.

Some graphics operations include (some are from OpenGL): map, reduce, stream filtering, scatter, gather, sort, swizzling, writemasking, texturing, dot product (3-component and 4-component), absolute value, add, multiply, multiply and add, subtract, compare, cosine with reduction, distance vector, exponential base 2, logarithm base 2, floor, maximum, minimum, fraction, compute light coefficients, linear interpolation, scalar power function, reciprocal, reciprocal square root, sine/cosine, sine with reduction, cross product.

quantum computing

While a quantum computer can be forced to emulate a general purpose processor, there are many cases where that emulation would run orders of magnitude slower than a regular processor.

Still, there are a substantial number of problems that lend themselves well to the use of quantum computers.

Having primitive quantum computing operations available is potentially very useful for UNCOL. Quantum computers can be used a a special co-processor. Emulation on a regular processor is impractical (the universe is too small).

We need the fundamental types of qubits and registers (arrays or vectors) of qubits.

We need the quantum gates of identity transformation (I), Haramard, Pauli-X, Pauli-Y, Pauli-Z, C_NOT, Toffili (CCNOT), Fredkin (CSWAP), phase shift (S), measure.

essential parts

The following are the essential parts for building UNCOL:

Testing It is important to have as much testing as possible before the UNCOL standard is published to the public because of the well-documented problem of being unable to remove mistakes from a programming language without breaking existing software written with the language mistake.

Analysis It is important to analyze as many existing languages and processors as possible, with an emphasis on the most popular and representatives as many different families as possible. Because I suggest an extensible version of UNCOL, I recommend being very conservative in adding any required UNCOL language features, while allowing ease of defining and building optional features.

Standard The most important part is the public standard for the UNCOL programming language, probably both binary and text versions. I propose that the standard be divided into required, optional, and library parts. I recommend following the example of the C programming language of minimizing official language features and moving as much as possible to options and libraries.

Required The required parts will be divided into two parts much like Forth, with low level primatives and higher level secondaries. The secondaries are defined from combinations of primatives and other secondaries.

Optional A mechanism for defining optional parts makes the UNCOL programming language extensible, allowing it to grow with new technology and knowledge.

Libraries Most features of high level languages should be moved to libraries written with required and optional features.

Formal definition Creating a formal definition of the language, while important, can be delayed.

Publishing The official language standard and the official definition of the UNCOL programming language need to be published.

Connections There must be at least one method of connecting to the UNCOL programming language. The original proposal recommended front ends for programming languages and back ends for processors. There are many different tools implied by this approach. These tools can be divided between those that are open source, those that are closed source but free, and those that are closed source and sold or leased for a profit.

Apps Portions of the UNCOL programming language could potentially be made available as stand alone apps (free). This allows for active public use that stress tests parts of UNCOL before the entire standard has been completed.

Artificial intelligence Applying artificial intelligence to the tools can produce much more interesting tools.

Drivers It would be very useful to release a subset of UNCOL that could be used for device drivers. This would allow hardware manufacturers to release an UNCOL-encoded version of their device drivers that could then be easily compiled or interpretted to run on any processor/operating system combination.

OSdata.com is used in more than 300 colleges and universities around the world

Read details here.

Some or all of the material on this web page appears in the
free downloadable college text book on computer programming.

Tweets by @osdata

A web site on dozens of operating systems simply can’t be maintained by one person. This is a cooperative effort. If you spot an error in fact, grammar, syntax, or spelling, or a broken link, or have additional information, commentary, or constructive criticism, please e-mail Milo. If you have any extra copies of docs, manuals, or other materials that can assist in accuracy and completeness, please send them to Milo, PO Box 1361, Tustin, CA, USA, 92781.

Click here for our privacy policy.


previous page	next page

home page

one level up

special topics

peer level

programming languages

Made with Macintosh

This web site handcrafted on Macintosh computers using Tom Bender’s Tex-Edit Plus and served using FreeBSD .

Names and logos of various OSs are trademarks of their respective owners.

Last Updated: May 10, 2013

Created: June 23, 2012


previous page	next page

OSdata.com

UNCOL Universal Computer Oriented Language

OSdata.com is used in more than 300 colleges and universities around the world

general

history

complexity

proven possible

suggested requirements

using existing languages

FORTRAN

C

disadvantages of C as IL

discussion: C as a distribution format

MLRISC, C--, JVM, x86

LISP

which one?

fully parenthesized notation

using traditional dialects of Lisp

Scheme-like object orientation

UML as UNCOL

ANDF

still useful concept

legacy code problem

a different approachmeta-assemblers

applying this idea

human readability

classical and quantum complete

concept and implementation

20 basic logical operationstruth tables

state machine

Hardware Defintion Language (HDL)

SR and derivatives

machine code

Knuth’s MMIX

data

supplementary

parse trees

building an actual UNCOL langauge

ADD (example)

AND (example)

data

storage

operations

lambda calculus

graphics processing

quantum computing

essential parts

OSdata.com is used in more than 300 colleges and universities around the world

home page

one level up

peer level

UNCOL
Universal Computer Oriented Language

a different approach
meta-assemblers

20 basic logical operations
truth tables