![]() music |
![]() | OSdata.com |
The problem of a universal intermediate language for computers and the problem of transmitting messages to aliens have a lot of overlap.
UNCOL was first proposed in the early 1950s and discussed in a paper published in 1958 by Melvin E. Conway. UNCOL stands for UNiversal Computer Oriented Language. The original intent was a universal intermediate language. The source code for all programming languages would be translated into the intermediate code and the intermediate code would then be translated into the machine code for all processors. This meant only one compiler back end per processor and only one compiler front end per programming language.
UNCOL was never successfully implemented, despite the fact that some of the most famous people in the history of computers have worked on the problem.
Several governments (including the U.S. Department of Defense) have attempted a solution. Several large businesses (including IBM, Digital, and HP) have attempted a solution.
This webpage discusses the history of UNCOL and attempt to solve UNCOL, then discusses the previous failed attempts, and ends with a proposal for a working version of UNCOL.
OSdata.com is used in more than 300 colleges and universities around the worldFind out how to get similar high web traffic and search engine placement. |
The description of UNCOL is an introduction for those who are new to the concept. The beginning of the discussion of the history of UNCOL should be accessible to most readers, even novices.
The discussion of past attempts and alternatives to UNCOL should be accessible to most readers.
The discussion of my proposal for a solution is written for those who are already qualified to understand the intricacies of my proposed solution. I wont spend a lot of time trying to explain things an expert in the field already knows. This will tend to leave out the interested novice, but my goal is to communicate a proposed solution in such a manner that experts in the field can evaluate my ideas and either confirm or disprove.
I will point out that the U.S. Congress changed the patent law so that it no longer matters who created an idea first. Instead, the only legal factor is who finishes the legal paperwork first. This was designed (by lobbyists) to guarantee that poor individuals would never again be allowed to patent anything and that large corporations would be free to steal any original ideas the poor discover.
So, with my publication of my solution, the race is on to see which large corporation will be the fastest to finish the legal paperwork to own my solution.
Even with a race to create the language, it should be obvious that for the language to receive widespread use, the language itself will almost certainly have to be available for free to anyone. Even Microsoft makes their proprietary languages available for anyone. The money is in the tools to support the language.
I will suggest that because there is now a race, there are significant advantages to a corporation having access to my years of study of this solution. I know a whole lot more than I can type out in a reasonable amount of time, and because I am poor, I will gladly accept very small amounts of pay.
UNCOL was originally proposed as an intermediate language to solve the problem of realistically making every programming language available on every processor.
If there are X number of programming languages and Y number of processors, then building a compiler for each combination of programming language and processor requires X·Y different programs.
If source code for each programming language can be translated into an UNCOL and UNCOL can be translated into object code for each processor, then the work is reduced to X+Y different programs.
This is an improvement that changes a multiplicative process into an additive process.
Unmentioned in the original example is that an UNCOL could potentially be used for many additional translation problems, possibly including device drivers, file formats, CODECs, APIs, etc.
Each new additional use of UNCOL raises the number of combinations factorially. When you look at additional combinations, you are potentially reducing factorial growth to linear growth. Factorial growth surpassed the number of particles in the observable universe (estimated at 10E+80) in just 59 combinations (1.386E+80). 59 combinations is a trivial number. Much smaller numbers of combinations exceed reasonable computing time.
Unmentioned in the original example is the idea of adding processors to the input language side of the problem. Transforming existing object code into UNCOL produces a partial solution to the legacy problem.
Unmentioned in the original example is the idea of adding programming languages to the output side of the problem. This more ambitious goal has been proposed under a variety of names. Transforming existing software from any arbitrary source language into any arbitrary target language also produces a partial solution to the legacy problem, as well as dramatically assisting multi-language projects.
history: UNCOL (UNiversal Computer Oriented Language) was proposed in 1954 or earlier.
UNCOL was first proposed in 1954 (or earlier) and first discussed in print in 1958 by Melvin E. Conway (Proposal for an UNCOL, Communications of the ACM 1:3:5). UNCOL stands for Universal Computer Oriented Language. The original intent was a universal intermediate language. The source code for all programming languages would be translated into the intermediate code and the intermediate code would then be translated into the machine code for all processors. This meant only one compiler back end per processor and only one compiler front end per programming language.
UNCOL was never successfully implemented, despite the fact that some of the most famous people in the history of computers have worked on the problem.
The good news is that although nobody succeeded at building UNCOL, they generally did create useful things, in some cases some of the most important contributions to software engineering. This is a field where failure at the main goal can result in great advances.
The most famous example would be Niklaus Emil Wirths invention of P-cocde and Pascal. For a while Pascal was the second most used programming language in the world (after C) and both Pascal and P-code have been highly influential even to this day.
UNCOL has repeatedly proven too complex for some of the best minds in computer science history.
From UNCOL to ANDF, Stavros Macrakis, June 29, 1993:
A striking feature of the UNCOL papers is the number of innovations that would have been need to make it work.
Since there was no standard character code, UNCOL would have had to define one. One proposal included over 500 characters thought useful for mathematical notation (including, e.g., black-letter subscripts). On the other hand, no thought was given to national language support.
Bootstrapping was a novel technique. Many critics of UNCOL could not understand how it would be possible to bring an UNCOL compiler up on a new machine. Thus, much of the text of UNCOL papers describes bootstrapping and defends its feasibility.
Inded, the very notion of an intermediate language for compilers (as opposed to ad hoc data formats to pass information between compiler passes) was rather novel.
At the same time, programming language technology was in its infancy. Concepts that we now take for granted were novel or even non-existent, such as records, pointer types, and for that matter data types in general.
UNCOL to ANDF, Stavros Macrakis, June 29, 1993
The appendix to Alan Turings famous 1936 paper On Computable Numbers, With an Application to the Entscheidungsproblem proved that Turing Machines were mathematically equivalent to Churchs Lamba Calculus by showing that Turings computability and Churchs effective calculability were transformations of each other. Turing followed up with a more rigorous proof in his 1937 paper Computability and λ-Definability. Steven Cole Kleene proved the equivalence of lambda calculus and Kurt Gödels 1934 recursive functions.
The mathematical methods for describing effective calculability are Turing machines, recursive functions, and λ-definable functions. Any of these methods can be transformed into any other.
All three methods describe every possible software program.
Lambda calculus is the basis for procedural and functional programming languages.
Anything that can be expressed in any conceivable programming language can be expressed in Turing machines, recursive functions, and λ-definable functions.
Anything that can be expressed in any Turing complete programming language can be expressed in any other Turing complete programming language. And any Turing complete language can express anything that can be expressed in any programming language that is not Turing complete.
The expressions may be exceedingly clumsy, such as trying to do complex math in COBOL or general expressions in BASIC.
Therefore, an UNCOL is mathematically possible (even if not practical), if for no other reason than any arbitrarily chosen programming language could be used as an UNCOL.
Some modern programming languages started with compilers that produced C or FORTRAN source code before moving on to natively generated code, effectively making the C and FORTRAN partial UNCOLs.
FORTRAN, C, and a few other languages have been proposed as an UNCOL. The problem is that every high level language has baggage that interferes with an effective UNCOL. The source and target languages often have severe mismatches in capabilities and even have incompatible implementations of the same or similar feature. Languages in the same family group are more likely to translate easily, but languages from entirely different groups become very difficult to synchronize.
The simplest true UNCOL would be either NAND or NOR. Both NAND and NOR are logically complete (all of Boolean algebra can be derived from either). The vast majority of modern digital computers are actually collections of NAND gates (some computers using alternative materials are collections of NOR gates). One can combine the set of NAND (or NOR) gates to create any arbitrary real computer with an array of flip-flops with pre-set values containing any arbitrary program or programs. Therefore, just NAND or just NOR is a true UNCOL capable of representing any program in any programming language.
The problem with either NAD or NOR as UNCOL is the incredible lack of efficiency. The transformations to convert a massive series of NAND gates describing a computer and a program are simply too time consuming to be of any practical use.
Therefore, the goal is not to find UNCOL (I already gave you two different valid UNCOLs), but to find an UNCOL that rests at a practical level for modern technology.
T. B. Steel, Jr., of System Development Corporation, Santa Monica, California, presented a paper entitled A first version of UNCOL in the May 9-11, 1961, western joint IRE-AIEE-ACM computer conference in which he proposed a few standards for a true UNCOL.
The following is the abstract describing his paper. The full paper can be purchased at ACM Digital Library.
Summary
UNCOL--UNiversal Computer Oriented Language--is being designed as an empirical, pragmatic aid to the solution of a fundamental problem of the digital data processing business: automated translation of programs from expressions in an ever increasing set of problem oriented languages into the machine languages of an expanding variety of digital processing devices. By application of a program called a generator, specific to a given problem language, program statements in this problem language are transformed into equivalent UNCOL statements, independent of any consideration of potential target computers. Subsequently, without regard to the identity of the original problem language, the UNCOL statement of the problem is processed by a program called a translator, which is specific to a given target computer, and the result is an expression of the original problem solution in the machine language of the desired processor. The advantage of this apparent complication over the current procedure of employing a program called a compiler for direct transformation from problem language to machine language is evident when one examines the number of languages and machines involved and the not inconsiderable expense of translation program construction. If there are M problem languages and N machines, then M+N compilers are required and only M+N generators and translators.
In order to arrive at sensible specifications for UNCOL, certain limitations in its scope are essential. Accordingly, UNCOL is designed to cope with only those problem language and machine language characteristics that can reasonably be expected to enjoy general use in the next decade. Any broader approach shows promise of leading to elegant, impractical results.
A glance at the preliminary specifications for UNCOL shows a language akin to a symbolic assembly language for a register-free, single address, indexed machine. The specific commands are few in number and simple in construction, depending on a special defining capability for the expression of more elaborate instructions. The data referencing scheme is complex, allowing the application of a signed index to the primary address, and permitting both the primary and index parts to be delegated to an indefinite level.
Each item of data, either input or calculated, must be described as to type, range precision and the like by special data descriptive syntactical sentences in the language. These descriptions, additionally, provide information concerning ultimate storage allocation as well as indicators of contextual meaning for the explicit commands.
Supplementary to the instructions and data descriptions are certain declarative sentences, inessential to the explicit statement of the problem solutions being translated, designed to provide information useful to the translator in the introduction of computational efficiency into the object program.
Introduction
One of the harsher facts of life in the data processing world, against which none interested in the effective operation of a digital computing installation dare wear blinders, is the existence of pressure toward diversity. Basic hardware always appears in a variety of shapes, sizes, and configurations. New machines arrive on the scene with disconcerting regularity. Since 1952 the number of different types of computers built each year has oscillated about a mean of thirty, and subsequent to 1955 better than sixty per cent of these machines have been commercially built, general purpose devices available to any and all having the inclination and the money.1, It follows that a problem of steadily increasing magnitude is found in the question of inter-machine translatability of programs.
At the same time, an expanding frontier of applications as well as growing sophistication of machine language has led to the proliferation of the "easy-to-code" problem oriented languages--POLs. Considerable difficulty is attendant to the employment of these languages in view of the effort necessary to maintain an adequate supply of current tools. It is well known that the task of producing translations from problem oriented languages to real machine languages is far from trivial. Until quite recently this translating was done by people--we call this "machine language programming." Now, however, with the exception of a few special cases not relevant to this thesis, the responsibility for performing these translations is being assumed by automation. As a result, requirements appear for processing programs-- compilers--that are both expensive and time consuming to produce.
This capital investment in a translation program would be well advised were the concern solely for one problem oriented language and one machine language. However, as indicated above, there is a large variety of machine and problem languages. Thus the required investment increases multiplicity--one compiler for each pairing of problem language and machine language. In addition, the first derivative of development in both machines and problem languages is sufficiently positive that there is a strong tendency toward obsolescence prior to use and the day is not yet with us when programmers will write compilers on lunch hour.
Several existing languages (especially C) have been proposed as an alternative to UNCOL.
It has been common for the first experimental attempts at a new language to use an existig language as an initial output target of a compiler. This works fine while the language is still trivial, but becomes a problem as the language matures. Eventually any serious language has to stop using C, FORTRAN, or other existing language as its output target.
From UNCOL to ANDF, Stavros Macrakis, June 29, 1993:
A different approach to intermediate languages for distribution is the use of existing programming languages.
The advantage is that they already have compilers on many machines, and if the source language is the same as the chosen distribution language, then the front end becomes trivial.
However, there are disadvantages both on the producer (front-end or preprocessor) and installer (back-end) sides.
The front end must translate source language semantics into the intermediate language. Sometimes, this is straightforward. But more often, there are hidden complications. For instance, languages treatment of loop termination conditions may be different. More subtly, languages treatment of variable values after error conditions may be different. Often, a language feature available in the source language is not available in the target, and must thus be emulated. Such an emulation may mean premature implementation decisions. Also, machine-dependent implementation techniques not used by the ILs compiler are not available to the source language translator. These complications make producer design more difficult, reduce the quality of the installed code, and favor the particular language over others.
The back end must translate the intermediate language into machine language. For the result to be predictable, the intermediate language must be fully specified in a portable way. This means not only specifying the semantics independently of the machine, but also providing a complete and portable set of environment inquiries, which can be used either at installation time or at runtime. Full specification and complete environment inquiries are found in few languages.
Many compilers do not handle program-generated code well. Program- generated code often contains peculiar constructs and often exceeds compilers capacity limits. For that matter, compilers are tuned to provide good code for typical hand-written programs, and not necessarily for translations of programs written in other languages.
Another important objection to using existing languages is the problem of reverse engineering. If protection of proprietary programming is required, the front end is no longer trivial even if the source language and the intermediate language are the same, since it must obscure or shroud the source code.
UNCOL to ANDF, Stavros Macrakis, June 29, 1993
From UNCOL to ANDF, by Stavros Macrakis, June 29, 1993:
Fortran has often been used in the past as an intermediate language because of its simple semantics, relatively good standardization, and wide availability. Nowadays it would usually not be considered because of its lack of such fundamental features as records and pointers and weak support for character manipulation.
But writing portable Fortran is difficult. Indeed, such major producers of portable Fortran as IMSL and NAG have extensive software toolkits to support the process (cf. [Boyle77]). Depending on such elaborate preprocessing defeats the purpose of a standard intermediate language.
UNCOL to ANDF, Stavros Macrakis, June 29, 1993
From UNCOL to ANDF, by Stavros Macrakis, June 29, 1993:
Many language implementations generate C as an IL, both for C extensions (e.g. C++) and other languages (Ada [Meridian], Cedar, Eiffel, Modula-3,1 Pascal [Bothe89], Sather [Lim91], Standard ML [Tarditi91]).
1. cf. the discussion about the use of C as an intermediate language for Modula-3 compilers on the comp.compilers newsgroup. David Chases remarks <1990Aug14.163258.2094@esegue.segue.- boston.ma.us> are particularly pertinent.
C is widely available and generally compiles into efficient code. Moreover, Cs low-level pointer constructs and weak typing allow easy emulation of many other languagesÕ constructs (e.g. passing parameters by reference).
C is a particularly appropriate intermediate language for prototyping (in the absence of a standard intermediate language). In prototype implementations, portability and standardization are often not issues. But experimentation shows that good efficiency requires careful choice of idiom in C to translate source language features.2
2. The Sather work shows this quite clearly.
disadvantages of C as IL
For an intermediate language, many areas of C semantics are implementation-dependent, that is, underdefined. For instance, there is no way of specifying integer or floating-point precision, nor for that matter of querying them.
Other areas are overdefined; for instance, parameter passage is always by value, although reference passage or copy-in/copy-out may be more appropriate on some architectures. It is possible to implement these other mechanisms explicitly, but that precludes the installer from choosing an implementation strategy as a function of the target architecture.
Cs data definition mechanisms are weak. There is no way, for instance, to declare a variant record with discriminant nor a variable-size array. They can of course be emulated, but this means committing to a particular implementation which may not be appropriate for the target machine.
Many constructs needed for other languages are missing from C. Lexical scope is missing. Exceptions are missing. Safe pointers (for garbage collection) are missing. All of these can be emulated in C, but only by committing to a particular runtime model.1 Such overspecification reduces efficiency. Also, invariants preserved by the emulation are unknown to the C compiler and thus cannot benefit optimization.
1. Consider the programming conventions required to support safe pointers in GNU Emacs Lisp.
For all of the above reasons, although C has been useful for prototyping extensions to itself (C++) and for producing code rapidly for other languages, C is not an ideal intermediate language.
discussion: C as a distribution format
As an intermediate language for languages other than itself, C is not very strong. C does have a specific advantages as a distribution format, however: if source code is distributed, the mechanisms for pre-processing and machine-dependent static quantities are handled with no additional effort.
Still, many of Cs disadvantages as an IL carry over to C as a distribution format.
In addition, the complete semantics of a C program are not specified by the C language; parameters to linkers and loaders may change symbol interpretations.
In principle, many of these objections could be overcome by re-engineering of compilers, careful definition of standard special-purpose datatypes (int_12, int_13, ...), standardization of linkers, and more flexible optimizers. But if standard C compilers cannot be used, why use the standard C language for a purpose for which it was not intended?
UNCOL to ANDF, Stavros Macrakis, June 29, 1993
From Compiler Design, by Wilhelm & Maurer, 2000:
UNCOL UNiversal Communication Oriented Language. In the 50s, computer scientists tried to solve the prolem that for m programming languages, and n machins that you want to run these languages on, you had to build m * n compilers. Huge efforts were expended to find a universal intermediate language, UNCOL. Thus you would need only m compilers to compile the source language to UNCOL, plus n compilers to compile from UNCOL to the native machine code.
Since then, this has been a holy grail for computer science, and effortss to find the UNCOL, in one form or another have been continuing. MLRISC and C-- are two such efforts (though they dont mention the infernal acronym). The Java Virtual Machine instructions may well be a candidate, having a lot of languages implemented to compile to it already, and others argue that the x86 instruction set is a de facto UNCOL.
From UNCOL to ANDF, by Stavros Macrakis, June 29, 1993:
Sometimes Lisp is suggested as a universal intermediate language. Indeed, ANDF has some superficial resemblances to Lisp.
which one?
There are apparently three different things meant when people speak of using Lisp as an intermediate language:
- Using a fully parenthesized notation.
- Using traditional dialects of Lisp.
- Using Scheme as an object-oriented language.
fully parenthesized notation
The major classes of intermediate languages are linear (triples, quadruples, tuples), tree-structured, and graph-structured. In all cases, a key decision is the actual operators used and their exact semantics.
ANDF is in fact a tree-structured language. Its operators have been carefully chosen to be architecture- and language- neutral.
Lisp is tree-structured as well, but its particular set of operations is specific to Lisp semantics. Replacing the operators with another set of operators would result in a different tree-structured language, not Lisp.
using traditional dialects of Lisp
Lisp, as a programming language developed for human use, shares several of Cs shortcomings as an intermediate language, in particular underspecification (implementation-dependence) for some constructs, and overspecification for others.
An example of underspecification is the lack of a machine-independent way of specifying the range of integers. As for overspecification, Lisp parameter passing is by value for certain primitive types and by sharing for composite types. Other languages may require different semantics. On a more practical level, no commercial Lisp compiler allocates composite objects (records, arrays) on the stack.
So using an existing dialect of Lisp would require radical reworking both of the language and of its compilers.
Scheme-like object orientation
Scheme used as an object-oriented language has several good properties: it is well-defined, it is abstract and thus uncommitted to a particular implementation style, it has very general control structure.
However, it has two fatal flaws: lack of concrete data typing facilities and requirement for garbage collection. Also, none of its implementations are as efficient as implementations of traditional sequential programming languages.
UNCOL to ANDF, Stavros Macrakis, June 29, 1993
for
From UML is UNCOL, Sunday, 16 March 2003:
Upcoming release of UML 2.0 will generate interest in model driven architecture or MDA. MDA has to be one of the most ambitious ideas from software engineering. It envisions development process as creationn and refinement of models expressed in UML. Architects will provide UML models as input to new tools that support UML 2, and the tools will generate code for implementation technologies like J2EE, or .Net. In other words, UML 2 and its tools are programming languages and their compilers taken to higher lvel of abstraction to bridge the gap between business and IT. Right model means quality software. UML is UNCOL.
against
From the 2003-09-13 Update to the above:
Dave Thomas UML - Unified or Universal Modeling Language? helps us build resistance to seduction of top-down mindset. Ted Neward also points out a few pitfalls of model-first mentality. We welcome readers models on the relation between this kind of ambitious deductive reasoning and confusion of nomenclature and phenomena.
Martin Fowler (who should get credit for making UML useful and accessible through his books and articles) argues that MDAs claim of platform independence is baseless.
ANDF has been suggested as an alternative to UNCOL. As recently as 2013, Wikipedia was quoting from a promotional paper on ANDF (also quoted on this web page) claiming that UNCOL was no longer needed and then strongly implying that ANDF was a true UNCOL solution.
UNCOL was an ambitious effort for the early 1960s. An attempt to solve the compiler-writing problem, it ultimately failed because language and compiler technology were not yet mature. In the 1970s, compiler-compilers ultimately contributed to solving the problem that UNCOL set itself: the economical production of compilers for new languages and new machines.
UNCOL is sometimes used as a generic term for the idea of a universal intermediate language. The Architecture Neutral Distribution Format is an example of an UNCOL in this sense.
The quoted paper (UNCOL to ANDF, by Stavros Macrakis) was intended as promotional material to convince the world that ANDF was going to be a reasonable substitute for UNCOL.
Contrary to Wikipedias claim, ANDF was a distribution format, not a language. Further, ANDF was not universal. ANDF was specificalyl limited to the standard flavors of UNIX available at the time. ANDF was only intended to allow for distribution of open source software across different underlying processors, but all running a POSIX-compatible version of UNIX.
From UNCOL to ANDF, by Stavros Macrakis, June 29, 1993:
UNCOL was an ambitious effort for the early 1960s. An attempt to solve the compiler-writing problem, it ultimately failed because language and compiler technology were not yet mature.
In the 1970s, compiler-compilers ultimately contributed to solving the problem that UNCOL set itself: the economical production of compilers for new languages and new machines. Although their intermediate languages were fairly invariant by language and machine, this was not their principal goal.
With the growth of open systems, distribution of portable programs became more important. Neither UNCOL nor compiler-compiler technology had addressed this issue. But a standard intermediate language would permit true open systems, where programmers could choose their language independently of the implementation platform, and hardware vendors could choose their hardware implementation independently of the installed base of program binaries.
Programming languages solved part of the problem: Fortran and C were often used as intermediate languages. But they did not solve it all: neither was really suitable as an intermediate language.
A closer analysis of the specific needs of portable programs showed that particular mechanisms were essential. By designing an intermediate language from scratch which took account of these requirements, ANDF succeeds as a universal intermediate language.
UNCOL was the first step in three decades work in software portability and compiler design whose culmination is ANDF.
UNCOL to ANDF, Stavros Macrakis, June 29, 1993
The ANDF project broke up shortly after the promotional paper was written.
ANDF started having problems expressing a wider range of high level language features and approaches.
Several of the companies that were part of the ANDF effort did continue their own indpendent efforts for their own systems, but within a few years all of those efforts shut down.
It is my personal claim that an UNCOL would still be extremely useful.
One particularly interesting and useful application would be for a universal driver system. Hardware manufacturers currently face the decision of how many and which operating system/processor combiantions to support when releasing device drivers for their new (or existing) hardware.
A specialized driver compiler tool could be made available for as an open source project (a project Id personally enjoy building). Hardware manufacturers could write their device drivers once for this universal driver software. Operating system creators (both proprietary and open source) could mount the tool on their system (and tweak the tool for the needs of their OS). End users could run the distributed device driver software through the tool for their oeprating system and then install the resulting object code into the appropriate location (thiese steps can be automated for the general user). The new device then works. Everybody is happy.
This would be one case where even the most secretive company would want the widest distribution of their device drivers and would be very happy to release the source code for their device drivers if it creates a low cost method for them to make their hardware available for any operating system/processor combination.
Also, a true working UNCOL is one of the obvious tools for building the more general purpose capability of automatically translating between high level languages, making life much easier for multi-language programming teams (whether the multiple languages are used simultaneously or serially because of changes of language through legacy code).
I claim that a different, but related project from the early days of computing offers a valid solution.
Until the 1980s to 1990s, assembly language was commonly used. This was because processors were so limited in speed and power, that it was often necessary to resort to assembly language to get realistic performance.
Well written hand-coded assembly language typically runs at least four times as fast the equivalent compiled version (regardless of which high level language is compiled).
Although there are still specialized uses for assembly language, it is almost never used because processors have gotten so fast that the inefficiencies of compiling from high level languages are irrelevant.
The reason I mention the history of assembly language was that there was also the idea of the meta-assembler, one generic assembly language that could be used as a substitute for learning a different assembly language for each processor.
It turned out that the final object code generated by any of the meta-assembler attempts were approximately of the same efficiency as the output of compilers for high level languages.
Therefore, the metaassemblers had no advantages over high level languages, but lacked the many benefits of high level languages. Kind of a worst possible combination of the disadvantages of assembly and high level languages without the benefits of either.
But now that the problem of efficiency has been solved by brute force power and speed, a generic asesmbler does have advantages as the basis for a valid solution to UNCOL.
Because all compilers eventually have to map the original high level language source code to some kind of executable (actual processor object code or specialized byte codes), there is no significantly different amount of work in the choice of the target output code.
Compiling to a general purpose generic assembly language is approximately the same amount of work as compiling to any one real processor or any one byte code. This is where the UNCOL advantage shines. Write one front end per high level language to this proposed UNCOL and then write one backend per real processor.
If we have a small set of basic operations, we have enough generic assembly to build any software in a practical manner. This small set will include such things as the basic logical operations (AND, OR, XOR, NOT, etc.), shift/rotates, branching, subroutines, integer add, integer subtract, direct memory/register access, indirection, and definition of data types from bit string components.
The goal in the small set of basic operations is true universality, without locking in any baggage from any specific real world processor.
Each of these basic operations would be generic versions. Attributes would modify the generic operation to lock it into specifics. As an example, the basic add instruction can have attributes that identify things like the size of the data being added (byte, word, longword, specific bit width, etc.), the format of the data (signed integer, unsigned integer, BCD, even floating point formats), and specialized options (such as saturation vs. wraparound, special signals for overflow or underflow, etc.).
When compiling from a high level language to UNCOL, we would only specify information that is required by either the high level programming language or by the software the programmer has written and leave as many decisions as possible for flexibility on the backend of the compiler to take best advantage of each processors unique capabilities.
More advanced generic assembly language operations can be built by writing routines from just the basic building blocks. Each of those routines can then also be used as a building block (giving extensibility in a method similar to Forth, but without an interpreter).
Once each of the advanced generic assembly language instructions has source code in the lower level basic generic instructions, then the authors of the backend of compilers have the option of diretcly applying the advancced generic instruction (especially in the case where it maps well to their target processor) or using the source code inline or as a subroutine or even modifying the standard source code for a version that better matches their particular target processor.
A good set of higher level assembly language constructs is extremely useful for converting from UNCOL to real processors. As one example, multiply is a common instruction on modern processors. While we can create correct code using shifts and adds, it makes a lot more sense to be able to easily identify the multiply and replace it with a single machine instruction than to insert a shift and add routine.
A complete set of advanced assembly language constructs are therefore an important part of an UNCOL solution. To guarantee universality, for every advanced assembly language construct we must have a corresponding library routine made of basic assembly constructs. This backup defines the meaning of the advanced assembly language construct and provides a reliable way to perform the advanced assembly language construct on a processor that lacks that particular instruction.
Interestingly, we can have multiple versions of library routines for a particular advanced asembly language construct. The extra versions allow for choice in which of the possible available solutions best maps to a particular processor.
The same approach for defining and building advanced assembly constructs can be used for high level language features and concepts.
For each high level language, each concept and feature can be mapped to any combination of the basic and advanced generic assembly language. We end up with a library of language specific operations. This follows the example of C placing I/O and many other operations that vary by end processor and operating system into standard library routines.
These higher level language specific operations are written exactly once per language, preserving the goals of UNCOL.
It would be useful to identify higher level concepts that are shared by several languages, even if the sharing is limited only to a family of languages. Some of these concepts can be turned into optional generic operations. This would speed up the writing of compilers for languages that are similar to other languages.
Past attempts at UNCOL have tried to create a programming language that humans can and will use to write software.
I claim that there is no need for any human to directly write in UNCOL. The language only needs to be written and read by software (compilers and assemblers).
Nonetheless, I think it would be useful for the UNCOL to be designed so that it is easy for humans to read.
UNCOL can be very verbose specifically for the purpose of making it easy for human programmers to read and understand, even though verbose programming languages are typically rejected by programmers (who prefer the least amount of typing). If there is no intent for any human programmers to directly write in UNCOL, we become free to be as verbose as is necesary.
Our real goal is to make sure that it is easy for a machine to translate from any arbitrary source language to UNCOL and to make sure it is easy for a machine to translate from UNCOL to any arbitrary target processor.
Because our goal is to facilitate machines rather than humans, we can ignore anything that is designed for a human to read or write UNCOL. Id like to preserve human readability, but that is a secondary goal.
That said, we do need to preserve the ability for a human creating either the front-end or back-end of a compiler to understand UNCOL well enough that he or she can create correct working translators.
Note that there will be an assumption that most optimization will occur after this UNCOL. The only optimizations that will already be performed are those that are required by the source language or the source code. This allows details of implementation for a specific processor to take into account the capabilities and features of the target processor.
The basic syntax for most assemblers is either [label op code options destination(s) and source(s)] or [label directive options].
label: opcode.x dest, src
label: directive options
Specific examples:
startofproc: AND.B D1, (A3)
The above pseudo-assembly is an instruction to bit-wise logically AND the contents of the D1 register with the contents of the memory location pointed to by the A3 register. The instruction has the label startofproc.
procstorage: db 20
The above pseudo-assembly is a directive to set aside 20 bytes of memory and assign the label procstorage. to the start of the memory block.
The proposed UNCOL solution uses a generic pseudo-assembler, but with a few key differences.
name spaces In addition to the namespace for the actual text labels, every single line will have a hidden internal label. These two name spaces will be kept separate to prevent any clashes in label names. The textual namespace is primarily used to preserve the names from the original source code.
generic opcodes To preserve universality, the proposed UNCOLs opcodes are as generic as possible. That means, as an example, we can have a generic ADD instruction and leave implementation details for later. The ADD might be an integer addition. It might be a floating point addition. It might be a saturating addition. Unless these details are specified by the programming language (such as BASIC doing all math in floating point) or by the source code (such as an integer type declaration), the choice will be left for code generation.
The opcodes for this proposed UNCOL have not yet been determined. Much of the rest of the material is my own notes on what might be included.
size and format Typically assembly language places indicattions of data format (integer, bit, float, etc.) and size (byte, word, long, single, double, etc.) after a period that immediately after the opcode.
Again, we will attempt to defer all decisions for later optimization and code generation.
We will provide a set of size and format codes, from the rather generic (such as integer or float) to the very specific (such as 32 bit signed twos complement integer).
In a real assembler, the opcodes are matched to specific machine instructions and therefore may often imply additional information that assemblers rarely reflect. For example, the choice between normal and saturating addition, or even whether or not the addition will generate a carry, is rarely included in the size and format specification. In the proposed UNCOL, we cant make this kind of assumption because we dont yet know the exact capabillities and instruction set of the eventual intended target processor, so we will try to stay generic and offer the ability to designate a specific choice when it is required by either the source language or the source code.
We also want to include the ability to designate new formats and sizes so that the UNCOL remains useful into the future.
flags Real assemblers typically leave the assignment and reading of flags as implied operations. Each target processor will have its own unique variations of how each instruction sets, clears, and reads flags and even which kinds of flags are available.
From the point of view of the proposed UNCOL, we are concerned about tests that will eventually be made on the results of specific operations.
Therefore, we need to include a method to specifically designate that an operation needs to save a flag-like result from an operation and then match that up with the operations (usually branches) that will make use of that information.
sources/destinations Operations usually have one or more sources and one or more (usually only one) destination. These are the equivalent of inputs and outputs.
In a real assembly language the sources or destinations can be implied by the instruction. With the proposed UNCOL, we always need to specifiy every source and destination.
Some sources or destinations can be unordered (such as additions or multiplciations), while other sources or destinations have very specific ordering (such as subtractions or divisions [dividend and divisor], or even the pair of quotient and remainder). The proposed UNCOL needs a method for indicating ordered or unordered and the specifics of any order.
Some instructions inherently have a specific number of sources (arity), such as the unary operation of NOT, the binary operation of XOR, or the ternary conditional operator. Some instructions can have a variable and unlimited number of sources (such as AND). Typically, the outputs of an operation can be used as the source for any number of outputs. The proposed UNCOL needs a method to specify these choices.
evaluation Closely related is the subject of evaluation.
We just mentioned that the logical AND can have any number of inputs. The moment one of the inputs is FALSE, we can stop evaluating any additional inputs, because FALSE has become the only possible result. This is short circuit evaluation. A programmer might want a full evaluation because of the desire to guarantee that all side effects must be implemented. Even with short circuit evaluation, there is the question of order, because some programmers rely on short circuit evaluation to prevent some operations or side effects. And there is the option of lazy evaluation. The proposed UNCOL must include a method for designating which option is used.
address modes Assembly languages for CISC processors often include a large number of address modes, while assembly languages for RISC processors often include just LOAD and STORE, with all other operations being register based.
During the transition from CISC to RISC it became obvious that any of the CISC address modes could be simulated with explicit address computations.
A simple logical view of address modes consists of direct and indirect (pointer) operations.
registers The number and nature of registers varies greatly by target processor. In general, we should leave register assignment to the back end of the compiler.
Even so, there will be occassions when it is essential to specify registers, especially special purpose registers of a processor or even special purpose registers from memory mapped external chips or devices.
syntactic sugar It is often useful to include syntactic sugar.
For example, XOR of anything with itself is the same as CLEAR to zero. Some processors actually have both an XOR and a CLEAR instruction. Some processors only have the XOR instruction, but provide the CLEAR as an alias in the assembly language for the convenience of the programmer.
The biggest advantage of including syntactic sugar is that the intentions of the programmer as preserved, which on occassion might be useful in replacing a block of code with a more effective method for a specific processor.
concepts It will often be useful for code generation to know the conceptual intent of the programmer rather than slaishly following the programmers implementation. This is especially true in cases where the programmer is pushing against the limits of the programming language to achieve functionality that the programming language wasnt designed to handle.
For example, there are many processors that have the ability to handle stacks in hardware. If we know that the programmers intent is to perform a stack operation, but the programmer chose a predecrement/postincrement pair and the processor works better with a preincrement/postdecrement pair, then we want to be able to modify the code to take advantage of the target hardware.
Another example would be operations on trees. If we know the programmer intent, we can replace the programmers specific implementation with an alternative implementation that makes more sense for a specific target processor.
My overall approach is to build a set of operations (instructions/transformations/commands/directives), signals, and other structures needed for a practical UNCOL.
I will describe the method needed to formally describe every part of the UNCOL language, but wont actually write out the formal descriptions (a very time conusming task).
This is actually not a bad thing, because the back end code generation will use the basic concepts rather than some formal definition. While the formal descriptions/defintions give a certainty to meaning, they arent used in working code.
Once again, NOR is a valid UNCOL. NAND is a valid UNCOL. Our goal is to add enough additional constructs to make UNCOL of practical value. Because our goal is to be practical, we can add constructs incrementally as needed. We should be able to gather the low hanging fruit quickly (as all other attempts have done), but we have a valid method for extending the language to meet any new challenge.
Our language constructs should be as general as possible and will have a resemblance to a very generic assembly language. Because all languages eventually have to be translated itno some machine code, all of the challenges have already been met with traditional tools. These tools are well-known to those working in the field.
The universality part of UNCOL requires that we have every possible programming language in the set of source languages. Assembly languages are programming languages and by definition are part of our source set.
Therefore, if our generic assembly-like language is valid, then we can convert any arbitrary real assembly language for any real machine into our generic assembly language. This is a trueism (we havent yet proven we have valid generality), but I want to emphasize the obvious.
We can encapsulate higher level concepts with a set of generic assembly instructions. The set of generic assembly instructions needed to implement a particular concept can be named as a new UNCOL construct.
Backing up named UNCOL constructs with a specific set of generic assembly instructions gives us a formal definition of the meaning of that construct as well as a temporary working implementation for use when adding a new back end for a new processor. Ideally, the back end compiler writer will replace that generic version with optimizations appropriate for a specific processor.
We want to maximize separation between concepts and implementation for a number of reasons, including the importaant cases of unusual languages or processors.
Many details will be moved to attributes of an UNCOL construct. For example, we will specify a generic ADD operation, but unlike real assembly languages, we wont create a huge set of difefrent ADD instructions for different sizes of data (byte, word, long short, etc.), different encoding (unsigned, signed, floating point, etc.), or special cases (such as the choice between wrap-around and saturating). When these details are specified in the source code or as an inherent part of the source programming language, we will record these details, but any details that are unrecorded are left to be determined based on the characteristics of the target processor.
The set of generic assembly language instructions needed to encapsulate any special cases of any arbitrary machine instruction set can be used as an in-line substitute to add those specialized capabilities to any other processor. This solves the problem of special cases in processor capabilities (with the caveat that the solution should also be practical, not merely possible).
We can use the same approach to encapsulate any special features of any arbitrary high level programming language. Again, we establish possibility, but leave open the question of practicality.
We want to look for commonly repeated high level language features and extract the commonality to create new commands (or other constructs) in the official UNCOL standard. This not only simplifies the expression of ideas, it leaves more flexibility for back end compiler writers because they can identify the basic concept and implement in the method best for that particular processor rather than using the implementation specified by a particular set of generic assembly instructions.
We may add official UNCOL constructs that initially are useful for only a single source programming language, but we want to avoid this as much as possible.
I want to emphasize the difference between concept and implementation.
The most basic concepts needed for a successful UNCOL are the equivalents of the basic logical operations (AND, OR, XOR, NOT, etc.), shift/rotates, branching, subroutines, integer add, integer subtract, direct memory/register access, indirection, and definition of data types from bit string components.
Everything else can be built up from these basic components.
Just as the C programming language simplified things by taking I/O and other common operations and moving them out of the language itself to a library subroutines, we can define useful higher level concepts in the set of minimal basic concepts.
In the worst case, we simply replace a concept with the library implementation. At one time this would be too inefficient, but with modern processors, this is no longer an impediment.
Using another example, Apple created the Standard Apple Numerics Engine (SANE) to provide a rich set of floating point operations to the 6502 and original 68000 processors, neither of which had hardware floating point. Later when the IEEE standard for floating point became common on processors, Apple abandoned SANE and made use of the hardware floating point.
In the worst case, we can create a concept-implementation pair for each feature of each high level language. Ideally we find the commonality between multiple languages, but this is not a requirement for success. The implementation in low level generic constructs can be used to create working object code for any target processor. This work need only be done once for each high level langauge.
By maintaining the concept-implementation sepeartion, we leave open the possibility for the back end of the compiler to use custom features of the target procesor for a better or more efficient implementation.
I noticed that the problem of UNCOL is actually a subset of the problem of the universal message for communicating with intelligent alien species. I am not asserting alien species exist or do not exist. The problem of how to communicate with them is a valid problem independent of their actual existence or lack of existence.
All four of the major past attempts to send a message from humans into space (three from NASA and one form the EU-Russia) have started with a series of binary encoded prime numbers.
There is widespread agreement that the first successful communications with aliens will be in some mathematical language.
My observation is that the header for that message (the portion that mathematically establishes the basis for communications with aliens) is essentially the same as a (one of several possible) formal definitions of UNCOL.
The only key difference is that with UNCOL, we already have alternative forms of communications (existing languages and common artwork) that make it easier to communicate the language. Otherwise, they are the same problem.
My proposed formal definition of UNCOL would start with simple logic gates (plus a few other key signals) and build those up into components that define the basic operations, syntax, and other characteristics of my proposed version of UNCOL.
My definition would include basic building blocks for higher level constructs, including Boolean algebra, Turing machines, and lambda calculus.
A language for messages to aliens would include those same constructs, plus at least two Turing machiens (one emiting binary one third and one emiting pi, as examples to show that the language has been correctly understood) and a database system to store the actual message. Maybe Larry Ellison of Oracle wants to donate the source for MySQL to the project?
The following is an outline of how to build a true UNCOL.
Because of the similarity to the universal message to aliens problem, I will also make reference to the solution to that problem.
As previously, mentioned, the problem of a universal computer oriented language and the problem of a universal message format for communicating with aliens happen to have a huge amount of overlap.
This overlap is most apparent when attempting to formally define my proposed UNCOL. The methods for formally defining my proposed UNCOL happen to be a subset of the methods for establishing a clear mathematical language for communicating with aliens. The major differences are that it would be would be useful in the formal definition of UNCOL to include additional natural language commentary and a text-based presentation format (as contrasted with the pure binary presentation prefered for communicating with aliens).
This web page will outline an informal definition of the UNCOL programming language.
A message to aliens would include at leaast two parts: a definition of a universal language and a message. A program encoded in UNCOL could include a header that included the formal defintiion of UNCOL followed by the program. Distributing just the software program intermediate UNCOL code would imply the formal definition. Actually including the formal definition would differentiate between versions of the UNCOL over time.
I have noticed that some of the scientists working on the universal language for aliens start with an attempt to present standard algebraic notation in the stated belief that equality is the basic mathematical concept.
I will point out the inherent ambiguity of standard algebraic (infix) notation.
What is the answer to 1 + 2 * 3?
If we evaluate from left to right, we add one plus two, equals three, and then multply by three (three tmes three), with a final answer of nine (9).
If we evaluate with standard algebraic precedence of operators (multiplcation/division higher priortiy than addition/subtraction), we first multiple two times three, equals six, and then add one (onne plus six), with a final answer of seven (7).
We should not provide aliens with this ambiguity right at the start of any message.
In mathematics, this problem is solved through the use of either prefix or postfix notation. Postfix notation (RPN) is commonly used by engineers and extensively used in computer programming because it fits so well with the concept of a push-down stack.
The first goal is to establish that this is an intelligent message and not a naturally occurring phenomenon.
It is my claim that the complex nature of the overall signal combined with the repeated pattern of counting integers will clearly distinguish this as an intelligent message worthy of attempts to decode.
I am not too concerned with the carrier for the message (such as radio waves or what kinds of radio waves), but rather the content of the mesage itself.
Once the aliens have distinguished that they have received a message from an intelligent species, the goal is to have a minimum of detective work before the entire message becomes self-decoding.
It is important that an alien be able to accurately guess the location of the start of the message, because that contains the keys for decoding the entire message.
| prime numbers | |
|---|---|
| decimal | binary |
| 2 | 10 |
| 3 | 11 |
| 5 | 101 |
| 7 | 111 |
| 11 | 1011 |
| 13 | 1101 |
| 17 | 10001 |
| 19 | 10011 |
| 23 | 10111 |
| 29 | 11101 |
| 31 | 11111 |
| 37 | 100101 |
| 41 | 101001 |
| 43 | 101011 |
| 47 | 101111 |
| 53 | 110101 |
| 59 | 111011 |
| 61 | 111101 |
| 67 | 1000011 |
| 71 | 1000111 |
The first part of the message is counting. The obvious candidates are binary (advantage of simplicity and compatibility with the vast majority of modern computers), balanced ternary (advantage of one unique representation for every number and used on Zuses first computers), or quater-imaginary (advantage of representing complex numbers, not just real numbers).
I am going to assume binary for the sake of discussion, but this material will work in the other number systems. Any pair of different signals (preferably opposites) will work for the two binary digits, allowing messages in many different forms.
The message starts by counting. The two obvious sequences are to count every integer or to count by prime numbers.
I am recommending the use of counting by integer as a separator between logical blocks of information, as this is the easiest pattern to recognize and would not be likely to be part of the message.
I am recommending that the counting by prime numbers be the marker for the start of message. This is an easy pattern to recognize and would not be likely to be part of the message. It also distinguishes itself from the more common counting by integers as a separator.
These are two parts of the message that dont self-decode, so it is essential that they be easy to spot and solve on their own. I think both patterns are easy to recognize.
| 0000 | 0001 | 0010 | 0011 |
| 0100 | 0101 | 0110 | 0111 |
| 1000 | 1001 | 1010 | 1011 |
| 1100 | 1101 | 1110 | 1111 |
All four major attempts to send a message to aliens have started with a series of prime numbers. That series clearly stands out from background noise and clearly identifies a message originating from an intelligent species. The previous four attempts to send a message to aliens have used the binary number system. This is the most simple system. The series of prime numbers clearly identifies which is the 1 bit and which is the 0 bit.
The binary bit pattern of the first prime numbers is shown in the table to the right.
See also a discussion of prime numbers.
The second part of the message is the key to decoding the rest of the message. Unfortunately, the future aliens will have to figure this one out on their own.
The prime numbers would be followed by truth tables, another simple binary system. There are four binary truth tables (always true, logical negation, logical identity, always false) and 16 binary truth tables (tautalogy TRUE, contradicton FALSE, logical conjunction AND, nand, logical disjunction OR, nor, exclusive disjunction XOR, xnor, A inplies B, B implies A, A doesnt imply B, B doesnt imply A, A, B, not A, not B).
There are four possible logical operations on a single bit (unary connectives): always true (ignoring its initial value), stay the same, NOT (inverting the bit), and always false (ignoring the initial value).
There are 16 possible logical operations on two bits (binary connectives or dyadic connectives). two of these (always true and always false) are repeated from the set of single bit operations. The 16 basic dyadic connectives are:
Presenting the 20 truth tables in a series will imply a numbering for each operation. Explicitly naming and numbering each truth table will help in establishing communications with an alien species because it will give meanings to selected words that can then be used to decode additional words in a message.
As already mentioned, both Charles Sanders Pierce and Henry Sheffer proved that both NOR is functionally complete and NAND is functionally complete. The other operations simplify the building of messages.
Technically, we have everything needed for a true UNCOL. We combine these 20 basic operations to make more advanced operations until we build programs.
We can assign a number to each use of one of these 20 basic operations and
The group of AND, OR, and NOT are functionally complete.
Both NAND and NOR are functionally complete all by themselves.
All logic circuits (and therefore all binary digital computers and all software for binary digital computers) can be described with NANDs. The same is true for NORs. This makes them Turing complete.
These 20 operations (and some variations) happen to be the lowest of eight levels of my proposed Universal Computer Oriented Language and can be used to unambigiously define all of the other levels.
I propose 20 consectuive records.
Each record will show the truth table for one of the 20 basic logic operations.
The records can be presented in any order as long as every record is included. I do suggest providing all of the unary connectives before the binary conenctives to make decoding easier.
My proposed sequence in each record is: the input state of the bit or pair of bits, the resulting output state, a unique name, and a record terminator.
The hardest part of decoding is realizing that one is seeing a set of truth tables.
The existence of 20 record terminators should help identify what is happening. If the names are of fixed length, the additional clue of two kinds of groups, one of four and one of 16 should also help identify what is happening.
Human readable alternative names for these 20 operations can be added at a later stage in the message. This step is important for human use of the Universal Computer Oriented Language, but unnecessary for future aliens.
My proposal to this point is: count by prime numbers (to establish start of message), count by integers (separator), truth tables for 20 basic logic operations (to establish key to decode message), count by integers (separator).
The next step is a demonstration of the use of the key which will allow the future aliens to confirm that they have correctly decoded the key.
My proposal is to then encode two different Turing machines as a method for aliens to confirm that they have correctly decoded the message so far.
Turing machines (proposed by Alan Turing) are the simplest complete computing machines.
Early in Turings paper he shows an example of the most simple Turing machine possible, the Turing machine that emits the binary fraction one third (1/3). This is a simple alternating sequence of zero (0) and one (1), which can also be generated by setting a clock signal as the input to a toggle flip flop. [.0101010101 ]
3. Examples of computing machines
1. A machine can be constructed to computer the sequence 010101 . . . .
The machine is to have the four m-configurations b, c, k, e
and is capable of printing 0 and 1. The behaviour of the machine is
described in the following table in which R means the machine moves
so that it scans immediately on the right of the one it was
scanning previously Similarly for L. E means the scanned
symbol is erased and P stands for prints.
This table (and all
succeeding tables of the same kind) is to be understood to mean that for
a configuration described in the first two columns the operations in the
third column are carried out successfully, and the machine then goes over
into the m-configuration described in the last column.
When the second
column is left blank, it is understood that the behaviour of the third and
fourth columns applies for any symbol and for no sumbol.
The machine
starts in the m-configuration b with a blank tape.
Configuration Behavious m-config. symbol operations final m-config. b None P0, R c c None R e e None P1, R k k None R b If (contrary to the description in § 1) we allow the letters L, R to appear
more than once in the operations column we can simplify the table
considerably.
m-config. symbol operations final m-config. b None P0 b b 0 R, R, P1 b b 1 R, R, P0 b
Note that an equally simple Turing machine is one that emits the binary fraction two thirds (2/3), which is the alternating sequence of one (1) and zero (0). [.1010101010 ]
If we ignore the initial starting state of the toggle flip flop (the initial starting state will be determined by a race condition in a real world circuit), then the description will produce one of the two different Turing machines.
Although one could build this simple machine with just NAND, requiring the use of every one of the 20 logical connectives helps the future alien guarantee that they have correctly identified all of the elements of the key.
The future aliens do not have to run the Turing machine for too many digits to confirm that they have correctly used the key.
Alan Turing immediately follows the first Turing machine that emits a rational number with a slightly more difficult machine that immediately jumps to an irrational number (which might be transcendental).
II. As a slightly more difficult example we can construct a machine to
compute the sequence 001011011101111011111.....
After introducing subroutines and more sophisticated logic circuits, I propose a second confirmation message of the description of a Turing machine needed to emit the digits of pi.
Pi is an easily recognized number and a universal constant that transcends culture, the speed of light, and even changes in the laws of chemistry (which are expected in the future onn the scale we are considering).
The future aliens can, of course, run the simple Turing machine for as long as they want, producing as many digits of pi as they desire.
At some point the future aliens will be reasonablly convinced that they have correctly decoded the key.
The Turing machine that creates the digits of pi is not defined using Turings notation because there is no common language established this early in the message for sharing Turings notation. A future alien is far more likely to identify simple truth tables than Turings rather obscure notation.
The next few sections of the message define the tools needed to decode the entire message.
The message so far: counting by prime numbers (identify start), count by integers (separator), 20 logic operations (key and lowest level of the UNCOL), count by integers (separator), Turing machine that produces pi (confirmation of key), and count by integers (separator).
There are a few other additional basic signals that should receive a name and number. Some of these items are not really signals, but I am using that name to identify this group.
Input. Not that some operations require exactly one or exactly two inputs, while some operations can have an infinite number of possible inputs (such as AND, OR, NAND, NOR).
Output. Operations can have zero or more (infinite) outputs.
Connection.
Clock.
Repeat. A signal to indicate that we can repeat items as needed gives a great flexibility, including the ability to define inifinite operations. This can remove questions of exact size.
Group. A matching pair of nestable grouping signals.
We can identify specific uses of these signals by assigning them a unique number (and optionally, a name).
For those jumping to see the proposed UNCOL solution, be aware that I am also simultaneously defining a method for a universal message to aliens.
The next step is to define the other layers of my proposed Universal Computer Oriented Language (UNCOL).
Because the NAND and the NOR are both functionally complete, we didnt actually need to define any other operations.
Defining the entire 20 operations (the four unary logical operations and the 16 binary logical operations) should help make it easier for a future alien to identify the key.
Defining the entire 20 operations also gives a flexibility in expression that makes it easier to write software in the UNCOL.
Defining additional layers of the UNCOL is unnecessary from a minimalistic viewpoint because we are already Turing complete.
Adding higher levels of the UNCOL means that the message can be greatly compacted.
Adding higher levels of the UNCOL means that the language is much easier for humans to read and understand.
Because any Turing complete language is fucntionally equivalent to any other Turing complete language (as well as functionally equivalent with both lambda calculus and general recursive functions), we can use the 20 basic logical operations to define any other Turing complete system.
Each of the layers of my proposed Universal Computer Oriented Language are Turing complete.
Instead of defining every layer in terms of the lowest layer (20 basic logic operations), I propose defining each layer in terms of the next lower layer (occassionally supplemented with other lower level operations as needed for clarity).
This approach will make the layers easier for a human to understand and will make the message to future aliens much more compact (and might also help aliens understand what they are decoding).
Anyone with a background in circuit design will see that we now have the basic tools for a text based description of digital circuits or switching circuits.
Anyone with a mathematics background will see that we now have the basic tools for a test based description of a variety of mathmatical models.
Either approach can be used to create descriptions of software.
We can string together the identifying numbers for specific instances of the 20 basic operations, inputs, outputs, and connections to create logic circuits and subroutines. We can even define all existing and all possible binary digital computers.
Each collection can also be uniquely identified by a number (and optionally a name). We can then thread collections together to make more advanced capabilities and operations.
These higher level subroutines or capabilities or operations make it easier to create a practical version of UNCOL.
We need to actually specify how these high level operations are created for both the header of a message to aliens and for the formal definition of the UNCOL language.
We can skip the formal definitions for an informal presentation that is sufficient for both an informal outline and to define a practical working UNCOL.
The most obvious simple collection or subroutine is the flip-flop. It is easy to define all of the basic flip-flop types.
There are common kinds of simple circuits, such as adders, shifters, and multipliers. Each of these simple curcuits can be defined.
In addition to collections of these simple circuits into larger circuits, with the use of a special signal to indicate repeat as needed, we can also define generic, extensible versions of these simple circuits.
Logic gates.
Schmitt triggers, oscillators.
Flip-flops.
Registers.
Counters.
Adders.
Shifters.
Multiplexers.
State machines.
State machines are fairly common. It would be useful to include compiler tools to identify when high language constructs are being used to create state machines and then translate those portions of high level code into special state machine descriptors in UNCOL. This will allow greater flexibility to the back end of the compiler.
Note that it was proven in the 1960s that a Turing machine with just two internal states and just two symbols (the simplest possible Turing machine) is not universal (Turing complete or able to perform any calculation), although it is sufficient for producing either one third or two thirds. Marvin Minsky of M.I.T. constructed a universal Turing machine (which is Turing complete) with seven states and four symbols. In the 1990s, mathematician Stephen Wolfram (developer of Mathematica) created a universal Turing machine wih two states and five symbols. In his 2002 book A New Kind of Science, Wolfram proposed a Turing machine with two states and three symbols, known as the Wolfram 2,3, but could not prove that it was Turing complete. Alex Smith, a student at the University of Birmingham, England, published a 50 page mathematical proof that the Wolfram 2,3 is universal. That proof is available at www.wolframscience.com/prizes/tm23/
While we are at this low level, I will just mention that we want to freely and liberally raid both VHDL and Verilog, as they are both robust existing languages for handling these kinds of features. We are concerned with the underlying concepts as contrasted with the syntax of the two hardware description languages.
Both VHDL and Verilog have some higher level constructs. We arent interested in those.
The primary target level of this proposed UNCOL is at a higher lvel than hardware description. Again, our primary target is the equivalent of a meta-assembler.
Nonetheless, reference to both VHDL and Verilog are important for both completeness (true universality) and for ideas on how we can build up a formal definition of this proposed UNCOL.
Some of the topics to include (at the moment, these are more of laundry lists of things to consider rather than a real list, monstly so I wont forget things as I continue work:
Verilog: posedge, negedge, initial, always, forever, resgiters, nets (wire, wand, word, tri0, supply0, trireg, tri, triand, trior, tril, supplyl), blocking assignment, nonblocking assignment, gate types, MOS< bidirectional switches, gate delays, fork, join, bitwise operators, logical operators, reduction operators, arithmetic operators (twos complement), relational operators, shift operators, concatentation operators, replication operators, conditional operators, four-valued logic (0, 1, Z, X). We are not concerned about system tasks other than $time.
VHDL: levels of abstraction (behavioral, structural, physical), mode (in, out, buffer, inout), type (bit, bit_vector, boolean, character, integer, natural, positive, real, time, std_logic, std_ulogic, std_logic_vector, std_ulogic_vector), nine level logic (U, X, 0, 1, Z, W, L, H -), generic, architecture, behavioral, structural, concurrency, component, signals, enumerated types, arrays (increasing and decreasing indexes), type conversion, attributes (signal, scalar, array), operators (logical, relational, shift, addition, unary, multiplying, misc), process (sensitivity list, if, case, loop, next, exit, when, while loop, for loop, wait, wait until, wait for, wait on), null, dataflow, concurrent signal assignment, conditional signal assignment, selected signal assignment, component declaration, component instantiation and interconnections.
The SR (Synchronizing Resources) language was a research language that had both a very concise syntax and support for the major methods of concurrency, features that will make it ripe for raiding.
Some features (again a laundry list for future work): resources, globals, operations, procs, procedures, processes, co statements, local procedure calls, remote procedure call, call.send.forward invocations, receive and input statements, rendevous, message passing, dynamic process creation, multicast, semaphores, shared memory, virtual machines.
The MPD language is a variation of SR that uses the same intermediate form and run-time system as SR.
The JR programming language brings SRs concurrency model to Java.
The FT-SR programming language is a derivative designed for constructing fault-tolerant distributed systems.
I want to emphasize that SR and its derivatives are at a higher level than this proposed UNCOL. The goal is to make sure that the concepts and capabilities of SR and its derivatives are supported, not to use SR as a model for syntax.
Key to this proposal is the use of the ideas of a pseudo assembler and overloading of concepts.
Decades ago I was one of the programmers ridiculing the idea of a universal assembler (a generic assembler that then gets translated into a real assembler). At the time there was the problem that converting from a universal assmebler to a specific real assembler introduced the same inefficiencies as the use of a high level language, but failed to provide the ease of use of a high level language. The universal assembler combined the disadvantages of the two approaches and cut out the advantages of each.
But we have long since passed the point where the inefficiencies of a high level language are so trivial as to not matter. The universal assembler has become a reasonable choice and works great as the basis for an UNCOL.
My proposal is to have a pseudo assembler. Like the more general case of pseudo code, there is no intent to fix the code to all of the specific details, but rather to record the logical steps. The details can then be filled in by the back end of the compiler.
Another problem with the attempts at a universal assembler were the question of overspecification or underspecification. If the universal assembler was limited to a very small set of basic instructions, then the translation to processors with more advanced features will often be blocked from use because it will not be easy to identify the higher level concepts. If the universal assembler used a large set of instructions, then the translation to less powerful processors will suffer.
I propose overloading concepts, by including both an implementation using a very small set of instructions and the marking of groups of instructions that are an implementation of higher level concepts. Where appropriate, the back end of the compiler can use the higher level concept markings as the source of translation rather than the more detailed version specified with the small set of low level instructions.
This overloading approach also avoids the problem of a large number of special case concepts becoming essentially just a different presentation of all of the high level languages. Simply moving the problem to a new location (from a bunch of high level languages to a bunch of concepts in a single language) is an illusion, not an actual solution. Having both the higher level concepts and low level implementations means that a back end translator can always use the low level implementation and only use those higher level concepts that are reasonably appropriate for the target processor.
A glance at the preliminary specifications for UNCOL shows a language akin to a symbolic assembly language for a register-free, single address, indexed machine. The specific commands are few in number and simple in construction, depending on a special defining capability for the expression of more elaborate instructions. The data referencing scheme is complex, allowing the application of a signed index to the primary address, and permitting both the primary and index parts to be delegated to an indefinite level.
A first version of UNCOL, T. B. Steel, Jr., 1961
Just as we can describe the logic circuits necessary for any binary digital computer, we can identify (with numbers/names) any machine code instruction from any binary digital computer and then use those instructions as part of UNCOL.
Any existing or possible binary digital computers instruction set is a valid UNCOL, but has the well-documented problems of distortions and other impediments for universality.
Therefore we want a very generic set of machine language instructions as the basid for a practical UNCOL.
In practice, we have more flexibility than is obvious.
Some programs simply cant be compiled to run on all processors. Many modern programs are way to big to work on the first few decades of computers. Only certain kinds of programs can be efficiently compiled to graphic processing units.
While I encourage the use of the simplest instructions/operations possible, we can include very advanced instructions/operations in UNCOL because those more advanced instructions/operations can be compiled to any more simple computer via table lookup, subroutine library, or other simple methods.
Knuths MMIX is a good starting point. We would want to remove the connections to things such as a specific register architecture, specific data sizes, and data types.
Also, by Knuths own admission, some of his MMIX instructions are beyond the built-in capabilities of many existing modern computers. Some of these items might be avoided so as to allow for programs to be easily implemented on existing computers.
It should be obvious by now that the basic 20 operations and the basic signals can be combined to create an infinite number of valid and approximately equivalent versions of UNCOL.
If we are careful to make sure that we can easily map between the UNCOLs in a pracitical and efficient manner, then there may be real advantages to having multiple UNCOLS simultaenously in use.
In addition to portions of UNCOL describing operatins (instructions), we also need portions of UNCOL that describe the data.
Each item of data, either input or calculated, must be described as to type, range precision and the like by special data descriptive syntactical sentences in the language. These descriptions, additionally, provide information concerning ultimate storage allocation as well as indicators of contextual meaning for the explicit commands.
A first version of UNCOL, T. B. Steel, Jr., 1961
Any specifications about data that are not already made in either the original source code or the original source language will be deferred. It is important for UNCOL not to add any extra specifications about data. Anything unspecified can be determined at the time of binding to a specific object code.
In a normal assembly language data storage is defined with directives rather than instructions.
The normal datat storage directives are aimed at the assignment of memory addresses and the bit strings needed for various fundamental data types (primarily data types supported directly by hardware).
We need the additional flexibility to define high level composite data structures and to specify various storage methods (stack frame, register, etc.).
Becuase we will leave register assignment to the compiler back end, we can provide an infinite (unbounded) number of registers and can have any arbitrary register type. We need to include the ability to specify machine specific special purpose registers, whether located inside the main processor core or located in external devices.
Supplementary instructions should be avoided.
Supplementary to the instructions and data descriptions are certain declarative sentences, inessential to the explicit statement of the problem solutions being translated, designed to provide information useful to the translator in the introduction of computational efficiency into the object program.
A first version of UNCOL, T. B. Steel, Jr., 1961
A useful version of UNCOL may include a text based representation of a parse tree. But instead of using a specific parse tree designed for one specific programming language, we need a more generalized version of a parse tree to be truely universal.
A UNCOL called CAT (Common Abstract Tree Language) was developed at the University of Kiel in West Germany. It was bought by the Norwegian Computer Company Norsk Data, and used to make "real" compilers that were used in "real" products. There were front-ends for pascal, c, and BASIC (Fortran was planned but never developed); back-ends for NS16000, M680xx, ND-100, ND-500, ND-5000 (Norsk Date CPUs( and a Siemens machine Ive forgotten the name of. Reinhard Voeller is still with Norsk Data in Kiel, Uwe Schmidt is professor at the FH in Wedel, in Germany.
Summary of UNCOL references, Comp.compilers newsgroup posting, Debora Weber-Wulff, 12 December 1991
TCOL was supposed to be a family of tree-based intermediate languages, each specialized, but the whole family meant to be universal.
Summary of UNCOL references, Comp.compilers newsgroup posting, David Lamb, 12 December 1991
If we intend to use UNCOL for the more general purpose problem of automated conversion between high level languages, then we will certainly want to drag around a parse tree. This will allow us to restore comments, indentation (and other white space), identifier names, and other information that helps a human read and understand source code. Ideally, if a piece of source code is translated from the high level language into UNCOL and then translated back into the same (original) high level language, we really want the piece of source code to stay the same.
As Microsoft said, intermediate languages are efficient because by delaying the commitment to a specific native platform as much as possible, the execution platform can make optimal use of the knowledge of the underlying machine, or even adapt to the dynamic behavior of the program.
having defined all of the components of all of the layers of my Universal Computer Oriented Lnaguage, the message will include tools written in the UNCOL.
One very important tool is the compiler/assembler for the Universal Computer Oriented Language.
Future aliens can use the compiler/assembler as a guide for creating a compiler/assembler for their own computers.
This same format can be used in modern times to deliver platform-independent, processor-independent, operating system-independent software to any computer, both those currently in use and those that are created int he future. This is the primary goal of the Universal Computer Oriented Language.
Another important tool is an error correction scheme. If a message has to survive trillions of years, it needs a very solid error correction scheme.
At this point it is possible to define any error correction scheme we want in the Universal Computer Oriented Langauge. Both existing human computers and future alien computers can use the compiler/assembler to run the error correction scheme. The error correction scheme can be freely swapped out as humans and subsequent species discover better error correction schemes (and old messages will have their error correction schemes encoded in their messages).
To prevent confusion, I did not previously mention this part, but now that you see this can work, you realize that we will insert the error correction data at the end of every block.
There are two possible reactions the future aliens can have to the error correction data.
It is possible that they may independently decode the error correction scheme from the corresponding blocks counting in integers and counting in prime numbers. In this case, the error correction scheme becomes an additional confirmation that they are correctly decoding the message.
It is possible that the aliens may view the error correction data as an unsoved puzzle until they reach the part of the message that shows them the error correciton algorithm. Delayed realization of the meaning of the error correction scheme should not interfere with the ability to decode the preceding portions of the message.
Between 100 billion years and 1 trillion years from the start of the universe, all of the galaxies in the Local Group will merge into one large galaxy. Around 2 trillion years after the start of the universe all galaxies outside the Local Supercluster will no longer be detectable. The Stelliferous Era continues until at least 100 trillion years from the start of the universe.
That means that during the vast majority of the time with stars (98%), future aliens will not be able to detect any other galaxies and will not be able to determine the early history of the universe without intact messages from species such as us.
How do we successfully communicate our current observations of the universe to unknown species of future aliens on a scale of trillions of years?
When communicating with aliens, we are likely to want to include our current astronomical knowledge as measured from our current location (earth). This implies a data base with the information and some kind of presentation software to present the astronomical information stored in the data base.
The important software tool is the multi-dimensional plotting program. This is used to extract and display the astronomical data that we currently possess that will no longer be available after about 100 million years.
I might note that when this same system is used to distribute platform-independent software, that this would be the location of the actual software being distributed.
I propose building the multi-dimensional plotting software on the models of Forth and Post Script. This will be easy to build in the stack-oriented layer of my Universal Computer Oriented Language.
There are huge advantages to compressing the message. Shorter messages are easier to preserve.
Therefore, one or more compression schemes may be included in the tools area of the message.
The final portion of the message is the actual astronomical data.
If this message scheme is used for distributing self-playing media (music, movies, books, etc.), then this is the location of the media data.
I propose using relational methods for storing and retrieving the astronomical data.
A couple of quick examples on how an actual UNCOL language would be built using the principles outlined above.
The vast majority of processors have a group of ADD instructions. Individual instructions in the group vary by things like integer or floating point or BCD, signed or unsigned (for integer ADDs), and data size (single or double floating point, IEEE or proprietary floating point, byte word, long word, etc.).
Additionally, there are differences in what happens on overflow or underflow, with general purpose processors tending to wrap-around and set a flag, while digital signal processors tending to saturate.
To add to the confusion, some high level languages automatically or at programmer option extend beyond the normal sizes of data. This is often called arbitrary precision arithmetic and is typically implemented with a arbitrary precision arithmetic package or library.
With the proposed UNCOL solution, we want to hold off making any implementation decisions as long as possible, preferably deferring these decisions to the compiler backend for best match with each target processor.
We will record specific details set by the programmer or the high level language with attributes tagged to either the operation or the data.
Often there will be a need for conversions of data types. These conversions need to be recorded. If the data type of the operation has not been specified, then we may be able to leave the decision of the data type of the operation (and any conversions needed) deferred to the backend of the compiler.
To summarize, well have a single ADD command/instruction/operation in this UNCOL, and will specify the details as attributes to the data or the operation or some combiantion of the two.
The logical AND operation is common in computers. In some cases the AND will be specified byt he programmer, but in many cases the AND will be inserted as part of the implementation of a high level concept into low level machine code (and the proposed UNCOL is intended primarily to be a general purpose generic assembly lanbguage).
Many processors have two different ANDs.
One common kind of AND is a straight-forward bit-wise AND, matching each bit in a bit string to the corresponding bit in another bit string and resulting in a bit string.
Another common kind of AND operates on a byte, word, or other size data as a whole. In the latter case, typically TRUE or FALSE is defined as ZERO and the other option is defined as all non-zero values. But there are other possibilities, such as negative and non-negative numbers to represent the truth values.
To make things even messier, many programming languages have their own specific implied implementation of logical values. The implementation choice of the high level language may conflict with the hardware implementation of a specific target processor.
For this implementation of UNCOL, we will use a single generic AND, then specify the implementation details in attributes tagged to the operation or data.
When possible, we want the flexibility for the back end compiler to make the final decision based on the facilities of the target processor.
One noteworthy potential problem is a programmer explicitly taking advantage of his or her knowledge of the hgih level languages implementation choices. An example, would be testing a result value arithmetically (is it negative, zero, positive, or some other arithmetic test) rather than using the high level languages logical tests. This is extremely likely in a language like C, where there is no built-in logical or boolean data type.
Technically, the only data type we need is bit strings because all of the data in a binary digital computer is represented by bit strings (including bit strings of zero bit and one bit length. Realistically, this is impractical and UNCOl is only useful if it is practical (the reason we didnt stop at an UNCOL of NAND).
We want the facility to designate the data type of data. Different data types can have a variety of special attributes.
Bit flags and bit strings.
Numbers. Integers, floating point, fixed point, fractions, complex, BCD, arbitrary precision, signed, unsigned, bit size (or possible ranges of bit size).
Characters. Collating sequence, character strings.
Pointers and references.
Various higher level concepts mapped to bit strings, such as colors, enumerations.
First-class fucntions, parametric polymorphism.
Composite types. array, record, union, variant record, set, object
Abstract. queue, set, stack, tree, graph, hash, map, smart pointer.
Because most storage methods and address modes can be expressed as a series of more simple instructions, we can safely implement a wide variety of address modes as possible attributes with the confidence that our concepts can esily be implemented on any processor with enough storage space and processing power.
The exception to the above is any of the register based address modes. We can leave the vast majority (possibly all) of register assignment to the back end of the compiler. We can keep track of storage without worrying about eventual use of registers.
Cache control, virtual memory control.
Data movement.
Add
Subtract
Multiply
Divide, Modula, Remainder.
Commercial machines are usually deicient in their support of integer arithmetic. For example, they almost never produce the true quotient [x/y] and true remainder x mod y when x is negative or y is negative; they often throw away the upper half of a product. They dont treat left and right shifts as strict equivalents of multiplication and division by powers of 2. SOmetimes they do not implement division in hardware at all; and when they doo handle division, they usually assume that the upper half of the 128-bit dividend is zero. Such restrictions make high-precision calculations more difficult., Fascile 1, MMIX, Donald E. Knuth
Commercial machines do not perform FINT [floating integer] and FREM [floating remainder] efficiently., Fascile 1, MMIX, Donald E. Knuth
Square root.
Negation.
Shift.
Rotate.
Compare and Test.
Conditional Set, Conditional Zero, Conditional Branch, Probable branch conditional.
Subroutine/Function Call, Return, Save/Restore.
AND (minimum), OR (maximum), XOR (exclusive or, add mod 2), AND-NOT, OR-NOT, NAND (not and), NOR (not or), Not Exclusie Or, multiplex, sideways add
Saturating subtraction, multiple or, multiple exclusive or.
Commercial machines do not (yet?) have the powerful MOR [mutliple or] and MXOR [multiple exclusive or] operations. They usually have half a dozen or so ad hoc instructions that handle only the most common special cases of MOR., Fascile 1, MMIX, Donald E. Knuth
Stack operations. Push, pop.
Pre-operations. Preload data, prestore data, prefecth instructions, synchronize instructions and data, synchronize data, synchronize, compare and stop, load virtual translation status.
Interrupts. Trip, Trap.
Actions regarding special registers.
Input and Output. Fopen, Fclose, Fread, Fgets, Fgetws, Fwrite, Fputs, Fputws, Fseek, Ftell.
No operations.
It may be useful to include lambda calculus and simply typed lambda calculus as part of an UNCOL. The biggest disadvantage is that lambda calaculus tends to not map very well to hardware. The biggest advantage is that lambda calaclus tends to map very well to software. Also there are existing high level languages that include at least parts of lambda calculus as part of the high level language (for example, Eiffel).
We need to support the concepts of the three basic lambda terms: free and bound variables, lambda abstraction, and application.
We need to support the three kinds of reductions, alpha (α-converson), beta (β-reduction), and eta (η-conversion).
It may be useful to include the some standard defintions and combinators, including S λx[λy[λz[xz(yz)]]] substitute-and-apply-operator, K λx[λy[x]] constant function, I λx[x] identity function, B λx[λy[λz[x(yz)]]], C λx[λy[λz[xzy]]] swap, T λx[λt[z]] truth value true, F λx[λy[y]] truth value false, ω λx[xx] self-application combinator, Ω ωω self-appllcation of the self-application combiantor, Y λƒ[(λx[ƒ(xx)])(λx[ƒ(xx)])] Currys paradoxical combinator, Θ (λx[λƒ[ƒ(xxƒ)]](λx[λƒ[ƒ(xxƒ)]]) Turings fixed-point combinator.
The following are the essential parts for building UNCOL:
Testing It is important to have as much testing as possible before the UNCOL standard is published to the public because of the well-documented problem of being unable to remove mistakes from a programming language without breaking existing software written with the language mistake.
Analysis It is important to analyze as many existing languages and processors as possible, with an emphasis on the most popular and representatives as many different families as possible. Because I suggest an extensible version of UNCOL, I recommend being very conservative in adding any required UNCOL language features, while allowing ease of defining and building optional features.
Standard The most important part is the public standard for the UNCOL programming language, probably both binary and text versions. I propose that the standard be divided into required, optional, and library parts. I recommend following the example of the C programming language of minimizing official language features and moving as much as possible to options and libraries.
Required The required parts will be divided into two parts much like Forth, with low level primatives and higher level secondaries. The secondaries are defined from combinations of primatives and other secondaries.
Optional A mechanism for defining optional parts makes the UNCOL programming language extensible, allowing it to grow with new technology and knowledge.
Libraries Most features of high level languages should be moved to libraries written with required and optional features.
Formal definition Creating a formal definition of the language, while important, can be delayed.
Publishing The official language standard and the official definition of the UNCOL programming language need to be published.
Connections There must be at least one method of connecting to the UNCOL programming language. The original proposal recommended front ends for programming languages and back ends for processors. There are many different tools implied by this approach. These tools can be divided between those that are open source, those that are closed source but free, and those that are closed source and sold or leased for a profit.
Apps Portions of the UNCOL programming language could potentially be made available as stand alone apps (free). This allows for active public use that stress tests parts of UNCOL before the entire standard has been completed.
Artificial intelligence Applying artificial intelligence to the tools can produce much more interesting tools.
Drivers It would be very useful to release a subset of UNCOL that could be used for device drivers. This would allow hardware manufacturers to release an UNCOL-encoded version of their device drivers that could then be easily compiled or interpretted to run on any processor/operating system combination.
OSdata.com is used in more than 300 colleges and universities around the worldfree downloadable college text book on computer programming. |
A web site on dozens of operating systems simply cant be maintained by one person. This is a cooperative effort. If you spot an error in fact, grammar, syntax, or spelling, or a broken link, or have additional information, commentary, or constructive criticism, please e-mail Milo. If you have any extra copies of docs, manuals, or other materials that can assist in accuracy and completeness, please send them to Milo, PO Box 1361, Tustin, CA, USA, 92781.
Click here for our privacy policy.
| previous page | next page |

This web site handcrafted on Macintosh
computers using Tom Benders Tex-Edit Plus
and served using FreeBSD
.
Names and logos of various OSs are trademarks of their respective owners.
Copyright © 2011, 2012, 2013 Milo
Last Updated: May 10, 2013
Created: June 23, 2012
| previous page | next page |