Computer Programming

Copyright © Milo

    Except where noted, all text is copyright Milo. Unless otherwise specified, all materials in this book are owned by the original copyright holder.

    This free college text book on a subject required in the vast majority of schools is an attempt to help make higher education available to the middle class and the poor. Professors are encouraged to use this to create their own mash-up book that matches their lesson plans (within copyright restrictions listed below).

    Material is usually presented as a summary, a mixture of platform-independent principles and ideas along with specific implementation instructions for some popular languages (with the intent that the professor’s mash-up have only one language implementation of the professor’s choice), additional resource material, and an optional discussion of the underlying mathematics (to keep more talented students from becoming bored).

Message to Professors

    This book is released under a free educational license allowing most non-profit educational uses (including professor mash-ups), but requiring that each and every contributor be contacted for any other use. This is specifically non-GNU to protect the rights of professors to include their own materials in their own mash-ups.

    If you are at an accredited college, university, or other school in the U.S. or Canada, you have permission to use any of these materials in your own “mash-up”, as hand-outs, or as a whole book (and you can change the order, delete stuff, add stuff, and mdify stuff) as long as it is for your own classroom use. I do need you to include the copyright notice on anything you use.

    I do not know the accreditation rules for other nations, so am limiting the above permission in other nations to government schools, colleges, and universities in any nation. Write a letter and ask if you are at a non-government school in another nation. I do need you to include the copyright notice on anything you use.

    Also, accredited schools, colleges, and universities in the U.S. and Canada and government schools in any other nation may print this book (with modifications) and provide it to students either free or at actual printing costs. I do need you to include the copyright notice (including website URL) on anything you use.

    I also ask that you provide a copy of any variant versions of this book you create (you can send a CD-R to Milo, PO Box 1361, Tustin, CA, 92781, USA).

    Feel free to donate additional materials (you will be credited for your contributions).

    And please provide feedback on any recommendations, suggestions, complaints, corections, etc. to improve the quality of this free text book for your class use.

Message to Self-Taught Students

    It is almost always better to learn from a qualty university or college.

    With the current dismantlement of quality higher education for the middle class and poor, many prospective students are forced to learn on their own.

    If you must self-teach, please seek out other talented individuals and form a study group.

    If you must seek underground education, such as reading a free college text book on-line, please use the knowledge you gain for positive purposes, such as when internet-savvy youth used the internet to start and finish the non-violent overthrow of the Mubarak government in Egypt. Note, I strongly recommend that the people of Egypt be rewarded by returning the prime meridian of the world to the center of the Great Library at Alexandria, and that Egypt, Brazil, Germany, Japan, and India be added to the permanent members of the United Nations Security Council with veto power. That would be a just international reward for skilled positive use of the internet for widespread public good.

    Please avoid any destructive programming. Seriously consider the moral and ethical consequences of weapons of mass destruction.

HTML version of October 2010

    This HTML is version 0.01 from 5 October 2010.

    My goal is to provide a free downloadable text that can be used in college and high school computer programming classes. The intent of this free downloadable college text book is to attempt to directly help poor and middle class students with the high cost of college text books by providing a high quality free alternative that can be used in the classroom for a subject that most college students are required to take.

    The first part of the book covers the basics of programming and the second part covers advanced topics, such as assembly language, compilers, and operating systems. Each chapter and subchapter starts with a language independent summary of the programming concepts, followed by the mechanics of language specific implementation and additional related material that may interest more talented students.

    This free downloadable book is based on and includes materials from http://www.OSdata.com . Materials from OSdata.com have already been used in more than 300 colleges and universities around the world and have been quoted in studies and policy decisions by the U.S. Navy and the government of the Federal Republic of Germany.

    This is still a work in progress. Feedback and constructive criticism appreciated (especially feedback from professors who might want to use the finished book).

    While this book is still being written, professors are free to use specific chapters (or portions of chapters) as class handouts to supplement existing for-profit text books. This same policy will continue to apply after the book is completed, but this policy offers usefulness for many classes right now today even though the book is still incomplete.

    Poor students should not feel bad about using this book for free. You are exactly who this book is intended to help. You may optionally do volunteer work for the charitable organization of your choice (not political or religious activity — actual work for a charitable organization helping the poor, the elderly, the sick, the disabled, or the environment, etc.).

    Distributed for free, but donations are very welcome. If you print out this book or read substantial portions on a computer screen (after the book is completed), please send a $10 donation to the author at: Milo, PO Box 1361, Tustin, CA, 92781, USA. Donations will help support further research and writing. You do not have to make multiple donations when you download new editions/versions of this book.

    Those who make a donation have permission to print out future versions of this book, as well as back up and replacement copies of this book, for no additional donation (although additional donations would be appreciated).

    Remember that any donations are voluntary and donations are not expected from those who are poor or otherwise might be burdened by the cost of making a donation. Corporations or rich people who want to help support the writing of this book are encouraged to make donations and will be specifically mentioned for their support.

    author: Milo, PO Box 1361, Tustin, CA, 92781, USA

    The e-mail address for contacting the author changes regularly to avoid spam. The current e-mail address for contacting the author is on the website OSdata.com.

credits

    Elaine Higgins, faculty, Language Finger, Maureen and Mike Mansfield Library, University of Montana, http://www.lib.umt.edu/

    Neal Ziring, professor, The Language Guide, University of Michigan, http://www.engin.umd.umich.edu/CIS/course.des/cis400/index.html

copyright information

    This free downloadable computer programming text book is Copyright © Milo except where specifically noted otherwise.

    Students have permission to download this free computer programming textbook (in whole or in part) and print out copies for personal use.

    Government schools and the instructors/professors/teachers at government schools have permission to download and print copies of this free computer programming textbook (in whole or in part) for personal use and for distribution to students in their classes. Schools/instructors/professors/teachers may charge a reasonable fee to cover the cost of printing, binding, and other related costs.

    Accredited colleges, universities, or other schools in the U.S. or Canada, have permission to use any of these materials in their own “mash-up”, as hand-outs, or as a whole book (and can change the order, delete stuff, add stuff, and mdify stuff) as long as it is for classroom use. The copyright notice and URL of this website must apppear on materials so used.

    The schools and educators granted permission in the paragraphs above may change the order of material, add new material, delete existing material, and otherwise make changes that customize the book for their classes (with copyright notice and URL of this website). Schools and educators are asked to provide copies of any new materials added and indicate whether they want to credited by name and institution or anonymously.

    I do not know the accreditation rules for other nations, so am limiting the above permission in other nations to government schools, colleges, and universities in any nation. Write a letter and ask if you are at a non-government school in another nation. I do need you to include the copyright notice on anything you use.

    Others wishing to print this free text book should contact the author with their request.

    I also ask that you provide a copy of any variant versions of this book you create (you can send a CD-R to Milo, PO Box 1361, Tustin, CA, 92781, USA).

    Those who can afford to pay for this computer programming text book, should send cash donations to Milo, PO Box 1361, Tustin, California, USA, 92781.

    Feel free to donate additional materials (you will be credited for your contributions).

    And please provide feedback on any recommendations, suggestions, complaints, corections, etc. to improve the quality of this free text book for your class use.

    author: Milo, PO Box 1361, Tustin, CA, 92781, USA

Stanford University

    Stanford CS Education Library This [quoted in several different chapters] is document #101, Essential C, in the Stanford CS Education Library. This and other educational materials are available for free at http://cslibrary.stanford.edu/. This article is free to be used, reproduced, excerpted, retransmitted, or sold so long as this notice is clearly reproduced at its beginning. Copyright 1996-2003, Nick Parlante, nick.parlante@cs.stanford.edu.

    This [quoted in tree chapter] is article #110 in the Stanford CS Education Library. This and other free CS materials are available at the library (http://cslibrary.stanford.edu/). That people seeking education should have the opportunity to find it. This article may be used, reproduced, excerpted, or sold so long as this paragraph is clearly reproduced. Copyright 2000-2001, Nick Parlante, nick.parlante@cs.stanford.edu.

Ada-Europe

    Legal information regarding the use of the Ada-Europe materials.

    Copyright © 1992, 1993, 1994, 1995 Intermetrics, Inc.
    Copyright © 2000 The MITRE Corporation, Inc.
    Copyright © 2004, 2005, 2006 AXE Consultants
    Copyright © 2004, 2005, 2006 Ada-Europe

    Ada Reference Manual - Language and Standard Libraries

    Copyright © 1992, 1993, 1994, 1995, Intermetrics, Inc.

    This copyright is assigned to the U.S. Government. All rights reserved.

    This document may be copied, in whole or in part, in any form or by any means, as is or with alterations, provided that (1) alterations are clearly marked as alterations and (2) this copyright notice is included unmodified in any copy. Compiled copies of standard library units and examples need not contain this copyright notice so long as the notice is included in all copies of source code and documentation.

    Technical Corrigendum 1

    Copyright © 2000, The MITRE Corporation. All Rights Reserved.

    This document may be copied, in whole or in part, in any form or by any means, as is, or with alterations, provided that (1) alterations are clearly marked as alterations and (2) this copyright notice is included unmodified in any copy. Any other use or distribution of this document is prohibited without the prior express permission of MITRE.

    You use this document on the condition that you indemnify and hold harmless MITRE, its Board of Trustees, officers, agents, and employees, from any and all liability or damages to yourself or your hardware or software, or third parties, including attorneys’ fees, court costs, and other related costs and expenses, arising out of your use of this document irrespective of the cause of said liability.

    MITRE MAKES THIS DOCUMENT AVAILABLE ON AN “AS IS” BASIS AND MAKES NO WARRANTY, EXPRESS OR IMPLIED, AS TO THE ACCURACY, CAPABILITY, EFFICIENCY MERCHANTABILITY, OR FUNCTIONING OF THIS DOCUMENT. IN NO EVENT WILL MITRE BE LIABLE FOR ANY GENERAL, CONSEQUENTIAL, INDIRECT, INCIDENTAL, EXEMPLARY, OR SPECIAL DAMAGES, EVEN IF MITRE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

    Amendment 1

    Copyright © 2004, 2005, 2006, AXE Consultants. All Rights Reserved.

    This document may be copied, in whole or in part, in any form or by any means, as is, or with alterations, provided that (1) alterations are clearly marked as alterations and (2) this copyright notice is included unmodified in any copy. Any other use or distribution of this document is prohibited without the prior express permission of AXE.

    You use this document on the condition that you indemnify and hold harmless AXE, its board, officers, agents, and employees, from any and all liability or damages to yourself or your hardware or software, or third parties, including attorneys’ fees, court costs, and other related costs and expenses, arising out of your use of this document irrespective of the cause of said liability.

    AXE MAKES THIS DOCUMENT AVAILABLE ON AN “AS IS” BASIS AND MAKES NO WARRANTY, EXPRESS OR IMPLIED, AS TO THE ACCURACY, CAPABILITY, EFFICIENCY MERCHANTABILITY, OR FUNCTIONING OF THIS DOCUMENT. IN NO EVENT WILL AXE BE LIABLE FOR ANY GENERAL, CONSEQUENTIAL, INDIRECT, INCIDENTAL, EXEMPLARY, OR SPECIAL DAMAGES, EVEN IF AXE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

    Consolidated Standard

    Copyright © 2004, 2005, 2006, Ada-Europe.

    This document may be copied, in whole or in part, in any form or by any means, as is, or with alterations, provided that (1) alterations are clearly marked as alterations and (2) this copyright notice is included unmodified in any copy. Any other use or distribution of this document is prohibited without the prior express permission of Ada-Europe.

    You use this document on the condition that you indemnify and hold harmless Ada-Europe and its Board from any and all liability or damages to yourself or your hardware or software, or third parties, including attorneys’ fees, court costs, and other related costs and expenses, arising out of your use of this document irrespective of the cause of said liability.

    ADA-EUROPE MAKES THIS DOCUMENT AVAILABLE ON AN “AS IS” BASIS AND MAKES NO WARRANTY, EXPRESS OR IMPLIED, AS TO THE ACCURACY, CAPABILITY, EFFICIENCY MERCHANTABILITY, OR FUNCTIONING OF THIS DOCUMENT. IN NO EVENT WILL ADA-EUROPE BE LIABLE FOR ANY GENERAL, CONSEQUENTIAL, INDIRECT, INCIDENTAL, EXEMPLARY, OR SPECIAL DAMAGES, EVEN IF ADA-EUROPE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.


goal of this text book

    The goal of this book is to provide a free downloadable text that can be used in college and high school computer programming classes.

    According to the Los Angeles Times college text books average $120 each (late 2006) and the major book publishers are still jacking up the prices. According to the LA Times, many poor students are barred from higher education (even though they have financial aid or a scholarship) because they simply can’t afford the price of text books, which can be more than a thousand dollars a semester/quarter.

    For-profit book publishing corporations use numerous malicious tricks to keep pushing up the price of text books and reduce the use of recycled used text books. Their racist purpose in engaging in these activities is to price text books out of the range of poor people so as to keep the mostly minority poor students from having the same equal access to education as rich white children.

    The major book publishers put out new (more expensive) editions of text books every three years. It just happens to be that three years is the amount of time for a text book to saturate the used text book market and cut into sales of new books. The book publishers claim that this is mere coincidence with the timing of their new editions and that they only publish new editions when they need to make improvements on the existing text. They claim it is mere coincidence that these “necessary” improvements happen to exactly match the sales cycle for every text book they publish!

    I do actively encourage students, teachers, and professional programmers to provide useful feedback and criticism to help make this project useful as a free downloadable college text book.

    Donations of money to help support the writing and hosting of this project are greatly appreciated. Mailing address: Milo, PO Bx 1361, Tustin, CA, 92781, USA.

using this text book

    This book is divided into two major sections.

    This organization reflects the way computer programming is normally taught: an overview class that gives a foundation in basic concepts, followed by a series of more advanced concepts.

    The first section provides an introduction and overview to computer programming.

    This first section is further divided into general discussions and language specific discussions. It is generally unwise for beginners to attempt to learn more than one programming language at a time. Each programming language is color coded. The choice of colors is completely arbitrary and has no meaning.

    Within each chapter and subchapter, the material starts with a language-independent summary that explains the underlying programming concept. This is followed by detailed explanation of the concept intermixed with explanations of the mechanics of major programming languages (usually Pascal and C, along with others as appropriate). Additional reference materials for the smarter students include brief comparisons of how these ideas are implemented in a number of programming languages and the underlying mathematical concepts.

    The second section provides a detailed examination and reference for advanced studies in computer programming and computer science.

    Do not expect for this book to be assigned in the same order as it is written. There are many different ways to teach computer programming, so your particular professor or instructor is likely to change the order of presentation of the material, probably also deleting entire chapters and possibly inntroducing additional outside materials.

    In particular, there is more material in the introductory section than can reasonably be covered in a single class. Your professor will decide which materials should be emphasized and which materials should be skipped. Some programming languages naturally emphasize some materials and don’t include others.

    Most schools start students on programming as quickly as possible. Don’t be surprised if your professor skips over some or all of the early chapters and tells you that some or all is material that you should already know.

    Once the introductory material has been covered, the advanced material can literally be taught in almost any order.

    Professors are strongly encouraged to edit and rearrange materials to match the order of class presentation.

    Important Note: For the sake of clarity, much of the material in the first section is watered down and simplified. Most exceptions (including some important ones) will be overlooked. Many details (including some important ones) will be ignored. Terminology will be used in a casual manner without formal definitions. Including all that information would just bog down the discussion and make it more difficult to understand the basic principles.


table of contents


picking a class

    The educational goal of this subchapter is to provide a few hints and suggestions on how to choose a class and professor. It is certainly possible that there may be limited choices available, event he possibility of only a single available class.

    Strong recommendation: Pick a class based on the quality of the professor rather than how easy the grading is.

    Before signing up for a class, ask other computer science majors about the professors. Getting hired as a professor is based on factors completely unrelated to the ability to teach. Further, the major factors for advancement in pay and prestige has nothing to do with teaching. Promotion is based primarily on the ability to continually be published. Colleges and universitites deliberately and intentionally ignore the ability to teach in making decisions about hiring and career advancement for professors!

    Luckily, most professors learn something about education and teaching through years of dealing with students. And there are some professors who are naturally gifted at teaching.

    You want to try to figure out from other students which are the best professors and then try to get into the classes taught by the professors who are best at teaching.

    Don’t be concerned about a reputation as a “tough grader”. How tough a professor is at grading has nothing to do with how good they are at teaching. A professor who is naturally gifted at teaching will prepare you for tougher grade standards.

    Also take into consideration when a class is taught.

    Teenagers and young adults experience “phase shift” from growth hormones. The release of growth hormones has the side effect of changing the natural inner circadian clock, making the teenager or yonng adult stay up late at night and making it very difficult to wake early in the morning. If you attempt to take an early morning class while experiencing phase shift, you will have a great deal of trouble learning and might fail a class that you’d easily pass if taken later in the day.

    Night or evening classes are typically taught by working professionals rather than tenured professors. These instructors tend to have a less academic and more practical approach to teaching. If you need more help on the academic parts, take a day class. If you need more help on the nuts and bolts of programming, then consider taking an evening class.


syllabus

    The educational goal of this subchapter is to give some suggestions on how to make effective use of a syllabus.

    The syllabus is a great road map for success in your class.

    The syllabus is your friend. Reading the syllabus can easily result in a letter grade improvement. For normal students taking a computer science class just because it is a requirement, just reading the syllabus can turn a failing D into a passing C.

    Class identification The beginning of the syllabus will identify the class (including any identification numbers used by the school) and the professor (and any teaching assistants, proctors, or others involved in the class). As trivial as it may sound, make sure that the class ID matches the class you signed up for!

    Professor’s name The syllabus will include the correct spelling of the professor’s name. Double check this before turning in anything in writing. There are few things more personal to someone than their own name. If you mess up on the professor’s name, the subconscious resentment will hurt your grade (even if the professor consciously tries to ignore the slight).

    Contact information The syllabus will include information on how to contact the professor. This will typically include a telephone number, an e-mail address, and a room/building number for the professor’s office.

    Office hours Every professor will have office hours. These may be set hours with an open door or it may be a notation that office hours are by appointment. Even if there are open office hours, it is worthwhile to arrange an appointment with the professor just to make sure that he or she is there at the appointed time and to gain a priority over students showing up without an appointment. Never be late to an appointment for office hours.

    Note that you should make your appointment at the end of class, not at the beginning. A professor is always busy preparing for a class and simply won’t have time to focus on student requests. At the end of the class the professor can better handle questions and requests for an appointment.

    Professors almost always want their students to succeed. A professor will make a reasonable attempt to help a student learn. Note that if the amount of time and effort exceeds certain limits that the professor may recommend private tutoring. If the professor recommends private tutoring, ask for recommendations on who to take the lessons from. The professor will certainly know some higher level or graduate students who can use the extra income. Also, the school may have free tutoring available. If you have learning or physical disabilities, set an appointment for office hours so that the professor will be aware of the difficulties. The professor may be able to arrange for alternatives to the regular coursework. Also the professor will be able to refer you to professional staff at the school who can offer you additional assistance in dealing with your learning or physical disability.

    If you have life problems that come up (such as death in the family, severe illness, work related problems, etc.), make an appointment for office hours with the professor. While there may be little that the professor can do to help, there are possibilities that the professor may be able to change some deadlines or give an incomplete grade or offer some other kind of assistance. At a minimum the professor will be aware that you are trying your best to keep up with the classwork under difficult circumstances.

    Note that in most schools an incomplete grade is at the professor’s option. Schools discourage incomplete grades. The professor will have the final decision on whether the life difficulty is sufficient to warrant an incomplete and whether the student will have a reasonable ability to make up the incomplete grade in the future. Usually it is a better option to withdraw from the class and take it again when you are prepared.

    Website Most professors will provide the URL for the class. This website or webpage(s) may have copies of the syllbus, programming asignments, and other materials.

    Some professors may place their sldies and/or lecture notes on their website (usually after each specific class session).

    Some professors may place answers to examinations, homework, or other assignments on the website (again, after the specific class session).

    Some professors may list recommended websites that will have additional information useful for the class. These recommendations may be printed on the syllabus or may be on a links page on the professor’s website.

    Course description The course description is typically written in edu-speak, but it worth figuring out. This is your best hint for what the class will cover and what you will be expected to learn.

    Use the course description to figure out if you are in the right class. Obviously if the class is a required one for your major, then you have no choice in the matter. But if you have a choice, the course description will help you figure out if you are taking the correct class from multiple alternatives.

    The course description can also help you prepare your studies by letting you know the highest priorities of the professor. When you prepare for quizes, tests, midterms, and finals, the course description can help answer that vexing question of what is the most likely subject matter to appear on the examination.

    Topics The syllabus may have a list of topics to be covered. This is typically easier to understand than the edu-speak of the course description and serves the same uses.

    Required materials Check to see what is required. If there are particular forms required for taking examinations, these specific requirements will be listed on the syllabus.

    If there are specific requirements for computers, hardware, software, or operating systems, these requirements will be listed. Sometimes you might be arrange for a reasonable alternative, but don’t be surprised if the professor rejects alternatives from the requirements on the syllabus. It dramatically increases the work load on the professor to allow custom alternatives, so those typically won’t be allowed unless there is a compelling reason (such as a physical disability).

    Watch for media requirements, such as DVDs, CD-Rs, floppy disks, punched cards, paper tape, or other media. In particular pay attention to formats (such as DVD-R vs. DVD+R). A small mistake in format could result in a drop in letter grade or even failure.

    If you do not understand what a required material is, ask the professor in class when he or she asks if anyone has questions about the syllabus. If you do not know where to obtain a particular required material (and you have already determined the school book store doesn’t carry it), ask the professor. Professors will often know where you can get the best student discount on each item.

    Textbook The syllabus will list required and optional texts.

    Required texts are ones that the professor will expect you to obtain, usually by the second time the class meets.

    Optional texts are ones that the professor knows will help you understand the material, but aren’t essential to pass the class. if you can afford these optional texts, you should obtain and read them. Sometimes a professor will list classic reference books that are directly related to the class. These are great books for your long term professional library.

    If you can’t afford the text books assigned, you should check to see if there are one or more reserve copies available at the school library. You can’t take the reserve copy from the library and often there are time limits on how long you can read it at the library, especially near finals.

    Reserve copies may not be available because large book publishing corporations refuse to provide school libraries with a free copy and school libraries can’t afford to keep purchasing replacement editions every three years. Often the reserve copy was purchased by your professor out of his or her own money. Students who can afford to forego selling books back to the college bookstore may want to consider donating their books to the library at the end of the class.

    Grading Every professor will have different criteria for grading. You dramatically improve your chances of obtaining a higher letter grade by carefully reading the grading criteria.

    The grading system may be a point system. If so, the syllabus will list the exact range of points needed for each letter grade. There may be limits on how many points can be obtained from particular activities.

    The grading system may be a percentage system. If so, the syllabus will list the exact range of percentages needed for each letter grade. Rarely you may find that the possible percentages exceed 100%. In this case the professor is offering some alternatives on the methods for obtaining hgiher grades.

    Attendance Almost every professor gives some credit towards mere attendance. This means that you arrived and are in your seat on time as the class starts and that you stay until the end of the class session. Showing up for every class is the easiest way to improve your grade (not merely because of the credit for attendance, but also because of exposure to the professor’s complete lectures).

    Typically there will be a maximum number of absences allowed before the student is either automatically dropped or automatically fails. There may be specific listings of requirements for excused absences. Pay attention to these limits.

    Late arrival Also pay attention to the policy regarding being late to class. The professor may give lower credit for late arrival. The professor may give no credit for late arrival.

    Even if the professor gives no credit for late arrival, it is better to get at least part of lecture than to miss it entirely. if you do arrive late, be quiet and unobtrusive when taking your seat. Do not distract from the ongoing lecture.

    If you are late to a class, do not interrupt the class by asking questions about material already covered. Find out what you missed after class from your fellow students.

    Class participation Most professors also give credit for class participation. This is a very subjective grading standard and somewhat unfair to those who are socially shy.

    Always pay attention and take good notes. The professor will notice this, especially in smaller classes.

    Try to make sure that you ask relevant questions during class. Asking excessive questions is disruptive and annoying, but professors like a good mix of student questions, if only to get a feel for how well the class is learning the material.

    If the professor asks that questions be saved for a question and aanswer period at the end of the class, write down your questions on a separate piece of paper so that you will remember them when the time for Q&A occurs.

    If you are too shy to ask questions during class, write down your questions and ask at the end of class or during office hours. Professors generally prefer that questions be asked during class so that the educational experience can be shared by all of the students, but it is better to ask questions later than not at all.

    Programming assignments The syllabus will list how many programming assignments will be required, when they will be due, and how much credit you will receive for each. There may be a brief description or title of each programming assignment. There will probably be a breakdown on how each programming assignment is graded.

    Generally you will receive the instructions for each programming assignment on separate handouts throughout the class. This gives the professor the flexibility to modify programming assignments based on the progress of each individual class.

    Programming assignments are always trivial and artificial in nature. There simply isn’t enough time in a typical class to assign large scale programming assignments, so each programming assignment will have specific educational goals in mind. Very artificial standards will be required to focus attention on the specific educational goals of each programming assignement.

    When you receive a programming assignment, carefully read all of the requirements. Make sure that you udnerstand exactly what is expected of you. Immediately ask any questions about the requirements of the programming assignment (professors won’t answer questions on how to do the assignment, but they will help you understand exactly what you are expected to accomplsih).

    Because of the artificial nature of programming assignments the requirements may be highly unusual. You may see restrictions that simply wouldn’t exist in real world programming. You may be asked to do specific things that would never occur in real world programming. It doesn’t do you any good to rebel against the artififial nature of programming assignments.

    The artificial nature of programming assignments can also give you useful feedback on whether you are solving the correct problem. If you find yourself spending a lot of time designing or coding something that isn’t explicitly required by the programming assignment you may have wandered away from the assignment into a whole bunch of work that has nothing to do with your grade.

    You will also find that the basic knowledge for every programming assignment should be provided in some combination of the lcetures and the textbook(s). You can use the progamming assignments as a guide for particular subject matters to pay attention to during lectures and during reading.

    Class schedule Some professors may list the exact topics that will be covered in each class session. You can use this as a guide for what to read and study to prepare for each class.

    The important things to notice in the class schedule:

    Reading assignments The professor will list the specific chapters of each required text that you are expected to read and the time by which you must have that material read.

    Don’t fall behind on your reading assignments. While some professors repeat in their lectures the same material covered in the textbook, many professors use the textbook as a starting point and delve into further details and nuance during lecture. If you haven’t read the assigned text then you may get lost during the lecture. Also asking questions that were covered in the required reading may adversely affect your class participation grade (unless you are asking for a better explanation of something you didn’t understand from the required reading).

    Writing assignments The syllabus will list any required writing assignments. Introductory classes and classes intended for a general audience will almost always have some kind of written report or essay. Follow all of the rules that you hear in your English composition classes, as well as any custom instructions that appear on the syllabus.

    Examinations The syllabus will list all examinations for the class and how much they count for your grade.

    Note the exact date and class session of any midterm, final examination, or other major examination. Note any required materials (such as particular answer forms). Note whether the examinations are cumulative or not.

    Particularly notice the date and time of the final exam. It is very common for the final exam to be longer than a normal class and to be scheduled at a time (and possibly even day) different than normal class sessions.

    You may find an indication that there may be surprise quizes. Quizes, especially suprise quizes, are generally given at the beginning of the class session. This technique helps make sure that students aren’t late for class. Quizes are also often given at the end of the class, possibly allowing slower students extra time to finish. Quizes at the beginning of a class tend to cover material from the assigned reading, while quizes aat the end of a class may also cover the material in that session’s lecture.

    Academic rules The syllabus will generally have a summary of the school’s policies on academic dishonesty and disruptive behavior. If you need a more complete copy, the school’s administration can provide you with the full set of academic rules. If you are unclear on any rule, ask your professor.

    Often there will be an additional handout that discusses the rules for use of the school’s computing resources. Obey all of these rules, even if they seem absurd.

    Note that cheating, especially on programming assignments, is extremely easy to spot. Because every student has a different personality and individual programming style, every student’s work will have a distinctive look, even on the most trivial of programming assignments. Merely changing variable and procedure names, for example, will not obscure the similarity of programming style. Even a stupid grader will immediately notice if two or more student programs have a similar look.

    Extra credit The syllabus will list any possibilities for extra credit work. If you think you might need the extra credit, start work early. You simply won’t have the time to do extra credit work when finals crunch comes along.

    If the professor doesn’t list any extra credit, you can ask.


computer programming

    This chapter is intended as an introduction to computer programming.

    Programming is problem solving and writing instructions for a computer.

    The principles of programming are independent of the computer programming language used. Different languages have different strengths and weaknesses, making some kinds of programs easier or more difficult to write, but the basic principles remain the same regardless of language.

    A skilled programmer should be able to switch to a new programming language in a few hours.

    On the other hand, beginners should pick one language and learn it before attempting a second language. Normally this choice will be made by the school or the professor.

    There is the possibility that the professor might have created a custom version of this book that places everything in the order the professor thinks is best and deletes all materials that the professor views as outside the needs of the class (such as materials on other programming languages). The student is of course free to download the complete version of the book from this website.

    This free text book includes information on multiple programming languages. Unless instructed otherwise, you should concentrate on the language you are learning and skip over the others. Trying to learn the syntax and semantics of multiple programming languages at the same time as learning the basics of programming is a recipe for utter confusion.

    While any programming text book for beginning computer science students has to get across the mechanical details of the programming language being learned, it is far more important to learn the basic principles of programming, which apply across programming languages.

    “As long as programmers cherish their freedom not only to design their own clever software, but also to modify adopted software according to their likings, a proper design discipline remains unlikely. And as long as companies secretly cherish complexity as an effective protection against being copied, there is little hope for dramatic improvements of the state of the art.” —Niklaus Wirth

   “7. It is easier to write an incorrect program than understand a correct one.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13

   “93. When someone says ‘I want a programming language in which I need only say what I wish done,’ give him a lollipop.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13


size of programs

    The educational goal of this subchapter is to introduce the concept of differences in scale or size of programming projects and how the size and complexity of a program impacts the task of programming.

    Programs are generally divided into three basic sizes: trivial, small, and large.

    Trivial programs are programs that a skilled programmer can write in less than two days of coding.

    Small programs are programs that one skilled programmer can write in less than one year of full time work.

    Large programs are programs that require more than two to five man-years of labor, normally written by programming teams (which can exceed 1,000 skilled workers).

    These estimates are approximate and there are obvious gaps in the gray zone between the sizes. Further, there can be huge differences in individual abilities.

    Larry Ellison wrote the first version of Oracle database by himself in about six months. That is a genius exception. Data bases typically take large teams (sometimes hundreds of programmers) at least a year.

    Bill Gates, copying and pasting from the source code of three working open source versions, took more than six months to create a bug-filled BASIC compiler and then hired a team of six skilled programmers who spent more than six more months to get rid of enough bugs to make the compiler somewhat usable (a total of more than three man-years). That is an idiot exception. A BASIC compiler typically takes a skilled programmer a few hours to create. Note that Bill Gates takes credit for quickly having created a BASIC compiler, but according to other sources he was sued for having illegally used open source code for commercial purposes, forcing him to spend a great deal of time attempting to do a project that many programmers of the day could successfully finish in hours.

impact on good programming practices

    Almost every program assigned in a class setting will be trivial, simply because there isn’t enough time in a quarter or semester for longer programs.

    Each programming assignment will concentrate on one or a small number of specific programming concepts.

    The artificial nature of school programming assignments cause most students to question the utility of modern programming practices, especially the time and effort spent on form and documentation.

    These practices are the result of decades of real world programming.

    Successful programs tend to have a long lifetime. Programmers will have to look at existing source code, figure out what is going on, and then correctly modify the program to add new features or update existing features to meet changing real world conditions.

UNIX style

    The UNIX philosophy on programming is to encourage a lot of small tools (programs) that work together to complete a task (usually threaded together with a shell script). This bottom-up approach to programming encourages the reuse and sharing of software. The typical UNIX (or LINUX) distribution includes hundreds of small standardized tools that allow a skilled UNIX prorgammer (or even a skilled UNIX administrator or user) to immediately do useful large scale work.


Basics of computer hardware

    The educational goal of this subchapter is to give an overview of how computers work.

    A computer is a programmable machine (or more precisely, a programmable sequential state machine). There are two basic kinds of computers: analog and digital.

    Analog computers are analog devices. That is, they have continuous states rather than discrete numbered states. An analog computer can represent fractional or irrational values exactly, with no round-off. Analog computers are almost never used outside of experimental settings.

    A digital computer is a programmable clocked sequential state machine. A digital computer uses discrete states. A binary digital computer uses two discrete states, such as positive/negative, high/low, on/off, used to represent the binary digits zero and one.

    The French word ordinateur, meaning that which puts things in order, is a good description of the most common functionality of computers.

what are computers used for?

    Computers are used for a wide variety of purposes.

    Data processing is commercial and financial work. This includes such things as billing, shipping and receiving, inventory control, and similar business related functions, as well as the “electronic office”.

    Scientific processing is using a computer to support science. This can be as simple as gathering and analyzing raw data and as complex as modelling natural phenomenon (weather and climate models, thermodynamics, nuclear engineering, etc.).

    Multimedia includes content creation (composing music, performing music, recording music, editing film and video, special effects, animation, illustration, laying out print materials, etc.) and multimedia playback (games, DVDs, instructional materials, etc.).

parts of a computer

    The classic crude oversimplication of a computer is that it contains three elements: processor unit, memory, and I/O (input/output). The borders between those three terms are highly ambigious, non-contiguous, and erratically shifting.

hardware subsystems of a computer

    A slightly less crude oversimplification divides a computer into five elements: arithmetic and logic subsystem, control subsystem, main storage, input subsystem, and output subsystem.

processor

    The processor is the part of the computer that actually does the computations. This is sometimes called an MPU (for main processor unit) or CPU (for central processing unit or central processor unit).

    A processor typically contains an arithmetic/logic unit (ALU), control unit (including processor flags, flag register, or status register), internal buses, and sometimes special function units (the most common special function unit being a floating point unit for floating point arithmetic).

    Some computers have more than one processor. This is called multi-processing.

    The major kinds of digital processors are: CISC, RISC, DSP, and hybrid.

    CISC stands for Complex Instruction Set Computer. Mainframe computers and minicomputers were CISC processors, with manufacturers competing to offer the most useful instruction sets. Many of the first two generations of microprocessors were also CISC.

    RISC stands for Reduced Instruction Set Computer. RISC came about as a result of academic research that showed that a small well designed instruction set running compiled programs at high speed could perform more computing work than a CISC running the same programs (although very expensive hand optimized assembly language favored CISC).

    DSP stands for Digital Signal Processing. DSP is used primarily in dedicated devices, such as MODEMs, digital cameras, graphics cards, and other specialty devices.

    Hybrid processors combine elements of two or three of the major classes of processors.

    For more detailed information on these classes of processors, see processors.

arithmetic and logic

    An arithmetic/logic unit (ALU) performs integer arithmetic and logic operations. It also performs shift and rotate operations and other specialized operations. Usually floating point arithmetic is performed by a dedicated floating point unit (FPU), which may be implemented as a co-processor.

    An arithmetic/logic unit (ALU) performs integer arithmetic and logic operations. It also performs shift and rotate operations and other specialized operations. Usually floating point arithmetic is performed by a dedicated floating point unit (FPU), which may be implemented as a co-processor.

control

    Control units are in charge of the computer. Control units fetch and decode machine instructions. Control units may also control some external devices.

    A bus is a set (group) of parallel lines that information (data, addresses, instructions, and other information) travels on inside a computer. Information travels on buses as a series of electrical pulses, each pulse representing a one bit or a zero bit (there are trinary, or three-state, buses, but they are rare). An internal bus is a bus inside the processor, moving data, addresses, instructions, and other information between registers and other internal components or units. An external bus is a bus outside of the processor (but inside the computer), moving data, addresses, and other information between major components (including cards) inside the computer. Some common kinds of buses are the system bus, a data bus, an address bus, a cache bus, a memory bus, and an I/O bus.

main storage

    Main storage is also called memory or internal memory (to distinguish from external memory, such as hard drives).

    RAM is Random Access Memory, and is the basic kind of internal memory. RAM is called “random access” because the processor or computer can access any location in memory (as contrasted with sequential access devices, which must be accessed in order). RAM has been made from reed relays, transistors, integrated circuits, magnetic core, or anything that can hold and store binary values (one/zero, plus/minus, open/close, positive/negative, high/low, etc.). Most modern RAM is made from integrated circuits. At one time the most common kind of memory in mainframes was magnetic core, so many older programmers will refer to main memory as core memory even when the RAM is made from more modern technology. Static RAM is called static because it will continue to hold and store information even when power is removed. Magnetic core and reed relays are examples of static memory. Dynamic RAM is called dynamic because it loses all data when power is removed. Transistors and integrated circuits are examples of dynamic memory. It is possible to have battery back up for devices that are normally dynamic to turn them into static memory.

    ROM is Read Only Memory (it is also random access, but only for reads). ROM is typically used to store thigns that will never change for the life of the computer, such as low level portions of an operating system. Some processors (or variations within processor families) might have RAM and/or ROM built into the same chip as the processor (normally used for processors used in standalone devices, such as arcade video games, ATMs, microwave ovens, car ignition systems, etc.). EPROM is Erasable Programmable Read Only Memory, a special kind of ROM that can be erased and reprogrammed with specialized equipment (but not by the processor it is connected to). EPROMs allow makers of industrial devices (and other similar equipment) to have the benefits of ROM, yet also allow for updating or upgrading the software without having to buy new ROM and throw out the old (the EPROMs are collected, erased and rewritten centrally, then placed back into the machines).

    Registers and flags are a special kind of memory that exists inside a processor. Typically a processor will have several internal registers that are much faster than main memory. These registers usually have specialized capabilities for arithmetic, logic, and other operations. Registers are usually fairly small (8, 16, 32, or 64 bits for integer data, address, and control registers; 32, 64, 96, or 128 bits for floating point registers). Some processors separate integer data and address registers, while other processors have general purpose registers that can be used for both data and address purposes. A processor will typically have one to 32 data or general purpose registers (processors with separate data and address registers typically split the register set in half). Many processors have special floating point registers (and some processors have general purpose registers that can be used for either integer or floating point arithmetic). Flags are single bit memory used for testing, comparison, and conditional operations (especially conditional branching).

external storage

    External storage (also called auxillary storage) is any storage other than main memory. In modern times this is mostly hard drives and removeable media (such as floppy disks, Zip disks, optical media, etc.). With the advent of USB and FireWire hard drives, the line between permanent hard drives and removeable media is blurred. Other kinds of external storage include tape drives, drum drives, paper tape, and punched cards. Random access or indexed access devices (such as hard drives, removeable media, and drum drives) provide an extension of memory (although usually accessed through logical file systems). Sequential access devices (such as tape drives, paper tape punch/readers, or dumb terminals) provide for off-line storage of large amounts of information (or back ups of data) and are often called I/O devices (for input/output).

input/output overview

    Most external devices are capable of both input and output (I/O). Some devices are inherently input-only (also called read-only) or inherently output-only (also called write-only). Regardless of whether a device is I/O, read-only, or write-only, external devices can be classified as block or character devices.

    A character device is one that inputs or outputs data in a stream of characters, bytes, or bits. Character devices can further be classified as serial or parallel. Examples of character devices include printers, keyboards, and mice.

    A serial device streams data as a series of bits, moving data one bit at a time. Examples of serial devices include printers and MODEMs.

    A parallel device streams data in a small group of bits simultaneously. Usually the group is a single eight-bit byte (or possibly seven or nine bits, with the possibility of various control or parity bits included in the data stream). Each group usually corresponds to a single character of data. Rarely there will be a larger group of bits (word, longword, doubleword, etc.). The most common parallel device is a printer (although most modern printers have both a serial and a parallel connection, allowing greater connection flexibility).

    A block device moves large blocks of data at once. This may be physically implemented as a serial or parallel stream of data, but the entire block gets transferred as single packet of data. Most block devices are random access (that is, information can be read or written from blocks anywhere on the device). Examples of random access block devices include hard disks, floppy disks, and drum drives. Examples of sequential access block devcies include magnetic tape drives and high speed paper tape readers.

input

    Input devices are devices that bring information into a computer.

    Pure input devices include such things as punched card readers, paper tape readers, keyboards, mice, drawing tablets, touchpads, trackballs, and game controllers.

    Devices that have an input component include magnetic tape drives, touchscreens, and dumb terminals.

output

    Output devices are devices that bring information out of a computer.

    Pure output devices include such things as card punches, paper tape punches, LED displays (for light emitting diodes), monitors, printers, and pen plotters.

    Devices that have an output component include magnetic tape drives, combination paper tape reader/punches, teletypes, and dumb terminals.


kinds of programming

    The educational goal of this subchapter is to introduce the basic different kinds of programming in common use.

    There are two basic kinds of programming: system and application.

    System programming deals with the use of a computer system. This includes such things as the operating system, device drivers for input and output devices, and systems utilities.

    Application programming deals with the programs directly used by most people.

    Application programming is generally divided further into scientific and business programming.

    Scientific programming is work in the scientific, engineering, and mathematical fields. Often the programmers are the researchers who use the programs.

    Business programming is work in the data processing field, including large scale business systems, web-based businesses, and office applications. It is exceedingly rare for these kinds of programs to be created by the person who uses them.

    Another large category of programming is programming for personal or home use. This includes games. Historically, many advances in computer science occurred in the development of computer and video games.

    Embedded systems are programs that are built into specific hardware, such as the computer systems in an automobile or microwave oven. These programs combine features of operating systems and application program into a single monolithic system.

    Scripting is a simplified version of programming originally intended for use primarily by non-programmers. In practice, most non-programmers have trouble with scripting languages. Some professional programmers have built very useful, sometimes intricate systems using scripting languages, especially those contained in office software (such as word processors or spreadsheets).


programming languages

    The educational goals of this subchapter are to familiarize the student with the kinds of programming languages and the basic tools for programming. Key terminology is introduced (the student should know the definition of terms in bold). This is followed by some C specific information.

    Programming languages vary from low level assemblers to high level languages.

direct programming

    Originally computers were programmed directly in a “language” that the computer understood.

    This direct programming could involve directly wiring the program into the computer. In some cases, this involved a soldering iron. In other cases there was some kind of plug-board ot make it easier to change the programmed instructions. This method was known as hard wiring.

    Large telegraph networks and later large telephone networks became so complex as to essentially be a computer on a system-wide basis. Many of the ideas (especially logic circuits) that were later necessary to create computers were first developed for large scale telegraph and telephone systems.

    In some early computers the programming could be accomplished with a set of switches. The use of front panel switches (and corresponding indicator lights) continued as an option on many mainframe and minicomputer systems. Some microcomputer systems intended for hobbyists and for dedicated systems also had some kind of front panel switches.

    Another method was the use of punched cards. This was a technology originally developed for controlling early industrial age factories, particularly large looms. The designs or patterns for the cloth would be programmed using punched cards. This made it easy to switch to new designs. Some of the large looms became so complex that they were essentially computers, although that terminology wasn’t used at the time.

machine code and object code

    Both the front panel switch and the punched card methods involved the use of numeric codes. Each numeric code indicated a different machine instruction. The numbers used internally are known as machine code. The numbers on some external media, such as punched cards (or disk files) are known as object code.

assembly and assemblers

    One of the early developments was a symbolic assembler. Instead of writing down a series of binary numbers, the programmer would write down a list of machine instructions, using human-readable symbols. A special program, the assembler, would convert these symbolic instructions into object or machine code.

    Assembly languages have the advantage that they are easier to understand than raw machine code, but still give access to all of the power of the computer (as each assembler symbol translates directly into a specific machine instruction).

    Assembly languages have the disadvantage that they are still very close to machine language. These can be difficult for a human to follow and understand and time-consuming for a human to write. Also, programs written in assembly are tied to a specific computer hardware and can’t be reused on another kind of computer.

    The human readable version of assembly code is known as source code (it is the source that the assembler converts into object code). All programs written in high level languages are also called source code.

high level languages

    High level languages are designed to be easier to understand than assembly languages and allow a program to run on multiple different kinds of computers.

    The source code written in high level languages needs to be translated into object code. The two basic approaches are compilers and interpetters. Some programming languages are available in both interpretted and compiled versions.

    High level languages have usually been designed to meet the needs of some particular kind of programming. For example, FORTRAN was originally intended for scientific programming. COBOL was originally intended for business programming and data processing. SQL was originally intended for data base queries. C was originally intended for systems programming. LISP was originally intended for list processing. PHP was originally intended for web scripting. Ada was originally intended for embedded systems. BASIC and Pascal were originally intended as teaching languages.

    Some high level languages were intended to be general purpose programming languages. Examples include PL/I and Modula-2. Some languages that were originally intended for a specific purpose have turned into general purpose programming languages, such as C and Pascal.

interpreters

    Interpreters convert each high level instruction into a series of machine instructions and then immediately run (or execute) those instructions. In some cases, the interpreter has a library of routines and looks up the correct routine from the library to handle each high level instruction.

compilers

    Compilers convert a finished program (or section of a program) into object code. This is often done in steps. Some compilers convert high level language instructions into assembly language instructions and then an assembler is used to create the finished object code.

    Some compilers convert high level language instructions into an intermediate language. This intermediate language is platform-independent (it doesn’t matter which actual computer hardware is eventually used). The intermediate language is then converted into object code for a specific kind of computer. This approach makes it easier to move (or port) a compiler from one kind of computer to another. Only the last step (or steps) need to be rewritten, while the main complier is reused.

    Compiled code almost always runs faster than interpretted code. An optimizing compiler examines a high level program and figures out ways to optimize the program so that it runs even faster.

C

    A C program is considered to be strictly conforming to ANSI C if the program only uses features and libraries as they are described in the ANSI standard (with no additional optional features or extensions).

    A conforming hosted implementation accepts any strictly conforming program. This applies to a program that is intended to run on an operating system.

    A conforming freestanding implementation accepts any strictly conforming program that doesnt use any library facilities other than those in the header files float.h, limits.h, stdarg.h, and stdef.h.. This applies to a program that is intended to run on an embedded system or other environment with minimal operating system support (such as no file system)..

linkers

    As programs grow in size, requiring teams of programmers, there is a need to break them up into separate files so that different team members can work on their individual assignments without interfering with the work of others. Each file is compiled separately and then combined later.

    Linkers are programs that combine the various parts of a large program into a single object program. Linkers also bring in support routines from libraries. These libraries contain utility and other support code that is reused over and over for lots of different programs.

    Historically, linkers also served additional purposes that are no longer necessary, such as resolving relocatable code on early hardware (so that more than one program could run at the same time).

loaders

    A loader is a program that loads programs into main memory so that they can be run. In the past, a loader would have to be explicitely run as part of a job. In modern times the loader is hidden away in the operating system and called automatically when needed.

editors

    An editor is a program that is used to edit (or create) the source files for programming. Editors rarely have the advanced formatting and other features of a regular word processor, but sometimes include special tools and features that are useful for programming.

    Two important editors are emacs and vi from the UNIX world. I personally use Tom Bender’s Tex-Edit Plus ( http://www.tex-edit.com/ ), which is available in multiple different languages (Danish, English, French, German, Italian, Japanese, Spanish).

command line interface

    A command line interface is an old-style computer interface where the programmer (or other person) controls the computer by typing lines of text. The text lines are used to give instructions (or commands) to the computer. The most famous example of a command line interface is the UNIX shell.

    In addition to built-in commands, command line interfaces could be used to run programs. Additional information could be passed to a program, such as names of files to use and various “program switches” that would modify how a program operated.

development environment

    A development environment is an integrated set of programs (or sometimes one large monolithic program) that is used to support writing computer software. Development environments typically include an editor, compiler (or compilers), linkers, and various additional support tools. Development environments may include their own limited command line interface specifically intended for programmers.

    The term “development environment” can also be used to mean the collection of programs used for writing software, even if they aren’t integrated with each other.

    Because there are a huge number of different development environments and a complete lack of any standardization, the methods used for actually typing in, compiling, and running a program are not covered by this book. Please refer to your local documentation for details.

    The development environment for UNIX, Linux, and Mac OS X are discussed in the chapter on shell programming.

Stanford introduction

    Stanford CS Education Library This [the following section until marked as end of Stanford University items] is document #101, Essential C, in the Stanford CS Education Library. This and other educational materials are available for free at http://cslibrary.stanford.edu/. This article is free to be used, reproduced, excerpted, retransmitted, or sold so long as this notice is clearly reproduced at its beginning. Copyright 1996-2003, Nick Parlante, nick.parlante@cs.stanford.edu.

The C Language

    C is a professional programmer’s language. It was designed to get in one’s way as little as possible. Kernighan and Ritchie wrote the original language definition in their book, The C Programming Language (below), as part of their research at AT&T. Unix and C++ emerged from the same labs. For several years I used AT&T as my long distance carrier in appreciation of all that CS research, but hearing “thank you for using AT&T” for the millionth time has used up that good will.

    Some languages are forgiving. The programmer needs only a basic sense of how things work. Errors in the code are flagged by the compile-time or run-time system, and the programmer can muddle through and eventually fix things up to work correctly. The C language is not like that.

    The C programming model is that the programmer knows exactly what they want to do and how to use the language constructs to achieve that goal. The language lets the expert programmer express what they want in the minimum time by staying out of their way.

    C is “simple’ in that the number of components in the language is small-- If two language features accomplish more-or-less the same thing, C will include only one. C’s syntax is terse and the language does not restrict what is “allowed” -- the programmer can pretty much do whatever they want.

    C’s type system and error checks exist only at compile-time. The compiled code runs in a stripped down run-time model with no safety checks for bad type casts, bad array indices, or bad pointers. There is no garbage collector to manage memory. Instead the programmer mangages heap memory manually. All this makes C fast but fragile.

Analysis -- Where C Fits

    Because of the above features, C is hard for beginners. A feature can work fine in one context, but crash in another. The programmer needs to understand how the features work and use them correctly. On the other hand, the number of features is pretty small.

    Like most programmers, I have had some moments of real loathing for the C language. It can be irritatingly obedient -- you type something incorrectly, and it has a way of compiling fine and just doing something you don’t expect at run-time. However, as I have become a more experienced C programmer, I have grown to appreciate C’s straight-to-the point style. I have learned not to fall into its little traps, and I appreciate its simplicity.

    Perhaps the best advice is just to be careful. Don’t type things in you don’t understand. Debugging takes too much time. Have a mental picture (or a real drawing) of how your C code is using memory. That’s good advice in any language, but in C it’s critical.

    Perl and Java are more “portable” than C (you can run them on different computers without a recompile). Java and C++ are more structured than C. Structure is useful for large projects. C works best for small projects where performance is important and the progammers have the time and skill to make it work in C. In any case, C is a very popular and influential language. This is mainly because of C’s clean (if minimal) style, its lack of annoying or regrettable constructs, and the relative ease of writing a C compiler.

Other Resources

    The C Programming Language, 2nd ed., by Kernighan and Ritchie. The thin book which for years was the bible for all C programmers. Written by the original designers of the language. The explanations are pretty short, so this book is better as a reference than for beginners.

    Stanford CS Education Library This [the following section until marked as end of Stanford University items] is document #101, Essential C, in the Stanford CS Education Library. This and other educational materials are available for free at http://cslibrary.stanford.edu/. This article is free to be used, reproduced, excerpted, retransmitted, or sold so long as this notice is clearly reproduced at its beginning. Copyright 1996-2003, Nick Parlante, nick.parlante@cs.stanford.edu.

end of Stanford introduction


standards and variants

    The educational goal of this subchapter is to make the student aware that there are offcial versions of many programming languages, but that in practice there are a lot of variations (and that a programmer must be able to adapt to changes and variations).

    Programming languages can meet official standards or come in variants and dialects.

    Programming languages have traditionally been developed either by a single author or by a committee.

    Typically after a new programming language is released, new features or modifications, called variants, start to pop-up. The different versions of a programming language are called dialects. Over time, the most popular of these variants become common place in all the major dialects.

    If a programming language is popular enough, some international group or committee will create an official standard version of a programming language. The largest of these groups are ANSI (Ameican national Standards Institute) and ISO (International Orgnaization for Standardization).

    While variants and dialects may offer very useful features, the use of the non-standard features will lock the program into a particular development environment or compiler and often will lock the program into a specific operating system or even hardware platform.

    Use of official standaards allows for portability, which is the ability to move a program from one machine or operating system to another.

    While variants were traditionally introduced in an attempt to improve a programming language, Microsoft started the practice of intentionally creating variants to lock developers into using Microsoft products. In some cases the Microsoft variants offered no new features, but merely chaanged from the established standard for the sake of being different. Microsoft lost a lawsuit with Sun Microsystems for purposely creating variants to Java in hopes of killing off Java in favor of Microsoft languages.


Hello World

    The educational goal of this subchapter is to give the student a general idea of what a simple computer program looks like in their chosen language. The professor may present only the example from the programming language being taught or may show many examples to give a feel for some of the similarities and differences between common programming languages.

    The classic example for any programming language is “Hello World” (a simple program that writes the text “Hello World”). These examples show how to write this simple program in different programming languages.

    The cliche first program is Hello World — a simple program that types the message “Hello World”. Even experienced programmers will often write a simple Hello World program when learning a new programming language or switching to a new development environment.

    Kernighan and Ritchie popularized the Hello World! program in their book C Programming Language.

    Every programming language has its own peculair rules for the form of a program. The Hello World program is a fine illustration of the basic rules of a programming language.

    All of the code below produces the same basic results: printing or typing the phrase “Hello World”. You may want to take a brief look at how different languages can have very different methods for achieving the same results.

Ada

with Ada.Text_IO;
use Ada.Text_IO;

procedure HelloWorld is
begin
   Ada.Text_IO.Put_Line("Hello World");
end HelloWorld;

    Ada was first released in 1983 (ADA 83), with major releases in 1995 (ADA 95) and 2005 (ADA 2005). Ada was created by the U.S. Department of Defense (DoD), originally intended for embedded systems and later intended for all military computing purposes.

    Ada is named for Augusta Ada King, the Countess of Lovelace, the first computer programmer in modern times.

ALGOL

BEGIN
   OUTSTRING(2, "Hello World");
END.

    ALGOL (ALGOrithmic Language) was first formalized in 1958 (ALGOL 58), with major releases in 1960 (ALGOL 60) and 1968 (ALGOL 68). ALGOL was originally intended for scientific computations.

    ALGOL is considered to be the first second generation computer language.

    ALGOL was a highly influential programming language. Most modern programming languages are descendants of ALGOL.

    ALGOL introduced such concepts as: block structure of code, scope of variables, BNF notation for defining syntax, dynamic arrays, reserved words, and user defined data types.

BASIC

10 PRINT "Hello World"

    BASIC (Beginner’s All-purpose Symbolic Instruction Code) was designed as a teaching language in 1963 by John George Kemeny and Thomas Eugene Kurtz of Dartmouth College.

C

#include <stdio.h>

main()
{
   printf("Hello World\n");
}

    C was developed in 1972 by Dennis Ritchie of Bell Telephone Laboratories for use in systems programming for UNIX.

C++

#include <iostream.h>

int main(int argc, char *argv[])
{
   cout << "Hello World" << endl;
   return 0;
}

    C++ was developed in 1983 by Bjarne Stroustrup at Bell Telephone Laboratories to extend C for object oriented programming.

COBOL

IDENTIFICATION DIVISION.
PROGRAN-ID. HelloWorld.
AUTHOR. Milo.

ENVIRONMENT DIVISION.
CONFIGURATION SECTION.
INPUT-OUTPUT SECTION.

DATA DIVISION.
FILE SECTION.
WORKING-STORAGE SECTION.
LINKAGE SECTION.

PROCEDURE DIVISION.
DISPLAY "Hello World".
STOP RUN.

    COBOL (COmmon Business Oriented Language) was created in 1959 by the Short Range Committee of the U.S. Department of Defense (DoD). Official ANSI standards included COBOL-68 (1968), COBOL-74 (1974), COBOL-85 (1985), and COBOL-2002 (2002). COBOL 97 (1997) introduced an object oriented version of COBOL.

Dylan

define method hello-world()
   format-out("Hello World\n");
end method hello-world;

hello-world();

    Dylan was created in the early 1990s by Apple Computer and others. Dylan was originally intended for use with the Apple Newton, but wasn’t finished in time.

Forth

: hello_world ." Hello World" ;

    Forth was created by Charles H. Moore in 1968. Forth was a reference to Moore’s claim that he had created a fourth generation programming language.

FORTRAN

      PROGRAM HELLO
      WRITE(UNIT=*, FMT=*) 'Hello World'
      END

    FORTRAN (FORmula TRANslation) was created in 1954 at International Business Machines (now IBM). FORTRAN is the oldest programming language still in common use.

HTML

<html>
<head>
<title>Hello World</title>
</head>
<body>
<p>Hello World</p>
</body>
</html>

    HTML (HyperText Markup Language) was developed in 1989.

Java

import java.applet.*;
import java.awt.*;
Public class HelloWorld extends Applet
{
public void paint(Graphics g)
   {
      g.drawstring("Hello World".,10,10);
   }
}

    Java (a reference to coffee) was created by Sun Microsystems and released in 1995.

LISP

(print "Hello World".)

    LISP (LISt Processing) was created n 1958 by John McCarthy of MIT. LISP is the second oldest programming language still in common use.

Logo

print [Hello World]

    Logo was created in 1967 by Daniel G. Bobrow, Wally Feurzeig, and Seymour Papert.

Modula-2

MODULE HelloWorld;
FROM InOut IMPORT WriteString, WriteLn;
BEGIN
   WriteString('Hello World');
   WriteLn;
END HelloWorld.

    Modula (MODUlar LAnguage) was created by Niklaus Wirth, who started work in 1977. Modula-2 was released in 1980.

Oberon

MODULE HelloWorld;
IMPORT Out;
BEGIN
   Out.Open;
   Out.String('Hello World');
END HelloWorld.

    Oberon (named for a moon of Uranus) was created in 1986 by Niklaus Wirth.

Pascal

PROGRAM HelloWorld (OUTPUT);
BEGIN
   WRITELN('Hello World');
END.

    Pascal (named for French religious fanatic and mathematician Blaise Pascal) was created in 1970 by Niklaus Wirth.

Perl

print "Hello World\n";

    Perl (Practical Extracting and Report Language) was created by Larry Wall in 1987.

PHP

<?php
   echo "Hello World\n";
?>

    PHP (PHP Hypertext Processor) was created by Rasmus Lerdorf in 1995.

PL/I

HELLO: PROCEDURE OPTIONS (MAIN);
   PUT SKIP LIST('HELLO WORLD');
END HELLO;

    PL/I (Programming Language One) was created in 1964 at IBM’s Hursley Laboratories in the United Kingdom.

PostScript

/Courier findfont
14 scalefont
setfont
0 0 moveto
(Hello World) show
showpage

    PostScript was created in 1984 by John Warnock and Chuck Geschke at Adobe.

Prolog

?- write('Hello World'), nl.

    Prolog (PROgramming LOGic) was created in 1972 by Alan Colmerauer with Philippe Roussel.

Python

print: "Hello World"

    Python (named for Monty Python Flying Circus) was created in 1991 by Guido van Rossum.

Ruby

"Hello World\n".display

    Ruby was created in 1995 by Yukihiro Matsumoto.

Shell Script (BASH)

echo Hello World

SmallTalk

'Hello World' out.

    SmallTalk was created in 1969 at Xerox PARC by a team led by Alan Kay, Adele Goldberg, Ted Kaehler, and Scott Wallace.

SNOBOL

   OUTPUT = "Hello World"
END

    SNOBOL (StroNg Oriented symBOli Language) was created in 1962.

SQL

SELECT "Hello World"

    SQL (Standard Query Language) was designed by Donald D. Chamberlin and Raymond F. Boyce of IBM in 1974.


creating a program

    Using your local development environment, you will want to try to type in the example for the language you are learning, then attempt to compile it, and finally attempt to run it. These are critical skills to learn before you can start learning how to program.

C

    The typical steps for compiling a program in C on a UNIX machine are:

step command input output
create source code ed
emacs
use any text editor
type from keyboard or terminal source code

check
(for lexical errors)
lint source code file listing with warnings

preprocess cc
(or cpp)
source code file c code file

compile
(convert to assembly for specific hardware platform)
cc2 c code file assembly source code file

assemble
(for specific hardware platform)
asm
(or as)
(or masm)
assembly language file a.out
object code file

link link object code file executable code

run program name file with
executable code
results of program

other

   “7. It is easier to write an incorrect program than understand a correct one.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13


brief history of programming languages
and other significant milestones

    This chapter includes a brief history of programming languages.

    The educational goal of this chapter is to familiarize the student with the history of computer programming. This chapter may provide a good overview for classes on the history of computers or history of programming languages.

    There have been literally thousands of programming languages, many of which have been lost to history.

    This history of programming languages also discusses the developments of computer hardware, computer operating systems, games, and technology. Games are included because often the major advances in computer hardware and software technologies first appeared in games. As one famous example, the roots of UNIX were the porting of an early computer game to new hardware.

antiquity

    The earliest calculating machine was the abacus, believed to have been invented in Babylon around 2400 BCE. The abacus was used by many different cultures and civilizations, including the major advance known as the Chinese abacus from the 2nd Century BCE.

200s BCE

    The Antikythera mechanism, discovered in a shipwreck in 1900, is an early mechanical analog computer from between 150 BCE and 100 BCE. The Antikythera mechanism used a system of 37 gears to compute the positions of the sun and the moon through the zodiac on the Egyptian calendar, and possibly also the fixed stars and five planets known in antiquity (Mercury, Venus, Mars, Jupiter, and Saturn) for any time in the future or past. The system of gears added and subtracted angular velocities to compute differentials. The Antikythera mechanism could accurately predict eclipses and could draw up accurate astrological charts for important leaders. It is likely that the Antikythera mechanism was based on an astrological computer created by Archimedes of Syracuse in the 3rd century BCE.

100s BCE

    The Chinese developed the South Pointing Chariot in 115 BCE. This device featured a differential gear, later used in modern times to make analog computers in the mid-20th Century.

400s

    The Indian grammarian Panini wrote the Ashtadhyayi in the 5th Century BCE. In this work he created 3,959 rules of grammar for India’s Sanskrit language. This important work is the oldest surviving linguistic book and introduced the idea of metarules, transformations, and recursions, all of which have important applications in computer science.

1400s

    The Inca created digital computers using giant loom-like wooden structures that tied and untied knots in rope. The knots were digital bits. These computers allowed the central government to keep track of the agricultural and economic details of their far-flung empire. The Spanish conquered the Inca during fighting that stretched from 1532 to 1572. The Spanish destroyed all but one of the Inca computers in the belief that the only way the machines could provide the detailed information was if they were Satanic divination devices. Archaeologists have long known that the Inca used knotted strings woven from cotton, llama wool, or alpaca wool called khipu or quipus to record accounting and census information, and possibly calendar and astronomical data and literature. In recent years archaeologists have figured out that the one remaining device, although in ruins, was clearly a computer.

1800s

    Charles Babbage created the difference engine and the analytical engine, often considered to be the first modern computers. Augusta Ada King, the Countess of Lovelace, was the first modern computer programmer.

    In the 1800s, the first computers were programmable devices for controlling the weaving machines in the factories of the Industrial Revolution. Created by Charles Babbage, these early computers used Punch cards as data storage (the cards contained the control codes for the various patterns). These cards were very similiar to the famous Hollerinth cards developed later. The first computer programmer was Lady Ada, for whom the Ada programming language is named.

    In 1822 Charles Babbage proposed a difference engine for automated calculating. In 1933 Babbage started work on his Analytical Engine, a mechanical computer with all of the elements of a modern computer, including control, arithmetic, and memory, but the technology of the day couldn’t produce gears with enough precision or reliability to make his computer possible. The Analytical Engine would have been programmed with Jacquard’s punched cards. Babbage designed the Difference Engine No.2. Lady Ada Lovelace wrote a program for the Analytical Engine that would have correctly calculated a sequence of Bernoulli numbers, but was never able to test her program because the machine wasn’t built.

    George Boole introduced what is now called Boolean algebra in 1854. This branch of mathematics was essential for creating the complex circuits in modern electronic digital computers.

1900s

    In the 1900s, researchers started experimenting with both analog and digital computers using vacuum tubes. Some of the most successful early computers were analog computers, capable of performing advanced calculus problems rather quickly. But the real future of computing was digital rather than analog. Building on the technology and math used for telephone and telegraph switching networks, researchers started building the first electronic digital computers.

    Leonardo Torres y Quevedo first proposed floating point arithmetic in Madrid in 1914. Konrad Zuse independently proposed the same idea in Berlin in 1936 and built it into the hardware of his Zuse computer. George Stibitz also indpendently proposed the idea in New Jersey in 1939.

    The first modern computer was the German Zuse computer (Z3) in 1941. In 1944 Howard Aiken of Harvard University created the Harvard Mark I and Mark II. The Mark I was primarily mechanical, while the Mark II was primarily based on reed relays. Telephone and telegraph companies had been using reed relays for the logic circuits needed for large scale switching networks.

    The first modern electronic computer was the ENIAC in 1946, using 18,000 vacuum tubes. See below for information on Von Neumann’s important contributions.

    The first solid-state (or transistor) computer was the TRADIC, built at Bell Laboratories in 1954. The transistor had previously been invented at Bell Labs in 1948.

1938

    Computers: Zuse Z1 (Germany, 1 OPS, first mechanical programmable binary computer, storage for a total of 64 numbers stored as 22 bit floating point numbers with 7-bit exponent, 15-bit signifocana [one implicit bit], and sign bit); Konrad Zuse called his floating point hardware “semi-logarithmic notation” and included the ability to handle infinity and undefined.

1941

    Computers: Atanasoff-Berry Computer; Zuse Z3 (Germany, 20 OPS, added floating point exceptions, plus and minus infinity, and undefined)

1942

    Computers: work started on Zuse Z4

1943

    Computers: Harvard Mark I (U.S.); Colossus 1 (U.K., 5 kOPS)

1944

    Computers: Colossus 2 (U.K., single processor, 25 kOPS); Harvard Mark II and AT&T Bell Labortories’ Model V (both relay computers) were the first American computers to include floating point hardware

1945

    Plankalkül (Plan Calculus), created by Konrad Zuse for the Z3 computer in Nazi germany, may have been the first programming language (other than assemblers). This was a surprisingly advanced programming language, with many features that didn’t appear again until the 1980s.

    Computers: Zuse Z4 (relay based computer, first commercial computer)

1946

    Computers: UPenn Eniac (5 kOPS); Colossus 2 (parallel processor, 50 kOPS)
    Technology: electrostatic memory

1948

    Computers: IBM SSEC; Manchester SSEM
    Technology: random access memory; magnetic drums; transistor

1949

    Short Code created in 1949. This programming language was compiled into machine code by hand.

    Computers: Manchester Mark 1
    Technology: registers

1951

    Grace Hopper starts work on A-0.

    Computers: Ferranti Mark 1 (first commercial computer); Leo I (frst business computer); UNIVAC I, Whirlwind

1952

    Autocode, a symbolic assembler for the Manchester Mark I computer, was created in 1952 by Alick E. Glennie. Later used on other computers.

    A-0 (also known as AT-3), the first compiler, was created in 1952 by Grace Murray Hopper. She later created A-2, ARITH-MATIC, MATH-MATIC, and FLOW-MATIC, as well as being one of the leaders in the development of COBOL.Grace Hopper was working for Remington Rand at the time. Rand released the language as MATH-MATIC in 1957.

    According to some sources, first work started on FORTRAN.

    Computers: UNIVAC 1101; IBM 701
    Games: OXO (a graphic version of Tic-Tac-Toe created by A.S. Douglas on the EDSAC computer at the University of Cambridge to demonstrate ideas on human-computer interaction)

1953

    Computers: Strela

1954

    FORTRAN (FORmula TRANslator) was created in 1954 by John Backus and other researchers at International Business Machines (now IBM). Released in 1957. FORTRAN is the oldest programming language still in common use. Identifiers were limited to six characters. Elegant representation of mathematic expressions, as well as relatively easy input and output. FORTRAN was based on A-0.

    “Often referred to as a scientific language, FORTRAN was the first high-level language, using the first compiler ever developed. Prior to the development of FORTRAN computer programmers were required to program in machine/assembly code, which was an extremely difficult and time consuming task, not to mention the dreadful chore of debugging the code. The objective during its design was to create a programming language that would be: simple to learn, suitable for a wide variety of applications, machine independent, and would allow complex mathematical expressions to be stated similarly to regular algebraic notation. While still being almost as efficient in execution as assembly language. Since FORTRAN was so much easier to code, programmers were able to write programs 500% faster than before, while execution efficiency was only reduced by 20%, this allowed them to focus more on the problem solving aspects of a problem, and less on coding.

    “FORTRAN was so innovative not only because it was the first high-level language [still in use], but also because of its compiler, which is credited as giving rise to the branch of computer science now known as compiler theory. Several years after its release FORTRAN had developed many different dialects, (due to special tweaking by programmers trying to make it better suit their personal needs) making it very difficult to transfer programs from one machine to another.” —Neal Ziring, The Language Guide, University of Michigan

    “Some of the more significant features of the language are listed below:” —Neal Ziring, The Language Guide, University of Michigan

  • Simple to learn - when FORTRAN was design one of the objectives was to write a language that was easy to learn and understand.
  • Machine Independent - allows for easy transportation of a program from one machine to another.
  • More natural ways to express mathematical functions - FORTRAN permits even severely complex mathematical functions to be expressed similarly to regular algebraic notation.
  • Problem orientated language
  • Remains close to and exploits the available hardware
  • Efficient execution - there is only an approximate 20% decrease in efficiency as compared to assembly/machine code.
  • Ability to control storage allocation -programmers were able to easily control the allocation of storage (although this is considered to be a dangerous practice today, it was quite important some time ago due to limited memory.
  • More freedom in code layout - unlike assembly/machine language, code does not need to be laid out in rigidly defined columns, (though it still must remain within the parameters of the FORTRAN source code form).

   “42. You can measure a programmer’s perspective by noting his attitude on the continuing vitality of FORTRAN.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13

    Computers: IBM 650; IBM 704 (vacuum tube computer with floating point); IBM NORC (67 kOPS)
    Technology: magnetic core memory

1955

    Operating Systems: GMOS (General Motors OS for IBM 701)
    Computers: Harwell CADET

1956

    Researchers at MIT begin experimenting with direct keyboard input into computers.

    IPL (Information Processing Language) was created in 1956 by A. Newell, H. Simon, and J.C. Shaw. IPL was a low level list processing language which implemented recursive programming.

    Operating Systems: GM-NAA I/O
    Computers: IBM 305 RAMAC; MIT TX-0 (83 kOPS)
    Technology: hard disk

1957

    MATH-MATIC was released by the Rand Corporation in 1957. The language was derived from Grace Murray Hopper’s A-0.

    FLOW-MATIC, also called B-0, was created in 1957 by Grace Murray Hopper.

    The first commercial FORTRAN program was run at Westinghouse. The first compile run produced a missing comma diagnostic. The second attempt was a success.

    The U.S. government created the Advanced Research Project Group (ARPA) in esponse to the Soviet Union’s launching of Sputnik. ARPA was intended to develop key technology that was too risky for private business to develop.

    Computers: IBM 608
    Technology: dot matrix printer

1958

    FORTRAN II in 1958 introduces subroutines, functions, loops, and a primitive For loop.

    IAL (International Algebraic Logic) started as the project later renamed ALGOL 58. The theoretical definition of the language is published. No compiler.

    LISP (LISt Processing) was created n 1958 and released in 1960 by John McCarthy of MIT. LISP is the second oldest programming language still in common use. LISP was intended for writing artificial intelligence programs.

    “Interest in artificial intelligence first surfaced in the mid 1950. Linguistics, psychology, and mathematics were only some areas of application for AI. Linguists were concerned with natural language processing, while psychologists were interested in modeling human information and retrieval. Mathematicians were more interested in automating the theorem proving process. The common need among all of these applications was a method to allow computers to process symbolic data in lists.

    “IBM was one of the first companies interested in AI in the 1950s. At the same time, the FORTRAN project was still going on. Because of the high cost associated with producing the first FORTRAN compiler, they decided to include the list processing functionality into FORTRAN. The FORTRAN List Processing Language (FLPL) was designed and implemented as an extention to FORTRAN.

    “In 1958 John McCarthy took a summer position at the IBM Information Research Department. He was hired to create a set of requirements for doing symbolic computation. The first attempt at this was differentiation of algebraic expressions. This initial experiment produced a list of of language requirements, most notably was recursion and conditional expressions. At the time, not even FORTRAN (the only high-level language in existance) had these functions.

    “It was at the 1956 Dartmouth Summer Research Project on Artificial Intelligence that John McCarthy first developed the basics behind Lisp. His motivation was to develop a list processing language for Artificial Intelligence. By 1965 the primary dialect of Lisp was created (version 1.5). By 1970 special-purpose computers known as Lisp Machines, were designed to run Lisp programs. 1980 was the year that object-oriented concepts were integrated into the language. By 1986, the X3J13 group formed to produce a draft for ANSI Common Lisp standard. Finally in 1992, X3J13 group published the American National Standard for Common Lisp.” —Neal Ziring, The Language Guide, University of Michigan

    “Some of the more significant features of the language are listed below:” —Neal Ziring, The Language Guide, University of Michigan

  • Atoms & Lists - Lisp uses two different types of data structures, atoms and lists.
  • Atoms are similar to identifiers, but can also be numeric constants
  • Lists can be lists of atoms, lists, or any combination of the two
  • Functional Programming Style - all computation is performed by applying functions to arguments. Variable declarations are rarely used.
  • Uniform Representation of Data and Code - example: the list (A B C D)
    • a list of four elements (interpreted as data)
    • is the application of the function named A to the three parameters B, C, and D (interpreted as code)
  • Reliance on Recursion - a strong reliance on recursion has allowed Lisp to be successful in many areas, including Artificial Intelligence.
  • Garbage Collection - Lisp has built-in garbage collection, so programmers do not need to explicitly free dynamically allocated memory.

   “55. A LISP programmer knows the value of everything, but the cost of nothing.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13

    Operating Systems: UMES
    Computers: UNIVAC II; IBM AN/FSQ-7 (400 kOPS)
    Games:Tennis For Two (developed by William Higinnotham using an osciliscope and an analog computer)
    Technology: integrated circuit

1959

    COBOL (COmmon Business Oriented Language) was created in May 1959 by the Short Range Committee of the U.S. Department of Defense (DoD). The CODASYL committee (COnference on DAta SYstems Languages) worked from May 1959 to April 1960. Official ANSI standards included COBOL-68 (1968), COBOL-74 (1974), COBOL-85 (1985), and COBOL-2002 (2002). COBOL 97 (1997) introduced an object oriented version of COBOL. COBOL programs are divided into four divisions: identification, environment, data, and procedure. The divisions are further divided into sections. Introduced the RECORD data structure. Emphasized a verbose style intended to make it easy for business managers to read programs. Admiral Grace Hopper is recognized as the major contributor to the original COBOl language and as the inventor of compilers.

    LISP 1.5 released in 1959.

    “ ‘DYNAMO is a computer program for translating mathematical models from an easy-to-understand notation into tabulated and plotted results. … A model written in DYNAMO consists of a number of algebraic relationships that relate the variables one to another.’ Although similar to FORTRAN, it is easier to learn and understand. DYNAMO stands for DYNAmic MOdels. It was written by Dr. Phyllis Fox and Alexander L. Pugh, III, and was completed in 1959. It grew out of an earlier language called SIMPLE (for Simulation of Industrial Management Problems with Lots of Equations), written in 1958 by Richard K. Bennett.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

    ERMA (Electronic Recording Method of Accounting), a magnetic ink and computer readable font, was created for the Bank of America.

    Operating Systems: SHARE
    Computers: IBM 1401

1960

    ALGOL (ALGOrithmic Language) was released in 1960. Major releases in 1960 (ALGOL 60) and 1968 (ALGOL 68). ALGOL is the first block-structured labguage and is considered to be the first second generation computer language. This was the first programming language that was designed to be machine independent. ALGOL introduced such concepts as: block structure of code (marked by BEGIN and END), scope of variables (local variables inside blocks), BNF (Backus Naur Form) notation for defining syntax, dynamic arrays, reserved words, IF THEN ELSE, FOR, WHILE loop, the := symbol for assignment, SWITCH with GOTOs, and user defined data types. ALGOL became the most popular programming language in Europe in the mid- and late-1960s.

    C.A.R. Hoare invents the Quicksortin 1960.

    Operating Systems: IBSYS
    Computers: DEC PDP-1; CDC 1604; UNIVAC LARC (250 kFLOPS)

1961

    Operating Systems: CTSS, Burroughs MCP
    Computers: IBM 7030 Stretch (1.2 MFLOPS)

1962

    APL (A Programming Language) was published in the 1962 book A Programming Language by Kenneth E. Iverson and a subset was first released in 1964. The language APL was based on a notation that Iverson invented at Harvard University in 1957. APL was intended for mathematical work and used its own special character set. Particularly good at matrix manipulation. In 1957 it introduced the array. APL used a special character set and required special keyboards, displays, and printers (or printer heads).

    FORTRAN IV is released in 1962.

    Simula was created by Ole-Johan Dahl and Kristen Nygaard of the Norwegian Computing Center between 1962 and 1965. A compiler became available in 1964. Simula I and Simula 67 (1967) were the first object-oriented programming languages.

    SNOBOL (StroNg Oriented symBOli Language) was created in 1962 by D.J. Farber, R.E. Griswold, and F.P. Polensky at Bell Telephone Laboratories. Intended for processing strings, the language was the first to use associative arrays, indexed by any type of key. Had features for pattern-matching, concatenation, and alternation. Allowed running code stored in strings. Data types: integer, real, array, table, pattern, and user defined types.

    SpaceWarI, the first interactive computer game, was created by MIT students Slug Russel, Shag Graetz, and Alan Kotok on DEC’s PDP-1.

    Operating Systems: GECOS
    Games: Spacewar! (created by group of M.I.T. students on the DEC PDP-1)
    Computers: ATLAS, UNIVAC 1100/2200 (introduced two floating point formats, single precision and double precision; single precision: 36 bits, 1-bit sign, 8-bit exponent, and 27-bit significand; double precision: 36 bits, 1-bit sign, 11-bit exponent, and 60-bit significand), IBM 7094 (followed the UNIVAC, also had single and double precision numbers)
    Technology: RAND Corporation proposes the internet

1963

    Work on PL/I starts in 1963.

    “Data-Text was the “original and most general problem-oriented computer language for social scientists.” It has the ability to handle very complicated data processing problems and extremely intricate statistical analyses. It arose when FORTRAN proved inadequate for such uses. Designed by Couch and others, it was first used in 1963/64, then extensively revised in 1971. The Data-Text System was originally programmed in FAP, later in FORTRAN, and finally its own language was developed.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

    Sketchpad, an interactive real time computer drawing system, was created in 1963 by Ivan Sutherland as his doctoral thesis at MIT. The system used a light pen to draw and manipulate geometric figures on a computer screen.

    ASCII (American Standard Code for Information Interchange) was introduced in 1963.

    Computers: DEC PDP-6
    Technology: mouse

1964

    BASIC (Beginner’s All-purpose Symbolic Instruction Code) was designed as a teaching language in 1963 by John George Kemeny and Thomas Eugene Kurtz of Dartmouth College. BASIC was intended to make it easy to learn programming. The first BASIC program was run at 4 a.m. May 1, 1964.

    PL/I (Programming Language One) was created in 1964 at IBM’s Hursley Laboratories in the United Kingdom. PL/I was intended to combine the scientific abilities of FORTRAN with the business capabilities of COBOL, plus additional facilities for systems programming. Also borrows from ALGOL 60. Originally called NPL, or New Programming Language. Introduces storage classes (automatic, static, controlled, and based), exception processing (On conditions), Select When Otherwise conditional structure, and several variations of the DO loop. Numerous data types, including control over precision.

    RPG (Report Program Generator) was created in 1964 by IBM. Intended for creating commercial and business reports.

    APL\360 implemented in 1964.

    Operating Systems: DTSS, TOPS-10
    Computers: IBM 360; DEC PDP-8; CDC 6600 (first supercomputer, scalar processor, 3 MFLOPS)
    Technology: super computing

1965

    SNOBOL 3 was released in 1965.

    Attribute grammars were created in 1965 by Donald Knuth.

    Operating Systems: OS/360; Multics
    Technology: time-sharing; fuzzy logic; packet switching; bulletin board system (BBS); email

1966

    ALGOL W was created in 1966 by Niklaus Wirth. ALGOL W included RECORDs, dynamic data structures, CASE, passing parameters by value, and precedence of operators.

    Euler was created in 1966 by Niklaus Wirth.

    FORTRAN 66 was released in 1966. The language was rarely used.

    ISWIM (If You See What I Mean) was described in 1966 in Peter J. Landin’s article The Next 700 Programming Languages in the Communications of the ACM. ISWIM, the first purely functional language, influenced functional programming languages. The first language to use lazy evaluation.

    LISP 2 was released in 1966.

    Computers: BESM-6

1967

    Logo was created in 1967 (work started in 1966) by Seymour Papert. Intended as a programming language for children. Started as a drawing program. Based on moving a “turtle” on the computer screen.

    Simula 67 was created by Ole-Johan Dahl and Kristen Nygaard of the Norwegian Computing Center in 1967. Introduced classes, methods, inheriteance, and objects that are instances of classes.

    SNOBOL 4 (StroNg Oriented symBOli Language) was released in 1967.

    CPL (Combined Programming Language) was created in 1967 at Cambridge and London Universities. Combined ALGOL 60 and functional language. Used polymorphic testing structures. Included the ANY type, lists, and arrays.

    Operating Systems: ITS; CP/CMS; WAITS

1968

    ALGOL 68 in 1968 introduced the =+ token to combine assignment and add, UNION, and CASTing of types. It included the IF THEN ELIF FI structure, CASE structure, and user-defined operators.

    Forth was created by Charles H. Moore in 1968. Stack based language. The name “Forth” was a reference to Moore’s claim that he had created a fourth generation programming language.

    ALTRAN, a variant of FORTRAN, was released.

    ANSI version of COBOL defined.

    Edsger Dijkstra wrote a letter to the Communications of the ACM claiming that the use of GOTO was harmful.

    Computers: DEC PDP-10
    Technology: microprocessor; interactive computing (including mouse, windows, hypertext, and fullscreen word processing)

1969

    BCPL (Basic CPL) was created in 1969 in England. Intended as a simplified version of CPL, includes the control structures For, Loop, If Then, While, Until Repeat, Repeat While, and Switch Case.

    “BCPL was an early computer language. It provided for comments between slashes. The name is condensed from “Basic CPL”; CPL was jointly designed by the universities of Cambridge and London. Officially, the “C” stood first for “Cambridge,” then later for “Combined.” -- Unofficially it was generally accepted as standing for Christopher Strachey, who was the main impetus behind the language.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

    B (derived from BCPL) developed in 1969 by Ken Thompson of Bell Telephone Laboratories for use in systems programming for UNIX. This was the parent language of C.

    SmallTalk was created in 1969 at Xerox PARC by a team led by Alan Kay, Adele Goldberg, Ted Kaehler, and Scott Wallace. Fully object oriented programming language that introduces a graphic environment with windows and a mouse.

    RS-232-C standard for srial communication introduced in 1969.

    UNIX created at AT&T Bell telephone Laboratories by Kenneth Thompson and Dennis Ritchie..

    ARPA creates ARPAnet, the forerunner of the Internet (originally hosted by UCLA, UC Santa Barbara, University of Utah, and Stanford Research Institute).

    Operating Systems: ACP; TENEX/TOPS-20; work started on Unix
    Computers: CDC 7600 (36 MFLOPS)
    Games: Space Travel (written by Jeremy Ben for Multics; when AT&T pulled out of the Multics project, J. Ben ported the program to FORTRAN running on GECOS on the GE 635; then ported by J. Ben and Dennis Ritchie in PDP-7 assembly language; the process of porting the game to the PDP-7 computer was the beginning of Unix)
    Technology: ARPANET (military/academic precursor to the Internet); RS-232; networking; laser printer (invented by Gary Starkweather at Xerox)

1970

    Prolog (PROgramming LOGic) was created in 1972 in France by Alan Colmerauer with Philippe Roussel. Introduces Logic Programming.

    Pascal (named for French religious fanatic and mathematician Blaise Pascal) was created in 1970 by Niklaus Wirth on a CDC 6000-series computer. Work started in 1968. Pascl was intended as a teaching language to replace BASIC, but quickly developed into a general purpose programming language. Programs compiled to a platform-independent intermediate P-code. The compiler for pascal was written in Pascal, an influential first in language design.

    Forth used to write the program to control the Kitt Peaks telescope.

    BLISS was a systems programming language developed by W.A. Wulf, D.B. Russell, and A.N. Habermann at Carnegie Mellon University in 1970. BLISS was a very popular systems programming language until the rise of C. The original compiler was noted for its optimizing of code. Most of the utilities for DEC’s VMS operating system were written in BLISS-32. BLISS was a typeless language based on expressions rather than statements. Expressions produced values, and possibly caused other actions, such as modification of storage, transfer of control, or looping. BLISS had powerful macro facilities, conditional execution of statements, subroutines, built-in string functions, arrays, and some automatic data conversions. BLISS lacked I/O instructions on the assumption that systems I/O would actually be built in the language.

    Operating Systems: Unix; RT-11; RSTS-11
    Computers: Datapoint 2200; DEC PDP-11
    Technology: dynamic RAM; flight data processor

1971

    Computers: Intel 4004 (four-bit microprocessor)
    Games: Computer Space (first commercial vidoe game)
    Technology: floppy disk; first electronic calculator (T1)

1972

    C was developed from 1969-1972 by Dennis Ritchie of Bell Telephone Laboratories for use in systems programming for UNIX.

    Pong, the first arcade video game, was introduced by Nolan Bushnell in 1972. His company was called Atari.

    Operating Systems: VM/CMS
    Computers: Intel 8008 (microprocessor); Rockwell PPS-4 (microprocessor); Fairchild PPS-25 (microprocessor)
    Games: Pong; Magnavox Odysssey (first home video game console)
    Technology: game console (Magnavox Odyssey); first scientific calculator (HP); first 32-bit minicomputer; first arcade video game

1973

    ML (Meta Language) was created in 1973 by R. Milner of the University of Edinburgh. Functional language implemented in LISP.

    Actor is a mathematical model for concurrent computation first published bby Hewitt in 1973.

    “Actor is an object-oriented programming language. It was developed by the Whitewater Group in Evanston, Ill.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

    ARPA creates Transmission Control Protocol/Internet Protocol (TCP/IP) to network together computers for ARPAnet.

    Computers: National IMP (microprocessor)
    Technology: TCP/IP; ethernet

1974

    SQL (Standard Query Language) was designed by Donald D. Chamberlin and Raymond F. Boyce of IBM in 1974.

    AWK (first letters of the three inventors) was designed by Aho, Weinberger, and Kerninghan in 1974. Word processing language based on regular expressions.

    Alphard (named for the brightest star in Hydra) was designed by William Wulf, Mary Shaw, and Ralph London of Carnegie-Mellon University in 1974. A Pascal-like language intended for data abstraction and verification. Make use of the “form”, which combined a specification and an implementation, to give the programmer control over the impolementation of abstract data types.

    “Alphard is a computer language designed to support the abstraction and verification techniques required by modern programming methodology. Alphard’s constructs allow a programmer to isolate an abstraction, specifying its behavior publicly while localizing knowledge about its implementation. It originated from studies at both Carnegie-Mellon University and the Information Sciences Institute.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

    “CLU began to be developed in 1974; a second version was designed in 1977. It consists of a group of modules. One of the primary goals in its development was to provide clusters which permit user-defined types to be treated similarly to built-in types.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

    Operating Systems: MVS
    Computers: Intel 8080 (microprocessor); Motorola 6800 (microprocessor); CDC STAR-100 (100 MFLOPS)
    Technology: Telenet (first commercial version of ARPANET)

1975

    Scheme, based on LISP, was created by Guy Lewis Steele Jr. and Gerald Jay Sussman at MIT in 1975.

    Tiny BASIC created by Dr. Wong in 1975 runs on Intel 8080 and Zilog Z80 computers.

    RATFOR (RATional FORtran) created by Brian Kernigan in 1975. Used as a precompiler for FORTRAN. RATFOR allows C-like control structures in FORTRAN.

    Computers: Altair 880 (first personal computer); Fairchild F-8 (microprocessor); MOS Technology 6502 (microprocessor); Burroughs ILLIAC IV (150 MFLOPS)
    Technology: single board computer; laser printer (commercial release by IBM)

1976

    Design System Language, a forerunner of PostScript, is created in 1976. The Forth-like language handles three dimensional databases.

    SASL (Saint Andrews Static Language) is created by D. Turner in 1976. Intended for teaching functional programming. Based on ISWIM. Unlimited data structures.

    CP/M, an operating system for microcomputers, was created by Gary Kildall in 1976.

    Operating Systems: CP/M
    Computers: Zilog Z-80 (microprocessor); Cray 1 (250 MFLOPS); Apple I
    Technology: inkjet printer; Alan Kay’s Xerox NoteTaker developed at Xerox PARC

1977

    Icon, based on SNOBOL, was created in 1977 by faculty, staff, and students at the University of Arizona under the direction of Ralph E. Griswold. Icon uses some programming structures similar to pascal and C. Structured types include list, set, and table (dictionary).

    OPS5 was created by Charles Forgy in 1977.

    FP was presented by John Backus in his 1977 Turing Award lecture Can Programming be Liberated From the von Neumann Style? A Functional Style and its Algebra of Programs.

    Modula (MODUlar LAnguage) was created by Niklaus Wirth, who started work in 1977. Modula-2 was released in 1979.

    Computers: DEC VAX-11; Apple II; TRS-80; Commodore PET; Cray 1A
    Games: Atari 2600 (first popular home video game consle)

1978

    CSP was created in 1978 by C.A.R. Hoare.

    “C.A.R. Hoare wrote a paper in 1978 about parallel computing in which he included a fragment of a language. Later, this fragment came to be known as CSP. In it, process specifications lead to process creation and coordination. The name stands for Communicating Sequential Processes. Later, the separate computer language Occam was based on CSP.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

    Operating Systems: Apple DOS 3.1; VMS (later renamed OpenVMS)
    Programming Languages: CSP; FP; VISICALC
    Computers: Intel 8086 (microprocessor)
    Games: Space Invaders (arcade game using raster graphics; so popular in Japan that the government has to quadruple its supply of Yen)
    Technology: LaserDisc

1979

    Modula-2 was released in 1979. Created by Niklaus Wirth, who started work in 1977.

    VisiCalc (VISIble CALculator) was created for the Apple II personal computer in 1979 by Harvard MBA candidate Daniel Bricklin and programmer Robert Frankston.

    REXX (REstructured eXtended eXecutor) was designed by Michael Cowlishaw of IBM UK Laboratories. REXX was both an interpretted procedural language and a macro language. As a maco language, REXX can be used in application software.

    Work started on C with Classes, the language that eventually became C++.

    Computers: Motorola MC68000 (microprocessor); Intel 8088 (microprocessor)
    Games: Lunar Lander (arcade video game, first to use vector graphics); Asteroids (vector arcade game); Galaxian (raster arcade game, color screen); first Multi-USer Dungeon (MUD, written by Roy Trubshaw , a student at Essex University, forrunner of modern massively multiplayer games); Warrior (first head-to-head arcade fighting game)
    Technology: first spreadsheet (VisiCalc); object oriented programming; compact disk; Usenet discussion groups

1980

    dBASE II was created in 1980 by Wayne Ratliff at the Jet Propulsion Laboratories in Pasadena, California. The original version of the language was called Vulcan. Note that the first version of dBASE was called dBASE II.

    SmallTalk-80 released.

    Operating Systems: OS-9
    Computers: Commodore VIC-20; ZX80; Apple III
    Games: Battlezone (vector arcade video game, dual joystick controller and periscope-like viewer); Berzerk (raster arcade video game, used primative speech synthesis); Centipede (raster arcade video game, used trackball controller); Missile Command (raster arcade video game, used trackball controller); Defender (raster arcade video game); Pac-Man (raster arcade video game); Phoenix (raster arcade video game, use of musical score); Rally-X (raster arcade video game, first game to have a bonus round); Star Castle (vector arcade video game, color provided by transparent plastic screen overlay); Tempest (vector arcade video game, first color vector game); Wizard of Wor (raster arcade video game)

1981

    Relational Language was created in 1981 by Clark and Gregory.

    Operating Systems: MS-DOS; Pilot
    Computers: 8010 Star; ZX81; IBM PC; Osborne 1 (first portable computer); Xerox Star; MIPS I (microprocessor); CDC Cyber 205 (400 MFLOPS)
    Games: Donkey Kong (raster arcade video game); Frogger (raster arcade video game); Scramble (raster arcade video game, horizontal scrolling); Galaga (raster arcade video game); Ms. Pac-Man (raster arcade video game); Qix (raster arcade video game); Gorf (raster arcade video game, synthesized speech); Zork (first adventure game)
    Technology: portable PC; ISA bus; CGA video card

1982

    ANSI C The American National Standards Institute (ANSI) formed a technical subcommittee, X3J11, to create a standard for the C language and its run-time libraries.

    InterPress, the forerunner of PostScript, was created in 1982 by John Warnock and Martin Newell at Xerox PARC.

    Operating Systems: SunOS
    Computers: Cray X-MP; BBC Micro; Commodore C64 (first home computer with a dedicated sound chip); Compaq Portable; ZX Spectrum; Atari 5200; Intel 80286 (microprocessor)
    Games: BurgerTime (raster arcade video game); Dig Dug (raster arcade video game); Donkey Kong Junior (raster arcade video game); Joust (raster arcade video game); Moon Patrol (raster arcade video game, first game with parallax scrolling); Pole Position (raster arcade video game); Q*bert (raster arcade video game); Robotron 2084 (raster arcade video game, dual joystick); Time Pilot (raster arcade video game); Tron (raster arcade video game); Xevious (raster arcade video game, first game promoted with a TV commercial); Zaxxon (raster arcade video game, first game to use axonometric projection)
    Technology: MIDI; RISC; IBM PC compatibles

1983

    Ada was first released in 1983 (ADA 83), with major releases in 1995 (ADA 95) and 2005 (ADA 2005). Ada was created by the U.S. Department of Defense (DoD), originally intended for embedded systems and later intended for all military computing purposes. Ada is named for Augusta Ada King, the Countess of Lovelace, the first computer programmer inmodern times.

    Concurrent Prolog was created in 1983 by Shapiro.

    Parlog was created in 1983 by Clark and Gregory.

    C++ was developed in 1983 by Bjarne Stroustrup at Bell Telephone Laboratories to extend C for object oriented programming.

    Turbo Pascal, a popular Pascal compiler, was released.

    The University of California at Berkeley released a version of UNIX that included TCP/IP.

    Operating Systems: Lisa OS
    Computers: Apple IIe; Lisa; IBM XT; IBM PC Jr; ARM (microprocessor); Cray X-MP/4 (941 MFLOPS)
    Games: Dragon’s Lair (raster arcade video game, first video game to use laserdisc video; note that the gambling device Quarterhorse used the laserdisc first); Elevator Action (raster arcade video game); Gyruss (raster arcade video game, used the musical score Toccata and Fugue in D minor by Bach); Mappy (raster arcade video game, side scrolling); Mario Bros. (raster arcade video game); Spy Hunter (raster arcade video game, musical score Peter Gunn); Star Wars (vector arcade video game, digitized samples from the movie of actor’s voices); Tapper (raster arcade video game); Lode Runner (Apple ][E); Journey (arcade video game includes tape of the song Separate Ways leading to licensed music in video games)
    Technology: math coprocessor; PC harddisk

1984

    Objective C, an extension of C inspired by SmallTalk, was created in 1984 by Brad Cox. Used to write NextStep, the operating system of the Next computer.

    Standard ML, based on ML, was created in 1984 by R. Milner of the University of Edinburgh.

    PostScript was created in 1984 by John Warnock and Chuck Geschke at Adobe.

    Operating Systems: GNU project started; MacOS 1
    Computers: Apple Macintosh; IBM AT; Apple IIc; MIPS R2000 (microprocessor); M-13 (U.S.S.R., 2.4 GFLOPS); Cray XMP-22
    Software: Apple MacWrite; Apple MacPaint
    Games: 1942 (raster arcade video game); Paperboy (raster arcade video game, unusual controllers, high resolution display); Punch-Out (raster arcade video game, digitized voice, dual monitors)
    Technology: WYSIWYG word processing; LaserJet printer; DNS (Domain Name Server); IDE interface

1985

    Paradox was created in 1985 as a competitor to the dBASE family of relational data base languages.

    PageMaker was created for the Apple Macintosh in 1985 by Aldus.

    Operating Systems: GEM; AmigaOS; AtariOS; WIndows 1.0; Mac OS 2
    Computers: Cray-2/8 (3.9 GFLOPS); Atari ST; Commodore Amiga; Apple Macintosh XL; Intel 80386 (microprocessor); Sun SPARC (microprocessor)
    Software: Apple Macintosh Office; Aldus PageMaker (Macintosh only)
    Games: Super Mario Bros.; Tetris (puzzle game; invented by Russian mathematician Alexey Pajitnov to test equipment that was going to be used for artificial intelligence and speech recognition research); Excitebike (first game with battery backup for saving and loading game play)
    Technology: desktop publishing; EGA video card; CD-ROM; expanded memory (for IBM-PC compatibles; Apple LaserWriter

1986

    Eiffel (named for Gustave Eiffel, designer of the Eiffel Tower) was released in 1986 by Bertrand Meyer. Work started on September 14, 1985.

    “Eiffel is a computer language in the public domain. Its evolution is controlled by Nonprofit International Consortium for Eiffel (NICE), but it is open to any interested party. It is intended to treat software construction as a serious engineering enterprise, and therefore is named for the French architect, Gustave Eiffel. It aims to help specify, design, implement, and change quality software.” —Language Finger, Maureen and Mike Mansfield Library, University of Montana.

    GAP (Groups, Algorithms, and Programming) was developed in 1986 by Johannes Meier, Werner Nickel, Alice Niemeter, Martin Schönert, and others. Intended to program mathematical algorithms.

    CLP(R) was developed in 1986.

    Operating Systems: Mach; AIX; GS-OS; HP-UX; Mac OS 3
    Computers: Apple IIGS; Apple Macintosh Plus; Amstrad 1512; ARM2 (microprocessor); Cray XMP-48
    Software: Apple Macintosh Programmer’s Workshop
    Games: Metroid (one of first games to have password to save game proogress; first female protagonist in video games, non-linear game play; RPG)
    Technology: SCSI

1987

    CAML (Categorical Abstract Machine Language) was created by Suarez, Weiss, and Maury in 1987.

    Perl (Practical Extracting and Report Language) was created by Larry Wall in 1987. Intended to replace the Unix shell, Sed, and Awk. Used in CGI scripts.

    HyperCard was created by William Atkinson in 1987. HyperTalk was the scripting language built into HyperCard.

    Thomas and John Knoll created the program Display, which eventually became PhotoShop. The program ran on the Apple Macintosh.

    Adobe released the first version of Illustrator, running on the Apple Macintosh.

    Operating Systems: IRIX; Minix; OS/2; Windows 2.0; MDOS (Myarc Disk Operating System); Mac OS 4; Mac OS 5
    Computers: Apple Macintosh II; Apple macintosh SE; Acorn Archimedes; Connection Machine (first massive parallel computer); IBM PS/2; Commodore Amiga 500; Nintendo Entertainment System
    Software: Apple Hypercard; Apple MultiFinder; Adobe Illustrator (Macintosh only; later ported to NeXT, Silicon Graphics IRIX, Sun Solaris, and Microsoft Windows); Design (which eventually became ImagePro and then Photoshop; Macintosh only); QuarkXPress (Macintosh only)
    Games: Final Fantasy (fantasy RPG); The Legend of Zelda (first free form adventure game); Gran Turismo (auto racing); Mike Tyson’s Punch Out (boxing sports game)
    Technology: massive parallel computing; VGA video card; sound card for IBM-PC compatibles; Apple Desktop Bus (ADB)

1988

    CLOS, an object oriented version of LISP, was developed in 1988.

    Mathematica was developed by Stephen Wolfram.

    Oberon was created in 1986 by Niklaus Wirth.

    Operating Systems: OS/400; Mac OS 6
    Computers: Cray Y-MP; Apple IIc Plus
    Software: ImagePro (which eventually became Photoshop; Macintosh only)
    Games: Super Mario Bros. 2; Super Mario Bros 3 (Japanese release); Contra (side-scrolling shooter); Joh Madden Football (football sports game)
    Technology: optical chip; EISA bus

1989

    ANSI C The American National Standards Institute (ANSI) completed the official ANSI C, called the American National Standard X3.159-1989.

    HTML was developed in 1989.

    Miranda (named for a character by Shakespeare) was created in 1989 by D. Turner. Based on SASL and ML. Lazy evaluation and embedded pattern matching.

    Standard Interchange Languagewas developed in 1989.

    Operating Systems: NeXtStep; RISC OS; SCO UNIX
    Computers: Intel 80486 (microprocessor); ETA10-G/8 (10.3 GFLOPS)
    Software: Microsoft Office; first (pre-Adobe) version of Photoshop (Macintosh only)
    Games: Sim; SimCity; Warlords (four-player shooter)
    Technology: ATA interface, World Wide Web

1990

    Haskell was developed in 1990.

    Tim Berners-Lee of the European CERN laboratory dcreated the World Wide Web on a NeXT computer.

    In February of 1990, Adobe released the first version of the program PhotoShop (for the Apple Macintosh).

    Operating Systems: BeOS; OSF/1
    Computers: NEC SX-3/44R (Japan, 23.2 GFLOPS); Cray XMS; Cray Y-MP 8/8-64 (first Cray supercomputer to use UNIX); Apple Macintosh Classic; Neo Geo; Super Nintendo Entertainment System
    Software: Adobe Photoshop (Macintosh only)
    Games: Final Fantasy III released in Japan (fantasy RPG); Super Mario Bros 3 (Japanese release); Wing Commander (space combat game)
    Technology: SVGA video card; VESA driver

1991

    Python (named for Monty Python Flying Circus) was created in 1991 by Guido van Rossum. A scripting language with dynamic types intended as a replacement for Perl.

    Pov-Ray (Persistence of Vision) was created in 1991 by D.B.A. Collins and others. A language for describing 3D images.

    Visual BASIC, a popular BASIC compiler, was released in 1991.

    Linux operating system was released on September 17, 1991, by Finnish student Linus Torvalds.

    Operating Systems: Linux kernel; Mac OS 7
    Computers: Apple PowerBook; PowerPC (microprocessor); PARAM 8000 (India, created by Dr. Vijay Bhatkar, 1GFLOP)
    Games: Street Fighter II (shooter); The Legend of Zelda: A Link to the Past (fantasy RPG); Sonic the Hedgehog; Sid Meyer’s Civilization
    Technology: CD-i

1992

    Dylan was created in 1992 by Apple Computer and others. Dylan was originally intended for use with the Apple Newton, but wasn’t finished in time.

    Oak, the forerunner of Java, was developed at Sun Microsystems.

    Operating Systems: Solaris; Windows 3.1; OS/2 2.0; SLS Linux; Tru64 UNIX
    Computers: Cray C90 (1 GFLOP)
    Games: Wolfenstein 3D (ffirst fully 3D rendered game engine); Mortal Kombat; NHLPA 93 (multiplayer hockey sports game); Dune II (first real-time strategy game)

1993

    AppleScript, a scripting language for the macintosh operating system and its application softweare, was released by Apple Computers.

    Operating Systems: Windows NT 3.1; Stackware Linux; Debian GNU/Linux; Newton
    Computers: Cray EL90; Cray T3D; Apple Newton; Apple Macintosh TV; Intel Pentium (microprocessor, 66MHz); Thinking Machines CM-5/1024 (59.7 GFLOPS); Fujitsu Numerical Wind Tunnel (Japan, 124.50 GFLOPS); Intel Paragon XP/S 140 (143.40 GFLOPS)
    Games: Myst (first puzzle-based computer adventure game; CD-ROM game for Macintosh); Doom (made the first person shooter genre popular, pioneering work in immersive 3D graphics, networked multiplayer gaming, and support for customized additions and modifications; also proved that shareware distribution could work for game distribution); U.S. Senator Joseph Lieberman holds Congressional hearings and attempts to outlaw violent games
    Technology: MP3

1994

    Work continued on Java with a version designed for the internet.

    Operating Systems: Red Hat Linux
    Computers: Cray J90; Apple Power Macintosh; Sony PlayStation; Fujitsu Numerical Wind Tunnel (Japan, 170.40 GFLOPS)
    Games: Super Metroid (RPG)
    Technology: DNA computing

1995

    Java (named for coffee) was created by James Gosling and others at Sun Microsystems and released for applets in 1995. Original work started in 1991 as an interactive language under the name Oak. Rewritten for the internet in 1994.

    JavaScript (originally called LiveScript) was created by Brendan Elch at Netscape in 1995. A scripting language for web pages.

    PHP (PHP Hypertext Processor) was created by Rasmus Lerdorf in 1995.

    Ruby was created in 1995 by Yukihiro Matsumoto. Alternative to Perl and Python.

    Delphi, a variation of Object Pacal, was released by Borland

    Operating Systems: OpenBSD; OS/390; Windows 95
    Computers: BeBox; Cray T3E; Cray T90; Intel Pentium Pro (microprocessor); Sun UltraSPARC (microprocessor)
    Software: Microsoft Bob
    Games: Chrono Trigger (fantasy RPG); Command & Conquer (real time strategy game)
    Technology: DVD; wikis

1996

    UML (Unified Modeling Language) was created by Grady Booch, Jim Rumbaugh, and Ivar Jacobson in 1996 by combining the three modeling languages of each of the authors.

    Operating Systems: MkLinux
    Computers: Hitachi SR2201/1024 (Japan, 220.4 GFLOPS); Hitachi/Tsukuba CP=PACS/2048 (Japan, 368.2 GFLOPS)
    Games: Tomb Raider and Lara Croft, Pokémon; Quake (first person shooter using new 3D rendering on daughter boards); Super Mario 64 (3D rendering)
    Technology: USB

1997

    REBOL (Relative Expression Based Object language) was created by Carl SassenRath in 1997. Extensible scripting language for internet and distributed computing. Has 45 types that use the same operators.

    ECMAScript (named for the European standards group E.C.M.A.) was created in 1997.

    Alloy, a structural modelling language, was developed at M.I.T.

    Operating Systems: Mac OS 8
    Computers: AMD K6 (microprocessor); Intel Pentium II (microprocessor); PalmPilot; Intel ASCI Red/9152 (1.338 TFLOPS)
    Software: AOL Instant Messenger
    Games: Final fantasy VII; Goldeneye 007 (one of the few successful movie to game transitions; based on 1995 James Bond movie Goldeneye); Castlevania: Symphony of the Night (2D fantasy RPG)
    Technology: web blogging

1998

    Operating Systems: Windows 98
    Computers: Apple iMac; Apple iMac G3; Intel Xeon (microprocessor)
    Games: The Legend of Zelda: Ocarina of Time (fantasy RPG); Metal Gear Solid; Crash Bandicoot: Warped

1999

    Operating Systems: Mac OS 9
    Computers: PowerMac; AMD Athlon (microprocessor); Intel Pentium III (microprocessor); BlackBerry; Apple iBook; TiVo; Intel ASCI Red/9632 (2.3796 TFLOPS)

2000

    D, designed by Walter Bright, is an object-oriented language based on C++, Java, Eiffel, and Python.

    C# was created by Anders Hajlsberg of Microsoft in 2000. The main language of Microsoft’s .NET.

    RELAX (REgular LAnguage description for XML) was designed by Murata Makoto.

    Operating Systems: Mac OS 9; Windows ME; Windows 2000
    Computers: Intel Pentium 4 (microprocessor, over 1 GHz); Sony PlayStation 2; IBM ASCI White (7.226 TFLOPS)
    Games: Tony Hawk’s Pro Skater 2 (skateboarding sports game); Madden NFL 2001 (football sports game)
    Technology: USB flash drive

2001

    AspectJ (Aspect for Java) was created at the Palo Alto Research Center in 2001.

    Scriptol (Scriptwriter Oriented Language) was created by Dennis G. Sureau in 2001. New control structuress include for in, while let, and scan by. Variables and literals are objects. Supports XML as data structure.

    Operating Systems: Mac OS X Cheetah; Windows XP; z/OS
    Computers: Nintendo GameCube; Apple iPod; Intel Itanium (microprocessor); Xbox
    Games: Halo
    Technology: blade server

2002

    Operating Systems: Mac OS X 10.1 Puma
    Computers: Apple eMac; Apple iMac G4; Apple XServe; NEC Earth Simulator (Japan, 35.86 TFLOPS)

2003

    Operating Systems: Windows Server 2003; Mac OS X 10.2 Jaguar
    Computers: PowerPC G5; AMD Athlon 64 (microprocessor); Intel Pentium M (microprocessor)
    Games: Final Fantasy Crystal Chronicles

2004

    Scala was created February 2004 by Ecole Polytechnique Federale de Lausanne. Object oriented language that implements Python features in a Java syntax.

    Operating Systems: Mac OS X 10.3 Panther
    Computers: Apple iPod Mini; Apple iMac G5; Sony PlayStation Portable; IBM Blue Gene/L (70.72 TFLOPS)
    Games: Fable
    Technology: DualDisc; PCI Express; USB FlashCard

2005

    Job Submission Description Language.

    Operating Systems: Mac OS X 10.4 Tiger
    Computers: IBM System z9; Apple iPod Nano; Apple Mac Mini; Intel Pentium D (microprocessor); Sun UltraSPARC IV (microprocessor); Xbox 360
    Games: Lego Star Wars

2006

    Computers: Apple Intel-based iMac; Intel Core 2 (microprocessor); Sony PlayStation 3
    Technology: Blu-Ray Disc

2007

    Operating Systems: Apple iOS (for iPhone); Windows Vista
    Computers: AMD K10 (microprocessor), Apple TV; Apple iPhone; Apple iPod Touch; Amazon Kindle

2008

    Operating Systems: Google Android; Mac OS X 10.5 Leopard
    Computers: Android Dev Phone 1; BlackBerry Storm; Intel Atom (microprocessor); MacBook Air; IBM RoadRunner (1.026 PFLOPS); Dhruva (India, 6 TFLOPS)

2009

    Operating Systems: Windows 7; Mac OS X 10.6 Snow Leopard
    Computers: Motorola Driod; Palm Pre; Cray XT5 Jaguar (1.759 PFLOPS)

2010

    Computers: Apple iPad; IBM z196 (microprocessor); Apple iPhone 4; Kobo eReader; HTC Evo 4G

other

   “41. Some programming languages manage to absorbe change, but withstand progress.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13

   “69. In a 5 year period we get one superb programming language. Only we can’t control when the 5 year period will begin.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13


listings and errors

    Compilers can produce listings. These will add extra information to your original source code file. Many compilers have optional settings on what kinds of information are included.

    Of particular interest are errors. There are two basic kinds of errors in programming: (1) errors in typing a language and (2) errors in programming logic.

    A mistake in a program is called a bug. This is an old engineering term, appearing in Thomas Edison’s writings:

    “It has been just so in all of my inventions. The first step is an intuition, and comes with a burst, then difficulties arise — this thing gives out and then that ‘Bugs’ — as such little faults and difficulties are called — show themselves and months of intense watching, study and labor are requisite before commercial success or failure is certainly reached.” —Thomas Edison

    Many books incorrectly attribute the term “bug” to Grace Hopper. While she was working at Harvard University’s Computation Laboratory, operators discovered a moth trapped in a relay in Harvard’s Mark II computer. The moth was taped to the log book. Hopper repeated the story for decades as the case of the first actual bug in a computer.

    A compiler generated listing will only show errors in the typing (such as mistakes in spelling or punctuation) or incorrect use of the language (such as incorrect formation of an executable statement). These are known as compile time errors because they can be identified at the time the program is compiled.

    You won’t know about errors in programming logic until the program is actually running. These are known as run time errors. You will want to thoroughly test your programs to try to find any mistakes, but testing won’t always reveal every error. There is an art to testing a program. Some of the best program testers don’t even knw how to write a program, but they sure know how to find ways to break programs.

    Errors vary in severity. In some cases the compiler may be able to correct errors of low severity for you. Some compilers let you control which errors are flagged, skipping over errors the compiler can correct. You should strive to eliminate all errors from your source code listing, even if the compiler is able to correct them for you.

    Compilers also often produce warning. These are things that are not necessarily a problem. Some compilers let you decide whether to have warnings listed or not, and in some cases even let you decide which kinds of warnings are listed. You should strive to eliminate all warnings from your source code listing.

    Many compilers have the option of inserting line numbers into your listing. These line numbers can be very useful in tracking down compile time errors. Note that it is rare that the compiler’s idea of line numbers will end up exactly matching the physical lines in your original source file. The line numbers are typically based on the language’s statements. Comments and blank lines are rarely counted as lines in a source code listing. Multiple statements on a single line are usually identified by the number of the first statement on the same physical line.

    Many compilers will give you the option of stopping after the first error is detected or attemting to continue to compile. Note that the first error may confuse the compiler enough that it starts reporting bogus errors. This is known as cascading errors (named after cascading waterfalls).

    Many compilers will give you the option of viewing the symbol table. This is a list of the identifiers (procedures, functions, labels, constants, variables, etc.). There are many uses for a symbol table in correcting your source code. As one example, it is possible that you misspelled an identifier. The code may be correct from a lexical point of view (with no errors listed), but won’t work at run time. You might spot the incorrect spelling in your symbol table.

    You may want to purposely introduce a few errors into your Hello World program and see how your compiler reports them.

    Compilers are notorious for cryptic error messages. The error messages are generally not standardized for any particualr language, but vary with each compiler. With practice, you will learn to read the error messages and use them to find and identify your mistakes.

    A few days after initially writing this chapter, I spoke to a man who had been a BASIC programmer during the 1980s. He recounted that he had attempted to write a FORTRAN program and spent three weeks trying to track down a single error: accidently replacing a period with a comma. Sadly, this is typical of the debugging abilities of most programmers. There are very few places in FORTRAN where replacing a period with a comma will still produce a running program. It really shouldn’t have taken more than a few minutes to track down the exact line with the error and then identify the incorrect character through visual examination. If the program did compile and run, it shouldn’t have taken too much time to pin down the location of the error and then spot the incorrect character. Being methodical is a great way to not only locate bugs but also to avoid them in the first place.

    While software errors may be inevitable, good programming practices can dramatically lower the number of errors.

    Structured programming was created as a method to eliminate programming errors. It tended to eliminate many errors caused by sloppy coding and reduced the complexity of some kinds of programming to the point that mistakes were less common.

    Object oriented programming was created to both allow for faster creation of complex software and to reduce the number of mistakes in complex software. While these characteristics do apply to skilled programmers, for the vast majority of current programming, object oriented programming doesn’t actually cut down on errors, but instead limits the damage of errors. That is, a typical programmer will still make as many mistakes as ever, but the mistakes will be isolated and won’t cause havoc on the larger system.

other

   “40. There are two ways to write error-free programs; only the third one works.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13

   “61. In programming, as in everything else, to be in error is to be reborn.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13


free form vs. columns

    Programming languages can be free form or column based. Most early programming languages were organized in the columns of a punched card, while modern programming languages are completely (or nearly completely) free form.

    Many early programming languages relied heavily on punched cards for entry, including requiring that specific elements of the program appear in specific columns. The Hollerith punch card had 80 columns.

Note that the gray columns in the following picture are much wider than depicted.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18-69 70 71 72 73 74 75 76 77 78 79 80
COBOL
sequence
number
cmnt A B identification
FORTRAN
label cont FORTRAN statements ignored
PL/I
OS PL/I statements sequence number

    Almost all modern programming languages are free form, meaning that the programmer has relative freedom to format a program in an easy to read and understand manner.

    A continuation character is used in some column-based languages to indicate that some element extends over more than one line. For example, in FORTRAN a programmer places a character other than space or blank in the sixth column to indicate that the card is a continuation from the previous card.

    Even though modern languages are generally free form, you will occassionally find vestigages of the older column/card view crop up in places.

    Indentation should be used to visually make the source code more comprehensible and easier to read. Consider the following examples (from C and Pascal):

C

good

main()
{
   int i,j,k;
   k = 0;
   for(i = 1; i <= 10; i++)
   {
      for(j = 1; j <= 15; j++)
      {
         k++;
      }
   }
}

    

bad

main()
{
int i,j,k;
for(i = 1; i <= 10; i++) { for(j = 1; j <= 15; j++)
{ k++; } } }

Pascal

good

procedure SimpleLoop;
var
   i: integer;
   j: integer;
begin
   j := 0;
   for i := 1 to 10
      do begin
         j := j + i;
      end; {for]
end; {SimpleLoop}

    

bad

procedure SimpleLoop;
var i,j: integer; begin
j := 0; for i := 1 to 10 do begin j := j + i;
end; end;

    Not only does good indentation make code easier to read, it also makes it easier to spot many kinds of simple typing errors, such as imbalance of matching pairs.

C

    C is completely free form.

Pascal

    Pascal is completely free form.

PHP

    PHP is completely free form.


white space

    The educational goal of this subchapter is to learn what the white space characters are in your chosen programming language and to learn how to use whitespace to make your programs easier to read, understand, and maintain.

    Whitespace is used for clarity and ease of understanding. The term is borrowed from art and design.

    While whitespace characters vary by language, they generally include any characters that don’t print on the page, such as the space or blank character, tab, new line, carriage return, and form feed.

    An important use for whitespace is to visually show the structure of a program. A programmer can use indentation to show logical and lexical blocks of code. Indentation and use of white space is normally optional. See example in preceeding chapter on free form vs columns.

    Brian Kernighan and Dennis Ritchie (the original creators of C) recommend in their famous book The C Programming Language that programmers use indentation to “emphasize the logical structure of the program.” They also “recommend writing only one statement per line, and using blanks around operators to clarify grouping.” And they point out “Although C compilers do not care about how a program looks, proper indentation and spacing are critical in making programs easy for people to read.”

    Some languages use indentation as a required language feature. Blocks of code are marked by indentation. This approach is called the off side rule (a term borrowed from soccer or international football). Some programming languages using the off side rule include: ISWIM (the first to introduce the idea), ABC, Curry, Haskell, Lispin, Miranda, Nemerle, Occam, Pliant, Python, and YAML.

C

    Whitespace characters may be used almost anywhere in a c program. There are a few locations where whitespace is forbidden (such as between a function name and its argument list). There are a few places where whitespace is required (such as preventing ambiguity).

whitespace characters in c

name ASCII (hex) source code
representation
blank or space 20 \040
or typed space
backspace 08 \b
horizontal tab 09 \t
vertical tab 0B \v
form feed 0C \f
newline
or line feed
0A \n
carriage return 0D \r

    Comments in c are supposed to be considered whitespace, but some compilers simply remove comments during parsing.

PHP

    Whitespace characters in PHP include new line (\n), carriage returns (\r), spaces (or blanks), horizontal tabs (\t), vertical tabs (\v), and end of string characters (\0).

    Whitespace is ignored in PHP.

    Geek humor: There is a programming language called Whitespace. Released as an April Fool'’s joke on April 1, 2003, by Edwin Brady and Chris Morris of the University of Durham, the language uses only the whitespace characters space, tab, and linefeed, and ignores all other characters (treating normal characters as comments). Binary integers are created by using space as a zero (0) bit and tab as a one (1) bit, numbers being terminated by linefeed. Sequences of the three white space characters are commands. Commands primarily act on a data stack (similar to Forth), although there is a heap for storage. It is possible to insert a working Whitespace program into most ordinary programs.


comments

    Comments are extra information intended for human readers of a program. A compiler or interpretter ignores comments (or converts them into whitespace).

    A computer doesn’t need comments. You could write a program without any comments and your compiler won’t care. Any human trying to read your program will care.

    Students often have trouble understanding why there is such an emphasis on comments. It seems like busy work.

    The truth is that any program small enough to be used as a teaching assignment is probably so trivial that any competent programmer could look at the raw uncommented source code and figure out exactly how the program works.

    Students rarely ever have to go back and modify a class assignment after it has been turned in for a grade.

    In real life, successful programs last for years or decades. There will be a need for many different programmers will have to make modifications to a program long after the initial coding. In addition to any bug fixes, there will be new features added and changes in response to external conditions.

    In the life cycle of non-trivial programs, the vast majority of the time and effort is in the maintenance of the program, not the initial design and coding.

    Sooner or later the programmer making the changes will be someone different than the original programmer. Comments are the basic tool that the next programmer will use in figuring out how to make these inevitable changes.

    In the earliest days of computers there wa a tendency to believe that programs would be written once and used for a brief period and then abandoned for new work.

    Programmers started to realize that they were redoing work that they or someone else had already done. Reinventing the wheel. This led to the creation of libraries of code that all the programmers at a particular installation could share.

    In order for the libraries to be useful, the programmer who wrote each library routine would have to leave behind some kind of documentation for other programmers to know what the library routine did and how to use it in their own programs.

    External documentation included such things as a manual (or manual entry), flow charts, pseudocode, and numerous other paper documents.

    Internal documentation included comments and naming things so that the name was meaningful.

    In the first decades of computer programming, there was an emphasis on external documentation. Unfortunately, external documentation turned out to be unreliable.

    In some cases the documentation was created before the program was written (such as a flow chart or pseudocode). When the actual software inevitably changed (as errors were located or new requirements were added by the bosses), the original flowcharts and other documentation wouldn’t get changed. Sometimes the differences were minor annoyances. Sometimes the differences were major problems to the next programmer to work on the project.

    If the external documentation was saved for after the actual coding, it would often be a sloppy afterthought. The programmer, anxious to get on to the next job, would rush through the creation of the external documentation. Important information would get left out. Sometimes incorrect information would be written down. Sometimes things would be simplified to the point of being useless. Often the programmer would simply not write any external documentation of any kind. And inevitably external documentation would have a tendency of getting lost or misplaced or even discarded.

    Because of the inherent unreliability of external documentation, current software engineering emphasizes well written internal documentation. Internal documentation cna easily be updated as the program is being written. And comments are the primary method for internal documentation.

creating comments

    There are four basic kinds of comments (not all languages support all four methods).

    Comments can exist on separate lines (a single line of comment only). For some early langagues, this is the only option available.

    Comments can exist over a series of multiple lines.

    Comments can exist at the end of a line.

    Comments can appear in the midst of a line.

    A possible source of confusion is different methods for indicating a comment. For example, in Pascal, a comment can be indicated by placing it between braces or curly brackets { Pascal comment }. But in C, the curly brackets or braces are used to indicate a block of code. These kinds of differences are one reason that it is best that a beginning student only learn one programming language at a time.

    Important: As a beginning student, only read the one language section that discusses the language you are learning. Attempting to learn multiple languages at once is a recipe for disaster.

C

    Comments begin with the character pair /* and end with the character pair */. Comments can extend over multiple lines and can occur anywhere a space or newline character can be used. Comments can not be placed in the middle of any identifier (such as the name of a variable, procedure, or constant).

/* This is an example of a single line comment in c */

/* This is an example of a
multiple line comment in c */

temporary= "t"; /* example of a comment at the end of a line in c */

temporary= "t"; /* example of a comment in a line of code in c */ other= "r";

    The ANSI/ISO standard for C known as C99 (for the year 1999) also allows the use of two slash characters // to indicate that a comment has been started and continues to the end of the line. This is borrowed from a common assembly language method for indicating comments and appeared in many earlier implementations of C compilers, as well as C++.

temporary= "t"; // example of a comment at the end of a line in C99

    ANSI C requires that comments be replaced with a single space character. Some compilers instead ignore comments (delete them without replacing them with a space character). This can in certain situations produce unexpected errors.

    A few c compilers allow nestable comments (comments within a comment). Most c compilers end nested comments at the first end pair */, which can result in unexpected errors.

/* This is an example of a /* nested comment */ in c */

    In most compilers, the comment above would end at the */ after the words “nested comment”, and the “in c*/” would produce an error.

    To comment out a series of lines of code without having to worry about unnesting comments (and possibly adding them back in later if the code is added back in), use the preprocessor commands:

#if 0
lines of code, /* possibly including comments*/
#endif

    Note that if a comment starts before the location where you added the preprocessor commands or ends after the location where you ended the preprocessor commands, that you will still have a problem. Make sure to completely enclose any comments within the preprocessor commands #if 0 and #endif.

    Some c compilers require a blank or space character after the initial /* and require a blank or space character before the closing */. For greater readability, it is best to follow this rule even if your compiler doesn’t require it.

Pascal

    Comments begin with the opening braces (or left curly bracket) { and end with the closing braces (or right curly bracket) }. Comments can extend over multiple lines and can occur anywhere a space or newline character can be used. Comments can not be placed in the middle of any keyword or identifier (such as the name of a variable, program, procedure, or constant).

{ This is an example of a single line comment in Pascal }

{ This is an example of a
multiple line comment in Pascal }

temporary:= "t"; { example of a comment at the end of a line in Pascal }

temporary:= "t"; { example of a comment in a line of code } other:= "r";

    Pascal allows the option of using the character pair (* to start a comment and the character pair *) to end a comment. This was put into the language to support older computers that didn’t have the braces.

(* This is an example of an alternate-style comment in Pascal *)

    The ISO 7185 and ISO 10206 standards, as well as ASNI Pascal, allow mixing the two comment styles, such as (* this is a mixed comment } and { this is also a mixed comment *). Many Pascal compilers are confused by mixed comments, so you should avoid this in your source code.

{ This is an example of a mixed comment in Pascal *)

    Many Pascal compilers allow nestable comments (comments within a comment). Some Pascal compilers end nested comments at the first } or *), which can result in unexpected errors.

{ This is an example of a { nested comment } in Pascal }

    In many compilers, the comment above would end at the } after the words “nested comment”, and the “in Pascal }” would produce an error.

    Comments are not terminated with a semicolon.

    Some Pascal compilers allow using the C++ form of the character pair // to start a comment that extends to the end of that particular line.

// This is an example of a non-standard comment in Pascal

    This non-standard approach should generally be avoided, but might be used (if allowed on your compiler) as a quick method to temporarily comment out sections of source code without worrying about nested comments.

PHP

    PHP supports the comment syles of C, C++, and UNIX shell scripting.

    Comments begin with the character pair /* and end with the character pair */. Comments can extend over multiple lines and can occur anywhere a space or newline character can be used. Comments can not be placed in the middle of any identifier (such as the name of a variable, function, or constant).

/* This is an example of a single line comment in PHP */

/* This is an example of a
multiple line comment in PHP */

$temporary= "t"; /* example of a comment at the end of a line in PHP */

$temporary= "t"; /* example of a comment in a line of code in PHP */ $other= "r";

    PHP ignores text inside comments. PHP treats comments as whitespace.

    PHP does not allow nestable comments (comments within a comment).

/* This is an example of a /* nested comment */ in PHP */

    In PHP, the comment above would end at the */ after the words “nested comment”, and the “in PHP*/” would produce an error.

    PHP allows use of the C++ form of the character pair // to start a comment that extends to the end of that particular line or the ending PHP tag, whichever comes first.

// This is an example of a C++ style comment in PHP

    PHP also allowsthe use of the UNIX shell script form of the pound character # to start a comment that extends to the end of that particular line or the ending PHP tag, whichever comes first.

# This is an example of a UNIX shell script style comment in PHP

    To comment out a series of lines of code without having to worry about unnesting comments (and possibly adding them back in later if the code is added back in), use the C++ (//) or UNIX shell script (#) style at the beginning of each commented line.

    For greater readability, it is best to place a space character after the initial /* and before the trailing */.

    The PHP 5.0.1 function string php_strip_whitespace ( string $filename ) will return the PHP source code in the file filename with all PHP comments and whitespace removed. This fucntion does not remove whitespace or comments from any HTML in the file.

ALGOL

    A comment in ALGOL may appear after any program line ending with a semicolon ( ; ). A comment may also appear after any begin or end.

    A comment is designated by the key word comment, followed by the actual comment. The key word comment may optionally be omitted when the comment immediately follows an end.

    An ALGOL comment may consist of any string of symbols with the exception of the semicolon ( ; ) and the key words end or else.

in-line comments

    Use comments throughout your program to let yourself or others know what is going on. Even if the program is solely for your own use, you may have trouble remembering important details a year or more after you write it.

    Some programmers add a comment to the end of every line, often repeating the same information as the source code with slightly different language. This is a complete waste of time.

; BAD COMMENT!
MOV D0, D5     ; move the contents of the D5 register into the D0 register
RET

    Add comments when they add information to the source code.

; GOOD COMMENT!
MOV D0, D5     ; grand total stored in D0 as function resultin preparation for return
RET

    Use comments liberally throughout your source code to explain everything that you might forget or that another programmer will need to know to understand your code.

header comments

    Procedures, functions, and other important blocks of code, as well as the beginning of programs, should have header comments that fully explain the purpose of the code, any conventions for using the code, and important information about the code that another programmer will need to understand.

    Many working installations will have particular required formats for header comments. Most professors will have detailed requirements for header comments. Follow those required formats exactly as specified.

    If your installation does not have a required format for header comments, go ahead and use a format that you are familar with (possibly one you learned in school).

    I can not over emphasize the importance of good header comments.

    The following example is from an assembly language routine, but it illustrates the basic idea.

;***********************************************************
;* FUNCTION NAME: Hexadecimal_Character_to_Binary
;* PURPOSE: Converts an ASCII character into its binary 4-bit equivalent
;* INPUTS: D0 register has an ASCII character in the ranges 0..9 or A..F or a..f
;*      space character translated into zero
;* OUTPUT D0.B register has a hexadecimal integer
;*      of nibble (4-bit) length
;*      entire register zero filled
;*      if invalid input, D0.L set to -1
;*      N flag cleared on successful translation
;*      N flag set on failure (invalid hex)
;* METHODS: Uses a table look-up for fast translation
;* REGISTERS: All registers other than D0 are preserved.
;* CALLS: none
;* CALLED BY: UTILITY routine widely used
;***********************************************************

    There are three common methods for indicating a block of comments (examples shown only in C to save space, the principles are the same for any other language).

    This first method is the most commonly used.

/************************************
* line of comment
* another line of comment
* yet another line of comment
************************************/

    This second method is less clear (because the individual lines are not flagged as comments).

/************************************
line of comment
another line of comment
yet another line of comment
************************************/

    This third method makes each line a separate comment. This method works best if the right column lines up vertically.

/************************************/
/* line of comment                  */
/* another line of comment          */
/* yet another line of comment      */
/************************************/

    There are two other important methods for languages (such as Java) that use the C++ approach of offering both /* */ and // comment styles.

    This method is known as the Sun commenting style and is useful with the automatic document generator that comes with the Java Development Kit.

/**
 * line of comment
 * another line of comment
 * yet another line of comment
 */

    This method is known as the Microsoft commenting style and is associated with the Visual C++ programming environment.

//////////////////////////////////////
// line of comment
// another line of comment
// yet another line of comment

Stanford C essentials

    Stanford CS Education Library This [the following section until marked as end of Stanford University items] is document #101, Essential C, in the Stanford CS Education Library. This and other educational materials are available for free at http://cslibrary.stanford.edu/. This article is free to be used, reproduced, excerpted, retransmitted, or sold so long as this notice is clearly reproduced at its beginning. Copyright 1996-2003, Nick Parlante, nick.parlante@cs.stanford.edu.

Comments

    Comments in C are enclosed by slash/star pairs: /* .. comments .. */ which may cross multiple lines. C++ introduced a form of comment started by two slashes and extending to the end of the line: // comment until the line end

    The // comment form is so handy that many C compilers now also support it, although it is not technically part of the C language.

    Along with well-chosen function names, comments are an important part of well written code. Comments should not just repeat what the code says. Comments should describe what the code accomplishes which is much more interesting than a translation of what each statement does. Comments should also narrate what is tricky or non-obvious about a section of code.

    Stanford CS Education Library This [the above section] is document #101, Essential C, in the Stanford CS Education Library. This and other educational materials are available for free at http://cslibrary.stanford.edu/. This article is free to be used, reproduced, excerpted, retransmitted, or sold so long as this notice is clearly reproduced at its beginning. Copyright 1996-2003, Nick Parlante, nick.parlante@cs.stanford.edu.

end of Stanford C essentials

Stanford Perl essentials

    This [the following sentence] is document #108 [Essential Perl] in the Stanford CS Education Library --see http://cslibrary.stanford.edu/108/. This document is free to be used, reproduced, or sold so long as this paragraph and the copyright are clearly. Copyright 2000-2002, Nick Parlante, nick.parlante@cs.stanford.edu.

    Comments begin with a “#” and extend to the end of the line.

other

   “71. Documentation is like term insurance: It satisfies because almost no one who subscribes to it depends on its benefits.” —Alan Perlis, Epigrams on Programming, ACM’s SIGPLAN Notices Volume 17, No. 9, September 1982, pages 7-13


program structures

    Control structures and data structures are the stuff that programs are made of.

  • control strutures
  • data strutures

    This is an informal discussion that will serve as background so that the reader will understand the context of the next few chapters.

    In modern computer programming (called structured programming there are three basic control structures:

  1. sequence
  2. choice or decision
  3. loop or repetition


control structures

    Control structures are used to control the flow of a program.

    This is an informal discussion that will serve as background so that the reader will understand the context of the next few chapters.

    In modern computer programming (called structured programming there are three basic control structures:

  1. sequence/li>
  2. choice or decision/li>
  3. loop or repetition/li>

sequence

    A sequnce is a series of steps. Classic examples include, calculations, manipulations of data, and running subroutines or functions.

    One very important note is that in structured programming, a sequence must have only one entry point and only one exit point. While structured programming does allow the use of three different control structures, a group or block of program commands must always have exactly one entry and exactly one exit.

    A common example of a sequence is finding an average of several numbers. The following example assumes exactly five total numbers being added.

    1.a. set the running subtotal to zero (an example of initialization)
    1.b. set the count of numbers to zero (an example of initialization)
    2.a. add the first number to the running subtotal
    2.b. add one to the count of numbers (an example of incrementing)
    3.a. add the second number to the running subtotal
    3.b. again increment (add one to) the count of numbers
    4.a. add the third number to the running subtotal
    4.b. increment (add one to) the count of numbers
    5.a. add the fourth number to the running subtotal
    5.b. increment the count of numbers
    6.a. add the fifth number to the running subtotal
    6.b. increment the count of numbers
    7. divide the running subtotal by the count of numbers

    The following shows the example with actual numbers:

step next
number
running
subtotal
count of
numbers
1a N/A 0 -
1b N/A 0 0
2a 6 0+6=6 0
2b - 6 0+1=1
3a 2 6+2=8 1
3b - 8 1+1=2
4a 5 8+5=13 2
4b - 13 2+1=3
5a 9 13+9=22 3
5b - 22 3+1=4
6a 3 22+3=25 4
6b - 25 4+1=5
25 / 5 = 5 the average is 5

    The most simple or elementary steps of a sequence or things like the simple computations in the example above or other basic computer operations. other examples would be making a copy of a piece of data (if this was a formal discussion, I’d call it a piece of datum), a single logical operation (AND, OR, NOT, etc, if you recognize those), a rotation or shift, a single text substittion, concatenating text, or the deletion, insertion, or modification of a piece of data into a data base, list, tree, or other data structure.

    If the programming language allows combining several operations into a single instructure, the instruction is often viewed as a signle sequential step. For example, the following computation includes two multiplications, an exponentiation, an addition, and a subtraction, but is viewed collectively as a single sequential step (note that because this is an informal discussion, none of these examples are intended to be from any actual programming language):

x = 3a2 + 2b - c
or
x := 3*a**2 + 2*b - c;

    In computer programming, sequential steps can be many things other than arithmetic cmputations.

    The grouping of a sequence of steps (as well as a grouping with any combination of the three basic control structures) is often referred to as a block of code. The block of code may be either a concept or may be an actual programming structure in the programming language. A block of code as a single group can be viewed as a sequential step, even if the block internally contains other control structures.

    A much more powerful version of a sequential step is a subroutine or function. Note that in some programming languages there is no disticntion between a subroutine and a function, while in other programming languages there is a strong difference between the two. We won’t get into those details here.

    The basic idea of a function is similar to the basic idea of a function in algebra. A function takes a value (or sometimes more than one value) as an input and produces a single answer or value as a result. Common examples include trignometric functions, such as sine, cosine, and tangent, which might appear as follows:

x = cos(a) + sin(b) + tan(c)

    While almost all modern computers have a single hardware instruction for trignometric functions, most functions have to be written by a programmer, whether they are built-in (supplied by the compiler) or user defined (written by the programmer).

    An example of a function that would have several internal (hidden) steps, as well as more than one input value, would be finding the greatest common denominator (see example below). Even though there are several steps hidden inside the function, the function is treated as a single sequential step.

x := gcd(a,b);

    A subroutine is also a set of instructions that are hidden away and treated as one group and therefore a single sequential step. For languages that distinguish between subroutines and functions, a subroutine normally doesn’t return a result, while a function normally does return a result.

    A subroutine can be used twhen the same piece of coding will appear in many different places in a prgram. Rather than writing it out each time individually, the subroutine would be written once and a call to the subroutine would be inserted everywhere you would have otherwise had to write it out repeatedly. This can make the program much easier to maintain, because it is much easier to fix, change, or update one subroutine than to search a giant program listing for every place the same little piece of coding appears.

    A subroutine can also be used to assist in creating a top-down structured program, increasing readability.

    The use of subroutines as single sequential steps allow for decomposing a complex program from top-down, a basic method in modern structured programming. Consider the following example:

    Initialize;
    OpenDataFiles;
    ReadDataFromFiles;
    CloseFiles;
    PerformCalculations;
    SortData;
    PrepareReport;
    PrintReport;
    Finish;

    Even a non-programmer could probably figure out what is going on and maybe even give useful feedback on important steps that might be missing. By hiding all the details of each major step, it is easier for the programmer to visualize the structure of the program, making it more likely that the programmer will create a bug-free program that other programmers can easily modify for changing circumstances for years or decades into the future.

    Using the idea of functional decomposition, the programmer might also use this same approach of a sequence of subroutines at lower levels. For example, consider the subroutine PrepareReport, which might be:

    Initialize;
    PrepareReportHeader;
    PrepareMainSection;
    PrepareExceptionSection;
    PrepareReportFooter;
    Finish;

    Obviously each of these sequential steps is its own subroutine with its own hidden steps.

    Another important concept is the utility. There are some functions or subroutines that are so commonly used that they are considered to be utility routines or utlity function. The built-in functions of a programming language are a good example. A group of programmers will often share a library of commonly used utility functions and subroutines.

choice or decision

    A decision is the act of making a choice. The two most commonly encountered decisions are the IF-THEN-ELSE and the CASE or SWITCH. An IF-THEN-ELSE is a straightforward decision between two different chocies. A CASE or SWITCH is picking an option between a long list of choices.

    The IF-THEN structure is pretty straght forward. Your program tests to see if something is true. If the test is true, then your program does something (usually a sequence, although it can be any combination of the three control structures).

    For example, if you are tall enough, then you may go on the amusement park ride (which also implies that if you are not tall enough, then you can’t)..

    The thing being tested in an IF is called the condition.

    The two most common tests are for the condition to be TRUE or for the condition to be FALSE. Sometimes it is easier to write a program with a negative test (testing for a FALSE condition) rather than a positive test (testing for a TRUE condition). Sometimes it makes the program easier to read and understand (and therefore easier and less expensive to maintain over years or decades) for the test to be for a FALSE condition.

    While the basic control structure is testing for TRUE or FALSE, the test condition often can be for things slightly more complex conceptually, such as equality (are two thing sexactly equal), inequality (greater than or less than), length, time, emptiness, fullness, completion, incompletion, etc.

    Note also that we can have a simple test (only one condition is tested) or a compound test (multiple conditions are tested). Compound tests have potential problems in different programming languages, which will be discussed in detail when we get to the appropriate section.

    The THEN part is what is done if the condition passes (I’d say if the condition is true but you might have been testing for the condition to be false). The THEN part can be a single sequential step, a block (or group) of sequential steps, a block or group of any combination ot the three control structures, or a function or subroutine.

    After the test is completed and the action possibly taken, the program continues on to the next item in the sequence of steps. This is an important feature of structured programming: exactly one entry and exactly one exit. It is a disaster for maintenance if the program can start wandering all over the place.

    Using our average example, we might be interested in testing for zero divide before performing a division. Both elementary algebra and computer hardware reject any attempt to divide by zero (and the computer hardware can reject your attempt by freezing or otherwise crashing).

    In our sequence example for finding an average, we never actually stated where the numbers to be averaged came from. They just were there when we needed them. What would happen if we never received the input numbers? When we attempted to do the division of the subtotal by the count, the count could have still been zero. The program crashes.

    We could place a test for the count being zero just before we attempt to divide:

    IF count NOT EQUAL zero (0)
        THEN average = subtotal / count;

    This avoids the program crashing.

    We could also place a test for the existence of the numbers, somewhere near the start of the sequence (as one of the initialization steps). We would then place the entire rest of the block of sequential steps into the THEN portion, skipping the entire process if we don’t actually have our five input numbers. This approach would not only avoid a divide by zero crash, but also avoid wasting a bunch of computation time on something that is going to fail anyway.

    The IF-THEN-ELSE structure gives two different paths, which must combine back together at the end to maintain structured programming.

    To use our average example, we could text for the count not being zero. If the count is not zero, we perform the THEN portion (single sequential step, block of code, subroutine, etc.), which in this example would be the division step. If the count is zero, then we would perform the ELSE portion (single sequential step, block of code, subroutine, etc.).

    In this case, we could report that there is an error, so that the reader will know why he or she didn’t receive an answer for the average.

    IF count NOT EQUAL zero (0)
        THEN average = subtotal / count
    ELSE
        REPORT an ERROR;

    I previsously mentionedcompund testing (testing for more than one condition at a time). Another variation is the nested test. In the nested IF, we make a test and then in either the THEN or the ELSE part (or both), we follow up with an additional test. The nesting can be continued to as many levels as needed.

    As an example, say we were searching through a list of people for either a blue-eyed man or a brown-eyed woman. At the top level fo the IF-THEN-ELSE we would test for man or woman. Then we would nest additional testing. If the THEN part was for the condition of the person being a woman, the nested test would be for brown eyes, and in the ELSE part (the person is a man, assuming only two choices) we would test for blue eyes.

    IF person is a WOMAN
        THEN IF person has brown eyes
            THEN successful match (woman with brown eyes)
            ELSE unsuccessful match (woman without brown eyes)
        ELSE (must be a MAN) IF person has blue eyes
            THEN successful match (man with blue eyes)
            ELSE unsuccessful match (man without blue eyes)

    Nested IFs can become very complicated and difficult to read and prone to mismatch errors. This will be discussed more in the appropriate section.

    The other important decision structure is the CASE structure. Technically there is no need for a case structure, because all case structures can be written as a NESTED IF structure instead. Because readability is vital for successful long term maintenance of a prorgam, most modern langauges have a CASE structure (sometimes called a SWITCH structure).

    In the basic CASE structure, we have one test and many possibilities. Each possibility is matched with a single sequential step, block of structured code, or subroutine. Normally only one of the choices is performed.

    An example using a test for eye color:

    CASE of eye color:
        BLUE EYES:     do the blue eyes block of code
        BROWN EYES:    do the brown eyes block of code
        GREEN EYES:    do the green eyes block of code
        GRAY EYES:     do the gray eyes block of code
        OTHERWISE:     do special exception block of code

    Notice the existence of the OTHERWISE choice. This covers the exception cases where you had unexpected results.

    Without getting into messy details now, note that some programming languages allow a fall through. For example, we might have wanted to do one extra step for blue eyes than we did for brown eyes, but that otherwise the two choices shared the same steps. We could have the one extra step occur in the blue eyes case, then fall through to the brown eyes and do all of the brown eyes steps. If the CASE determined that there were brown eyes, it would skip the extra blue eyes step and only do the brown eyes steps. I hope I didn’t just confuse you. If that doesn’t make sense, don’t worry about it for now.

loop or repetition

    A loop or repetition is the act of repeating an action or actions. A lot of the power of computers is being able to do repetitive choices over and over (sometimes millions or more times).

    This is one of the most important and powerful features of a computer — the ability to repeatedly do something over and over in really large numbers of repetitions.

    The most basic loop or repetition structure is known as the DO LOOP. This is also called the FOR LOOP. Another name for this is the Iterative DO.

    In the DO LOOP, you instruct the computer exactly how many times you want it to repeat a particular task and it does it exactly that many times (assuming no crashes or bugs).

    Please remember that the examples in this informal discussion are not from any actual programming language. These are designed to illustrate the basic principles.

    Using our average example from above, you will notice that after the first (initialization) steps (1.a and 1.b.) and before the last (final computation) step (7), that each of the pairs of steps are exactly the same. Instead of writing out each and every step, we could describe the two basic steps and put them into a DO LOOP:

    Initialize;
    BEGIN
        Subtotal := 0;
    END;
    DO five times;
    BEGIN
        Add the next number to the Subtotal;
    END;
    Finish;
    BEGIN
        Answer is Subtotal divided by 5;
    END;

    In most programming languages there is some variation of a loop count variable. The number of times to loop is established by setting the initial value of the loop count variable and setting the limit for the the number of times to repeat the DO LOOP.

    Initialize;
    BEGIN
        Subtotal := 0;
        IF there are no numbers available to average
            THEN REPORT an ERROR and STOP;
    END;
    DO Count := 1 TO 5;
    BEGIN
        Add the next number to the Subtotal;
    END;
    Finish;
    BEGIN
        Answer is Subtotal divided by Count;
    END;

    You may notice that we started the count at one rather than zero. We did this to have the DO LOOP perform exactly five repetitions with the count of numbers matching the standard counting numbers.

    You may also notice that we used the Count outside of the actual DO LOOP. Note that in many prgramming languages you can not actually use the loop count variable outside the loop (although you can usuually use it inside the DO LOOP). A common work-around is to create an extra copy of the count variable that we can use outside the DO LOOP:

    Initialize;
    BEGIN
        Subtotal := 0;
        NumberCount := 0;
        IF there are no numbers available to average
            THEN REPORT an ERROR and STOP;
    END;
    DO LoopCount := 1 TO 5;
    BEGIN
        Add the next number to the Subtotal;
        Increment (add one to) the NumberCount;
    END;
    Finish;
    BEGIN
        Answer is the Subtotal divided by the NumberCount;
    END;

    The loop count variable is considered to have a start value and a finish value. The loop count variable is often called the control variable. The starting value for the control variable is often called the initial value. The finishing value for the control variable is often called the limit value.

    Some programming languages will allow greater flexibility in how the loop count variable is used. Many programming languages give the choice of counting up or counting down. Computers are really good at counting down to zero. This shouldn’t matter much with modern computers (where the ability to easily read and understand someone else’s program is far more important than a nanosecond saved somewhere), but you will certainly encounter this when working on old programs from back in the days when this time difference was important.

    Most prgramming languages allow the initial value and limit value to be variables. This allows the flexibility of the program deciding these values while it is running rather thaan the programmer having to know the values in advance when the program is written. This is known as the difference between run time and compile time.

    Some programming languuages allow the step to be something other than a step up or down by one (increment and decrement). An example might be a loop that only works on odd or even numbers (the initial value might be one (1) for odd numbers and either zero (0) or two (2) for even numbers).

    In the following example, the loop would be performed a total of 10 times and each time through the loop the count would be a multiple of ten (for some reason we need multiples of ten in the work we are doing).

DO COUNT := 10 to 100 STEP UP BY 10;

    If the programming language allows for steps other than increment, the size of the step is known as the step value or the modification value. There is also the possibility that the step value can be a variable and determined at run time.

    Some programming languages allow a lot more variations, such as counting from 1 to 99 by steps of 1 and then coutning from 100 to 1,000 by steps of 100., or even more complicated versions. Because this is an informal discussion, I won’t get into those complexities here, but I wanted you to know that they can exist.

    The next important repetition or loop structure is the DO WHILE loop.

    Historically the first looping structure available was the iterative do loop. The DO WHILE loop was the form of loop used in the original proof of the concept of structured programming.

    In the DO WHILE loop there is a condition test and the loop is repeated over and over again as long as the condition remains true.

    We can use the DO WHILE loop to add flexibility to our average example. So far we have always had to have exactly five numbers to average. Using this method, we would need to write a whole new program for a list of six numbers to average and another program for a list of ten numbers to average. This would get ridicously out of hand in a hurry.

    By using a DO WHILE loop, we can write one program for any size list of numbers to average:

    Initialize;
    BEGIN
        Subtotal := 0;
        Count := 0;
        IF there are no numbers available to average
            THEN REPORT an ERROR and STOP;
    END;
    DO WHILE there are still numbers available;
    BEGIN
        Add the next number to the Subtotal;
        Increment (add one to) the Count;
    END;
    Finish;
    BEGIN
        IF count NOT EQUAL zero (0)
            THEN average := subtotal / count;
        ELSE
            Report ERROR;
    END;

    An important note regarding the DO WHILE loop (and loops in general) is that the loop needs to come to an end some time. If the loop never ends it is called an infinite loop and is one of the reasons that a program or computer freezes. The computer hasn’t actually stopped working (which is implied by the word freeze). Rather the program is stuck in a loop that it will never finish (until you turn the computer off, reboot, stop the program, etc.).

    It is important to check for and avoid inifinite loops. Make sure that any loop you write will eventually come to an end.

    The third major repetition or loop structure is the DO UNTIL loop.

    In the DO UNTIL loop there is a condition test and the loop is repeated over and over again until the condition becomes true.

    The DO UNTIL is actually not an appropriate choice for our avaerage example, but it will work and you are already familiar with this example.

    Initialize;
    BEGIN
        Subtotal := 0;
        Count := 0;
        IF there are no numbers available to average
            THEN REPORT an ERROR and STOP;
    END;
    DO UNTIL there are no more numbers available;
    BEGIN
        Add the next number to the Subtotal;
        Increment (add one to) the Count;
    END;
    Finish;
    BEGIN
        IF count NOT EQUAL zero (0)
            THEN average := subtotal / count;
        ELSE
            Report ERROR;
    END;

    As with the the DO WHILE loop, it is vitally important to make sure that your DO UNTIL loop will eventually end and avoid putting your program into an infinite loop.

    The Iterative DO LOOP is perfomed the exact number of times the programmer specified.

    The DO WHILE loop is perfomed zero or more times.

    The DO UNTIL loop is perfomed at least once.

    The key difference between the DO WHILE loop and the DO UNTIL loop is that the condition test comes at the beginning of the DO WHILE loop and the condition test comes at the end of the DO UNTIL loop.

    The DO WHILE loop might never be performed (if the test condition fails before the first attempt to loop).

    The DO UNTIL loop will always be performed at least once (because the condition test occurs after the first pass of the loop).

    Just as it was possible to have nested IF structures, it is also possible to have nested loop structures, placing one or more loops inside another loop. See the following example:

    DO OutsideCount := 1 TO 100 STEP UP BY 1;
    BEGIN outer loop
        DO InsideCount := 1 TO 1000 STEP UP BY 1;
        BEGIN inner loop
            some sequence of steps
        END inner loop;
    END outer loop;
    STOP PROGRAM.

    It is possible to nest the loops more than two deep. Notice that we use indentation to make it clear to the reader which loop (inner or outer) we are talking about. Your computer doesn’t need the indentation, but anyone trying to read and modify your program a few years later (even you) will really appreciate the indentation.

    It is possible to have the inner and outer loops be different kinds, such as an Iterative DO LOOP inside a DO WHILE LOOP.


data structures

    Control structures and data structures are the stuff that programs are made of.

    This is an informal discussion that will serve as background so that the reader will understand the context of the next few chapters.


building blocks of a program

    The terminology and details for constructing a program vary by language, but tend to follow a few common principles. The following presentation is very informal. All of these items will be discussed in more detail later.

    Programs are created from a series of characters (letters, digits, punctuation, and other characters) from the source character set.

    Characters are grouped into tokens. Each token, which can be one or more characters, has a distinct meaning in the language. Comments and white space are generally not considered to be tokens. Depending on how a language defines tokens, they can include operators, literals, keywords (or reserved words), and identifiers.

C

    C has five classes of tokens: operators, separators, identifiers, reserved words, and constants.

    A token is distinguished by the fact that if you try to break it up into smaller elements that it will lose or change its meaning. To use a simple example, the characters 3.5 collectively have the clear meaning of three and a half, but individually mean three, period, and five. A token is indivisible.

    Literals are symbols used to indicate a specific value. Anytime you encounter a raw number, such as 2 or 573.9901, you are seeing an example of a numeric literal. There are other kinds of literals beyond just numbers.

    Operators are symbols (usually one or two characters) that indicate an operation (such as the plus sign [+] for addition).

    Separators are symbols that are used to separate portions of your source code. Some languages don’t use this concept. Some languages consider whitespace to be a separator, while other languages don’t (even if whitespace is used to separate tokens).

    Key words or reserved words are the commands (and related terminology) of a programming language (such as the ubiquitious IF command). Reserved words can not be used for any other purpose. Some languages (such as PL/I) allow key words to be reassigned to other purposes (which creates massive confusion and is a horribly bad programming practice). In most languages, there is no distinction between the terms keywords and reserved words.

    Identifiers are names. This includes the names of programs, modules, procedures, functions, variables, constants, etc. In some languages this may include reserved words or key words. Note that in the c programming language, constants are not considered an identifier.

    Tokens can be grouped together into statements. Some languages have subgroups of statements called expessions.

    In some languages (such as C) there is a special character (or characters) that terminates (or marks the end of) a statment. In some languages (such as Pascal) there is a special character (or characters) that seperates statements. This is a subtle but important difference. Some languages (such as FORTRAN) are line oriented, with one statement to a line (with a continuation character to allow for multiple line statements).

    Statements may (in most modern languages) be grouped into blocks of code.

    Most modern programming languages have a header and a body, although few call them by those exact names.

    Generally declarations go into the header and statements and blocks of code go into the body.


SECTION 2

    This section examines advanced topics.

    At this point in the book you should be able to determine whether I have the programming knowledge and the writing skill to realistically create a free downloadable college text book on computer programming. Possibly a large corporation or someone who is rich might want to donate a few hundred dollars to support me at the poverty level so that I can complete this book quickly. Possibly a business owner within reasonable walking distance of south east Costa Mesa, California, might be willing to provide me with a paying job. I will do almost any work that is legal, ethical, and safe (even things like washing dishes and mopping floors). I will work for minimum wage. I will work hard. I am a legal natural born citizen of the United States.


Boolean algebra
and logic

    Boolean algebra is named for George Boole, who introduced the ideas in the 1854 work “An Investigation of the Law of Thought”. Claude Shannon showed the application of Boolean algebra to switching circuits in the 1938 work “Symbolic Analysis of Relay and Switching Circuits”.

    Major applications of Boolean algebra include:

  1. truth calculus
  2. switching algebra
  3. set theory (algebra of classes)

    Boolean algebra is a pure mathematical system that deals with perfect abstracts.

    Real world logic circuits are physically imperfect implementations of Boolean algebra. Electrical problems (such as noise, interference, and heat) can cause failure. Delays in signals reaching certain locations (such as slew rate and propogation delay) can slow down cmputers and logic circuits and can produce nightmarish problems for computer and circuit designers. As processors and logic circuits shrink in size, quantum effects can even start to interfere with correct operation.

simple summary

    Boolean algebra is binary.

    Objects can be one of two values: 1 or 0; true or false; high or low; positive or negative; closed or open; or any other pair of binary values.

    The basic Boolean operations are AND, OR, and NOT.

    The following chart shows the major interpretations of Boolean algebra:

Boolean
Algebra
Truth Calculus Switching Algebra Logic Circuit Set Theory
Algebra of Classes
·
multiplication
AND series circuit intersection
+
addition
OR parallel circuit union
0 F FALSE open circuit   S(Z) null set
1 T TRUE short circuit   S(U) universal set
_
A
¬A NOT A normally closed switch C(S) complemented set

    AND requires both objects to be true for the result to be true. The AND works like a pair of switches in series. Both switches must be closed for current to flow.

    AND is conisdered to be Boolean multiplication and is represented by the middle dot symbol: · (such as A·B). As in ordinary algebra, AND (Boolean multiplication) can be written by dropping the middle dot (such as AB). There is no Boolean division operation.

    The truth table for AND is as follows:

AND
ABresult
000
100
010
111

    The AND gate in logic circuits looks like:

    The AND operation (Boolean multiplication) has the same results as ordinary arithmetic multiplication..

    The AND operation has a result of 0 when any of its input variables is 0.

    The AND operation has a result of 1 only when both of its input variables are 1.

    OR (or inclusive or) requires either object to be true for the result to be true. The OR works like a pair of switches in parallel. Current will flow if either or both switches are closed.

    OR is conisdered to be Boolean addition and is represented by the plus symbol: + (such as (A+B). There is no Boolean subtraction operation.


    The truth table for OR is as follows:

OR
ABresult
000
101
011
111

    The OR gate in logic circuits looks like:

    The OR operation has a result of 1 when any of its input variables is 1.

    The OR operation has a result of 0 only when both of its input variables are 0.

    In OR (Boolean addition) 1 + 1 = 1. Similarly, 1 + 1 + 1 = 1.

    NOT (also called negation or complement) simply reverses the value of an object, changing true into false and changing false into true.

    The truth table for NOT is as follows:

NOT
Aresult
01
10

    The NOT gate (or inverter) in logic circuits looks like:

    As in ordinary algebra, in mixed expressions, all ANDs (Boolean multiplication) are performed before ORs (Boolean addition). For example, A+B·C is evaluated by ANDing B with C and then ORing A with the result of the first operation (BC).

    Parenthesis can be used to change the ordinary order of evaluation. For example, (A+B)·C is evaluated by ORing A with B and then ANDing C with the result of the first operation (A+B). Parenthesis can be used for clarity.

    Negation of a single variable or object is done before using the result in an expression. Negation of an entire expression is done after the expression is evaluated.

    XOR (or exclusive or) is similar to the normal English meaning of the word “or” — a choice between two items, but not both or none. XOR is less commonly written EOR. The symbol for the XOR operation is .

    The truth table for XOR (exclusive OR) is as follows:

XOR
ABresult
000
101
011
110

    The XOR gate in logic circuits looks like:

    XOR is not considered to be a basic Boolean operation, but is widely used in logic expressions. XOR is the same as ( (A) · (¬B) ) + ( (¬A) · (B ) ), extra parenthesis added for clarity.

    NAND is the combination of a NOT and an AND. NAND produces the oppposite of an AND.

    The truth table for NAND (Not AND) is as follows:

NAND
ABresult
001
101
011
110

    The NAND gate in logic circuits looks like:

    NOR is the combination of a NOT and an OR. NOR produces the opposite of OR.

    The truth table for NOR (Not OR) is as follows:

NOR
ABresult
001
100
010
110

    The NOR gate in logic circuits looks like:

    XNOR (or NXOR) is the combination of a NOT and a XOR. XNOR produces the opposite of XOR.

    The truth table for XNOR (Not eXclusive OR) is as follows:

XNOR
ABresult
001
100
010
111

    The XNOR gate in logic circuits looks like:


postulates

Boolean algebra

    Boolean algebra is an algebraic system consisting of binary elements and binary operations.

    The postulates of Boolean algebra provide the foundation for the entire system. The order of the postulates varies greatly from author to author. Some parts of postulates are not strictly necessary (might be derived as theorems instead), but in introductory materials such as this one, are filled out to make details clear to students.

    Basic set: X, Y, and Z are elements of the set S.
Note that some authorities use the elements A, B, and C instead.

    Equivalence: Equivalence is defined for the set S such that:
         if X = Y and Y = Z
         then X = Z

    Operations: The operations + (Boolean addition) and · (Boolean multiplication) are defined such that:
        X + Y and X · Y are in the set S
NOTE: These two operations were informally introduced in the introduction chapter.

    Values: All elements in the set S will take on the valuation:
        S = (0,1)
NOTE: These two values are often interpreted as false (0) and true (1).

    Complement Law: Each member of the set S has an inverse (or compliment), such that when X = 0, then ¬X = 1
NOTE: The compliment is also indicated by a “tick” mark after the variable (X') and by placing a bar (or horizontal line) over a variable.
NOTE: The Complement Law produces the following important relationships:
        X · ¬X = 0
        X + ¬X = 1

    Identity Law: The value 1 is the identity for Boolean multiplication and the value 0 is the identity for Boolean addition.
NOTE: The Identity Law produces the following important relationships:
        X + 0 = X
        X + 1 = 1

    Cummulative Law: Boolean addition and Boolean multiplication are both cummulative:
        X + Y = Y + X
        X · Y = Y · X

    Distributive Law: The Distributive Law in Boolean algebra highlights its different nature from normal linear algebra (this law is not true for normal algebra):
        X · ( Y + Z ) = ( X · Y ) + ( X · Z )
        X + ( Y · Z ) = ( X + Y ) · ( X + Z )

    There are slight variations and alternate axiomatizations in presentations of Boolean algebra. The following are often included as axioms in some works. These can be proven from the above. If these axioms are used, then some of the above might become theorems.

    Duality Principle: Duality holds that for any valid expression of identity, the resulting expression obtained by interchanging 1 and 0 and · and +, is a valid dual.
NOTE: This gives:
        for the expression: X + ¬X = 1
        the dual is: X · ¬X = 0

    Idempotent Law: Property of a variable operating on itself.
        X + X = X
        X · X = X

    Associative Law: Boolean addition and Boolean multiplciation are both associative.
        X · ( Y · Z ) = ( X · Y ) · Z
        X + ( Y + Z ) = ( X + Y ) + Z

    Absorption Law:
        X · ( X + Y ) = X
        X + ( X · Y ) = X

    Null Law:
            X + 1 = 1
            X · 0 = 0

    Note that Boolean algebra does not have a subtraction or a division operation, but it does have a complement operation that isn’t found in normal linear algebra.


Assembly Language

introduction

    This section examines assembly languages in a general manner. Specific examples of addressing modes and instructions from various processors are used to illustrate the general nature of assembly language.

general

    Unlike the other programming languages catalogued here, assembly language is not a single language, but rather a group of languages. Each processor family (and sometimes individual processors within a processor family) has its own assembly language.

    In contrast to high level languages, data structures and program structures in assembly language are created by directly implementing them on the underlying hardware. So, instead of catalogueing the data structures and program structures that can be built (in assembly language you can build any structures you so desire, including new structures nobody else has ever created), we will compare and contrast the hardware capabilities of various processor families.

    This chapter does not attempt to teach how to program in assembly language. Because of the close relationship between assembly languages and the underlying hardware, this chapter will discuss hardware implementation as well as software.

history

    history: The oldest non-machine language, allowing for a more human readable method of writing programs than writing in binary bit patterns (or even hexadecimal patterns).

comparison of assembly and high level languages

    Assembly languages are close to a one to one correspondence between symbolic instructions and executable machine codes. Assembly languages also include directives to the assembler, directives to the linker, directives for organizing data space, and macros. Macros can be used to combine several assembly language instructions into a high level language-like construct (as well as other purposes). There are cases where a symbolic instruction is translated into more than one machine instruction. But in general, symbolic assembly language instructions correspond to individual executable machine instructions.

    High level languages are abstract. Typically a single high level instruction is translated into several (sometimes dozens or in rare cases even hundreds) executable machine language instructions. Some early high level languages had a close correspondence between high level instructions and machine language instructions. For example, most of the early COBOL instructions translated into a very obvious and small set of machine instructions. The trend over time has been for high level languages to increease in abstraction. Modern object oriented programming languages are highly abstract (although, interestingly, some key object oriented programming constructs do translate into a very compact set of machine instructions).

    Assembly language is much harder to program than high level languages. The programmer must pay attention to far more detail and must have an intimate knowledge of the processor in use. But high quality hand crafted assembly language programs can run much faster and use much less memory and other resources than a similar program written in a high level language. Speed increases of two to 20 times faster are fairly common, and increases of hundreds of times faster are occassionally possible. Assembly language programming also gives direct access to key machine features essential for implementing certain kinds of low level routines, such as an operating system kernel or microkernel, device drivers, and machine control.

    High level programming languages are much easier for less skilled programmers to work in and for semi-technical managers to supervise. And high level languages allow faster development times than work in assembly language, even with highly skilled programmers. Development time increases of 10 to 100 times faster are fairly common. Programs written in high level languages (especially object oriented programming languages) are much easier and less expensive to maintain than similar programs written in assembly language (and for a successful software project, the vast majority of the work and expense is in maintenance, not initial development).

    For information on how to interface high level languages and assembly languages, see the chpater on assembly/high level language interface<

availability

    availability: Assemblers are available for just about every processor ever made. Native assemblers produce object code on the same hardware that the object code will run on. Cross assemblers produce object code on different hardware that the object code will run on.

structure

    format: free form or column (depends on the assembly langauge)

    nature: procedural language with one to one correspondence between language mnemonics and executable machine instructions.

    “Assembler languages occupy a unique place in the computing world. Since most assmebler-language statements are symbolic of individual machine-language instructions, the assembler-language programmer has the full power of the computer at his disposal in a way that users of other languages do not. Because of the direct relationship between assembler language and machine language, assembler language is used when high efficiency of programs is needed, and especially in areas of application that are so new and amorphous that existing program-oriented languages are ill-suited for describing the procedures to be followed.” —Assembler Language Programming by George W. Struble, page vii

    “Perhaps the most glaring difference among the three types of languages [high level, assembly, and machine] is that as we move from high-level languages to lower levels, the code gets harder to read (with understanding). The major advantages of high-level languages are that they are easy to read and are machine independent. The instructions are written in a combination of English and ordinary mathematical notation, and programs can be run with minor, if any, changes on different computers.” —VAX-11 Assembly Language Programming by Sara Baase, page 1

    “The second most visible difference among the different types of languages is that several lines of assembly language are needed to encode one line of a high-level language program.” —VAX-11 Assembly Language Programming by Sara Baase, page 2

    “There are a number of situations in which it is very desirable to use assembler language routines to do part of a job, and use some higher-level language for other parts. It makes sense to use higher-level languages such as Fortran, COBOL, or PL/I for parts of procedures for which they are well-suited, and supplement with assembler language routines for those parts of procedures for which the higher-level language is awkward or inefficient.” —Assembler Language Programming by George W. Struble, page 427

    “If one has a choice between assembly language and a high-level language, why choose assembly language? The fact that the amount of programming done in assembly language is quite small compared to the amount done in high-level languages indicates that one generally doesn’t choose assembly language. However, there are situations where it may not be convenient, efficient, or possible to write programs in hihg-level languages. … Programs to control and communicate with peripheral devices (input and output devices) are usually written in assembly language because they use special instructions that are not available in high-level languages, and they must be very efficient. Some systems programs are written in assembly language for similar reasons. In general, since high-level languages are designed without the features of a particular machine in mind and a compiler must do its job in a standardized way to accomodate all valid programs, there are situations where to take advantage of special features of a machine, to program some details that are inaccessible from a high-level language, or perhaps to increase the efficiency of a program, one may reasonably choose to write in assembly language.” —VAX-11 Assembly Language Programming by Sara Baase, page 3-4

    “In situations where programming in a high-level language is not appropriate, it is clear that assembly language is to be preferred to machine language. Assembly language has a number of advantages over machine code aside from the obvious increase in readability. One is that the use of symbolic names for data and instruction labels frees the programmer from computing and recomputing the memory locations whenever a change is made in a program. Another is that assembly languages generally have a feature, called macros, that frees the [programmer] from having to repeat similar sections of code used in several places in a program. Assemblers fo many bookkeeping and other tasks for the user. Often compilers translate into assembly language rather than machine code.” —VAX-11 Assembly Language Programming by Sara Baase, page 3

kinds of processors

    Processors can broadly be divided into the categories of: CISC, RISC, hybrid, and special purpose.

    Complex Instruction Set Computers (CISC) have a large instruction set, with hardware support for a wide variety of operations. In scientific, engineering, and mathematical operations with hand coded assembly language (and some business applications with hand coded assembly language), CISC processors usually perform the most work in the shortest time.

    Reduced Instruction Set Computers (RISC) have a small, compact instruction set. In most business applications and in programs created by compilers from high level language source, RISC processors usually perform the most work in the shortest time.

    Hybrid processors are some combination of CISC and RISC approaches, attempting to balance the advantages of each approach.

    Special purpose processors are optimized to perform specific functions. Digital signal processors, graphics chips, and various kinds of co-processors are the most common kinds of special purpose processors.

executable instructions

    There are four general classes of machine instructions. Some instructions may have characteristics of more than one major group. The four general classes of machine instructions are: computation, data transfer, sequencing, and environment control.

  •     “Computation: Implements a function from n-tuples of values to m-tuples of values. The function may affect the state. Example: A divide instruction whose arguments are a single-length integer divisor and a double-length integer dividend, whose results are a single-length integer quotient and a single-length integer remainder, and which may produce a divide check interrupt.” —Compiler Construction, by William M. Waite and Gerhard Goos, page 52
  •     “Data Transfer: Copies information, either within one storage class or from one storage class to another. Examples: A move instruction that copies the contents of one register to another; a read instruction that copies information from a disc to main storage.” —Compiler Construction, by William M. Waite and Gerhard Goos, page 52
  •     “Sequencing: Alters the normal execution sequence, either conditionally or unconditionally. Examples: a halt instruction that causes execution to terminate; a conditional jump instruction that causes the next instruction to be taken from a given address if a given register contains zero.” —Compiler Construction, by William M. Waite and Gerhard Goos, page 53
  •     “Environment control: Alters the environment in which execution is carried out. The lateration may involve a trasnfer of control. Examples: An interrupt disable instruction that prohibits certain interrupts from occurring; a procedure call instruction that updates addressing registers, thus changing the program’s addressing environment.” —Compiler Construction, by William M. Waite and Gerhard Goos, page 53

    Executable instructions can be divided into several broad categories of related operations:


data representation

    This chapter examines data representation and number systems for assembly languages.

data representation

    Most data structures are abstract structures and are implemented by the programmer with a series of assembly language instructions. Many cardinal data types (bits, bit strings, bit slices, binary integers, binary floating point numbers, binary encoded decimals, binary addresses, characters, etc.) are implemented directly in hardware for at least parts of the instruction set. Some processors also implement some data structures in hardware for some instructions — for example, most processors have a few instructions for directly manipulating character strings.

    An assembly language programmer has to know how the hardware implements these cardinal data types. Some examples: Two basic issues are bit ordering (big endian or little endian) and number of bits (or bytes). The assembly language programmer must also pay attention to word length and optimum (or required) addressing boundaries. Composite data types will also include details of hardware implementation, such as how many bits of mantissa, characteristic, and sign, as well as their order. In many cases there are machine specific encodings for some data types, as well as choice of character codes (such as ASCII or EBCDIC) for character and string implementations.

data size

    The basic building block is the bit, which can contain a single piece of binary data (true/false, zero/one, north/south, positive/negative, high/low, etc.).

    Bits are organized into larger groupings to store values encoded in binary bits. The most basic grouping is the byte. A byte is the smallest normally addressable quantum of main memory (which can be different than the minimum amount of memory fetched at one time). In modern computers this is almost always an eight bit byte, so much so that many skilled programmers believe that a byte is defined as being always eight bits. In the past there have been computers with seven, eight, twelve, and sixteen bits. There have also been bit slice computers where the common memory addressing approach is by single bit; in these kinds of computers the term byte actually has no meaning, although eight bits on these computers are likely to be called a byte. Throughout the rest of this discussion, assume the standard eight bit byte applies unless specifically stated otherwise.

    A nibble is half a byte, or four bits.

    A word is the default data size for a processor. The default size does not apply in all cases. The word size is chosen by the processor’s designer(s) and reflects some basic hardware issues (such as internal or external buses). The most common word sizes are 16 and 32, but words have ranged from 16 to 60 bits. Typically there will be additional data sizes that are defined relative to the size of a word: halfword, half the size of a word; longword, usually double the size of a word; doubleword, usually double the size of a word (sometimes double the size of a longword); and quadword, four times the size of a word. Whether or not there is a space between the size designation and “word” is designated by the manufacturer, and varies by processor.

    Some processors require that data be aligned. That is, two byte quantities must start on byte addresses that are multiples of two; four byte quantities must start on byte addresses that are multiples of four; etc. The general rule follows a progression of exponents of two (2, 4, 8, 16, ƒ). Some processors allow data to be unaligned, but this usually results in a slow down in performance.

endian

    Endian is the ordering of bytes in multibyte scalar data. The term comes from Jonathan Swift’s Gulliver’s Travels. For a given multibyte scalar value, big- and little-endian formats are byte-reversed mappings of each other. While processors handle endian issues invisibly when making multibyte memory accesses, knowledge of endian is vital when directly manipulating individual bytes of multibyte scalar data and when moving data across hardware platforms.

    Big endian stores scalars in their “natural order”, with most significant byte in the lowest numeric byte address. Examples of big endian processors are the IBM System 360 and 370, Motorola 680x0, Motorola 68300, and most RISC processors.

    Little endian stores scalars with the least significant byte in the lowest numeric byte address. Examples of little endian processors are the Digital VAX and Intel x86 (including Pentium).

    Bi-endian processors can run in either big endian or little endian mode under software control. An example is the Motorola/IBM PowerPC, which has two separate bits in the Machine State Register (MSR) for controlling endian: the ILE bit controls endian during interrupts and the LE bit controls endian for all other processes. Big endian is the default for the PowerPC.

number systems

    Binary is a number system using only ones and zeros (or two states).

    Decimal is a number system based on ten digits (including zero).

    Hexadecimal is a number system based on sixteen digits (including zero).

    Octal is a number system based on eight digits (including zero).

    Duodecimal is a number system based on twelve digits (including zero).

binaryoctaldecimalduodecimalhexadecimal
00000
11111
102222
113333
1004444
1015555
1106666
1117777
100010888
100111999
10101210AA
10111311BB
1100141210C
1101151311D
1110161412E
1111171513F
1000020161410
1000121171511
1001022181612
1001123191713
1010024201814
1010125211915
1011026221A16
1011127231B17
1100030242018

number representations

integer representations

    Sign-magnitude is the simplest method for representing signed binary numbers. One bit (by universal convention, the highest order or leftmost bit) is the sign bit, indicating positive or negative, and the remaining bits are the absolute value of the binary integer. Sign-magnitude is simple for representing binary numbers, but has the drawbacks of two different zeros and much more complicates (and therefore, slower) hardware for performing addition, subtraction, and any binary integer operations other than complement (which only requires a sign bit change).

    In one’s complement representation, positive numbers are represented in the “normal” manner (same as unsigned integers with a zero sign bit), while negative numbers are represented by complementing all of the bits of the absolute value of the number. Numbers are negated by complementing all bits. Addition of two integers is peformed by treating the numbers as unsigned integers (ignoring sign bit), with a carry out of the leftmost bit position being added to the least significant bit (technically, the carry bit is always added to the least significant bit, but when it is zero, the add has no effect). The ripple effect of adding the carry bit can almost double the time to do an addition. And there are still two zeros, a positive zero (all zero bits) and a negative zero (all one bits).

    In two’s complement representation, positive numbers are represented in the “normal” manner (same as unsigned integers with a zero sign bit), while negative numbers are represented by complementing all of the bits of the absolute value of the number and adding one. Negation of a negative number in two’s complement representation is accomplished by complementing all of the bits and adding one. Addition is performed by adding the two numbers as unsigned integers and ignoring the carry. Two’s complement has the further advantage that there is only one zero (all zero bits). Two’s complement representation does result in one more negative number (all one bits) than positive numbers.

    Two’s complement is used in just about every binary computer ever made. Most processors have one more negative number than positive numbers. Some processors use the “extra” neagtive number (all one bits) as a special indicator, depicting invalid results, not a number (NaN), or other special codes.

    In unsigned representation, only positive numbers are represented. Instead of the high order bit being interpretted as the sign of the integer, the high order bit is part of the number. An unsigned number has one power of two greater range than a signed number (any representation) of the same number of bits.

bit patternsign-mag.one’s comp.two’s compunsigned
0000000
0011111
0102222
0113333
100-0-3-44
101-1-2-35
110-2-1-26
111-3-0-17

floating point representations

    Floating point numbers are the computer equivalent of “scientific notation” or “engineering notation”. A floating point number consists of a fraction (binary or decimal) and an exponent (bianry or decimal). Both the fraction and the exponent each have a sign (positive or negative).

    In the past, processors tended to have proprietary floating point formats, although with the development of an IEEE standard, most modern processors use the same format. Floating point numbers are almost always binary representations, although a few early processors had (binary coded) decimal representations. Many processors (especially early mainframes and early microprocessors) did not have any hardware support for floating point numbers. Even when commonly available, it was often in an optional processing unit (such as in the IBM 360/370 series) or coprocessor (such as in the Motorola 680x0 and pre-Pentium Intel 80x86 series).

    Hardware floating point support usually consists of two sizes, called single precision (for the smaller) and double precision (for the larger). Usually the double precision format had twice as many bits as the single precision format (hence, the names single and double). Double precision floating point format offers greater range and precision, while single precision floating point format offers better space compaction and faster processing.

    F_floating format (single precision floating), DEC VAX, 32 bits, the first bit (high order bit in a register, first bit in memory) is the sign magnitude bit (one=negative, zero=positive or zero), followed by 15 bits of an excess 128 binary exponent, followed by a normalized 24-bit fraction with the redundant most significant fraction bit not represented. Zero is represented by all bits being zero (allowing the use of a longword CLR to set a F_floating number to zero). Exponent values of 1 through 255 indicate true binary exponents of -127 through 127. An exponent value of zero together with a sign of zero indicate a zero value. An exponent value of zero together with a sign bit of one is taken as reserved (which produces a reserved operand fault if used as an operand for a floating point instruction). The magnitude is an approximate range of .29*10-38 through 1.7*1038. The precision of an F_floating datum is approximately one part in 223, or approximately seven (7) decimal digits).

    32 bit floating format (single precision floating), AT&T DSP32C, 32 bits, the first bit (high order bit in a register, first bit in memory) is the sign magnitude bit (one=negative, zero=positive or zero), followed by 23 bits of a normalized two’s complement fractional part of the mantissa, followed by an eight bit exponent. The magnitude of the mantissa is always normalized to lie between 1 and 2. The floating point value with exponent equal to zero is reserved to represent the number zero (the sign and mantissa bits must also be zero; a zero exponent with a nonzero sign and/or mantissa is called a “dirty zero” and is never generated by hardware; if a dirty zero is an operand, it is treated as a zero). The range of nonzero positive floating point numbers is N = [1 * 2-127, [2-2-23] * 2127] inclusive. The range of nonzero negative floating point numbers is N = [-[1 + 2-23] * 2-127, -2 * 2127] inclusive.

    40 bit floating format (extended single precision floating), AT&T DSP32C, 40 bits, the first bit (high order bit in a register, first bit in memory) is the sign magnitude bit (one=negative, zero=positive or zero), followed by 31 bits of a normalized two’s complement fractional part of the mantissa, followed by an eight bit exponent. This is an internal format used by the floating point adder, accumulators, and certain DAU units. This format includes an additional eight guard bits to increase accuracy of intermediate results.

    D_floating format (double precision floating), DEC VAX, 64 bits, the first bit (high order bit in a register, first bit in memory) is the sign magnitude bit (one=negative, zero=positive or zero), followed by 15 bits of an excess 128 binary exponent, followed by a normalized 48-bit fraction with the redundant most significant fraction bit not represented. Zero is represented by all bits being zero (allowing the use of a quadword CLR to set a D_floating number to zero). Exponent values of 1 through 255 indicate true binary exponents of -127 through 127. An exponent value of zero together with a sign of zero indicate a zero value. An exponent value of zero together with a sign bit of one is taken as reserved (which produces a reserved operand fault if used as an operand for a floating point instruction). The magnitude is an approximate range of .29*10-38 through 1.7*1038. The precision of an D_floating datum is approximately one part in 255, or approximately 16 decimal digits).


register set

    This chapter examines the use of registers in assembly language. Specific examples of registers from various processors are used to illustrate the general nature of assembly language.

register set

    Registers are fast memory, almost always connected to circuitry that allows various arithmetic, logical, control, and other manipulations, as well as possibly setting internal flags.

    Most early computers had only one data register that could be used for arithmetic and logic instructions. Often there would be additional special purpose registers set aside either for temporary fast internal storage or assigned to logic circuits to implement certain instructions. Some early computers had one or two address registers that pointed to a memory location for memory accesses (a pair of address registers typically would act as source and destination pointers for memory operations). Computers soon had multiple data registers, address registers, and sometimes other special purpose registers. Some computers have general purpose registers that can be used for both data and address operations. Every digital computer using a von Neumann architecture has a register (called the program counter) that points to the next executable instruction. Many computers have additional control registers for implementing various control capabilities. Often some or all of the internal flags are combined into a flag or status register.

accumulators

    Accumulators are registers that can be used for arithmetic, logical, shift, rotate, or other similar operations. The first computers typically only had one accumulator. Many times there were related special purpose registers that contained the source data for an accumulator. Accumulators were replaced with data registers and general purpose registers. Accumulators reappeared in the first microprocessors.

data registers

    Data registers are used for temporary scratch storage of data, as well as for data manipulations (arithmetic, logic, etc.). In some processors, all data registers act in the same manner, while in other processors different operations are performed are specific registers.

address registers

    Address registers store the addresses of specific memory locations. Often many integer and logic operations can be performed on address registers directly (to allow for computation of addresses).

    Sometimes the contents of address register(s) are combined with other special purpose registers to compute the actual physical address. This allows for the hardware implementation of dynamic memory pages, virtual memory, and protected memory.

    The number of bits of an address register (possibly combined with information from other registers) limits the maximum amount of addressable memory. A 16-bit address register can address 64K of physical memory. A 24-bit address register can address address 16 MB of physical memory. A 32-bit address register can address 4 GB of physical memory. A 64-bit address register can address 1.8446744 x 1019 of physical memory. Addresses are always unsigned binary numbers.

general purpose registers

    General purpose registers can be used as either data or address registers.

constant registers

    Constant registers are special read-only registers that store a constant. Attempts to write to a constant register are illegal or ignored. In some RISC processors, constant registers are used to store commonly used values (such as zero, one, or negative one) — for example, a constant register containing zero can be used in register to register data moves, providing the equivalent of a clear instruction without adding one to the instruction set. Constant registers are also often used in floating point units to provide such value as pi or e with additional hidden bits for greater accuracy in computations.

floating point registers

    Floating point registers are special registers set aside for floating point math.

index registers

    Index registers are used to provide more flexibility in addressing modes, allowing the programmer to create a memory address by combining the contents of an address register with the contents of an index register (with displacements, increments, decrements, and other options). In some processors, there are specific index registers (or just one index register) that can only be used only for that purpose. In some processors, any data register, address register, or general register (or some combination of the three) can be used as an index register.

base registers

    Base registers or segment registers are used to segment memory. Effective addresses are computed by adding the contents of the base or segment register to the rest of the effective address computation. In some processors, any register can serve as a base register. In some processors, there are specific base or segment registers (one or more) that can only be used for that purpose. In some processors with multiple base or segment registers, each base or segment register is used for different kinds of memory accesses (such as a segment register for data accesses and a different segment register for program accesses).

control registers

    Control registers control some aspect of processor operation. The most universal control register is the program counter.

program counter

    Almost every digital computer ever made uses a program counter. The program counter points to the memory location that stores the next executable instruction. Branching is implemented by making changes to the program counter. Some processor designs allow software to directly change the program counter, but usually software only indirectly changes the program counter (for example, a JUMP instruction will insert the operand into the program counter). An assembler has a location counter, which is an internal pointer to the address (first byte) of the next location in storage (for instructions, data areas, constants, etc.) while the source code is being converted into object code.

    The VAX uses the 16th of 16 general purpose registers as the program counter (PC). Almost the entire instruction set can directly manipulate the program counter, allowing a very rich set of possible kinds of branching.

    The program counter in System/360 and 370 machines is contained in bits 40-63 of the program status word (PSW), which is directly accessible by some instructions.

processor flags

    Processor flags store information about specific processor functions. The processor flags are usually kept in a flag register or a general status register. This can include result flags that record the results of certain kinds of testing, information about data that is moved, certain kinds of information about the results of compations or transformations, and information about some processor states. Closely related and often stored in the same processor word or status register (although often in a privileged portion) are control flags that control processor actions or processor states or the actions of certain instructions.

    A few typical result flags (with processors that include them):

    Some conditions are determined by combining multiple flags. For example, if a processor has a negative flag and a zero flag, the equivalent of a positive flag is the case of both the negative and zero flags both simultaneously being cleared.

    A few typical control flags (with processors that include them):

stack pointer

    Stack pointers are used to implement a processor stack in memory. In many processors, address registers can be used as generic data stack pointers and queue pointers. A specific stack pointer or address register may be hardwired for certain instructions. The most common use is to store return addresses, processor state information, and temporary variables for subroutines.

subroutine return pointer

    Some RISC processors include a special subroutine return pointer rather than using a stack in memory. The return address for subroutine calls is stored in this register rather than in memory. More than one level of subroutine calls requires storing and saving the contents of this register to and from memory.


Basics of computer memory

    Summary: Memory systems in computers. Access to memory (for storage of programs and data) is one of the most basic and lowest level activities of an operating system.

memory hardware issues

main storage

    Main storage is also called memory or internal memory (to distinguish from external memory, such as hard drives). An older term is working storage.

    Main storage is fast (at least a thousand times faster than external storage, such as hard drives). Main storage (with a few rare exceptions) is volatile, the stored information being lost when power is turned off.

    All data and instructions (programs) must be loaded into main storage for the computer processor.

    RAM is Random Access Memory, and is the basic kind of internal memory. RAM is called “random access” because the processor or computer can access any location in memory (as contrasted with sequential access devices, which must be accessed in order). RAM has been made from reed relays, transistors, integrated circuits, magnetic core, or anything that can hold and store binary values (one/zero, plus/minus, open/close, positive/negative, high/low, etc.). Most modern RAM is made from integrated circuits. At one time the most common kind of memory in mainframes was magnetic core, so many older programmers will refer to main memory as core memory even when the RAM is made from more modern technology. Static RAM is called static because it will continue to hold and store information even when power is removed. Magnetic core and reed relays are examples of static memory. Dynamic RAM is called dynamic because it loses all data when power is removed. Transistors and integrated circuits are examples of dynamic memory. It is possible to have battery back up for devices that are normally dynamic to turn them into static memory.

    ROM is Read Only Memory (it is also random access, but only for reads). ROM is typically used to store thigns that will never change for the life of the computer, such as low level portions of an operating system. Some processors (or variations within processor families) might have RAM and/or ROM built into the same chip as the processor (normally used for processors used in standalone devices, such as arcade video games, ATMs, microwave ovens, car ignition systems, etc.). EPROM is Erasable Programmable Read Only Memory, a special kind of ROM that can be erased and reprogrammed with specialized equipment (but not by the processor it is connected to). EPROMs allow makers of industrial devices (and other similar equipment) to have the benefits of ROM, yet also allow for updating or upgrading the software without having to buy new ROM and throw out the old (the EPROMs are collected, erased and rewritten centrally, then placed back into the machines).

    Registers and flags are a special kind of memory that exists inside a processor. Typically a processor will have several internal registers that are much faster than main memory. These registers usually have specialized capabilities for arithmetic, logic, and other operations. Registers are usually fairly small (8, 16, 32, or 64 bits for integer data, address, and control registers; 32, 64, 96, or 128 bits for floating point registers). Some processors separate integer data and address registers, while other processors have general purpose registers that can be used for both data and address purposes. A processor will typically have one to 32 data or general purpose registers (processors with separate data and address registers typically split the register set in half). Many processors have special floating point registers (and some processors have general purpose registers that can be used for either integer or floating point arithmetic). Flags are single bit memory used for testing, comparison, and conditional operations (especially conditional branching). For a much more detailed look at registers, see the chapter on registers.

external storage

    External storage is any storage other than main memory. In modern times this is mostly hard drives and removeable media (such as floppy disks, Zip disks, optical media, etc.). With the advent of USB and FireWire hard drives, the line between permanent hard drives and removeable media is blurred. Other kinds of external storage include tape drives, drum drives, paper tape, and punched cards. Random access or indexed access devices (such as hard drives, removeable media, and drum drives) provide an extension of memory (although usually accessed through logical file systems). Sequential access devices (such as tape drives, paper tape punch/readers, or dumb terminals) provide for off-line storage of large amounts of information (or back ups of data) and are often called I/O devices (for input/output).

buffers

    Buffers are areas in main memory that are used to store data (or instructions) being transferred to or from external memory.

basic memory software approaches

static and dynamic approaches

    There are two basic approaches to memory usage: static and dynamic.

    Static memory approaches assume that the addresses don’t change. This may be a virtual memory illusion, or may be the actual physical layout. The static memory allocation may be through absolute addresses or through PC relative addresses (to allow for relocatable, reentrant, and/or recursive software), but in either case, the compiler or assembler generates a set of addresses that can not change for the life of a program or process.

    Dynamic memory approaches assume that the addresses can change (although change is often limited to predefined possible conditions). The two most common dynamic approaches are the use of stack frames and the use of pointers or handlers. Stack frames are used primarily for temporary data (such as fucntion or subroutine variables or loop counters). Handles and pointers are used for keeping track of dynamically allocated blocks of memory.

absolute addressing

    To look at memory use by programs and operating systems, let’s first examine the more simple problem of a single program with complete control of the computer (such as in a small-scale embedded system or the earliest days of computing).

    The most basic form of memory access is absolute addressing, in which the program explicitely names the address that is going to be used. An address is a numeric label for a specific location in memory. The numbering system is usually in bytes and always starts counting with zero. The first byte of physical memory is at address 0, the second byte of physical memory is at address 1, the third byte of physical memory is at address 2, etc. Some processors use word addressing rather than byte addressing. The theoretical maximum address is determined by the address size of a processor (a 16 bit address space is limited to no more than 65536 memory locations, a 32 bit address space is limited to approximately 4 GB of memory locations). The actual maximum is limited to the amount of RAM (and ROM) physically installed in the computer.

    A programmer assigns specific absolute addresses for data structures and program routines. These absolute addresses might be assigned arbitrarily or might have to match specific locations expected by an operating system. In practice, the assembler or complier determines the absolute addresses through an orderly predictable assignment scheme (with the ability for the programmer to override the compiler’s scheme to assign specific operating system mandated addresses).

    This simple approach takes advantage of the fact that the compiler or assembler can predict the exact absolute addresses of every program instruction or routine and every data structure or data element. For almost every processor, absolute addresses are the fastest form of memory addressing. The use of absolute addresses makes programs run faster and greatly simplifies the task of compiling or assembling a program.

    Some hardware instructions or operations rely on fixed absolute addresses. For example, when a processor is first turned on, where does it start? Most processors have a specific address that is used as the address of the first instruction run when the processer is first powered on. Some processors provide a method for the start address to be changed for future start-ups. Sometimes this is done by storing the start address internally (with some method for software or external hardware to change this value). For example, on power up the Motorola 680x0, the processor loads the interrupt stack pointer with the longword value located at address 000 hex, loads the program counter with the longword value located at address 004 hex, then starts execution at the frshly loaded program counter location. Sometimes this is done by reading the start address from a data line (or other external input) at power-up (and in this case, there is usually fixed external hardware that always generates the same pre-assigned start address).

    Another common example of hardware related absolute addressing is the handling of traps, exceptions, and interrupts. A processor often has specific memory addresses set aside for specific kinds of traps, exceptions, and interrupts. Using a specific example, a divide by zero exception on the Motorola 680x0 produces an exception vector number 5, with the address of the exception handler being fetched by the hardware from memory address 014 hex.

    Some simple microprocessor operating systems relied heavily on absolute addressing. An example would be the MS-DOS expectation that the start of a program would always be located at absolute memory address x100h (hexadecimal 100, or decimal 256). A typical compiler or assembler directive for this would be the ORG directive (for “origin”).

    The key disadvantage of absolute addressing is that multiple programs clash with each other, competing to use the same absolute memory locations for (possibly different) purposes.

overlay

    So, how do you implement multiple programs on an operating system using absolute addresses? Or, for early computers, how do you implement a program that is larger than available RAM (especially at a time when processors rarely had more than 1k, 2k, or 4k of RAM? The earliest answer was overlay systems.

    With an overlay system, each program or program segment is loaded into the exact same space in memory. An overlay handler exists in another area of memory and is responsible for swapping overlay pages or overlay segments (both are the same thing, but different operating systems used different terminology). When a overlay segment completes its work or needs to access a routine in another overlay segment, it signals the overlay handler, which then swaps out the old program segment and swaps in the next program segment.

    An overlay handler doesn’t take much memory. Typically, the memory space that contained the overlay handler was also padded out with additional routines. These might include key device drivers, interrupt handlers, exception handlers, and small commonly used routines shared by many programs (to save time instead of continual swapping of the small commonly used routines).

    In early systems, all data was global, meaning that it was shared by and available for both read and writes by any running program (in modern times, global almost always means available to a single entire program, no longer meaning available to all software on a computer). A section of memory was set aside for shared system variables, device driver variables, and interrupt handler variables. An additional area would be set aside as “scratch” or temporary data. The temporary data area would be available for individual programs. Because the earliest operating systems were batch systems, only one program other than the operating system would be running at any one time, so it could use the scratch RAM any way it wanted, saving any long term data to files.

relocatable software

    As computer science advance, hardware started to have support for relocatable programs and data. This would allow an operating system to load a program anywhere convenient in memory (including a different location each time the program was loaded). This was a necessary step for the jump to interactive operating systems, but was also useful in early batch systems to allow for multiple overlay segments.

demand paging and swapping

    Overlay systems were superceded by demand paging and swapping systems. In a swapping system, the operating system swaps out an entire program and its data (and any other context information).

    In a swapping system, instead of having programs explicitely request overlays, programs were divided into pages. The operating system would load a program’s starting page and start it running. When a program needed to access a data page or program page not currently in main memory, the hardware would generate a page fault, and the operating system would fetch the requested page from external storage. When all available pages were filled, the operating system would use one of many schemes for figuring out which page to delete from memory to make room for the new page (and if it was a data page that had any changes, the operating system would have to store a temporary copy of the data page). The question of how to decide which page to delete is one of the major problems facing operating system designers.

program counter relative

    One approach for making programs relocatable is program counter relative addressing. Instead of branching using absolute addresses, branches (including subroutine calls, jumps, and other kinds of branching) were based on a relative distance from the current program counter (which points to the address of the currently executing instruction). With PC relative addreses, the program can be loaded anywhere in memory and still work correctly. The location of routines, subroutines, functions, and constant data can be determined by the positive or negative distance from the current instruction.

    Program counter relative addressing can also be used for determining the address of variables, but then data and code get mixed in the same page or segment. At a minimum, mixing data and code in the same segment is bad programming practice, and in most cases it clashes with more sophisticated hardware systems (such as protected memory).

base pointers

    Base pointers (sometimes called segment pointers or page pointers) are special hardware registers that point to the start (or base) of a particular page or segment of memory. Programs can then use an absolute address within a page and either explicitly add the absolute address to the contents of a base pointer or rely on the hardware to add the two together to form the actual effective address of the memory access. Which method was used would depend on the processor capabilities and the operatign system design. Hiding the base pointer from the application program both made the program easier to compile and allowed for the operating system to implement program isolation, data/code isolation, protected memory, and other sophisticated services.

    As an example, the Intel 80x86 processor has a code segment pointer, a data segment pointer, a stack segment pointer, and an extra segment pointer. When a program is loaded into memory, an operating system running on the Intel 80x86 sets the segment pointers with the beginning of the pages assigned for each purpose for that particular program. If a program is swapped out, when it gets swapped back in, the operating system sets the segment pointers to the new memory locations for each segment. The program continues to run, without being aware that it has been moved in memory.

indirection, pointers, and handles

    A method for making data relocatable is to use indirection. Instead of hard coding an absolute memory address for a variable or data structure, the program uses a pointer that gives the memory address of the variable or data structure. Many processors have address pointer registers and a variety of indirect addressing modes available for software.

    In the most simple use of address pointers, software generates the effective address for the pointer just before it is used. Pointers can also be stored, but then the data can’t be moved (unless there is additional hardware support, such as virtual memory or base/segment pointers).

    Closely related to pointers are handles. Handles are two levels of indirection, or a pointer to a pointer. Instead of the program keeping track of an address pointer to a block of memory that can’t be moved, the program keeps track of a pointer to a pointer. Now, the operating system or the application program can move the underlying block of data. As long as the program uses the handle instead of the pointer, the operating system can freely move the data block and update the pointer, and everything will continue to resolve correctly.

    Because it is faster to use pointers than handles, it is common for software to convert a handle into a pointer and use the pointer for data accesses. If this is done, there must be some mechanism to make sure that the data block doesn’t move while the program is using the pointer. As an example, the Macintosh uses a system where data blocks can only be moved at specific known times, and an application program can rely on pointers derived from handles remaining valid between those known, specified times.

stack frames

    Stack frames are a method for generating temporary variables, especially for subroutines, functions, and loops. An are of memory is temporarily allocated on the system or process stack. In a simple version, the variables in the stack frame are accessed by using the stack pointer and an offset to point to the actual location in memory. This simple approach has the problem that there are many hardware instructions that change the stack pointer. The more sophisticated and stable approach is to have a second pointer called a frame pointer. The frame pointer can be set up in software using any address register. Many modern processors also have specific hardware instructions that allocate the stack frame and set up the frame pointer at the same time. Some processors have a specific hardware frame pointer register.

virtual memory

    Virtual memory is a technique in which each process generates addresses as if it had sole access to the entire logical address space of the processor, but in reality memory management hardware remaps the logical addresses into actual physical addresses in physical address space. The DEC VAX-11 gets it name from this technique, VAX standing for Virtual Address eXtension.

    Virtual memory can go beyond just remapping logical addresses into physical addresses. Many virtual memory systems also include software for page or segment swapping, shuffling portions of a program to and from a hard disk, to give the software the impression of having much more RAM than is actually installed on the computer.

OS memory services

    Operating systems offer some kind of mechanism for (both system and user) software to access memory.

    In the most simple approach, the entire memory of the computer is turned over to programs. This approach is most common in single tasking systems (only one program running at a time). Even in this approach, there often will be certain portions of memory designated for certain purposes (such as low memory variables, areas for operating system routines, memory mapped hardware, video RAM, etc.).

    With hardware support for virtual memory, operating systems can give programs the illusion of having the entire memory to themselves (or even give the illusion there is more memory than there actually is, using disk space to provide the extra “memory”), when in reality the operating system is continually moving programs around in memory and dynamically assigning physical memory as needed. Even with this approach, it is possible that some virtual memory locations are mapped to their actual physical addresses (such as for access to low memory variables, video RAM, or similar areas).

    The task of dividing up the available memory space in both of these approaches is left to the programmer and the compiler. Many modern languages (including C and C++) have service routines for allocating and deallocating blocks of memory.

    Some operating systems go beyond basic flat mapping of memory and provide operating system routines for allocating and deallocating memory. The Macintosh, for example, has two heaps (a system heap and an application heap) and an entire set of operating system routines for allocating, deallocating, and managing blocks of memory. The NeXT goes even further and creates an object oriented set of services for memory management.

    With hardware support for segments or demand paging, some operating systems (such as MVS and OS/2) provide operating system routines for programs to manage segments or pages of memory.

    Memory maps (not to be confused with memory mapped I/O) are diagrams or charts that show how an operating system divides up main memory.

    Low memory is the memory at the beginning of the address space. Some processors use designated low memory addresses during power on, exception processing, interrupt processing, and other hardware conditions. Some operating systems use designated low memory addresses for global system variables, global system structures, jump tables, and other system purposes.


address space and addressing modes

    This chapter examines addressing modes in assembly language. Specific examples of addressing modes from various processors are used to illustrate the general nature of assembly language.

address space

    Address space is the maximum amount of memory that a processor can address. Some processors use a multi-level addressing scheme, with main memory divided into segments or pages and some or all instructions mapping into the current segment(s) or page(s).

address modes

    The basic addressing modes are: register direct, moving date to or from a specific register; register indirect, using a register as a pointer to memory; program counter-based, using the program counter as a reference point in memory; absolute, in which the memory addressis contained in the instruction; and immediate, in which the data is contained in the instruction. Some instructions will have an inherent or implicit address (usually a specific register or the memory contents pointed to by a specific register) that is implied by the instruction without explicit declaration.

    One approach to processors places an emphasis on flexibility of addressing modes. Some engineers and programmers believe that the real power of a processor lies in its addressing modes. Most addressing modes can be created by combining two or more basic addressing modes, although building the combination in software will usually take more time than if the combination addressing mode existed in hardware (although there is a trade-off that slows down all operations to allow for more complexity).

    In a purely othogonal instruction set, every addressing mode would be available for every instruction. In practice, this isn’t the case.

    Virtual memory, memory pages, and other hardware mapping methods may be layered on top of the addressing modes.

absolute address

    In absolute address mode, the effective address in memory is part of the instruction. Some processors have full and short versions of absolute addressing (with short versions only pointing to a limited area in memory, normally starting at memory location zero). Unless overridden by hardware for virtual memory mapping, programs that use this address mode can not be moved in memory.

    The most basic form of memory access is absolute addressing, in which the program explicitely names the address that is going to be used. An address is a numeric label for a specific location in memory. The numbering system is usually in bytes and always starts counting with zero. The first byte of physical memory is at address 0, the second byte of physical memory is at address 1, the third byte of physical memory is at address 2, etc. Some processors use word addressing rather than byte addressing. The theoretical maximum address is determined by the address size of a processor (a 16 bit address space is limited to no more than 65536 memory locations, a 32 bit address space is limited to approximately 4 GB of memory locations). The actual maximum is limited to the amount of RAM (and ROM) physically installed in the computer.

    A programmer assigns specific absolute addresses for data structures and program routines. These absolute addresses might be assigned arbitrarily or might have to match specific locations expected by an operating system. In practice, the assembler or complier determines the absolute addresses through an orderly predictable assignment scheme (with the ability for the programmer to override the compiler’s scheme to assign specific operating system mandated addresses).

    This simple approach takes advantage of the fact that the compiler or assembler can predict the exact absolute addresses of every program instruction or routine and every data structure or data element. For almost every processor, absolute addresses are the fastest form of memory addressing. The use of absolute addresses makes programs run faster and greatly simplifies the task of compiling or assembling a program.

    Some hardware instructions or operations rely on fixed absolute addresses. For example, when a processor is first turned on, where does it start? Most processors have a specific address that is used as the address of the first instruction run when the processer is first powered on. Some processors provide a method for the start address to be changed for future start-ups. Sometimes this is done by storing the start address internally (with some method for software or external hardware to change this value). For example, on power up the Motorola 680x0, the processor loads the interrupt stack pointer with the longword value located at address 000 hex, loads the program counter with the longword value located at address 004 hex, then starts execution at the frshly loaded program counter location. Sometimes this is done by reading the start address from a data line (or other external input) at power-up (and in this case, there is usually fixed external hardware that always generates the same pre-assigned start address).

    Another common example of hardware related absolute addressing is the handling of traps, exceptions, and interrupts. A processor often has specific memory addresses set aside for specific kinds of traps, exceptions, and interrupts. Using a specific example, a divide by zero exception on the Motorola 680x0 produces an exception vector number 5, with the address of the exception handler being fetched by the hardware from memory address 014 hex.

    Some simple microprocessor operating systems relied heavily on absolute addressing. An example would be the MS-DOS expectation that the start of a program would always be located at absolute memory address x100h (hexadecimal 100, or decimal 256). A typical compiler or assembler directive for this would be the ORG directive (for “origin”).

    The key disadvantage of absolute addressing is that multiple programs clash with each other (expecting to use the same absolute memory locations for different and competing purposes).

immediate data

    In immediate data address mode, the actual data is stored in the instruction. The sizes allowed for immediate data vary by processor and often by instruction (with some instructions having specific implied sizes).

inherent address

    Many instructions will have one or more inherent or implicit addresses. These are addresses that are implied by the instruction rather than explicitly stated. The two most common forms of inherent address are either a specific register or a memory location designated by the contents of a specific register.

register direct

    In register direct address mode, the source and/or destination is a register.

    Many processors distinguish between data and address register operations (note, in some cases a general purpose register can act as eeither an address or data register).

    In data register direct operations, flags are typically set or cleared. Data that is smaller than the register may be sign extended or zero filled to fill the entire register, or may be placed only in the portion of the register necessary for the size of the data, leaving the rest of the register unchanged.

    In register to register (RR) operations, data is transferred from one register to another register or an instruction uses a source and destination register.

    In address register direct operations, flags are not normally set or cleared. The address is usually sign extended to the full address size of the processor.

register indirect

    In register indirect address mode, the contents of the designated register are used as a pointer to memory. Variations of register indirect include the use of post- or pre- increment, post- or pre- decrement, and displacements.

    In address register indirect operations, the designated register is used as a pointer to memory.

    In address register indirect with postincrement operations, the designated register is used as a pointer to memory, and then the register is incremented by the size of the operation. This is useful for a loop where the same or similar operations are performed on consecutive locations in memory. This address mode can be combined with a complimentary predecrement mode for stack and queue operations.

    In address register indirect with predecrement operations, the designated register is decremented by the size of the operations, and then the designated register is used as a pointer to memory. This is useful for a loop where the same or similar operations are performed on consecutive locations in memory. This address mode can be combined with a complimentary postincrement mode for stack and queue operations.

    In address register indirect with preincrement operations, the designated register is incremented by the size of the operations, and then the designated register is used as a pointer to memory. This is useful for a loop where the same or similar operations are performed on consecutive locations in memory. This address mode can be combined with a complimentary postdecrement mode for stack and queue operations.

    In address register indirect with postdecrement operations, the designated register is used as a pointer to memory, and then the register is decremented by the size of the operation. This is useful for a loop where the same or similar operations are performed on consecutive locations in memory. This address mode can be combined with a complimentary preincrement mode for stack and queue operations.

    In address register indirect with displacement operations, the contents of the designated register are modified by adding or subtracting a dispacement integer, then used as a pointer to memory. The displacement integer is stored in the instruction, and if shorter than the length of a the processor’s address space (the normal case), sign-extended before addition (or subtraction).

base registers

    Base pointers (sometimes called segment pointers or page pointers) are special hardware registers that point to the start (or base) of a particular page or segment of memory. Programs can then use an absolute address within a page and either explicitly add the absolute address to the contents of a base pointer or rely on the hardware to add the two together to form the actual effective address of the memory access. Which method was used would depend on the processor capabilities and the operatign system design. Hiding the base pointer from the application program both made the program easier to compile and allowed for the operating system to implement program isolation, data/code isolation, protected memory, and other sophisticated services.

    As an example, the Intel 80x86 processor has a code segment pointer, a data segment pointer, a stack segment pointer, and an extra segment pointer. When a program is loaded into memory, an operating system running on the Intel 80x86 sets the segment pointers with the beginning of the pages assigned for each purpose for that particular program. If a program is swapped out, when it gets swapped back in, the operating system sets the segment pointers to the new memory locations for each segment. The program continues to run, without being aware that it has been moved in memory.

register indirect with index register

    In a register indirect with index register mode, two registers are added together to form the effective address of a pointer to memory. These are sometimes called the base register and index register. Many processors will have limits on which registers can be used for the base register and/or which registers can be used for the index register.

    In address/base register indirect with index register operations, the contents of the index register are added to the contents of the base address register to form an effective address in memory. Some processors allow for designating that less than the full size of the index register be used in the computation, with the designated low order portion of the index register being sign-extended for the effective address computation. Some processors require that a designated low order portion of the index register be used in the computation, with the designated low order portion of the index register being sign-extended for the effective address computation.

    In address/base register indirect with index register and displacement operations, the contents of the index register are added to the contents of the base address register and then an integer displacement is added or subtracted to form an effective address in memory. Some processors allow for designating that less than the full size of the index register be used in the computation, with the designated low order portion of the index register being sign-extended for the effective address computation. Some processors require that a designated low order portion of the index register be used in the computation, with the designated low order portion of the index register being sign-extended for the effective address computation. The integer displacement is stored in the instruction, and if shorter than the length of a the processor’s address space (the normal case), sign-extended before addition (or subtraction).

absolute address with index register

    In absolute address with index register operations, the contents of an index register are added to an absolute address to form an effective address in memory.

memory indirect

    In memory indirect address mode, a location in memory contains a value that is used as a pointer (with or without additional effective address computations) to another location in memory.

    In memory indirect postindexed operations, the processor calculates an intermediate memory address using a base register and a base displacement. The processor accesses the designated memory location, and adds the contents of the index register and an outer displacement to the memory value to yield the effective address. If either displacement and/or the index register is shorter than the length of a the processor’s address space (the normal case), each is sign-extended before addition (or subtraction). Base and outer displacements are stored in the instruction.

    In memory indirect preindexed operations, the processor calculates an intermediate memory address using a base register, a base displacement, and an index register. The processor accesses the designated memory location, and adds an outer displacement to the memory value to yield the effective address. If either displacement and/or the index register is shorter than the length of a the processor’s address space (the normal case), each is sign-extended before addition (or subtraction). Base and outer displacements are stored in the instruction.

program counter relative

    In program counter indirect addressing, the program counter is used as a reference for the effective address computation. This is most commonly used for short branching relative to the current program counter, allowing for object code that can be placed anywhere in memory.

    One approach for making programs relocatable is program counter relative addressing. Instead of branching using absolute addresses, branches (including subroutine calls, jumps, and other kinds of branching) were based on a relative distance from the current program counter (which points to the address of the currently executing instruction). With PC relative addreses, the program can be loaded anywhere in memory and still work correctly. The location of routines, subroutines, functions, and constant data can be determined by the positive or negative distance from the current instruction.

    Program counter relative addressing can also be used for determining the address of variables, but then data and code get mixed in the same page or segment. At a minimum, mixing data and code in the same segment is bad programming practice, and in most cases it clashes with more sophisticated hardware systems (such as protected memory).

    In program counter indirect with displacement operations, the effective address is the sum of the address in the program counter and the displacement integer stored in the instruction. If the displacement integer is shorter than the length of a the processor’s address space (the normal case), it is sign-extended before addition (or subtraction).

    In program counter indirect with index and displacement operations, the effective address is the sum of the address in the program counter, the contents of the index register, and the displacement integer stored in the instruction. If the displacement integer or designated portion of the index register is shorter than the length of a the processor’s address space (the normal case), each is sign-extended before addition (or subtraction).

    In program counter memory indirect postindexed operations, the processor calculates an intermediate indirect memory address by adding a base displacement to the contents of the program counter. The value accessed at this memory location is added to the scaled contents of the index register and the outer displacement to yield the effective address. If either the base or outer displacement integer or designated portion of the index register is shorter than the length of a the processor’s address space (the normal case), each is sign-extended before addition (or subtraction).

    In program counter memory indirect preindexed operations, the processor calculates an intermediate indirect memory address by adding a base displacement and scaled contents of an index register to the contents of the program counter. The value accessed at this memory location is added to the outer displacement to yield the effective address. If either the base or outer displacement integer or designated portion of the index register is shorter than the length of a the processor’s address space (the normal case), each is sign-extended before addition (or subtraction).


data movement

    This chapter examines data movement instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

data movement

    Data movement instructions move data from one location to another. The source and destination locations are determined by the addressing modes, and can be registers or memory. Some processors have different instructions for loading registers and storing to memory, while other processors have a single instruction with flexible addressing modes. Data movement instructions generally have the greatest options for addressing modes. Data movement instructions typically come in a variety of sizes. Data movement instructions destroy the previous contents of the destination. Data movement instructions typically set and clear processor flags. When the destination is a register and the data is smaller than the full register size, the data might be placed only in the low order bits (leaving high order bits unchanged), or might be zero- or sign-extended to fill the entire register (some processors only use one choice, others permit the programmer to choose how this is handled). Register to register operations can usually have the same source and destination register.

    Earlier processors had different instructions and different names for different kinds of data movement, while most modern processors group data movement into a single symbolic name, with different kinds of data movement being indicated by address mode and size designation. A load instruction loads a register from memory. A store instruction stores the contents of a register into memory. A transfer instruction loads a register from another register. In processors that have separate names for different kinds of data moves, a memory to memory data move might be specially designated as a “move” instruction.

    An exchange instruction exchanges the contents of two registers, two memory locations, or a register and a memory location (although some processors only have register-register exchanges or other limitations).

    Some processors include versions of data movement instructions that can perform simple operations during the data move (such as compliment, negate, or absolute value).

    Some processors include instructions that can save (to memory) or restore (from memory) a block of registers at one time (useful for implementing subroutines).

    Some processors include instructions that can move a block of memory from one location to another at one time. If a processor includes string instructions, then there will usually be a string instruction that moves a string from one location in memory to another.

address movement

    Address movement instructions move addresses from one location to another. The source and destination locations are determined by the addressing modes, and can be registers or memory. Address movement instructions can come in a variety of sizes. Address movement instructions destroy the previous contents of the destination. Address movement instructions typically do not modify processor flags. When the destination is a register and the address is smaller than the full register size, the data might be placed only in the low order bits (leaving high order bits unchanged), or might be zero- or sign-extended to fill the entire register (some processors only use one choice, others permit the programmer to choose how this is handled).


basic assignment

    The assignment statement transfers data from a source to a destination. The destination is usually a variable. In most languages the source can be a constant, variable, function, or expression.

Ada

    target := source;

    Ada distinguishes between an assignment and an assignment statement.

    In Ada assignment is indicated by the character pair colon and equal := and an assignment statement is terminated with a semicolon ;.

    The EBNF definition of an Ada assignment statement is:

assignment_statement ::= variable_name := expression;

    Only one variable may be on the left side of an assignment statement. The value assigned must be the same type as the variable and within the legal range for that variable.


binary integer arithmetic

    This chapter examines integer arithmetic instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

integer arithmetic

    See also binary integer data representations.

    For most processors, integer arithmetic is faster than floating point arithmetic. This can be reversed in special cases such digital signal processors.

    The basic four integer arithmetic operations are addition, subtraction, multiplication, and division. Arithmetic operations can be signed or unsigned (unsigned is useful for effective address computations). Some older processors don’t include hardware multiplication and division. Some processors don’t include actual multiplication or division hardware, instead looking up the answer in a massive table of results embedded in the processor.

    A specialized, but common, form of addition is an increment instruction, which adds one to the contents of a register or memory location. For address computations, “increment” may mean the addition of a constant other than one. Some processors have “short” or “quick” addition instructions that extend increment to include a small range of positive values.

    A specialized, but common, form of subtraction is an decrement instruction, which subtracts one from the contents of a register or memory location. For address computations, “decrement” may mean the subtraction of a constant other than one. Some processors have “short” or “quick” subtraction instructions that extend decrement to include a small range of values.

    Compare instructions are used to examine one or more integers non-destructively. These are usually implemented by performing a subtraction in some shadow register or accumulator and then setting flags accordingly. Compare instructions can compare two integers, or can compare a single integer to zero. Triadic compare instructions compare a test value to an upper and lower limit, which can be useful for bounds and range checking.

    Some processors have specific hardware support for large multi-byte integer arithmetic. Even if there is no specific support, generally carry and borrow flags can be used to implement software multi-byte arithmetic routines.

    Some processors have other special integer arithmetic operations. A clear instruction sets a register or memory location to zero. Some processors have special instructions for setting a register to a special value (such as pi) with additional guard bits also being set appropriately. A sign extend operation takes a small value and sign extends it to a larger storage format (such as byte to word). An arithmetic complement gives the arithmetic complement of a number (one’s complement). An arithmetic negate gives the arithmetic inverse of a number (subtract from zero; two’s complement).


floating point arithmetic

    This chapter examines floating point arithmetic instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

floating point arithmetic

    See also floating point data representations.

    For most processors, integer arithmetic is faster than floating point arithmetic. This can be reversed in special cases such digital signal processors.

    On many processors, floating point arithmetic is in an optional unit or optional coprocessor rather than being included on the main processor. This allows the manufacturer to charge less for the business machines that don’t need floating point arithmetic.

    The basic four floating point arithmetic operations are addition, subtraction, multiplication, and division. Some processors don’t include actual multiplication or division hardware, instead looking up the answer in a massive table of results embedded in the processor.

    Compare instructions are used to examine one or more floating point numbers non-destructively. These are usually implemented by performing a subtraction in some shadow register or accumulator and then setting flags accordingly. Compare instructions can compare two floating point numbers, or can compare a single floating point number to zero.


binary coded decimal arithmetic

    This chapter examines binary coded decimal (BCD) instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

binary coded decimals

    Binary coded decimal (BCD) is a method for implementing lossless decimal arithmetic (including decimal fractions) on a binary computer. The most obvious uses involve money amounts where round-off error from using binary approximations is unacceptable. Some early computers used BCD exclusively.

    Decimal digits (0-9) can be encoded in a nibble (half a byte), with some left over bit patterns (hexadecimal A-F). In BCD operations, the processor performs ordinary binary computations, then adjusts the result to conform to BCD. For example, if you add the binary number 5 (bit pattern 0101) to binary number 6 (bit pattern 0110), you get the binary result of 11 (bit pattern 1011, or hexadecimal B). With BCD arithmetic, the processor would adjust the result to make it into a valid BCD result (which in this case would be bit pattern 0001 0001).

    BCD arithmetic includes BCD addition, BCD subtraction, BCD multiplication, BCD division, and BCD negate.

    The Intel 80x86 series uses a two step approach for BCD arithmetic. Instead of having separate BCD instructions, the normal binary addition and subtraction instructions are used, then hardware instructions are used to adjust the results to correct BCD results. There are instuctions for both packed and unpacked adjustments. The advantage of this approach is greater flexibility (more addressing modes and choices of arithmetic operations because of the use of regular binary integer instructions in the first step). The disadvantage of this approach is that it is slower and takes more memory.

    Pack (Motorola 680x0) converts byte encoded numeric data (such as ASCII or EBCDIC characters) into binary coded decimals. Unpack (Motorola 680x0) converts binary coded decimals into byte encoded numeric data (such as ASCII or EBCDIC characters). The ASCII adjustment field is $3030; the EBCDIC adjustment field is $F0F0.


advanced math operations

    This chapter examines advanced mathematics instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

advanced math operations


data conversion

    This chapter examines data conversion instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

data conversion

    Data conversion instructions change data from one format to another.

    A sign extension operation takes a small value and sign extends it to a larger storage format (such as byte to word).

    A type conversion operation changes data from one format to another (such as signed two’s complement integer into binary coded decimal).


logical operations

    This chapter examines logical instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

logical

    Logical instructions typically work on a bit by bit basis, although some processors use the entire contents of the operands as whole flags (zero or not zero input, zero or negative one output). Typical logical operations include logical negation or logical complement (NOT), logical and (AND), logical inclusive or (OR or IOR), and logical exclusive or (XOR or EOR). Logical tests are a comparison of a value to a bit string (or operand treated as a bit string) of all zeros. Some processors have an instruction that sets or clears a bit or byte in registers or memory based on the processor condition codes.


shift and rotate instructions

    This chapter examines shift and rotate instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

shift and rotate

    Shift and rotate instructions move bit strings (or operand treated as a bit string).

    Shift instructions move a bit string (or operand treated as a bit string) to the right or left, with excess bits discarded (although one or more bits might be preserved in flags). In arithmetic shift left or logical shift left zeros are shifted into the low-order bit. In arithmetic shift right the sign bit (most significant bit) is shifted into the high-order bit. In logical shift right zeros are shifted into the high-order bit.

    Rotate instructions are similar to shift instructions, ecept that rotate instructions are circular, with the bits shifted out one end returning on the other end. Rotates can be to the left or right. Rotates can also employ an extend bit for multi-precision rotates.

    A swap instruction swaps the high and low order portions of a register or contents of a series of memory locations.

    The carry bit typically receives the last bit shifted out of the operand. Sometimes an extend bit will receive the last bit shifted out also. Somtimes an overflow bit is used to indicate a sign change has occurred.


bit and bit field manipulation

    This chapter examines bit and bit string instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

bit manipulation

    Bit manipulation instructions manipulate a specific bit of a bit string (or operand treated as a bit string). Bit clear changes the specified bit to zero. Bit set changes the specified bit to one. Bit change modifies a specified bit, clearing a one bit to zero and setting a zero bit to one. In some processors, the value of the bit before modification is tested. Bit test examines the value of a specified bit.

    Bit scan instructions search a bit string for the first bit that is set or cleared (depending on the processor).

bit field

    Bit field instructions make modifications to bit fields (or operands treated as bit fields). Bit field insert inserts a value into a bit field. Bit field extract extracts a signed or unsigned value from a bit field. Bit field find first one finds the first bit that is set (one) in a bit field. Bit field test evaluates a bit field and sets or clears flags. Bit field test and setevaluates a bit field and set or clear flags then sets the bit field. Bit field test and clearevaluates a bit field and set or clear flags then clears the bit field. Bit field test and changeevaluates a bit field and set or clear flags then changes the bit field.


character and string operations

    This chapter examines string and character instructions in assembly language. Specific examples of instructions from various processors are used to illustrate the general nature of assembly language.

string and character operations


Character Codes

    Summary: This chapter has charts of various common computer character codes.

    ASCII American Standard Code for Information Interchange. Seven (7) bit code.

    EBCDIC Extended Binary Coded Decimal Interchange Code (used primarily on IBM mainframes). Eight (8) bit code.

    Punched Card Code used for Hollerith (punched) cards (punching positions given in decimal). Thirteen (13) bit code.

    International Morse Code The code originally developed for telegraph (Morse Code) had an equal number of bits per character. International Morse Code has a variable number of bits per character, with characters that occur more frequently having a smaller number of bits. Bits are represented by unipolar current pulses of long and short duration (usually 3:1, with pauses used to separate characters).

    International Cable Code International Cable Code is similar to International Morse Code, except that its bits are represented by bipolar current pulses of uniform duration (positive matching Morse’s short pulse and negative matching Morse’s long pulse). Also, there is typically a great deal of variation in punctuation codes.

    HTML metacharacters HTML has both numeric codes and name codes (metacharacters) for producing printable characters. HTML name codes are case sensitive. Older browsers typically don’t support very many name codes (that is, numeric codes are more likely to work in more browsers). Some of the new metacharacters only have name codes (no numeric code). Not all metacharacters work on all platforms or with all fonts (the widest support is on the Macintosh and Windows, which are each slightly different). The universally supported name codes are: &quot;, &num;, &amp;, &lt;, and &gt; (which are also essential as HTML escape metacharacters).

    NOTE: With some browsers, you may have to wait until the entire page is loaded before the following links will work.

Values are in hexadecimal or binary, except as otherwise noted

Note: Each table will not display until the entire table has been downloaded to your computer. Please be patient.

control codes

control code meaning punched
card
EBCDIC ASCII
ACK acknowledge 0-6-8-9 2E 06
BEL bell or alarm 0-7-8-9 2F 07
BS backspace 11-6-9 16 08
BYP bypass 0-4-9 24
CAN cancel 11-8-9 18 18
CC cursor control 11-2-8-9 1A
CR carriage return 12-5-8-9 0D 0D
CU1 customer use 1 11-3-8-9 1B
CU2 customer use 2 0-3-8-9 2B
CU3 customer use 3 3-8-9 3B
DC1 device control 1 11-1-9 11 11
DC2 device control 2 11-2-9 12 12
DC3 device control 3 13
DC4 device control 4 4-8-9 3C 14
DEL delete 12-7-9 07 7F
DLE data link escape 12-11-1-8-9 10 10
DS digit select 11-0-1-8-9 20
EM end of medium 11-1-8-9 19 19
ENQ enquiry 0-5-8-9 2D 05
EOT end of transmission 7-9 37 04
ESC escape 0-7-9 27 1B
ETB end of transmission block 0-6-9 26 17
ETX end of text 12-3-9 03 03
FF form feed 12-4-8-9 0C 0C
FS field separator 0-2-9 22
FS file separator 11-4-8-9 (1C) 1C
GS group separator 11-5-8-9 (1D) 1D
HT horizontal tabulation 12-5-9 05 09
IFS interchange file separator 11-4-8-9 1C (1C)
IGS interchange group separator 11-5-8-9 1D (1D)
IL idle 11-7-9 17
IRS interchange record separator 11-6-8-9 1E (1E)
IUS interchange unit separator 11-7-8-9 1F (1F)
LC lower case 12-6-9 06
LF line feed 0-5-9 25 0A
NAK negative acknowledge 5-8-9 3D 15
NL new line 11-5-9 15
NUL null 12-0-1-8-9 00 00
PF punch off 12-4-9 04
PN punch on 4-9 34
RES restore 11-4-9 14
RS reader stop 5-9 35
RS record separator 11-6-8-9 (1E) 1E
SI shift in 12-7-8-9 0F 0F
SM set mode 0-2-8-9 2A
SMM start of manual message 12-2-8-9 0A
SO shift out 12-6-8-9 0E 0E
SOH start of heading 12-1-9 01 01
SOS start of significance 0-1-9 21
SP space no punches 40 20
STX start of text 12-2-9 02 02
SUB substitute 7-8-9 3F 1A
SYN synchronous idle 2-9 32 16
TM tape mark 11-3-9 13
UC upper case 6-9 36
US unit separator 11-7-8-9 (1F) 1F
VT vertical tabulation 12-3-8-9 0B 0B

control codes for International Morse Code:

SOS . . .   _ _ _   . . .   Break _ . . . _ . _
Attention _ . _ . _   Understand . . . _ .
CQ _ . _ .   _ _ . _   Error . . . . . . . .
DE _ . .   .   OK . _ .
Go Ahead _ . _   End of Message . _ . _ .
Wait . _ . . .   End of Work . . . _ . _

decimal digits

digit Morse punched
card
BCD EBCDIC ASCII HTML
numeric
0 _ _ _ _ _ 0 F0 F0 30 &#48;
1 . _ _ _ _ 1 F1 F1 31 &#49;
2 . . _ _ _ 2 F2 F2 32 &#50;
3 . . . _ _ 3 F3 F3 33 &#51;
4 . . . . _ 4 F4 F4 34 &#52;
5 . . . . . 5 F5 F5 35 &#53;
6 _ . . . . 6 F6 F6 36 &#54;
7 _ _ . . . 7 F7 F7 37 &#55;
8 _ _ _ . . 8 F8 F8 38 &#56;
9 _ _ _ _ . 9 F9 F9 39 &#57;

Roman letters

letter Morse punched
card
BCD EBCDIC ASCII HTML
numeric
HTML
name
notes
A . _ 12-1 C1 C1 41 &#65;
À &#192; &Agrave; UPPERCASE A with accent grave
Á &#193; &Aacute; UPPERCASE A with accent acute
 &#194; &Acirc; UPPERCASE A with accent circumflex
à &#195; &Atilde; UPPERCASE A with tilde
Ä &#196; &Auml; UPPERCASE A with dieresis (umlaut)
Å &#197; &Aring; UPPERCASE A with ring
Æ &#198; &AElig; UPPERCASE AE diphthong (ligature)
B _ . . . 12-2 C2 C2 42 &#66;
C _ . _ . 12-3 C3 C3 43 &#67;
Ç &#199; &Ccedil; UPPERCASE C with cedilla
D _ . . 12-4 C4 C4 44 &#68;
&#208; &#208; &ETH; UPPERCASE ETH
E . 12-5 C5 C5 45 &#69;
È &#200; &Egrave; UPPERCASE E with accent grave
É &#201; &Eacute; UPPERCASE E with accent acute
Ê &#202; &Ecirc; UPPERCASE E with accent circumflex
Ë &#203; &Euml; UPPERCASE E with umlaut (dieresis)
F . . _ . 12-6 C6 C6 46 &#70;
G _ _ . 12-7 C7 C7 47 &#71;
H . . . . 12-8 C8 C8 48 &#72;
I . . 12-9 C9 C9 49 &#73;
Ì &#204; &Igrave; UPPERCASE I with accent grave
Í &#205; &Iacute; UPPERCASE I with accent acute
Î &#206; &Icirc; UPPERCASE I with accent circumflex
Ï &#207; &Iuml; UPPERCASE I with umlaut (dieresis)
J . _ _ _ 11-1 D1 D1 4A &#74;
K _ . _ 11-2 D2 D2 4B &#75;
L . _ . . 11-3 D3 D3 4C &#76;
M _ _ 11-4 D4 D4 4D &#77;
N _ . 11-5 D5 D5 4E &#78;
Ñ &#209; &Ntilde; UPPERCASE N with tilde
O _ _ _ 11-6 D6 D6 4F &#79;
Ò &#210; &Ograve; UPPERCASE O with accent grave
Ó &#211; &Oacute; UPPERCASE O with accent acute
Ô &#212; &Ocirc; UPPERCASE O with accent circumflex
Õ &#213; &Otilde; UPPERCASE O with tilde
Ö &#214; &Ouml; UPPERCASE O with umlaut (dieresis)
Ø &#216; &Oslash; UPPERCASE O with slash
Œ &#140; UPPERCASE OE diphthong
P . _ _ . 11-7 D7 D7 50 &#80;
Q _ _ . _ 11-8 D8 D8 51 &#81;
R . _ . 11-9 D9 D9 52 &#82;
S . . . 0-2 E2 E2 53 &#83;
ß &#223; &szlig; sharp or double s (sz ligature)
T _ 0-3 E3 E3 54 &#84;
&#223; &#223; &THORN; UPPERCASE Thorn
U . . _ 0-4 E4 E4 55 &#85;
Ù &#217; &Ugrave; UPPERCASE U with accent grave
Ú