GVSU CIS 343
The Main Tradeoff
- Languages make many tradeoffs
- In my opinion, the main tradeoff is between performance and expressiveness
- The best performing languages require you to frame the problem from the computer’s perspective
- Machine code
- assembly
- C
- At every step you are (more or less) specifying exactly what the machine should do (this is called imperative programming)
- Because you have such fine-grained control over the low-level operation, it is more straightforward to optimize.
- However, it would be much easier if we could simply describe what we want the computer to do using the techniques we use to describe problems to each other.
- When doing this, we then need to translate the program into machine instructions. The further we are from the machine language, the less efficient the translation is.
- We initially started with machine language, then slowly increased abstraction:
- First languages were machine languages: 1s and 0s.
- Created assembly languages to represent the 1s and 0s in human-readable form.
- Each assembly language statement corresponded to one machine language statement.
- Still had to rephrase the problem in a hardware-centric way, but it was easier to type.
- The next steps were languages that automatically converted formulas and more complex operations into assembly:
- Fortran (FORmula TRANslation)
- C
- At first only moved away from machine language as far as existing compiler technology would allow. Hardware was expensive, which limited how complex compilers could get.
- Also, consider the following
for (int i = 0; i < size; ++i) {
c[i] = a[i] + b[i];
}
vs.
for (int i = 0; i < size; i += 2) {
    c[i] = a[i] + b[i];
    c[i+1] = a[i+1] + b[i+1];
}
vs.
int* end = c + size;  /* pointer arithmetic already scales by sizeof(int) */
while (c < end) {     /* unrolled by 6; assumes size is a multiple of 6   */
    *c = *a + *b;
    ++a; ++b; ++c;
    *c = *a + *b;
    ++a; ++b; ++c;
    *c = *a + *b;
    ++a; ++b; ++c;
    *c = *a + *b;
    ++a; ++b; ++c;
    *c = *a + *b;
    ++a; ++b; ++c;
    *c = *a + *b;
    ++a; ++b; ++c;
}
The top code is the easiest to read, but the bottom code is the fastest. Over time, compilers have improved to the point that they can deliver the “bottom” performance from the “top” code.
- Then, in the late 50s and early 60s, we had enough computing power to re-think languages from the perspective of the programmer:
- What types of things do programmers want that machine code / assembly language doesn’t provide?
- variables with meaningful names. (Early languages limited the names you could choose)
- loop constructs. (Initially done with branch / goto)
- scope / encapsulation (The ability to re-use variable names and/or treat functions independently)
- data structures (structs, records, unions, objects, etc.)
- In what ways do you conceive of problem solutions differently from a computer?
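A small C sketch of those conveniences in action; the comments note what an assembly-era programmer would have had to manage by hand:
#include <stdio.h>

/* data structure (struct): in assembly this is just raw memory offsets */
struct Point {
    double x;
    double y;
};

double sum_x(struct Point pts[], int count) {
    double total = 0.0;                  /* meaningful name instead of a register    */
    for (int i = 0; i < count; ++i) {    /* loop construct instead of compare + goto */
        total += pts[i].x;
    }
    return total;                        /* total and i are scoped to this function  */
}

int main(void) {
    struct Point pts[] = { {1.0, 2.0}, {3.0, 4.0} };
    printf("%f\n", sum_x(pts, 2));
    return 0;
}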
- Natural language is, of course, still science fiction, but people looked at ideas “in the middle” of the tradeoff spectrum
- Early innovations (scope, data structures) helped programmers better organize and keep track of what they were doing.
- Object-Oriented Programming (e.g., Smalltalk)
- Allowed programmers to switch primary focus from algorithms to data.
- Earliest programs were primarily scientific calculations and, therefore, very algorithm-centric
- As computers became more mainstream, they were used more and more for business, which is more data-centric.
- Inheritance is an important type of re-use.
- Functional Programming (e.g., LISP)
- Logic Programming (e.g., Prolog)
- In my opinion, the #2 tradeoff is between writeability and reliability.
- Languages that are easier to write (think Perl, JavaScript) tend to have less strict syntax and/or type checking. However, this means the compiler can detect fewer errors, potentially leading to more run-time errors.
- From here on, most of what you see is
- different approaches for increasing expressiveness while maintaining reasonable performance and, to a lesser extent,
- approaches for writing code more succinctly without making the code hard to maintain/debug.
Other key features / tradeoffs
(In other words, what makes a language “good”?)
- Limited / careful use of operator overloading
- How many different ways is * used in C? (See the sketch below.)
  - Multiplication
  - Pointer declaration
  - Pointer dereferencing
- How many different ways is & used in C++?
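A minimal C sketch of how one symbol takes on several meanings depending on context:
#include <stdio.h>

int main(void) {
    int x = 6, y = 7;
    int product = x * y;   /* 1: multiplication                          */
    int *p = &x;           /* 2: pointer declaration (& is "address of") */
    int value = *p;        /* 3: pointer dereferencing                   */
    printf("%d %d\n", product, value);
    return 0;
}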
- Names have a clear, obvious meaning
- What does static mean in C/C++? (See the sketch below.)
- What does grep mean in the UNIX environment? Is there any way to guess this based on general computing knowledge?
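A quick C sketch of two of static's unrelated meanings (C++ adds more, such as static data members):
#include <stdio.h>

/* At file scope, static means internal linkage: the function is
 * invisible to other .c files.                                   */
static int call_count(void) {
    /* Inside a function, static means the variable lives for the
     * whole program run instead of one call.                     */
    static int count = 0;
    return ++count;
}

int main(void) {
    call_count();
    call_count();
    printf("called %d times\n", call_count());  /* prints 3 */
    return 0;
}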
- Orthogonality vs. obscure behavior
- Ideally, keywords and constructs have the same behavior regardless of context
- The C pointer can be applied to any data type
- Any data type can be a return value in Java, but you can’t return an array in C (see the sketch below).
- However, you don’t want to do this at the cost of having obscure, strangely-defined behaviors.
- Poor overloading is one example of poor orthogonality.
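The sketch below shows the C inconsistency: a function cannot return an array directly, but it can return a struct that wraps one.
#include <stdio.h>

/* Not legal C: an array cannot be a return type.
 *   int make_triple(void)[3] { ... }
 * But a struct containing an array can be returned by value: */
struct Triple { int values[3]; };

struct Triple make_triple(void) {
    struct Triple t = { {1, 2, 3} };
    return t;
}

int main(void) {
    struct Triple t = make_triple();
    printf("%d %d %d\n", t.values[0], t.values[1], t.values[2]);
    return 0;
}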
- Expressiveness vs. Simplicity
- In C/Java, how many ways can you think of to add 1 to count?
  - count = count + 1
  - count += 1
  - ++count
  - count++
- What is the tradeoff?
- Allowing multiple options makes it easy / more concise to write, but potentially harder to read. (Person reading the code needs to know all the “tricks”)
- Taken to the extreme, you can have “write only” code (like Perl)
- Balance: Can easily write what you want, but only need to know a few constructs to both read and write the code.
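One of those “tricks” in a small C sketch: as statements the four forms are interchangeable, but as expressions ++count and count++ are not.
#include <stdio.h>

int main(void) {
    int count = 5;
    printf("%d\n", ++count);  /* pre-increment: increments first, prints 6 */

    count = 5;
    printf("%d\n", count++);  /* post-increment: prints 5, then increments */
    printf("%d\n", count);    /* 6 either way once the statement finishes  */
    return 0;
}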
- “Writeability” vs. Reliability
- Reliability refers to how likely you are to catch bugs.
- In general, the more errors the compiler / environment can catch, the sooner and more easily you can fix them.
- Many of these “catches” require type checking.
- Better type-checking generally requires being more explicit when writing code (think Java). Less type-checking allows for more “short-cuts” (think Ruby, JavaScript, and other scripting languages), but many mistakes then can’t be detected until run-time, where they are harder to precisely identify and fix.
- In general, the less you type, the harder it is for the compiler to catch mistakes for you.
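A C sketch of the idea (the function names are made up for illustration): the less type information the compiler is given, the fewer mistakes it can catch before the program runs.
#include <stdio.h>

double first_loose(void *data) {           /* accepts any pointer: nothing to check        */
    double *d = data;
    return d[0];
}

double first_strict(const double *data) {  /* wrong pointer type is flagged when compiling */
    return data[0];
}

int main(void) {
    int ints[3] = {1, 2, 3};

    /* Compiles cleanly, but prints a nonsense value: the mistake
     * only shows up at run time.                                  */
    printf("%f\n", first_loose(ints));

    /* The compiler rejects (or at least warns about) this call, so
     * the mistake never reaches a running program:
     * printf("%f\n", first_strict(ints));                          */
    return 0;
}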
- Readability
- Limited number of operators and reserved words (COBOL has hundreds of reserved words)
- Even if you don’t use them, you need to know to avoid them when choosing variables.
- Enough data types so you don’t have to do awkward things (like use ints for bools in C)
- Prohibits variable names that match special words (e.g., no variables named “while”)
- Which is better: braces for blocks ({ and }), or more distinct block endings like end if, end while, etc.?
- What about the nice, readable end if vs. the concise but goofy-looking fi and od?
- Readability / Writability tradeoff:
use v5.10;
while (<STDIN>) {
    chomp;
    say if (/z/);
}
vs.
use v5.10;
while ($_ = <STDIN>) {
    chomp $_;
    if ($_ =~ /z/) {
        say $_;
    }
}
- Support for Abstraction
- DRY: Don’t repeat yourself.
- How easily do languages let you re-use code and/or data structures?
- How can you re-use code / data structures across types without complex syntax for type parameters (e.g., Java generics or C++ templates)? See the sketch below for one of C’s workarounds.
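One sketch of that workaround (the macro name is made up for illustration): the C preprocessor can stamp out a copy of the algorithm per type, which is exactly the kind of complex syntax that generics and templates were designed to replace.
#include <stdio.h>

/* A crude "template": expand the same algorithm once per type. */
#define DEFINE_MAX(T)              \
    static T max_##T(T a, T b) {   \
        return a > b ? a : b;      \
    }

DEFINE_MAX(int)     /* generates max_int    */
DEFINE_MAX(double)  /* generates max_double */

int main(void) {
    printf("%d\n", max_int(3, 7));
    printf("%f\n", max_double(2.5, 1.5));
    return 0;
}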
Static vs. Dynamic typing
- Static typing
- Compiler makes sure all operations are legal before the program begins.
- e.g., complains if you call a bark method on a Cat object.
- Compiler can catch more errors, but
- Can make code more verbose.
- Think about interfaces / generics in Java
- Dynamic typing
- Validity of operations not checked until program is running.
- Cuts out annoying syntax, but
- Leads to more run-time errors. (e.g., pass an object to a “sort” method that doesn’t have a “compare” operation)
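A C sketch of the contrast (the types are made up for illustration): the static checker rejects the bad call before the program even exists, while a dynamically typed language would only notice when the call actually executes.
#include <stdio.h>

struct Dog { const char *name; };
struct Cat { const char *name; };

void bark(struct Dog *d) {
    printf("%s says woof\n", d->name);
}

int main(void) {
    struct Dog d = { "Rex" };
    struct Cat c = { "Whiskers" };

    bark(&d);       /* fine: the types match                             */
    /* bark(&c); */ /* static typing: rejected at compile time           */
                    /* dynamic typing: the equivalent mistake would only */
                    /* surface (if at all) while the program is running  */
    return 0;
}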
Compile vs. Interpret
- Compile
- Main advantage: speed
- Tend to be statically-typed (thereby promoting reliability)
- Pure interpreted
- Main disadvantage: 10x to 100x slower.
- Tend to be dynamically typed (thereby promoting writeability)
- Hybrid
- Key example: Java, which is compiled to bytecode that the JVM then interprets and/or JIT-compiles.