Introduction
This article is inspired by a question one of my students asked me when I taught programming in Python. The question, which I have paraphrased, was: why do we have different programming languages? This article aims to answer that question and also covers standardisation, Backus-Naur Form (BNF) and syntax diagrams.
Why Have Different Programming Languages?
There are so many different programming languages – in fact too many to name them all. I have addressed in a previous article what you can do with the different programming languages along with how to teach programming but this article will focus primarily on why we have different languages.
One reason is the evolution of technology. Technology changes all the time, so we need new tools to build software for it. Existing programming languages may be unable to deal with specific problems because of the limits of their capabilities, and a problem might be so unusual that no existing language addresses it. As a result, people or companies decide to create a new language themselves. Two examples are C and C++ (Lagutin, 2021; Computer Hope, 2022).
C was created by Dennis Ritchie at Bell Labs in 1972 to overcome the problems of older languages. It was first used to develop the Unix operating system. Today, C runs on many different kinds of hardware and underpins many kinds of software. That wasn’t enough. In 1985, Bjarne Stroustrup released C++ for two main reasons. The first was to extend C with new features such as classes. The second was to make the language accessible for everyone to use in all fields (Lagutin, 2021).
Another reason there are so many programming languages is that different kinds of developer jobs require different tools. Think of a chef in a kitchen. Different chefs cook different types of food, from pastries to Italian food, Indian food and so on, and a chef will likely specialise in one of these areas. The same is true for programmers. There are different kinds of software and platforms, each requiring its own tools and features, so programmers can specialise just like chefs. Game developers use C++ or C# to make video games for PCs and consoles. Web developers use HTML, CSS, JavaScript and PHP to make websites and web applications. Data scientists use Python, R and MATLAB to analyse data for scientific research and educational purposes. There are many more programming languages out there (Lagutin, 2021).
The third reason there are so many programming languages is that no single language meets every developer’s or company’s goals. Different developers have their own goals, and some programming languages are better suited to certain types of tasks than others. Some developers want a fast, high-performance language. Go and C++ are two of those. These languages give very granular control over system resources like memory and threads. Other developers want a language that lets them build a program in a few days. JavaScript is one. It is hard to find a more versatile language: you can use JavaScript everywhere from the backend to web and mobile apps. Some developers prefer a programming language for a specific task. In 2023, a great deal of data science work is delivered in Python (Lagutin, 2021).
Standardisation
Standardisation is the process of developing, promoting and in some cases mandating standards-based and compatible technologies and processes within a particular industry. Technology standards focus on ensuring quality, consistency, compatibility, interoperability and safety (Kirvan, 2023). We can apply this to computer languages whether programming or mark-up languages.
HTML and CSS standards are published by the W3C (World Wide Web Consortium), which was founded by Tim Berners-Lee, with HTML now maintained jointly with the WHATWG. JavaScript is standardised separately, as ECMAScript, by Ecma International. A single standard covering all programming languages is much more difficult to pin down. There are many complexities and differences between programming languages, and each paradigm would require its own standard. Comparing Python and C++ is like comparing apples and oranges because their syntax differs so much.
Below are two programs, one in Python and one in C++. They both output “Hello World!” but the syntax rules are different.
In Python we have:
print("Hello World!")
In C++ we have:
#include <iostream>
using namespace std;
int main()
{
    cout << "Hello World!" << endl;
}
The goal of standardisation is to avoid the ambiguities we get in natural languages like English, Spanish, Arabic, Chinese etc. Ambiguities can occur even in calculations such as 2 + 3 x 4.
We could do the calculation 2 + 3 x 4 in two ways. The correct way is to multiply 3 by 4 first and then add 2, which gives 14. If we did the addition first, 2 + 3 would give 5, and multiplying that by 4 would give 20. In Maths, we follow the rules of BIDMAS (brackets, indices, division, multiplication, addition and subtraction). In computing we call this operator precedence. By using BIDMAS we can turn something ambiguous into something that has only a single interpretation.
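As a small illustration of operator precedence, here is a sketch in Python (chosen because it is the language mentioned in the introduction). The variable names are made up for this example. It evaluates the calculation with and without brackets, and uses the standard ast module to show that the parser treats the addition as the outermost operation.

```python
import ast

# Python applies operator precedence: multiplication binds tighter than addition.
default_result = 2 + 3 * 4     # read as 2 + (3 * 4)
forced_result = (2 + 3) * 4    # brackets override the default order

# Ask Python's own parser how it grouped the unbracketed expression.
tree = ast.parse("2 + 3 * 4", mode="eval").body
top_operator = type(tree.op).__name__          # the outermost operation
inner_operator = type(tree.right.op).__name__  # the operation nested on the right

print(default_result)   # 14
print(forced_result)    # 20
print(top_operator)     # Add
print(inner_operator)   # Mult
```

Because the multiplication sits inside the addition in the parse tree, it is carried out first, which is exactly what BIDMAS requires.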
Ambiguity in natural language means that a statement can be interpreted in more than one way. This can be difficult when dealing with day-to-day English, but for computers ambiguity is impossible to interpret. When defining programming languages it is essential that no ambiguity exist. If a computer, when interpreting a statement, is given two possible alternative paths then it will be unable to determine which one is correct.
IF = 5
THEN = 10
IF IF > THEN THEN PRINT 'Hello'
For example, most languages do not let you use keywords as identifiers. In the code above, IF and THEN are used both as variable names and as keywords. A human could make sense of it, but for a compiler it is ambiguous: on reading IF, it cannot tell whether a variable or the start of a condition follows. The equals sign after IF offers no further help, as the compiler would still have to decide whether this is an assignment or an equality check. The compiler would be unable to decide which set of syntax rules to apply and would have to ‘guess’, which leads to two possibilities: the guess is right or it is wrong. If the guess was right, the code would run as expected. If it was wrong, the code could produce bizarre logic errors and not behave as expected. There can never be a situation where coder and compiler infer different meanings from the same syntax. We would need a whole article to talk about Natural Language Processing, which is an area of Computer Science and Artificial Intelligence (AI) that deals with words and their contexts.
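To see how a real language avoids this problem, here is a minimal sketch in Python, used as a stand-in for the BASIC-style pseudocode above. The helper name parses_ok is invented for this example. Note that Python keywords are lowercase, so the test uses if rather than IF.

```python
import keyword

def parses_ok(source: str) -> bool:
    """Return True if Python's parser accepts the source text."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

print(keyword.iskeyword("if"))   # True: 'if' is reserved
print(parses_ok("if = 5"))       # False: a keyword cannot be a variable name
print(parses_ok("total = 5"))    # True: an ordinary identifier is fine
```

By rejecting the first assignment outright, the language never has to guess what the programmer meant.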
Backus-Naur Form
Backus-Naur Form (BNF) is a way of defining the syntax of a language using a context-free grammar. Syntax is the definition of what is allowed and the pattern of elements in a language (British Computer Society. Glossary Working Party, 2005). In most languages, for example, ‘char’ has a defined format. What is it?
Something like: exactly one character in length, which can be a-z, A-Z, 0-9 or a symbol such as @, £ or $, and so on.
The point is that the computer needs to know what is allowed and what is not allowed.
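In BNF there is a formal notation that must be used:
· ::= Means is defined by / consists of
· < > Encloses the name of a syntactic item (a meta variable)
· | Means OR
· { } Means the enclosed item is optional and may repeat (zero or more times)
· ‘,’ Means AND (one item followed by another)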
Let us have a look at an example:
<expression> ::= <term> | <expression> "+" <term>
<term> ::= <factor> | <term> "*" <factor>
<factor> ::= <constant> | <variable> | "(" <expression> ")"
<variable> ::= "x" | "y" | "z"
<constant> ::= <digit> | <digit> <constant>
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
In this case, we have the following:
· An expression is defined as containing a term on its own OR an expression followed by a + sign and then a term.
· A term is defined as containing a factor on its own OR a term followed by a * sign and then a factor.
· A factor is defined as containing a constant or a variable or an expression in brackets
· A variable is defined as being either an x, y or z.
· A constant is defined as containing either a digit or a digit followed by a constant
· A digit is defined as being a 0 or a 1 or a 2 or a 3 etc. all the way up to 9.
· According to the above definition of an expression, 12+6 would be a valid expression, as would 6+5*3, but not 6-3 or 20/4 as the – and / signs do not appear in the definition
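The rules above can be turned into a small checker. What follows is a minimal sketch in Python, not a definitive implementation: the function name is_valid_expression and its helpers are invented for this example. One assumption is made loudly: the two left-recursive rules (expression and term) are written as loops, because a function that calls itself before reading any input would never stop. The set of accepted strings is the same. The grammar has no rule for spaces, so the checker rejects them.

```python
DIGITS = "0123456789"
VARIABLES = "xyz"

def is_valid_expression(text: str) -> bool:
    """Check whether text matches the <expression> rule (no spaces allowed)."""
    pos = 0  # index of the next unread character

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def expression() -> bool:
        # <expression> ::= <term> | <expression> "+" <term>
        nonlocal pos
        if not term():
            return False
        while peek() == "+":
            pos += 1
            if not term():
                return False
        return True

    def term() -> bool:
        # <term> ::= <factor> | <term> "*" <factor>
        nonlocal pos
        if not factor():
            return False
        while peek() == "*":
            pos += 1
            if not factor():
                return False
        return True

    def factor() -> bool:
        # <factor> ::= <constant> | <variable> | "(" <expression> ")"
        nonlocal pos
        ch = peek()
        if ch and ch in VARIABLES:
            pos += 1
            return True
        if ch and ch in DIGITS:
            # <constant> ::= <digit> | <digit> <constant>
            while peek() and peek() in DIGITS:
                pos += 1
            return True
        if ch == "(":
            pos += 1
            if not expression() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    # Valid only if the whole string was consumed.
    return expression() and pos == len(text)

for sample in ["12+6", "6+5*3", "(x+2)*y", "6-3", "20/4"]:
    print(sample, is_valid_expression(sample))
```

The first three samples are accepted and the last two are rejected, matching the final bullet point above.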
This is the basic BNF. There is also recursion in BNF (Thomas, Surrall and Hamflett, 2017). Two parts of the above definition might not make sense at first:
<term> ::= <factor> | <term> "*" <factor>
<constant> ::= <digit> | <digit> <constant>
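These are recursive definitions. In programming, recursion is when a function calls itself; in BNF, a rule is recursive when its own name appears in its definition, which allows parts of the pattern to repeat. For example, a constant can be a digit followed by another constant, so it can contain any number of digits. We can also use BNF to define the structure of a variable name in a programming language. In VB.NET a variable name cannot start with a number (but a number can appear elsewhere in the variable name) and it cannot contain any spaces at all: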
<variable_name> ::= <letter> <variable_name_part>
<variable_name_part> ::= "" | <letter> <variable_name_part> | <digit> <variable_name_part>
<letter> ::= "a" | "b" | "c" |…| "x" | "y" | "z" | "A" | "B" | "C" |…| "X" | "Y" | "Z"
<digit> ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
We have:
• A variable_name is defined as containing a letter first and then a variable_name_part
• A variable_name_part is defined as being either an empty string, or a letter or a digit followed by another variable_name_part. This allows potentially any number of letters and digits to follow the first letter, hence the recursive definition
• A letter is defined as being either a lower case a or b or c all the way up to x or y or z, OR an upper case A or B or C all the way up to X or Y or Z
• A digit is again defined as being a 0 or a 1 or a 2 or a 3 etc. all the way up to 9.
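The variable name rule described above (a letter first, then any mix of letters and digits, with no spaces) can be sketched as a pair of Python functions. The function names are invented for this example, and the sketch follows only the simplified rule from this article, not the full VB.NET naming rules. The second function calls itself, mirroring the recursive definition.

```python
import string

LETTERS = string.ascii_letters   # a-z and A-Z
DIGITS = string.digits           # 0-9

def is_variable_name_part(text: str) -> bool:
    """Empty, or a letter/digit followed by another variable_name_part."""
    if text == "":
        return True
    first_ok = text[0] in LETTERS or text[0] in DIGITS
    return first_ok and is_variable_name_part(text[1:])  # recursive step

def is_variable_name(text: str) -> bool:
    """A letter first, then a variable_name_part."""
    return text != "" and text[0] in LETTERS and is_variable_name_part(text[1:])

for sample in ["total1", "x", "1total", "my total"]:
    print(sample, is_variable_name(sample))
```

The first two samples are accepted. The third is rejected because it starts with a digit, and the fourth because it contains a space.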
BNF comes up every year in the A2 specification for Computer Science Unit 3 so this is an important topic to study and revise. This applies to WJEC and AQA.
Syntax Diagrams
Alongside defining the syntax of a language using BNF notation, there is also a formal, symbol-based method.
Syntax diagrams are an alternative way to represent a context-free grammar.
Let’s say we have an expression:
The ‘term’ rectangle shape represents a meta variable.
The rounded rectangle shape represents a literal value.
The arrows looping backwards represent a recursive definition – there could be any number of terms followed by + signs in this example.
Let us take the letter part of BNF:
<letter> ::= "a" | "b" | "c" |…| "x" | "y" | "z" | "A" | "B" | "C" |…| "X" | "Y" | "Z"
We do, however, need to note that the symbols for LETTER (A, B, C, D) are not defined any further, so they are placed in circles as follows:
Now we can link WORD to LETTER symbolically:
Now we are not restricted in how many letters can be included in a WORD. The syntax would allow the program to loop until it had reached the end of a WORD, e.g. by meeting a space or full stop. We can create syntax diagrams for sentences and paragraphs too but, to keep this article from getting too long, we will leave it there. Syntax diagrams and BNF can be revisited in separate articles; this one is mainly about why we have different programming languages and the importance of standardisation.
Reference List
British Computer Society. Glossary Working Party (2005) The BCS Glossary of ICT and Computing Terms Eleventh Edition Harlow, Essex: Pearson Education Limited
Computer Hope (2022) ‘Why are there so many programming languages?’ 31st December Available at: https://www.computerhope.com/issues/ch000569.htm (Date Accessed: 05/07/2023)
Kirvan, P. (2023) ‘standardization’ Tech Target Available at: https://www.techtarget.com/whatis/definition/standardization#:~:text=Standardization%20is%20the%20process%20of,%2C%20compatibility%2C%20interoperability%20and%20safety. (Date Accessed: 05/07/2023)
Lagutin, V. (2021) ‘Why Are There So Many Programming Languages?’ Free Code Camp 14th September Available at: https://www.freecodecamp.org/news/why-are-there-so-many-programming-languages/#:~:text=Conclusion,it%20suitable%20for%20specific%20tasks. (Date Accessed: 05/07/2023)
Thomas, M., Surrall, A. and Hamflett, A. (2017) A/AS Level Computer Science for WJEC/Eduqas Student Book Cambridge: Cambridge University Press