Software Engineering for Self-Directed Learners TIC2002 edition - 2019

This is a printer-friendly version. It omits exercises, optional topics (i.e., four-star topics), and other extra content such as learning outcomes.

SECTION: SOFTWARE ENGINEERING

Software Engineering

Introduction

Pros and Cons

Software Engineering: Software Engineering is the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software" -- IEEE Standard Glossary of Software Engineering Terminology

The following description of the Joys of the Programming Craft was taken from Chapter 1 of the famous book The Mythical Man-Month, by Frederick P. Brooks.

Why is programming fun? What delights may its practitioner expect as his reward?

First is the sheer joy of making things. As the child delights in his mud pie, so the adult enjoys building things, especially things of his own design. I think this delight must be an image of God's delight in making things, a delight shown in the distinctness and newness of each leaf and each snowflake.

Second is the pleasure of making things that are useful to other people. Deep within, we want others to use our work and to find it helpful. In this respect the programming system is not essentially different from the child's first clay pencil holder "for Daddy's office."

Third is the fascination of fashioning complex puzzle-like objects of interlocking moving parts and watching them work in subtle cycles, playing out the consequences of principles built in from the beginning. The programmed computer has all the fascination of the pinball machine or the jukebox mechanism, carried to the ultimate.

Fourth is the joy of always learning, which springs from the nonrepeating nature of the task. In one way or another the problem is ever new, and its solver learns something: sometimes practical, sometimes theoretical, and sometimes both.

Finally, there is the delight of working in such a tractable medium. The programmer, like the poet, works only slightly removed from pure thought-stuff. He builds his castles in the air, from air, creating by the exertion of the imagination. Few media of creation are so flexible, so easy to polish and rework, so readily capable of realizing grand conceptual structures....

Yet the program construct, unlike the poet's words, is real in the sense that it moves and works, producing visible outputs separate from the construct itself. It prints results, draws pictures, produces sounds, moves arms. The magic of myth and legend has come true in our time. One types the correct incantation on a keyboard, and a display screen comes to life, showing things that never were nor could be.

Programming then is fun because it gratifies creative longings built deep within us and delights sensibilities we have in common with all men.

Not all is delight, however, and knowing the inherent woes makes it easier to bear them when they appear.

First, one must perform perfectly. The computer resembles the magic of legend in this respect, too. If one character, one pause, of the incantation is not strictly in proper form, the magic doesn't work. Human beings are not accustomed to being perfect, and few areas of human activity demand it. Adjusting to the requirement for perfection is, I think, the most difficult part of learning to program.

Next, other people set one's objectives, provide one's resources, and furnish one's information. One rarely controls the circumstances of his work, or even its goal. In management terms, one's authority is not sufficient for his responsibility. It seems that in all fields, however, the jobs where things get done never have formal authority commensurate with responsibility. In practice, actual (as opposed to formal) authority is acquired from the very momentum of accomplishment.

The dependence upon others has a particular case that is especially painful for the system programmer. He depends upon other people's programs. These are often maldesigned, poorly implemented, incompletely delivered (no source code or test cases), and poorly documented. So he must spend hours studying and fixing things that in an ideal world would be complete, available, and usable.

The next woe is that designing grand concepts is fun; finding nitty little bugs is just work. With any creative activity come dreary hours of tedious, painstaking labor, and programming is no exception.

Next, one finds that debugging has a linear convergence, or worse, where one somehow expects a quadratic sort of approach to the end. So testing drags on and on, the last difficult bugs taking more time to find than the first.

The last woe, and sometimes the last straw, is that the product over which one has labored so long appears to be obsolete upon (or before) completion. Already colleagues and competitors are in hot pursuit of new and better ideas. Already the displacement of one's thought-child is not only conceived, but scheduled.

This always seems worse than it really is. The new and better product is generally not available when one completes his own; it is only talked about. It, too, will require months of development. The real tiger is never a match for the paper one, unless actual use is wanted. Then the virtues of reality have a satisfaction all their own.

Of course the technological base on which one builds is always advancing. As soon as one freezes a design, it becomes obsolete in terms of its concepts. But implementation of real products demands phasing and quantizing. The obsolescence of an implementation must be measured against other existing implementations, not against unrealized concepts. The challenge and the mission are to find real solutions to real problems on actual schedules with available resources.

This then is programming, both a tar pit in which many efforts have floundered and a creative activity with joys and woes all its own. For many, the joys far outweigh the woes....

[Text and book cover source: Wikipedia]

[Fred Brooks photo source]

The Mythical Man-Month: Essays on Software Engineering is a book on software engineering and project management by Fred Brooks, whose central theme is that "adding manpower to a late software project makes it later". This idea is known as Brooks's law, and is presented along with the second-system effect and advocacy of prototyping.

 

SECTION: PROGRAMMING PARADIGMS

Object-Oriented Programming

Introduction

What

Object-Oriented Programming (OOP) is a programming paradigm. A programming paradigm guides programmers to analyze programming problems, and structure programming solutions, in a specific way.

Programming languages have traditionally divided the world into two parts—data and operations on data. Data is static and immutable, except as the operations may change it. The procedures and functions that operate on data have no lasting state of their own; they’re useful only in their ability to affect data.

This division is, of course, grounded in the way computers work, so it’s not one that you can easily ignore or push aside. Like the equally pervasive distinctions between matter and energy and between nouns and verbs, it forms the background against which we work. At some point, all programmers—even object-oriented programmers—must lay out the data structures that their programs will use and define the functions that will act on the data.

With a procedural programming language like C, that’s about all there is to it. The language may offer various kinds of support for organizing data and functions, but it won’t divide the world any differently. Functions and data structures are the basic elements of design.

Object-oriented programming doesn’t so much dispute this view of the world as restructure it at a higher level. It groups operations and data into modular units called objects and lets you combine objects into structured networks to form a complete program. In an object-oriented programming language, objects and object interactions are the basic elements of design.

-- Object-Oriented Programming with Objective-C, Apple

Some other examples of programming paradigms are:

Paradigm Programming Languages
Procedural Programming paradigm C
Functional Programming paradigm F#, Haskel, Scala
Logic Programming paradigm Prolog

Some programming languages support multiple paradigms.

Java is primarily an OOP language but it supports limited forms of functional programming and it can be used to (although not recommended) write procedural code. e.g. se-edu/addressbook-level1

JavaScript and Python support functional, procedural, and OOP programming.

Objects

What

Every object has both state (data) and behavior (operations on data). In that, they’re not much different from ordinary physical objects. It’s easy to see how a mechanical device, such as a pocket watch or a piano, embodies both state and behavior. But almost anything that’s designed to do a job does, too. Even simple things with no moving parts such as an ordinary bottle combine state (how full the bottle is, whether or not it’s open, how warm its contents are) with behavior (the ability to dispense its contents at various flow rates, to be opened or closed, to withstand high or low temperatures).

It’s this resemblance to real things that gives objects much of their power and appeal. They can not only model components of real systems, but equally as well fulfill assigned roles as components in software systems.

-- Object-Oriented Programming with Objective-C, Apple

Object Oriented Programming (OOP) views the world as a network of interacting objects.

A real world scenario viewed as a network of interacting objects:

You are asked to find out the average age of a group of people Adam, Beth, Charlie, and Daisy. You take a piece of paper and pen, go to each person, ask for their age, and note it down. After collecting the age of all four, you enter it into a calculator to find the total. And then, use the same calculator to divide the total by four, to get the average age. This can be viewed as the objects You, Pen, Paper, Calculator, Adam, Beth, Charlie, and Daisy interacting to accomplish the end result of calculating the average age of the four persons. These objects can be considered as connected in a certain network of certain structure.

OOP solutions try to create a similar object network inside the computer’s memory – a sort of a virtual simulation of the corresponding real world scenario – so that a similar result can be achieved programmatically.

OOP does not demand that the virtual world object network follow the real world exactly.

Our previous example can be tweaked a bit as follows:

  • Use an object called Main to represent your role in the scenario.
  • As there is no physical writing involved, we can replace the Pen and Paper with an object called AgeList that is able to keep a list of ages.

Every object has both state (data) and behavior (operations on data).

Object Real World? Virtual World? Example of State (i.e. Data) Examples of Behavior (i.e. Operations)
Adam Name, Date of Birth Calculate age based on birthday
Pen - Ink color, Amount of ink remaining Write
AgeList - Recorded ages Give the number of entries, Accept an entry to record
Calculator Numbers already entered Calculate the sum, divide
You/Main Average age, Sum of ages Use other objects to calculate

Every object has an interface and an implementation.

Every real world object has:

  • an interface through which other objects can interact with it
  • an implementation that supports the interface but may not be accessible to the other object

The interface and implementation of some real-world objects in our example:

  • Calculator: the buttons and the display are part of the interface; circuits are part of the implementation.
  • Adam: In the context of our 'calculate average age' example, the interface of Adam consists of requests that adam will respond to, e.g. "Give age to the nearest year, as at Jan 1st of this year" "State your name"; the implementation includes the mental calculation Adam uses to calculate the age which is not visible to other objects.

Similarly, every object in the virtual world has an interface and an implementation.

The interface and implementation of some virtual-world objects in our example:

  • Adam: the interface might have a method getAge(Date asAt); the implementation of that method is not visible to other objects.

Objects interact by sending messages. Both real world and virtual world object interactions can be viewed as objects sending message to each other. The message can result in the sender object receiving a response and/or the receiver object’s state being changed. Furthermore, the result can vary based on which object received the message, even if the message is identical (see rows 1 and 2 in the example below).

Examples:

World Sender Receiver Message Response State Change
Real You Adam "What is your name?" "Adam" -
Real as above Beth as above "Beth" -
Real You Pen Put nib on paper and apply pressure Makes a mark on your paper Ink level goes down
Virtual Main Calculator (current total is 50) add(int i): int i = 23 73 total = total + 23

Objects as Abstractions

The concept of Objects in OOP is an abstraction mechanism because it allows us to abstract away the lower level details and work with bigger granularity entities i.e. ignore details of data formats and the method implementation details and work at the level of objects.

Abstraction is a technique for dealing with complexity. It works by establishing a level of complexity we are interested in, and suppressing the more complex details below that level.

We can deal with a Person object that represents the person Adam and query the object for Adam's age instead of dealing with details such as Adam’s date of birth (DoB), in what format the DoB is stored, the algorithm used to calculate the age from the DoB, etc.

Encapsulation Of Objects

Encapsulation protects an implementation from unintended actions and from inadvertent access.
-- Object-Oriented Programming with Objective-C, Apple

An object is an encapsulation of some data and related behavior in terms of two aspects:

1. The packaging aspect: An object packages data and related behavior together into one self-contained unit.

2. The information hiding aspect: The data in an object is hidden from the outside world and are only accessible using the object's interface.

Classes

What

Writing an OOP program is essentially writing instructions that the computer will uses to,

  1. create the virtual world of of the object network, and
  2. provide it the inputs to produce the outcome we want.

A class contains instructions for creating a specific kind of objects. It turns out sometimes multiple objects keep the same type of data and have the same behavior because they are of the same kind. Instructions for creating a one kind (or ‘class’) of objects can be done once and that same instructions can be used to instantiate objects of that kind. We call such instructions a Class.

Classes and objects in an example scenario

Consider the example of writing an OOP program to calculate the average age of Adam, Beth, Charlie, and Daisy.

Instructions for creating objects Adam, Beth, Charlie, and Daisy will be very similar because they are all of the same kind : they all represent ‘persons’ with the same interface, the same kind of data (i.e. name, DoB, etc.), and the same kind of behavior (i.e. getAge(Date), getName(), etc.). Therefore, we can have a class called Person containing instructions on how to create Person objects and use that class to instantiate objects Adam, Beth, Charlie, and Daisy.

Similarly, we need classes AgeList, Calculator, and Main classes to instantiate one each of AgeList, Calculator, and Main objects.

Class Objects
Person objects representing Adam, Beth, Charlie, Daisy
AgeList an object to represent the age list
Calculator an object to do the calculations
Main an object to represent you who manages the whole operation

Class Level Members

While all objects of a class has the same attributes, each object has its own copy of the attribute value.

All Person objects have the Name attribute but the value of that attribute varies between Person objects.

However, some attributes are not suitable to be maintained by individual objects. Instead, they should be maintained centrally, shared by all objects of the class. They are like ‘global variables’ but attached to a specific class. Such variables whose value is shared by all instances of a class are called class-level attributes.

The attribute totalPersons should be maintained centrally and shared by all Person objects rather than copied at each Person object.

Similarly, when a normal method is being called, a message is being sent to the receiving object and the result may depend on the receiving object.

Sending the getName() message to Adam object results in the response "Adam" while sending the same message to the Beth object gets the response "Beth".

However, there can be methods related to a specific class but not suitable for sending message to a specific object of that class. Such methods that are called using the class instead of a specific instance are called class-level methods.

The method getTotalPersons() is not suitable to send to a specific Person object because a specific object of the Person class should not know about the total number of Person objects.

Class-level attributes and methods are collectively called class-level members (also called static members sometimes because some programming languages use the keyword static to identify class-level members). They are to be accessed using the class name rather than an instance of the class.

Enumerations

An Enumeration is a fixed set of values that can be considered as a data type. An enumeration is often useful when using a regular data type such as int or String would allow invalid values to be assigned to a variable.

Suppose you want a variable called priority to store the priority of something. There are only three priority levels: high, medium, and low. You can declare the variable priority as of type int and use only values 2, 1, and 0 to indication the three priority levels. However, this opens the possibility of an invalid values such as 9 being assigned to it. But if you define an enumeration type called Priority that has three values HIGH, MEDIUM, LOW only, a variable of type Priority will never be assigned an invalid value because the compiler is able to catch such an error.

Priority: HIGH, MEDIUM, LOW

Associations

What

Objects in an OO solution need to be connected to each other to form a network so that they can interact with each other. Such connections between objects are called associations.

Suppose an OOP program for managing a learning management system creates an object structure to represent the related objects. In that object structure we can expect to have associations between aCourse object that represents a specific course and Student objects that represents students taking that course.

Associations in an object structure can change over time.

To continue the previous example, the associations between a Course object and Student objects can change as students enroll in the module or drop the module over time.

Associations among objects can be generalized as associations between the corresponding classes too.

In our example, as some Course objects can have associations with some Student objects, we can view it as an association between the Course class and the Student class.

Implementing associations

We use instance level variables to implement associations.

When two classes are linked by an association, it does not necessarily mean both classes know about each other. The concept of which class in the association knows about the other class is called navigability.

Multiplicity

Multiplicity is the aspect of an OOP solution that dictates how many objects take part in each association.

The multiplicity of the association between Course objects and Student objects tells you how many Course objects can be associated with one Student object and vice versa.

Implementing multiplicity

A normal instance-level variable gives us a 0..1 multiplicity (also called optional associations) because a variable can hold a reference to a single object or null.

In the code below, the Logic class has a variable that can hold 0..1 i.e., zero or one Minefield objects.

class Logic{
    Minefield minefield;
}

class Minefield{
    ...
}
class Logic:
  
  def __init__(self, minefield):
    self.minefield = minefield
    
  # ...


class Minefield:
  # ...

A variable can be used to implement a 1 multiplicity too (also called compulsory associations).

In the code below, the Logic class will always have a ConfigGenerator object, provided the variable is not set to null at some point.

class Logic {
    ConfigGenerator cg = new ConfigGenerator();
    ...
}
class Logic:

  def __init__(self):
    self.config_gen = ConfigGenerator()

Bi-directional associations require matching variables in both classes.

In the code below, the Foo class has a variable to hold a Bar object and vice versa i.e., each object can have an association with an object of the other type.

class Foo {
    Bar bar;
    //...
}

class Bar {
    Foo foo;
    //...
}

class Foo:
  
  def __init__(self, bar):
    self.bar = bar;


class Bar:
  
  def __init__(self, foo):
    self.foo = foo;
    

To implement other multiplicities, choose a suitable data structure such as Arrays, ArrayLists, HashMaps, Sets, etc.

This code uses an two-dimensional array to implement a 1-to-many association from the Minefield to Cell.

class Minefield {
    Cell[][] cell;
    ...
}
class Minefield:
  
  def __init__(self):
    self.cells = {1:[], 2:[], 3:[]}

Dependencies

In the context of OOP associations, a dependency is a need for one class to depend on another without having a direction association with it. One cause of dependencies is interactions between objects that do not have a long-term link between them.

A Course object can have a dependency on a Registrar object to obtain the maximum number of students it can support.

Implementing dependencies

In the code below, Foo has a dependency on Bar but it is not an association because it is only a transient interaction and there is no long term relationship between a Foo object and a Bar object. i.e. the Foo object does not keep the Bar object it receives as a parameter.

class Foo{
    
    int calculate(Bar bar){
        return bar.getValue();
    }
}

class Bar{
    int value;
    
    int getValue(){
        return value;
    }
}
class Foo:
    
    def calculate(self, bar):
        return bar.value;

class Bar:
    
    def __init__(self, value):
      self.value = value

Composition

A composition is an association that represents a strong whole-part relationship. When the whole is destroyed, parts are destroyed too.

A Board (used for playing board games) consists of Square objects.

Composition also implies that there cannot be cyclical links.

The ‘sub-folder’ association between Folder objects is a composition type association. That means if the Folder object foo is a sub folder of Folder object bar, bar cannot be a sub-folder of foo.

Implementing composition

Composition is implemented using a normal variable. If correctly implemented, the ‘part’ object will be deleted when the ‘whole’ object is deleted. Ideally, the ‘part’ object may not even be visible to clients of the ‘whole’ object.

class Email {
    private Subject subject;
  ...
}
class Email:

  def __init__(self):
    self.__subject = Subject()

In this code, the Email has a composition type relationship with the Subject class, in the sense that the subject is part of the email.

Aggregation

Aggregation represents a container-contained relationship. It is a weaker relationship than composition.

SportsClub can act as a container for Person objects who are members of the club. Person objects can survive without a SportsClub object.

Implementing aggregation

Implementation is similar to that of composition except the containee object can exist even after the container object is deleted.

In the code below, there is an aggregation association between the Team class and the Person in that a Team contains Person a object who is the leader of the team.

class Team {
    Person leader;
    ...
    void setLeader(Person p) {
        leader = p;
    }
}
class Team:
  
  def __init__(self):
    self.__leader = None
    
  def set_leader(self, person):
    self.__leader = person

Inheritance

What

The OOP concept Inheritance allows you to define a new class based on an existing class.

For example, you can use inheritance to define an EvaluationReport class based on an existing Report class so that the EvaluationReport class does not have to duplicate data/behaviors that are already implemented in the Report class. The EvaluationReport can inherit the wordCount attribute and the print() method from the base class Report.

  • Other names for Base class: Parent class, Super class
  • Other names for Derived class: Child class, Sub class, Extended class

A superclass is said to be more general than the subclass. Conversely, a subclass is said to be more specialized than the superclass.

Applying inheritance on a group of similar classes can result in the common parts among classes being extracted into more general classes.

Man and Woman behaves the same way for certain things. However, the two classes cannot be simply replaced with a more general class Person because of the need to distinguish between Man and Woman for certain other things. A solution is to add the Person class as a superclass (to contain the code common to men and woment) and let Man and Woman inherit from Person class.

Inheritance implies the derived class can be considered as a sub-type of the base class (and the base class is a super-type of the derived class), resulting in an is a relationship.

Inheritance does not necessarily mean a sub-type relationship exists. However, the two often go hand-in-hand. For simplicity, at this point let us assume inheritance implies a sub-type relationship.

To continue the previous example,

  • Woman is a Person
  • Man is a Person

Inheritance relationships through a chain of classes can result in inheritance hierarchies (aka inheritance trees).

Two inheritance hierarchies/trees are given below. Note that the triangle points to the parent class. Observe how the Parrot is a Bird as well as it is an Animal.

Multiple Inheritance is when a class inherits directly from multiple classes. Multiple inheritance among classes is allowed in some languages (e.g., Python, C++) but not in other languages (e.g., Java, C#).

The Honey class inherits from the Food class and the Medicine class because honey can be consumed as a food as well as a medicine (in some oriental medicine practices). Similarly, a Car is an Vehicle, an Asset and a Liability.

Overriding

Method overriding is when a sub-class changes the behavior inherited from the parent class by re-implementing the method. Overridden methods have the same name, same type signature, and same return type.

Consider the following case of EvaluationReport class inheriting the Report class:

Report methods EvaluationReport methods Overrides?
print() print() Yes
write(String) write(String) Yes
read():String read(int):String No. Reason: the two methods have different signatures; This is a case of overloading (rather than overriding).

Paradigms → OOP → Inheritance →

Overloading

Method overloading is when there are multiple methods with the same name but different type signatures. Overloading is used to indicate that multiple operations do similar things but take different parameters.

Type Signature: The type signature of an operation is the type sequence of the parameters. The return type and parameter names are not part of the type signature. However, the parameter order is significant.

Example:

Method Type Signature
int add(int X, int Y) (int, int)
void add(int A, int B) (int, int)
void m(int X, double Y) (int, double)
void m(double X, int Y) (double, int)

In the case below, the calculate method is overloaded because the two methods have the same name but different type signatures (String) and (int)

  • calculate(String): void
  • calculate(int): void

Overloading

Method overloading is when there are multiple methods with the same name but different type signatures. Overloading is used to indicate that multiple operations do similar things but take different parameters.

Type Signature: The type signature of an operation is the type sequence of the parameters. The return type and parameter names are not part of the type signature. However, the parameter order is significant.

Example:

Method Type Signature
int add(int X, int Y) (int, int)
void add(int A, int B) (int, int)
void m(int X, double Y) (int, double)
void m(double X, int Y) (double, int)

In the case below, the calculate method is overloaded because the two methods have the same name but different type signatures (String) and (int)

  • calculate(String): void
  • calculate(int): void

Interfaces

An interface is a behavior specification i.e. a collection of method specifications. If a class implements the interface, it means the class is able to support the behaviors specified by the said interface.

There are a number of situations in software engineering when it is important for disparate groups of programmers to agree to a "contract" that spells out how their software interacts. Each group should be able to write their code without any knowledge of how the other group's code is written. Generally speaking, interfaces are such contracts. --Oracle Docs on Java

Suppose SalariedStaff is an interface that contains two methods setSalary(int) and getSalary(). AcademicStaff can declare itself as implementing the SalariedStaff interface, which means the AcademicStaff class must implement all the methods specified by the SalariedStaff interface i.e., setSalary(int) and getSalary().

A class implementing an interface results in an is-a relationship, just like in class inheritance.

In the example above, AcademicStaff is a SalariedStaff. An AcademicStaff object can be used anywhere a SalariedStaff object is expected e.g. SalariedStaff ss = new AcademicStaff().

Abstract Classes

Abstract Class: A class declared as an abstract class cannot be instantiated, but it can be subclassed.

You can declare a class as abstract when a class is merely a representation of commonalities among its subclasses in which case it does not make sense to instantiate objects of that class.

The Animal class that exist as a generalization of its subclasses Cat, Dog, Horse, Tiger etc. can be declared as abstract because it does not make sense to instantiate an Animal object.

Abstract Method: An abstract method is a method signature without a method implementation.

The move method of the Animal class is likely to be an abstract method as it is not possible to implement a move method at the Animal class level to fit all subclasses because each animal type can move in a different way.

A class that has an abstract method becomes an abstract class because the class definition is incomplete (due to the missing method body) and it is not possible to create objects using an incomplete class definition.

Substitutability

Every instance of a subclass is an instance of the superclass, but not vice-versa. As a result, inheritance allows substitutability : the ability to substitute a child class object where a parent class object is expected.

an Academic is an instance of a Staff, but a Staff is not necessarily an instance of an Academic. i.e. wherever an object of the superclass is expected, it can be substituted by an object of any of its subclasses.

The following code is valid because an AcademicStaff object is substitutable as a Staff object.

Staff staff = new AcademicStaff (); // OK

But the following code is not valid because staff is declared as a Staff type and therefore its value may or may not be of type AcademicStaff, which is the type expected by variable academicStaff.

Staff staff;
...
AcademicStaff academicStaff = staff; // Not OK

Polymorphism

What

Polymorphism:

The ability of different objects to respond, each in its own way, to identical messages is called polymorphism. -- Object-Oriented Programming with Objective-C, Apple

Polymorphism allows you to write code targeting superclass objects, use that code on subclass objects, and achieve possibly different results based on the actual class of the object.

Assume classes Cat and Dog are both subclasses of the Animal class. You can write code targeting Animal objects and use that code on Cat and Dog objects, achieving possibly different results based on whether it is a Cat object or a Dog object. Some examples:

  • Declare an array of type Animal and still be able to store Dog and Cat objects in it.
  • Define a method that takes an Animal object as a parameter and yet be able to pass Dog and Cat objects to it.
  • Call a method on a Dog or a Cat object as if it is an Animal object (i.e., without knowing whether it is a Dog object or a Cat object) and get a different response from it based on its actual class e.g., call the Animal class' method speak() on object a and get a "Meow" as the return value if a is a Cat object and "Woof" if it is a Dog object.

Polymorphism literally means "ability to take many forms".

More

Miscellaneous

What’s the difference between a Class, an Abstract Class, and an Interface?

  • An interface is a behavior specification with no implementation.
  • A class is a behavior specification + implementation.
  • An abstract class is a behavior specification + a possibly incomplete implementation.

How does overriding differ from overloading?

Overloading is used to indicate that multiple operations do similar things but take different parameters. Overloaded methods have the same method name but different method signatures and possibly different return types.

Overriding is when a sub-class redefines an operation using the same method name and the same type signature. Overridden methods have the same name, same method signature, and same return type.

 

SECTION: REQUIREMENTS

Requirements

Introduction

A software requirement specifies a need to be fulfilled by the software product.

A software project may be,

  • a brown-field project i.e., develop a product to replace/update an existing software product
  • a green-field project i.e., develop a totally new system with no precedent

In either case, requirements need to be gathered, analyzed, specified, and managed.

Requirements come from stakeholders.

Stakeholder: A party that is potentially affected by the software project. e.g. users, sponsors, developers, interest groups, government agencies, etc.

Identifying requirements is often not easy. For example, stakeholders may not be aware of their precise needs, may not know how to communicate their requirements correctly, may not be willing to spend effort in identifying requirements, etc.

Non-Functional Requirements

Requirements can be divided into two in the following way:

  1. Functional requirements specify what the system should do.
  2. Non-functional requirements specify the constraints under which system is developed and operated.

Some examples of non-functional requirement categories:

  • Data requirements e.g. size, volatility, persistency etc.,
  • Environment requirements e.g. technical environment in which system would operate or need to be compatible with.
  • Accessibility, Capacity, Compliance with regulations, Documentation, Disaster recovery, Efficiency, Extensibility, Fault tolerance, Interoperability, Maintainability, Privacy, Portability, Quality, Reliability, Response time, Robustness, Scalability, Security, Stability, Testability, and more ...
  • Business/domain rules: e.g. the size of the minefield cannot be smaller than five.
  • Constraints: e.g. the system should be backward compatible with data produced by earlier versions of the system; system testers are available only during the last month of the project; the total project cost should not exceed $1.5 million.
  • Technical requirements: e.g. the system should work on both 32-bit and 64-bit environments.
  • Performance requirements: e.g. the system should respond within two seconds.
  • Quality requirements: e.g. the system should be usable by a novice who has never carried out an online purchase.
  • Process requirements: e.g. the project is expected to adhere to a schedule that delivers a feature set every one month.
  • Notes about project scope: e.g. the product is not required to handle the printing of reports.
  • Any other noteworthy points: e.g. the game should not use images deemed offensive to those injured in real mine clearing activities.

We may have to spend an extra effort in digging NFRs out as early as possible because,

  1. NFRs are easier to miss e.g., stakeholders tend to think of functional requirements first
  2. sometimes NFRs are critical to the success of the software. E.g. A web application that is too slow or that has low security is unlikely to succeed even if it has all the right functionality.

 

Gathering Requirements

Brainstorming

Brainstorming: A group activity designed to generate a large number of diverse and creative ideas for the solution of a problem.

In a brainstorming session there are no "bad" ideas. The aim is to generate ideas; not to validate them. Brainstorming encourages you to "think outside the box" and put "crazy" ideas on the table without fear of rejection.

User Surveys

Surveys can be used to solicit responses and opinions from a large number of stakeholders regarding a current product or a new product.

Observation

Observing users in their natural work environment can uncover product requirements. Usage data of an existing system can also be used to gather information about how an existing system is being used, which can help in building a better replacement e.g. to find the situations where the user makes mistakes when using the current system.

Interviews

Interviewing stakeholders and domain experts can produce useful information that project requirements.

Domain Expert : An expert of a discipline to which the product is connected e.g., for a software used for Accounting, a domain expert is someone who is an expert of Accounting.

Focus Groups


[source]

Focus groups are a kind of informal interview within an interactive group setting. A group of people (e.g. potential users, beta testers) are asked about their understanding of a specific issue, process, product, advertisement, etc.

: How do focus groups work? - Hector Lanz extra

Prototyping

Prototype: A prototype is a mock up, a scaled down version, or a partial system constructed

  • to get users’ feedback.
  • to validate a technical concept (a "proof-of-concept" prototype).
  • to give a preview of what is to come, or to compare multiple alternatives on a small scale before committing fully to one alternative.
  • for early field-testing under controlled conditions.

Prototyping can uncover requirements, in particular, those related to how users interact with the system. UI prototypes or mock ups are often used in brainstorming sessions, or in meetings with the users to get quick feedback from them.

A mock up (also called a wireframe diagram) of a dialog box:


[source: plantuml.com]

Prototyping can be used for discovering as well as specifying requirements e.g. a UI prototype can serve as a specification of what to build.

Product Surveys

Studying existing products can unearth shortcomings of existing solutions that can be addressed by a new product. Product manuals and other forms of documentation of an existing system can tell us how the existing solutions work.

When developing a game for a mobile device, a look at a similar PC game can give insight into the kind of features and interactions the mobile game can offer.

 

Specifying Requirements

Prose

What

A textual description (i.e. prose) can be used to describe requirements. Prose is especially useful when describing abstract ideas such as the vision of a product.

The product vision of the TEAMMATES Project given below is described using prose.

TEAMMATES aims to become the biggest student project in the world (biggest here refers to 'many contributors, many users, large code base, evolving over a long period'). Furthermore, it aims to serve as a training tool for Software Engineering students who want to learn SE skills in the context of a non-trivial real software product.

Avoid using lengthy prose to describe requirements; they can be hard to follow.

Feature Lists

What

Feature List: A list of features of a product grouped according to some criteria such as aspect, priority, order of delivery, etc.

A sample feature list from a simple Minesweeper game (only a brief description has been provided to save space):

  1. Basic play – Single player play.
  2. Difficulty levels
    • Medium-levels
    • Advanced levels
  3. Versus play – Two players can play against each other.
  4. Timer – Additional fixed time restriction on the player.
  5. ...

User Stories

Introduction

User story: User stories are short, simple descriptions of a feature told from the perspective of the person who desires the new capability, usually a user or customer of the system. [Mike Cohn]

A common format for writing user stories is:

User story format: As a {user type/role} I can {function} so that {benefit}

Examples (from a Learning Management System):

  1. As a student, I can download files uploaded by lecturers, so that I can get my own copy of the files
  2. As a lecturer, I can create discussion forums, so that students can discuss things online
  3. As a tutor, I can print attendance sheets, so that I can take attendance during the class

We can write user stories on index cards or sticky notes, and arrange on walls or tables, to facilitate planning and discussion. Alternatively, we can use a software (e.g., GitHub Project Boards, Trello, Google Docs, ...) to manage user stories digitally.

[credit: https://www.flickr.com/photos/jakuza/2682466984/]

[credit: https://www.flickr.com/photos/jakuza/with/2726048607/]

[credit: https://commons.wikimedia.org/wiki/File:User_Story_Map_in_Action.png]

Details

The {benefit} can be omitted if it is obvious.

As a user, I can login to the system so that I can access my data

It is recommended to confirm there is a concrete benefit even if you omit it from the user story. If not, you could end up adding features that have no real benefit.

You can add more characteristics to the {user role} to provide more context to the user story.

  • As a forgetful user, I can view a password hint, so that I can recall my password.
  • As an expert user, I can tweak the underlying formatting tags of the document, so that I can format the document exactly as I need.

You can write user stories at various levels. High-level user stories, called epics (or themes) cover bigger functionality. You can then break down these epics to multiple user stories of normal size.

[Epic] As a lecturer, I can monitor student participation levels

  • As a lecturer, I can view the forum post count of each student
    so that I can identify the activity level of students in the forum
  • As a lecturer, I can view webcast view records of each student
    so that I can identify the students who did not view webcasts
  • As a lecturer, I can view file download statistics of each student
    so that I can identify the students who do not download lecture materials

You can add conditions of satisfaction to a user story to specify things that need to be true for the user story implementation to be accepted as ‘done’.

As a lecturer, I can view the forum post count of each student so that I can identify the activity level of students in the forum.

Conditions:

Separate post count for each forum should be shown
Total post count of a student should be shown
The list should be sortable by student name and post count

Other useful info that can be added to a user story includes (but not limited to)

  • Priority: how important the user story is
  • Size: the estimated effort to implement the user story
  • Urgency: how soon the feature is needed
More examples extra

User stories for a travel website (credit: Mike Cohen)

  • As a registered user, I am required to log in so that I can access the system
  • As a forgetful user, I can request a password reminder so that I can log in if I forget mine
  • [Epic] As a user, I can cancel a reservation
    • As a premium site member, I can cancel a reservation up to the last minute
    • As a non-premium member, I can cancel up to 24 hours in advance
    • As a member, I am emailed a confirmation of any cancelled reservation
  • [Epic] As a frequent flyer, I want to book a trip
    • As a frequent flyer, I want to book a trip using miles
    • As a frequent flyer, I want to rebook a trip I take often
    • As a frequent flyer, I want to request an upgrade
    • As a frequent flyer, I want to see if my upgrade cleared

Usage

User stories capture user requirements in a way that is convenient for scoping, estimation, and scheduling.

[User stories] strongly shift the focus from writing about features to discussing them. In fact, these discussions are more important than whatever text is written. [Mike Cohn, MountainGoat Software 🔗]

User stories differ from traditional requirements specifications mainly in the level of detail. User stories should only provide enough details to make a reasonably low risk estimate of how long the user story will take to implement. When the time comes to implement the user story, the developers will meet with the customer face-to-face to work out a more detailed description of the requirements. [more...]

User stories can capture non-functional requirements too because even NFRs must benefit some stakeholder.

Requirements → Requirements →

Non-Functional Requirements

Requirements can be divided into two in the following way:

  1. Functional requirements specify what the system should do.
  2. Non-functional requirements specify the constraints under which system is developed and operated.

Some examples of non-functional requirement categories:

  • Data requirements e.g. size, volatility, persistency etc.,
  • Environment requirements e.g. technical environment in which system would operate or need to be compatible with.
  • Accessibility, Capacity, Compliance with regulations, Documentation, Disaster recovery, Efficiency, Extensibility, Fault tolerance, Interoperability, Maintainability, Privacy, Portability, Quality, Reliability, Response time, Robustness, Scalability, Security, Stability, Testability, and more ...
  • Business/domain rules: e.g. the size of the minefield cannot be smaller than five.
  • Constraints: e.g. the system should be backward compatible with data produced by earlier versions of the system; system testers are available only during the last month of the project; the total project cost should not exceed $1.5 million.
  • Technical requirements: e.g. the system should work on both 32-bit and 64-bit environments.
  • Performance requirements: e.g. the system should respond within two seconds.
  • Quality requirements: e.g. the system should be usable by a novice who has never carried out an online purchase.
  • Process requirements: e.g. the project is expected to adhere to a schedule that delivers a feature set every one month.
  • Notes about project scope: e.g. the product is not required to handle the printing of reports.
  • Any other noteworthy points: e.g. the game should not use images deemed offensive to those injured in real mine clearing activities.

We may have to spend an extra effort in digging NFRs out as early as possible because,

  1. NFRs are easier to miss e.g., stakeholders tend to think of functional requirements first
  2. sometimes NFRs are critical to the success of the software. E.g. A web application that is too slow or that has low security is unlikely to succeed even if it has all the right functionality.

Given below are some requirements of TEAMMATES (an online peer evaluation system for education). Which one of these are non-functional requirements?

  • a. The response to any use action should become visible within 5 seconds.
  • b. The application admin should be able to view a log of user activities.
  • c. The source code should be open source.
  • d. A course should be able to have up to 2000 students.
  • e. As a student user, I can view details of my team members so that I can know who they are.
  • f. The user interface should be intuitive enough for users who are not IT-savvy.
  • g. The product is offered as a free online service.

(a)(c)(d)(f)(g)

Explanation: (b) are (e) are functions available for a specific user types. Therefore, they are functional requirements. (a), (c), (d), (f) and (g) are either constraints on functionality or constraints on how the project is done, both of which are considered non-functional requirements.

An example of a NFR captured as a user story:

As a I want to so that
impatient user to be able experience reasonable response time from the website while up to 1000 concurrent users are using it I can use the app even when the traffic is at the maximum expected level

Given their lightweight nature, user stories are quite handy for recording requirements during early stages of requirements gathering.

Here are some tips for using user stories for early stages of requirement gathering:

  • Define the target user:
    Decide your target user's profile (e.g. a student, office worker, programmer, sales person) and work patterns (e.g. Does he work in groups or alone? Does he share his computer with others?). A clear understanding of the target user will help when deciding the importance of a user story. You can even give this user a name. e.g. Target user Jean is a university student studying in a non-IT field. She interacts with a lot of people due to her involvement in university clubs/societies. ...
  • Define the problem scope: Decide that exact problem you are going to solve for the target user. e.g. Help Jean keep track of all her school contacts
  • Don't be too hasty to discard 'unusual' user stories:
    Those might make your product unique and stand out from the rest, at least for the target users.
  • Don't go into too much details:
    For example, consider this user story: As a user, I want to see a list of tasks that needs my attention most at the present time, so that I pay attention to them first.
    When discussing this user story, don't worry about what tasks should be considered needs my attention most at the present time. Those details can be worked out later.
  • Don't be biased by preconceived product ideas:
    When you are at the stage of identifying user needs, clear your mind of ideas you have about what your end product will look like.
  • Don't discuss implementation details or whether you are actually going to implement it:
    When gathering requirements, your decision is whether the user's need is important enough for you to want to fulfil it. Implementation details can be discussed later. If a user story turns out to be too difficult to implement later, you can always omit it from the implementation plan.

While use cases can be recorded on physical paper in the initial stages, an online tool is more suitable for longer-term management of user stories, especially if the team is not co-located.

You can create issues for each of the user stories and use a GitHub Project Board to sort them into categories.

Example Project Board:

Example Issue to represent a user story:

A video on GitHub Project Boards:

Example Google Sheet for recording user stories:

Example Trello Board for recording user stories:

eXtreme Programming (XP)

Extreme programming (XP) is a software development methodology which is intended to improve software quality and responsiveness to changing customer requirements. As a type of agile software development, it advocates frequent "releases" in short development cycles, which is intended to improve productivity and introduce checkpoints at which new customer requirements can be adopted. [wikipedia, 2017.05.01]

uses User stories to capture requirements.

This page in their website explains the difference between user stories and traditional requirements.

One of the biggest misunderstandings with user stories is how they differ from traditional requirements specifications. The biggest difference is in the level of detail. User stories should only provide enough detail to make a reasonably low risk estimate of how long the story will take to implement. When the time comes to implement the story developers will go to the customer and receive a detailed description of the requirements face to face.

Use Cases

Introduction

Use Case: A description of a set of sequences of actions, including variants, that a system performs to yield an observable result of value to an actor.[ 📖 : uml-user-guideThe Unified Modeling Language User Guide, 2e, G Booch, J Rumbaugh, and I Jacobson ]

Actor: An actor (in a use case) is a role played by a user. An actor can be a human or another system. Actors are not part of the system; they reside outside the system.

A use case describes an interaction between the user and the system for a specific functionality of the system.

  • System: ATM
  • Actor: Customer
  • Use Case: Check account balance
    1. User inserts an ATM card
    2. ATM prompts for PIN
    3. User enters PIN
    4. ATM prompts for withdrawal amount
    5. User enters the amount
    6. ATM ejects the ATM card and issues cash
    7. User collects the card and the cash.
  • System: A Learning Management System (LMS)
  • Actor: Student
  • Use Case: Upload file
    1. Student requests to upload file
    2. LMS requests for the file location
    3. Student specifies the file location
    4. LMS uploads the file

UML includes a diagram type called use case diagrams that can illustrate use cases of a system visually , providing a visual ‘table of contents’ of the use cases of a system. In the example below, note how use cases are shown as ovals and user roles relevant to each use case are shown as stick figures connected to the corresponding ovals.

Unified Modeling Language (UML) is a graphical notation to describe various aspects of a software system. UML is the brainchild of three software modeling specialists James Rumbaugh, Grady Booch and Ivar Jacobson (also known as the Three Amigos). Each of them has developed their own notation for modeling software systems before joining force to create a unified modeling language (hence, the term ‘Unified’ in UML). UML is currently the de facto modeling notation used in the software industry.

Use cases capture the functional requirements of a system.

Glossary

What

Glossary: A glossary serves to ensure that all stakeholders have a common understanding of the noteworthy terms, abbreviations, acronyms etc.

Here is a partial glossary from a variant of the Snakes and Ladders game:

  • Conditional square: A square that specifies a specific face value which a player has to throw before his/her piece can leave the square.
  • Normal square: a normal square does not have any conditions, snakes, or ladders in it.

Supplementary Requirements

What

A supplementary requirements section can be used to capture requirements that do not fit elsewhere. Typically, this is where most Non Functional Requirements will be listed.

Requirements → Requirements →

Non-Functional Requirements

Requirements can be divided into two in the following way:

  1. Functional requirements specify what the system should do.
  2. Non-functional requirements specify the constraints under which system is developed and operated.

Some examples of non-functional requirement categories:

  • Data requirements e.g. size, volatility, persistency etc.,
  • Environment requirements e.g. technical environment in which system would operate or need to be compatible with.
  • Accessibility, Capacity, Compliance with regulations, Documentation, Disaster recovery, Efficiency, Extensibility, Fault tolerance, Interoperability, Maintainability, Privacy, Portability, Quality, Reliability, Response time, Robustness, Scalability, Security, Stability, Testability, and more ...
  • Business/domain rules: e.g. the size of the minefield cannot be smaller than five.
  • Constraints: e.g. the system should be backward compatible with data produced by earlier versions of the system; system testers are available only during the last month of the project; the total project cost should not exceed $1.5 million.
  • Technical requirements: e.g. the system should work on both 32-bit and 64-bit environments.
  • Performance requirements: e.g. the system should respond within two seconds.
  • Quality requirements: e.g. the system should be usable by a novice who has never carried out an online purchase.
  • Process requirements: e.g. the project is expected to adhere to a schedule that delivers a feature set every one month.
  • Notes about project scope: e.g. the product is not required to handle the printing of reports.
  • Any other noteworthy points: e.g. the game should not use images deemed offensive to those injured in real mine clearing activities.

We may have to spend an extra effort in digging NFRs out as early as possible because,

  1. NFRs are easier to miss e.g., stakeholders tend to think of functional requirements first
  2. sometimes NFRs are critical to the success of the software. E.g. A web application that is too slow or that has low security is unlikely to succeed even if it has all the right functionality.

Given below are some requirements of TEAMMATES (an online peer evaluation system for education). Which one of these are non-functional requirements?

  • a. The response to any use action should become visible within 5 seconds.
  • b. The application admin should be able to view a log of user activities.
  • c. The source code should be open source.
  • d. A course should be able to have up to 2000 students.
  • e. As a student user, I can view details of my team members so that I can know who they are.
  • f. The user interface should be intuitive enough for users who are not IT-savvy.
  • g. The product is offered as a free online service.

(a)(c)(d)(f)(g)

Explanation: (b) are (e) are functions available for a specific user types. Therefore, they are functional requirements. (a), (c), (d), (f) and (g) are either constraints on functionality or constraints on how the project is done, both of which are considered non-functional requirements.

 

SECTION: DESIGN

Software Design

Introduction

What

Design in the creative process of transforming the problem into a solution; the solution is also called design. -- 📖 Software Engineering Theory and Practice, Shari Lawrence; Atlee, Joanne M. Pfleeger

Software design has two main aspects:

  • Product/external design: designing the external behavior of the product to meet the users' requirements. This is usually done by product designers with the input from business analysts, user experience experts, user representatives, etc.
  • Implementation/internal design: designing how the product will be implemented to meet the required external behavior. This is usually done by software architects and software engineers.

 

Design Fundamentals

Abstraction

What

Abstraction is a technique for dealing with complexity. It works by establishing a level of complexity we are interested in, and suppressing the more complex details below that level.

The guiding principle of abstraction is that only details that are relevant to the current perspective or the task at hand needs to be considered. As most programs are written to solve complex problems involving large amounts of intricate details, it is impossible to deal with all these details at the same time. That is where abstraction can help.

Data abstraction: abstracting away the lower level data items and thinking in terms of bigger entities

Within a certain software component, we might deal with a user data type, while ignoring the details contained in the user data item such as name, and date of birth. These details have been ‘abstracted away’ as they do not affect the task of that software component.

Control abstraction: abstracting away details of the actual control flow to focus on tasks at a higher level

print(“Hello”) is an abstraction of the actual output mechanism within the computer.

Abstraction can be applied repeatedly to obtain progressively higher levels of abstractions.

An example of different levels of data abstraction: a File is a data item that is at a higher level than an array and an array is at a higher level than a bit.

An example of different levels of control abstraction: execute(Game) is at a higher level than print(Char) which is at a higher than an Assembly language instruction MOV.

Abstraction is a general concept that is not limited to just data or control abstractions.

Some more general examples of abstraction:

  • An OOP class is an abstraction over related data and behaviors.
  • An architecture is a higher-level abstraction of the design of a software.
  • Models (e.g., UML models) are abstractions of some aspect of reality.

Coupling

What

Coupling is a measure of the degree of dependence between components, classes, methods, etc. Low coupling indicates that a component is less dependent on other components. High coupling (aka tight coupling or strong coupling) is discouraged due to the following disadvantages:

  • Maintenance is harder because a change in one module could cause changes in other modules coupled to it (i.e. a ripple effect).
  • Integration is harder because multiple components coupled with each other have to be integrated at the same time.
  • Testing and reuse of the module is harder due to its dependence on other modules.

In the example below, design A appears to have a more coupling between the components than design B.

How

X is coupled to Y if a change to Y can potentially require a change in X.

If Foo class calls the method Bar#read(), Foo is coupled to Bar because a change to Bar can potentially (but not always) require a change in the Foo class e.g. if the signature of the Bar#read() is changed, Foo needs to change as well, but a change to the Bar#write() method may not require a change in the Foo class because Foo does not call Bar#write().

class Foo{
    ...
    new Bar().read();
    ...
}

class Bar{
    void read(){
        ...
    }
    
    void write(){
        ...
    }
}

Some examples of coupling: A is coupled to B if,

  • A has access to the internal structure of B (this results in a very high level of coupling)
  • A and B depend on the same global variable
  • A calls B
  • A receives an object of B as a parameter or a return value
  • A inherits from B
  • A and B are required to follow the same data format or communication protocol

Cohesion

What

Cohesion is a measure of how strongly-related and focused the various responsibilities of a component are. A highly-cohesive component keeps related functionalities together while keeping out all other unrelated things.

Higher cohesion is better. Disadvantages of low cohesion (aka weak cohesion):

  • Lowers the understandability of modules as it is difficult to express module functionalities at a higher level.
  • Lowers maintainability because a module can be modified due to unrelated causes (reason: the module contains code unrelated to each other) or many many modules may need to be modified to achieve a small change in behavior (reason: because the code realated to that change is not localized to a single module).
  • Lowers reusability of modules because they do not represent logical units of functionality.

How

Cohesion can be present in many forms. Some examples:

  • Code related to a single concept is kept together, e.g. the Student component handles everything related to students.
  • Code that is invoked close together in time is kept together, e.g. all code related to initializing the system is kept together.
  • Code that manipulates the same data structure is kept together, e.g. the GameArchive component handles everything related to the storage and retrieval of game sessions.

Suppose a Payroll application contains a class that deals with writing data to the database. If the class include some code to show an error dialog to the user if the database is unreachable, that class is not cohesive because it seems to be interacting with the user as well as the database.

 

Modeling

Introduction

What

A model is a representation of something else.

A class diagram is a model that represents a software design.

A class diagram is a diagram drawn using the UML modelling notation.
An example class diagram:

A model provides a simpler view of a complex entity because a model captures only a selected aspect. This omission of some aspects implies models are abstractions.

Design → Design Fundamentals → Abstraction →

What

Abstraction is a technique for dealing with complexity. It works by establishing a level of complexity we are interested in, and suppressing the more complex details below that level.

The guiding principle of abstraction is that only details that are relevant to the current perspective or the task at hand needs to be considered. As most programs are written to solve complex problems involving large amounts of intricate details, it is impossible to deal with all these details at the same time. That is where abstraction can help.

Data abstraction: abstracting away the lower level data items and thinking in terms of bigger entities

Within a certain software component, we might deal with a user data type, while ignoring the details contained in the user data item such as name, and date of birth. These details have been ‘abstracted away’ as they do not affect the task of that software component.

Control abstraction: abstracting away details of the actual control flow to focus on tasks at a higher level

print(“Hello”) is an abstraction of the actual output mechanism within the computer.

Abstraction can be applied repeatedly to obtain progressively higher levels of abstractions.

An example of different levels of data abstraction: a File is a data item that is at a higher level than an array and an array is at a higher level than a bit.

An example of different levels of control abstraction: execute(Game) is at a higher level than print(Char) which is at a higher than an Assembly language instruction MOV.

Abstraction is a general concept that is not limited to just data or control abstractions.

Some more general examples of abstraction:

  • An OOP class is an abstraction over related data and behaviors.
  • An architecture is a higher-level abstraction of the design of a software.
  • Models (e.g., UML models) are abstractions of some aspect of reality.

A class diagram captures the structure of the software design but not the behavior.

Multiple models of the same entity may be needed to capture it fully.

In addition to a class diagram (or even multiple class diagrams), a number of other diagrams may be needed to capture various interesting aspects of the software.

How

In software development, models are useful in several ways:

a) To analyze a complex entity related to software development.

Some examples of using models for analysis:

  1. Models of the problem domain can be built to aid the understanding of the problem to be solved.
  2. When planning a software solution, models can be created to figure out how the solution is to be built. An architecture diagram is such a model.

An architecture diagram depicts the high-level design of a software.

b) To communicate information among stakeholders. Models can be used as a visual aid in discussions and documentations.

Some examples of using models to communicate:

  1. We can use an architecture diagram to explain the high-level design of the software to developers.
  2. A business analyst can use a use case diagram to explain to the customer the functionality of the system.
  3. A class diagram can be reverse-engineered from code so as to help explain the design of a component to a new developer.

c) As a blueprint for creating software. Models can be used as instructions for building software.

Some examples of using models to as blueprints:

  1. A senior developer draws a class diagram to propose a design for an OOP software and passes it to a junior programmer to implement.
  2. A software tool allows users to draw UML models using its interface and the tool automatically generates the code based on the model.
Model Driven Development extra

Model-driven development (MDD), also called Model-driven engineering, is an approach to software development that strives to exploit models as blueprints. MDD uses models as primary engineering artifacts when developing software. That is, the system is first created in the form of models. After that, the models are converted to code using code-generation techniques (usually, automated or semi-automated, but can even involve manual translation from model to code). MDD requires the use of a very expressive modeling notation (graphical or otherwise), often specific to a given problem domain. It also requires sophisticated tools to generate code from models and maintain the link between models and the code. One advantage of MDD is that the same model can be used to create software for different platforms and different languages. MDD has a lot of promise, but it is still an emerging technology

Further reading:

UML Models

The following diagram uses the class diagram notation to show the different types of UML diagrams.

Unified Modeling Language (UML) is a graphical notation to describe various aspects of a software system. UML is the brainchild of three software modeling specialists James Rumbaugh, Grady Booch and Ivar Jacobson (also known as the Three Amigos). Each of them has developed their own notation for modeling software systems before joining force to create a unified modeling language (hence, the term ‘Unified’ in UML). UML is currently the de facto modeling notation used in the software industry.


source:https://en.wikipedia.org/

Modeling Structures

OO Structures

An OO solution is basically a network of objects interacting with each other. Therefore, it is useful to be able to model how the relevant objects are 'networked' together inside a software i.e. how the objects are connected together.

Given below is an illustration of some objects and how they are connected together. Note: the diagram uses an ad-hoc notation.

Note that these object structures within the same software can change over time.

Given below is how the object structure in the previous example could have looked like at a different time.

However, object structures do not change at random; they change based on a set of rules, as was decided by the designer of that software. Those rules that object structures need to follow can be illustrated as a class structure i.e. a structure that exists among the relevant classes.

Here is a class structure (drawn using an ad-hoc notation) that matches the object structures given in the previous two examples. For example, note how this class structure does not allow any connection between Genre objects and Author objects, a rule followed by the two object structures above.

UML Object Diagrams are used to model object structures and UML Class Diagrams are used to model class structures of an OO solution.

Here is an object diagram for the above example:

And here is the class diagram for it:

Class Diagrams (Basics)

Classes form the basis of class diagrams.

Associations are the main connections among the classes in a class diagram.

The most basic class diagram is a bunch of classes with some solid lines among them to represent associations, such as this one.

An example class diagram showing associations between classes.

In addition, associations can show additional decorations such as association labels, association roles, multiplicity and navigability to add more information to a class diagram.

Here is the same class diagram shown earlier but with some additional information included:

Adding More Info to UML Models

UML notes can be used to add more info to any UML model.

Class Diagrams - Intermediate

A class diagram can also show different types of associations: inheritance, compositions, aggregations, dependencies.

Modeling inheritance

Modeling composition

Modeling aggregation

Modeling dependencies

A class diagram can also show different types of class-like entities:

Modeling enumerations

Modeling abstract classes

Modeling interfaces

Object Diagrams

Object diagrams can be used to complement class diagrams. For example, you can use object diagrams to model different object structures that can result from a design represented by a given class diagram.

Modeling Behaviors

Sequence Diagrams - Basic

Sequence Diagrams - Intermediate

 

SECTION: IMPLEMENTATION

IDEs

Introduction

What

Professional software engineers often write code using Integrated Development Environments (IDEs). IDEs support most development-related work within the same tool (hence, the term integrated).

An IDE generally consists of:

  • A source code editor that includes features such as syntax coloring, auto-completion, easy code navigation, error highlighting, and code-snippet generation.
  • A compiler and/or an interpreter (together with other build automation support) that facilitates the compilation/linking/running/deployment of a program.
  • A debugger that allows the developer to execute the program one step at a time to observe the run-time behavior in order to locate bugs.
  • Other tools that aid various aspects of coding e.g. support for automated testing, drag-and-drop construction of UI components, version management support, simulation of the target runtime platform, and modeling support.

Examples of popular IDEs:

  • Java: Eclipse, Intellij IDEA, NetBeans
  • C#, C++: Visual Studio
  • Swift: XCode
  • Python: PyCharm

Some Web-based IDEs have appeared in recent times too e.g., Amazon's Cloud9 IDE.

Some experienced developers, in particular those with a UNIX background, prefer lightweight yet powerful text editors with scripting capabilities (e.g. Emacs) over heavier IDEs.

Debugging

What

Debugging is the process of discovering defects in the program. Here are some approaches to debugging:

  • Bad -- By inserting temporary print statements: This is an ad-hoc approach in which print statements are inserted in the program to print information relevant to debugging, such as variable values. e.g. Exiting process() method, x is 5.347. This approach is not recommended due to these reasons.
    • Incurs extra effort when inserting and removing the print statements.
    • These extraneous program modifications increases the risk of introducing errors into the program.
    • These print statements, if not removed promptly after the debugging, may even appear unexpectedly in the production version.
  • Bad -- By manually tracing through the code: Otherwise known as ‘eye-balling’, this approach doesn't have the cons of the previous approach, but it too is not recommended (other than as a 'quick try') due to these reasons:
    • It is difficult, time consuming, and error-prone technique.
    • If you didn't spot the error while writing code, you might not spot the error when reading code too.
  • Good -- Using a debugger: A debugger tool allows you to pause the execution, then step through one statement at a time while examining the internal state if necessary. Most IDEs come with an inbuilt debugger. This is the recommended approach for debugging.

 

Code Quality

Introduction

Basic

Always code as if the person who ends up maintaining your code will be a violent psychopath who knows where you live. -- Martin Golding

Production code needs to be of high quality . Given how the world is becoming increasingly dependent of software, poor quality code is something we cannot afford to tolerate.

Code being used in an actual product with actual users

Guideline: Maximise Readability

Introduction

Programs should be written and polished until they acquire publication quality. --Niklaus Wirth

Among various dimensions of code quality, such as run-time efficiency, security, and robustness, one of the most important is understandability. This is because in any non-trivial software project, code needs to be read, understood, and modified by other developers later on. Even if we do not intend to pass the code to someone else, code quality is still important because we all become 'strangers' to our own code someday.

The two code samples given below achieve the same functionality, but one is easier to read.

Bad

int subsidy() {
    int subsidy;
    if (!age) {
        if (!sub) {
            if (!notFullTime) {
                subsidy = 500;
            } else {
                subsidy = 250;
            }
        } else {
            subsidy = 250;
        }
    } else {
        subsidy = -1;
    }
    return subsidy;
}
  

Good

int calculateSubsidy() {
    int subsidy;
    if (isSenior) {
        subsidy = REJECT_SENIOR;
    } else if (isAlreadySubsidised) {
        subsidy = SUBSIDISED_SUBSIDY;
    } else if (isPartTime) {
        subsidy = FULLTIME_SUBSIDY * RATIO;
    } else {
        subsidy = FULLTIME_SUBSIDY;
    }
    return subsidy;
}

Bad

def calculate_subs():
    if not age:
        if not sub:
            if not not_fulltime:
                subsidy = 500
            else:
                subsidy = 250
        else:
            subsidy = 250
    else:
        subsidy = -1
    return subsidy
  

Good

def calculate_subsidy():
    if is_senior:
        return REJECT_SENIOR
    elif is_already_subsidised:
        return SUBSIDISED_SUBSIDY
    elif is_parttime:
        return FULLTIME_SUBSIDY * RATIO
    else:
        return FULLTIME_SUBSIDY

Basic

Avoid Long Methods

Be wary when a method is longer than the computer screen, and take corrective action when it goes beyond 30 LOC (lines of code). The bigger the haystack, the harder it is to find a needle.

Avoid Deep Nesting

If you need more than 3 levels of indentation, you're screwed anyway, and should fix your program. --Linux 1.3.53 CodingStyle

In particular, avoid arrowhead style code.

A real code example:

Bad

int subsidy() {
    int subsidy;
    if (!age) {
        if (!sub) {
            if (!notFullTime) {
                subsidy = 500;
            } else {
                subsidy = 250;
            }
        } else {
            subsidy = 250;
        }
    } else {
        subsidy = -1;
    }
    return subsidy;
}
  

Good

int calculateSubsidy() {
    int subsidy;
    if (isSenior) {
        subsidy = REJECT_SENIOR;
    } else if (isAlreadySubsidised) {
        subsidy = SUBSIDISED_SUBSIDY;
    } else if (isPartTime) {
        subsidy = FULLTIME_SUBSIDY * RATIO;
    } else {
        subsidy = FULLTIME_SUBSIDY;
    }
    return subsidy;
}

Bad

def calculate_subs():
    if not age:
        if not sub:
            if not not_fulltime:
                subsidy = 500
            else:
                subsidy = 250
        else:
            subsidy = 250
    else:
        subsidy = -1
    return subsidy
  

Good

def calculate_subsidy():
    if is_senior:
        return REJECT_SENIOR
    elif is_already_subsidised:
        return SUBSIDISED_SUBSIDY
    elif is_parttime:
        return FULLTIME_SUBSIDY * RATIO
    else:
        return FULLTIME_SUBSIDY

Avoid Complicated Expressions

Avoid complicated expressions, especially those having many negations and nested parentheses. If you must evaluate complicated expressions, have it done in steps (i.e. calculate some intermediate values first and use them to calculate the final value).

Example:

Bad

return ((length < MAX_LENGTH) || (previousSize != length)) && (typeCode == URGENT);

Good


boolean isWithinSizeLimit = length < MAX_LENGTH;
boolean isSameSize = previousSize != length;
boolean isValidCode = isWithinSizeLimit || isSameSize;

boolean isUrgent = typeCode == URGENT;

return isValidCode && isUrgent;

Example:

Bad

return ((length < MAX_LENGTH) or (previous_size != length)) and (type_code == URGENT)

Good

is_within_size_limit = length < MAX_LENGTH
is_same_size = previous_size != length
is_valid_code = is_within_size_limit or is_same_size

is_urgent = type_code == URGENT

return is_valid_code and is_urgent

The competent programmer is fully aware of the strictly limited size of his own skull; therefore he approaches the programming task in full humility, and among other things he avoids clever tricks like the plague. -- Edsger Dijkstra

Avoid Magic Numbers

When the code has a number that does not explain the meaning of the number, we call that a "magic number" (as in "the number appears as if by magic"). Using a named constant makes the code easier to understand because the name tells us more about the meaning of the number.

Example:

Bad

return 3.14236;
...
return 9;
  

Good

static final double PI = 3.14236;
static final int MAX_SIZE = 10;
...
return PI;
...
return MAX_SIZE-1;

Note: Python does not have a way to make a variable a constant. However, you can use a normal variable with an ALL_CAPS name to simulate a constant.

Bad

return 3.14236
...
return 9
  

Good

PI = 3.14236
MAX_SIZE = 10
...
return PI
...
return MAX_SIZE-1

Similarly, we can have ‘magic’ values of other data types.

Bad

"Error 1432"  // A magic string!

In general, try to avoid any magic literals.

Make the Code Obvious

Make the code as explicit as possible, even if the language syntax allows them to be implicit. Here are some examples:

  • [Java] Use explicit type conversion instead of implicit type conversion.
  • [Java, Python] Use parentheses/braces to show grouping even when they can be skipped.
  • [Java, Python] Use enumerations when a certain variable can take only a small number of finite values. For example, instead of declaring the variable 'state' as an integer and using values 0,1,2 to denote the states 'starting', 'enabled', and 'disabled' respectively, declare 'state' as type SystemState and define an enumeration SystemState that has values 'STARTING', 'ENABLED', and 'DISABLED'.

Intermediate

Structure Code Logically

Lay out the code so that it adheres to the logical structure. The code should read like a story. Just like we use section breaks, chapters and paragraphs to organize a story, use classes, methods, indentation and line spacing in your code to group related segments of the code. For example, you can use blank lines to group related statements together. Sometimes, the correctness of your code does not depend on the order in which you perform certain intermediary steps. Nevertheless, this order may affect the clarity of the story you are trying to tell. Choose the order that makes the story most readable.

Guideline: Follow a Standard

Introduction

One essential way to improve code quality is to follow a consistent style. That is why software engineers follow a strict coding standard (aka style guide).

The aim of a coding standard is to make the entire code base look like it was written by one person. A coding standard is usually specific to a programming language and specifies guidelines such as the location of opening and closing braces, indentation styles and naming styles (e.g. whether to use Hungarian style, Pascal casing, Camel casing, etc.). It is important that the whole team/company use the same coding standard and that standard is not generally inconsistent with typical industry practices. If a company's coding standards is very different from what is used typically in the industry, new recruits will take longer to get used to the company's coding style.

IDEs can help to enforce some parts of a coding standard e.g. indentation rules.

Basic

Learn basic guidelines of the Java coding standard (by OSS-Generic)

Guideline: Name Well

Introduction

Proper naming improves the readability. It also reduces bugs caused by ambiguities regarding the intent of a variable or a method.

There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton

Basic

Use Nouns for Things and Verbs for Actions

Every system is built from a domain-specific language designed by the programmers to describe that system. Functions are the verbs of that language, and classes are the nouns. ― Robert C. Martin, Clean Code: A Handbook of Agile Software Craftsmanship

Use nouns for classes/variables and verbs for methods/functions.

Examples:

Name for a Bad Good
Class CheckLimit LimitChecker
method result() calculate()

Distinguish clearly between single-valued and multivalued variables.

Examples:

Good

Person student;
ArrayList<Person> students;

Good

name = 'Jim'
names = ['Jim', 'Alice']

Use Standard Words

Use correct spelling in names. Avoid 'texting-style' spelling. Avoid foreign language words, slang, and names that are only meaningful within specific contexts/times e.g. terms from private jokes, a TV show currently popular in your country

Guideline: Comment Minimally, but Sufficiently

Introduction

Good code is its own best documentation. As you’re about to add a comment, ask yourself, ‘How can I improve the code so that this comment isn’t needed?’ Improve the code and then document it to make it even clearer. --Steve McConnell, Author of Clean Code

Some think commenting heavily increases the 'code quality'. This is not so. Avoid writing comments to explain bad code. Improve the code to make it self-explanatory.

Basic

Do Not Repeat the Obvious

If the code is self-explanatory, refrain from repeating the description in a comment just for the sake of 'good documentation'.

Bad

// increment x
x++;

//trim the input
trimInput();

Bad

// increment x
x = x + 1

//trim the input
trim_input()

Write to the Reader

Do not write comments as if they are private notes to yourself. Instead, write them well enough to be understood by another programmer. One type of comments that is almost always useful is the header comment that you write for a class or an operation to explain its purpose.

Examples:

Bad Reason: this comment will only make sense to the person who wrote it

// a quick trim function used to fix bug I detected overnight
void trimInput(){
    ....
}

Good

/** Trims the input of leading and trailing spaces */
void trimInput(){
    ....
}

Bad Reason: this comment will only make sense to the person who wrote it

# a quick trim function used to fix bug I detected overnight
def trim_input():
    ...

Good

def trim_input():
"""Trim the input of leading and trailing spaces"""
    ...

 

Refactoring

What

The first version of the code you write may not be of production quality. It is OK to first concentrate on making the code work, rather than worry over the quality of the code, as long as you improve the quality later. This process of improving a program's internal structure in small steps without modifying its external behavior is called refactoring.

  • Refactoring is not rewriting: Discarding poorly-written code entirely and re-writing it from scratch is not refactoring because refactoring needs to be done in small steps.
  • Refactoring is not bug fixing: By definition, refactoring is different from bug fixing or any other modifications that alter the external behavior (e.g. adding a feature) of the component in concern.

Improving code structure can have many secondary benefits: e.g.

  • hidden bugs become easier to spot
  • improve performance (sometimes, simpler code runs faster than complex code because simpler code is easier for the compiler to optimize).

Given below are two common refactorings (more).

Refactoring Name: Consolidate Duplicate Conditional Fragments

Situation: The same fragment of code is in all branches of a conditional expression.

Method: Move it outside of the expression.

Example:

if (isSpecialDeal()) {
    total = price * 0.95;
    send();
} else {
    total = price * 0.98;
    send();
}
 → 
if (isSpecialDeal()){
    total = price * 0.95;
} else {
    total = price * 0.98;
}
send();

if is_special_deal:
    total = price * 0.95
    send()
else:
    total = price * 0.98
    send()
 → 
if is_special_deal:
    total = price * 0.95
else:
    total = price * 0.98
    
send()

Refactoring Name: Extract Method

Situation: You have a code fragment that can be grouped together.

Method: Turn the fragment into a method whose name explains the purpose of the method.

Example:

void printOwing() {
    printBanner();

    //print details
    System.out.println("name:	" + name);
    System.out.println("amount	" + getOutstanding());
}

void printOwing() {
    printBanner();
    printDetails(getOutstanding());
}

void printDetails (double outstanding) {
    System.out.println("name:	" + name);
    System.out.println("amount	" + outstanding);
}
def print_owing():
    print_banner()

    //print details
    print("name:	" + name)
    print("amount	" + get_outstanding())

def print_owing():
    print_banner()
    print_details(get_outstanding())

def print_details(amount):
    print("name:	" + name)
    print("amount	" + amount)

Some IDEs have built in support for basic refactorings such as automatically renaming a variable/method/class in all places it has been used.

Refactoring, even if done with the aid of an IDE, may still result in regressions. Therefore, each small refactoring should be followed by regression testing.

How

Given below are some more commonly used refactorings. A more comprehensive list is available at refactoring-catalog .

  1. Consolidate Conditional Expression
  2. Decompose Conditional
  3. Inline Method
  4. Remove Double Negative
  5. Replace Magic Literal
  6. Replace Nested Conditional with Guard Clauses
  7. Replace Parameter with Explicit Methods
  8. Reverse Conditional
  9. Split Loop
  10. Split Temporary Variable

 

Error Handling

Introduction

What

Well-written applications include error-handling code that allows them to recover gracefully from unexpected errors. When an error occurs, the application may need to request user intervention, or it may be able to recover on its own. In extreme cases, the application may log the user off or shut down the system. --Microsoft

Exceptions

What

Exceptions are used to deal with 'unusual' but not entirely unexpected situations that the program might encounter at run time.

Exception:

The term exception is shorthand for the phrase "exceptional event." An exception is an event, which occurs during the execution of a program, that disrupts the normal flow of the program's instructions. –- Java Tutorial (Oracle Inc.)

Examples:

  • A network connection encounters a timeout due to a slow server.
  • The code tries to read a file from the hard disk but the file is corrupted and cannot be read.

How

Most languages allow code that encountered an "exceptional" situation to encapsulate details of the situation in an Exception object and throw/raise that object so that another piece of code can catch it and deal with it. This is especially useful when the code that encountered the unusual situation does not know how to deal with it.

The extract below from the -- Java Tutorial (with slight adaptations) explains how exceptions are typically handled.

When an error occurs at some point in the execution, the code being executed creates an exception object and hands it off to the runtime system. The exception object contains information about the error, including its type and the state of the program when the error occurred. Creating an exception object and handing it to the runtime system is called throwing an exception.

After a method throws an exception, the runtime system attempts to find something to handle it in the call stack. The runtime system searches the call stack for a method that contains a block of code that can handle the exception. This block of code is called an exception handler. The search begins with the method in which the error occurred and proceeds through the call stack in the reverse order in which the methods were called. When an appropriate handler is found, the runtime system passes the exception to the handler. An exception handler is considered appropriate if the type of the exception object thrown matches the type that can be handled by the handler.

The exception handler chosen is said to catch the exception. If the runtime system exhaustively searches all the methods on the call stack without finding an appropriate exception handler, the program terminates.

Advantages of exception handling in this way:

  • The ability to propagate error information through the call stack.
  • The separation of code that deals with 'unusual' situations from the code that does the 'usual' work.

When

In general, use exceptions only for 'unusual' conditions. Use normal return statements to pass control to the caller for conditions that are 'normal'.

Assertions

What

Assertions are used to define assumptions about the program state so that the runtime can verify them. An assertion failure indicates a possible bug in the code because the code has resulted in a program state that violates an assumption about how the code should behave.

An assertion can be used to express something like when the execution comes to this point, the variable v cannot be null.

If the runtime detects an assertion failure, it typically take some drastic action such as terminating the execution with an error message. This is because an assertion failure indicates a possible bug and the sooner the execution stops, the safer it is.

In the Java code below, suppose we set an assertion that timeout returned by Config.getTimeout() is greater than 0. Now, if the Config.getTimeout() returned -1 in a specific execution of this line, the runtime can detect it as a assertion failure -- i.e. an assumption about the expected behavior of the code turned out to be wrong which could potentially be the result of a bug -- and take some drastic action such as terminating the execution.

int timeout = Config.getTimeout(); 

How

Use the assert keyword to define assertions.

This assertion will fail with the message x should be 0 if x is not 0 at this point.

x = getX();
assert x == 0 : "x should be 0";
...

Assertions can be disabled without modifying the code.

java -enableassertions HelloWorld (or java -ea HelloWorld) will run HelloWorld with assertions enabled while java -disableassertions HelloWorld will run it without verifying assertions.

Java disables assertions by default. This could create a situation where you think all assertions are being verified as true while in fact they are not being verified at all. Therefore, remember to enable assertions when you run the program if you want them to be in effect.

Enable assertions in Intellij (how?) and get an assertion to fail temporarily (e.g. insert an assert false into the code temporarily) to confirm assertions are being verified.

Java assert vs JUnit assertions: They are similar in purpose but JUnit assertions are more powerful and customized for testing. In addition, JUnit assertions are not disabled by default. We recommend you use JUnit assertions in test code and Java assert in functional code.

When

It is recommended that assertions be used liberally in the code. Their impact on performance is considered low and worth the additional safety they provide.

Do not use assertions to do work because assertions can be disabled. If not, your program will stop working when assertions are not enabled.

The code below will not invoke the writeFile() method when assertions are disabled. If that method is performing some work that is necessary for your program, your program will not work correctly when assertions are disabled.

...
assert writeFile() : "File writing is supposed to return true";

Assertions are suitable for verifying assumptions about Internal Invariants, Control-Flow Invariants, Preconditions, Postconditions, and Class Invariants. Refer to [Programming with Assertions (second half)] to learn more.

Exceptions and assertions are two complementary ways of handling errors in software but they serve different purposes. Therefore, both assertions and exceptions should be used in code.

  • The raising of an exception indicates an unusual condition created by the user (e.g. user inputs an unacceptable input) or the environment (e.g., a file needed for the program is missing).
  • An assertion failure indicates the programmer made a mistake in the code (e.g., a null value is returned from a method that is not supposed to return null under any circumstances).

Logging

What

Logging is the deliberate recording of certain information during a program execution for future reference. Logs are typically written to a log file but it is also possible to log information in other ways e.g. into a database or a remote server.

Logging can be useful for troubleshooting problems. A good logging system records some system information regularly. When bad things happen to a system e.g. an unanticipated failure, their associated log files may provide indications of what went wrong and action can then be taken to prevent it from happening again.

A log file is like the black box of an airplane; they don't prevent problems but they can be helpful in understanding what went wrong after the fact.

 

Reuse

Introduction

What

Reuse is a major theme in software engineering practices. By reusing tried-and-tested components, the robustness of a new software system can be enhanced while reducing the manpower and time requirement. Reusable components come in many forms; it can be reusing a piece of code, a subsystem, or a whole software.

When

While you may be tempted to use many libraries/frameworks/platform that seem to crop up on a regular basis and promise to bring great benefits, note that there are costs associated with reuse. Here are some:

  • The reused code may be an overkill (think using a sledgehammer to crack a nut) increasing the size of, or/and degrading the performance of, your software.
  • The reused software may not be mature/stable enough to be used in an important product. That means the software can change drastically and rapidly, possibly in ways that break your software.
  • Non-mature software has the risk of dying off as fast as they emerged, leaving you with a dependency that is no longer maintained.
  • The license of the reused software (or its dependencies) restrict how you can use/develop your software.
  • The reused software might have bugs, missing features, or security vulnerabilities that are important to your product but not so important to the maintainers of that software, which means those flaws will not get fixed as fast as you need them to.
  • Malicious code can sneak into your product via compromised dependencies.

APIs

What

An Application Programming Interface (API) specifies the interface through which other programs can interact with a software component. It is a contract between the component and its clients.

A class has an API (e.g., API of the Java String class, API of the Python str class) which is a collection of public methods that you can invoke to make use of the class.

The GitHub API is a collection of Web request formats GitHub server accepts and the corresponding responses. We can write a program that interacts with GitHub through that API.

When developing large systems, if you define the API of each components early, the development team can develop the components in parallel because the future behavior of the other components are now more predictable.

Libraries

What

A library is a collection of modular code that is general and can be used by other programs.

Java classes you get with the JDK (such as String, ArrayList, HashMap, etc.) are library classes that are provided in the default Java distribution.

Natty is a Java library that can be used for parsing strings that represent dates e.g. The 31st of April in the year 2008

built-in modules you get with Python (such as csv, random, sys, etc.) are libraries that are provided in the default Python distribution. Classes such as list, str, dict are built-in library classes that you get with Python.

Colorama is a Python library that can be used for colorizing text in a CLI.

How

These are the typical steps required to use a library.

  1. Read the documentation to confirm that its functionality fits your needs
  2. Check the license to confirm that it allows reuse in the way you plan to reuse it. For example, some libraries might allow non-commercial use only.
  3. Download the library and make it accessible to your project. Alternatively, you can configure your dependency management tool to do it for you.
  4. Call the library API from your code where you need to use the library functionality.

Frameworks

What

The overall structure and execution flow of a specific category of software systems can be very similar. The similarity is an opportunity to reuse at a high scale.

Running example:

IDEs for different programming languages are similar in how they support editing code, organizing project files, debugging, etc.

A software framework is a reusable implementation of a software (or part thereof) providing generic functionality that can be selectively customized to produce a specific application.

Running example:

Eclipse is an IDE framework that can be used to create IDEs for different programming languages.

Some frameworks provide a complete implementation of a default behavior which makes them immediately usable.

Running example:

Eclipse is a fully functional Java IDE out-of-the-box.

A framework facilitates the adaptation and customization of some desired functionality.

Running example:

Eclipse plugin system can be used to create an IDE for different programming languages while reusing most of the existing IDE features of Eclipse. E.g. https://marketplace.eclipse.org/content/pydev-python-ide-eclipse

Some frameworks cover only a specific components or an aspect.

JavaFx a framework for creating Java GUIs. TkInter is a GUI framework for Python.

More examples of frameworks

  • Frameworks for Web-based applications: Drupal(PHP), Django(Python), Ruby on Rails (Ruby), Spring (Java)
  • Frameworks for testing: JUnit (Java), unittest (Python), Jest (Java Script)

Frameworks vs Libraries

Although both frameworks and libraries are reuse mechanisms, there are notable differences:

  • Libraries are meant to be used ‘as is’ while frameworks are meant to be customized/extended. e.g., writing plugins for Eclipse so that it can be used as an IDE for different languages (C++, PHP, etc.), adding modules and themes to Drupal, and adding test cases to JUnit.

  • Your code calls the library code while the framework code calls your code. Frameworks use a technique called inversion of control, aka the “Hollywood principle” (i.e. don’t call us, we’ll call you!). That is, you write code that will be called by the framework, e.g. writing test methods that will be called by the JUnit framework. In the case of libraries, your code calls libraries.

Platforms

What

A platform provides a runtime environment for applications. A platform is often bundled with various libraries, tools, frameworks, and technologies in addition to a runtime environment but the defining characteristic of a software platform is the presence of a runtime environment.

Technically, an operating system can be called a platform. For example, Windows PC is a platform for desktop applications while iOS is a platform for mobile apps.

Two well-known examples of platforms are JavaEE and .NET, both of which sit above Operating systems layer, and are used to develop enterprise applications. Infrastructure services such as connection pooling, load balancing, remote code execution, transaction management, authentication, security, messaging etc. are done similarly in most enterprise applications. Both JavaEE and .NET provide these services to applications in a customizable way without developers having to implement them from scratch every time.

  • JavaEE (Java Enterprise Edition) is both a framework and a platform for writing enterprise applications. The runtime used by the JavaEE applications is the JVM (Java Virtual Machine) that can run on different Operating Systems.
  • .NET is a similar platform and a framework. Its runtime is called CLR (Common Language Runtime) and usually used on Windows machines.

Enterprise Application: ‘enterprise applications’ means software applications used at organizations level and therefore has to meet much higher demands (such as in scalability, security, performance, and robustness) than software meant for individual use.

 

SECTION: QUALITY ASSURANCE

Quality Assurance

Introduction

What

Software Quality Assurance (QA) is the process of ensuring that the software being built has the required levels of quality.

While testing is the most common activity used in QA, there are other complementary techniques such as static analysis, code reviews, and formal verification.

Validation vs Verification

Quality Assurance = Validation + Verification

QA involves checking two aspects:

  1. Validation: are we building the right system i.e., are the requirements correct?
  2. Verification: are we building the system right i.e., are the requirements implemented correctly?

Whether something belongs under validation or verification is not that important. What is more important is both are done, instead of limiting to verification (i.e., remember that the requirements can be wrong too).

Code Reviews

What

Code review is the systematic examination code with the intention of finding where the code can be improved.

Reviews can be done in various forms. Some examples below:

  • Pull Request reviews

    • Project Management Platforms such as GitHub and BitBucket allow the new code to be proposed as Pull Requests and provide the ability for others to review the code in the PR.
  • In pair programming

    • As pair programming involves two programmers working on the same code at the same time, there is an implicit review of the code by the other member of the pair.

Pair Programming:

Pair programming is an agile software development technique in which two programmers work together at one workstation. One, the driver, writes code while the other, the observer or navigator, reviews each line of code as it is typed in. The two programmers switch roles frequently. [source: Wikipedia]


[image credit: Wikipedia]

A good introduction to pair programming:

  • Formal inspections

    • Inspections involve a group of people systematically examining a project artifacts to discover defects. Members of the inspection team play various roles during the process, such as:

      • the author - the creator of the artifact
      • the moderator - the planner and executor of the inspection meeting
      • the secretary - the recorder of the findings of the inspection
      • the inspector/reviewer - the one who inspects/reviews the artifact.

Advantages of code reviews over testing:

  • It can detect functionality defects as well as other problems such as coding standard violations.
  • Can verify non-code artifacts and incomplete code
  • Do not require test drivers or stubs.

Disadvantages:

  • It is a manual process and therefore, error prone.

Static Analysis

What

Static analysis: Static analysis is the analysis of code without actually executing the code.

Static analysis of code can find useful information such unused variables, unhandled exceptions, style errors, and statistics. Most modern IDEs come with some inbuilt static analysis capabilities. For example, an IDE can highlight unused variables as you type the code into the editor.

Higher-end static analyzer tools can perform more complex analysis such as locating potential bugs, memory leaks, inefficient code structures etc.

Some example static analyzer for Java: CheckStyle, PMD, FindBugs

Linters are a subset of static analyzers that specifically aim to locate areas where the code can be made 'cleaner'.

Formal Verification

What

Formal verification uses mathematical techniques to prove the correctness of a program.

by Eric Hehner

Advantages:

  • Formal verification can be used to prove the absence of errors. In contrast, testing can only prove the presence of error, not their absence.

Disadvantages:

  • It only proves the compliance with the specification, but not the actual utility of the software.
  • It requires highly specialized notations and knowledge which makes it an expensive technique to administer. Therefore, formal verifications are more commonly used in safety-critical software such as flight control systems.

 

Testing

Introduction

What

Testing: Operating a system or component under specified conditions, observing or recording the results, and making an evaluation of some aspect of the system or component. –- source: IEEE

When testing, we execute a set of test cases. A test case specifies how to perform a test. At a minimum, it specifies the input to the software under test (SUT) and the expected behavior.

Example: A minimal test case for testing a browser:

  • Input – Start the browser using a blank page (vertical scrollbar disabled). Then, load longfile.html located in the test data folder.
  • Expected behavior – The scrollbar should be automatically enabled upon loading longfile.html.

Test cases can be determined based on the specification, reviewing similar existing systems, or comparing to the past behavior of the SUT.

Other details a test case can contain extra A more elaborate test case can have other details such as those given below.
  • A unique identifier : e.g. TC0034-a
  • A descriptive name: e.g. vertical scrollbar activation for long web pages
  • Objectives: e.g. to check whether the vertical scrollbar is correctly activated when a long web page is loaded to the browser
  • Classification information: e.g. priority - medium, category - UI features
  • Cleanup, if any: e.g. empty the browser cache.

For each test case we do the following:

  1. Feed the input to the SUT
  2. Observe the actual output
  3. Compare actual output with the expected output

A test case failure is a mismatch between the expected behavior and the actual behavior. A failure indicates a potential defect (or a bug), unless the error is in the test case itself.

Example: In the browser example above, a test case failure is implied if the scrollbar remains disabled after loading longfile.html. The defect/bug causing that failure could be an uninitialized variable.

A deeper look at the definition of testing extra

Here is another definition of testing:

Software testing consists of the dynamic verification that a program provides expected behaviors on a finite set of test cases, suitably selected from the usually infinite execution domain. -– source: Software Engineering Book of Knowledge V3

Some things to note (indicated by keywords in the above definition):

  • Dynamic: Testing involves executing the software. It is not by examining the code statically.
  • Finite: In most non-trivial cases there are potentially infinite test scenarios but resource constraints dictate that we can test only a finite number of scenarios.
  • Selected: In most cases it is not possible to test all scenarios. That means we need to select what scenarios to test.
  • Expected: Testing requires some knowledge of how the software is expected to behave.

Testing Types

Unit Testing

What

Unit testing: testing individual units (methods, classes, subsystems, ...) to ensure each piece works correctly

In OOP code, it is common to write one or more unit tests for each public method of a class.

Here are the code skeletons for a Foo class containing two methods and a FooTest class that contains unit tests for those two methods.

class Foo{
    String read(){
        //...
    }
    
    void write(String input){
        //...
    }
    
}
class FooTest{
    
    @Test
    void read(){
        //a unit test for Foo#read() method
    }
    
    @Test
    void write_emptyInput_exceptionThrown(){
        //a unit tests for Foo#write(String) method
    }  
    
    @Test
    void write_normalInput_writtenCorrectly(){
        //another unit tests for Foo#write(String) method
    }
}
import unittest

class Foo:
  def read(self):
      # ...
  
  def write(self, input):
      # ...


class FooTest(unittest.TestCase):
  
  def test_read(sefl):
      # a unit test for read() method
  
  def test_write_emptyIntput_ignored(self):
      # a unit tests for write(string) method
  
  def test_write_normalInput_writtenCorrectly(self):
      # another unit tests for write(string) method

Integration Testing

What

Integration testing : testing whether different parts of the software work together (i.e. integrates) as expected. Integration tests aim to discover bugs in the 'glue code' related to how components interact with each other. These bugs are often the result of misunderstanding of what the parts are supposed to do vs what the parts are actually doing.

Suppose a class Car users classes Engine and Wheel. If the Car class assumed a Wheel can support 200 mph speed but the actual Wheel can only support 150 mph, it is the integration test that is supposed to uncover this discrepancy.

System Testing

What

System testing: take the whole system and test it against the system specification.

System testing is typically done by a testing team (also called a QA team).

System test cases are based on the specified external behavior of the system. Sometimes, system tests go beyond the bounds defined in the specification. This is useful when testing that the system fails 'gracefully' having pushed beyond its limits.

Suppose the SUT is a browser supposedly capable of handling web pages containing up to 5000 characters. Given below is a test case to test if the SUT fails gracefully if pushed beyond its limits.

Test case: load a web page that is too big
* Input: load a web page containing more than 5000 characters. 
* Expected behavior: abort the loading of the page and show a meaningful error message. 

This test case would fail if the browser attempted to load the large file anyway and crashed.

System testing includes testing against non-functional requirements too. Here are some examples.

  • Performance testing – to ensure the system responds quickly.
  • Load testing (also called stress testing or scalability testing) – to ensure the system can work under heavy load.
  • Security testing – to test how secure the system is.
  • Compatibility testing, interoperability testing – to check whether the system can work with other systems.
  • Usability testing – to test how easy it is to use the system.
  • Portability testing – to test whether the system works on different platforms.

Alpha and Beta Testing

What

Alpha testing is performed by the users, under controlled conditions set by the software development team.

Beta testing is performed by a selected subset of target users of the system in their natural work setting.

An open beta release is the release of not-yet-production-quality-but-almost-there software to the general population. For example, Google’s Gmail was in 'beta' for many years before the label was finally removed.

Developer Testing

What

Developer testing is the testing done by the developers themselves as opposed to professional testers or end-users.

Why

Delaying testing until the full product is complete has a number of disadvantages:

  • Locating the cause of such a test case failure is difficult due to a large search space; in a large system, the search space could be millions of lines of code, written by hundreds of developers! The failure may also be due to multiple inter-related bugs.
  • Fixing a bug found during such testing could result in major rework, especially if the bug originated during the design or during requirements specification i.e. a faulty design or faulty requirements.
  • One bug might 'hide' other bugs, which could emerge only after the first bug is fixed.
  • The delivery may have to be delayed if too many bugs were found during testing.

Therefore, it is better to do early testing, as hinted by the popular rule of thumb given below, also illustrated by the graph below it.

The earlier a bug is found, the easier and cheaper to have it fixed.

Such early testing of partially developed software is usually, and by necessity, done by the developers themselves i.e. developer testing.

Exploratory vs Scripted Testing

What

Here are two alternative approaches to testing a software: Scripted testing and Exploratory testing

  1. Scripted testing: First write a set of test cases based on the expected behavior of the SUT, and then perform testing based on that set of test cases.

  2. Exploratory testing: Devise test cases on-the-fly, creating new test cases based on the results of the past test cases.

Exploratory testing is ‘the simultaneous learning, test design, and test execution’ [source: bach-et-explained] whereby the nature of the follow-up test case is decided based on the behavior of the previous test cases. In other words, running the system and trying out various operations. It is called exploratory testing because testing is driven by observations during testing. Exploratory testing usually starts with areas identified as error-prone, based on the tester’s past experience with similar systems. One tends to conduct more tests for those operations where more faults are found.

Here is an example thought process behind a segment of an exploratory testing session:

“Hmm... looks like feature x is broken. This usually means feature n and k could be broken too; we need to look at them soon. But before that, let us give a good test run to feature y because users can still use the product if feature y works, even if x doesn’t work. Now, if feature y doesn’t work 100%, we have a major problem and this has to be made known to the development team sooner rather than later...”

Exploratory testing is also known as reactive testing, error guessing technique, attack-based testing, and bug hunting.

Exploratory Testing Explained, an online article by James Bach -- James Bach is an industry thought leader in software testing).

When

Which approach is better – scripted or exploratory? A mix is better.

The success of exploratory testing depends on the tester’s prior experience and intuition. Exploratory testing should be done by experienced testers, using a clear strategy/plan/framework. Ad-hoc exploratory testing by unskilled or inexperienced testers without a clear strategy is not recommended for real-world non-trivial systems. While exploratory testing may allow us to detect some problems in a relatively short time, it is not prudent to use exploratory testing as the sole means of testing a critical system.

Scripted testing is more systematic, and hence, likely to discover more bugs given sufficient time, while exploratory testing would aid in quick error discovery, especially if the tester has a lot of experience in testing similar systems.

In some contexts, you will achieve your testing mission better through a more scripted approach; in other contexts, your mission will benefit more from the ability to create and improve tests as you execute them. I find that most situations benefit from a mix of scripted and exploratory approaches. --[source: bach-et-explained]

Exploratory Testing Explained, an online article by James Bach -- James Bach is an industry thought leader in software testing).

Acceptance Testing

What

Acceptance testing (aka User Acceptance Testing (UAT): test the system to ensure it meets the user requirements.

Acceptance tests give an assurance to the customer that the system does what it is intended to do. Acceptance test cases are often defined at the beginning of the project, usually based on the use case specification. Successful completion of UAT is often a prerequisite to the project sign-off.

Acceptance vs System Testing

Acceptance testing comes after system testing. Similar to system testing, acceptance testing involves testing the whole system.

Some differences between system testing and acceptance testing:

System Testing Acceptance Testing
Done against the system specification Done against the requirements specification
Done by testers of the project team Done by a team that represents the customer
Done on the development environment or a test bed Done on the deployment site or on a close simulation of the deployment site
Both negative and positive test cases More focus on positive test cases

Note: negative test cases: cases where the SUT is not expected to work normally e.g. incorrect inputs; positive test cases: cases where the SUT is expected to work normally

Requirement Specification vs System Specification

The requirement specification need not be the same as the system specification. Some example differences:

Requirements Specification System Specification
limited to how the system behaves in normal working conditions can also include details on how it will fail gracefully when pushed beyond limits, how to recover, etc. specification
written in terms of problems that need to be solved (e.g. provide a method to locate an email quickly) written in terms of how the system solve those problems (e.g. explain the email search feature)
specifies the interface available for intended end-users could contain additional APIs not available for end-users (for the use of developers/testers)

However, in many cases one document serves as both a requirement specification and a system specification.

Passing system tests does not necessarily mean passing acceptance testing. Some examples:

  • The system might work on the testbed environments but might not work the same way in the deployment environment, due to subtle differences between the two environments.
  • The system might conform to the system specification but could fail to solve the problem it was supposed to solve for the user, due to flaws in the system design.

Regression Testing

What

When we modify a system, the modification may result in some unintended and undesirable effects on the system. Such an effect is called a regression.

Regression testing is the re-testing of the software to detect regressions. Note that to detect regressions, we need to retest all related components, even if they had been tested before.

Regression testing is more effective when it is done frequently, after each small change. However, doing so can be prohibitively expensive if testing is done manually. Hence, regression testing is more practical when it is automated.

Test Automation

What

An automated test case can be run programmatically and the result of the test case (pass or fail) is determined programmatically. Compared to manual testing, automated testing reduces the effort required to run tests repeatedly and increases precision of testing (because manual testing is susceptible to human errors).

Automated Testing of CLI Apps

A simple way to semi-automate testing of a CLI(Command Line Interface) app is by using input/output re-direction.

  • First, we feed the app with a sequence of test inputs that is stored in a file while redirecting the output to another file.
  • Next, we compare the actual output file with another file containing the expected output.

Let us assume we are testing a CLI app called AddressBook. Here are the detailed steps:

  1. Store the test input in the text file input.txt.

    add Valid Name p/12345 valid@email.butNoPrefix
    add Valid Name 12345 e/valid@email.butPhonePrefixMissing
    
  2. Store the output we expect from the SUT in another text file expected.txt.

    Command: || [add Valid Name p/12345 valid@email.butNoPrefix]
    Invalid command format: add 
    
    Command: || [add Valid Name 12345 e/valid@email.butPhonePrefixMissing]
    Invalid command format: add 
    
  3. Run the program as given below, which will redirect the text in input.txt as the input to AddressBook and similarly, will redirect the output of AddressBook to a text file output.txt. Note that this does not require any code changes to AddressBook.

    java AddressBook < input.txt > output.txt
    
    • The way to run a CLI program differs based on the language.
      e.g., In Python, assuming the code is in AddressBook.py file, use the command
      python AddressBook.py < input.txt > output.txt

    • If you are using Windows, use a normal command window to run the app, not a Power Shell window.

    More on the > operator and the < operator extra

    A CLI program takes input from the keyboard and outputs to the console. That is because those two are default input and output streams, respectively. But you can change that behavior using < and > operators. For example, if you run AddressBook in a command window, the output will be shown in the console, but if you run it like this,

    java AddressBook > output.txt 
    

    the Operating System then creates a file output.txt and stores the output in that file instead of displaying it in the console. No file I/O coding is required. Similarly, adding < input.txt (or any other filename) makes the OS redirect the contents of the file as input to the program, as if the user typed the content of the file one line at a time.

    Resources:

  4. Next, we compare output.txt with the expected.txt. This can be done using a utility such as Windows FC (i.e. File Compare) command, Unix diff command, or a GUI tool such as WinMerge.

    FC output.txt expected.txt
    

Note that the above technique is only suitable when testing CLI apps, and only if the exact output can be predetermined. If the output varies from one run to the other (e.g. it contains a time stamp), this technique will not work. In those cases we need more sophisticated ways of automating tests.

CLI App: An application that has a Command Line Interface. i.e. user interacts with the app by typing in commands.

Test Automation Using Test Drivers

A test driver is the code that ‘drives’ the SUT for the purpose of testing i.e. invoking the SUT with test inputs and verifying the behavior is as expected.

PayrollTest ‘drives’ the Payroll class by sending it test inputs and verifies if the output is as expected.

public class PayrollTestDriver {
    public static void main(String[] args) throws Exception {

        //test setup
        Payroll p = new Payroll();

        //test case 1
        p.setEmployees(new String[]{"E001", "E002"});
        // automatically verify the response
        if (p.totalSalary() != 6400) {
            throw new Error("case 1 failed ");
        }

        //test case 2
        p.setEmployees(new String[]{"E001"});
        if (p.totalSalary() != 2300) {
            throw new Error("case 2 failed ");
        }

        //more tests...

        System.out.println("All tests passed");
    }
}

Test Automation Tools

JUnit is a tool for automated testing of Java programs. Similar tools are available for other languages and for automating different types of testing.

This an automated test for a Payroll class, written using JUnit libraries.

@Test
public void testTotalSalary(){
    Payroll p = new Payroll();

    //test case 1
    p.setEmployees(new String[]{"E001", "E002"});
    assertEquals(6400, p.totalSalary());

    //test case 2
    p.setEmployees(new String[]{"E001"});
    assertEquals(2300, p.totalSalary());

    //more tests...
}

Most modern IDEs has integrated support for testing tools. The figure below shows the JUnit output when running some JUnit tests using the Eclipse IDE.

 

SECTION: PROJECT MANAGEMENT

Revision Control

What

Revision control is the process of managing multiple versions of a piece of information. In its simplest form, this is something that many people do by hand: every time you modify a file, save it under a new name that contains a number, each one higher than the number of the preceding version.

Manually managing multiple versions of even a single file is an error-prone task, though, so software tools to help automate this process have long been available. The earliest automated revision control tools were intended to help a single user to manage revisions of a single file. Over the past few decades, the scope of revision control tools has expanded greatly; they now manage multiple files, and help multiple people to work together. The best modern revision control tools have no problem coping with thousands of people working together on projects that consist of hundreds of thousands of files.

Revision control software will track the history and evolution of your project, so you don't have to. For every change, you'll have a log of who made it; why they made it; when they made it; and what the change was.

Revision control software makes it easier for you to collaborate when you're working with other people. For example, when people more or less simultaneously make potentially incompatible changes, the software will help you to identify and resolve those conflicts.

It can help you to recover from mistakes. If you make a change that later turns out to be an error, you can revert to an earlier version of one or more files. In fact, a really good revision control tool will even help you to efficiently figure out exactly when a problem was introduced.

It will help you to work simultaneously on, and manage the drift between, multiple versions of your project. Most of these reasons are equally valid, at least in theory, whether you're working on a project by yourself, or with a hundred other people.

-- [adapted from bryan-mercurial-guide

Mercurial: The Definitive Guide by Bryan O'Sullivan retrieved on 2012/07/11

RCS: Revision Control Software are the software tools that automate the process of Revision Control i.e. managing revisions of software artifacts.

Revision: A revision (some seem to use it interchangeably with version while others seem to distinguish the two -- here, let us treat them as the same, for simplicity) is a state of a piece of information at a specific time that is a result of some changes to it e.g., if you modify the code and save the file, you have a new revision (or a version) of that file.

Revision control software are also known as Version Control Software (VCS), and by a few other names.

Repositories

Repository (repo for short): The database of the history of a directory being tracked by an RCS software (e.g. Git).

The repository is the database where the meta-data about the revision history are stored. Suppose you want to apply revision control on files in a directory called ProjectFoo. In that case you need to set up a repo (short for repository) in ProjectFoo directory, which is referred to as the working directory of the repo. For example, Git uses a hidden folder named .git inside the working directory.

You can have multiple repos in your computer, each repo revision-controlling files of a different working directly, for examples, files of different projects.

Saving History

Tracking and Ignoring

In a repo, we can specify which files to track and which files to ignore. Some files such as temporary log files created during the build/test process should not be revision-controlled.

Staging and Committing

Committing saves a snapshot of the current state of the tracked files in the revision control history. Such a snapshot is also called a commit (i.e. the noun).

When ready to commit, we first stage the specific changes we want to commit. This intermediate step allows us to commit only some changes while saving other changes for a later commit.

Identifying Points in History

Each commit in a repo is a recorded point in the history of the project that is uniquely identified by an auto-generated hash e.g. a16043703f28e5b3dab95915f5c5e5bf4fdc5fc1.

We can tag a specific commit with a more easily identifiable name e.g. v1.0.2

Using History

RCS tools store the history of the working directory as a series of commits. This means we should commit after each change that we want the RCS to 'remember' for us.

To see what changed between two points of the history, you can ask the RCS tool to diff the two commits in concern.

To restore the state of the working directory at a point in the past, you can checkout the commit in concern. i.e., we can traverse the history of the working directory simply by checking out the commits we are interested in.

RCS: Revision Control Software are the software tools that automate the process of Revision Control i.e. managing revisions of software artifacts.

Remote Repositories

Remote repositories are copies of a repo that are hosted on remote computers. They are especially useful for sharing the revision history of a codebase among team members of a multi-person project. They can also serve as a remote backup of your code base.

You can clone a remote repo onto your computer which will create a copy of a remote repo on your computer, including the version history as the remote repo.

You can push new commits in your clone to the remote repo which will copy the new commits onto the remote repo. Note that pushing to a remote repo requires you to have write-access to it.

You can pull from the remote repos to receive new commits in the remote repo. Pulling is used to sync your local repo with latest changes to the remote repo.

While it is possible to set up your own remote repo on a server, an easier option is to use a remote repo hosting service such as GitHub or BitBucket.

A fork is a remote copy of a remote repo. If there is a remote repo that you want to push to but you do not have write access to it, you can fork the remote repo, which gives you your own remote repo that you can push to.

A pull request is mechanism for contributing code to a remote repo. It is a formal request sent to the maintainers of the repo asking them to pull your new code to their repo.

Here is a scenario that includes all the concepts introduced above (click on the slide to advance the animation):

Branching

Branching is the process of evolving multiple versions of the software in parallel. For example, one team member can create a new branch and add an experimental feature to it while the rest of the team keeps working on another branch. Branches can be given names e.g. master, release, dev.

A branch can be merged into another branch. Merging usually result in a new commit that represents the changes done in the branch being merged.

Branching and merging

Merge conflicts happen when you try to merge two branches that had changed the same part of the code and the RCS software cannot decide which changes to keep. In those cases we have to ‘resolve’ those conflicts manually.

 

Project Planning

Work Breakdown Structure

A Work Breakdown Structure (WBS) depicts information about tasks and their details in terms of subtasks. When managing projects it is useful to divide the total work into smaller, well-defined units. Relatively complex tasks can be further split into subtasks. In complex projects a WBS can also include prerequisite tasks and effort estimates for each task.

The high level tasks for a single iteration of a small project could look like the following:

Task ID Task Estimated Effort Prerequisite Task
A Analysis 1 man day -
B Design 2 man day A
C Implementation 4.5 man day B
D Testing 1 man day C
E Planning for next version 1 man day D

The effort is traditionally measured in man hour/day/month i.e. work that can be done by one person in one hour/day/month. The Task ID is a label for easy reference to a task. Simple labeling is suitable for a small project, while a more informative labeling system can be adopted for bigger projects.

An example WBS for a project for developing a game.

Task ID Task Estimated Effort Prerequisite Task
A High level design 1 man day -
B Detail design
  1. User Interface
  2. Game Logic
  3. Persistency Support
2 man day
  • 0.5 man day
  • 1 man day
  • 0.5 man day
A
C Implementation
  1. User Interface
  2. Game Logic
  3. Persistency Support
4.5 man day
  • 1.5 man day
  • 2 man day
  • 1 man day
  • B.1
  • B.2
  • B.3
D System Testing 1 man day C
E Planning for next version 1 man day D

All tasks should be well-defined. In particular, it should be clear as to when the task will be considered done.

Some examples of ill-defined tasks and their better-defined counterparts:

Bad Better
more coding implement component X
do research on UI testing find a suitable tool for testing the UI

Milestones

A milestone is the end of a stage which indicates a significant progress. We should take into account dependencies and priorities when deciding on the features to be delivered at a certain milestone.

Each intermediate product release is a milestone.

In some projects, it is not practical to have a very detailed plan for the whole project due to the uncertainty and unavailability of required information. In such cases, we can use a high-level plan for the whole project and a detailed plan for the next few milestones.

Milestones for the Minesweeper project, iteration 1

Day Milestones
Day 1 Architecture skeleton completed
Day 3 ‘new game’ feature implemented
Day 4 ‘new game’ feature tested

Buffers

A buffer is a time set aside to absorb any unforeseen delays. It is very important to include buffers in a software project schedule because effort/time estimations for software development is notoriously hard. However, do not inflate task estimates to create hidden buffers; have explicit buffers instead. Reason: With explicit buffers it is easier to detect incorrect effort estimates which can serve as a feedback to improve future effort estimates.

Issue Trackers

Keeping track of project tasks (who is doing what, which tasks are ongoing, which tasks are done etc.) is an essential part of project management. In small projects it may be possible to track tasks using simple tools as online spreadsheets or general-purpose/light-weight tasks tracking tools such as Trello. Bigger projects need more sophisticated task tracking tools.

Issue trackers (sometimes called bug trackers) are commonly used to track task assignment and progress. Most online project management software such as GitHub, SourceForge, and BitBucket come with an integrated issue tracker.

A screenshot from the Jira Issue tracker software (Jira is part of the BitBucket project management tool suite):

 

SECTION: PRINCIPLES

Principles

Single Responsibility Principle

Single Responsibility Principle (SRP): A class should have one, and only one, reason to change. -- Robert C. Martin

If a class has only one responsibility, it needs to change only when there is a change to that responsibility.

Consider a TextUi class that does parsing of the user commands as well as interacting with the user. That class needs to change when the formatting of the UI changes as well as when the syntax of the user command changes. Hence, such a class does not follow the SRP.

Gather together the things that change for the same reasons. Separate those things that change for different reasons. Agile Software Development, Principles, Patterns, and Practices by Robert C. Martin

Separation of Concerns Principle

Separation of Concerns Principle (SoC): To achieve better modularity, separate the code into distinct sections, such that each section addresses a separate concern. -- Proposed by Edsger W. Dijkstra

A concern in this context is a set of information that affects the code of a computer program.

Examples for concerns:

  • A specific feature, such as the code related to add employee feature
  • A specific aspect, such as the code related to persistence or security
  • A specific entity, such as the code related to the Employee entity

Applying SoC reduces functional overlaps among code sections and also limits the ripple effect when changes are introduced to a specific part of the system.

If the code related to persistence is separated from the code related to security, a change to how the data are persisted will not need changes to how the security is implemented.

This principle can be applied at the class level, as well as on higher levels.

The n-tier architecture utilizes this principle. Each layer in the architecture has a well-defined functionality that has no functional overlap with each other.

Design → Architecture → Styles → n-Tier Style →

What

In the n-tier style, higher layers make use of services provided by lower layers. Lower layers are independent of higher layers. Other names: multi-layered, layered.

Operating systems and network communication software often use n-tier style.

This principle should lead to higher cohesion and lower coupling.

Design → Design Fundamentals → Coupling →

What

Coupling is a measure of the degree of dependence between components, classes, methods, etc. Low coupling indicates that a component is less dependent on other components. High coupling (aka tight coupling or strong coupling) is discouraged due to the following disadvantages:

  • Maintenance is harder because a change in one module could cause changes in other modules coupled to it (i.e. a ripple effect).
  • Integration is harder because multiple components coupled with each other have to be integrated at the same time.
  • Testing and reuse of the module is harder due to its dependence on other modules.

In the example below, design A appears to have a more coupling between the components than design B.

Discuss the coupling levels of alternative designs x and y.

Overall coupling levels in x and y seem to be similar (neither has more dependencies than the other). (Note that the number of dependency links is not a definitive measure of the level of coupling. Some links may be stronger than the others.). However, in x, A is highly-coupled to the rest of the system while B, C, D, and E are standalone (do not depend on anything else). In y, no component is as highly-coupled as A of x. However, only D and E are standalone.

Explain the link (if any) between regressions and coupling.

When the system is highly-coupled, the risk of regressions is higher too e.g. when component A is modified, all components ‘coupled’ to component A risk ‘unintended behavioral changes’.

Discuss the relationship between coupling and testability.

Coupling decreases testability because if the SUT is coupled to many other components it becomes difficult to test the SUI in isolation of its dependencies.

Choose the correct statements.

  • a. As coupling increases, testability decreases.
  • b. As coupling increases, the risk of regression increases.
  • c. As coupling increases, the value of automated regression testing increases.
  • d. As coupling increases, integration becomes easier as everything is connected together.
  • e. As coupling increases, maintainability decreases.

(a)(b)(c)(d)(e)

Explanation: High coupling means either more components require to be integrated at once in a big-bang fashion (increasing the risk of things going wrong) or more drivers and stubs are required when integrating incrementally.

Design → Design Fundamentals → Cohesion →

What

Cohesion is a measure of how strongly-related and focused the various responsibilities of a component are. A highly-cohesive component keeps related functionalities together while keeping out all other unrelated things.

Higher cohesion is better. Disadvantages of low cohesion (aka weak cohesion):

  • Lowers the understandability of modules as it is difficult to express module functionalities at a higher level.
  • Lowers maintainability because a module can be modified due to unrelated causes (reason: the module contains code unrelated to each other) or many many modules may need to be modified to achieve a small change in behavior (reason: because the code realated to that change is not localized to a single module).
  • Lowers reusability of modules because they do not represent logical units of functionality.

 

SECTION: TOOLS

UML

Class Diagrams

Introduction

What

UML class diagrams describe the structure (but not the behavior) of an OOP solution. These are possibly the most often used diagrams in the industry and an indispensable tool for an OO programmer.

An example class diagram:

Classes

What

The basic UML notations used to represent a class:

A Table class shown in UML notation:

class Table{

    Integer number;
    Chair[] chairs = null;
    
    Integer getNumber(){
        //...
    }

    void setNumber(Integer n){
        //...
    }
}
class Table:

  def __init__(self):
    self.number = 0
    self.chairs = []

  def get_number(self):
    # ...

  def set_number(self, n):
    # ...

The 'Operations' compartment and/or the 'Attributes' compartment may be omitted if such details are not important for the task at hand. 'Attributes' always appear above the 'Operations' compartment. All operations should be in one compartment rather than each operation in a separate compartment. Same goes for attributes.

The visibility of attributes and operations is used to indicate the level of access allowed for each attribute or operation. The types of visibility and their exact meanings depend on the programming language used. Here are some common visibilities and how they are indicated in a class diagram:

  • + : public
  • - : private
  • # : protected
  • ~ : package private
visibility Java Python
- private private at least two leading underscores (and at most one trailing underscores) in the name
# protected protected one leading underscore in the name
+ public public all other cases
~ package private default visibility not applicable

Table class with visibilities shown:

Associations

What

We use a solid line to show an association between two classes.

This example shows an association between the Admin class and the Student class:

We use arrow heads to indication the navigability of an association.

Logic is aware of Minefield, but Minefield is not aware of Logic

class Logic{
    Minefield minefield;
}

class Minefield{
    ...
}
class Logic:
  
  def __init__(self, minefield):
    self.minefield = minefield
    
  # ...


class Minefield:
  # ...

Here is an example of a bidirectional navigability; each class is aware of the other.

Navigability can be shown in class diagrams as well as object diagrams.

According to this object diagram the given Logic object is associated with and aware of two MineField objects.

Roles

Association Role labels are used to indicate the role played by the classes in the association.

This association represents a marriage between a Man object and a Woman object. The respective roles played by objects of these two classes are husband and wife.

Note how the variable names match closely with the association roles.

class Man{
    Woman wife;
}

class Woman{
    Man husband;
}
class Man:
  def __init__(self):
    self.wife = None # a Woman object

class Woman:
   def __init__(self):
     self.husband = None # a Man object

The role of Student objects in this association is charges (i.e. Admin is in charge of students)

class Admin{
    List<Student> charges;
}
class Admin:
  def __init__(self):
    self.charges = [] # list of Student objects

Labels

Association labels describe the meaning of the association. The arrow head indicates the direction in which the label is to be read.

In this example, the same association is described using two different labels.

  • Diagram on the left: Admin class is associated with Student class because an Admin object uses a Student object.
  • Diagram on the right: Admin class is associated with Student class because a Student object is used by an Admin object.

Multiplicity

Commonly used multiplicities:

  • 0..1 : optional, can be linked to 0 or 1 objects
  • 1 : compulsory, must be linked to one object at all times.
  • * : can be linked to 0 or more objects.
  • n..m : the number of linked objects must be n to m inclusive

In the diagram below, an Admin object administers (in charge of) any number of students but a Student object must always be under the charge of exactly one Admin object

In the diagram below,

  • Each student must be supervised by exactly one professor. i.e. There cannot be a student who doesn't have a supervisor or has multiple supervisors.
  • A professor cannot supervise more than 5 students but can have no students to supervise.
  • An admin can handle any number of professors and any number of students, including none.
  • A professor/student can be handled by any number of admins, including none.

Dependencies

What

UML uses a dashed arrow to show dependencies.

Two examples of dependencies:

Enumerations

What

Notation:

In the class diagram below, there are two enumerations in use:

Class-Level Members

What

In UML class diagrams, underlines denote class-level attributes and variables.

In the class below, totalStudents attribute and the getTotalStudents method are class-level.

Composition

What

UML uses a solid diamond symbol to denote composition.

Notation:

A Book consists of Chapter objects. When the Book object is destroyed, its Chapter objects are destroyed too.

Aggregation

What

UML uses a hollow diamond is used to indicate an aggregation.

Notation:

Example:

Aggregation vs Composition

The distinction between composition (◆) and aggregation (◇) is rather blurred. Martin Fowler’s famous book UML Distilled advocates omitting the aggregation symbol altogether because using it adds more confusion than clarity.

Class Inheritance

What

You can use a triangle and a solid line (not to be confused with an arrow) to indicate class inheritance.

Notation:

Examples: The Car class inherits from the Vehicle class. The Cat and Dog classes inherit from the Pet class.

Interfaces

What

An interface is shown similar to a class with an additional keyword <<interface>>. When a class implements an interface, it is shown similar to class inheritance except a dashed line is used instead of a solid line.

The AcademicStaff and the AdminStaff classes implement the SalariedStaff interface.

Abstract Classes

What

You can use italics or {abstract} (preferred) keyword to denote abstract classes/methods.

Example:

Sequence Diagrams

Introduction

A UML sequence diagram captures the interactions between multiple objects for a given scenario.

Consider the code below.

class Machine {

    Unit producePrototype() {
        Unit prototype = new Unit();
        for (int i = 0; i < 5; i++) {
            prototype.stressTest();
        }
        return prototype;
    }
}

class Unit {

    public void stressTest() {

    }
}

Here is the sequence diagram to model the interactions for the method call producePrototype() on a Machine object.

Basic

Notation:

This sequence diagram shows some interactions between a human user and the Text UI of a CLI Minesweeper game.

The player runs the newgame action on the TextUi object which results in the TextUi showing the minefield to the player. Then, the player runs the clear x y command; in response, the TextUi object shows the updated minefield.

The :TextUi in the above example denotes an unnamed instance of the class TextUi. If there were two instances of TextUi in the diagram, they can be distinguished by naming them e.g. TextUi1:TextUi and TextUi2:TextUi.

Arrows representing method calls should be solid arrows while those representing method returns should be dashed arrows.

Note that unlike in object diagrams, the class/object name is not underlined in sequence diagrams.

[Common notation error] Activation bar too long: The activation bar of a method cannot start before the method call arrives and a method cannot remain active after the method had returned. In the two sequence diagrams below, the one on the left commits this error because the activation bar starts before the method Foo#xyz() is called and remains active after the method returns.

[Common notation error] Broken activation bar: The activation bar should remain unbroken from the point the method is called until the method returns. In the two sequence diagrams below, the one on the left commits this error because the activation bar for the method Foo#abc() is not contiguous, but appears as two pieces instead.

Object Creation

Notation:

  • The arrow that represents the constructor arrives at the side of the box representing the instance.
  • The activation bar represents the period the constructor is active.

The Logic object creates a Minefield object.

Object Deletion

UML uses an X at the end of the lifeline of an object to show it's deletion.

Although object deletion is not that important in languages such as Java that support automatic memory management, you can still show object deletion in UML diagrams to indicate the point at which the object ceases to be used.

Notation:

Note how the diagrams shows the deletion of the Minefield object

Loops

Notation:

The Player calls the mark x,y command or clear x y command repeatedly until the game is won or lost.

Self Invocation

UML can show a method of an object calling another of its own methods.

Notation:

The markCellAt(...) method of a Logic object is calling its own updateState(...) method.

In this variation, the Book#write() method is calling the Chapter#getText() method which in turn does a call back by calling the getAuthor() method of the calling object.

Alternative Paths

UML uses alt frames to indicate alternative paths.

Notation:

Minefield calls the Cell#setMine if the cell is supposed to be a mined cell, and calls the Cell:setMineCount(...) method otherwise.

Optional Paths

UML uses opt frames to indicate optional paths.

Notation:

Logic#markCellAt(...) calls Timer#start() only if it is the first move of the player.

Parallel Paths

UML uses par frames to indicate parallel paths.

Notation:

Logic is calling methods CloudServer#poll() and LocalServer#poll() in parallel.

If you show parallel paths in a sequence diagram, the corresponding Java implementation is likely to be multi-threaded because a normal Java program cannot do multiple things at the same time.

Reference Frames

UML uses ref frame to allow a segment of the interaction to be omitted and shown as a separate sequence diagram. Reference frames help us to break complicated sequence diagrams into multiple parts or simply to omit details we are not interested in showing.

Notation:

The details of the get minefield appearance interactions have been omitted from the diagram.

Those details are shown in a separate sequence diagram given below.

Minimal Notation

To reduce clutter, activation bars and return arrows may be omitted if they do not result in ambiguities or loss of information. Informal operation descriptions such as those given in the example below can be used, if more precise details are not required for the task at hand.

A minimal sequence diagram

Object Diagrams

Introduction

An object diagram shows an object structure at a given point of time.

An example object diagram:

Objects

Notation:

Notes:

  • The class name and object name e.g. car1:Car are underlined.
  • objectName:ClassName is meant to say 'an instance of ClassName identified as objectName'.
  • Unlike classes, there is no compartment for methods.
  • Attributes compartment can be omitted if it is not relevant to the task at hand.
  • Object name can be omitted too e.g. :Car which is meant to say 'an unnamed instance of a Car object'.

Some example objects:

What

A solid line indicates an association between two objects.

An example object diagram showing two associations:

Notes

Notes

UML notes can augment UML diagrams with additional information. These notes can be shown connected to a particular element in the diagram or can be shown without a connection. The diagram below shows examples of both.

Example:

Miscellaneous

Object vs Class Diagrams

Compared to the notation for a class diagrams, object diagrams differ in the following ways:

  • Shows objects instead of classes:
    • Instance name may be shown
    • There is a : before the class name
    • Instance and class names are underlined
  • Methods are omitted
  • Multiplicities are omitted

Furthermore, multiple object diagrams can correspond to a single class diagram.

Both object diagrams are derived from the same class diagram shown earlier. In other words, each of these object diagrams shows ‘an instance of’ the same class diagram.

 

Git and Github

Init

Soon you are going to take your first step in using Git. If you would like to see a quick overview of the full Git landscape before jumping in, watch the video below.

Install SourceTree which is Git + a GUI for Git. If you prefer to use Git via the command line (i.e., without a GUI), you can install Git instead.

Suppose you want to create a repository in an empty directory things. Here are the steps:

Windows: Click FileClone/New…. Click on Create button.
Mac: New...Create New Repository.

Enter the location of the directory (Windows version shown below) and click Create.

Go to the things folder and observe how a hidden folder .git has been created.

Note: If you are on Windows, you might have to configure Windows Explorer to show hidden files.

Open a Git Bash Terminal.

If you installed SourceTree, you can click the Terminal button to open a GitBash terminal.

Navigate to the things directory.

Use the command git init which should initialize the repo.

$ git init
Initialized empty Git repository in c:/repos/things/.git/

You can use the command ls -a to view all files, which should show the .git directory that was created by the previous command.

$ ls -a
.  ..  .git

You can also use the git status command to check the status of the newly-created repo. It should respond with something like the bellow

git status

# On branch master
#
# Initial commit
#
nothing to commit (create/copy files and use "git add" to track)

Commit

Create an empty repo.

Create a file named fruits.txt in the working directory and add some dummy text to it.

Working directory: The directory the repo is based in is called the working directory.

Observe how the file is detected by Git.

The file is shown as ‘unstaged’

You can use the git status command to check the status of the working directory.

git status

# On branch master
#
# Initial commit
#
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#   a.txt
nothing added to commit but untracked files present (use "git add" to track)

Although git has detected the file in the working directory, it will not do anything with the file unless you tell it to. Suppose we want to commit the current state of the file. First, we should stage the file.

Commit: Saving the current state of the working folder into the Git revision history.

Stage: Instructing Git to prepare a file for committing.

Select the fruits.txt and click on the Stage Selected button

fruits.txt should appear in the Staged files panel now.

You can use the stage or the add command (they are synonyms, add is the more popular choice) to stage files.

git add fruits.txt
git status

# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   fruits.txt
#

Now, you can commit the staged version of fruits.txt

Click the Commit button, enter a commit message e.g. add fruits.txt in to the text box, and click Commit

Use the commit command to commit. The -m switch is used to specify the commit message.

git commit -m "add fruits.txt"

You can use the log command to see the commit history

git log

commit 8fd30a6910efb28bb258cd01be93e481caeab846
Author: … < … @... >
Date:   Wed Jul 5 16:06:28 2017 +0800

  Add fruits.txt

Note the existence of something called the master branch. Git allows you to have multiple branches (i.e. it is a way to evolve the content in parallel) and Git creates a default branch named master on which the commits go on by default.

Do some changes to fruits.txt (e.g. add some text and delete some text). Stage the changes, and commit the changes using the same steps you followed before. You should end up with something like this.

Next, add two more files colors.txt and shapes.txt to the same working directory. Add a third commit to record the current state of the working directory.

Ignore

Add a file named temp.txt to the things repo you created. Suppose we don’t want this file to be revision controlled by Git. Let’s instruct Git to ignore temp.txt

The file should be currently listed under Unstaged files. Right-click it and choose Ignore…. Choose Ignore exact filename(s) and click OK.

Observe that a file named .gitignore has been created in the working directory root and has the following line in it.

temp.txt

Create a file named .gitignore in the working directory root and add the following line in it.

temp.txt

The .gitignore file tells Git which files to ignore when tracking revision history. That file itself can be either revision controlled or ignored.

  • To version control it (the more common choice – which allows you to track how the .gitignore file changed over time), simply commit it as you would commit any other file.
  • To ignore it, follow the same steps we followed above when we set Git to ignore the temp.txt file.

Tag

Let's tag a commit in a local repo you have (e.g. the sampelrepo-things repo)

Right-click on the commit (in the graphical revision graph) you want to tag and choose Tag…

Specify the tag name e.g. v1.0 and click Add Tag.

The added tag will appear in the revision graph view.

To add a tag to the current commit as v1.0,

git tag –a v1.0

To view tags

git tag

To learn how to add a tag to a past commit, go to the ‘Git Basics – Tagging’ page of the git-scm book and refer the ‘Tagging Later’ section.

Remember to push tags to the repo. A normal push does not include tags.

# push a specific tag
git push origin v1.0b

# push all tags
git push origin --tags

Checkout

Git can show you what changed in each commit.

To see which files changed in a commit, click on the commit. To see what changed in a specific file in that commit, click on the file name.

git show < part-of-commit-hash >

Example:

git show 251b4cf

commit 5bc0e30635a754908dbdd3d2d833756cc4b52ef3
Author: … < … >
Date:   Sat Jul 8 16:50:27 2017 +0800

    fruits.txt: replace banana with berries

diff --git a/fruits.txt b/fruits.txt
index 15b57f7..17f4528 100644
--- a/fruits.txt
+++ b/fruits.txt
@@ -1,3 +1,3 @@
 apples
-bananas
+berries
 cherries

Git can also show you the difference between two points in the history of the repo.

Select the two points you want to compare using Ctrl+Click.

The same method can be used to compare the current state of the working directory (which might have uncommitted changes) to a point in the history.

The diff command can be used to view the differences between two points of the history.

  • git diff : shows the changes (uncommitted) since the last commit
  • git diff 0023cdd..fcd6199: shows the changes between the points indicated by by commit hashes
  • git diff v1.0..HEAD: shows changes that happened from the commit tagged as v1.0 to the most recent commit.

Git can load a specific version of the history to the working directory. Note that if you have uncommitted changes in the working directory, you need to stash them first to prevent them from being overwritten.

Tools → Git and GitHub →

Stash

You can use the git's stash feature to temporarily shelve (or stash) changes you've made to your working copy so that you can work on something else, and then come back and re-apply the stashed changes later on. -- adapted from this

Follow this article from SourceTree creators. Note the GUI shown in the article is slightly outdated but you should be able to map it to the current GUI.

Follow this article from Atlassian.

Double-click the commit you want to load to the working directory, or right-click on that commit and choose Checkout....

Click OK to the warning about ‘detached HEAD’ (similar to below).

The specified version is now loaded to the working folder, as indicated by the HEAD label. HEAD is a reference to the currently checked out commit.

If you checkout a commit that come before the commit in which you added the .gitignore file, Git will now show ignored fiels as ‘unstaged modifications’ because at that stage Git hasn’t been told to ignore those files.

To go back to the latest commit, double-click it.

Use the checkout <commit-identifier> command to change the working directory to the state it was in at a specific past commit.

  • git checkout v1.0: loads the state as at commit tagged v1.0
  • git checkout 0023cdd: loads the state as at commit with the hash 0023cdd
  • git checkout HEAD~2: loads the state that is 2 commits behind the most recent commit

For now, you can ignore the warning about ‘detached HEAD’.

Use the checkout <branch-name> to go back to the most recent commit of the current branch (the default branch in git is named master)

  • git checkout master

Clone

Clone the sample repo samplerepo-things to your computer.

Note that the URL of the Github project is different form the URL you need to clone a repo in that Github project. e.g.

Github project URL: https://github.com/se-edu/samplerepo-things
Git repo URL: https://github.com/se-edu/samplerepo-things.git (note the .git at the end)

FileClone / New… and provide the URL of the repo and the destination directory.

You can use the clone command to clone a repo.

Follow instructions given here.

Pull

Clone the sample repo as explained in [Textbook Tools → Git & GitHub → Clone].

Delete the last two commits to simulate cloning the repo 2 commits ago.

Clone

Can clone a remote repo

Clone the sample repo samplerepo-things to your computer.

Note that the URL of the Github project is different form the URL you need to clone a repo in that Github project. e.g.

Github project URL: https://github.com/se-edu/samplerepo-things
Git repo URL: https://github.com/se-edu/samplerepo-things.git (note the .git at the end)

FileClone / New… and provide the URL of the repo and the destination directory.

You can use the clone command to clone a repo.

Follow instructions given here.

Right-click the target commit (i.e. the commit that is 2 commits behind the tip) and choose Reset current branch to this commit.

Choose the Hard - … option and click OK.

This is what you will see.

Note the following (cross refer the screenshot above):

Arrow marked as a: The local repo is now at this commit, marked by the master label.
Arrow marked as b: origin/master label shows what is the latest commit in the master branch in the remote repo.

Use the reset command to delete commits at the tip of the revision history.

git reset --hard HEAD~2

Now, your local repo state is exactly how it would be if you had cloned the repo 2 commits ago, as if somebody has added two more commits to the remote repo since you cloned it. To get those commits to your local repo (i.e. to sync your local repo with upstream repo) you can do a pull.

Click the Pull button in the main menu, choose origin and master in the next dialog, and click OK.

Now you should see something like this where master and origin/master are both pointing the same commit.

git pull origin

Push

  1. Create a GitHub account if you don't have one yet.

  2. Fork the samplerepo-things to your GitHub account:

    Navigate to the on GitHub and click on the button on the top-right corner.

  3. Clone the fork (not the original) to your computer.

  4. Create some commits in your repo.

  5. Push the new commits to your fork on GitHub

Click the Push button on the main menu, ensure the settings are as follows in the next dialog, and click the Push button on the dialog.

Tags are not included in a normal push. Remember to tick Push all tags when pushing to the remote repo if you want them to be pushed to the repo.

Use the command git push origin master. Enter Github username and password when prompted.

Tags are not included in a normal push. To push a tag, use this command: git push origin <tag_name> e.g.. git push origin v1.0

Pushing an existing local repo into a new remote repo on GitHub extra

First, you need to create an empty remote repo on GitHub.

  1. Login to your GitHub account and choose to create a new Repo.

  2. In the next screen, provide a name for your repo but keep the Initialize this repo ... tick box unchecked.

  3. Note the URL of the repo. It will be of the form https://github.com/{your_user_name}/{repo_name}.git
    e.g., https://github.com/johndoe/foobar.git

Next, you can push the existing local repo to the new remote repo as follows:

  1. Open the local repo in SourceTree.
  2. Choose RepositoryRepository Settings menu option.
  3. Add a new remote to the repo with the following values.
    • Remote name: the name you want to assign to the remote repo. Recommended origin
    • URL/path: the URL of your repo (ending in .git) that you collected earlier.
    • Username: your GitHub username
  4. Now you can push your repo to the new remote the usual way.
  1. Navigate to the folder containing the local repo.
  2. Set the new remote repo as a remote of the local repo.
    command: git remote add {remote_name} {remote_repo_url}
    e.g., git remote add origin https://github.com/johndoe/foobar.git
  3. Push to the new remote the usual way. You can use the -u flag to inform Git that you wish to track the branch.
    e.g., git push -u origin master

Branch

0. Observe that you are normally in the branch called master. For this, you can take any repo you have on your computer (e.g. a clone of the samplerepo-things).

git status

on branch master

1. Start a branch named feature1 and switch to the new branch.

Click on the Branch button on the main menu. In the next dialog, enter the branch name and click Create Branch

Note how the feature1 is indicated as the current branch.

You can use the branch command to create a new branch and the checkout command to switch to a specific branch.

git branch feature1
git checkout feature1

One-step shortcut to create a branch and switch to it at the same time:

git checkout –b feature1

2. Create some commits in the new branch. Just commit as per normal. Commits you add while on a certain branch will become part of that branch.

3. Switch to the master branch. Note how the changes you did in the feature1 branch are no longer in the working directory.

Double-click the master branch

git checkout master

4. Add a commit to the master branch. Let’s imagine it’s a bug fix.

5. Switch back to the feature1 branch (similar to step 3).

6. Merge the master branch to the feature1 branch, giving an end-result like the below. Also note how Git has created a merge commit.

Right-click on the master branch and choose merge master into the current branch. Click OK in the next dialog.

git merge master

Observe how the changes you did in the master branch (i.e. the imaginary bug fix) is now available even when you are in the feature1 branch.

7. Add another commit to the feature1 branch.

8. Switch to the master branch and add one more commit.

9. Merge feature1 to the master branch, giving and end-result like this:

Right-click on the feature1 branch and choose Merge....

git merge feature1

10. Create a new branch called add-countries, switch to it, and add some commits to it (similar to steps 1-2 above). You should have something like this now:

11. Go back to the master branch and merge the add-countries branch onto the master branch (similar to steps 8-9 above). While you might expect to see something like the below,

... you are likely to see something like this instead:

That is because Git does a fast forward merge if possible. Seeing that the master branch has not changed since you started the add-countries branch, Git has decided it is simpler to just put the commits of the add-countries branch in front of the master branch, without going into the trouble of creating an extra merge commit.

It is possible to force Git to create a merge commit even if fast forwarding is possible.

Tick the box shown below when you merge a branch:

Use the --no-ff switch (short for no fast forward):

git merge --no-ff add-countries

 

JUnit

JUnit: Basic

When writing JUnit tests for a class Foo, the common practice is to create a FooTest class, which will contain various test methods.

Suppose we want to write tests for the IntPair class below.

public class IntPair {
    int first;
    int second;

    public IntPair(int first, int second) {
        this.first = first;
        this.second = second;
    }

    public int intDivision() throws Exception {
        if (second == 0){
            throw new Exception("Divisor is zero");
        }
        return first/second;
    }

    @Override
    public String toString() {
        return first + "," + second;
    }
}

Here's a IntPairTest class to match (using JUnit 5).

import org.junit.jupiter.api.Test;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.fail;

public class IntPairTest {


    @Test
    public void testStringConversion() {
        assertEquals("4,7", new IntPair(4, 7).toString());
    }

    @Test
    public void intDivision_nonZeroDivisor_success() throws Exception {
        assertEquals(2, new IntPair(4, 2).intDivision());
        assertEquals(0, new IntPair(1, 2).intDivision());
        assertEquals(0, new IntPair(0, 5).intDivision());
    }

    @Test
    public void intDivision_zeroDivisor_exceptionThrown() {
        try {
            assertEquals(0, new IntPair(1, 0).intDivision());
            fail(); // the test should not reach this line
        } catch (Exception e) {
            assertEquals("Divisor is zero", e.getMessage());
        }
    }
}

Notes:

  • Each test method is marked with a @Test annotation.
  • Tests use Assert.assertEquals(expected, actual) methods to compare the expected output with the actual output. If they do not match, the test will fail. JUnit comes with other similar methods such as Assert.assertNull and Assert.assertTrue.
  • Java code normally use camelCase for method names e.g., testStringConversion but when writing test methods, sometimes another convention is used: whatIsBeingTested_descriptionOfTestInputs_expectedOutcome e.g., intDivision_zeroDivisor_exceptionThrown
  • There are several ways to verify the code throws the correct exception. The third test method in the example above shows one of the simpler methods. If the exception is thrown, it will be caught and further verified inside the catch block. But if it is not thrown as expected, the test will reach Assert.fail() line and will fail as a result.
  • The easiest way to run JUnit tests is to do it via the IDE. For example, in Intellij you can right-click the folder containing test classes and choose 'Run all tests...'

Adding JUnit 5 to your IntelliJ Project -- by Kevintroko@YouTube